
INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN FLUIDS
Int. J. Numer. Meth. Fluids 0000; 00:1–19
Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/fld

Variational Data Assimilation Using Targetted Random Walks

S.L. Cotter∗‡§, M. Dashti†, J.C. Robinson†, A.M. Stuart†

‡ OCCAM, Mathematical Institute, University of Oxford, Oxford OX1 3LB, UK

† Mathematics Institute, University of Warwick, Coventry CV4 7AL, UK

SUMMARY

The variational approach to data assimilation is a widely used methodology for both online prediction and for reanalysis. In either of these scenarios it can be important to assess uncertainties in the assimilated state. Ideally it is desirable to have complete information concerning the Bayesian posterior distribution for the unknown state, given data. We show that complete computational probing of this posterior distribution is now within reach in the offline situation. We introduce an MCMC method which enables us to directly sample from the Bayesian posterior distribution on the unknown functions of interest, given observations. Since we are aware that these methods are currently too computationally expensive to consider using in an online filtering scenario, we frame this in the context of offline reanalysis. Using a simple random walk-type MCMC method, we are able to characterize the posterior distribution using only evaluations of the forward model of the problem, and of the model and data mismatch. No adjoint model is required for the method we use; however more sophisticated MCMC methods are available which do exploit derivative information. For simplicity of exposition we consider the problem of assimilating data, either Eulerian or Lagrangian, into a low Reynolds number flow in a two dimensional periodic geometry. We will show that in many cases it is possible to recover the initial condition and model error (which we describe as unknown forcing to the model) from data, and that with increasing amounts of informative data, the uncertainty in our estimates reduces.

Received . . .

KEY WORDS: Uncertainty Quantification, Probabilistic Methods, Stochastic Problems, Transport, Incompressible Flow, Partial Differential Equations

1. INTRODUCTION

Data assimilation, using either filtering [35, 17] or variational approaches [4, 16, 18], is now a standard component of computational modelling in the geophysical sciences. It presents a major challenge to computational science as the space-time distributed nature of the data makes it an intrinsically four dimensional problem. Filtering methods break this dimensionality by seeking sequential updates of three dimensional problems and are hence very desirable when online prediction is required [34, 36, 27]. However when reanalysis (hindcasting) is needed, for example to facilitate parameter estimation in sub-grid scale models, it is natural to confront the fully four dimensional nature of the problem and not to impose a direction of time; variational methods are then natural [21, 25, 46].

§ Email: [email protected]
∗ Correspondence to: OCCAM, Mathematical Institute, University of Oxford, Oxford OX1 3LB, UK

However, for both filtering and variational methods an outstanding computational challenge is the incorporation of uncertainty into the state estimation. Many of the practical methods used to confront this issue involve ad hoc and uncontrolled approximations, such as the ensemble Kalman filter [23]. Whilst this has led to a very efficient methodology, and will doubtless continue to have significant impact for some time to come, there remains scope for developing new methods which, whilst more computationally demanding, attack the problem of quantifying statistical uncertainty without making ad hoc or uncontrolled approximations. The natural setting in which to undertake the development of such new methods is the Bayesian framework, in which one seeks to characterize the posterior distribution of state given data [3]. The Markov chain Monte Carlo (MCMC) methodology provides a well-founded approach to fully probing this posterior distribution. The purpose of this paper is to show that sampling of this posterior distribution is now starting to fall within reach, via MCMC methods, in the offline situation. We also emphasize the possibility of, in future work, using such MCMC based studies of the posterior distribution to benchmark more practical algorithms such as filtering and variational approaches, both of which may be viewed as providing approximations to the posterior.

The Bayesian approach to inverse problems is a conceptually attractive approach to regularization which simultaneously forces consideration of a number of important modelling issues, such as the precise specification of prior assumptions and the description of observational noise [37]. However, in the context of PDE inverse problems for functions the approach leads to a significant computational difficulty, namely probing a probability measure on function space. To be clear: the probabilistic viewpoint adds a further degree of high dimensionality to the problem, over and above that stemming from the need to represent the unknown function itself. Variational methods, which minimize a functional that is a sum of terms measuring both model-data mismatch and prior information, correspond to finding the state of maximal posterior probability, known as a MAP estimator in statistics [37]. This will be a successful approach if the posterior distribution is in some sense “close” to Gaussian, with small variance. However, since the dynamics inherent within the observation operator can be highly non-linear, and since the observations may be sparse, it is not necessarily the case that a Gaussian approximation is valid; even if it is, the spread around the MAP estimator (the variance) may be important, and detailed information about it may be required. Characterising the whole posterior distribution can be important, then, as it quantifies the uncertainty inherent in state estimation.

Markov chain Monte Carlo (MCMC) methods are a highly flexible family of algorithms for sampling probability distributions in high dimensions [40, 47]. There exists a substantial literature on the use of MCMC methodology for the solution of inverse problems from fluid mechanics [2, 29, 30, 42, 43, 44] and from other application domains [10, 22, 38, 39, 45]. A key computational innovation in this paper, which distinguishes the methods from those in the preceding references, is that the algorithms we use estimate the posterior distribution on a function space in a manner which is robust to mesh refinement.

We employ random walk Metropolis type algorithms, introduced in [32], and generalized to the high dimensional setting in [6, 7, 15]. The method that we implement here does not use any gradient information (adjoint solvers) for the model-data mismatch. It proposes states using a random walk on function space and employs a random accept/reject mechanism guided by the model-data mismatch on the proposed state. This method is hence straightforward to implement, but not the most efficient method: other methods (Langevin or hybrid Monte Carlo), which require implementation of an adjoint model, can explore the state space in a more efficient manner [15]. In this paper we stick to the simpler random walk method as a proof of concept, but the reader should be aware that gradient methods, such as the Langevin algorithm or hybrid Monte Carlo, could be implemented to increase efficiency. The basic mathematical framework within which we work is that described in [13], in which we formulate Bayesian inverse problems on function space and then study data assimilation problems arising in fluid mechanics from this point of view. The abstract framework of [13] also provides a natural setting for the design of random walk-type MCMC methods, appropriate to the function space setting.

In section 2 we describe the data assimilation problems that we use to illustrate our Bayesian MCMC approach; these problems involve the incorporation of data (Eulerian or Lagrangian) into a model for Stokes’ flow. Section 3 briefly describes the mathematical setting for our data assimilation problems, and for the MCMC algorithm that we employ throughout the paper. In sections 4 and 5 we consider Eulerian and Lagrangian observations respectively, incorporating them into a two dimensional Stokes’ flow model with periodic boundary conditions. In each case we will consider first the problem of recovering the initial condition of the velocity field, given that we assume we know the forcing present in the system during the window of observation. We will then go on to consider, in each of the two data scenarios, what happens if we assume we know the forcing inherent in the system and incorporate this into the statistical algorithm, but in fact there is a degree of model error which can only be accounted for by altering the distribution on the initial condition. Finally, we will consider full initial condition and model error inference (which together are equivalent to finding a probability distribution on the state of the velocity field for the duration of the window of observation). In particular we study the extent to which model error may be recovered from data, and the uncertainty in this process. In section 6 we will present some brief conclusions.

2. BAYESIAN DATA ASSIMILATION IN STOKES’ FLOW

We describe the three steps required for the Bayesian formulation of any inverse problem involving data: specification of the forward model; specification of the observational model; and specification of prior models on the desired states.

2.1. Forward Model

We consider a fluid governed by a forced Stokes’ flow on a two dimensional box with periodic boundary conditions (for shorthand we write T², the two dimensional torus):

∂_t v − νΔv + ∇p = f,  ∀(x, t) ∈ T² × (0, ∞),   (1)

∇ · v = 0,  ∀t ∈ (0, ∞),   (2)

v(x, 0) = u(x),  x ∈ T².   (3)

This equation determines a unique velocity field v, given the initial condition u and the forcing f. Our objective is to find u and/or f, given observations of the fluid. It is sometimes convenient to employ the compact notation which follows from writing the Stokes’ equation as an ordinary differential equation for a divergence-free velocity field [54]:

dv/dt + νAv = η,  v(0) = u.   (4)

Here A is the Stokes’ operator, the Laplacian on divergence-free fields; and η is the projection of the forcing f onto divergence-free fields.

We have chosen this simple linear model of a fluid for two reasons: (i) the linear nature enables us, when observations are also linear, to construct exact Gaussian posterior distributions which can be used to check MCMC methods; and (ii) the linearity of the model means that the FFT can be employed to rapidly evaluate the forward model, which is desirable as this may need to be performed millions of times in order to obtain complete statistical sampling of the posterior. Regarding (i) note, however, that we will also consider nonlinear observations, and then the posterior is not Gaussian. Regarding (ii) note that the MCMC method is trivially parallelizable and our choice of linear forward model is made simply to facilitate computations without recourse to the use of a large number of processors; for more realistic forward models, however, such massive parallelization would be necessary.
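To make the spectral evaluation concrete, the following sketch (ours, not the authors' code; the dictionary data layout and function name are assumptions) advances equation (4) exactly mode by mode, freezing the forcing over each time step:

```python
import numpy as np

def stokes_step(v_hat, eta_hat, nu, dt):
    """One exact step of dv/dt + nu*A*v = eta in a truncated Fourier basis.

    v_hat, eta_hat: dicts mapping a wavenumber tuple k = (k1, k2) to the
    complex coefficient of exp(2*pi*i*k.x) in a divergence-free basis.
    The Stokes operator acts diagonally with eigenvalue 4*pi^2*|k|^2.
    """
    out = {}
    for k, vk in v_hat.items():
        lam = nu * 4.0 * np.pi**2 * (k[0]**2 + k[1]**2)
        if lam > 0.0:
            decay = np.exp(-lam * dt)
            # exponential (exact) integrator, forcing frozen on the step
            out[k] = decay * vk + (1.0 - decay) / lam * eta_hat[k]
        else:
            out[k] = vk + dt * eta_hat[k]   # the k = (0, 0) mode
    return out
```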


2.2. Observations

We consider two kinds of observations: Eulerian, which are direct measurements of the velocity field; and Lagrangian, which are measurements of passive tracers advected by the velocity field. We label the data vector by y and assume, for simplicity, that it is always subject to mean zero Gaussian observational noise σ with covariance Γ. Note, however, that other non-Gaussian models for the observational noise are easily incorporated into the methodology herein.

In the Eulerian case the observations are

y_{j,k} = v(x_j, t_k) + σ_{j,k},  j = 1, …, J,  k = 1, …, K.   (5)

In the Lagrangian case we define the J Lagrangian tracers {z_j}_{j=1}^J as solutions of the ODEs

dz_j/dt (t) = v(z_j(t), t),  z_j(0) = z_{j,0},   (6)

where we assume that the set of starting positions {z_{j,0}}_{j=1}^J is known (although they too could be part of the estimated state if desired). The Lagrangian observations are then

y_{j,k} = z_j(t_k) + σ_{j,k},  j = 1, …, J,  k = 1, …, K.   (7)

In both cases we may define an observation operator H mapping the unknown function u (or u and η) to the place where the observations are made. Thus H(u)_{j,k} = v(x_j, t_k) in the Eulerian case and H(u)_{j,k} = z_j(t_k) in the Lagrangian case. Then the data assimilation problem is to find u from y given by

y = H(u) + σ. (8)
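To fix ideas, here is a sketch (ours, not the authors' code) of the two observation maps; the divergence-free Fourier convention, the explicit Euler integrator for (6), and all names are assumptions the paper does not specify:

```python
import numpy as np

def velocity(v_hat, x):
    """Evaluate the velocity field at a point x from its Fourier modes.
    Each mode k contributes Re(c_k * exp(2*pi*i*k.x)) * k_perp, with
    k_perp = (-k2, k1)/|k|, so each term is divergence-free."""
    v = np.zeros(2)
    for (k1, k2), ck in v_hat.items():
        norm = np.hypot(k1, k2)
        if norm == 0.0:
            continue
        phase = np.exp(2j * np.pi * (k1 * x[0] + k2 * x[1]))
        v += np.real(ck * phase) * np.array([-k2, k1]) / norm
    return v

def advect_tracer(z0, v_hats, dt):
    """Explicit Euler advection of one tracer through a list of velocity
    snapshots v_hats (one per time step) on the periodic unit torus."""
    z = np.array(z0, dtype=float)
    path = [z.copy()]
    for v_hat in v_hats:
        z = (z + dt * velocity(v_hat, z)) % 1.0
        path.append(z.copy())
    return path
```

Eulerian observations then sample velocity() at fixed stations, while Lagrangian observations sample the tracer path at the observation times.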

The observation error σ is not known to us, but the common assumption is that its statistical properties are known and should be exploited. To this end we introduce the covariance weighted least squares function

Φ(u; y) = ½ |Γ^{−1/2}(y − H(u))|².   (9)

Here | · | denotes the Euclidean norm, and hence Φ measures the model-data mismatch, normalized by the scale set by the standard deviations of the observational error.
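In the experiments below Γ = γ²I, in which case (9) reduces to a few lines; a minimal sketch under that assumption:

```python
import numpy as np

def misfit(y, Hu, gamma=0.01):
    """Covariance-weighted least-squares misfit (9) for Gamma = gamma^2 I:
    Phi = 0.5 * |Gamma^{-1/2} (y - H(u))|^2."""
    r = (y - Hu) / gamma
    return 0.5 * float(np.dot(r, r))
```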

In the case where the forcing η is to be estimated as well as the initial condition, we view v(x_j, t_k) (Eulerian case) or z_j(t_k) (Lagrangian case) as functions of u and η, and H(u) is replaced by H(u, η) in (8) and (9).

2.3. Priors

A key to the particular class of MCMC algorithms that we use in this paper is the specification of Gaussian priors on the unknown states. The algorithms we use require us to be able to sample from the prior measure, but do not require explicit construction of the covariance operator, its inverse or its Cholesky factorization.

The prior on the initial condition u will be a mean zero Gaussian with covariance C_u = δA^{−α}, where A is the Stokes operator. The numerical experiments all use the values α = 2.0 and δ = 400. This means that, under the prior measure, the Fourier coefficients of exp(2πik · x) in the stream function are independent Gaussians with standard deviation proportional to (4π²|k|²)^{−1}. Samples from the prior can thus be constructed in Fourier space, using this specification.
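A sketch of such a Fourier-space prior draw (ours; the treatment of the reality constraint u_{−k} = conj(u_k) and the normalization of real and imaginary parts are glossed over, as those conventions are not spelled out here):

```python
import numpy as np

def sample_u_prior(modes, delta=400.0, alpha=2.0, rng=None):
    """Draw u ~ N(0, delta * A^(-alpha)) coefficient-wise in Fourier space.
    'modes' is an iterable of nonzero wavenumbers k = (k1, k2); A has
    eigenvalue 4*pi^2*|k|^2 on the mode exp(2*pi*i*k.x)."""
    rng = rng or np.random.default_rng()
    u_hat = {}
    for k in modes:
        lam = 4.0 * np.pi**2 * (k[0]**2 + k[1]**2)
        std = np.sqrt(delta) * lam ** (-alpha / 2.0)
        u_hat[k] = std * (rng.standard_normal() + 1j * rng.standard_normal())
    return u_hat
```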

When we also consider estimation of the forcing η we again use a Gaussian with mean zero, and we implicitly define the covariance operator C_η as follows, by describing how to create draws from this prior. These draws are found by solving the following stationary Ornstein-Uhlenbeck (OU) process:

dη/dt + Rη = √Λ ξ,  η(0) ∼ N(0, ½R^{−1}Λ).   (10)

Here ξ is space-time white noise and we assume that R and Λ are self-adjoint positive operators diagonalized in the same basis as the Stokes operator A. In practice this means that draws from the prior can be constructed by solving independent Ornstein-Uhlenbeck processes for each coefficient of the forcing, written in a divergence-free Fourier basis. At each time t the spatial distribution of η is mean zero Gaussian with covariance operator ½R^{−1}Λ. The choice of R determines the decorrelation time of different Fourier modes. Thus by playing with the choices of R and Λ we can match various space-time covariance structures.

For all the experiments in this paper we choose Λ and R such that the stationary distribution of (10) is the same as for the initial condition: N(0, δA^{−α}) with α = 2.0 and δ = 400. The operator R is chosen to be proportional to A, so that the decorrelation time in the Fourier mode exp(2πik · x) is inversely proportional to |k|².
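Since R and Λ are diagonal in the Fourier basis, each coefficient is an independent scalar OU process and can be sampled exactly; a sketch (ours), assuming R = cA and the stationary standard deviation √δ λ_k^{−α/2} stated above, with λ_k the eigenvalue of A:

```python
import numpy as np

def sample_eta_prior(modes, n_steps, dt, delta=400.0, alpha=2.0, c=1.0,
                     rng=None):
    """Draw a forcing path from the OU prior (10), mode by mode.
    Each coefficient satisfies d(eta)/dt + r*eta = noise with r = c*lam_k
    and stationary std s_k = sqrt(delta)*lam_k^(-alpha/2); the update
    below is the exact one-step OU transition."""
    rng = rng or np.random.default_rng()
    eta = {}
    for k in modes:
        lam = 4.0 * np.pi**2 * (k[0]**2 + k[1]**2)
        s = np.sqrt(delta) * lam ** (-alpha / 2.0)
        a = np.exp(-c * lam * dt)
        path = np.empty(n_steps + 1, dtype=complex)
        path[0] = s * (rng.standard_normal() + 1j * rng.standard_normal())
        for n in range(n_steps):
            noise = rng.standard_normal() + 1j * rng.standard_normal()
            path[n + 1] = a * path[n] + s * np.sqrt(1.0 - a * a) * noise
        eta[k] = path
    return eta
```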

2.4. Bayes Theorem and Relationship to Variational Methods

The previous three sections describe a probability model for the joint random variable (u, y) or (u, η, y). Our aim now is to condition this random variable on a fixed instance of the data y. We first describe the situation where only the initial condition u is estimated, and then the generalization to estimating (u, η).

The random variable u is Gaussian N(0, C_u). The random variable y|u is Gaussian N(H(u), Γ). Application of Bayes theorem shows that

P(u|y)/P(u) ∝ exp(−Φ(u; y)).   (11)

The MAP estimator for this problem is simply the minimizer of the functional

J(u) := ½‖C_u^{−1/2} u‖² + Φ(u; y),   (12)

where ‖ · ‖ denotes the L²(T²) norm. Minimizing J is simply the 4DVAR method, formulated in a non-incremental fashion.

When the pair of states (u, η) is to be estimated, the probability model is as follows. The random variable (u, η) is the product of two independent Gaussians, N(0, C_u) for u and N(0, C_η) for η. The random variable y|u, η is Gaussian N(H(u, η), Γ). Application of Bayes theorem shows that

P(u, η|y)/P(u, η) ∝ exp(−Φ(u, η; y)).   (13)

The MAP estimator for this problem is simply the minimizer of the functional

J(u, η) := ½‖C_u^{−1/2} u‖² + ½‖C_η^{−1/2} η‖² + Φ(u, η; y),   (14)

where ‖ · ‖ denotes the L²(T²) norm on the u variable and the L²([0, T]; L²(T²)) norm on η. Note that finding the initial condition u and the space-time dependent forcing η is equivalent to finding the velocity field v for the duration of the observation window. Minimizing J is simply a weak constraint 4DVAR method, formulated in a non-incremental fashion.

3. THE RANDOM WALK ALGORITHM

If, instead of maximizing the posterior probability, we draw multiple realizations from the posterior distribution in order to get information about it, we obtain a form of statistical 4DVAR. We describe a random walk algorithm which does this. We describe it first for estimation of the initial condition alone, where we generate samples {u^{(j)}} from P(u|y) given by (11), starting from u^{(0)}. A key fact to note about the algorithm is that the normalization constant in (11) is not needed.

1. Set j = 0.
2. Draw ξ^{(j)} from the Gaussian N(0, C_u).
3. Set v^{(j)} = (1 − β²)^{1/2} u^{(j)} + βξ^{(j)}.
4. Define α^{(j)} = min{1, exp(Φ(u^{(j)}; y) − Φ(v^{(j)}; y))}.
5. Draw γ^{(j)}, a uniform random variable on [0, 1].
6. If α^{(j)} > γ^{(j)} set u^{(j+1)} = v^{(j)}; otherwise set u^{(j+1)} = u^{(j)}.
7. Set j → j + 1 and return to 2.

The parameter β ∈ [0, 1] is a free variable which may be chosen to optimize the rate at which the algorithm explores the space. Note that if the least squares functional Φ is smaller at the proposed new state v^{(j)} than it is at the current state u^{(j)}, then this new state is accepted with probability one. If Φ is larger at the proposed state, then the proposed state will be accepted with some probability less than one. Thus the algorithm tends to favour minimizing the model-data mismatch functional Φ, but in a manner which is statistical, and these statistics are tied to the choice of prior. In this sense the prior regularizes the problem.

If the target distribution is (13) then the algorithm is the same, except that a move in (u, η) space is proposed, from the two Gaussians N(0, C_u) and N(0, C_η), and the accept/reject probability α^{(j)} is based on differences of Φ(u, η; y).

The algorithm draws samples from the desired posterior distribution given by Bayes formula (11) or (13). It is straightforward to implement and involves only repeated solution of the forward model, not an adjoint solver. On the other hand, for probability distributions which are peaked at a single maximum it may be natural to solve the 4DVAR variational problem (12) or (14) first, by adjoint methods, and use this as a starting point for the random walk method above.
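Putting the pieces together, the whole sampler is a few lines; a sketch (ours), with u stored as a real vector of stacked Fourier coefficients (the dictionary representations above would be flattened) and misfit(u) wrapping the forward model and (9):

```python
import numpy as np

def pcn_random_walk(misfit, sample_prior, beta, n_iter, rng=None):
    """Random walk of section 3: propose v = sqrt(1-beta^2)*u + beta*xi
    with xi ~ N(0, C_u), and accept with probability
    min(1, exp(Phi(u; y) - Phi(v; y)))."""
    rng = rng or np.random.default_rng()
    u = sample_prior()
    phi_u = misfit(u)
    chain = []
    for _ in range(n_iter):
        xi = sample_prior()
        v = np.sqrt(1.0 - beta ** 2) * u + beta * xi
        phi_v = misfit(v)
        # exp(min(0, .)) caps the acceptance ratio at 1, avoiding overflow
        if rng.uniform() < np.exp(min(0.0, phi_u - phi_v)):
            u, phi_u = v, phi_v
        chain.append(u.copy())
    return chain
```

Note that the prior enters only through sample_prior, and the normalization constant in (11) is never needed, exactly as stated above.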

4. DATA ASSIMILATION OF EULERIAN DATA

Our aim is to sample from the posterior measure (11) using the algorithm from section 3. In the first two subsections we study recovery of the initial condition, first with a perfect model and secondly with an imperfect model. In the third subsection we study recovery of both the initial condition and the forcing. Note that the posterior measure is Gaussian in this case, because H is linear, and this has been used to compute explicit solutions to verify the accuracy of the random walk method [12].

In all of the figures we display only the marginal distribution of the Fourier mode Re(u_{0,1}) for the initial condition of the vector field, and of Re(η_{0,1}(0.5)) for the forcing in the model error case. Other low wave-number Fourier modes behave similarly to this mode. However, high wave-number modes are not greatly influenced by the data, and remain close to their prior distribution. In each case, since we know the value of the Fourier mode that was present in the initial condition of the velocity field that created the data, we plot this value in each example as a vertical black line. In the cases where we are trying to recover an entire time-dependent function, the truth will similarly be given by a black curve.

In each example, since the amount of data we are attempting to assimilate varies, we would expect the optimal value of β in the random walk Metropolis-Hastings (RWMH) method to vary accordingly. Since we do not know this value a priori, we can approximate it during the burn-in process. By monitoring the average acceptance probability for short bursts of the MCMC method, we can alter β until we are happy with this acceptance rate. For the purposes of the numerics that follow, β was tuned in each case so that the acceptance rate was approximately 50%. The values of β vary quite widely, but are typically quite small, ranging from 10^{−4} to 10^{−6}.
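A crude version of this burn-in tuning (ours; the paper states only that β was adjusted until the acceptance rate was roughly 50%, and run_burst is a hypothetical helper returning the acceptance rate of a short MCMC burst):

```python
import numpy as np

def tune_beta(run_burst, beta=1e-4, target=0.5, n_bursts=20):
    """Nudge beta up when accepting too often (steps too timid) and down
    when accepting too rarely (steps too bold)."""
    for _ in range(n_bursts):
        rate = run_burst(beta)
        beta = min(1.0, beta * np.exp(rate - target))
    return beta
```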

4.1. Recovering the Initial Condition

In this section we assimilate data using a forward model with no forcing: η = 0. We also use data which is generated by adding noise to output of the forward model, again with η = 0. In each of the following figures we demonstrate a phenomenon known as posterior consistency: as the amount of data is increased, the posterior measure becomes more and more sharply peaked on the true value that gave rise to the data. To be precise, we demonstrate posterior consistency only for the low wave-number modes: as mentioned above, the data contains little information about high wave numbers


and the random walk method returns the prior for these variables. In this subsection we consider only cases where we do not try to ascertain the model error. As in all the examples in this paper, the noise is assumed to be mean zero Gaussian, with covariance matrix Γ = γ²I, with γ = 0.01.

[Figure: probability density against Re(u_{0,1}); one curve each for 9, 36, 100 and 900 observation stations.]

Figure 1. Increasing numbers of observations in space, Eulerian. True value given by black vertical line.

In our first example, the observations are made at one hundred evenly spaced times, on an evenly spaced grid with an increasing number of points. Figure 1 shows how the posterior distribution on Re(u_{0,1}) changes as we increase the number of points in space at which we make Eulerian observations of the velocity field. The figure shows converged posterior distributions which resemble Gaussians on Re(u_{0,1}). Note that the curve corresponding to the distribution using data from 9 observation stations is so flat on the scale of this graph that the line appears to lie along the x-axis. We also see that we have posterior consistency: as the number of spatial observations increases, the posterior distribution on Re(u_{0,1}) appears to converge to an increasingly peaked measure on the value that was present in the true initial condition, denoted by the vertical black line.

We also have results (not included here) showing convergence of the posterior distribution to an increasingly peaked distribution on the correct values as we increase the number of observation times on a fixed time interval, keeping the number of observation stations constant.

4.2. Mismatches in Model Forcing

Now we consider the case where we create data with forcing present, but in our algorithm we assume that there is no forcing, so that η ≡ 0. Once again the noise is assumed to be mean zero Gaussian, with covariance matrix Γ = γ²I, with γ = 0.01. We attempt to explain the data arising from a forced model through the initial condition for an unforced model. Figure 2 shows the marginal distributions of one Fourier mode in such a situation, where the forcing is low frequency in space and constant in time, and where we steadily increase the number of observation times, with 100 observation stations fixed on a grid. The forcing in this instance is positive for the wave number displayed in the figure. Two things are noteworthy: (i) the posterior tends towards a peaked distribution as the amount of data increases; (ii) this peak is not located at the true initial condition (marked with a black line). This incorrect estimate of the initial condition is, of course, because of the mismatch between the model used for the assimilation and that used for the data generation. In particular the energy in the posterior on the initial condition is increased in an attempt to compensate for the model error in the forcing.

Further to this, we consider the similar problem of studying the effect of using a forward model without forcing, when the data is generated with forcing, this time with data created using a stationary solution of the OU process (10). This forcing function then fluctuates constantly in all Fourier modes.

Figure 3 shows the marginal distribution for one low frequency Fourier mode with data from one observation time, with an increasing number of observations in space. Notice that once again the distributions are becoming more and more peaked as the number of observations increases, but


[Figure: probability density against Re(u_{0,1}(0)); one curve each for 1, 10, 50 and 100 observation times.]

Figure 2. Re(u_{0,1}(t)): Increasing number of observation times, unmatched low frequency forcing in data and model, Eulerian data. True value given by black vertical line.

that the peaks of these distributions are centered away from the value of the Fourier mode that was present in the actual initial condition that created the data.

[Figure: probability density against Re(u_{0,1}(0)); one curve each for 9, 36, 100 and 900 observation stations.]

Figure 3. Re(u_{0,1}(t)): Increasing number of observation times, unmatched forcing in data and model, Eulerian data, forcing given by a stationary solution of OU process. True value given by black vertical line.

4.3. Quantifying Model Error

The previous section shows us that if we assume that our model accurately reflects the real dynamical system generating the data, but in fact it does not do so, then our estimates for the initial condition of the system will be flawed and inaccurate. This motivates us to try to recover not only the initial condition of the system, but also the model error forcing η.

We first observe that, in the Eulerian case, parts of the forcing are unobservable and cannot be determined by the data. To state this precisely, define

F(t_1, t_2) = ∫_{t_1}^{t_2} exp(−νA(t_2 − t)) η(t) dt.

This is the term in the solution map of the Stokes equations which is dependent on the forcing term η. If the observations are made at times {t_i}_{i=1}^K, then the data is informative about F := {F(t_j, t_{j+1})}_{j=0}^{K−1}, where t_0 = 0 is the initial time, rather than about η itself. Proof of this is given in [12], by showing that many different (in fact infinitely many) functions η give rise to the same vector F, and therefore to the same velocity field at each of the observation times. Because of this, we expect that the posterior measure will give much greater certainty to estimates of F than of η.
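This is a direct consequence of the variation-of-constants formula for (4): between consecutive observation times,

v(t_{j+1}) = exp(−νA(t_{j+1} − t_j)) v(t_j) + F(t_j, t_{j+1}),

so the velocity field at the observation times, and hence any Eulerian observation, depends on η only through the increments F; any two forcings with the same increments generate identical data.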

This basic analytical fact is manifest in the numerical results which we now describe. Firstly, there are infinitely many functions compatible with the observed data, so that obtaining a converged posterior on η is a computationally arduous task; secondly, the prior measure plays a significant role in weighting the many possible forcing functions which can explain the data.

As in the non-model error case, we look at a set of experiments where the observations are made at one hundred evenly spaced times, with the observation stations evenly spaced on a grid with an increasing number of points. Once again, as we increase the number of observation stations, we are able to recover the value of any given Fourier mode in the initial condition with increasing accuracy and certainty. We omit the majority of the graphical representations of this fact and concentrate instead mainly on the posterior distribution on the forcing function η. Once again the noise is assumed to be mean zero Gaussian, with covariance matrix Γ = γ²I, with γ = 0.01.

[Figure, two panels: probability density against (a) Re(η_{0,1}(0.5)) and (b) Re(F_{0,1}(0.5)); one curve each for 9, 36, 100 and 900 observation stations.]

Figure 4. Increasing numbers of observations in space, Eulerian Model Error Case. (a) Re(η_{0,1}(0.5)), (b) Re(F_{0,1}(0.5)). Note much smaller variances in the distributions in (b) in comparison with those in (a). True values given by black vertical lines.

Figures 4(a) and 4(b) show the marginal posterior distributions on Re(η_{0,1}(0.5)) (the real part of the (0, 1) Fourier coefficient of the forcing at time t = 0.5) and on Re(F_{0,1}(0.5)) respectively, given an increasing number of observation stations. The first figure shows that, even with a large amount of data, the standard deviation about the posterior mean for Re(η_{0,1}(0.5)) is comparable in magnitude to the posterior mean itself. In contrast, for Re(F_{0,1}(0.5)) and for a large amount of data, the standard deviation around the posterior mean is an order of magnitude smaller than the mean value itself. The data is hence much more informative about F than it is about η, as predicted.

We pursue this further by comparing samples from the Markov chain used to explore the posterior distribution with the empirical average of that chain. Figure 5(a) shows an example trace of the value of Re(η_{0,1}(0.5)) in the Markov chain (fluctuating), together with its empirical average (smoother). The slow random walk of the sample path around the empirical mean is the hallmark of an unconverged Markov chain. In contrast, Figure 5(b) shows how well the value of Re(F_{0,1}(0.5)) converges in distribution in the Markov chain. The whole trace stays within a reasonably tight band around the empirical mean, fluctuating in a white fashion; it does not display the slow random walk behaviour of Figure 5(a).

Figure 6 shows the expectation of the entire function Re(F_{0,1}(t)), given an increasing number of points in the vector field to be observed. As the number of observations increases, the expectation of Re(F_{0,1}(t)) nears the true value, given by the black curve.


[Figure, two panels: MCMC traces against number of samples (up to 10⁷), each plotted with its ergodic average.]

Figure 5. Value of (a) Re(η_{0,1}(0.5)) and (b) Re(F_{0,1}(0.5)) in the Markov chain.

[Figure: Re(F_{0,1}(t)) against time t; one curve each for 9, 36, 100 and 900 observation stations.]

Figure 6. Re(F_{0,1}(t)): Increasing numbers of observations in space, Eulerian Model Error Case. True value given by black curve.

If we now look at the posterior mean of the initial condition with a varying number of observation stations, and compare this to the true initial condition, we get an error curve as shown in Figure 7. This shows that as the number of observation stations increases in a sensible way (for example using a space-filling algorithm), the posterior mean of the initial condition converges to the true initial condition.

Similarly, if we look at the L²([0, T]; L²(T²)) norm of the difference between the posterior mean of the time-dependent forcing and the true forcing that created the data, we get an error curve as shown in Figure 8(a). This shows that as the number of observation stations increases in a sensible way, the posterior mean of the time-dependent forcing converges to the true forcing. Notice, however, that the convergence is slower in this quantity than in that given in Figure 8(b), which shows the norm of the difference between the posterior mean of F and the true value of F. Again, this shows that as we increase the number of observation stations, the posterior mean converges to the true answer. The convergence of this quantity (gradient ≈ −0.323) is much quicker than that of η (gradient ≈ −0.216), but slower than that of u (gradient ≈ −0.778). Note that these gradients depend on the way in which the amount of data is increased; in this case the observation stations were placed on an increasingly refined grid.

We also have analogous results (given in [12]) for the case where we increase the number of observations in time, but keep the number of observation stations the same.


[Figure: log(Relative L² Error) against log(Number of Observation Stations).]

Figure 7. ‖E(u) − u_{Act}‖_{L²}: Increasing numbers of observations in space, Eulerian Model Error Case. u_{Act} is the actual initial condition that created the data.

[Figure, two panels: log(Relative L²([0, T]; H) Error) against log(Number of Observation Stations).]

Figure 8. Increasing numbers of observations in space, Eulerian Model Error Case. (a) ‖E(η) − η_{Act}‖_{L²(0,T;H)}, (b) ‖E(F) − F_{Act}‖_{L²(0,T;H)}. η_{Act} and F_{Act} are the actual forcing functions that created the data.

We may also be interested in understanding how well we are able to characterize high frequency (in space) forcing from Eulerian data. In the following experiment, all Fourier modes in the forcing that created the data were set to zero, apart from two high frequency modes for k = (5, 5) and k = (4, 5). An increasing number of observation stations were placed on a grid, with observations made at 100 evenly spaced times up to T = 1. Figure 9 shows the mean forcing function (an average over all the realizations in the Markov chain) for the Fourier mode Re(η_{5,5}(t)), as a function of time. The actual value that was present in the forcing that created the data is indicated by the solid line. Figure 10 shows the absolute value of the difference between the mean function and the true value that created the data.

In each Fourier mode, the more observations in space that are assimilated, the better the estimate. Moreover, the variance in these estimates (omitted here) reduces as the amount of information increases, leading to peaked distributions on the forcing function which created the data.

Note that high frequency modes require more spatial observations for accurate estimates than the low frequency modes. This is simply due to the fact that with few spatial observations, no matter how many observations we have in time, the high frequency Fourier modes are under-determined, and aliasing leads to a great deal of uncertainty about these modes.


[Figure: E(Re(η_{5,5})) against time t; one curve each for 9, 36, 100 and 900 observations, plus the actual value.]

Figure 9. Re(η_{5,5}(t)): Increasing numbers of observations in space, Eulerian Model Error Case, high frequency forcing. True value given by black line.

[Figure: |E(Re(η_{5,5}(t))) − truth(t)| against time t; one curve each for 9, 36, 100 and 900 observations.]

Figure 10. Re(η_{5,5}(t)): Absolute value of the difference between the mean and truth, Eulerian Model Error Case, high frequency forcing.

So far we have only considered examples where the model error forcing that created the data is constant in time. In the following experiment we take a draw from the model error prior (10) as the forcing function used in the creation of the data. This means that we have non-zero forcing in all of the Fourier modes, and that these values change at every time step.

Figure 11 shows the estimates (with varying numbers of spatial observations and a fixed number of 100 observations in time) of the forcing function for a particular Fourier mode, along with the actual forcing function that created the data. In Figure 12 the absolute value of the difference between the estimates and the function that created the data is plotted.

These graphs show that as we increase the number of spatial observations, the estimates of the forcing functions converge to the function which created the data.

5. LAGRANGIAN DATA ASSIMILATION

Now our aim is to sample from the posterior measure (13) using the algorithm from section 3. Unlike the Eulerian case, the posterior measure is not Gaussian, because H is nonlinear, and so the posterior cannot be computed simply by means of linear algebra. In the first two subsections we


[Figure: E(Re(η_{0,1})) against time t; one curve each for 9, 36, 100 and 900 observations, plus the actual value.]

Figure 11. Re(η_{0,1}(t)): Increasing numbers of observations in space, Eulerian Model Error Case, forcing function taken from prior. True value given by black curve.

[Figure: |E(Re(η_{0,1}(t))) − truth(t)| against time t; one curve each for 9, 36, 100 and 900 observations.]

Figure 12. Re(η_{0,1}(t)): Absolute value of the difference between the mean and truth, Eulerian Model Error Case, forcing function taken from prior.

study recovery of the initial condition, first with a perfect model and secondly with an imperfect model. In the third subsection we study recovery of both the initial condition and the forcing.

5.1. Recovering the Initial Condition

We consider an example where we have a fixed number of 25 tracers, whose initial positions are evenly spaced on a grid, and assumed to be known. We observe each tracer at an increasing number of times evenly spaced on the unit interval, as we have previously done in the Eulerian case, and attempt to recover the initial condition of the fluid. Once again the noise is assumed to be mean zero Gaussian, with covariance matrix Γ = γ²I, with γ = 0.01. Figure 13 shows that the marginal distributions for Re(u_{0,1}) have converged to approximate Gaussian distributions, and exhibit posterior consistency as the number of temporal observations increases. We also have similar results showing posterior consistency in the case that we have a fixed observation time and an increasing number of tracers whose initial positions are evenly spaced on a grid (given in [12]).


[Figure: probability density against Re(u_{0,1}); one curve each for 1, 10, 50 and 100 observation times.]

Figure 13. Increasing numbers of observations in time, Lagrangian. True value given by black vertical line.

5.2. Mismatches in Model Forcing

As in subsection 4.2, we now consider the case where the model in our statistical algorithm does not reflect the dynamical system from which we are making our observations. We attempt to explain the data arising from a forced model through the initial condition for an unforced model. Once again the noise is assumed to be mean zero Gaussian, with covariance matrix Γ = γ²I, with γ = 0.01. Figure 14 shows the marginal distributions of one Fourier mode in such a situation, where we introduce low frequency constant forcing in the data creation process, but set η ≡ 0 in the statistical algorithm. Again we consider Lagrangian data in the case where we steadily increase the number of observation times. As in the Eulerian example, two things are noteworthy: (i) the posterior tends towards a peaked distribution as the amount of data increases; (ii) this peak is not located at the true initial condition (marked with a black line). This incorrect estimate of the initial condition is due to the mismatch between the model used for the assimilation and that used for the data generation. In particular the energy in the posterior on the initial condition is increased in an attempt to compensate for the model error in the forcing.

[Figure: probability density against Re(u_{0,1}(0)); one curve each for 1, 10, 50 and 100 observation times.]

Figure 14. Re(u_{0,1}(t)): Increasing number of observation times, unmatched forcing in data and model, Lagrangian data, constant low frequency forcing used in the data creation. True value given by black vertical line.

Lagrangian data is much more sensitive to small changes in the model error forcing than the equivalent Eulerian data, because particle positions depend on the entire history of the velocity field from the initial time. Therefore creating more complex incongruities between the data creation scheme and the model used within the statistical algorithm, as we did in section 4.2, can cause more serious problems. Indeed, the RWMH algorithm simply failed to converge in several experiments of this type. A random walk method is clearly inadequate for exploring these types of highly complex probability densities. The assumed amount of observational noise can be altered to allow freer exploration of the state space, but finding a value which neither simply returns a trivial solution (the prior distribution) nor fails to converge in a reasonable amount of time, due to the complexity of the posterior distribution, can be challenging. It might be instructive to implement a gradient-type method, or other MCMC methods, in these more challenging scenarios to better understand these types of problems.

5.3. Quantifying Model Error

The problems we experienced in subsection 5.2 serve to highlight the need to estimate not only the initial condition but also the model error forcing. The Lagrangian equivalent of the model error problem is much harder to sample from in comparison with the Eulerian case. This is due to the fact that the position of a passive tracer depends upon the entire history of the velocity vector field up to the observation time. Moreover, a small change in the forcing function can result in a small displacement of hyperbolic points in the flow, which in turn can drastically alter the trajectory of one or more tracers. This makes it hard to move in the Lagrangian model error state space within the MCMC method. As a consequence, a simple random walk proposal is highly unlikely to be accepted. Indeed, this was borne out in our initial results, which showed that even after very large numbers of samples the Markov chains were far from converged, as the size of jump in the proposals that is required to give reasonable acceptance probabilities was simply too small to allow efficient exploration of the state space in a reasonable time.

One way to tackle this is to alter the likelihood functional (which calculates the relative likelihood that a given choice of functions created the observations) to allow freer exploration of the state space. The data was created with observational noise with variance Γ = γ_1²I, but the assimilation is performed with Γ = γ_2²I and γ_2 ≫ γ_1. Then the acceptance probabilities increase, allowing larger steps in the proposal to be accepted more of the time. The results that follow in this section use γ_1² = 10^{−4} and γ_2² = 25. We will show that, despite this large disparity, it is possible to obtain reasonable estimates of the true forcing and initial condition. In particular, for large data sets, the posterior mean of these functions is close to the true values which generated the data. However, because γ_2/γ_1 is large, the variance around the mean is much larger than in previous sections where γ_1 = γ_2.

As in the Eulerian case, we first consider the scenario where we have 100 equally spaced observation times up to time T = 1, at which we observe the passive tracers, whose initial positions are given on a grid with an increasing number of points. Figure 15 shows how the marginal distributions on Re(u_{0,1}(0)) change as we increase the number of tracers to be observed. This figure indicates that as the number of spatial observations is increased in a sensible way, the marginal distribution on this particular Fourier mode converges to an increasingly peaked distribution on the true value that created the data.
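The noise inflation described above amounts to a simple rescaling of the misfit (9); a minimal sketch (ours), valid under the Γ = γ²I assumption used throughout:

```python
def tempered_misfit(misfit_true, gamma1=1e-2, gamma2=5.0):
    """Evaluate (9) with Gamma = gamma2^2 I instead of the true gamma1^2 I;
    since Phi scales like 1/gamma^2 this multiplies the true misfit by
    (gamma1/gamma2)^2, flattening the posterior so the walk can move."""
    def misfit(u):
        return (gamma1 / gamma2) ** 2 * misfit_true(u)
    return misfit
```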

Figure 16(a) shows how the marginal distribution on Re(η_{0,1}(0.5)) changes as we increase the number of paths to be observed. In comparison to the Eulerian case, we are able to determine much more about the pointwise value of the forcing function, here using the idea of inflated observational noise variance in the statistical model. Notice that, as in the Eulerian case, the uncertainty in the pointwise value of the forcing is far greater than that for the initial condition of the dynamical system.

Figure 16(b) shows distributions of Re(F_{0,1}(0.5)) and demonstrates convergence to a sharply peaked distribution on the true value that created the data in the limit of large data sets. The uncertainty in this quantity is less than for the pointwise value of the forcing, as in the Eulerian case, but the discrepancy is considerably less than in the Eulerian case.

Figure 17 shows the expectation of the entire function Re(F_{0,1}(t)), given an increasing number of points in the vector field to be observed. As the number of paths to be observed increases, the


[Figure: probability density against Re(u_{0,1}); one curve each for 9, 36, 100 and 900 paths.]

Figure 15. Re(u_{0,1}): Increasing numbers of observations in space, Lagrangian Model Error Case. True value given by black vertical line.

[Figure, two panels: probability density against (a) Re(η_{0,1}(0.5)) and (b) Re(F_{0,1}(0.5)); one curve each for 9, 36, 100 and 900 paths.]

Figure 16. Increasing numbers of observations in space, Lagrangian Model Error Case. (a) Re(η_{0,1}(0.5)), (b) Re(F_{0,1}(0.5)). True values given by black vertical lines.

approximation does improve in places. However, since we altered the likelihood to make it possible to sample from this distribution, this also vastly increased the variance in each of the marginal distributions, as there is relatively more influence from the prior. Therefore the expectation of this function does not tell the whole story. This picture does show us, however, that it is certainly possible to get ballpark estimates for these functions given Lagrangian data.

We also have similar results (given in [12]) for when a fixed number of tracers are observed at an increasing number of observation times on a fixed interval.

6. CONCLUSIONS

We have argued that, in situations where uncertainty quantification is important, the use of an MCMC method to fully explore the posterior distribution of state given data is both worthwhile and increasingly viable. We have investigated in some detail the properties of the posterior measure for simplified models of data assimilation in fluid mechanics, for both Eulerian and Lagrangian data types, with and without model error. Through this work we were able to say more about what kind of information we can garner from these different data types in different scenarios. Although only indicative, being a markedly simpler model than any model that is currently used in practical


Figure 17. Re(F0,1(t)) against time t: Increasing numbers of observations in space, Lagrangian Model Error Case. Posterior expectations shown for 9, 36, 100 and 900 paths; true value given by black curve.

Although our findings are only indicative, the model being markedly simpler than any currently used in practical meteorological and oceanographic scenarios, they still give us a better idea of how to attack these problems, and of how much information is contained within the different types of noisy observation arising in them. Furthermore, in current computational practice various simplifications are used, primarily filtering or variational methods [13], and the posterior measure that we compute in this paper could be viewed as an "ideal solution" against which these simpler and more practical methods can be compared. To that end it would be of interest to apply the ideas in this paper to a number of simple model problems from data assimilation in meteorology or oceanography, and to compare the results from filtering and variational methods with the ideal solution found from MCMC methods.

We have also shown that, in the limit of a large amount of informative data, whether created by increasing the number of observation times, by increasing the number of tracers being observed, or, in the Eulerian case, by increasing the number of observation stations, the posterior distribution converges to an increasingly sharply peaked measure on the field or forcing function that created the data, with and without model error. Proving such a result presents an interesting mathematical challenge which would give insight into the large-data scenario.
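The expected behaviour can be illustrated by a scalar conjugate Gaussian toy model (purely indicative, and much simpler than the infinite-dimensional result that would need to be proved): for

\[ y_i = u + \epsilon_i, \qquad \epsilon_i \sim N(0,\sigma^{2}) \ \text{i.i.d.}, \qquad u \sim N(m_0,\sigma_0^{2}), \]

the posterior variance is

\[ \operatorname{Var}\bigl(u \,\big|\, y_1,\dots,y_N\bigr) \;=\; \bigl(\sigma_0^{-2} + N\sigma^{-2}\bigr)^{-1} \;\longrightarrow\; 0 \qquad \text{as } N\to\infty, \]

so the posterior contracts onto the truth at rate 1/N in the variance; the challenge is to establish an analogue of this contraction for the function-space posteriors studied here.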

We have also seen certain scenarios in which the random walk method was pushed up to, and beyond, its limits of efficacy. Implementation of gradient-based methods such as the MALA algorithm, as shown in [13], is also of great interest, and may be useful in tackling these more complex posterior densities.
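In outline, a prior-reversible random walk step of the kind discussed above takes the following form (a minimal sketch, assuming a Gaussian prior N(0, C); Phi and sample_prior are placeholders for problem-specific code, with Phi the model-data misfit evaluated via a forward solve, and the proposal scaling beta tuned in practice):

    # Schematic prior-reversible random-walk Metropolis step.
    # Assumptions: Gaussian prior N(0, C); Phi(u) is the model-data
    # misfit (negative log-likelihood); sample_prior() draws from N(0, C).
    import numpy as np

    def rw_step(u, phi_u, Phi, sample_prior, beta=0.1, rng=np.random):
        xi = sample_prior()                          # xi ~ N(0, C)
        v = np.sqrt(1.0 - beta**2) * u + beta * xi   # proposal preserving the prior
        phi_v = Phi(v)                               # forward solve + misfit
        if rng.uniform() < min(1.0, np.exp(phi_u - phi_v)):
            return v, phi_v                          # accept
        return u, phi_u                              # reject

Because the proposal leaves the prior invariant, the acceptance probability involves only the change in the misfit Phi, so no adjoint or derivative information is needed; gradient-based proposals such as MALA trade this simplicity for better mixing on more difficult posteriors.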

There are also many other application domains to which the algorithmic methodology developed here could be applied; a further direction will be to undertake studies similar to those presented here for problems arising in other application areas, such as subsurface geophysics and applications in image processing such as shape registration.

REFERENCES

1. A. Apte, M. Hairer, A.M. Stuart, and J. Voss. Sampling the posterior: an approach to non-Gaussian data assimilation. Physica D, 230:50–64, 2007.
2. A. Apte, C.K.R.T. Jones, and A.M. Stuart. A Bayesian approach to Lagrangian data assimilation. Tellus, 60:336–347, 2008.
3. A. Apte, C.K.R.T. Jones, A.M. Stuart, and J. Voss. Data assimilation: mathematical and statistical perspectives. Int. J. Numer. Meth. Fluids, 56:1033–1046, 2008.
4. A.F. Bennett. Inverse Modeling of the Ocean and Atmosphere. Cambridge University Press, 2002.
5. L.M. Berliner. Monte Carlo based ensemble forecasting. Stat. Comput., 11:269–275, 2001.
6. A. Beskos and A.M. Stuart. MCMC methods for sampling function space. In Proceedings of ICIAM 2007, to appear.
7. A. Beskos and A.M. Stuart. Computational complexity of Metropolis-Hastings methods in high dimensions. In Proceedings of MCQMC08, 2010.
8. V.I. Bogachev. Gaussian Measures. American Mathematical Society, 1998.
9. A. Beskos, G.O. Roberts, A.M. Stuart, and J. Voss. An MCMC method for diffusion bridges. Stochastics and Dynamics, to appear.
10. D. Calvetti and E. Somersalo. Large-scale statistical parameter estimation in complex systems with an application to metabolic models. Multiscale Modeling and Simulation, 5:1333–1366, 2008.
11. J.-Y. Chemin and N. Lerner. Flot de champs de vecteurs non lipschitziens et équations de Navier-Stokes. J. Differential Equations, 121:314–328, 1995.
12. S.L. Cotter. Applications of MCMC Sampling on Function Spaces. PhD thesis, University of Warwick, 2010.
13. S.L. Cotter, M. Dashti, J.C. Robinson, and A.M. Stuart. Bayesian inverse problems for functions and applications to fluid mechanics. Inverse Problems, 25:115008, 2009.
14. S.L. Cotter, M. Dashti, and A.M. Stuart. Approximation of Bayesian inverse problems for PDEs. SIAM J. Numer. Anal., to appear, 2010.
15. S.L. Cotter, G.O. Roberts, A.M. Stuart, and D. White. In preparation. Statistical Science, submitted, 2010.
16. P. Courtier. Dual formulation of variational assimilation. Quart. J. Roy. Met. Soc., 123:2449–2461, 1997.
17. P. Courtier, E. Anderson, W. Heckley, J. Pailleux, D. Vasiljevic, M. Hamrud, A. Hollingworth, F. Rabier, and M. Fisher. The ECMWF implementation of three-dimensional variational assimilation (3D-Var). Quart. J. Roy. Met. Soc., 124:1783–1808, 1998.
18. P. Courtier and O. Talagrand. Variational assimilation of meteorological observations with the adjoint vorticity equation. II: Numerical results. Quart. J. Roy. Met. Soc., 113:1329–1347, 1987.
19. G. Da Prato and J. Zabczyk. Stochastic Equations in Infinite Dimensions. Cambridge University Press, 1992.
20. M. Dashti and J.C. Robinson. Uniqueness of the particle trajectories of the weak solutions of the two-dimensional Navier-Stokes equations. Nonlinearity, 22:735–746, 2009.
21. J.C. Derber. A variational continuous assimilation technique. Mon. Wea. Rev., 117:2437–2446, 1989.
22. P. Dostert, Y. Efendiev, T.Y. Hou, and W. Luo. Coarse-grain Langevin algorithms for dynamic data integration. J. Comp. Phys., 217:123–142, 2006.
23. G. Evensen. Data Assimilation: the Ensemble Kalman Filter. Springer, New York, 2007.
24. C.L. Farmer. Bayesian field theory applied to scattered data interpolation and inverse problems. In A. Iske and J. Levesley, editors, Algorithms for Approximation, pages 147–166. Springer, 2007.
25. F. Fang, C.C. Pain, I.M. Navon, M.D. Piggott, G.J. Gorman, P.E. Farrell, P.A. Allison, and A.J.H. Goddard. A POD reduced-order 4D-Var adaptive mesh ocean modelling approach. Int. J. Numer. Meth. Fluids, 60:709–732, 2009.
26. W.K. Hastings. Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57:97–109, 1970.
27. M. Jardak, I.M. Navon, and M. Zupanski. Comparison of sequential data assimilation methods for the Kuramoto-Sivashinsky equation. Int. J. Numer. Meth. Fluids, 62:374–402, 2010.
28. M.A. Lifshits. Gaussian Random Functions, volume 322 of Mathematics and its Applications. Kluwer, Dordrecht, 1995.
29. R. Herbei and I.W. McKeague. Geometric ergodicity of hybrid samplers for ill-posed inverse problems. Scand. J. Stat., 2009.
30. R. Herbei, I.W. McKeague, and K. Speer. Gyres and jets: inversion of tracer data for ocean circulation structure. J. Phys. Ocean., 38:1180–1202, 2008.
31. M.J. Martin, M.J. Bell, and N.K. Nichols. Estimation of systematic error in an equatorial ocean model using data assimilation. Int. J. Numer. Meth. Fluids, 40:435–444, 2002.
32. N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller, and E. Teller. Equation of state calculations by fast computing machines. J. Chem. Phys., 21:1087–1092, 1953.
33. S.P. Meyn and R.L. Tweedie. Markov Chains and Stochastic Stability. Communications and Control Engineering Series. Springer-Verlag, London, 1993.
34. K. Ide, L. Kuznetsov, and C.K.R.T. Jones. Lagrangian data assimilation for point-vortex systems. J. Turbulence, 3:053, 2002.
35. E. Kalnay. Atmospheric Modeling, Data Assimilation and Predictability. Cambridge University Press, 2003.
36. L. Kuznetsov, K. Ide, and C.K.R.T. Jones. A method for assimilation of Lagrangian data. Monthly Weather Review, 131:2247, 2003.
37. J. Kaipio and E. Somersalo. Statistical and Computational Inverse Problems. Springer, 2004.
38. J.P. Kaipio and E. Somersalo. Statistical inversion and Monte Carlo sampling methods in electrical impedance tomography. Inverse Problems, 16:1487–1522, 2000.
39. J.P. Kaipio and E. Somersalo. Statistical inverse problems: discretization, model reduction and inverse crimes. J. Comp. Appl. Math., 198:493–504, 2007.
40. J. Liu. Monte Carlo Strategies in Scientific Computing. Springer Texts in Statistics. Springer-Verlag, New York, 2001.
41. A.C. Lorenc and T. Payne. 4D-Var and the butterfly effect: statistical four-dimensional data assimilation for a wide range of scales. Quart. J. Roy. Met. Soc., 133:607–614, 2007.
42. D. McLaughlin and L.R. Townley. A reassessment of the groundwater inverse problem. Water Resour. Res., 32(5):1131–1161, 1996.
43. I.W. McKeague, G. Nicholls, K. Speer, and R. Herbei. Statistical inversion of South Atlantic circulation in an abyssal neutral density layer. J. Marine Res., 63:683–704, 2005.
44. A.M. Michalak and P.K. Kitanidis. A method for enforcing parameter nonnegativity in Bayesian inverse problems with an application to contaminant source identification. Water Resour. Res., 39(2):1033, 2003.
45. K. Mosegaard and A. Tarantola. Monte Carlo sampling of solutions to inverse problems. Journal of Geophysical Research, 100:431–447, 1995.
46. M. Nodet. Variational assimilation of Lagrangian data in oceanography. Inverse Problems, 22:245–263, 2006.
47. C.P. Robert and G. Casella. Monte Carlo Statistical Methods. Springer Texts in Statistics. Springer-Verlag, 1999.
48. G.O. Roberts and J. Rosenthal. Optimal scaling for various Metropolis-Hastings algorithms. Statistical Science, 16:351–367, 2001.
49. J.C. Robinson. Infinite-Dimensional Dynamical Systems. Cambridge University Press, 2001.
50. P. Spanos and R. Ghanem. Stochastic finite element expansion for random media. J. Eng. Mech., 115:1035–1053, 1989.
51. A.M. Stuart. Inverse problems: a Bayesian perspective. Acta Numerica, 19:451–559, 2010.
52. L. Tierney. A note on Metropolis-Hastings kernels for general state spaces. Ann. Appl. Probab., 8(1):1–9, 1998.
53. A. Tarantola. Inverse Problem Theory. SIAM, 2005.
54. R. Temam. Navier-Stokes Equations and Nonlinear Functional Analysis. Regional Conference Series in Applied Mathematics. SIAM, Philadelphia, 1983.
55. R. Todor and C. Schwab. Convergence rates for sparse chaos approximations of elliptic problems with stochastic coefficients. IMA J. Numer. Anal., 27:232–261, 2007.
56. C.R. Vogel. Computational Methods for Inverse Problems. Frontiers in Applied Mathematics 23. SIAM, Philadelphia, 2002.

Acknowledgements. AMS is grateful to EPSRC, ERC and ONR for financial support. MD was supported by the Warwick Postgraduate Fellowship fund and, as a postdoc, by the ERC. The research of SLC has received postgraduate funding from EPSRC, and postdoctoral funding from the ERC (under the European Community's Seventh Framework Programme (FP7/2007-2013) / ERC grant agreement No. 239870) and from Award No. KUK-C1-013-04, made by King Abdullah University of Science and Technology (KAUST).
