
IOP PUBLISHING INVERSE PROBLEMS

Inverse Problems 25 (2009) 035013 (27pp) doi:10.1088/0266-5611/25/3/035013

An efficient Bayesian inference approach to inverse problems based on an adaptive sparse grid collocation method

Xiang Ma and Nicholas Zabaras1

Materials Process Design and Control Laboratory, Sibley School of Mechanical and Aerospace Engineering, 101 Frank H. T. Rhodes Hall, Cornell University, Ithaca, NY 14853-3801, USA

1 Corresponding author.

E-mail: [email protected]

Received 3 October 2008, in final form 10 December 2008
Published 3 February 2009
Online at stacks.iop.org/IP/25/035013

Abstract

A new approach to modeling inverse problems using a Bayesian inference method is introduced. The Bayesian approach considers the unknown parameters as random variables and seeks the probabilistic distribution of the unknowns. By introducing the concept of the stochastic prior state space to the Bayesian formulation, we reformulate the deterministic forward problem as a stochastic one. The adaptive hierarchical sparse grid collocation (ASGC) method is used for constructing an interpolant to the solution of the forward model in this prior space, which is large enough to capture all the variability/uncertainty in the posterior distribution of the unknown parameters. This solution can be considered as a function of the random unknowns and serves as a stochastic surrogate model for the likelihood calculation. A hierarchical Bayesian formulation is used to derive the posterior probability density function (PPDF). The spatial model is represented as a convolution of a smooth kernel and a Markov random field. The state space of the PPDF is explored using Markov chain Monte Carlo algorithms to obtain statistics of the unknowns. The likelihood calculation is performed by directly sampling the approximate stochastic solution obtained through the ASGC method. The technique is assessed on two nonlinear inverse problems: source inversion and permeability estimation in flow through porous media.

(Some figures in this article are in colour only in the electronic version)

1. Introduction

Inverse problems arise frequently in diverse engineering applications, e.g. heat transfer, geophysics, fluid mechanics and solid mechanics. In a typical inverse problem, one is interested in identifying the initial, boundary and/or material properties given sensor measurements of the dependent variable inside the domain. A typical example is that of estimating permeability from measurements of flow data. The inverse problem is often ill-posed in the sense that its solution may not exist or may not be unique. The majority of the deterministic approaches restate the problem as a least-squares minimization problem and lead to estimates of the unknowns without rigorously considering system uncertainties and without providing quantification of uncertainty in the inverse problem [1, 2]. Several methods have been introduced to address inverse problems under uncertainties, such as the extended maximum likelihood method [3], the spectral stochastic method [4, 5], the sparse grid collocation approach [6] and the Bayesian inference approach [7, 8].

The Bayesian inference approach provides a systematic means of taking system variabilities and parameter fluctuations into account. This framework formulates a complete probabilistic description of the unknown parameters and system uncertainties given measurement data [7]. The Bayesian approach incorporates the known information regarding the unknown parameters into a prior distribution model that is then combined with the likelihood to formulate the posterior probability density function (PPDF). The PPDF serves as the solution to the inverse problem, and various statistics can be estimated from samples of this distribution, such as the mean, marginal distributions and quantiles. This methodology has been used with great success to solve a variety of problems [9–14].

Among the components of the Bayesian formulation, the choice of the prior model significantly affects the accuracy of the solution to the inverse problem. In addition, inverse problems involving unknown spatial or temporal fields, such as permeability, are generally very ill-posed, since the unknown is infinite dimensional. A standard Bayesian approach is to employ Gaussian process (GP) or Markov random field (MRF) priors [13, 15]. Then the unknown field is discretized on a set of finite grid points and its value is obtained at these points. Therefore, the dimension of the unknown parameters is generally very large and one seeks the solution in a high-dimensional prior space. This presents a computational difficulty for the Bayesian approach. To this end, some authors introduce a truncated Karhunen–Loève (KL) expansion to reduce the dimensionality of the unknown parameter space and transform the inverse problem into one that infers the coefficients of the expansion [16, 17]. However, this method often requires that the covariance function of the prior stochastic space is known a priori. In this paper, a process convolution approach is used for the modeling of spatial processes [18]. The spatial model is represented as a convolution of smooth kernels and a white noise process on a set of discrete points in the physical domain [19]. Thus, the dimensionality of the parameter space is significantly reduced, which greatly aids the ability to conduct inference.

With the recent proliferation of Markov chain Monte Carlo (MCMC) simulation methods [20], the application of Bayesian inference to engineering inverse problems has become tractable. MCMC provides a large sample data set drawn from the PPDF. These samples can be used to approximate the expectation of any function of the random unknowns. Running a Markov chain usually involves a repetitive solution of the direct problem, which is prohibitively expensive for most nonlinear inverse problems associated with partial differential equations (PDEs). A number of methods have been proposed to address this computational challenge. In [16], a two-stage MCMC algorithm is introduced whereby, using the results from a coarse-scale model, the acceptance rate of the MCMC proposals on the fine-scale model is improved. In [10, 21], proper orthogonal decomposition (POD) is used to construct a reduced-order model for the direct simulations.

The authors in [22] utilized the stochastic response surface method (SRSM) in order to provide a statistically equivalent reduced model in the numerical Bayesian inference step [23].


The SRSM represents the solution of the stochastic forward model in terms of a polynomial chaos (PC) expansion [24], and the coefficients are calculated through collocation or regression from the results of a limited number of model simulations at a set of Gauss quadrature points. This was the first time that ideas from the area of uncertainty quantification (UQ) [24, 25] were incorporated into Bayesian inference. Based on this, the stochastic Galerkin method (SGM) was used for contaminant source inversion [26]. The SGM is another popular technique in UQ, owing to its fast convergence, that also uses PC to approximate the solution of the stochastic forward model. However, unlike the SRSM, the expansion coefficients are determined by an intrusive Galerkin projection. Both methods regard the unknown parameters as random variables/processes, and the forward problem becomes a system of stochastic PDEs. For computing the forward propagation of the prior uncertainty, the solution is expanded in terms of PC in the prior space. This serves as a computationally efficient surrogate of the original model for the likelihood calculation. However, both methods are subject to the so-called curse of dimensionality. The SRSM employs tensor product-type collocation points, which limits its usage in high-dimensional stochastic spaces. Both methods rely exclusively on the Wiener–Askey chaos [24, 25]. The required number of polynomial terms in the SRSM and SGM increases combinatorially with the number of stochastic dimensions and expansion orders, thus reducing their efficiency. Also, the SGM results in a set of coupled equations for the unknown expansion coefficients, which makes its implementation extremely complex if the stochastic dimension is large. In addition, it requires substantial effort to convert a legacy deterministic code into a stochastic one. Although the SRSM can use the computer code as a ‘black box’, the expansion coefficients are obtained by solving a system of linear equations. This is not an easy task if the number of unknowns in the system of linear equations is large. This discussion motivates the search for a method that couples the fast convergence of the SGM with the decoupled nature of the SRSM. This is achieved by the stochastic collocation method (SCM) [27–29].

The conventional sparse grid collocation (CSGC) method uses the Smolyak algorithm [30] to construct an interpolant of the solution to the stochastic forward problem in the high-dimensional stochastic space. Using this method, interpolation schemes can be constructed with orders of magnitude reduction in the number of sampled points to give the same level of approximation (up to a logarithmic factor) as interpolation on a uniform grid (tensor product). Ma and Zabaras [31] extended this methodology to adaptive sparse grid collocation (ASGC). This method uses the hierarchical surplus as an error indicator to detect non-smooth regions in the stochastic space and automatically places more points around such regions. It results in further computational gains and guarantees that a user-defined error threshold is met. In [21], a similar idea is introduced in Bayesian inference; however, only tensor product interpolation is used. In this paper, we use ASGC to obtain an approximate solution to the stochastic forward problem using piecewise linear interpolation. As in the SRSM and SGM, this approximation then serves as a surrogate of the stochastic forward model for the likelihood calculation. Instead of solving the deterministic forward model within each MCMC iteration for each proposed sample, we can simply evaluate this surrogate model at each sample point to obtain the likelihood to within a certain accuracy. Unlike the SRSM, due to its interpolatory nature, no linear system solve is involved in ASGC.

The outline of the paper is as follows. In the following section, the fundamental ideas of Bayesian inference are described. Section 3 presents ASGC as an accurate surrogate of the stochastic forward model. The hierarchical Bayesian formulation and the MCMC method are introduced in section 4. Section 5 presents applications to source inversion and permeability estimation problems. Finally, concluding remarks are given in section 6.


2. Bayesian inference approach to inverse problems

2.1. Mathematical preliminaries

Let us define a complete probability space (Ω, F, P) with a sample space Ω which corresponds to the outcomes of some experiments, F being the σ-algebra of subsets in Ω (these subsets are called events) and P : F → [0, 1] the probability measure [32]. In this framework, a single real-valued random variable M is defined as a function that maps the probability space Ω to R, i.e.

M : Ω → R, (1)

which assigns to each element ω of Ω a real value M(ω). We define m = M(ω), ω ∈ Ω, as a realization of M. In this paper, we will restrict ourselves to continuous random variables. For a single-valued random variable M, the set of values of M for all ω ∈ Ω is called the image M(Ω) of Ω, i.e.

Γ_M = {M(ω) : ω ∈ Ω} ⊆ R. (2)

That is, Γ_M is the range (of all values) of M on the real line and therefore it is sometimes also called the state space of M. Let f : R → R be a real-valued function. The composition U = f ∘ M is a function from Ω into R, defined by

U(ω) = f(M(ω)) for all ω ∈ Ω, (3)

with the state space Γ_U. For each u = U(ω) ∈ Γ_U, we have

u = f(m) for some m ∈ Γ_M. (4)

Thus, the function U = f ∘ M also defines a random variable, called a stochastic function since it is a function of random variables.

The above definition can be generalized to vectors of random variables. Let us assume that {M_i}_{i=1}^N are the components of M : Ω → R^N and that their images Γ_i ≡ M_i(Ω) are bounded intervals in R for i = 1, . . . , N. Then the state space of M is defined as

Γ ≡ ∏_{i=1}^N Γ_i ⊂ R^N, (5)

with the joint PDF denoted as p(m). For example, if each M_i is an independent uniform random variable in [−1, 1], then Γ = [−1, 1]^N. In this work, it is assumed that the space Γ is bounded.

2.2. Bayesian inference formulation

We consider the general Bayesian inference problem for the following forward problem:

F(m) ≈ d, (6)

where m is a vector of unknown model parameters and d is a vector of observable data (measurements). The forward model F yields predictions of the data as a function of the unknown parameters through numerical methods, such as the finite element method (FEM). Without loss of generality, the data d are the solution u to the forward problem G(m, u) = 0 at given sensor locations. In most engineering problems, this model is often nonlinear and the computer code is only available as a ‘black box’.

In Bayesian inference, both m and d are assumed to be random variables. A Bayesian inference approach derives the conditional probability density function of the unknown parameters given observed data. This conditional density function is called the posterior probability density function (PPDF) and can be derived according to Bayes’ formula:

p(m|d) ∝ p(d|m)p(m), (7)

where p(m|d) is the PPDF, the conditional probability p(d|m) is the likelihood function and the marginal density p(m) is called the prior probability density function. Data enter the formulation through the likelihood p(d|m). To evaluate the likelihood, a simulation run is needed. The most common and simple model assumes that the experimental noise consists of independent additive Gaussian random errors with mean zero and standard deviation σ, as follows:

d = F(m) + ζ, (8)

where the components of ζ are ζ_i ∼ N(0, σ^2). Then the likelihood function can be written as

p(d|m) ∝ (σ^2)^{−n/2} exp(−‖F(m) − d‖_2^2 / (2σ^2)), (9)

where ‖·‖_2 refers to the Euclidean norm and n is the number of measurements. It is well recognized that, for a large data set with random errors, the Gaussian distribution fits the actual error distribution quite well.
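As an illustration of equation (9), the following Python sketch evaluates the Gaussian log-likelihood for a given forward-model output; the function and variable names are ours, and working with the logarithm avoids numerical underflow for small σ:

```python
import numpy as np

def log_likelihood(predicted, d, sigma):
    """Log of the unnormalized likelihood in equation (9):
    -(n/2) log(sigma^2) - ||F(m) - d||_2^2 / (2 sigma^2)."""
    r = predicted - d                 # residual F(m) - d at the n sensors
    n = d.size                        # number of measurements
    return -0.5 * n * np.log(sigma**2) - 0.5 * (r @ r) / sigma**2
```

In the surrogate setting of section 3, `predicted` would simply be the interpolated value u(m) of equation (13) rather than the output of a full forward solve.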

The MCMC method is used to explore the PPDF. In this method, in order to find the value of the likelihood function (see equation (9)), the forward problem F(m) is solved in a deterministic way for each proposed sample m. Thus, in the classic MCMC method, we do not fully take into account the stochastic nature of the forward model, i.e. that its response is actually a stochastic function defined on the sample space; see equation (3). In the following section, we introduce a novel way to calculate the likelihood function through probability theory.

2.3. Stochastic forward problem

As discussed previously, the unknown parameters are considered as a random vector. In this framework, the prior distribution of M is assumed to be known. Based on the probability theory introduced in section 2.1, we can define a complete probability space (Ω, T, P), where the sample space Ω is the set of all possible outcomes of M. The forward model is now taken as a stochastic model: find a function u : Ω → R such that, for P-almost everywhere (a.e.) ω ∈ Ω, the following equation holds:

G(M(ω); u(ω)) = 0, for all ω ∈ Ω. (10)

The random vector M has a state space Γ and a joint PDF p(m), which is the prior PDF defined in the Bayesian formulation in equation (7). Therefore, for ω ∈ Ω, the realization of the random vector m = M(ω) takes values in Γ. We define Γ as the stochastic prior state space. Through equation (2), the stochastic posterior state space can also be defined as

Γ̃ = {M(ω) : ω ∈ T̃}, (11)

where the event space T̃ is the inverse image (under M) of Γ̃. It is clear that the posterior state space is a subset of the prior state space, since the observations give us more information about the unknowns. We denote by u(x, ω) the predicted data of the forward problem (10). Since the unknown parameters are always discretized into a random vector M, then by using the Doob–Dynkin lemma [33], u is also a function of M, i.e. u(x, M(ω)). Also, define D as a d-dimensional bounded physical domain D ⊂ R^d (d = 1, 2, 3). Therefore, we can restate the deterministic forward problem as the following stochastic forward problem: find a function u(x, m) : D × Γ → R^n such that the following holds:

G(x, m; u) = 0, (x, m) ∈ D × Γ, (12)


with appropriate boundary conditions. Various methods (the stochastic Galerkin or stochastic collocation method) can be used to solve equation (12) in the prior space, and the resulting equations become a set of deterministic equations in the physical space that can be solved by any standard deterministic discretization technique. In the following section, a newly developed method for finding u is reviewed [31].

It is clear that, after obtaining the approximation of u(x, m) in the prior space, we actually have an explicit functional form of u for the predicted data as a function of m. In other words, for each realization m = M(ω), ω ∈ Ω, the function value u(m) gives one realization of the predicted data (see equation (4)), which is equivalent to the solution of the deterministic forward problem using the same m as the input. In this way, the repetitive solution of the deterministic forward problem in MCMC is substituted with the solution of the stochastic forward problem (12). Therefore, u(x, m) is called the stochastic surrogate model. The likelihood function which is calculated through the stochastic surrogate model is called the surrogate likelihood, i.e.

p(d|m) ∝ (σ^2)^{−n/2} exp(−‖u(m) − d‖_2^2 / (2σ^2)). (13)

Remark 1. It is noted here that we only consider a bounded prior space Γ. When this space is unbounded, e.g. in the case of a Gaussian random variable, we can always truncate it to a bounded one based on some prior information about the unknowns. The only requirement is that the prior space is large enough to contain the posterior space completely, i.e. Γ̃ ⊂ Γ, since it is always possible to choose a truncated prior state space Γ such that T̃ ⊂ M^{−1}(Γ).

3. Adaptive sparse grid collocation method for Bayesian inference

In this section, we consider using the ASGC method to construct an accurate low-complexity surrogate model u(x, m) for the stochastic forward problem defined in equation (12). The repetitive evaluation of the deterministic forward problem is then reduced to sampling from the surrogate model. We briefly describe the development of the ASGC strategy here. For more details, the interested reader is referred to [31].

The basic idea of this method is to use a finite element approximation for the spatial domain and to approximate the multi-dimensional stochastic space Γ using interpolating functions on a set of collocation points {m_i}_{i=1}^k ∈ Γ. Suppose that we can find a finite element approximate solution u of the deterministic problem in equation (6) for each realization m_i; we are then interested in constructing an interpolant of u by using linear combinations of the solutions u(·, m_i). The interpolation can be constructed by using either a full tensor product of a 1D interpolation rule or the so-called sparse grid interpolation method based on the Smolyak algorithm [30].

3.1. Smolyak algorithm

The Smolyak algorithm provides a way to construct interpolation functions based on a minimal number of points in a multi-dimensional space. In this algorithm, the stochastic prior state space is assumed to be the bounded hypercube Γ = [0, 1]^N.

Let us consider a smooth function f : [0, 1]^N → R. In the 1D case (N = 1), we consider the following interpolation formula to approximate f:

U^i(f) = Σ_{j=1}^{k_i} f(m_j^i) · a_j^i, (14)


with the set of support nodes X^i = {m_j^i | m_j^i ∈ [0, 1] for j = 1, 2, . . . , k_i}, where i ∈ N, the a_j^i ∈ C([0, 1]) are the interpolation nodal basis functions associated with the nodes m_j^i, and k_i is the number of elements of the set X^i. In the context of incorporating adaptivity, we have utilized the Newton–Cotes grid using equidistant support nodes [31]. The number of nodes is defined as k_i = 1 if i = 1, and k_i = 2^{i−1} + 1 otherwise. Then the support nodes are

m_j^i = (j − 1)/(k_i − 1), for j = 1, . . . , k_i, if k_i > 1;  m_1^i = 0.5, if k_i = 1. (15)

By using equidistant nodes, it is easy to refine the grid locally. Furthermore, by using the linear hat function as the univariate nodal basis function [34], one ensures a local support, in contrast to the global support of Lagrange polynomials [31]. This ensures that discontinuities in the stochastic space can be resolved. The piecewise linear basis functions can be defined as a_1^1 = 1 for i = 1, and

a_j^i(m) = 1 − (k_i − 1)·|m − m_j^i|  if |m − m_j^i| < 1/(k_i − 1), and 0 otherwise, (16)

for i > 1 and j = 1, . . . , k_i.

Using the nested property (X^i ⊂ X^{i+1}) of the grid points, we can rewrite equation (14) in a hierarchical fashion. We define Δ^i(f) = U^i(f) − U^{i−1}(f). With U^i(f) = Σ_{m_j^i ∈ X^i} a_j^i · f(m_j^i) and U^{i−1}(f) = U^i(U^{i−1}(f)), we obtain [31]

Δ^i(f) = Σ_{m_j^i ∈ X^i} a_j^i · (f(m_j^i) − U^{i−1}(f)(m_j^i)), (17)

and, since f(m_j^i) − U^{i−1}(f)(m_j^i) = 0 for all m_j^i ∈ X^{i−1}, we obtain

Δ^i(f) = Σ_{m_j^i ∈ X_Δ^i} a_j^i · (f(m_j^i) − U^{i−1}(f)(m_j^i)), (18)

recalling that X_Δ^i = X^i \ X^{i−1}. Clearly, X_Δ^i has k_Δ^i = k_i − k_{i−1} points, since X^{i−1} ⊂ X^i. By consecutively numbering the elements in X_Δ^i, and denoting the jth point of X_Δ^i as m_j^i, we can rewrite the above equation as [31]

Δ^i(f) = Σ_{j=1}^{k_Δ^i} a_j^i · (f(m_j^i) − U^{i−1}(f)(m_j^i)) ≡ Σ_{j=1}^{k_Δ^i} a_j^i · w_j^i. (19)

Here, we define w_j^i as the 1D hierarchical surplus, which is just the difference between the function values at the current and the previous interpolation levels.

In the multivariate case (N > 1), the tensor product formulae are defined as

(U^{i_1} ⊗ · · · ⊗ U^{i_N})(f) = Σ_{j_1=1}^{k_{i_1}} · · · Σ_{j_N=1}^{k_{i_N}} f(m_{j_1}^{i_1}, . . . , m_{j_N}^{i_N}) · (a_{j_1}^{i_1} ⊗ · · · ⊗ a_{j_N}^{i_N}), (20)

which serve as building blocks for the Smolyak algorithm. The N-dimensional multilinear basis functions can be defined as a_j^i(m) := a_{j_1}^{i_1} ⊗ · · · ⊗ a_{j_N}^{i_N} = ∏_{p=1}^N a_{j_p}^{i_p}(m_p), where the multi-indices i = (i_1, . . . , i_N) ∈ N^N and j = (j_1, . . . , j_N) ∈ N^N. Here i_p, p = 1, . . . , N, is the level of interpolation along the pth direction and j_p, p = 1, . . . , N, denotes the location of a given support node in the pth dimension. Furthermore, through a new multi-index set

B_i := {j ∈ N^N : m_{j_p}^{i_p} ∈ X_Δ^{i_p} for j_p = 1, . . . , k_Δ^{i_p}, p = 1, . . . , N}, (21)

we can define the hierarchical basis as {a_j^p : j ∈ B_p, p ≤ i}.


Figure 1. 1D tree-like structure of the sparse grid (root 0.5, then 0 and 1, then 0.25 and 0.75, then 0.125, 0.375, 0.625, 0.875, and so on).

Using the 1D equation (19), the sparse interpolant A_{q,N}, where q is the depth of the sparse grid interpolation (q ≥ 0, q ∈ N_0) and N is the number of stochastic dimensions, is given by the Smolyak algorithm as

A_{q,N}(f) = A_{q−1,N}(f) + ΔA_{q,N}(f),
ΔA_{q,N}(f) = Σ_{|i|=N+q} (Δ^{i_1} ⊗ · · · ⊗ Δ^{i_N})(f), (22)

with A_{−1,N} = 0 and where |i| = i_1 + · · · + i_N. This can be further simplified as

A_{q−1,N}(f) = Σ_{|i|≤N+q−1} (Δ^{i_1} ⊗ · · · ⊗ Δ^{i_N})(f), (23)

and

ΔA_{q,N}(f) = Σ_{|i|=N+q} Σ_{j∈B_i} (a_{j_1}^{i_1} ⊗ · · · ⊗ a_{j_N}^{i_N}) · (f(m_{j_1}^{i_1}, . . . , m_{j_N}^{i_N}) − A_{q−1,N}(f)(m_{j_1}^{i_1}, . . . , m_{j_N}^{i_N})). (24)

Here, we define

w_j^i = f(m_{j_1}^{i_1}, . . . , m_{j_N}^{i_N}) − A_{|i|−1,N}(f)(m_{j_1}^{i_1}, . . . , m_{j_N}^{i_N}) (25)

as the hierarchical surplus, which is just the difference between the function value at the current point and the interpolation value from the coarser grid. As described in [31], we can work in either the nodal basis functional space or the hierarchical basis space. For smooth functions, the hierarchical surpluses tend to zero as the interpolation level tends to infinity. On the other hand, for non-smooth functions, steep gradients/finite discontinuities are indicated by the magnitude of the hierarchical surplus. The bigger the magnitude is, the stronger the underlying discontinuity is. Therefore, the hierarchical surplus is a natural candidate for error control and implementation of adaptivity. The interpolation error is given in [31].
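To make the hierarchical construction concrete, here is a minimal 1D Python sketch of equations (15), (16) and (19): it builds the Newton–Cotes levels, evaluates the hat basis and accumulates the hierarchical surpluses. It is a toy illustration under our own naming, not the implementation of [31]:

```python
import numpy as np

def nodes(i):
    """Equidistant Newton-Cotes support nodes of level i on [0, 1], equation (15)."""
    k = 1 if i == 1 else 2**(i - 1) + 1
    return np.array([0.5]) if k == 1 else np.linspace(0.0, 1.0, k)

def hat(m, mj, k):
    """Piecewise linear basis of equation (16); constant 1 on level 1 (k = 1)."""
    if k == 1:
        return np.ones_like(m)
    return np.maximum(0.0, 1.0 - (k - 1) * np.abs(m - mj))

def hierarchical_interpolant(f, max_level):
    """Accumulate surpluses w_j^i = f(m_j^i) - U^{i-1}(f)(m_j^i), equation (19)."""
    surpluses = []                       # list of (level, node, surplus)
    def interp(m):
        u = np.zeros_like(m, dtype=float)
        for (i, mj, w) in surpluses:
            k = 1 if i == 1 else 2**(i - 1) + 1
            u += w * hat(m, mj, k)
        return u
    for i in range(1, max_level + 1):
        prev = set() if i == 1 else set(nodes(i - 1))
        for mj in nodes(i):
            if mj in prev:               # nested grids: keep only X^i \ X^{i-1}
                continue
            w = f(np.array([mj]))[0] - interp(np.array([mj]))[0]
            surpluses.append((i, mj, w))
    return interp

# usage: interpolate a smooth function on [0, 1]
u = hierarchical_interpolant(lambda m: np.sin(2 * np.pi * m), max_level=8)
```

For a smooth f the stored surpluses decay rapidly with the level, which is exactly the property the adaptive algorithm of section 3.2 exploits.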

3.2. Adaptive sparse grid interpolation

The 1D equidistant points of the sparse grid can be considered as a tree-like data structure, as shown in figure 1. We can consider the interpolation level of a grid point m as the depth of the tree. Denote the father of a grid point by F(m), where the father of the root 0.5 is itself, i.e. F(0.5) = 0.5.

We denote the sons of a grid point m = (m_1, . . . , m_N) by

Sons(m) = {S = (S_1, S_2, . . . , S_N) | (F(S_1), S_2, . . . , S_N) = m, or (S_1, F(S_2), . . . , S_N) = m, . . . , or (S_1, S_2, . . . , F(S_N)) = m}. (26)


From this definition, it is noted that, in general, for each grid point there are two sons in each dimension; therefore, for a grid point in an N-dimensional stochastic space, there are 2N sons. It is also noted that the sons are also the neighbor points of the father. The neighbor points are just the support nodes of the hierarchical basis functions in the next interpolation level [31]. By adding the neighbor points, we actually add the support nodes from the next interpolation level, i.e. we perform interpolation from level |i| to level |i| + 1. Therefore, in this way, we refine the grid locally while not violating the developments of the Smolyak algorithm (24).

The basic idea here is to use the hierarchical surplus as an error indicator to detect the smoothness of the solution, and to refine around those points whose hierarchical basis functions a_j^i carry a surplus of magnitude |w_j^i| ≥ ε. If this criterion is satisfied, we simply add the 2N neighbor points of the current point, given by equation (26), to the sparse grid. Therefore, let ε > 0 be the parameter for the adaptive refinement threshold. We propose the following iterative refinement algorithm, beginning with the coarsest adaptive sparse grid G_{N,N}, i.e. with the N-dimensional multi-index i = (1, . . . , 1), which is just the point (0.5, . . . , 0.5).

Algorithm I

(i) Set the level of the Smolyak construction to q = 0.
(ii) Construct the first-level adaptive sparse grid G_{N,N}:
• Calculate the function value at the point (0.5, . . . , 0.5).
• Generate the 2N neighbor points and add them to the active index set.
• Set q = q + 1.
(iii) While q ≤ q_max and the active index set is not empty:
• Copy the points in the active index set to an old index set and clear the active index set.
• Calculate in parallel the hierarchical surplus of each point in the old index set according to

w_j^i = f(m_{j_1}^{i_1}, . . . , m_{j_N}^{i_N}) − A_{q−1,N}(f)(m_{j_1}^{i_1}, . . . , m_{j_N}^{i_N}). (27)

Here, we use all of the existing collocation points in the current adaptive sparse grid G_{N+q−1,N}. This allows us to evaluate the surplus of each point from the old index set in parallel.
• For each point in the old index set, if |w_j^i| ≥ ε:
– generate the 2N neighbor points of the current active point according to equation (26);
– add them to the active index set.
• Add the points in the old index set to the existing adaptive sparse grid G_{N+q−1,N}; the adaptive sparse grid now becomes G_{N+q,N}.
• Set q = q + 1.
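The loop structure of Algorithm I can be sketched in a few lines of Python. The helpers surplus() (equation (27)) and neighbors() (equation (26)) are assumed to be given; their names and signatures are ours, and points are stored as N-tuples in [0, 1]^N:

```python
def asgc_refine(surplus, neighbors, eps, q_max, N):
    """Sketch of Algorithm I: refine only where the surplus magnitude reaches eps."""
    grid = {}                           # collocation point -> hierarchical surplus
    active = {(0.5,) * N}               # level-1 grid: the single midpoint
    q = 1
    while q <= q_max and active:
        old, active = active, set()
        # surpluses of the old points are independent, so this loop can run in parallel
        w = {pt: surplus(pt, grid) for pt in old}
        for pt in old:
            if abs(w[pt]) >= eps:       # error indicator triggers local refinement
                active |= set(neighbors(pt)) - set(grid) - old
        grid.update(w)                  # commit the old points to the sparse grid
        q += 1
    return grid
```

For vector-valued surpluses (one value per measurement location, as needed below), abs(w[pt]) would be replaced by a norm of the surplus vector.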

3.3. From adaptive sparse grid interpolation to Bayesian likelihood calculation

By using the ASGC method, the stochastic solution u(x, m) of the stochastic forward problem (12) can now be approximated by the following reduced form of equation (24):

u(x, m) = Σ_{|i|≤N+q} Σ_{j∈B_i} w_j^i(x) · a_j^i(m). (28)

This is just a simple weighted sum of the values of the basis functions for all collocation points in the current sparse grid [31]. The hierarchical surplus w_j^i is computed through equation (25) by solving the deterministic problem at each collocation point. It is also noted that we need to construct the approximation u(x, m) at each measurement location.

To obtain the predicted data u for any m ∈ Γ, instead of solving the deterministic forward problem, we can simply substitute m into equation (28) to compute the value of u to within a certain accuracy. Furthermore, we can then easily compute the value of the likelihood function in equation (13). In this case, most of the computational time is spent on constructing the interpolant for the solution. This construction is embarrassingly parallel and thus the computational time is minimal. Once the interpolation in equation (28) is constructed, we can simply store it for future use, even when new observed data become available.

Remark 2. It is noted that we only consider the unit hypercube here. However, other shapes of the bounded prior space can be easily transformed to the unit hypercube.
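For a box-shaped prior, the transformation of remark 2 is a simple affine rescaling; a sketch, where the bounds lo and hi are problem-specific inputs, not values from the paper:

```python
import numpy as np

def to_unit_hypercube(m, lo, hi):
    """Map a point of the bounded prior box [lo, hi]^N onto [0, 1]^N."""
    return (np.asarray(m) - lo) / (hi - lo)

def from_unit_hypercube(y, lo, hi):
    """Inverse map, used when a collocation point is fed to the forward solver."""
    return lo + np.asarray(y) * (hi - lo)
```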

4. Bayesian inference approach

After introducing the ASGC method to construct the stochastic surrogate model and to compute the likelihood through function value evaluation, we are ready to review the Bayesian inference approach and the numerical method used to explore the posterior state space.

4.1. Markov random field as prior distribution

To define a proper stochastic forward problem, it is important to find the prior space first. A possible choice is a non-informative prior, e.g. a uniform distribution. However, a uniform prior may not provide sufficient regularity to the solution of the inverse problem. On the other hand, a Gaussian prior density provides regularity to the solution in a similar way as Tikhonov regularization [9]. Since a Gaussian random variable is unbounded, it is critical to truncate the unbounded state space to a suitable bounded prior space based on the prior information.

Let us consider an inverse problem in which the unknown quantity is a real-valued field k(x). In the general Bayesian setting, this field and the forward model must be discretized. If the finite element method is used, the field is discretized onto the nodes of the finite element mesh. Denote by N_h the number of nodes of the mesh; then we can write both the prior and posterior densities in terms of the unknown parameter m = (k(x_1), . . . , k(x_{N_h})). A possible choice for the prior of m is a Gaussian process (GP), i.e. a multivariate Gaussian distribution. Another popular model is a pair-wise MRF [12, 13]. A Gaussian MRF for the unknown m is of the form

p(m) ∝ λ^{N/2} exp{−(λ/2) m^T W m}, (29)

where N is the dimension of m. In the one-parameter model of equation (29), the entries of the N × N matrix W are determined as W_ij = n_i if i = j, W_ij = −1 if sites i and j are adjacent (termed neighbor sites) and W_ij = 0 otherwise. Here, n_i is the number of neighbors of site i and λ is a scaling parameter. The neighborhood of a site is defined by its spatially adjacent neighbors. This model captures spatial dependence through a single parameter λ and has been shown to perform well in several application areas, such as heat conduction [9], flow estimation [12, 13] and image processing [15].
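For concreteness, a small sketch that assembles W of equation (29) on an n × n lattice with the usual four-nearest-neighbor adjacency (the adjacency choice is our assumption):

```python
import numpy as np

def mrf_precision(n):
    """W of equation (29): W_ii = n_i (number of neighbors), W_ij = -1 for adjacent sites."""
    N = n * n
    W = np.zeros((N, N))
    idx = lambda r, c: r * n + c
    for r in range(n):
        for c in range(n):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n and 0 <= cc < n:
                    W[idx(r, c), idx(r, c)] += 1.0    # accumulate n_i on the diagonal
                    W[idx(r, c), idx(rr, cc)] = -1.0  # adjacent (neighbor) sites
    return W
```

The prior log-density is then, up to an additive constant, (N/2) log λ − (λ/2) m^T W m.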

However, it is noted that the dimension of m is generally very high, since N_h is above 1000 in a typical finite element mesh. Exploring the posterior state space using MCMC is then not trivial and the acceptance rate is generally very low. To this end, we use the process convolution approach to reduce the dimensionality. For example, one can approximate the GP by a convolution of kernels at a discrete set of N_m points x_i, i = 1, . . . , N_m, and an independent white noise process ψ = (ψ_1, . . . , ψ_{N_m}) [18]. Then the GP k(x) at any location x ∈ D can be represented as

k(x) = Σ_{i=1}^{N_m} g(x − x_i) ψ_i(x_i), (30)

where the kernel function g(x) controls the spatial structure of the underlying process and the ψ_i(x_i) are independent N(0, λ_ψ^{−1}) random variables. Thus, the prior density of ψ is

p(ψ) ∝ λ_ψ^{N_m/2} exp{−(λ_ψ/2) ψ^T ψ}. (31)

The inverse problem has now been transformed to an inference problem on the coefficients ψ_i, i = 1, . . . , N_m. Once the ψ_i are computed using Bayesian inference, the unknown field k(x) is determined by equation (30).

As an alternative to using an underlying white noise process, if the locations of the kernels x_i, i = 1, . . . , N_m, are on a regular grid, an MRF as in equation (29) can also be used for the prior distribution of the underlying process ψ on this grid [19]. The advantage of this MRF convolution model over the standard GP convolution model is that the spatial dependence of the process not only depends on the functional form of the kernel but is also affected by the smoothness of the MRF. Therefore, it can more easily be fitted to data with a more complex correlation structure. It is noted that equation (31) is a special case of equation (29) when the components are independent of each other, i.e. W = I. For the applications in this paper, g(x) is chosen to be a mean-zero Gaussian kernel. It is also noted that the standard deviation of the kernel is usually set equal to the distance between adjacent kernel locations, and the lattice of kernel locations is usually larger than the span of the observed data, in order to eliminate inaccurate results near the boundaries of the domain [35].

In order to keep the notation consistent, the unknown vector ψ is also denoted as m. Then, with the likelihood (9) and prior distribution (29), the PPDF can be written as

p(m|d) ∝ (σ^2)^{−n/2} exp(−‖u(m) − d‖_2^2 / (2σ^2)) · λ^{N/2} exp{−(λ/2) m^T W m}. (32)

The maximum a posteriori (MAP) estimate of m can be derived as

m_map = argmin_m {‖u(m) − d‖_2^2 + λσ^2 m^T W m}. (33)

It is seen that the MAP estimator has a mathematical form similar to the estimate obtained using Tikhonov regularization [8, 9]. However, unlike the deterministic methods, where the regularization parameter needs to be chosen a priori, we can allow here an automatic selection of the regularization parameter through a hierarchical Bayesian formulation.

4.2. Hierarchical Bayesian formulation and MCMC

In this study, a gamma distribution and an inverse gamma distribution are chosen for λ and σ^2, respectively. Then a hierarchical Bayesian posterior distribution can be computed as follows:

p(m, λ, σ|d) ∝ (σ^2)^{−n/2} exp(−‖u(m) − d‖_2^2 / (2σ^2)) · λ^{N/2} exp{−(λ/2) m^T W m} · λ^{α_1−1} exp(−β_1 λ) · (σ^2)^{−(α_2+1)} exp(−β_2/σ^2), (34)

where (α_1, β_1) and (α_2, β_2) are the parameters of the gamma distribution p(λ) ∝ λ^{α−1} e^{−βλ} and the inverse gamma distribution p(σ^2) ∝ (σ^2)^{−(α+1)} e^{−β/σ^2}, respectively. The PPDF p(m, λ, σ|d) is then sampled using the following hybrid of the Metropolis–Hastings and Gibbs algorithms [36]:

Algorithm II

(i) Initialize m^(0) and σ^(0).
(ii) For i = 0 : N_mcmc − 1:
– sample u ∼ U(0, 1),
– sample m^(∗) ∼ q(m^(∗)|m^(i)),
– if u < A(m^(∗), m^(i)) = min{1, [p(m^(∗), σ^(i)|d) q(m^(i)|m^(∗))] / [p(m^(i), σ^(i)|d) q(m^(∗)|m^(i))]}, set m^(i+1) = m^(∗);
– else set m^(i+1) = m^(i),
– sample λ^(i+1) ∼ p(λ|m^(i+1), σ^(i)),
– sample σ^(i+1) ∼ p(σ^2|m^(i+1), λ^(i+1)).

The full conditionals p(λ|m^(i+1), σ^(i)) and p(σ^2|m^(i+1), λ^(i+1)) can be easily derived as

p(λ|m^(i+1), σ^(i)) ∼ G(N/2 + α_1, (1/2) m^T W m + β_1), (35)

p(σ^2|m^(i+1), λ^(i+1)) ∼ IG(n/2 + α_2, (1/2) ‖u(m) − d‖_2^2 + β_2), (36)

which are gamma and inverse gamma distributions, respectively.

It is again emphasized that evaluation of the acceptance ratio A requires computing the likelihood for the proposed move m^(∗); thus, in the classical Bayesian approach, it requires running a forward problem at m^(∗). However, by using the stochastic surrogate model u(m), the computation of the likelihood is accelerated by several orders of magnitude, as will be shown in the following numerical examples.
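Putting the pieces together, here is a hedged Python sketch of Algorithm II driven by the surrogate. We assume the rate parameterizations of G and IG above (NumPy's gamma sampler takes a scale, hence the reciprocals), a surrogate callable on the prior space, and we omit details such as rejecting proposals that leave the bounded prior space:

```python
import numpy as np

def mh_within_gibbs(surrogate, d, W, a1, b1, a2, b2, sigma_m, n_iter, m0, rng):
    """Metropolis-Hastings for m, Gibbs for lambda and sigma^2 (equations (34)-(36))."""
    m, lam, sig2 = m0.copy(), 1.0, 1.0
    n, N = d.size, m0.size
    samples = []
    def log_post(m, sig2, lam):          # log of equation (34), terms involving m
        r = surrogate(m) - d
        return -0.5 * n * np.log(sig2) - 0.5 * (r @ r) / sig2 - 0.5 * lam * (m @ W @ m)
    for _ in range(n_iter):
        prop = m + sigma_m * rng.standard_normal(N)   # random walk proposal, eq. (43)
        if np.log(rng.uniform()) < log_post(prop, sig2, lam) - log_post(m, sig2, lam):
            m = prop                                  # symmetric q cancels in the ratio A
        lam = rng.gamma(N / 2 + a1, 1.0 / (0.5 * (m @ W @ m) + b1))     # equation (35)
        r = surrogate(m) - d
        sig2 = 1.0 / rng.gamma(n / 2 + a2, 1.0 / (0.5 * (r @ r) + b2))  # equation (36)
        samples.append(m.copy())
    return np.array(samples)
```

Note that each iteration only evaluates the surrogate, never the forward solver, which is the source of the speed-up reported in section 5.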

5. Numerical examples

5.1. Example 1: 2D heat source inversion

In the first example, we want to demonstrate the accuracy of the adaptive sparse grid collocation method in constructing an accurate low-complexity surrogate model for the Bayesian likelihood calculation. Therefore, we consider a simple heat source inversion problem on the domain D = [0, 1] × [0, 1] with adiabatic boundaries:

∂T/∂t = ∇²T + (s e^{−t}/(2πτ²)) exp(−|x − m|²/(2τ²)), (37)

∇T · n = 0, (38)

T (x, 0) = 0, (39)

where we prescribe s = 5.0 and τ = 0.2 and leave the source location m = (m_0, m_1) unknown. The forward model is solved using the finite element method with a time step of 0.001. Simulation data are generated by adding independent random noise N(0, σ²) to the deterministic simulation results at a 5 × 5 sensor network. At each sensor location, two measurements are taken, at time t = 0.05 and t = 0.1, which corresponds to a total of 50 measurements. Therefore, we try to infer the source location from these noisy temperature measurements. The data are obtained on a finer mesh with an 80 × 80 finite element grid and the stochastic surrogate model is constructed on a 40 × 40 grid. A similar example is solved in [26], where the SGM is utilized to construct the surrogate model by employing the PC expansion.

In this example, we first take the priors to be m_i ∼ U(0, 1), i.e. uniform distributions that constrain the source to lie in the domain. The noise level σ is assumed random and unknown. Thus, the PPDF for this example is as follows:

p(m, σ|d) ∝ (σ^2)^{−n/2} exp(−‖u(m) − d‖_2^2 / (2σ^2)) · (σ^2)^{−(α+1)} exp(−β/σ^2). (40)

5.1.1. Solution to the stochastic forward problem. In this subsection, we construct the interpolant of the forward solution by ASGC, which then serves as a low-complexity surrogate model in the Bayesian likelihood calculation. The accuracy of this method is also investigated.

Here, the exact source location is taken as (0.5, 0.5). At this stage, measurement data do not enter the inference procedure. Therefore, we can construct the surrogate model off-line and save it for future use. The stochastic forward problem is solved in the prior space Γ = [0, 1]² by ASGC.

The accuracy and convergence of the forward solution can be interrogated in many ways. Figure 2 shows the surface response of the temperature as a function of the source location at the point x = (0.5, 0.5) in the physical domain and t = 0.05 for different thresholds ε. The surface response plot is obtained by choosing points on a 100 × 100 grid in the space Γ and computing the function value u(m) at each point using equation (28). Convergence is observed for decreasing ε. The temperature is smooth over Γ. However, as seen in the corresponding sparse grid in figure 3 with ε = 10^{−2}, the sparse grid is refined at the center and the four corners. This is consistent with the surface plot of the solution in figure 2, where steep gradients indeed occur in these regions.

The PDF of the stochastic solution is a useful diagnostic for forward uncertainty propagation. The result is always compared with that from direct Monte Carlo (MC) simulation. The direct MC method is to sample m from Γ randomly and solve the forward deterministic problem for each sample. Then the corresponding PDF is constructed from the histogram distribution of the collection of the forward model outputs. For the ASGC method, instead, one can sample m and substitute it into equation (28) to calculate the approximate solution, again forming a histogram of the resulting values. The resulting PDF is shown in figure 4 at the same location (x = 0.5, y = 0.5) in the physical domain and two successive times: t = 0.05 and t = 0.1. As in figure 2, the probability density converges to the counterpart obtained by direct computation with decreasing ε. From the above discussion, it is now clear that the solution obtained by the ASGC method indeed converges to that of the true direct simulation.

Next, we examine the accuracy of the surrogate likelihood in equation (13) for noisy data. The synthetic noisy data are generated by

d_i = F_i(m) + ζ_i, i = 1, . . . , n, (41)

where ζ_i is a Gaussian random variable N(0, σ²) and the noise level is σ = 0.05. Figure 5 shows the contours of the likelihood with 5% noise in the data. The contours are plotted on a 100 × 100 grid in Γ, where the surrogate likelihood value is computed at each grid point as a function of m. With ε = 10^{−3}, excellent agreement between the direct computation by the FEM and the surrogate likelihood obtained by ASGC is observed.

Figure 2. Stochastic forward problem solution shown as a surface response on the prior support for different thresholds: (a) ε = 10^{−1}, (b) ε = 5 × 10^{−2}, (c) ε = 10^{−3}, (d) ε = 10^{−4}.

To further assess the accuracy of the surrogate likelihood, we compute the Kullback–Leibler (KL) divergence between the exact posterior p(m|d) computed by the FEM and the surrogate posterior p̂(m|d) computed by ASGC. The KL divergence is a quantitative assessment of the discrepancy between two probability densities and is defined as

D_KL(p|p̂) = ∫ p(m) log(p(m)/p̂(m)) dm. (42)

Figure 6 plots D_KL(p|p̂) for the surrogate posterior with decreasing ε. In terms of the KL divergence, an algebraic convergence rate of the surrogate posterior to the exact one is observed. When ε = 10^{−3}, the KL divergence between the two densities is 1.005 × 10^{−4}. Therefore, we can conclude that the surrogate model is nearly identical to the exact one when ε = 10^{−3}.

Figure 3. Adaptive sparse grid with ε = 10^{−2}. The dots denote the locations of the interpolation points.

Figure 4. Probability density function of T(x = 0.5, y = 0.5) at two successive times. Left: t = 0.05; right: t = 0.1.

5.1.2. Solution to the inverse problem. We are now ready to use the surrogate model for accelerating the Bayesian inference approach on the source location inverse problem. According to the discussion in the last section, a threshold of ε = 10^{−3} is accurate enough to construct the surrogate model. The hierarchical Bayesian formulation (40) is adopted for the simultaneous estimation of the source location m and noise level σ. The initial guesses for m and σ are (0, 0) and 1.0, respectively. The pair of parameters (α, β) for the inverse gamma distribution is (1 × 10^{−3}, 1 × 10^{−3}). Here, we choose a rather diffuse prior for σ since no information is known a priori. The proposal distribution q(m^(∗)|m^(i)) for the Metropolis–Hastings step in the hybrid Algorithm II is a random walk sampler:

q(m^(∗)|m^(i)) ∝ exp(−‖m^(∗) − m^(i)‖_2^2 / (2σ_m^2)), (43)

with a suitable standard deviation σ_m = 0.01 unless otherwise specified. The length of each Markov chain N_MCMC is 30 000 and only the last 20 000 realizations are used to compute the relevant statistics.

Figure 5. Contours of the likelihood function with 5% noise in the data. Solid lines are obtained via FEM and dashed lines are obtained via the ASGC method: (a) ε = 10^{−1}, (b) ε = 10^{−2}, (c) ε = 10^{−3}.

It is important to check the convergence and mixing of the chain before analyzing the results [20]. The simplest way is by visualizing the trace plots of the chain. Results in figure 7 show the 2D and 1D trace plots of the Markov chain. The 2D view is reminiscent of the likelihood contour shown in figure 5, which implies that the chain moves around most of the posterior state space. Visual inspection of the 1D trace plot suggests that the chain mixes very well for the last 20 000 samples, which implies that the samples are indeed drawn from the stationary distribution p(m|d) of the Markov chain.

Careful design of the proposal distribution q(m^(∗)|m^(i)) significantly affects the quality of the samples forming the chain. The autocorrelation function (ACF) γ(s) is another useful tool for assessing the convergence of the chain. If σ_m is too large, the chain easily moves out of the posterior support; most of the proposed samples will be rejected, resulting in very long correlations. On the other hand, if σ_m is too small, most of the proposed samples will be accepted, but the chain can only move around a small portion of the posterior space and mixes poorly. A suitable σ_m should result in a fast decay of the autocorrelation with the lag along the chain. As shown in figure 8, when there is 5% noise in the data, with the choice σ_m = 0.01 the autocorrelation decays to zero at a very small lag, which is consistent with the good mixing of the chain observed in figure 7.
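The empirical ACF used for plots such as figure 8 can be computed with the standard estimator below, sketched for a scalar chain:

```python
import numpy as np

def acf(chain, max_lag):
    """Empirical autocorrelation gamma(s) of a scalar chain for s = 0, ..., max_lag."""
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    var = (x @ x) / x.size
    return np.array([(x[:x.size - s] @ x[s:]) / (x.size * var)
                     for s in range(max_lag + 1)])
```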

Figure 6. D_KL(p|p̂) of the surrogate posterior with decreasing ε.

Figure 7. 2D (left) and 1D (right) trace plots of the Markov chain with 5% noise in the data, by ASGC.

Figure 8. The autocorrelation function at lag s for various proposal samplers with 5% noise in the data.

Table 1. The numerical results given by direct FEM computation.

σ             σ∗              m0∗      σ_{m0}           m1∗      σ_{m1}
1 × 10^{−2}   1.23 × 10^{−2}   0.499    1.84 × 10^{−3}   0.500    1.83 × 10^{−3}
3 × 10^{−2}   3.19 × 10^{−2}   0.503    5.30 × 10^{−3}   0.498    5.26 × 10^{−3}
5 × 10^{−2}   5.24 × 10^{−2}   0.502    8.92 × 10^{−3}   0.503    8.71 × 10^{−3}
1 × 10^{−1}   9.71 × 10^{−2}   0.486    1.66 × 10^{−2}   0.508    1.64 × 10^{−2}

Table 2. The numerical results given by the ASGC method.

σ             σ∗              m0∗      σ_{m0}           m1∗      σ_{m1}
1 × 10^{−2}   1.13 × 10^{−2}   0.499    1.87 × 10^{−3}   0.500    1.86 × 10^{−3}
3 × 10^{−2}   3.18 × 10^{−2}   0.503    5.25 × 10^{−3}   0.498    5.39 × 10^{−3}
5 × 10^{−2}   5.24 × 10^{−2}   0.501    8.78 × 10^{−3}   0.503    8.78 × 10^{−3}
1 × 10^{−1}   9.67 × 10^{−2}   0.486    1.60 × 10^{−2}   0.508    1.57 × 10^{−2}

Tables 1 and 2 summarize the numerical results for various levels of noise σ in the data, given by direct FEM computation and by using ASGC to calculate the likelihood, respectively. In the tables, ∗ denotes the posterior mean of the unknown parameters. The acceptance rate for a single chain is about 0.314, where a desirable value for MCMC is between 0.2 and 0.4, which again verifies the good mixing of the chain. However, the standard deviation for the proposal distribution is 0.004 with 1% noise in the data. This is because the posterior support in this case is smaller than in the other cases. Thus, a smaller σ_m is chosen and the acceptance rate is 0.295, a rather reasonable value. The numerical results obtained by the two approaches are nearly the same, which is consistent with the discussion in the last section that the surrogate model is accurate enough to capture all of the prior uncertainty. However, the computational time required by the ASGC method is only a small fraction of that of the direct FEM analysis. In ASGC, most of the time is spent on constructing the surrogate model. It is noted that this process is embarrassingly parallel: it needs only 84.18 s on 20 nodes of our in-house Linux cluster, and a single MCMC chain of length 30 000 took only 26.9 s on a single processor. On the other hand, the FEM approach takes nearly 121 890.75 s ≈ 34 h on one processor, since we have to solve the deterministic forward problem for each proposal move sequentially. The number of collocation points (and thus the number of direct computations needed) in the sparse grid is 1081, whereas the direct FEM computation of the likelihood needs 30 000 forward evaluations. Therefore, the surrogate model achieves a speed-up of several orders of magnitude.


Figure 9. The posterior marginal densities. Left: m_1 with 1% and 5% noise in the data. Right: σ with 5% noise in the data.

Table 3. The numerical results with four other source locations given by ASGC.

m                  m0∗       σ_{m0}           m1∗      σ_{m1}
(0.1, 0.9)         0.104     4.46 × 10^{−3}   0.897    4.46 × 10^{−3}
(0.75, 0.25)       0.753     2.75 × 10^{−3}   0.250    2.77 × 10^{−3}
(0.0092, 0.4837)   0.00913   2.20 × 10^{−3}   0.4837   2.73 × 10^{−3}
(0.449, 0.804)     0.449     1.87 × 10^{−3}   0.800    3.15 × 10^{−3}

From the tables, it can be seen that the posterior mean of the source location m is in excellent agreement with the exact solution with up to 10% noise in the data. However, the posterior mean becomes less accurate and the standard deviations of the samples (σ_{m0}, σ_{m1}) increase with an increasing noise level, which indicates that the variation in the posterior solution becomes larger. This can be verified from the marginal distributions of the components of the unknowns, obtained via kernel density estimation using the Gaussian kernel [20]. Figure 9 shows the posterior marginal densities for m_1 and σ. Although the prior distribution is only uniform, the posterior density of m_1 greatly refines the prior distribution. It is seen that the range of the distribution of m_1 is much larger with 5% noise, whereas it is much more concentrated around the exact value with 1% noise. It is also interesting to note that the posterior density of the noise level σ concentrates around the exact value, although the prior density contains no information except for enforcing non-negativity. Its posterior mean is 5.24 × 10^{−2}, which is quite close to the exact value 0.05. From the tables, it can be seen that the current hierarchical Bayesian formulation can successfully detect the noise levels.

Besides the great computational savings, another advantage of this method is that the surrogate model is reusable when new measurements arrive after its construction. Table 3 shows the results for four other source locations: (0.1, 0.9), (0.75, 0.25) and two other random draws from the prior uniform distribution, with 1% noise in the data. The surrogate model is the same as before, with ε = 10^{−3}. It can be seen that, no matter where the source location is, the method can always infer the exact value without performing any additional direct FEM computations as in the traditional MCMC method. In addition, the computational time for the solution of the inverse problem does not change, since the same surrogate model is used.


Table 4. The maximum error in the posterior mean for different ε and noise levels.

ε 1% noise 3% noise 5% noise 10% noise

1 × 10−1 5.50 × 10−3 4.58 × 10−3 6.90 × 10−3 2.14 × 10−2

5 × 10−2 1.51 × 10−3 1.44 × 10−3 6.56 × 10−3 1.99 × 10−2

1 × 10−2 2.87 × 10−4 2.40 × 10−3 7.33 × 10−3 1.91 × 10−2

1 × 10−3 1.00 × 10−4 1.99 × 10−3 7.90 × 10−3 2.05 × 10−2

Table 5. The numerical results with the MRF prior using ASGC.

σ (true) σ (posterior mean) λ m0 σm0 m1 σm1

1 × 10−2 1.13 × 10−2 1.607 0.499 1.86 × 10−3 0.500 1.81 × 10−3

3 × 10−2 3.19 × 10−2 1.598 0.502 5.24 × 10−3 0.498 5.45 × 10−3

5 × 10−2 5.25 × 10−2 1.612 0.500 9.01 × 10−3 0.503 8.66 × 10−3

1 × 10−1 9.68 × 10−2 1.605 0.486 1.58 × 10−2 0.507 1.57 × 10−2

It is also interesting to compare the results with respect to different ε and noise levels. The true source location is chosen at (0.2, 0.8). The results are shown in table 4, where the error is defined as the maximum error between the posterior mean of (m0, m1) and the true source location. From the table, it is seen that for a given ε, the accuracy is affected significantly by the measurement error. Surprisingly, for a given noise level, even a large threshold can give rather accurate results. However, for a very small noise level of 1%, a small ε gives better results. This is possibly due to error cancellation between the error of the surrogate model and the measurement error. When the measurement is close to the true data, we need a very accurate surrogate model, and hence a small ε.

We next demonstrate that the accuracy of the method is not affected by the prior distribution, provided the prior space is large enough to contain the posterior space. The hierarchical formulation (34) is used, where the prior is the MRF. Since there are only two unknowns, the matrix W = I. Thus, the prior space is a two-dimensional unbounded space. There is no need to search for the solution in such a large space: in this example, it is obvious that the unit square contains the posterior space. So we truncate the unbounded space to the unit square, and we do not need to perform the ASGC calculation again, since the prior space remains unchanged. The true source location is still (0.5, 0.5). The other parameters remain the same, except that the pair parameter for the gamma distribution is (1.0, 1.0) and the initial value for λ is 10. The results are shown in table 5. It is seen that we obtain nearly the same results as in table 2, where the prior was a uniform distribution. It is interesting to note that the posterior mean of λ is nearly the same for all four noise levels, which suggests that the automatic selection of the regularization parameter is nearly optimal. Therefore, the newly developed method is indeed an accurate and efficient alternative to direct FEM computation for computationally expensive nonlinear inverse problems.
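For concreteness, a schematic of how such a hierarchical log-posterior can be evaluated. The exact form of (34) is not reproduced here, so the unnormalized densities and the Jeffreys-type prior on σ below are assumptions made only for illustration:

```python
import numpy as np

def log_posterior(m, sigma, lam, d, surrogate, W):
    """Schematic hierarchical log-posterior: Gaussian likelihood from the
    surrogate, Gaussian MRF prior on m with precision lam*W (here W = I),
    Gamma(1, 1) hyperprior on lam and a Jeffreys-type prior on sigma.
    This is an assumed sketch of the formulation, not the paper's code."""
    if sigma <= 0.0 or lam <= 0.0:
        return -np.inf
    Gm = surrogate(m)                                 # surrogate model evaluation
    n = d.size
    loglike = -n * np.log(sigma) - 0.5 * np.sum((d - Gm) ** 2) / sigma ** 2
    k = m.size
    logprior_m = 0.5 * k * np.log(lam) - 0.5 * lam * (m @ W @ m)  # Gaussian MRF
    logprior_lam = -lam                               # Gamma(1, 1), up to a constant
    logprior_sigma = -np.log(sigma)                   # assumed 1/sigma prior
    return loglike + logprior_m + logprior_lam + logprior_sigma
```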

In order to numerically verify the comments in remark 1, we conduct several computations with different sizes of the prior space Γ. Both uniform and MRF priors are used. The surrogate model is constructed with ε = 10−2 and 5% noise in the data. The true source location is chosen as (0.2, 0.8). The results are summarized in table 6. As long as the prior space contains the posterior space, we can always obtain an accurate posterior mean estimate. Increasing Γ does not affect the accuracy, but it does affect the computational cost.


Table 6. Numerical results obtained with different prior spaces Γ.

          Uniform prior       MRF prior
Γ         m0       m1         m0       m1
[−1, 1]2  0.2076   0.8032     0.2072   0.8028
[−2, 2]2  0.2078   0.8029     0.2074   0.8025
[−3, 3]2  0.2077   0.8033     0.2075   0.8033

The number of collocation points increases with larger Γ, and the burn-in length of the MCMC chain also grows. Therefore, it is important to choose an appropriate space that balances accuracy and computational cost.

In [26], the SGM was used to solve the same problem. As in this work, the authors defined an initial distribution of uncertainty that is propagated through the forward model. For a uniform initial distribution, the errors in the source inversion problem showed little dependence on the source location. When the initial distribution was Gaussian, source locations near the boundary resulted in larger errors. Through this example, however, it is shown that the accuracy of the ASGC method is affected by neither the source location nor the choice of the prior distribution, as long as the prior space is large enough.

5.2. Example 2: permeability estimation

In this example, we illustrate the Bayesian inference approach on the nonlinear inverse problem of estimating a permeability field. To estimate the permeability from flow data, a deterministic forward model G(m) that relates permeability to pressure and velocity is indispensable. The pressure and velocity are characterized by the following set of dimensionless equations:

∇ · u(x) = f (x), (44)

u(x) = −k(x)∇p(x), (45)

where f (x) denotes the source/sink term. The domain of interest is a quarter-five-spot problem in the unit square D = [0, 1]2. Flow is driven by an injection well at the bottom-left corner of the domain and a production well at the top-right corner. To impose the non-negativity of the permeability, from now on we treat the logarithm of the permeability as the main unknown of our inverse problem.

To generate the simulation data, we consider a smooth permeability field of the following form:

log k(x, y) = 2(x − 0.5) + 2(y − 0.5). (46)

The pressure is measured on a 5 × 5 sensor network, and 5% noise is added. A similar problem has been studied in [13] using tracer breakthrough times as the given data. The inverse problem thus consists of inferring the true permeability from these pressure measurements. A mixed finite element method is utilized to solve the forward problem [37]. The data are generated on a finer 80 × 80 grid, whereas the inverse solution is obtained on a 40 × 40 grid.
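Combining (44) and (45) gives −∇ · (k∇p) = f, which is what the forward solver discretizes. The following sketch shows how such synthetic data can be generated; the paper uses a mixed FEM, so the cell-centred finite-difference scheme, the sensor layout and the noise scaling below are stand-in assumptions to keep the example self-contained:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def solve_darcy(logk, q=1.0):
    """Finite-difference sketch of -div(k grad p) = f on the unit square with
    no-flow boundaries: injection at the bottom-left cell, production at the
    top-right cell (quarter-five spot). Stand-in for the paper's mixed FEM."""
    n = logk.shape[0]
    h = 1.0 / n
    k = np.exp(logk)
    f = np.zeros((n, n))
    f[0, 0] = q / h ** 2            # injection well (source)
    f[-1, -1] = -q / h ** 2         # production well (sink)

    rows, cols, vals = [], [], []
    for i in range(n):
        for j in range(n):
            p0, diag = i * n + j, 0.0
            for ii, jj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                if 0 <= ii < n and 0 <= jj < n:
                    # harmonic average of k across the cell face
                    t = 2.0 * k[i, j] * k[ii, jj] / (k[i, j] + k[ii, jj]) / h ** 2
                    rows.append(p0); cols.append(ii * n + jj); vals.append(-t)
                    diag += t
            rows.append(p0); cols.append(p0); vals.append(diag)
    A = sp.csr_matrix((vals, (rows, cols)), shape=(n * n, n * n)).tolil()
    mid = (n // 2) * n + n // 2     # pin one cell: pure-Neumann problem
    A[mid, :] = 0.0; A[mid, mid] = 1.0
    b = f.ravel().copy(); b[mid] = 0.0
    return spla.spsolve(A.tocsr(), b).reshape(n, n)

# Synthetic data on the fine 80 x 80 grid with the true field of equation (46).
n = 80
xc = (np.arange(n) + 0.5) / n
X, Y = np.meshgrid(xc, xc, indexing="ij")
logk_true = 2.0 * (X - 0.5) + 2.0 * (Y - 0.5)
p = solve_darcy(logk_true)

# 5 x 5 sensor network (uniform layout assumed; the paper does not specify it),
# with 5% relative Gaussian noise added to the readings.
cells = (np.linspace(0.1, 0.9, 5) * n).astype(int)
d = p[np.ix_(cells, cells)].ravel()
d_noisy = d + 0.05 * np.abs(d).mean() * np.random.randn(d.size)
```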

5.2.1. Locations of the kernels are fixed. As discussed in section 4.1, in order to eliminate edge effects, we choose a 5 × 5 lattice in [−0.5, 1.5]2 as kernel locations to construct the discrete process convolution model, which corresponds to a total of 25 kernels, where the Gaussian kernel N(0, 0.52) is used. Therefore, the background process in equation (30) at each lattice site can be modeled by the MRF. The dimension of the stochastic prior space is 25, and the surrogate model is constructed in the parameter space Γ = [−1, 1]25, which is shown to be large enough to contain the posterior state space. A 25-dimensional ASGC method with threshold ε = 0.1 is used, which is accurate enough to represent the solution. The refinement of the adaptive sparse grid automatically stops at level 7, which has 150 322 collocation points. In contrast, the corresponding number of collocation points for a conventional sparse grid is 199 876 961, which is computationally prohibitive. Thus, the advantage of using ASGC is obvious.

Figure 10. True (left) and posterior mean (right) of the log-permeability field with 5% noise in the data.
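A sketch of evaluating this process convolution parameterization of the log permeability, log k(x) = Σi ψi K(x − si); absorbing the kernel normalization constant into the weights ψ is an assumption of this sketch:

```python
import numpy as np

# 5 x 5 lattice of kernel centres over [-0.5, 1.5]^2, i.e. 25 kernels, with
# Gaussian kernels of width 0.5, as described in the text above.
centers = np.array([(sx, sy)
                    for sx in np.linspace(-0.5, 1.5, 5)
                    for sy in np.linspace(-0.5, 1.5, 5)])

def log_k(x, psi, width=0.5):
    """Evaluate log k(x) = sum_i psi_i K(x - s_i) at points x of shape (N, 2).
    The kernel is unnormalized; the constant is absorbed into psi."""
    r2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)   # (N, 25)
    K = np.exp(-0.5 * r2 / width ** 2)
    return K @ psi

# psi is the 25-dimensional unknown; the surrogate is built over [-1, 1]^25.
psi = np.random.uniform(-1.0, 1.0, 25)
pts = np.random.rand(10, 2)             # query points in the unit square
print(log_k(pts, psi))
```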

The pair parameters (α1, β1) and (α2, β2) for the gamma and inverse gamma distributions are (0.1, 10) and (1 × 10−3, 1 × 10−3), respectively. The initial values for λ and σ are 10 and 1, respectively. A random walk sampler (equation (43)) is used as the proposal distribution, with σm = 0.01. In order to increase the acceptance ratio, instead of updating all of the unknowns at once, we update five of them at a time, so there are five updates within each MCMC iteration. The initial value for each component is 0.5. By monitoring the value of the likelihood in several MCMC chains, it was found that a Markov chain of length 50 000 is long enough for the chain to converge. Thus, the last half of the samples from the chain is used for computing statistics.
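The block-update scheme can be sketched as follows; `log_post` is a stand-in for the surrogate-based log-posterior, and the toy target in the usage line is a placeholder:

```python
import numpy as np

def blockwise_rw_metropolis(log_post, m0, n_iter, sigma_m=0.01, block=5, rng=None):
    """Random-walk Metropolis that perturbs `block` components at a time,
    cycling through all unknowns within each iteration (five block updates
    per iteration for 25 unknowns, as described above)."""
    rng = rng if rng is not None else np.random.default_rng()
    m = m0.copy()
    lp = log_post(m)
    chain = np.empty((n_iter, m.size))
    for it in range(n_iter):
        for start in range(0, m.size, block):
            prop = m.copy()
            sl = slice(start, start + block)
            prop[sl] += sigma_m * rng.standard_normal(prop[sl].size)
            lp_prop = log_post(prop)
            if np.log(rng.random()) < lp_prop - lp:   # Metropolis accept/reject
                m, lp = prop, lp_prop
        chain[it] = m
    return chain

# Usage with a placeholder log-posterior; the paper runs a chain of length
# 50 000 and discards the first half as burn-in.
chain = blockwise_rw_metropolis(lambda m: -0.5 * (m @ m), np.full(25, 0.5), 1000)
stats = chain[len(chain) // 2:].mean(axis=0)
```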

The posterior mean of the log permeability is shown in figure 10. It is seen that the posterior mean captures all the typical features of the true log-permeability field and matches the true profile very well. This is because the MRF prior provides sufficient regularization of the solution, and the automatic selection of the regularization parameter through the hierarchical Bayesian formulation is nearly optimal, as shown in figure 11. λ has a wide range of variability, which indicates its adjustment to the proposed samples. The posterior mean of the standard deviation σ is 5.90 × 10−2, which is quite close to the exact value 5 × 10−2. Thus, the process convolution model based on the MRF indeed provides sufficient regularity to the solution and fits the data very well.

The result with 1% noise in the data is given in the left plot of figure 12. Compared with the result for 5% noise, the shape of the true profile is inferred much better. The posterior mean of the noise level is 1.54 × 10−2, which is also close to the exact value. The posterior mean and quantiles for each ψi are given in figure 13 for the two noise levels. The quantiles indicate the range of the highest-probability region of the posterior state space.


Figure 11. Posterior density of σ (left) and λ (right) with 5% noise in the data.


Figure 12. Posterior mean of the log permeability with 1% noise in the data. Left: 5 × 5 sensor grid. Right: 9 × 9 sensor grid.

When the noise level in the data is large, the 0.05/0.95 quantiles are far from the mean, which reflects the wide variability in the unknowns. On the other hand, a small noise level results in rather tight probability bounds, which means that, based on the data, we have much more confidence in the true values of the unknowns. We also increase the number of sensors, as shown in the right plot of figure 12. It is noticed that some details are added, especially at the top-right corner, and the result is closer to the true profile. It is also noticed that if the locations of the kernels are too far apart, the convolution model may not smooth the process effectively. However, once reasonable saturation has been reached, there is no need to add more kernels to the model, since there is almost no difference in the results [35].
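The means and 0.05/0.95 quantiles shown in figure 13 are simple sample statistics of the chain; a sketch with placeholder samples standing in for the post-burn-in draws:

```python
import numpy as np

# `chain` stands for the post-burn-in MCMC samples of psi, shape (N, 25);
# the synthetic values here are placeholders for illustration only.
chain = np.random.normal(0.0, 0.1, size=(25000, 25))
mean = chain.mean(axis=0)                              # posterior mean of each psi_i
q05, q95 = np.quantile(chain, [0.05, 0.95], axis=0)    # 0.05/0.95 quantiles
```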

Figure 13. Plot of the posterior marginal of ψi for each kernel with 5% (left) and 1% (right) noise in the data.

The computational time for constructing the stochastic surrogate model using ASGC was about 1.6 h using 80 processors of our in-house Linux cluster. It takes about 8.9 h to run a single MCMC chain on one processor. On the other hand, using the direct FEM method to compute the likelihood would require 5 × 50 000 = 250 000 solutions of the forward problem, which is expected to take much more time, since each problem runs one after another on one processor due to the sequential nature of the MCMC algorithm. Although in such a high-dimensional problem ASGC still requires 150 322 evaluations of the forward problem, this process takes place in parallel.

5.2.2. Locations of the kernels are random. We next investigate the effect of random kernel locations. It is still assumed that the shape of the kernels is the same, i.e. N(0, 0.52), but the locations of the kernels are now assumed random. Since the locations of the background processes are no longer on a regular lattice, we cannot use the MRF as the prior distribution for the background processes. Instead, we only assume that they are independent Gaussian random variables in equation (31), as in the original definition of the process convolution model. Each kernel has three degrees of freedom, i.e. ψi and the x, y locations. First, we consider four kernels in the domain. The parameter space for the background processes is chosen as [−3, 3]4 and that for the kernel locations as [0, 1]8. Thus, the stochastic prior space is Γ = [−3, 3]4 × [0, 1]8, which gives a 12-dimensional problem. A level-6 ASGC method with threshold ε = 0.1 is used, which is accurate enough to represent the solution of the stochastic forward problem. The pair parameters (α1, β1) and (α2, β2) for the gamma and inverse gamma distributions are (1, 1) and (1 × 10−3, 1 × 10−3), respectively. The initial values for λ and σ are 10 and 1, respectively. The random walk sampler is used as the proposal distribution, with σm = 0.005. The posterior mean of the log permeability is shown in the left plot of figure 14; it is obtained from the last 1 × 105 realizations of a Markov chain of length 2 × 105. The posterior means of the locations of the four kernels are (0.51, 0.21), (0.34, 0.34), (0.95, 0.94) and (0.022, 0.025), which tend to lie along the diagonal of the domain. It is seen that the posterior mean captures most of the characteristics of the true permeability. Although the result is not as good as in the case where the locations are fixed a priori, this setting is more realistic, since in engineering applications we may have little information about the true permeability profile and cannot determine the number and locations of the kernels in advance. The posterior mean of σ is 2.65 × 10−2. Due to the complexity of the problem, it took about 24 h to complete this simulation. A direct FEM simulation was not conducted in this case, since it is expected to be much more expensive.
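With random locations, the parameter vector concatenates the kernel weights and coordinates; the packing order in the sketch below is an assumption made for illustration:

```python
import numpy as np

# Each of the four kernels carries three unknowns (psi_i, x_i, y_i); the
# 12-dimensional parameter vector is assumed packed as
# [psi_1..psi_4, x_1, y_1, ..., x_4, y_4].
def log_k_random(x, theta, n_ker=4, width=0.5):
    psi = theta[:n_ker]                          # weights, drawn from [-3, 3]
    centers = theta[n_ker:].reshape(n_ker, 2)    # locations in [0, 1]^2
    r2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * r2 / width ** 2) @ psi

theta = np.concatenate([np.random.uniform(-3, 3, 4), np.random.rand(8)])
print(log_k_random(np.random.rand(5, 2), theta))
```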

Figure 14. Posterior mean of the log-permeability field with 1% noise in the data. Left: four kernels. Right: five kernels.

In order to study the convergence of the results, we also consider five kernels. The stochastic prior space is then Γ = [−3, 3]5 × [0, 1]10, which gives a 15-dimensional problem. A level-6 ASGC method with threshold ε = 0.1 is again used. The results are shown in the right plot of figure 14. The posterior means of the locations of the five kernels are (0.38, 0.49), (0.86, 0.86), (0.18, 0.18), (0.037, 0.033) and (0.4, 0.9). It is seen that more details of the permeability are added in this case and that the computed field matches the true profile very well, except at the two corners. It is clear that having some kernels outside the domain can eliminate such edge effects and that using the MRF provides additional regularity to the solution.

6. Conclusions

A method for accelerating Bayesian inference is introduced, based on the use of an adaptive sparse grid collocation technique to construct a stochastic surrogate model. The method reformulates the deterministic forward problem as a stochastic one and constructs an interpolant of the solution of this stochastic problem over the prior support space, which is assumed large enough to capture the posterior uncertainty. Thus, instead of solving the deterministic forward problem within each MCMC iteration, we only need to evaluate a much less expensive approximation for each proposed move. Two numerical examples were conducted to show the accuracy of this method for the solution of Bayesian nonlinear inverse problems. The examples indicate that the ASGC method is much faster than direct FEM computation. Furthermore, once the surrogate model for the likelihood is constructed, it can be reused without modification whenever new data arrive.

The MRF prior was used to impose regularization on the solution. As long as the prior space includes the support of the posterior, the method provides very good results, with the computational cost increasing with the size of the prior space. Through numerical examples, it was shown that the accuracy of the results depends more on the data noise than on the threshold ε. A small ε results in an accurate surrogate at the expense of more collocation points. Therefore, it is crucial to choose an appropriate Γ and ε. However, this method also has some limitations. ASGC relies exclusively on a bounded input support; if the posterior distribution has infinite support, the present method is not directly applicable. In addition, if the dimension of the stochastic space is very high, the efficiency of the method depends on the regularity of the solution in the stochastic prior space [31]. However, when the ASGC method is applicable, it is much faster than the classical MCMC method.


Acknowledgments

This research was supported by the Computational Mathematics program of AFOSR (grant F49620-00-1-0373) and by the Computational Mathematics program of the NSF (award DMS-0809062). The computational work was supported by an allocation through the TeraGrid Advanced Support Program.

References

[1] Tikhonov A N 1985 Solution of Ill-Posed Problems (Washington: Halsted Press)
[2] Sampath R and Zabaras N 1988 A functional optimization approach to an inverse magneto-convection problem Num. Heat Transfer 13 527–33
[3] Emery A F, Nenarokomov V A and Fadale T D 2000 Uncertainties in parameter estimation: the optimal experimental design Int. J. Heat Mass Transfer 43 3331–9
[4] Velamur Asokan B and Zabaras N 2004 Stochastic inverse heat conduction using a spectral approach Int. J. Num. Methods Eng. 60 1569–93
[5] Jin B and Zou J 2008 Inversion of Robin coefficient by a spectral stochastic finite element approach J. Comput. Phys. 227 3282–306
[6] Zabaras N and Ganapathysubramanian B 2008 A scalable framework for the solution of stochastic inverse problems using a sparse grid collocation approach J. Comput. Phys. 227 4697–735
[7] Kaipio J and Somersalo E 2005 Statistical and Computational Inverse Problems (New York: Springer)
[8] Calvetti D and Somersalo E 2007 Introduction to Bayesian Scientific Computing: Ten Lectures on Subjective Computing (New York: Springer)
[9] Wang J and Zabaras N 2004 A Bayesian inference approach to the stochastic inverse heat conduction problem Int. J. Heat Mass Transfer 47 3927–41
[10] Wang J and Zabaras N 2005 Using Bayesian statistics in the estimation of heat source in radiation Int. J. Heat Mass Transfer 48 15–29
[11] Wang J and Zabaras N 2005 Hierarchical Bayesian models for inverse problems in heat conduction Inverse Problems 21 183–206
[12] Wang J and Zabaras N 2006 A Markov random field model to contamination source identification in porous media flow Int. J. Heat Mass Transfer 49 939–50
[13] Lee H K, Higdon D M, Bi Z, Ferreira M A R and West M 2002 Markov random field models for high-dimensional parameters in simulations of fluid flow in porous media Technometrics 44 230–41
[14] Kaipio J, Kolehmainen V, Somersalo E and Vauhkonen M 2000 Statistical inversion and Monte Carlo sampling methods in electrical impedance tomography Inverse Problems 16 1487–522
[15] Besag J, Green P, Higdon D and Mengersen K 1995 Bayesian computation and stochastic systems Stat. Sci. 10 3–41
[16] Efendiev Y, Hou T and Luo W 2006 Preconditioning Markov chain Monte Carlo simulations using coarse-scale models SIAM J. Sci. Comput. 28 776–803
[17] Marzouk Y M and Najm H N 2009 Dimensionality reduction and polynomial chaos acceleration of Bayesian inference in inverse problems J. Comput. Phys. at press doi:10.1016/j.jcp.2008.11.024
[18] Higdon D 2002 Space and space-time modeling using process convolutions Quantitative Methods for Current Environmental Issues ed C Anderson, V Barnett, P C Chatwin and A H El-Shaarawi (London: Springer-Verlag) pp 37–56
[19] Lee H, Holloman C H, Calder C A and Higdon D 2002 Flexible Gaussian processes via convolution Institute of Statistics and Decision Sciences Technical Report 02-09, Duke University
[20] Gilks W R, Richardson S and Spiegelhalter D J 1996 Markov Chain Monte Carlo in Practice (London: Chapman and Hall)
[21] Jin B 2008 Fast Bayesian approach for parameter estimation Int. J. Numer. Methods Eng. 76 230–52
[22] Balakrishnan S, Roy A, Ierapetritou M G, Flach G P and Georgopoulos P G 2003 Uncertainty reduction and characterization for complex environmental fate and transport models: an empirical Bayesian framework incorporating the stochastic response surface method Water Resour. Res. 39 1350–62
[23] Isukapalli S S, Roy A and Georgopoulos P 1998 Stochastic response surface methods for uncertainty propagation: application to environmental and biological systems Risk Anal. 18 351–63
[24] Ghanem R and Spanos P 1991 Stochastic Finite Elements: A Spectral Approach (New York: Springer-Verlag)
[25] Xiu D and Karniadakis G E 2002 The Wiener–Askey polynomial chaos for stochastic differential equations SIAM J. Sci. Comput. 24 619–44
[26] Marzouk Y M, Najm H N and Rahn L A 2007 Stochastic spectral methods for efficient Bayesian solution of inverse problems J. Comput. Phys. 224 560–86
[27] Ganapathysubramanian B and Zabaras N 2007 Sparse grid collocation schemes for stochastic natural convection problems J. Comput. Phys. 225 652–85
[28] Xiu D and Hesthaven J S 2005 High-order collocation methods for differential equations with random inputs SIAM J. Sci. Comput. 27 1118–39
[29] Nobile F, Tempone R and Webster C 2008 A sparse grid collocation method for elliptic partial differential equations with random input data SIAM J. Numer. Anal. 45 2309–45
[30] Smolyak S 1963 Quadrature and interpolation formulas for tensor products of certain classes of functions Sov. Math. Dokl. 4 240–3
[31] Ma X and Zabaras N 2008 An adaptive hierarchical sparse grid collocation method for the solution of stochastic differential equations J. Comput. Phys. at press
[32] Loève M 1977 Probability Theory 4th edn (Berlin: Springer-Verlag)
[33] Øksendal B 1998 Stochastic Differential Equations: An Introduction with Applications (New York: Springer-Verlag)
[34] Klimke A and Wohlmuth B 2005 Algorithm 847: spinterp: piecewise multilinear hierarchical sparse grid interpolation in MATLAB ACM Trans. Math. Softw. 31 561–79
[35] Ferreira M A R and Lee H 2007 Multiscale Modeling: A Bayesian Perspective (New York: Springer)
[36] Andrieu C, de Freitas N, Doucet A and Jordan M I 2003 An introduction to MCMC for machine learning Mach. Learn. 50 5–43
[37] Ganapathysubramanian B and Zabaras N 2009 A stochastic multiscale framework for modeling flow through heterogeneous porous media J. Comput. Phys. 228 591–618
