
SIAM/ASA J. UNCERTAINTY QUANTIFICATION © 2018 Society for Industrial and Applied Mathematics and American Statistical Association
Vol. 6, No. 1, pp. 87–117

Ensemble Grouping Strategies for Embedded Stochastic Collocation Methods Applied to Anisotropic Diffusion Problems∗

M. D’Elia†, H. C. Edwards†, J. Hu†, E. Phipps†, and S. Rajamanickam†

Abstract. Previous work has demonstrated that propagating groups of samples, called ensembles, together through forward simulations can dramatically reduce the aggregate cost of sampling-based uncertainty propagation methods [E. Phipps, M. D'Elia, H. C. Edwards, M. Hoemmen, J. Hu, and S. Rajamanickam, SIAM J. Sci. Comput., 39 (2017), pp. C162–C193]. However, critical to the success of this approach when applied to challenging problems of scientific interest is the grouping of samples into ensembles to minimize the total computational work. For example, the total number of linear solver iterations for ensemble systems may be strongly influenced by which samples form the ensemble when applying iterative linear solvers to parameterized and stochastic linear systems. In this work we explore sample grouping strategies for local adaptive stochastic collocation methods applied to PDEs with uncertain input data, in particular canonical anisotropic diffusion problems where the diffusion coefficient is modeled by truncated Karhunen–Loève expansions. We demonstrate that a measure of the total anisotropy of the diffusion coefficient is a good surrogate for the number of linear solver iterations for each sample and therefore provides a simple and effective metric for grouping samples.

Key words. sampling methods, stochastic collocation methods, stochastic partial differential equations, anisotropic diffusion models, forward uncertainty propagation, embedded ensemble propagation

AMS subject classifications. 60H15, 60H35, 35R60, 65L60

DOI. 10.1137/16M1066324

1. Introduction. It is well known that quantifying uncertainties in computational simulations has become a foundational component of modern, predictive simulation. Accordingly, numerous uncertainty quantification (UQ) methods have been developed and studied in the literature, including random sampling [19, 29, 37, 38, 39], stochastic collocation [1, 41, 40, 55], and stochastic Galerkin [23, 24, 56], with an emphasis on applying UQ methods to problems relevant to large-scale scientific computing. Frequently these problems exhibit high-dimensional uncertain input spaces and localized or nonsmooth behavior, which has motivated research on reducing the number of samples needed, e.g., locally adaptive sampling methods [21, 27, 52], multilevel methods that exploit a hierarchy of physical and temporal discretizations [5, 6, 7, 13, 25], and methods that attempt to construct minimal or optimal uncertainty

∗Received by the editors March 17, 2016; accepted for publication (in revised form) July 17, 2017; published electronically January 18, 2018. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-NA-0003525. This material is based upon work supported by the U.S. Department of Energy, Office of Science, and Office of Advanced Scientific Computing Research (ASCR). This research used resources of the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility.

http://www.siam.org/journals/juq/6-1/M106632.html
†Center for Computing Research, Sandia National Laboratories, Albuquerque, NM 87185, and Livermore, CA 94551-0969 ([email protected], [email protected], [email protected], [email protected], [email protected]).


representations such as compressed sensing [16, 36] and tensor methods [1, 2, 3, 14, 20, 22, 41, 40, 48, 55].

However, even with continued progress in this area, the fact remains that applying these methodologies to large-scale scientific computing problems is often prohibitively expensive due to the very large computational cost associated with each sample evaluation.

Therefore we have recently undertaken work that further attempts to reduce computational cost for sampling-based UQ methods by reducing the cost of the evaluation of each sample. In particular we have shown [45] that performance can be substantially improved when multiple samples are propagated through a computational simulation together, a technique we call embedded ensemble propagation. In [45] ensembles of samples were propagated through a canonical model of a stochastic isotropic diffusion equation with uncertain diffusion coefficient by replacing all sample-dependent scalars within the simulation code¹ with small arrays. It was found that the cost of assembling and solving the resulting linear equations of the ensemble system was substantially smaller compared to assembling and solving each system sequentially when implemented on a variety of contemporary and emerging computational architectures, for several reasons:

• Sample independent data and calculations are reused across the ensemble, reducing the aggregate computation and memory bandwidth usage.
• Random memory accesses of sample-dependent quantities are replaced by contiguous accesses of ensemble arrays, substantially reducing the aggregate cost of these accesses.
• Arithmetic on ensemble arrays is mapped to fine-grained parallelism such as vector instructions and fine-grained threads, providing better utilization of these computing resources.
• The number of interprocessor communication steps in aggregate is reduced, reducing the overall communication cost.
Furthermore it was shown that an approach based on C++ templates and operator overloading made it possible to incorporate the technique in large, complex science simulation codes.
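To make the embedded-ensemble idea concrete, the following minimal Python sketch mimics the effect of that scalar-type replacement (the actual implementation in [45] uses C++ templates and operator overloading; the `Ensemble` class and `stiffness_entry` function here are illustrative stand-ins only): a value holds a length-S array of sample values and overloads arithmetic so that generic code written for scalars propagates all S samples in one pass.

```python
import numpy as np

class Ensemble:
    """Illustrative stand-in for an ensemble scalar type: a length-S
    array of sample values that behaves like a single scalar."""
    def __init__(self, values):
        self.v = np.asarray(values, dtype=float)

    # Elementwise arithmetic: one operation updates all S samples at once.
    def __add__(self, other):
        return Ensemble(self.v + self._val(other))
    def __mul__(self, other):
        return Ensemble(self.v * self._val(other))
    __radd__ = __add__
    __rmul__ = __mul__

    @staticmethod
    def _val(x):
        return x.v if isinstance(x, Ensemble) else x

    def __repr__(self):
        return f"Ensemble({self.v})"

# Generic code written for a "scalar" diffusion coefficient...
def stiffness_entry(a, grad_i, grad_j, weight):
    return a * grad_i * grad_j * weight

# ...evaluated once per ensemble instead of once per sample:
a = Ensemble([1.0, 2.5, 0.7, 4.2])         # S = 4 samples of a(x, y)
print(stiffness_entry(a, 0.5, -0.5, 2.0))  # four entries in one call
```

All sample-independent arguments (here the gradients and quadrature weight) are loaded and used once per ensemble, which is the source of the data reuse described in the list above.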

However, for such an approach to be effective in this problem space, a critical algorithmic question is determining how samples generated from the sampling-based UQ method should be grouped into ensembles to minimize the total computational cost, which is the subject of this contribution. Any such approach is likely to be highly dependent on both the UQ method as well as the computational problem it is being applied to. Accordingly we investigate state-of-the-art local hierarchical stochastic collocation methods [26, 35] applied to representative stochastic anisotropic diffusion problems, where the diffusion coefficient is modeled by several truncated Karhunen–Loève (KL) expansions. These problems have been chosen to induce large variation of the number of preconditioned linear solver iterations from sample to sample, making the decision on how to group samples into ensembles critically important. To this end, we investigate several approaches for grouping samples based on (1) the level of anisotropy induced by the diffusion coefficient, (2) the effect of each random variable on an uncertain diffusion coefficient, and (3) the geometric location of the samples in the uncertain sample space. We find the first approach to be the most effective in minimizing the number of solver

¹In particular, all scalars involved with evaluation of the diffusion equation matrix, right-hand side, linear solvers, and preconditioners.


iterations of the ensemble system, providing a simple means of grouping samples provided the sampling algorithm has access to the evaluation of the diffusion coefficient.

This paper is organized as follows. We review local hierarchical stochastic collocation methods applied to stochastic PDEs (SPDEs) in section 2 and the ensemble propagation technique from [45] in section 3. Then in section 4 several grouping strategies are discussed, and results of applying these strategies to anisotropic diffusion problems are presented in section 5. Finally, in section 6 we summarize our results and discuss further grouping strategies we will investigate in the future.

2. Notation and preliminaries. We follow [28] to introduce the basic concepts of SPDEs and stochastic collocation methods. In particular, we focus on anisotropic diffusion equations and on local hierarchical stochastic collocation methods.

2.1. PDEs with random input parameters. Let D ⊂ R^d (d = 1, 2, 3) be a bounded domain with boundary ∂D and let (Ω, F, P) be a complete probability space. Here, Ω is a set of realizations, F is a σ-algebra of events, and P : F → [0, 1] is a probability measure. We consider the following stochastic elliptic boundary value problem: find u : D × Ω → R such that almost surely we have²

(2.1)    −∇ · (A(a(x, ω))∇u) = f,   x ∈ D, ω ∈ Ω,
                         u = 0,   x ∈ ∂D,

where f ∈ L²(D) is a forcing term and A(a(·, ω)) : Ω → R^{d×d} is a diffusion tensor parameterized by a(·, ω) : Ω → R. For details regarding the functional spaces and the well-posedness of problem (2.1) we refer to [28].

We make the following assumptions on the parameters:
1. a(x, ω) is bounded from above and below with probability 1.
2. a(x, ω) can be written as a(x, y(ω)) in D × Ω, where y(ω) = (y_1(ω), . . . , y_N(ω)) ∈ R^N is a random vector with uncorrelated components.
3. a(x, y(ω)) is σ-measurable with respect to y.

A classical example of a random parameter that satisfies 1–3 is given by a truncated KL expansion [33, 34]. Given a second-order correlated random field a(x, ω) with continuous covariance function cov(x, x′), Mercer's theorem [34] allows us to write it as

    a(x, ω) = E[a(x, ·)] + ∑_{n=1}^∞ √λ_n b_n(x) ξ_n(ω),

where λ_n are the eigenvalues, in decreasing order, of the covariance function, b_n(x) are the corresponding eigenfunctions, and ξ_n(ω) ∈ R are uncorrelated random variables. The truncated KL expansion corresponds to truncating the summation at the Nth term, so that for ξ_n = y_n, n = 1, . . . , N,

²Note that the method that we propose in this work could be applied to a more general problem with, e.g., a nonlinear elliptic operator, an uncertain forcing term, nonhomogeneous uncertain boundary data, and/or a different boundary condition on a portion of ∂D.


(2.2)    a(x, ω) = E[a(x, ·)] + ∑_{n=1}^N √λ_n b_n(x) y_n(ω)

is a truncated KL approximation of the random field a(x, ω). Note that the random variables {y_n}_{n=1}^N map the sample space Ω into R^N; for Γ_n = y_n(Ω) ⊂ R, we define the parameter space as Γ = ∏_{n=1}^N Γ_n. Also, we denote the probability density function of y by ρ(y) : Γ → R_+, with ρ ∈ L^∞(Γ). According to the assumptions above, we rewrite (2.1) as

(2.3)    −∇ · (A(x, y)∇u) = f,   x ∈ D, y ∈ Γ,
                       u = 0,   x ∈ ∂D.

Here the diffusivity tensor A(·, y) : R^N → R^{d×d} is defined as A = QΣQ^T, where Q is a rotation matrix and Σ is a diagonal matrix defined, e.g., for d = 2, as Σ(x, y) = diag(a(x, y), a), with the constant a ∈ R_+. For a rotation angle θ the rotation matrix is defined as

    Q = [ cos θ   −sin θ
          sin θ    cos θ ].

The parameter a(x, y) > 0 is a truncated KL approximation of a random field, i.e.,

(2.4)    a(x, y) = a_min + a exp( ∑_{n=1}^N √λ_n b_n(x) y_n ).

Note that instead of using the classical KL expansion, to preserve the positive-definiteness of the diffusion tensor (required for the well-posedness of problem (2.3)) we consider the expansion of the logarithm of the random field. We note that for x ∈ D and y ∈ Γ, (a(x, y), a) are the eigenvalues of A(x, y); their values are indicators of the pointwise anisotropy of the diffusion tensor for a specific y in the sample space.
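As an illustration, the following Python sketch assembles the tensor A(x, y) = QΣQ^T from the log-KL expansion (2.4) for d = 2; the eigenpairs `lam`, `b` and all constants are made-up placeholders, not values used in the paper.

```python
import numpy as np

def diffusion_tensor(x, y, lam, b, a_min, a_bar, theta):
    """Evaluate A(x, y) = Q Sigma Q^T for d = 2, following (2.4).
    lam: KL eigenvalues (length N); b: list of eigenfunctions b_n(x);
    a_min, a_bar: the constants a_min and a of (2.4); theta: rotation angle."""
    # The log-KL expansion keeps a(x, y) > 0, hence A positive definite.
    a_xy = a_min + a_bar * np.exp(sum(np.sqrt(lam[n]) * b[n](x) * y[n]
                                      for n in range(len(lam))))
    Q = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    Sigma = np.diag([a_xy, a_bar])          # eigenvalues (a(x, y), a)
    return Q @ Sigma @ Q.T

# Toy check with N = 2 made-up eigenpairs on D = [0, 1]^2:
lam = [0.5, 0.1]
b = [lambda x: np.cos(np.pi * x[0]), lambda x: np.sin(np.pi * x[1])]
A = diffusion_tensor(np.array([0.3, 0.7]), [0.2, -1.0], lam, b,
                     a_min=0.1, a_bar=1.0, theta=np.pi / 6)
print(np.linalg.eigvalsh(A))   # both eigenvalues positive
```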

2.2. Local hierarchical stochastic collocation methods. For the finite-dimensional approximation of problem (2.3) we focus on stochastic collocation methods; these are nonintrusive stochastic sampling methods based on decoupled deterministic solves. We provide a brief description of global methods and then focus on local approximations.

Given a Galerkin method for the spatial discretization of (2.3), we denote by u_h(·, y) the semidiscrete approximation of u(x, y) for all random vectors y ∈ Γ. The main idea of global stochastic collocation methods is to collocate u_h(·, y) on a suitable set of samples {y_m}_{m=1}^M ⊂ Γ to determine M semidiscrete solutions and then use the latter to construct a global polynomial to represent the fully discrete approximation u^{gSC}_{h,M}(x, y), where gSC stands for global stochastic collocation. The polynomial can be either interpolatory or based on a projection onto an orthonormal basis over Γ. As an example, in the first case, given the set of points {y_m}_{m=1}^M, we introduce a set of basis functions {ψ_m(y)}_{m=1}^M ∈ P_{J(p)}(Γ) and write the fully discrete approximation as

(2.5)    u^{gSC}_{h,M}(x, y) = ∑_{m=1}^M c_m(x) ψ_m(y).


The space P_{J(p)}(Γ) is a multivariate polynomial space over Γ corresponding to the set of indices J(p), defined as P_{J(p)}(Γ) = span{ ∏_{n=1}^N y_n^{p_n} : p ∈ J(p), y_n ∈ Γ_n }. Among others, common definitions of J(p) include {p ∈ N^N : max_n p_n ≤ p} and {p ∈ N^N : ∑_{n=1}^N p_n ≤ p}. Note that the number of basis functions is not necessarily the same as the number of points. However, for the simple case of Lagrange interpolation considered in this paper we have as many basis functions as points. The coefficient functions c_m(x) in (2.5) are determined by solving the system of interpolation conditions

(2.6)    ∑_{m=1}^M c_m(x) ψ_m(y_{m′}) = u_h(x, y_{m′})   ∀ m′ = 1, . . . , M,

so that the c_m's are linear combinations of the M Galerkin approximations. When we consider Lagrange interpolation the basis functions satisfy the delta property ψ_{m′}(y_m) = δ_{mm′}; this implies that the coefficients correspond to the finite element solutions, i.e., c_m(x) = u_h(x, y_m). Then, (2.5) can be rewritten as

(2.7)    u^{gSC}_{h,M}(x, y) = ∑_{m=1}^M u_h(x, y_m) ψ_m(y).
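A minimal one-dimensional sketch of the interpolatory construction (2.7), with a closed-form `solver` standing in for the semidiscrete PDE solve at each collocation point (all names and values here are illustrative only):

```python
import numpy as np

def lagrange_basis(pts, m, y):
    """psi_m(y): Lagrange fundamental polynomial on the points pts,
    satisfying the delta property psi_m(pts[m']) = delta_{mm'}."""
    out = 1.0
    for k, yk in enumerate(pts):
        if k != m:
            out *= (y - yk) / (pts[m] - yk)
    return out

# Stand-in for the semidiscrete solve y -> u_h(x, y) at a fixed x:
solver = lambda y: np.exp(-y) / (2.0 + y)

pts = np.array([-1.0, 0.0, 1.0])           # collocation points y_m
coeffs = [solver(ym) for ym in pts]        # c_m = u_h(x, y_m), per (2.7)

u_gsc = lambda y: sum(c * lagrange_basis(pts, m, y)
                      for m, c in enumerate(coeffs))
print(u_gsc(0.5), solver(0.5))             # interpolant vs. truth
```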

Common approaches for the construction of the approximate solution with respect to y are based on the Smolyak sparse-grid algorithm introduced in [50] for quadrature and interpolation. We briefly describe a generalized version used in [3, 41, 40] that relies on tensor products of one-dimensional approximations.

Generalized sparse grids. Let l ∈ N_+ be a one-dimensional level of approximation and {y_k^l}_{k=1}^{m(l)} ⊂ γ ⊂ R be a sequence of one-dimensional interpolation points; m(l) defines the number of collocation points at level l, with m(0) = 0 and m(1) = 1. Given a function v ∈ C^0(γ) we define the sequence of one-dimensional interpolation operators U^{m(l)} : C^0(γ) → P_{m(l)−1}(γ) as

    U^{m(l)}[v](y) = ∑_{k=1}^{m(l)} v(y_k^l) ψ_k^l(y),

where ψ_k^l ∈ P_{m(l)−1}(γ), k = 1, . . . , m(l), are the Lagrange fundamental polynomials of degree m(l) − 1. We also define the difference operator as

(2.8)    Δ^{m(l)} = U^{m(l)} − U^{m(l−1)},   with U^{m(0)} = 0.

In the multivariate case we let l_n ∈ N_+ be the one-dimensional level of approximation for the random variable y_n and {y_{n,k}} ⊂ Γ_n be a sequence of one-dimensional interpolation points. Also, we let l = (l_1, . . . , l_N) ∈ N_+^N be a multi-index, L ∈ N_+ be the total level of the sparse-grid approximation, and Δ_n be the difference operator corresponding to l_n. We define the N-dimensional hierarchical surplus operator and the Lth-level generalized sparse-grid operator as

    Δ^m = ⊗_{n=1}^N Δ_n^{m(l_n)}   and   J_L^{m,g} = ∑_{g(l)≤L} Δ^m,


where the former is obtained by tensor products of (2.8). Here, g : N_+^N → N_+ is a mapping between the multi-index l and the level used to construct the sparse grid.³ Then, we can write the generalized sparse-grid approximation of u_h as u^{gSC}_{h,L} = J_L^{m,g}[u_h]; note that this approximation only requires the independent evaluation of u_h(x, y) at the collocation points

    H_L^{m,g} = ⋃_{g(l)≤L} ⊗_{n=1}^N {y_{n,k}^{l_n}}_{k=1}^{m(l_n)}.

Remark 2.1. There are several methods for the generation of the set of points within each level. We mention, e.g., Gaussian and Clenshaw–Curtis points, and we refer to [28] for further details.

As already mentioned, the great advantage of interpolatory approximation is that there is a complete decoupling of the spatial and probabilistic discretizations. Also, these methods are very easy to implement (requiring only codes for deterministic PDEs to be used as black boxes) and embarrassingly parallelizable. On the other hand, global stochastic collocation methods perform well only when the solution u(x, y) is smooth with respect to the random parameters {y_n}_{n=1}^N and fail to approximate solutions that have an irregular dependence. Because our ultimate goal is to study the latter scenario, we resort to local stochastic collocation methods;⁴ these approaches use locally supported piecewise polynomials to approximate the dependence of the solution on the random parameters and choose the basis {ψ_m}_{m=1}^M to be a piecewise hierarchical polynomial basis [26, 12]. As opposed to gSC methods, which achieve higher accuracy with a higher polynomial degree, local methods rely on grid refinement in the parameter space, keeping the polynomial degree fixed.

As done for the gSC methods, we introduce univariate interpolation and then extend it to the multivariate case by tensor products. For simplicity and without loss of generality we consider one-dimensional hat functions defined on [−1, 1] as

    ψ_{l,i}(y) = ψ( (y + 1 − i h(l)) / h(l) )   with ψ(y) = max{0, 1 − |y|}.

This function has local support (y_{l,i} − h(l), y_{l,i} + h(l)) and is centered at y_{l,i}; here, l = 0, 1, . . . is the resolution level, h(l) = 2^{−l+1} is the grid size of level l, and y_{l,i} = i h(l) − 1, i = 0, 1, . . . , 2^l, are the grid points. Clearly, in this case m(l) = 2^l + 1.

Given the space L²_ρ(Γ) of square integrable functions with respect to the probability density function ρ, a common choice of finite-dimensional subspace is the finite element space of continuous piecewise linear polynomials defined as Z_l = span{ψ_{l,i}(y) : i = 0, 1, . . . , 2^l} for l = 0, 1, . . . . For each level l, the set of nodal basis functions is defined as {ψ_{l,i}(y)}_{i=0}^{2^l}. As an alternative, we consider a hierarchical basis. Let B_l, for l = 1, 2, . . . , be hierarchical index sets defined as B_l = {i ∈ N : i = 1, 3, 5, . . . , 2^l − 1} and let W_l be the sequence of incremental hierarchical subspaces of L²_ρ(Γ) defined as W_l = span{ψ_{l,i} : i ∈ B_l}. The hierarchical basis for Z_l is then given by

³We assume that the mapping g satisfies the admissibility criterion that ensures the validity of the telescopic sum expansion; see, e.g., [35].

⁴Note that the random field used in this work is smooth with respect to the random vector, so that a global approach is a viable option as well.


    {ψ_{0,0}, ψ_{0,1}} ∪ ⋃_{l′=1}^{l} {ψ_{l′,i}(y)}_{i∈B_{l′}}.

For each grid level l the interpolant of a function v ∈ L²_ρ(Γ) in terms of the nodal basis is given by

    I_l(v(y)) = ∑_{i=0}^{2^l} v(y_i) ψ_{l,i}(y).

We define the incremental interpolation operator as

    Δ_l(v) = I_l(v) − I_{l−1}(v);

the paper [28] shows that Δ_l can be written in terms of the hierarchical basis functions at level l, i.e.,

    Δ_l(v) = ∑_{i∈B_l} c_{l,i} ψ_{l,i}(y)   with c_{l,i} = v(y_{l,i}) − I_{l−1}(v)(y_{l,i}).

We refer to the c_{l,i} as surpluses on level l; these quantities play a crucial role in the adaptive generation of the sparse-grid approximation.
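The following toy Python sketch computes the one-dimensional hierarchical surpluses c_{l,i} = v(y_{l,i}) − I_{l−1}(v)(y_{l,i}) for the hat-function basis just defined; it is an illustration under the conventions above, not the paper's implementation.

```python
import numpy as np

def hat(l, i, y):
    """One-dimensional hat function psi_{l,i} on [-1, 1]."""
    h = 2.0 ** (-l + 1)                    # grid size h(l) of level l
    return np.maximum(0.0, 1.0 - np.abs((y - (i * h - 1.0)) / h))

def hierarchical_surpluses(v, L):
    """Surpluses c_{l,i} = v(y_{l,i}) - I_{l-1}(v)(y_{l,i}) up to level L."""
    surpluses = {}                         # keyed by (l, i)
    def interp(y, up_to):                  # I_{up_to}(v)(y), hierarchical form
        return sum(c * hat(l, i, y) for (l, i), c in surpluses.items()
                   if l <= up_to)
    for l in range(L + 1):
        idx = [0, 1] if l == 0 else range(1, 2 ** l, 2)   # index set B_l
        for i in idx:
            y_li = i * 2.0 ** (-l + 1) - 1.0
            surpluses[(l, i)] = v(y_li) - interp(y_li, l - 1)
    return surpluses

c = hierarchical_surpluses(lambda y: np.exp(y), L=4)
print(max(abs(s) for (l, i), s in c.items() if l == 4))   # decays with level
```

For the smooth function used here, the printed finest-level surplus is already small, illustrating the decay c_{l,i} → 0 exploited by the adaptivity discussed below.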

For the interpolation of a multivariate function v(y) defined on [−1, 1]^N we extend the one-dimensional hierarchical basis to N dimensions by tensorization. Specifically, we use tensor products to define the basis function associated with the point y_{l,i} = (y_{l_1,i_1}, . . . , y_{l_N,i_N}):

    ψ_{l,i}(y) = ∏_{n=1}^N ψ_{l_n,i_n}(y_n),

where ψ_{l_n,i_n} is the one-dimensional hierarchical basis function associated with y_{l_n,i_n} = i_n h_{l_n} − 1 for h_{l_n} = 2^{−l_n+1}; l is a multi-index indicating the resolution level along each dimension. Accordingly, we define the N-dimensional incremental subspace W_l as

    W_l = ⊗_{n=1}^N W_{l_n} = span{ψ_{l,i} : i ∈ B_l},

where B_l is defined as

    B_l = { i ∈ N^N : i_n ∈ {1, 3, 5, . . . , 2^{l_n} − 1} for l_n > 0,
                      i_n ∈ {0, 1} for l_n = 0,   n = 1, . . . , N }.

Then, we define the sequence of subspaces Z_l as

    Z_l = ⊕_{l′=0}^{l} ⊕_{α(l′)=l′} W_{l′},

where W_l is an incremental subspace and α is a mapping between the multi-index l and the level of the sparse-grid approximation; see [28] for details. Common choices of α are
• α(l) = max_{n=1,...,N} l_n, leading to a full tensor-product space, and
• α(l) = |l| = ∑_{n=1}^N l_n, leading to a sparse polynomial space.


Using the latter, the l-level hierarchical sparse-grid interpolant of v(y) is given by

(2.9)    v_l(y) = ∑_{l′=0}^{l} ∑_{|l′|=l′} (Δ_{l′_1} ⊗ · · · ⊗ Δ_{l′_N}) v(y) = v_{l−1}(y) + ∑_{|l′|=l} ∑_{i∈B_{l′}} c_{l′,i} ψ_{l′,i}(y),

where c_{l′,i} = v(y_{l′,i}) − v_{l′−1}(y_{l′,i}) is the N-dimensional hierarchical surplus. The corresponding set of sparse-grid points is given by H_{l}(Γ) = {y_{l,i} : i ∈ B_l}. Thus, the sparse grid associated with v_l is given by

    H_l^N(Γ) = ⋃_{l′=0}^{l} ⋃_{|l′|=l′} H_{l′}(Γ)   with cardinality |H_l^N| = M_l.
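As an illustration, this Python sketch enumerates the index sets B_l and the points of H_L^N for the choice α(l) = |l| (a toy enumeration on [−1, 1]^N, not an optimized sparse-grid library):

```python
import itertools
import numpy as np

def B(l_multi):
    """Index set B_l: odd indices for l_n > 0, endpoints {0, 1} for l_n = 0."""
    per_dim = [range(1, 2 ** ln, 2) if ln > 0 else (0, 1) for ln in l_multi]
    return itertools.product(*per_dim)

def sparse_grid_points(N, L):
    """Points of H_L^N on [-1, 1]^N for the choice alpha(l) = |l| <= L."""
    pts = []
    for l_multi in itertools.product(range(L + 1), repeat=N):
        if sum(l_multi) > L:
            continue                        # keep only |l| <= L
        for i_multi in B(l_multi):
            pts.append([i * 2.0 ** (-ln + 1) - 1.0
                        for ln, i in zip(l_multi, i_multi)])
    return np.unique(np.round(pts, 12), axis=0)

# Far fewer points than the full tensor grid of (2^L + 1)^N points:
print(len(sparse_grid_points(N=2, L=2)))
```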

Choosing the hierarchical basis described above, we write the fully discrete approximation of u(x, y) as

(2.10)    u_{h,L}(x, y) = ∑_{l=0}^{L} ∑_{|l|=l} ∑_{i∈B_l} c_{l,i}(x) ψ_{l,i}(y),

where the coefficients c_{l,i}(x) depend on the finite element solutions corresponding to the sparse-grid points in H_L^N. Specifically, they are linear combinations of the spatial finite element basis {φ_j(x)}_{j=1}^J (where J is the number of degrees of freedom), i.e., c_{l,i}(x) = ∑_{j=1}^J c_{j,l,i} φ_j(x). Thus, we can rewrite (2.10) as

(2.11)    u_{h,L}(x, y) = ∑_{j=1}^J ( ∑_{l=0}^{L} ∑_{|l|=l} ∑_{i∈B_l} c_{j,l,i} ψ_{l,i}(y) ) φ_j(x).

Given the M_L finite element solutions u_h(x_j, y_{l,i}), for j = 1, . . . , J, |l| ≤ L, and i ∈ B_l, the surpluses c_{j,l,i} can be obtained by solving the triangular linear system⁵

    u_{h,L}(x_j, y_{l′,i′}) = ∑_{l=0}^{L} ∑_{|l|=l} ∑_{i∈B_l} c_{j,l,i} ψ_{l,i}(y_{l′,i′}) = u_h(x_j, y_{l′,i′})   for |l′| ≤ L and i′ ∈ B_{l′}.

Adaptivity. Note that using the properties of the hierarchical surpluses we can rewrite the approximation (2.11) in a hierarchical manner [28]:

    u_{h,L}(x, y) = u_{h,L−1}(x, y) + Δu_{h,L}(x, y),

where u_{h,L−1} is the sparse-grid approximation in Z_{L−1} and Δu_{h,L} is the hierarchical surplus interpolant in the subspace W_L obtained by tensorization. In [12] Bungartz and Griebel show that for smooth functions the surpluses c_{j,l,i} of the sparse-grid interpolant u_{h,L} are such

⁵The triangular structure of the system is a consequence of the hierarchical nature of the basis, which satisfies ψ_{l,i}(y_{l′,i′}) = 0 if l′ ≤ l (for (l′, i′) ≠ (l, i)).


that c_{j,l,i} → 0 as l → ∞. As a consequence, the magnitude of the surpluses can be used as an error indicator for the construction of adaptive sparse-grid interpolants; this technique is particularly powerful for irregular functions featuring, e.g., steep slopes or discontinuities. We describe the algorithm following [35].

In one dimension the adaptive construction of the sparse grid is straightforward. At each successive interpolation level the surpluses c_{j,l,i}, for j = 1, . . . , J, are evaluated at the points y_{l,i}, for i ∈ B_l; if max_j |c_{j,l,i}| ≥ ε, then the grid is refined around y_{l,i} by adding the two neighbor points. Here, ε is a prescribed error tolerance.
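A toy Python sketch of this one-dimensional classic refinement, reusing the hat functions of section 2.2 (the function `v`, the tolerance, and the level cap are illustrative; production codes such as TASMANIAN [51] implement this far more carefully):

```python
import numpy as np

def hat(l, i, y):
    """Hat function psi_{l,i} on [-1, 1] with grid size h(l) = 2^(-l+1)."""
    h = 2.0 ** (-l + 1)
    return max(0.0, 1.0 - abs((y - (i * h - 1.0)) / h))

def adaptive_1d(v, eps, max_level=12):
    """Classic 1D refinement: a point's two children on the next level
    are added only if its surplus magnitude is >= eps."""
    surpl = {}
    interp = lambda y: sum(c * hat(l, i, y) for (l, i), c in surpl.items())
    active = [(0, 0), (0, 1), (1, 1)]            # coarsest grid
    while active:
        children = []
        for (l, i) in active:                    # coarse levels first
            y = i * 2.0 ** (-l + 1) - 1.0
            c = v(y) - interp(y)                 # hierarchical surplus
            surpl[(l, i)] = c
            if abs(c) >= eps and l < max_level:
                children += [(l + 1, 2 * i - 1), (l + 1, 2 * i + 1)]
        # keep only valid, new, odd indices on the next level
        active = sorted({(l, i) for (l, i) in children
                         if i % 2 == 1 and 0 < i < 2 ** l
                         and (l, i) not in surpl})
    return surpl

# Points concentrate where v varies rapidly (steep slope near y = 0.8):
s = adaptive_1d(lambda y: np.tanh(20.0 * (y - 0.8)), eps=1e-2)
print(len(s), "points kept")
```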

We generalize this strategy to the N-dimensional case, keeping in mind that each grid point has 2N children at each successive level. Note that the children of a parent point are associated with hierarchical basis functions on the next interpolation level; thus, we can construct the interpolant u_{h,L} by adding only those points on level L whose parents have surpluses greater than a prescribed tolerance. We recall that at each sparse-grid point y_{l,i} we have J surpluses; thus, on each level l we define the new set of indices B^ε_l, for |l| = l, as follows:

    B^ε_l = { i ∈ B_l : max_{j=1,...,J} |c_{j,l,i}| ≥ ε }.

This set contains only the indices of the points whose largest surplus magnitude over j = 1, . . . , J exceeds ε; we refer to this strategy as classic refinement. In this way the sparse grid is locally and adaptively refined, and the resulting grid is a subgrid of the isotropic (full tensor product) sparse grid.

This algorithm does not necessarily result in a stable interpolant, i.e., it may fail to converge as l → ∞. Such instabilities may be caused by situations in which y_{l,i} is associated with a large surplus while some of its parents are not included in the point set at the previous level. For this reason, other refinement techniques (based on the classic refinement) have been considered; see [51] for a summary of alternative adaptive-refinement strategies and their implementation.

Quantity of interest. In the SPDE setting the goal of UQ is to determine statistical information about an output of interest that depends on the solution. In most cases the output of interest is not the solution itself but a functional G_u(y). Common functionals are the spatial average of u(x, y) or its maximum value over the domain, i.e.,

    G_u(y) = (1/|D|) ∫_D u(x, y) dx   or   G_u(y) = max_{x∈D} u(x, y),

respectively. Then, the statistical information may come in the form of moments of G_u(y); as an example, the quantity of interest (QOI) could be the expected value of G_u(y) with respect to the probability density function ρ(y), i.e.,

    QOI = E[G_u(y)] = ∫_Γ G_u(y) ρ(y) dy.

Note that the adaptive procedure targets a desired accuracy for the functional G_u(y) and not necessarily for u itself.


3. Numerical solution via ensembles. The global and local stochastic collocation methods described above have a significant advantage in that they are fully nonintrusive, i.e., they can be applied to a simulation code that numerically solves (2.1) with little or no modification. This makes applying these ideas to a broad set of simulation codes appealing. However, in large-scale, high-performance scientific computing, the dominant cost by far in implementing the collocation method is solving the PDE system at each interpolation point. In fact, the cost of each sample evaluation can be so large that applying the stochastic collocation method to more than a handful of random variables y_n is intractable.⁶ Therefore it is reasonable to ask if performance of the method could be improved by "opening up the box" and exploiting further structure within each PDE evaluation.

In [45], this idea was explored within the context of embedded ensemble propagation. Within scientific simulation in general, there is a tremendous amount of data and computation that is the same for each realization of the uncertain input data. In the context of (2.1), for example, the mesh upon which the spatial discretization of u is constructed does not depend on the random variables y. In [45], an approach for reusing this information was investigated by propagating multiple samples at a time, which we called ensembles. It was shown that by exploiting features of modern and emerging computer architectures, substantial speed-ups could be obtained by solving the PDE at multiple sample points simultaneously compared to one point at a time. Within each level of the adaptive collocation method described above, evaluation of the PDE at each sample point is trivially parallelizable, and all of the sample points could in theory be evaluated in parallel. However, in practice this is almost never possible for the kinds of simulations of interest to large-scale scientific computing, even when implemented on the largest supercomputers available today. Evaluation of the simulation code for a single sample often uses a significant portion of the available computing resources, making it possible to parallelize only a small fraction of the needed sample evaluations, with the remaining fraction evaluated sequentially. In what follows, we briefly review the ensemble formulation from [45] and summarize the main computational results.

Consider a finite element discretization of (2.1). For every sample y_m, m = 1, . . . , M_L, we write the resulting algebraic system as follows:

(3.1)    L_m U_m = F_m,   L_m ∈ R^{J×J}, U_m ∈ R^J, F_m ∈ R^J,

where J is the number of spatial degrees of freedom.⁷ Let an ensemble size S be given and consider solving (3.1) for S samples y_{m_1}, . . . , y_{m_S},

(3.2)    L_{m_1} U_{m_1} = F_{m_1},
                ⋮
         L_{m_S} U_{m_S} = F_{m_S},

⁶With the purpose of improving the performance, several methods have been designed; among others, we mention parallel implementations of stochastic collocation methods (see, e.g., [57]).

⁷While in (2.1) and onward the forcing function is assumed to be deterministic, we allow for dependence of the right-hand side F on the sample m for generality. Furthermore, the generalization to nonlinear and time-dependent problems is straightforward and not discussed here.


which can be written more compactly through Kronecker product notation:

(3.3)    ( ∑_{i=1}^S e_i e_i^T ⊗ L_{m_i} ) ( ∑_{i=1}^S e_i ⊗ U_{m_i} ) = ∑_{i=1}^S e_i ⊗ F_{m_i}.

Here e_i is the ith column of the S × S identity matrix. Furthermore, a symmetric permutation may be applied to (3.3), which results in commuting the order of the terms in each Kronecker product:

(3.4)    ( ∑_{i=1}^S L_{m_i} ⊗ e_i e_i^T ) ( ∑_{i=1}^S U_{m_i} ⊗ e_i ) = ∑_{i=1}^S F_{m_i} ⊗ e_i.

Both (3.3) and (3.4) represent precisely the same linear system, but with different orderings of degrees of freedom. In (3.3), all spatial degrees of freedom for a given sample y_{m_i} are ordered consecutively, whereas in (3.4) degrees of freedom for all samples are ordered consecutively for a given spatial degree of freedom.
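The equivalence of the two orderings can be checked directly; the following numpy sketch builds (3.3) and (3.4) with `np.kron` for small random SPD stand-ins for the L_m and verifies that the solutions agree up to the interleaving permutation (sizes and matrices are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
S, J = 3, 4                                  # ensemble size, spatial dofs
e = np.eye(S)

# Random SPD stand-ins for the finite element matrices L_m:
Ls = [(lambda B: B @ B.T + J * np.eye(J))(rng.standard_normal((J, J)))
      for _ in range(S)]
Fs = [rng.standard_normal(J) for _ in range(S)]

# Sample-major ordering (3.3): block diagonal in the L_m
A_33 = sum(np.kron(np.outer(e[i], e[i]), Ls[i]) for i in range(S))
f_33 = sum(np.kron(e[i], Fs[i]) for i in range(S))

# Dof-major ordering (3.4): commuted Kronecker factors
A_34 = sum(np.kron(Ls[i], np.outer(e[i], e[i])) for i in range(S))
f_34 = sum(np.kron(Fs[i], e[i]) for i in range(S))

u_33 = np.linalg.solve(A_33, f_33)
u_34 = np.linalg.solve(A_34, f_34)
# Same solution up to the permutation interleaving samples per dof:
print(np.allclose(u_33.reshape(S, J), u_34.reshape(J, S).T))
```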

In [45] it was shown that (3.4) can be solved efficiently by replacing each sample-dependent quantity used within the simulation code for evaluating L and F, as well as solving for U, with a length-S array. This has a number of implications that affect performance of the resulting simulation:
• Sample independent quantities, such as the spatial mesh used in evaluating L and F, as well as the (sparse) matrix graph used in solving linear systems involving L, are automatically reused. This reduces computation by computing these quantities only once per ensemble, reduces memory usage by storing them only once per ensemble, and reduces memory traffic by loading/storing them only once per ensemble.
• Even sample-dependent quantities such as preconditioners for L (or L itself in nonlinear problems) can be approximated by a single quantity once per ensemble, e.g., by evaluating the preconditioner for L at the mean of the samples within the ensemble. This further reduces computational cost in evaluating and applying these quantities, at the expense of possibly increased solver iterations. This is an algorithmic question that is not explored within this paper.
• Random memory accesses of sample-dependent quantities are replaced by contiguous accesses of ensemble arrays. This amortizes the latency costs associated with these accesses over the ensemble, since consecutive memory locations can usually be accessed with no additional latency cost. It was demonstrated in [45] that this effect, combined with reuse of the sparse matrix graph, can result in a 50% reduction in the cost of matrix-vector products associated with sparse iterative linear system solvers on emerging computational architectures, when applied to scalar diffusion problems such as those considered here.
• Arithmetic on ensemble arrays can be naturally mapped to fine-grained vector parallelism present in most computer architectures today, and this vector parallelism can be more easily extracted by compilers than it typically can be from the finite element simulation itself.


• The number of distributed memory communication steps of sample-dependent information (e.g., within sparse iterative linear system solvers or evaluation of L or F) is reduced by a factor of S, with the size of each communication message increased by a factor of S. This both reduces the latency cost associated with these messages by S as well as improves the throughput of each message, since larger messages can often be communicated with higher bandwidth. It was demonstrated in [45] that this can substantially improve scalability to large processor counts when the costs associated with distributed memory communication become significant.

Furthermore, it was also shown in [45] that the translation from scalar to ensemble propagation within a simulation code can be facilitated through the use of a template-based generic programming approach [43, 44] whereby the traditional floating point scalar type is replaced by a template parameter. This template code can then be instantiated on the original floating point type to recover the original simulation, as well as on a new C++ ensemble scalar type that internally stores the length-S ensemble array to implement the ensemble propagation. Such a scalar type is provided by the Stokhos [46] package within Trilinos [30, 31] and has been integrated with the Kokkos [17, 18] package for portable shared-memory parallel programming as well as the Tpetra package [4] for hybrid shared-distributed linear algebra. As examples of the kinds of performance improvement that can be achieved with this ensemble propagation approach, Figure 1 displays the speed-up observed when solving (3.4) using these techniques on both CPU and GPU architectures for several choices of ensemble size S, compared to solving for each sample solution U sequentially. Here the speed-up is computed as the ratio of the computational time required by the scalar solves of all samples within the ensemble and the time required to solve the ensemble system. In these calculations the isotropic diffusion parameter is a scalar modeled by the truncated KL expansion a(x, y) = a_min + a ∑_{n=1}^N √λ_n b_n(x) y_n, where λ_n and b_n are the eigenvalues and eigenfunctions of an exponential covariance (see section 5), and y_n ∈ [−1, 1].

Figure 1. Ensemble speed-up for AMG preconditioned CG solve on a Cray XK7 CPU (a) and NVIDIA K20X GPU (b). The finite element mesh is held fixed at 64 × 64 × 64 mesh cells per compute node and scaled up to 8192 compute nodes with 16 processors per node (131,072 processors total) on the Cray and 8 compute nodes on the GPU cluster.


The resulting linear equations are solved by the conjugate gradient (CG) method preconditioned with algebraic multigrid (AMG). In this case the number of CG iterations is independent of the sample value, and therefore the number of CG iterations for each ensemble is independent of which samples are grouped together in each ensemble. For more realistic problems this is unlikely to be the case: how the samples are grouped into ensembles does have a strong effect on the resulting performance, which is discussed in the next section.

4. Grouping strategies. In this section we focus on how to group samples within each ensemble to maximize the performance of the algorithm introduced in the previous section.

Our strategies are not based on a formal analysis, nor do they provide a general algorithm that can be universally applied to any PDE model or numerical scheme; in fact, they are tied to the solution of anisotropic diffusion problems and to AMG solvers. The reason for the lack of a rigorous analysis is that finding a mapping between the samples and the convergence behavior of the numerical solver is challenging and highly nontrivial. Such a mapping would depend on the combination and interaction of several factors such as the discretization method, the computational domain, the regularity of the problem parameters, etc. In particular, one should determine how such factors affect different components of the linear solvers and their preconditioners. For these reasons we only provide heuristic algorithms based on knowledge of the mathematical properties of the PDE and on available rigorous analysis of the numerical solvers (see section 4.1). Although empirical and far from universal, our study does provide guidelines for designing an efficient grouping strategy to be applied when the convergence properties of the numerical solvers are known.

An important observation is that in the solution of (3.1) the convergence of the linear solver (or its number of iterations) is almost always affected by the spectral properties of the matrices L_m;⁸ the latter are strongly related to quantities such as the condition number or indicators of the spatial variations of the parameters, e.g., the total variation, the magnitude of the gradient, the strength of the anisotropy, etc. Different quantities affect different solvers; as an example, it is well known that the condition number strongly affects the convergence of the CG method or, as another example, stretched and irregular grids may affect the behavior of AMG solvers or preconditioners.

A second important observation is that, regardless of the rearrangement of rows and columns via the Kronecker products in (3.4), the spectrum of an ensemble matrix is the union of the spectra of the matrices within the ensemble. Thus, the condition number of the ensemble matrix is at least as large as that of each of the finite element matrices within the ensemble. As a consequence, the convergence of the solver for the ensemble system is always poorer (the number of iterations is always higher) than that of the solver applied to each sample individually.

In order to minimize this deterioration of convergence it is beneficial to group together samples with similar spectral properties, i.e., samples requiring a similar number of iterations. For this reason we consider as a benchmark the grouping obtained by ordering the samples on the basis

⁸Note that in the numerical experiments performed in [45] the finite element matrices L_m have very similar spectra; this usually implies similar convergence behavior.


of increasing number of iterations required for the numerical solution of the corresponding PDE;⁹ the grouping then follows from dividing the ordered samples into ensembles of size S. Of course, this information is not known a priori, but we may use the condition number or indicators of the spatial variation of the parameters to predict which samples feature a similar convergence behavior.

4.1. The numerical solvers. The matrix L_m in (3.1), the finite element stiffness matrix corresponding to A(x, y_m), is always symmetric positive definite (SPD) by construction. Thus, the discretization problem admits a unique solution and is perfectly suitable for an iterative solver based on the CG method. However, it is well known that the convergence of CG is determined by the condition number of the matrix; specifically, it can be very slow when L_m has a widespread spectrum. For this reason it is preferable to use a CG method with an appropriate preconditioner (PCG). Multigrid methods are often the preconditioner of choice for diffusion problems. For SPD matrices, it is known that the convergence of the multigrid cycle is independent of the PDE mesh-spacing h [53]. AMG methods are attractive from an application perspective, as the development of the coarse-level approximations is handled automatically. In practice, however, a variety of issues can hamper the effectiveness of the coarsening algorithms in AMG, even for SPD matrices: mesh stretching, irregular meshes, highly anisotropic problem coefficients, choice of discretization, etc. If these and related issues are not handled appropriately, the resulting CG iteration with an AMG preconditioner may no longer be h independent.

As anisotropic diffusion problems are the focus of this paper, we discuss some relevant details of the AMG coarsening process. Aggregation-based multigrid determines coarse degrees of freedom by grouping fine-level degrees of freedom (matrix rows) together into "aggregates" (interpolation weights are calculated separately through a local orthogonalization process). The goal of the coarsening process is to form aggregates containing unknowns that are strongly coupled to one another, where that coupling is deduced by comparing relative magnitudes of matrix coefficients and ignoring coupling deemed too small. The standard approach for comparing coupling strength is to use a scalar-valued threshold [54]. Another approach is to use mesh coordinates as a proxy for strength of connection. However, this tends to be effective only if the anisotropy is due to mesh stretching. Studies have shown that if the matrix has strongly varying entries (as happens for highly anisotropic problems) or if the coupling is not reflected in the discretization matrix [42], the aggregation can be problematic, leading to a deterioration of the convergence and, as a consequence, to an increase in the number of linear solver iterations.

4.2. Proposed grouping strategies. We propose three grouping approaches based on fundamentally different considerations. We refer to the first approach as parameter-based, meaning that it depends on the values, in space, of the diffusion tensor corresponding to a single sample. We refer to the second as KL-based; this strategy is based on the effect that each random variable has on the uncertain parameter and is strongly related to the choice of the KL representation. The third approach is based on the geometric location of

⁹Note that this grouping strategy is not necessarily optimal; in fact, samples that require a similar number of iterations may have different spectral properties.


the samples in Γ; we refer to this strategy as HSFC-based, where HSFC stands for Hilbert space-filling curve. In this approach the grouping is determined by the partition of the sample space induced by the Hilbert curve (which we define below). All these strategies provide an ordering of the samples; once ordered, the samples are then divided into ensembles of size S.

Remark 4.1. In our tests we perform an adaptive generation of the sparse grid, as described in section 2.2. Thus, we group the samples at each iteration (or level) of the adaptive refinement. Note that since the total number of samples within each level is automatically determined by the algorithm, this number is not necessarily a multiple of S. In this case the ensemble matrix corresponding to the last group of samples is completed with the matrix associated with the first sample in that group.

Parameter-based grouping. Given the poor performance of AMG methods on diffusion problems featuring pronounced anisotropy (as pointed out in section 4.1), we propose as an indicator of slow convergence for a sample y the quantity

(4.1)    I(y) = ‖r(x, y)‖_{L∞(D)},   where r(x, y) = λ_max(A(x, y)) / λ_min(A(x, y)).

The ordering is based on increasing values of I; we expect smaller values of I to correspond to a smaller number of iterations.

In this approach we essentially identify the intensity of the anisotropy at each point of the spatial domain with the ratio between the maximum and minimum eigenvalues of the diffusion tensor; the maximum value of this quantity over D then provides a measure of the anisotropy associated with the sample y. Note that the computation of this indicator comes at a cost: prior to assembling the ensemble matrix we need to compute, for each sample, the diffusion tensor and its eigenvalues. However, in our case the computation of the eigenvalues is straightforward:

    r(x, y) = max_i [Σ(x, y)]_{ii} / min_i [Σ(x, y)]_{ii}.
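A sketch of the resulting grouping procedure, assuming d = 2 so that the diagonal of Σ(x, y) holds the eigenvalues; the field `a_field`, the spatial sample points, and all values are hypothetical, and the padding of the last ensemble follows Remark 4.1:

```python
import numpy as np

def anisotropy_indicator(a_field, xs, a_bar):
    """I(y) of (4.1): max over the sampled spatial points of
    lambda_max/lambda_min of A(x, y), using that for d = 2 the diagonal
    of Sigma(x, y) = diag(a(x, y), a_bar) holds the eigenvalues."""
    def I(y):
        vals = np.array([a_field(x, y) for x in xs])
        return float((np.maximum(vals, a_bar) / np.minimum(vals, a_bar)).max())
    return I

def group_into_ensembles(samples, indicator, S):
    """Order samples by increasing indicator, then chunk into size-S
    ensembles; the last group is padded with its own first sample
    (cf. Remark 4.1)."""
    ordered = sorted(samples, key=indicator)
    while len(ordered) % S != 0:
        ordered.append(ordered[(len(ordered) // S) * S])
    return [ordered[k:k + S] for k in range(0, len(ordered), S)]

# Hypothetical scalar field a(x, y) and a handful of samples:
a_field = lambda x, y: 0.1 + np.exp(y[0] * np.sin(np.pi * x[0]) + 0.3 * y[1])
xs = [np.array([t, 0.5]) for t in np.linspace(0.0, 1.0, 17)]
I = anisotropy_indicator(a_field, xs, a_bar=1.0)
samples = [(-1.0, -1.0), (0.0, 0.0), (1.0, 1.0), (1.0, -1.0), (0.5, 0.5)]
for ens in group_into_ensembles(samples, I, S=2):
    print(ens, [round(I(y), 2) for y in ens])
```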

KL-based grouping. This approach is strongly related to the choice of the truncated KL expansion for the approximation of the random field. The main idea is that larger values of the exponent of a(x, y) correspond to larger values of the parameter and, hence, to a higher level of anisotropy. As pointed out before, the latter implies slower convergence. We want the KL-based indicator to provide a measure of the magnitude of a(x, y) at a point y without actually computing the parameter. Compared to the parameter-based approach, the advantage of this strategy is that it requires less computational effort and does not assume any knowledge of the parameters or of the SPDE itself.

Note that when the eigenvalues of the covariance function have a fast decay, approximation (2.4) suggests that the first components of a point y ∈ Γ will be the most influential, being associated with the largest eigenvalues. On the other hand, when the eigenvalues feature a slow decay, the influence of the components of y is determined by the shape of the eigenfunctions of the covariance function. In Figure 2 we report, for d = 2, the first, second, and third eigenfunctions for a squared exponential (top) and a γ-exponential (bottom) covariance (see section 5); all other covariance functions exhibit the same behavior.


Figure 2. For the squared exponential covariance (top) and γ-exponential covariance (bottom) with γ = 1.5, the first three eigenfunctions.

Here, the covariance matrix has size (16.4 · 10³)². We note that the first basis function is either negative or positive, whereas the second and third are odd functions.¹⁰ Based on the previous considerations, our conjecture is that if the first eigenfunction is positive, increasing values of y_1 correspond to slower convergence, since they are likely to generate a parameter with high magnitude. On the other hand, if the first eigenfunction is negative, the same behavior will be observed for decreasing values of y_1. For the same reason, we also conjecture that increasing values of |y_2| and |y_3| will have the same effect. As a consequence, in the case of, e.g., a positive first eigenfunction, we expect higher values of y_1, |y_2|, and |y_3| to correspond to slower convergence. Thus, for N = 3, we define the indicator as I(y) = ±y_1 + |y_2| + |y_3|, where the sign of the first term depends on the sign of the first eigenfunction, as described above. When the dimension of the sample space is N > 3 it is not trivial to design a KL-based indicator; this happens because the eigenfunctions do not have a symmetric behavior with respect to the origin (though they still feature oscillations). Nevertheless, we define the KL-based indicator as

(4.2)    I(y) = ±y_1 + ∑_{n=2}^N |y_n|.

¹⁰The sign of the first eigenfunction only depends on the solver used for its computation. In this case we compute the approximate eigenfunctions as eigenvectors of the approximate covariance matrix using the MATLAB built-in function eigs.


As for the parameter-based indicator, the ordering is based on increasing values of I, and we expect smaller values of I to correspond to a smaller number of iterations.
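A minimal sketch of the indicator (4.2); the sample values are made up, and the sign convention follows the discussion above:

```python
import numpy as np

def kl_indicator(y, first_eigfun_positive=True):
    """I(y) = +/- y_1 + sum_{n>=2} |y_n|, per (4.2); the sign of the
    first term follows the sign of the first KL eigenfunction."""
    s = 1.0 if first_eigfun_positive else -1.0
    return s * y[0] + np.sum(np.abs(y[1:]))

ys = np.array([[0.5, -0.2, 0.1], [-1.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
order = sorted(range(len(ys)), key=lambda m: kl_indicator(ys[m]))
print(order)   # ensembles are then formed by chunking this ordering
```

Note that, unlike the parameter-based sketch above, this indicator never evaluates the field a(x, y), which is the source of its lower cost.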

HSFC-based grouping. Frequently the dependence of a computational simulation on the uncertain parameters may not be precisely known, in particular how values of those parameters affect computational cost. Furthermore, it may be difficult or expensive to access parameter-dependent data in the simulation in order to implement a grouping strategy. Therefore it is useful to consider grouping approaches that are more "black box" and could in principle be applied to any SPDE or simulation. The most natural approach for grouping points without using any notion of the SPDE is clustering in the sample space on the basis of geometric location. However, clustering is not ideal when points in the grid are sparse. Thus, unless the adaptive refinement generates a grid with areas of high point density, a clustering-based strategy is likely to perform poorly. Nevertheless, we consider a grouping strategy solely based on the location of the samples. Specifically, this approach is based on the partition of the sample space given by the HSFC. The Hilbert curve provides a mapping between one-dimensional and two-/three-dimensional spaces, using the fact that a partition of an interval induces a partition of a plane or a volume and vice versa. In this approach we partition the sample space using an HSFC algorithm so that each box in the partition contains a small number of samples (one in most cases); since the boxes are ordered (due to the mapping), the partition induces an ordering of the samples. When a box contains more than one sample, the way they are ordered is irrelevant.

The idea of using a space-filling curve is very natural; however, it has a drawback: points that are close to each other in the ordering are necessarily close in the sample space, but points that are close in the sample space are not necessarily close in the ordering. This suggests that this approach may not be as effective as the two introduced before.

The HSFC algorithm used in this work is implemented in Zoltan [11, 15], a library for parallel partitioning, load balancing, and data-management services.

5. Numerical tests. In this section we present the results of numerical tests for anisotropic diffusion problems in two- and three-dimensional spatial domains and multidimensional parameter spaces. Though preliminary, these results show the efficacy of our grouping strategies in terms of computational time savings and set the ground for realistic simulations.

We consider the following covariance functions:

A. squared exponential (Gaussian):

$$\mathrm{cov}(x, x') = \sigma_0^2 \exp\left(-\frac{\|x - x'\|^2}{2\delta^2}\right);$$

B. exponential:

$$\mathrm{cov}(x, x') = \sigma_0^2 \exp\left(-\frac{\|x - x'\|}{\delta}\right);$$

C. $\gamma$-exponential:

$$\mathrm{cov}(x, x') = \sigma_0^2 \exp\left(-\frac{\|x - x'\|^\gamma}{\delta^\gamma}\right), \quad \gamma \in (0, 2];$$

D. rational quadratic:

$$\mathrm{cov}(x, x') = \left(1 + \frac{\|x - x'\|^2}{2\alpha\delta^2}\right)^{-\alpha}, \quad \alpha > 0;$$

where δ is the characteristic distance of the spatial domain, i.e., the distance over which points in the spatial domain are significantly correlated.
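Written as functions of the distance r = ‖x − x′‖, the four kernels read as follows (a minimal MATLAB sketch; the parameter values are those of Test 1 below, and the names are ours):

    % Covariance kernels A-D as functions of the distance r = norm(x - x'),
    % with sigma0, delta, gamma, alpha as in the text (values from Test 1).
    sigma0 = 1; delta = 0.05; gam = 1.5; alph = 3;
    covA = @(r) sigma0^2 * exp(-r.^2 ./ (2 * delta^2));       % squared exponential
    covB = @(r) sigma0^2 * exp(-r ./ delta);                  % exponential
    covC = @(r) sigma0^2 * exp(-(r ./ delta).^gam);           % gamma-exponential
    covD = @(r) (1 + r.^2 ./ (2 * alph * delta^2)).^(-alph);  % rational quadratic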

To assess the computational savings brought by our grouping strategies we consider the quantities

(5.1)   $R_l = \frac{S \sum_{k=1}^{K_l} \mathrm{ITS}_k}{\sum_{k=1}^{K_l} \sum_{i=1}^{S} \mathrm{its}_{ki}} \qquad \text{and} \qquad R = \frac{S \sum_{k=1}^{K} \mathrm{ITS}_k}{\sum_{k=1}^{K} \sum_{i=1}^{S} \mathrm{its}_{ki}},$

where ITS_k is the number of iterations required by the kth ensemble, its_{ki} is the number of iterations required by the ith sample in the kth ensemble, K_l is the number of ensembles at level l, and K is the total number of ensembles. R_l represents the increase in computational work (as measured by the number of solver iterations) induced by the ensemble propagation at level l, and R represents the total increase in work over all levels. This increase in work offsets part of the computational savings delivered by the ensemble propagation technique described in section 3, referred to as speed-up; the speed-up achieved in practice is therefore reduced by a factor of R.
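A direct transcription of (5.1) into MATLAB (the variable names are ours, and the example numbers are purely illustrative):

    % Work ratio of (5.1) from iteration counts (hypothetical variable names):
    % ITS(k)   : iterations of the k-th ensemble solve,
    % its(k,i) : iterations the i-th sample of ensemble k needs when solved alone.
    work_ratio = @(ITS, its) size(its, 2) * sum(ITS) / sum(its(:));
    % Example with K = 2 ensembles of size S = 4 (numbers purely illustrative):
    R = work_ratio([50; 40], [45 40 38 50; 30 35 32 40]);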

For testing purposes, we choose the ℓ²-norm of the vector of the values of the discrete solution at the spatial degrees of freedom as the output of interest, i.e., $G_u(y) = \|U(y)\|_{\ell^2}$, where U(y) is the discrete solution corresponding to the sample y.

5.1. Two-dimensional test cases. We let D = [0, 1]² and Γ = [−17, 17]^N, with N = 3 and 6. In our simulations the finite element discretization is performed using Intrelab, the MATLAB interface of the Trilinos package Intrepid [9]; the latter is a library of interoperable tools for compatible discretizations of PDEs and provides a large class of finite element discretizations. In this work, for the discretization of (2.3), we use bilinear basis functions on a uniform, structured, 64×64 Cartesian mesh.

As anticipated in the previous sections, for the solution of the linear systems associated with the finite element discretization we use a PCG method whose preconditioner is an AMG solver. The software we use is ML, an AMG library in the Trilinos project, designed to solve large sparse linear systems of equations arising primarily from elliptic PDE discretizations. ML is used to define and build multigrid solvers and preconditioners, and it contains black-box classes for constructing highly scalable smoothed aggregation preconditioners. In our simulations we use a symmetric Gauss–Seidel smoother. ML has been successfully applied to linear systems arising from diffusion, convection-diffusion, drift-diffusion, magneto-hydrodynamics, and eddy current problems [10, 32, 49]. More specifically, in our tests, given the finite element matrices Lm assembled with Intrelab, the ensemble matrix is explicitly formed via the Kronecker product (3.4) and passed to ML.
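To make the ensemble solve concrete, here is a minimal MATLAB sketch (ours; a toy Poisson matrix and an incomplete-Cholesky preconditioner stand in for the Intrelab matrices and the ML AMG preconditioner, and the block-diagonal stacking below is one concrete realization of the Kronecker structure referenced in (3.4)):

    % Minimal ensemble-solve sketch with toy stand-ins for the paper's pipeline.
    S  = 8;                                   % ensemble size
    A0 = gallery('poisson', 64);              % toy 2-D diffusion (FE-like) matrix
    a  = 1 + 99 * rand(S, 1);                 % per-sample coefficients in [1, 100]
    Le = kron(spdiags(a, 0, S, S), A0);       % ensemble matrix: one block per sample
    be = repmat(ones(size(A0, 1), 1), S, 1);  % stacked right-hand sides
    P  = ichol(Le);                           % stand-in for the AMG preconditioner
    [u, flag, relres, iter] = pcg(Le, be, 1e-7, 500, P, P');
    fprintf('PCG: flag %d, %d iterations, relres %.2e\n', flag, iter, relres);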

For the construction of the sparse grid we use TASMANIAN [51] (toolkit for adaptive stochastic modeling and nonintrusive approximation), a set of libraries for high-dimensional integration and interpolation and for parameter calibration, sponsored by Oak Ridge National Laboratory. TASMANIAN implements a wide class of one-dimensional rules (and extends them to the multidimensional case by tensor products) based on global and local basis functions. In this work the sparse grid is obtained using piecewise linear local basis functions and classic refinement. It is common to apply the adaptive refinement to a sparse grid of level l > 1; here, we set the initial sparse-grid level to l = 4. As an example, Figure 3 shows the three-dimensional sparse grid generated using the squared exponential covariance; from left to right, we report the (y1, y2), (y1, y3), and (y2, y3) planes in the sample space. The adaptive algorithm is such that the grids are not full tensor grids and are refined only where the output of interest exhibits steep gradients.

Figure 3. For N = 3 and for the squared exponential covariance, from left to right, the planes (y1, y2), (y1, y3), and (y2, y3) of the 9-level sparse grid (777 points) generated with adaptive refinement.
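The refinement logic can be illustrated by a schematic one-dimensional analogue (ours, not TASMANIAN's implementation): a node is refined only when its hierarchical surplus, i.e., the target value at the node minus the coarser interpolant there, exceeds the tolerance.

    % Schematic 1-D surplus-driven local refinement (illustration only).
    f   = @(x) 1 ./ (1 + 100 * x.^2);   % toy target, steep near x = 0
    tol = 1e-2;
    X = 0; H = 1; W = f(0);             % root node, half-support, surplus
    interpv = @(x, X, H, W) sum(W .* max(0, 1 - abs(x - X) ./ H));
    newidx = 1;                         % nodes added in the last pass
    while ~isempty(newidx)
        kids = zeros(0, 2);
        for j = newidx
            if abs(W(j)) > tol          % refine: spawn the two children
                kids = [kids; X(j) - H(j)/2, H(j)/2; X(j) + H(j)/2, H(j)/2];
            end
        end
        newidx = numel(X) + (1:size(kids, 1));
        for m = 1:size(kids, 1)         % hierarchical surplus of each child
            W(end+1) = f(kids(m, 1)) - interpv(kids(m, 1), X, H, W);
            X(end+1) = kids(m, 1);
            H(end+1) = kids(m, 2);
        end
    end
    fprintf('adaptive 1-D grid: %d nodes\n', numel(X));

Because the surplus is large only where f has steep gradients, the grid clusters nodes near x = 0, mirroring the behavior of the adaptive sparse grids in Figure 3.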

All our two-dimensional tests are performed on an Intel Xeon E5-2650 v3 CPU (2.30 GHz) with 64 GB of RAM.

5.1.1. Test 1. In this section we consider N = 3, ā = 100, δ = 0.05, σ0 = 1, amin = 1, and ay = 1. For S = 8, 16, 32 we report the results of our tests for the covariance functions A–D. The values of θ and ε are chosen in such a way that the grids have a similar number of points.

A. Squared exponential covariance. For θ = 0.1π and ε = 0.003 the adaptive algorithm generates a sparse grid with M_L = 777 points.

In Figure 4 we compare our benchmark, i.e., the ordering based on the number of iterations, with the parameter-based and KL-based approaches: for each level l = 4, ..., 9 of the sparse grid, we report the number of iterations required by a single sample for the ordering based on the increasing number of iterations (red squares), on I (blue circles), and on Ĩ (green triangles). These results show that the parameter-based indicator is a good predictor of the performance of the PCG solver, as it generates an ordering that is in very good agreement with the one based on the number of iterations. The KL-based indicator also performs well; however, for some sparse-grid levels, the ordering based on Ĩ is not in good agreement with the one based on the number of iterations. In Table 1, row A, we report the values of Rl, l = 4, ..., 9, and R. The strategies "par," "KL," and "NO" (no ordering) correspond to parameter-based grouping, KL-based grouping, and the grouping based on the order in which the samples are generated by TASMANIAN.


Figure 4. For Test 1 and for the squared exponential covariance, comparison of the orderings based on the number of iterations (red squares), on I (blue circles), and on Ĩ (green triangles). Each plot corresponds to a level (l = 4, ..., 9) of the sparse grid generated with the adaptive algorithm.

These results confirm that I is the best indicator for minimizing the deterioration of the convergence of the linear solver, and they also confirm that Ĩ does not perform well at every sparse-grid level.

In Figure 5 we report, for each grid level, the number of iterations of the samples ordered with the HSFC algorithm. As expected, this algorithm does not perform well, as it groups together samples associated with very different numbers of iterations.

In the following paragraphs we report the results of the same experiments for the covariance functions B, C, and D; similar considerations can be drawn and are summarized in the conclusion (see section 6).

B. Exponential covariance. We report the results in Figures 6 and 7 and in Table 1, row B. For θ = 0.05π and ε = 0.002 the adaptive algorithm generates a sparse grid with M_L = 941 points.

C. γ-Exponential covariance. We report the results in Figures 8 and 9 and in Table 1, row C. For θ = π, γ = 1.5, and ε = 0.001 the adaptive algorithm generates a sparse grid with M_L = 1119 points.

D. Rational quadratic covariance. We report the results in Figures 10 and 11 and in Table 1, row D. For θ = π, α = 3, and ε = 0.001 the adaptive algorithm generates a sparse grid with M_L = 941 points.

5.1.2. Test 2. In this section we consider N = 6, ā = 5, δ = 0.05, σ0 = 1, amin = 1, and ay = 1. For S = 8, 16, 32 we report the results of our tests for the covariance functions A–D.


Table 1. For Test 1 (N = 3) and for the covariance functions A–D, values of Rl, l = 4, ..., 9, and R using different grouping strategies.

cov  Strategy  S    R4     R5     R6     R7     R8     R9     R
A    par       8    1.418  1.390  1.343  1.369  1.410  1.223  1.374
     KL        8    1.731  1.628  1.452  1.447  1.502  1.291  1.508
     NO        8    1.817  1.873  1.695  1.780  1.903  1.647  1.793
     par       16   1.606  1.532  1.413  1.434  1.498  1.406  1.469
     KL        16   2.069  2.046  1.692  1.633  1.590  1.503  1.739
     NO        16   2.390  2.317  1.936  2.171  2.488  1.784  2.197
     par       32   2.098  1.813  1.586  1.516  1.625  1.488  1.652
     KL        32   2.580  2.142  2.048  1.765  1.842  1.760  1.989
     NO        32   3.102  2.687  2.669  2.741  3.422  1.786  2.852
B    par       8    1.275  1.275  1.250  1.267  1.315  1.192  1.274
     KL        8    1.398  1.331  1.320  1.310  1.328  1.246  1.325
     NO        8    1.530  1.413  1.429  1.440  1.484  1.194  1.448
     par       16   1.343  1.326  1.289  1.341  1.387  1.442  1.337
     KL        16   1.474  1.440  1.423  1.423  1.418  1.634  1.430
     NO        16   1.855  1.613  1.614  1.690  1.698  1.445  1.673
     par       32   1.530  1.502  1.342  1.415  1.454  1.707  1.427
     KL        32   1.537  1.654  1.584  1.578  1.490  1.707  1.567
     NO        32   2.012  1.856  1.732  1.918  1.821  1.707  1.847
C    par       8    1.265  1.212  1.222  1.209  1.214  1.207  1.217
     KL        8    1.413  1.351  1.354  1.317  1.275  1.263  1.324
     NO        8    1.679  1.432  1.510  1.485  1.525  1.257  1.503
     par       16   1.417  1.290  1.260  1.252  1.257  1.337  1.272
     KL        16   1.532  1.521  1.510  1.472  1.388  1.368  1.467
     NO        16   2.178  1.719  1.770  1.765  1.820  1.387  1.794
     par       32   1.680  1.470  1.345  1.348  1.352  1.457  1.384
     KL        32   1.682  1.677  1.654  1.653  1.540  1.512  1.627
     NO        32   2.490  2.102  2.187  2.334  2.135  1.553  2.223
D    par       8    1.385  1.342  1.325  1.365  1.351  1.253  1.348
     KL        8    1.521  1.573  1.458  1.461  1.589  1.305  1.499
     NO        8    1.883  1.983  1.740  1.720  1.846  1.442  1.785
     par       16   1.590  1.481  1.385  1.421  1.437  1.472  1.436
     KL        16   1.819  1.887  1.716  1.638  1.702  1.463  1.709
     NO        16   2.481  2.406  2.243  2.054  2.217  1.609  2.198
     par       32   2.117  1.859  1.454  1.503  1.606  1.519  1.601
     KL        32   2.102  2.157  1.894  1.837  1.965  1.463  1.925
     NO        32   2.504  2.675  2.598  2.571  2.936  1.707  2.633

We perform the same experiments as in section 5.1.1. The tolerance and the number of points in the sparse grids corresponding to the different covariance functions are as follows: for A, ε = 0.08 and M_L = 2055; for B, ε = 0.08 and M_L = 1827; for C, ε = 0.04 and M_L = 2203; for D, ε = 0.1 and M_L = 1961. Results are reported in Table 2. While the indicator I still performs well, the performance of Ĩ is not as good as for N = 3. As anticipated in section 4.2, this is due to the fact that it is not straightforward to establish a relation between the eigenfunctions and the uncertain diffusion parameter.


Figure 5. For Test 1 and for the squared exponential covariance, the number of iterations of the samples ordered with the HSFC algorithm within each grid level (panels l = 4, ..., 9).

Figure 6. For Test 1 and for the exponential covariance, comparison of the orderings based on the number of iterations (red squares), on I (blue circles), and on Ĩ (green triangles). Each plot corresponds to a level (l = 4, ..., 9) of the sparse grid generated with the adaptive algorithm.


Figure 7. For Test 1 and for the exponential covariance, the number of iterations of the samples ordered with the HSFC algorithm within each grid level (panels l = 4, ..., 9).

Figure 8. For Test 1 and for the γ-exponential covariance, comparison of the orderings based on the number of iterations (red squares), on I (blue circles), and on Ĩ (green triangles). Each plot corresponds to a level (l = 4, ..., 9) of the sparse grid generated with the adaptive algorithm.


Figure 9. For Test 1 and for the γ-exponential covariance, the number of iterations of the samples ordered with the HSFC algorithm within each grid level (panels l = 4, ..., 9).

Figure 10. For Test 1 and for the rational quadratic covariance, comparison of the orderings based on the number of iterations (red squares), on I (blue circles), and on Ĩ (green triangles). Each plot corresponds to a level (l = 4, ..., 9) of the sparse grid generated with the adaptive algorithm.


Figure 11. For Test 1 and for the rational quadratic covariance, the number of iterations of the samples ordered with the HSFC algorithm within each grid level (panels l = 4, ..., 9).

5.2. Three-dimensional test case. Due to the better performance of the parameter-based approach, we consider only that strategy for computational tests in three-dimensional spatial domains, i.e., d = 3. We consider a diagonal diffusion tensor defined as A(x, y) = diag(a(x, y), ay, az) with ay = az = 1, we set N = 4, and we use an exponential covariance function, i.e., of type B, in all our three-dimensional experiments. Also, we set δ = 1/4, σ0 = √300, amin = 1, ā = 100, and Γ = [−1, 1]^N. We discretize (2.3) using trilinear finite elements on a mesh of 320³ cells.

The Kokkos [17, 18] and Tpetra [4] packages within Trilinos [30, 31] are used to assemble and solve the linear systems for each sample value using hybrid shared-distributed memory parallelism via OpenMP and MPI. The equations are solved via CG as implemented by the Belos package [8] with a linear solver tolerance of 10⁻⁷. CG is preconditioned via smoothed-aggregation AMG as provided by the MueLu package [47]. A second-order Chebyshev smoother is used at each level of the AMG hierarchy, with a sparse direct solve on the coarsest grid. The linear system assembly, CG solve, and AMG preconditioner are templated on the scalar type, following the template-based generic programming approach, to implement the embedded ensemble propagation described in section 3; this allows the code to be instantiated on double for single-sample evaluation and on the ensemble scalar type provided by Stokhos [46] for ensembles. The calculations are performed on 128 nodes of the Titan CPU architecture (16-core AMD Opteron processors, using 2 MPI ranks and 8 OpenMP threads per MPI rank). For the adaptive grid generation we use TASMANIAN with classic refinement; we set the initial sparse-grid level to l = 1 and ε = 0.003.


Table 2. For Test 2 (N = 6) and for the covariance functions A–D, values of Rl, l = 4, ..., 9, and R using different grouping strategies.

cov  Strategy  S    R4     R5     R6     R7     R8     R9     R
A    par       8    1.765  1.930  1.898  1.867  1.829  1.737  1.868
     KL        8    2.433  2.403  2.584  2.457  2.602  2.194  2.488
     NO        8    2.712  2.526  2.656  2.541  2.801  2.259  2.620
     par       16   1.826  1.998  1.988  1.918  1.949  1.881  1.944
     KL        16   2.814  2.732  2.893  2.781  3.089  2.461  2.836
     NO        16   3.264  2.940  3.001  2.931  3.490  2.714  3.064
     par       32   2.912  2.116  2.052  1.987  2.096  2.391  2.040
     KL        32   3.169  2.876  3.134  3.090  3.417  3.128  3.105
     NO        32   4.296  3.379  3.383  3.329  3.968  3.234  3.572
B    par       8    1.578  1.730  1.684  1.609  1.379  1.776  1.630
     KL        8    1.984  2.071  2.096  2.055  1.661  2.250  2.021
     NO        8    2.201  2.165  2.150  2.094  1.816  3.329  2.116
     par       16   1.665  1.788  1.761  1.651  1.475  2.374  1.702
     KL        16   2.207  2.221  2.327  2.230  1.967  2.927  2.241
     NO        16   2.638  2.465  2.480  2.385  2.131  3.002  2.449
     par       32   1.765  1.882  1.810  1.724  1.599  3.900  1.794
     KL        32   2.568  2.345  2.490  2.347  2.145  3.900  2.412
     NO        32   3.277  2.893  2.794  2.690  2.385  3.900  2.837
C    par       8    1.444  1.660  1.777  1.919  1.970  1.976  1.770
     KL        8    1.810  2.098  2.262  2.449  2.567  2.458  2.254
     NO        8    2.033  2.254  2.414  2.490  2.634  2.653  2.380
     par       16   1.515  1.739  1.872  1.997  2.072  2.299  1.863
     KL        16   2.031  2.393  2.553  2.765  3.029  3.045  2.571
     NO        16   2.405  2.685  2.864  2.794  3.184  3.826  2.815
     par       32   1.642  1.847  2.008  2.099  2.258  2.761  1.999
     KL        32   2.314  2.692  2.746  2.999  3.441  3.304  2.837
     NO        32   2.882  3.206  3.419  3.302  3.822  5.134  3.375
D    par       8    1.830  1.820  1.837  1.665  1.698  1.787  1.773
     KL        8    2.339  2.077  2.292  2.197  2.211  2.149  2.219
     NO        8    2.701  2.307  2.332  2.224  2.327  2.181  2.354
     par       16   1.918  1.933  1.887  1.719  1.772  1.964  1.848
     KL        16   2.482  2.214  2.471  2.384  2.576  2.525  2.415
     NO        16   3.151  2.665  2.661  2.539  2.724  2.516  2.713
     par       32   2.008  2.013  1.950  1.769  1.869  2.141  1.923
     KL        32   2.837  2.360  2.612  2.541  2.847  2.680  2.611
     NO        32   3.628  2.945  3.059  2.853  3.153  2.704  3.079

For S = 4, 8, 16, 32 we report the results of our tests in Table 3. The adaptive algorithm generates a sparse grid of size |Y| = 575 after achieving the prescribed error tolerance ε with seven levels of refinement. Table 3 displays the calculated Rl for each level l of the adaptive grid generation and the final R for the entire sample propagation. The labels "par" and "NO" are defined as in the previous section. The table also displays the measured computational speed-up for each ensemble approach, given by the total solve time over all samples for one-at-a-time sample evaluation divided by the total solve time over all ensembles.
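In symbols, with hypothetical timing vectors t_single (one entry per sample) and t_ensemble (one entry per ensemble), the reported speed-up is:

    % Measured speed-up (hypothetical variable names): total one-at-a-time
    % solve time over total ensemble solve time; the net gain reflects the
    % extra ensemble iterations counted by R.
    speedup = @(t_single, t_ensemble) sum(t_single) / sum(t_ensemble);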


Table 3. Computational results for d = 3 and N = 4, displaying Rl for each level l, the final R, and the measured computational speed-up over one-at-a-time sample evaluation, for the parameter-based ("par") and no ordering ("NO") methods.

Strategy  S    R1    R2    R3    R4    R5    R6    R7    R     Speed-up
par       4    1.80  1.30  0.99  1.22  1.06  1.12  1.31  1.13  1.79
NO        4    1.86  1.35  1.18  1.42  1.34  1.30  1.32  1.32  1.45
par       8    2.62  1.32  1.14  1.40  1.32  1.27  1.64  1.33  1.80
NO        8    2.62  1.78  1.45  1.73  1.51  1.58  2.18  1.63  1.47
par       16   3.10  1.57  1.24  1.54  1.36  1.57  2.90  1.58  1.53
NO        16   3.10  1.55  1.79  2.14  1.63  2.03  4.07  2.05  1.24
par       32   6.42  1.88  1.98  1.92  1.60  1.76  5.42  2.07  1.32
NO        32   6.42  1.74  1.99  2.52  2.10  2.88  7.72  2.86  0.96

6. Conclusion. From the results reported in the previous section we can infer that the parameter-based indicator predicts very well the performance of the PCG solver; this is clear from Figures 4, 6, 8, and 10, where the ordering based on the number of iterations and the one based on I are always in very good agreement. Table 1 confirms that the parameter-based strategy is the best way to maximize the performance improvement brought by the embedded ensemble propagation. The advantage of I is even more pronounced for N > 3, as we can observe from the results in Table 2; here, for all covariance functions, the values of Rl and R corresponding to the parameter-based approach are significantly smaller than those obtained with the KL-based approach. Furthermore, the advantage of I relative to the other approaches appears to increase as the ensemble size S increases, which is important for practical applications of the ensemble-based approach since the speed-up generally increases with S.

From Figures 4, 6, 8, and 10 we can observe that, for N = 3, Ĩ also performs well; however, for some levels, the orderings based on the number of iterations and on Ĩ are not in good agreement. This behavior is confirmed by the results in Table 1, where the values of Rl and R are almost as good as those associated with I and significantly smaller than those obtained without performing any ordering. On the other hand, as expected, for N = 6 (see Table 2), the KL-based indicator does not perform well. As mentioned in section 4, the reason for this poor performance is that it is not trivial to understand how the components of the vector y affect the uncertain parameter for N > 3. However, one has to keep in mind that while the computation of Ĩ comes at essentially no cost, computing I comes at a price: the sampling algorithm must have access to the computation of the diffusion coefficient.

With respect to the HSFC-based approach, as expected, the algorithm does not perform well: the number of iterations of samples within the same ensemble spans a large range of values, making it unsuitable for use with the embedded propagation algorithm.

Numerical results on three-dimensional spatial domains are consistent with those of the two-dimensional tests: significant improvements in R are observed over no ordering, particularly for large ensemble sizes. Note, however, that even with the parameter-based grouping, R still increases as the ensemble size increases, leading to overall reduced speed-ups. These results demonstrate the importance of a good grouping approach for achieving speed-up with the ensemble approach on more difficult computational problems.


It is important to point out that our strategy, based on the properties of the uncertain parameter, is strongly tied to the specific PDE under consideration and to the AMG solvers utilized for its solution. However, anisotropic diffusion problems are the building blocks of mathematical models for many engineering and scientific applications, and AMG solvers are widely used for the numerical solution of a diverse class of PDE models. Rigorous studies [42, 54] have identified factors that may affect the convergence properties of such methods; more specifically, it has been demonstrated that their performance can be affected by, e.g., stretched or irregular meshes or highly anisotropic coefficients, and that it varies according to the discretization scheme used in the computations (see section 4.1). These studies guide the choice of effective grouping indicators for the problem at hand. Thus, even though our empirical studies do not give a universal strategy, they do provide guidelines for designing an efficient grouping strategy based on knowledge of the factors that affect the convergence of the numerical solvers.

The indicators introduced in this work are designed to induce an ordering as close as possible to the one associated with the number of iterations for the single sample. As pointed out, this information is not known a priori; however, within the adaptive algorithm for the generation of the sparse grid it is possible to use quantities computed at the previous level to predict the number of iterations for the new samples. Our current work includes the design of surrogates for the number of iterations associated with new points in the sample space; such surrogates would be used at each step of the adaptive algorithm to order the new samples on the basis of increasing values of the predicted number of iterations, as sketched below. For example, the surrogate may be a sparse-grid interpolant (of any order) updated at each level using the number of iterations of the current set of points. One of the challenges of this approach is extracting from the linear solver of the ensemble system the number of iterations associated with each sample.
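For N ≤ 3, a minimal sketch of this idea uses MATLAB's scatteredInterpolant as a stand-in surrogate (all data below is synthetic and for illustration only):

    % Surrogate-based ordering sketch: interpolate the iteration counts
    % observed at previous levels and order the new samples by the prediction.
    rng(0);                                  % reproducible toy data
    Yold   = 34 * rand(200, 3) - 17;         % previously solved samples (N = 3)
    itsOld = 50 + 5 * sum(abs(Yold), 2);     % synthetic stand-in for measured iterations
    Ynew   = 34 * rand(40, 3) - 17;          % samples added at the current level
    F    = scatteredInterpolant(Yold, itsOld, 'linear', 'nearest');
    pred = F(Ynew);                          % predicted iterations per new sample
    [~, order] = sort(pred);                 % consecutive chunks of size S form ensembles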

Future work also includes the design of indicators based on the hierarchy of the polynomial basis. Note that points in the sparse grid can be represented in a tree structure where the parents are the points around which the grid is adaptively refined. As we expect children of the same parent to generate similar uncertain parameters, we plan to keep track of the family history throughout the adaptive algorithm and group together samples with the same ancestors.

Finally, we recall that in this work the uncertain parameter a is a smooth function of the random vector y. However, our ultimate goal is to investigate the performance of our method when the parameters, and hence the solution, have an irregular behavior with respect to the random vector. Our current work deals with SPDEs whose uncertain parameters present discontinuities in y; preliminary results show that the parameter-based grouping strategy performs equally well. The latter and the surrogate-based grouping approach are the subject of a follow-up paper.

REFERENCES

[1] I. Babuska, F. Nobile, and R. Tempone, A stochastic collocation method for elliptic partial differential equations with random input data, SIAM J. Numer. Anal., 45 (2007), pp. 1005–1034.
[2] I. Babuska, R. Tempone, and G. E. Zouraris, Galerkin finite element approximations of stochastic elliptic partial differential equations, SIAM J. Numer. Anal., 42 (2004), pp. 800–825.
[3] J. Back, F. Nobile, L. Tamellini, and R. Tempone, Stochastic spectral Galerkin and collocation methods for PDEs with random coefficients: A numerical comparison, in Spectral and High Order Methods for Partial Differential Equations, Springer, New York, 2011, pp. 43–62.
[4] C. G. Baker and M. A. Heroux, Tpetra, and the use of generic programming in scientific computing, Sci. Program., 20 (2012), pp. 115–128.
[5] A. Barth and A. Lang, Multilevel Monte Carlo method with applications to stochastic partial differential equations, Int. J. Comput. Math., 89 (2012), pp. 2479–2498.
[6] A. Barth, A. Lang, and C. Schwab, Multilevel Monte Carlo method for parabolic stochastic partial differential equations, BIT, 53 (2013), pp. 3–27.
[7] A. Barth, C. Schwab, and N. Zollinger, Multi-level Monte Carlo finite element method for elliptic PDEs with stochastic coefficients, Numer. Math., 119 (2011), pp. 123–161.
[8] E. Bavier, M. Hoemmen, S. Rajamanickam, and H. Thornquist, Amesos2 and Belos: Direct and iterative solvers for large sparse linear systems, Sci. Program., 20 (2012), pp. 241–255.
[9] P. Bochev, H. C. Edwards, R. C. Kirby, K. Peterson, and D. Ridzal, Solving PDEs with Intrepid, Sci. Program., 20 (2012), pp. 151–180.
[10] P. B. Bochev, J. J. Hu, C. M. Siefert, and R. S. Tuminaro, An algebraic multigrid approach based on a compatible gauge reformulation of Maxwell's equations, SIAM J. Sci. Comput., 31 (2008), pp. 557–583.
[11] E. Boman, K. Devine, L. A. Fisk, R. Heaphy, B. Hendrickson, C. Vaughan, U. Catalyurek, D. Bozdag, W. Mitchell, and J. Teresco, Zoltan 3.0: Parallel Partitioning, Load-Balancing, and Data Management Services; User's Guide, Tech. report, Sandia National Laboratories, 2007.
[12] H. J. Bungartz and M. Griebel, Sparse grids, Acta Numer., 13 (2004), pp. 147–269.
[13] K. A. Cliffe, M. B. Giles, R. Scheichl, and A. L. Teckentrup, Multilevel Monte Carlo methods and applications to elliptic PDEs with random coefficients, Comput. Vis. Sci., 14 (2011), pp. 3–15.
[14] A. Cohen, R. DeVore, and C. Schwab, Analytic regularity and polynomial approximation of parametric and stochastic elliptic PDE's, Anal. Appl., 9 (2011), pp. 11–47.
[15] K. Devine, E. Boman, R. Heaphy, B. Hendrickson, and C. Vaughan, Zoltan data management services for parallel dynamic applications, Computing Sci. Eng., 4 (2002), pp. 90–96.
[16] A. Doostan and H. Owhadi, A non-adapted sparse approximation of PDEs with stochastic inputs, J. Comput. Phys., 230 (2011), pp. 3015–3034.
[17] H. C. Edwards, D. Sunderland, V. Porter, C. Amsler, and S. Mish, Manycore performance-portability: Kokkos multidimensional array library, Sci. Program., 20 (2012), pp. 89–114.
[18] H. C. Edwards, C. R. Trott, and D. Sunderland, Kokkos: Enabling manycore performance portability through polymorphic memory access patterns, J. Parallel Distributed Comput., 74 (2014), pp. 3202–3216.
[19] G. S. Fishman, Monte Carlo: Concepts, Algorithms, and Applications, Springer Ser. Oper. Res., Springer, New York, 1996.
[20] P. Frauenfelder, C. Schwab, and R. A. Todor, Finite elements for elliptic problems with stochastic coefficients, Comput. Methods Appl. Mech. Engrg., 194 (2005), pp. 205–228.
[21] D. Galindo, P. Jantsch, C. G. Webster, and G. Zhang, Accelerating Stochastic Collocation Methods for PDEs with Random Input Data, Tech. Report TM-2015/219, Oak Ridge National Laboratory, 2015.
[22] B. Ganapathysubramanian and N. Zabaras, Sparse grid collocation schemes for stochastic natural convection problems, J. Comput. Phys., 225 (2007), pp. 652–685.
[23] R. G. Ghanem and P. D. Spanos, Polynomial chaos in stochastic finite elements, J. Appl. Mech., 57 (1990).
[24] R. G. Ghanem and P. D. Spanos, Stochastic Finite Elements: A Spectral Approach, Springer, New York, 1991.
[25] M. B. Giles, Multilevel Monte Carlo path simulation, Oper. Res., 56 (2008), pp. 607–617.
[26] M. Griebel, Adaptive sparse grid multilevel methods for elliptic PDEs based on finite differences, Computing, 61 (1998), pp. 151–179.
[27] M. Gunzburger, C. G. Webster, and G. Zhang, An adaptive wavelet stochastic collocation method for irregular solutions of partial differential equations with random input data, in Sparse Grids and Applications, Springer, New York, 2014, pp. 137–170.
[28] M. D. Gunzburger, C. G. Webster, and G. Zhang, Stochastic finite element methods for partial differential equations with random input data, Acta Numer., 23 (2014), pp. 521–650.
[29] J. C. Helton and F. J. Davis, Latin hypercube sampling and the propagation of uncertainty in analyses of complex systems, Reliability Engineering System Safety, 81 (2003), pp. 23–69.
[30] M. A. Heroux, R. A. Bartlett, V. E. Howle, R. J. Hoekstra, J. J. Hu, T. G. Kolda, R. B. Lehoucq, K. R. Long, R. P. Pawlowski, E. T. Phipps, A. G. Salinger, H. K. Thornquist, R. S. Tuminaro, J. M. Willenbring, A. B. Williams, and K. S. Stanley, An overview of the Trilinos package, ACM Trans. Math. Softw., 31 (2005).
[31] M. A. Heroux and J. M. Willenbring, A new overview of the Trilinos project, Sci. Program., 20 (2012), pp. 83–88.
[32] P. T. Lin, J. N. Shadid, R. S. Tuminaro, M. Sala, G. L. Hennigan, and R. P. Pawlowski, A parallel fully coupled algebraic multilevel preconditioner applied to multiphysics PDE applications: Drift-diffusion, flow/transport/reaction, resistive MHD, Internat. J. Numer. Methods Fluids, 64 (2010), pp. 1148–1179.
[33] M. Loeve, Probability Theory I, 4th ed., Grad. Texts in Math. 45, Springer, New York, 1977.
[34] M. Loeve, Probability Theory II, 4th ed., Grad. Texts in Math. 46, Springer, New York, 1978.
[35] X. Ma and N. Zabaras, An adaptive hierarchical sparse grid collocation algorithm for the solution of stochastic differential equations, J. Comput. Phys., 228 (2009), pp. 3084–3113.
[36] L. Mathelin and K. A. Gallivan, A compressed sensing approach for partial differential equations with random input data, Commun. Comput. Phys., 12 (2012), pp. 919–954.
[37] M. D. McKay, R. J. Beckman, and W. J. Conover, A comparison of three methods for selecting values of input variables in the analysis of output from a computer code, Technometrics, 21 (1979), pp. 239–245.
[38] N. Metropolis and S. Ulam, The Monte Carlo method, J. Amer. Statist. Assoc., 44 (1949), pp. 335–341.
[39] H. Niederreiter, Quasi-Monte Carlo methods and pseudo-random numbers, Bull. Amer. Math. Soc., 84 (1978), pp. 957–1041.
[40] F. Nobile, R. Tempone, and C. G. Webster, An anisotropic sparse grid stochastic collocation method for partial differential equations with random input data, SIAM J. Numer. Anal., 46 (2008), pp. 2411–2442.
[41] F. Nobile, R. Tempone, and C. G. Webster, A sparse grid stochastic collocation method for partial differential equations with random input data, SIAM J. Numer. Anal., 46 (2008), pp. 2309–2345.
[42] L. N. Olson, J. Schroder, and R. S. Tuminaro, A new perspective on strength measures in algebraic multigrid, Numer. Linear Algebra Appl., 17 (2010), pp. 713–733.
[43] R. P. Pawlowski, E. T. Phipps, and A. G. Salinger, Automating embedded analysis capabilities and managing software complexity in multiphysics simulation, Part I: Template-based generic programming, Sci. Program., 20 (2012), pp. 197–219.
[44] R. P. Pawlowski, E. T. Phipps, A. G. Salinger, S. J. Owen, C. M. Siefert, and M. L. Staten, Automating embedded analysis capabilities and managing software complexity in multiphysics simulation, Part II: Application to partial differential equations, Sci. Program., 20 (2012), pp. 327–345.
[45] E. Phipps, M. D'Elia, H. C. Edwards, M. Hoemmen, J. Hu, and S. Rajamanickam, Embedded ensemble propagation for improving performance, portability, and scalability of uncertainty quantification on emerging computational architectures, SIAM J. Sci. Comput., 39 (2017), pp. C162–C193.
[46] E. T. Phipps, Stokhos Stochastic Galerkin Uncertainty Quantification Methods, http://trilinos.org/packages/stokhos (2015).
[47] A. Prokopenko, J. J. Hu, T. A. Wiesner, C. M. Siefert, and R. S. Tuminaro, MueLu User's Guide 1.0, Tech. Report SAND2014-18874, Sandia National Laboratories, 2014.
[48] L. J. Roman and M. Sarkis, Stochastic Galerkin method for elliptic SPDEs: A white noise approach, Discrete Contin. Dyn. Syst. Ser. B, 6 (2006), pp. 941–955.
[49] M. Sala and R. Tuminaro, A new Petrov–Galerkin smoothed aggregation preconditioner for nonsymmetric linear systems, SIAM J. Sci. Comput., 31 (2008), pp. 143–166.
[50] S. A. Smolyak, Quadrature and interpolation formulas for tensor products of certain classes of functions, Dokl. Akad. Nauk SSSR, 4 (1963), pp. 240–243.
[51] M. Stoyanov, Hierarchy-Direction Selective Approach for Locally Adaptive Sparse Grids, Tech. Report TM-2013/384, Oak Ridge National Laboratory, 2013.
[52] M. Stoyanov and C. G. Webster, A Dynamically Adaptive Sparse Grid Method for Quasi-Optimal Interpolation of Multidimensional Analytic Functions, Tech. Report TM-2015/341, Oak Ridge National Laboratory, 2015.
[53] U. Trottenberg, C. Oosterlee, and A. Schuller, Multigrid, Academic Press, New York, 2000.
[54] P. Vanek, J. Mandel, and M. Brezina, Algebraic multigrid based on smoothed aggregation for second and fourth order problems, Computing, 56 (1996), pp. 179–196.
[55] D. B. Xiu and J. S. Hesthaven, High-order collocation methods for differential equations with random inputs, SIAM J. Sci. Comput., 27 (2005), pp. 1118–1139.
[56] D. B. Xiu and G. E. Karniadakis, The Wiener–Askey polynomial chaos for stochastic differential equations, SIAM J. Sci. Comput., 24 (2002), pp. 619–644.
[57] A. C. Yucel, H. Bagci, S. Hesthaven, and E. Michielssen, A fast Stroud-based collocation method for statistically characterizing EMI/EMC phenomena on complex platforms, IEEE Trans. Electromagnetic Compatibility, 51 (2009), pp. 301–311.