Two-dimensional frequency-domain visco-elastic full waveform inversion … · 2013-10-22 · Full...

Computers & Geosciences 37 (2011) 444–455

Contents lists available at ScienceDirect

Computers & Geosciences

0098-30

doi:10.1

E-m1 N

Joseph F

journal homepage: www.elsevier.com/locate/cageo

Two-dimensional frequency-domain visco-elastic full waveform inversion:Parallel algorithms, optimization and performance

R. Brossier 1

Geoazur, Universite de Nice-Sophia Antipolis - CNRS, 250 rue Albert Einstein, 06560 Valbonne, France

a r t i c l e i n f o

Article history:

Received 21 August 2009

Received in revised form

17 September 2010

Accepted 21 September 2010Available online 10 November 2010

Keywords:

Seismic wave modelling

Finite element discontinuous Galerkin

Multi-parameter seismic imaging

Quasi-Newton optimization

Massively parallel computing

04/$ - see front matter & 2010 Elsevier Ltd. A

016/j.cageo.2010.09.013

ail address: [email protected]

ow at Laboratoire de Geophysique Interne et

ourier, Grenoble, France.

a b s t r a c t

Full waveform inversion (FWI) is an appealing seismic data-fitting procedure for the derivation of high-

resolution quantitative models of the subsurface at various scales. Full modelling and inversion of visco-

elastic waves from multiple seismic sources allow for the recovering of different physical parameters,

although they remain computationally challenging tasks. An efficient massively parallel, frequency-

domain FWI algorithm is implemented here on large-scale distributed-memory platforms for imaging

two-dimensional visco-elastic media. The resolution of the elastodynamic equations, as the forward

problem of the inversion, is performed in the frequency domain on unstructured triangular meshes, using

a low-order finite element discontinuous Galerkin method. The linear system resulting from discretiza-

tion of the forward problem is solved with a parallel direct solver. The inverse problem, which is presented

as a non-linear local optimization problem, is solved in parallel with a quasi-Newton method, and this

allows for reliable estimation of multiple classes of visco-elastic parameters. Two levels of parallelism are

implemented in the algorithm, based on message passing interfaces and multi-threading, for optimal use

of computational time and the core-memory resources available on modern distributed-memory multi-

core computational platforms. The algorithm allows for imaging of realistic targets at various scales,

ranging from near-surface geotechnic applications to crustal-scale exploration.

& 2010 Elsevier Ltd. All rights reserved.

1. Introduction

Quantitative imaging of the subsurface is essential for geotechnicand civil engineering applications, for seismic hazard and naturalresources exploration, and for our geodynamics knowledge. Thephysical properties of the subsurface are often estimated throughseismic-wave analysis, with wave travel-times generally used toperform tomography of the Earth interior. In the 1980s, the fullwaveform inversion (FWI) method was introduced by Tarantola(1984) to exploit the complete information contained in the wave-forms of seismograms, and to infer high-resolution models of thesubsurface. FWI was originally developed in the time-domain(Tarantola, 1984, 1987), and it has become tractable and popularin the frequency-domain since the pioneering work of Pratt andWorthington (1990) and Pratt (1990). FWI is currently a field ofactive research, and particularly for active seismic surveys at variousscales. Computationally efficient frequency-domain FWI wasdesigned by limiting the inversion to a few discrete frequencies,taking advantage of the redundant wavenumber cover provided bywide-aperture surveys (Sirgue and Pratt, 2004; Brenders and Pratt,

ll rights reserved.

.fr

Tectonophysique, Universite

2007a). Frequency-domain FWI potentially provides high-resolu-tion quantitative images of physical parameters, although it suffersfrom two main difficulties.

The first of these difficulties relates to the computational cost ofthe forward problem; namely, the numerical resolution of the waveequation in heterogeneous media for multiple sources. In twodimensions, the forward problem can be solved efficiently in thefrequency-domain (Marfurt, 1984), leading to the resolution of alinear system per frequency, the right-hand side of which is thesource. This system is solved efficiently for multiple sources with adirect solver, which performs one LU factorization of the matrix,followed by a forward and a backward substitution per source. Forthe acoustic wave equation, optimal finite-difference (FD) schemeshave been designed for optimal use of the CPU core-memory whenused with direct solvers (Jo et al., 1996; Hustedt et al., 2004).However, these optimal schemes are difficult to design for theelastic wave equation, because the numerical scheme directlydepends on the Poisson ratio of the medium, which leads tostability issues when considering free-surface boundary conditionsand liquid/solid interfaces (Stekl and Pratt, 1998). Moreover, FDsuffers from the usual limitation linked to Cartesian regular grids:the grid interval is defined by the smallest propagated wavelengthto avoid numerical dispersion, or by complex interface geometries,which require fine discretization to avoid parasite diffractions fromthe corners of the Cartesian grid. These two constraints generally

www.elsevier.com/locate/cageo

dx.doi.org/10.1016/j.cageo.2010.09.013

mailto:[email protected]

dx.doi.org/10.1029/2005JB003835

R. Brossier / Computers & Geosciences 37 (2011) 444–455 445

lead to local oversampling of the medium, and therefore a waste ofcomputational resources (Saenger and Bohlen, 2004; Bohlen andSaenger, 2006). To deal with non-regular grids, finite element (FE)methods can be considered with either low interpolation orders(Marfurt, 1984; Bielak et al., 2003), or high interpolation orders,using the so-called spectral element method (Komatitsch andVilotte, 1998; Chaljub et al., 2003). These methods require, however,explicit boundary conditions for modelling the water/solid contact,and are generally developed for quadrangle grids, which leads tomeshing issues. Another alternative is the finite element discontin-uous Galerkin (DG) method (Kaser and Dumbser, 2006; Dumbserand Kaser, 2006; Brossier et al., 2008). This DG method allows for theuse of triangular/tetrahedral meshes, and it is suitable for thehandling of strong physical contrasts in the medium, includingliquid/solid contact. The DG method is chosen in the present study toperform the FWI forward problem in the frequency-domain.

The second difficulty of FWI is related to the ill-posedness andnonlinearity of the inverse problem, which is generally formulated asa least-squares local optimization (Tarantola, 1987; Pratt andWorthington, 1990), so as to manage the numerical cost of theforward problem. The ill-posedness of FWI mainly arises from the lackof low frequencies in the source bandwidth and the incompleteillumination of the subsurface provided by conventional seismicsurveys. Consequently, the problem is highly non-linear and theresults strongly depend on the accuracy of the starting model in theframework of local optimization and on the presence of noise. Severalhierarchical multi-scale strategies that proceed from low frequenciesto higher frequencies have been proposed to mitigate the nonlinearityof the inverse problem (Pratt and Worthington, 1990; Bunks et al.,1995; Sirgue and Pratt, 2004; Brossier et al., 2009a). Subsets of dataand parameter classes can also be hierarchically inverted to mitigatethe nonlinearity and nonuniqueness issues in the framework of elasticinversion (Sears et al., 2008; Brossier et al., 2009b).

To tackle targets of realistic size and complexity with FWI, efficientalgorithms must be implemented for modern computational plat-forms for both the forward and the inverse problems. An example ofsuch algorithms is presented in Sourbier et al. (2009a), who imple-mented a parallel FWI algorithm for large-scale distributed-memoryplatforms in the framework of the two-dimensional (2D) acousticmonoparameter approximation, with a FD forward problem andsteepest-descent-based optimization scheme.

The present study presents the implementation of a massivelyparallel frequency-domain FWI algorithm for imaging 2D visco-elastic media. The second section will focus on the forward problemresolution, as performed with a low-order DG method. In thethird section, the inversion algorithm problem is presented, withspecial emphasis on the reconstruction of multi-parameter classesfrom vectorial wavefields using a quasi-Newton method. The lastsection will review the parallel implementation of the algorithm,with an analysis of its performance on a distributed-memoryplatform.

2. Forward problem

2.1. Theory and discretization

Two-dimensional, elastic, frequency-domain FWI requires thefrequency-domain solutions of the 2D P-SV wave equations forheterogeneous media and multiple sources. We consider the first-order hyperbolic system where both particle velocities (Vx,Vz) andstresses ðsxx,szz,sxzÞ are unknown quantities, as described by thesystem:

�ioVx ¼1

rðxÞ@sxx

@xþ@sxz

@z

� �þ fx

�ioVz ¼1

rðxÞ@sxz

@xþ@szz

@z

� �þ fz

�iosxx ¼ ðlðxÞþ2mðxÞÞ @Vx

@xþlðxÞ

@Vz

@z�iosxx0

�ioszz ¼ lðxÞ@Vx

@xþðlðxÞþ2mðxÞÞ @Vz

@z�ioszz0

�iosxz ¼ mðxÞ@Vx

@zþ@Vz

@x

� ��iosxz0

, ð1Þ

where l andm are the Lame coefficients,r is the density, ando is theangular frequency. Source terms are either point forces (fx, fz) orapplied stresses ðsxx0

,szz0,sxz0Þ. The symbol i denotes the purely

imaginary term defined by i¼ffiffiffiffiffiffiffi�1p

. Only isotropic media areconsidered in this analysis. The intrinsic attenuation of the mediumis easily taken into account in the frequency-domain using complex-valued wave velocities. In the present study, the Kolsky–Futtermanmodel (Kolsky, 1956; Futterman, 1962) without the dispersion termis used (Toksoz and Johnston, 1981). Please note, however, that thedispersion term could be included without extra cost in theformulation.

A change in variable is applied to the system of Eqs. (1) to develop apseudo-conservative formulation that is useful for the derivation of theDG formulation. We now consider the following vector with threecomponents ðT1,T2,T3Þ ¼ ððsxxþszzÞ=2,ðsxx�szzÞ=2, sxzÞ. Note that,when considering this variable change, T1 represents the hydrostaticpressure measured in liquids. In the case of liquid/solid propagation,such as in marine seismic experiments, the T1 variable provides directaccess to the pressure measured by hydrophones, while T2 and T3 arezero in liquid, relative to the deviatoric part of the stress tensor.

Moreover, we must consider a finite domain, and therefore, weapply perfectly matched layer (PML) absorbing conditions (Berenger,1994) through the functions sx, sz, defined in the direction r assr ¼ 1=ð1þ igr=oÞ, where gr is zero inside the computational domainand nonzero inside the absorbing layers. The new differential systemthat includes the PML functions, which is equivalent to the system ofEqs. (1), can be written as

�iorVx ¼ sx@ðT1þT2Þ

@xþsz

@T3

@zþrfx

�iorVz ¼ sx@T3

@xþsz

@ðT1�T2Þ

@zþrfz

�ioT1

lþm ¼ sx@Vx

@xþsz

@Vz

@z�

ioT01

lþm

�ioT2

m¼ sx

@Vx

@x�sz

@Vz

@z�

ioT02

m

�ioT3

m ¼ sx@Vz

@xþsz

@Vx

@z�

ioT03

m : ð2Þ

To derive the DG formulation, the model is first discretized withpolygonal cells. For each cell, the system of Eqs. (2) is multiplied by atest function, corresponding to a k th-order polynomial. The testfunction is nonzero only in the polygonal cell that ensures the discon-tinuous property of the scheme. In the present study, the nodal formu-lation of DG method is used (Hesthaven and Warburton, 2008), basedon Lagrange polynomials that can be easily defined and integrated ontriangle shapes. The system of Eqs. (2) is then integrated over each cell,which leads to the so-called weak formulation of the system. Weassume that the physical properties of the medium are constant insideeach cell, and we use centred numerical fluxes to exchange energybetween cells (Remaki, 2000). The reader is referred to Kaser andDumbser (2006), Dumbser and Kaser (2006), Brossier et al. (2008),

R. Brossier / Computers & Geosciences 37 (2011) 444–455446

Etienne et al. (2010) for a detailed description of the DG formulationapplied to the elastodynamic equations.

Discretization of Eqs. (2) with the DG method leads to a linearsystem for each frequency considered:

Au¼ s, ð3Þ

where A is the sparse impedance matrix, which is also known as theforward problem operator, and which depends on the frequency,the mesh geometry, the DG interpolation order of each cell, and thephysical properties. The right-hand side s represents the externalexcitation. The unknown vector u contains the particle velocity andthe stress wavefields in the entire computational domain, and isrepresented by a local polynomial inside each cell. In the presentstudy, the system of Eqs. (3) is solved with the massively paralleldirect solver MUMPS (Amestoy et al., 2006). More details on thisalgorithm and its performance will be discussed in Section 4.1.

Note that visco-acoustic modelling and inversion can be per-formed using the elastic algorithm: the fourth and fifth lines ofsystem 2 are not considered in the forward problem; the T2 and T3

wavefields are forced to zero; and VS is set to zero.

2.2. Which interpolation?

The theoretical resolution of the FWI is l=2 if normal-incidencedata are recorded, where l is the local propagated wavelength(Sirgue and Pratt, 2004). A suitable discretization for the inversionmesh should therefore be close to l=4 to ensure a correct samplingof the medium. Since the physical properties are assumed to beconstant inside each cell, a suitable discretization of the forwardproblem based on the same mesh as that of the inversion shouldalso be close to l=4 (Sourbier et al., 2009a).

DG methods generally take advantage of high orders of interpola-tion to coarsely discretize the medium and to obtain a high order ofaccuracy (Kaser and Dumbser, 2006; Dumbser and Kaser, 2006). Incontrast, I will focus here on low interpolation orders for FWIapplications because these low orders of interpolation will providethe best compromise between accuracy and computational cost of theforward problem for the discretization imposed by the FWI resolution.

The lowest interpolation order, P0 (Brossier et al., 2008), is apossible choice that represents the wavefield by a series of piece-wise constant functions inside each cell, and it gives sufficientlyaccurate solutions for 10–15 cells per wavelength in structuredequilateral triangular meshes. P0 interpolation cannot, however, beused on unstructured meshes, where it does not converge to thecorrect solution, whatever the size of the cell. Moreover, the liquid/solid interface causes some numerical instabilities for structuredmeshes, and the P0 interpolation cannot, therefore, be used in this

Tim

e (s

)

1000 2000 3000 4000Distance (m)

00

1

2

3

Fig. 1. Time-domain comparison between DG P1 (solid lines) and analytical (dashed lin

(b) the vertical components of the particle velocity at the free surface.

case. Finally, the P0 scheme can be related to the finite volume P0

scheme, and is very close to OðDx2Þ FD schemes.The higher orders of interpolation, P1 and P2, allow for the use of

unstructured meshes, which is an appealing feature for dealing withcomplex topographies and heterogeneous media, which shouldrequire a local adaptation of the mesh size to the local properties(h-adaptive meshes). Sufficiently accurate solutions can be obtainedin unstructured meshes with discretization of 10–15 cells perwavelength for P1, and three to three and a half cells per wavelengthfor P2. Note that for inversion, P2 should be oversampled with fourcells per wavelength to correctly sample the medium and mitigateerrors in the gradient building (see Section 3.3).

Seismograms computed with the DG P1 and DG P2 in ahomogeneous halfspace are compared with the analytical solutionof the Garvin (1956) problem (Figs. 1 and 2). The source is anexplosion at depth and the receivers are on the free surface. Thecomparison is performed in the time-domain, from DG solutionscomputed in the frequency-domain. To assess the accuracy of thesurface wave and of the direct wave, I have here used a discretiza-tion of 12 cells and three cells per minimum wavelength inunstructured triangular meshes for P1 and P2 interpolations,respectively. The comparison between the DG and analyticalseismograms shows that sufficiently accurate solutions for bothwave types are achieved for these discretizations. For the Garvintest computed in the frequency-domain at 5 Hz, Fig. 3 reviews theaccuracy of the three interpolation orders as functions of themesh size.

The P2 interpolation provides the best consistency between thediscretizations required by the forward problem to computesufficiently accurate solutions (Figs. 2 and 3) and the spatialresolution of FWI (i.e., the mesh interval corresponding to a quarterof a wavelength), considering my formulation based on constantproperties per cell. Higher orders of interpolation would allow oneto use coarser meshes for the same level of accuracy, but thesecoarse meshes would lead to an undersampling of the medium forFWI. Therefore, high orders of interpolation should be applied onunnecessarily fine meshes, at the expense of the computationalefficiency, and should, therefore, be rejected for FWI applications.As I will illustrate later, lower orders of interpolation (i.e., P0 and P1)can, however, be useful as an alternative to P2, depending on themedium properties. Please note that, if heterogeneous propertieswere considered in cells, higher orders of interpolation could leadto more computationally-efficient modelling engines for FWI. Thisapproach has not been investigated so far.

For computational efficiency, an interesting feature of the DGmethod is the possibility of mixing the interpolation orders(p-adaptivity), because the degrees of freedom are not shared by

Tim

e (s

)

1000 2000 3000 4000Distance (m)

00

1

2

3

es) solutions for the Garvin test. The seismograms represent: (a) the horizontal and

Tim

e (s

)

1000 2000 3000 4000Distance (m)

00

1

2

3

Tim

e (s

)

1000 2000 3000 4000Distance (m)

00

1

2

3

Fig. 2. Time-domain comparison between DG P2 (solid lines) and analytical (dashed lines) solutions for the Garvin test. The seismograms represent: (a) the horizontal and

(b) the vertical components of the particle velocity at the free surface.

0.001

0.01

0.1

1

5 10 15 20

rela

tive

L2 e

rror

Discretization size (average cell edge size/wavelength)

P0 structuredP1 unstructuredP2 unstructured

Fig. 3. Relative L2 error ðJuref�uJ2=Juref J2Þ at the free surface between the analytical

solution (uref) and the numerical solutions (u) computed with the DG P0, P1 and P2

interpolation orders as a function of the mesh size. The relative L2 error is computed

for the 5 Hz frequency, and for the horizontal and vertical geophone components.


neighbouring cells in the DG formalism. An efficient mix is seen inP0–P1, because both interpolations share the same discretizationrule (i.e., 10 cells per wavelength), and therefore, they facilitatethe meshing at the transition zones between the P0 and P1

interpolations.Table 1 summarizes the computational time and memory

requirements for the DG method for two simple models, wherethe P0, P1, P0–P1 and P2 orders of interpolation are used. These have,respectively, a discretization of: 10 cells per minimum wavelengthin a structured equilateral mesh; 10 cells per local wavelength in anunstructured h-adaptive mesh; 10 cells per local wavelength with astructured/unstructured hp-adaptive mesh; and three cells perlocal wavelength in an unstructured h-adaptive mesh. It is worthremembering that these four combinations of interpolation orderand discretization give similar accuracies (Fig. 3). Simulations areperformed in parallel using 16 processors. The first model is ahomogeneous half space, and the second one is a two-layer halfspace. The first simulation clearly shows that the P0 interpolationon the regular mesh provides the best time and memory perfor-mances when the model contains a limited range of velocity. Inthe second model, the thickness of the upper layer is one tenth ofthe thickness of the half space, and the velocity in the upper layer isone tenth of the velocity of the lower layer. The upper layer mimicsa complex weathering zone in the near surface, which requires avery fine mesh and, therefore, a dramatic increase in the number ofcells when regular structured meshes are used (as it should also be

with FD). The P0–P1 mesh uses h-adaptive unstructured cells (P1) inthe upper layer and structured equilateral cells (P0) in the lowerlayer. In this case, the P2 interpolation on the h-adaptive unstruc-tured mesh provides the best time and memory performances, dueto the local refinement of the mesh (Table 1). These two extremecanonical cases illustrate how the heterogeneity and the velocityrange of the subsurface drive the choice of the mesh type andinterpolation order. In the case of a broad velocity bandwidth in themodel, P2 interpolation used with h-adaptive unstructured meshesshould give the best parameterization. In the case of a narrowvelocity bandwidth, the P0 or the P0–P1 interpolations should bemore relevant. If a flat free surface is present without liquid, the P0

interpolation can be used, while the P0–P1 interpolation should beused when the model contains a liquid/solid interface in a shallowwater environment or a complex topography above a shallowweathered layer. For example, a layer of unstructured cellscomputed with P1 allows discretization of a complex topographyor water/solid interface of arbitrary shape, above a P0 regularequilateral mesh, discretizing a more homogeneous medium. TheP0–P1 approach is close in spirit to the hybrid FE/FD methodproposed by Moczo et al. (1997). We can, however, note that theP1 interpolation used alone never appears to be the optimal choice.

Figs. 4–6 further illustrate the mesh design for three realisticoffshore and onshore models. The first represents a velocity modelof the Valhall gas field in a shallow-water environment in the NorthSea (Munns, 1985). The sea bed is at 70-m depth. The narrowvelocity range in most parts of the model requires the use of aregular mesh as much as possible for computational efficiency. Tocorrectly discretize the shallow-water layer, a p-adaptivity imple-mented with a mixed P0–P1 interpolation is chosen: a refinedunstructured P1 layer of cells is used for the first 130 m of thesubsurface for accurate modelling of the interface waves at theliquid/solid interface, and for accurate positioning of the sources,which are located 5 m below the surface, and of the receivers,which are on the sea bed. The second example represents anonshore foothill model that contains a complex topography, severaldipping structures, and a thin weathered layer at the surface(VP¼700 m/s, VS¼350 m/s): a dramatic configuration for regulargrid-based methods. An h-adaptive unstructured mesh that issuitable for the P2 interpolation order was designed for accuratemodelling of the free-surface effects of the topography, and for thelocal adaptation of the cell size to the local wave speeds. The thirdexample corresponds to a crustal-scale model, which representsthe Nankai subduction zone, offshore of Japan (Operto et al., 2006).We use a P2 h-adaptive unstructured mesh to deal with the broadvelocity bandwidth spanned by the crustal model. For FWIapplications in such kind of environments, the hp-adaptive

Table 1Computational resources required for the forward problem solved with DGs P0, P1, P0–P1 and P2, in two simple cases, on 16 processors.

Test Resource P0 P1 P0–P1 P2

Homogeneous Cell numbers 113 097 136 724 116 363 12 222

Degrees of freedom 565 485 2 050 860 695 795 366 660

Time for factorization (s) 2.1 59.2 4.1 9.4

Memory for factorization (Gb) 2.39 19.80 3.09 4.57

Time to solve 116 RHS (s) 6.2 28.3 8.1 6.1

Two-layers Cell numbers 2 804 850 291 577 247 303 32 664

Degrees of freedom 14 024 250 4 373 655 2 360 405 979 920

Time for factorization (s) 333.2 109.5 31.8 22.1

Memory for factorization (Gb) 80.50 33.26 14.37 10.84

Time to solve 116 RHS (s) 190.4 72.4 30.2 16.2

Fig. 4. The synthetic Valhall P-wave velocity model representative of oil and gas shallow water targets, and a close-up of the mesh used for mixed DG P0–P1 simulations.

Fig. 5. The synthetic IFP/TOTAL Foothill P-wave velocity model representative of onshore targets with complex topographies and weathering zones, and a close-up of the mesh

for DG P2 simulations.

Fig. 6. The crustal P-wave velocity model representative of the Japanese Nankai trough, and a close-up of the h-adaptive unstructured mesh for DG P2 simulations.


strategies just mentioned can be used during the forward model-ling. The unstructured mesh is designed before inversion accordingto the velocity distribution in the starting model. As the velocitymodel is updated over FWI iterations, a quality control of the meshcan be necessary to check that the discretization criterion of theforward problem is still satisfied. If not, a remeshing can benecessary to locally refine the mesh according to the velocityupdate. This quality control and the remeshing are not automatedso far and require human intervention. The meshing and remeshingsteps are performed with Triangle software (Shewchuk, 1996) thatallows a local control of cells size.

3. Inverse problem

3.1. Theory review

We start with a brief review of FWI. A more extensive overview ispresented in Virieux and Operto (2009). FWI is an optimizationproblem that is generally recast as a local non-linear least-squaresproblem that attempts to minimize the misfit between the recordedand the modelled wavefields (Tarantola, 1987). The inverse problemcan be formulated in the frequency-domain (Pratt and Worthington,1990), and the associated objective function to be minimized is


defined by

CðmÞ ¼Xnf

if ¼ 1

Xns

is ¼ 1

1

2DdðmÞtWdDdðmÞ, ð4Þ

whereDdðmÞ ¼ dobs�dcalcðmÞ is the data misfit vector, the differencebetween the observed data dobs and the modelled data dcalc(m)computed in the model m updated at the iteration n of the inversion.dcalc(m) is obtained by applying a sampling operator S to the incidentwavefield u. Superscripts t and � indicate the transposed and theconjugate, respectively, and Wd is a diagonal weighting matrix that isapplied to the misfit vector to scale the relative contributions of eachof its components. The summations in Eq. (4) are performed over thens sources and a group of nf simultaneously inverted frequencies.

With the objective function assumed to be locally parabolic, itsminimization with the Newton method provides the followingexpression of the perturbation model dmðnÞ:

BðnÞdmðnÞ ¼ �GðnÞ, ð5Þ

where B(n) and GðnÞ denote the Hessian matrix and the gradient ofthe objective function at iteration n, respectively.

The gradient G of the objective function with respect to themodel parameters m¼ fmigi ¼ 1,N , where N denotes the number ofunknowns, can be derived from the adjoint-state formalism usingthe back-propagation technique (Plessix, 2006). The adjoint wave-field l can therefore be computed as

Al ¼ StWdDd, ð6Þ

considering that the matrix A is symmetric in the domain ofinterest (outside the PML) thanks to the reciprocity of Greenfunctions. This gives for the ith component of the gradient G:

Gmi¼�

Xnf

if ¼ 1

Xns

is ¼ 1

R ut @At

@mil

( ), ð7Þ

where R denotes the real part of a complex number. The gradient iscomputed by a zero-lag correlation in time, between the incidentwavefield u from the source, and the adjoint back-propagatedwavefield l, using residuals at receiver positions as a compositesource. Therefore, only two forward problems per shot are requiredfor gradient building. The radiation pattern of the diffraction by themodel parameter mi is denoted by the sparse matrix @A=@mi.

Due to the computational cost of building the Hessian operatorand solving Eq. (5), Newton and Gauss–Newton methods aregenerally not considered for realistically sized problems (Prattet al., 1998). Steepest-descent or conjugate-gradient methods thatare preconditioned by the diagonal terms of an approximateHessian are more conventionally used to compute dmðnÞ (Prattet al., 1998; Operto et al., 2006; Shin et al., 2001).

The model is then updated with an optimal step-length aðnÞ,computed by a linear inversion (Tarantola, 1987) or by parabolafitting (Sourbier et al., 2009a)

mðnþ1Þ ¼mðnÞ þaðnÞdmðnÞ: ð8Þ

3.2. Inversion algorithm

CPU-efficient frequency-domain FWI is generally carried out bysuccessive inversions of single frequencies, by proceeding from thelow frequencies to the higher ones (Pratt and Worthington, 1990;Sirgue and Pratt, 2004). This defines a multiresolution frameworkthat mitigates the nonlinearity of the inverse problem associatedwith high frequency cycle-skipping artefacts. CPU-efficient algo-rithms can be designed by selecting a few coarsely sampledfrequencies, such that the wavenumber redundancy, which resultsfrom dense sampling of frequencies and aperture angles, isdecimated. This strategy has proven to be effective for several

applications of acoustic FWI (Ravaut et al., 2004; Operto et al.,2006; Brenders and Pratt, 2007a). However, it might lack robust-ness when complex wave phenomena are present, such as P-to-Sconversions, multiples and surface waves.

A two-level hierarchical procedure (Algorithm 1) to mitigate theFWI nonlinearities in the case of elastic onshore problems wasproposed by Brossier et al. (2009a). This algorithm embeds threemain loops:

1.
The outer loop is over the frequency groups; namely, a set offrequencies that are simultaneously inverted. In the case ofcomplex wave phenomena, the use of simultaneous frequenciesin the inversion procedure constrains better the optimizationfor convergence towards the global minimum, taking intoaccount the more redundant information contained in thesimultaneous frequencies.
2.
The second loop is over the time-damping factors that controlthe amount of information over time that is preserved in theseismograms for inversion. Time-damping is performed infrequency-domain modelling through the use of complex-valued frequencies, which is equivalent to damp seismogramsin time by an exponential decay (Shin et al., 2002; Brenders andPratt, 2007b). Moreover, this data preconditioning provides asignificant improvement in the signal-to-noise ratio with realdata (Brenders et al., 2009).
3.
The third loop is over iterations of the non-linear problem forone frequency group inversion.
Algorithm 1. Two-level hierarchical frequency-domain FWIalgorithm

1:
for ig ¼ 1 to ng do 2: for id¼1 to nd do 3: while (NOT convergence AND nonmax) 4: for if¼1 to nf do 5: Compute incident wavefields u from sources 6: Compute residual vectors Dd and cost function C 7: Compute back-propagated wavefields k
8:
Build gradient vector GðnÞ (with Algorithm 2) 9: end for 10: Compute perturbation vectordmðnÞ (with Algorithm 3) 11: Define optimal step length aðnÞ by parabola fitting 12: Update model mðnþ1Þ ¼mðnÞ þaðnÞdmðnÞ
13:
end while 14: end for 15: end for
This algorithm was applied successfully for imaging the elasticparameters of onshore and offshore synthetic data in Brossier et al.(2009a, b).

3.3. Gradient computation

The gradient computation with the adjoint-state method isdescribed formally by Eq. (7). For computational time and memoryuse efficiency, some adaptations and approximations can beapplied to compute the gradient vector.

Let me first introduce several notations. The matrix A1 denotesthe impedance matrix resulting from the discretization of the first-order velocity/stress formulation of the elastodynamic equations(Eq. (2)). The matrix A2 denotes the impedance matrix resultingfrom the discretization of the second-order velocity formulationof the elastodynamic equations. This matrix can be derived from thefirst-order velocity/stress system through a parsimonious formula-tion (Luo and Schuster, 1990), as developed in Brossier et al. (2008)


for DG P0. Note, however, that building the A2 matrix through theparsimonious formulation is more complicated for higher order ofinterpolation, particularly when p-adaptivity is used. This complex-ity drove me to perform the forward problem with the first-ordermatrix A1. Subscript pk denotes the k order of interpolation usedduring forward problem modelling. The projection matrix PP0

interpolates a pk wavefields vector at the barycentre of each cell,giving a P0 representation of fields. Finally the restriction operator PV

allows one to extract only the velocity components from a velocity/stress wavefield vector.

Considering these notations, Eq. (7) can be rewritten as

Gmi¼�

Xnf

if ¼ 1

Xns

is ¼ 1

R utpk

@A1tpk

@mikpk

( ): ð9Þ

Because the first-order velocity/stress and the second-ordervelocity systems are theoretically equivalent for the computing ofthe velocity wavefields, Eq. (9) can be rewritten, without approx-imation, as

Gmi¼�

Xnf

if ¼ 1

Xns

is ¼ 1

R utpk

PtV

@A2tpk

@miPVkpk

( )¼�

Xnf

if ¼ 1

Xns

is ¼ 1

R utV

@A2tpk

@mikV

( ):

ð10Þ

This expression allows one to compute the gradient G consider-ing only the velocity wavefields (uV, kV ), the stress wavefields aretherefore not required. Since the incident and adjoint wavefieldsare stored in core-memory for all the sources in the algorithm, areduction of the memory storage of the wavefields by a factor twoand a half is obtained keeping only velocity wavefields.

Since the matrix A2 is difficult to build for P1 and P2 interpola-tion orders, Eq. (10) is not used in practice. Instead, we perform themodelling with the first-order velocity–stress matrix A1pk

asmentioned above and extract the modelled data dcalc at receiverpositions (either, the pressure or the particle velocities dependingwhether the hydrophone or the geophone components areinverted) for each source before deleting the stress wavefieldsfor gradient computation. Second, we use a P0 radiation patternmatrix @A2p0

=@mi instead of a Pk one, because the second-orderimpedance matrix A2p0

is easier to build. The P0 representation ofthe radiation pattern requires to project the Pk incident and the Pk

adjoint velocity wavefields onto the P0 mesh. This projection stepleads to an additional saving of memory and computational timecompared to that achieved with Eq. (10).

This leads to the following approximate expression of thegradient:

Gmi¼�

Xnf

if ¼ 1

Xns

is ¼ 1

R utpk

PtV Pt

P0

@A2tp0

@miPP0

PVkpk

( )

¼�Xnf

if ¼ 1

Xns

is ¼ 1

R utV=P0

@A2tp0

@mikV=P0

( ): ð11Þ

The approximation of the gradient, resulting from the P0

projection of the Pk incident and adjoint wavefields and from theP0 radiation pattern matrix on coarse mesh, can be justified by thefact that the gradient computation does not need to be discretizedwith the same order of accuracy as the impedance matrix used inthe forward problem (Sourbier et al., 2009b). This approximationallows one to save a factor of three and six, both in computationaltime to build the gradient and in core-memory to store the velocitywavefields, when the forward problem is solved with DG P1 and P2,respectively. This approximation introduces, however, some errorsin the gradient. In the unfavourable case of P2 interpolation andcoarse meshes of four cells per wavelength, these errors have been

estimated to be less that nine percents on individual sensitivitykernels, which affects marginally the inversion results in theframework of iterative FWI. Please note, however, that only theincident and adjoint wavefields are P0 projected for gradientbuilding. The data residualDd is computed without approximationusing the k interpolation order before the Pk modelling of theadjoint wavefield.

Finally, the gradient vector is computed through Eq. (11) withAlgorithm 2.

Algorithm 2. Algorithm of gradient computation

1:
for mi ¼ 1 to mN do 2: Compute @A2t
P0=@mi

3:
for is¼1 to ns do 4: Compute b’ð@A2P0
=@miÞuV=P0through full vector/sparse

matrix product routine
5: Compute Gmi
’Gmi�RðbtkV=P0

Þ with sparse vector/full

vector dot product routine
6: end for 7: end for
3.4. L-BFGS and multi-parameter inversion

FWI is generally solved with the steepest-descent or conjugate-gradient methods, preconditioned by a diagonal approximation ofthe Hessian (Pratt et al., 1998; Operto et al., 2006; Shin et al., 2001).Limited-memory, quasi-Newton methods, such as the limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) method(Nocedal, 1980), can, however, provide an efficient alternative toclassic steepest-descent or conjugate-gradient methods for FWIproblems (Epanomeritakis et al., 2008; Symes, 2008; Brossier et al.,2009a).

The Hessian matrix allows for: correction of the gradientfrom the geometrical amplitude spreading of the incident andadjoint wavefields; improvement of the imaging resolution bycorrectly deconvolving the gradient from limited-bandwidtheffects, taking into account the off-diagonal terms of the Hessian;suitable scaling of the model perturbation vector for differentparameter classes; and accounting for double scattering effects(Pratt et al., 1998; Nocedal and Wright, 1999). L-BFGS providesan approximation of the inverse H of the Hessian matrix fromfew gradient and model difference vectors, stored from previousiterations. Moreover, the double-loop recursive algorithm designedby Nocedal (1980) does not explicitly build and store H(n),but directly computes the perturbation vector dmðnÞ ¼ �HðnÞGðnÞwith additions, differences and inner products of vectors. Adiagonal approximation of the inverse Hessian computed fromthe diagonal terms of the approximate Hessian (Pratt et al., 1998;Operto et al., 2006) or pseudo-Hessian (Shin et al., 2001), can beprovided to the L-BFGS algorithm for a better and faster estimationof HðnÞGðnÞ.

In the framework of the inversion of multiple classes ofparameter with different units and amplitudes (for example,reconstruction of wave speeds, density, attenuation factors), theL-BFGS algorithm requires some normalizations for consistentdimensionless computation of the model perturbation vector.Algorithms 3 and 4 review the computation of the perturbationmodel with appropriate normalizations that account for thedifferent classes of parameters in the L-BFGS algorithm (Nocedal,1980, modified following Gratton (personal communication)). Thenormalization of the gradient, the diagonal Hessian, and theperturbation model are performed according to a characteristicvalue mi0 that is defined for each parameter class, taken as the meanof the initial model of each parameter class. This normalization is


equivalent to the computation of the gradient and the diagonalHessian for normalized parameters mi/mi0, where i denotes theparameter class.

Algorithm 3. Perturbation computation: outer part

1:
if n¼1 then 2: Compute diagonal of pseudo-Hessian B0 (Shin et al., 2001) 3: for i¼1 to nparameter_class do 4: Normalize diagonal of Hessian ~B0 mi
¼ B0mim2

i0

5:
end for 6: Apply damping to the diagonal of Hessian (Levenberg–
Marquardt method) ~B0 ¼~B0þeI

7:
end if 8: for i¼1 to nparameter_class do
9:
Normalize gradient ~GðnÞmi¼ GðnÞmi
mi0

10:
end for
11:
Compute ~DðnÞ ¼ ~B0�1 ~GðnÞ
12:
if n4k then 13: Discard the vector pair ðyn�k�1,sn�k�1Þ
14:
end if 15: if n41 then
16:
Compute and store sn�1’mðnÞ�mðn�1Þ
17:
Compute and store yn�1’ ~DðnÞ� ~Dðn�1Þ
18:
end if
19:
Compute ~dmðnÞ
with Algorithm 4
20: for i¼1 to nparameter_class do
21:
Denormalize perturbation vector dmðnÞ ¼ ~dmðnÞ
mi0

22:
end for
Algorithm 4. Perturbation computation: L-BFGS algorithm(Nocedal and Wright, 1999, p. 178), modified following Gratton(personal communication)

1:
q’ ~DðnÞ
2:
for i¼n�1 to n�k do 3: ri’
1

yti si

4:
ai’risti q
5:
q’q�aiyi
6:
end for 7: gn’
stn�1

yn�1

ytn�1

yn�1

8:
r’gnq 9: for i¼n�k to n�1 do 10: b’riy
ti r

11:
r’rþsiðai�bÞ 12: end for 13: ~dm
ðnÞ¼ �r

Fig. 7 shows the FWI results in a canonical configuration: 113horizontal point force sources are located all around the target each50 m, and each source wavefield is recorded by 112 two-compo-nent geophones all around the target each 50 m. Absorbingboundary conditions are used on the four sides of the domain tosimulate an infinite medium. The true models contain one circularheterogeneity of 200 m diameter at different locations for each ofthe five parameters: P-wave and S-wave velocities (VP and VS),density ðrÞ, and P-wave and S-wave quality factors (Qp and Qs)(Fig. 7). Table 2 summarizes the background and heterogeneityvalues for each parameter. The inversion is performed for 10frequencies between 4 and 13 Hz each 1 Hz, starting from homo-geneous background models. The inverted parameters are l, m

(the Lame parameters), r, Qp and Qs, to deliberately invert para-meters with contrasted amplitudes (Qp and QsH101

� 102 while land mH108

� 109). Despite these differences, the normalizationprocedure allows for a correct estimation of the Hessian throughL-BFGS, and allows for a convergence towards acceptable modelswithout any need for heuristic weighting of each parameter class.Note that the weighting provided by the inverse Hessian estimationdoes not require the performing of extra forward problems,whereas the subspace method of Sambridge et al. (1991), whichis generally used in multi-parameter inversions, requires theperforming of one extra forward problem per parameter class toestimate the steplength. Note, however, that in Fig. 7 there aresome coupling effects between some parameters that result fromthe physics of the diffraction pattern of each parameter class, andnot from the choice of the optimization method.

The algorithm allows one to reconstruct one quality factor foreach wave type (Qp and Qs). A non-dispersive attenuation model iscurrently used for both modelling and inversion. A dispersivemodel, as the Kolsky–Futterman model (Kolsky, 1956; Futterman,1962), could, however, be taken into account without extracomputational cost. Note, however, that reliable reconstructionof attenuation factors from seismic data is a difficult task thatstrongly depends on acquisition geometry (Mulder and Hak, 2009).A detailed analysis of the ability of my algorithm to reconstructattenuation is beyond the scope of this paper.

The reader can refer to Brossier et al. (2009a, b) for realisticapplications of that strategy to onshore and offshore synthetic datafrom the imaging of P-wave and S-wave velocity models.

4. Parallelization and performance

To tackle imaging of realistically sized targets, the computa-tionally demanding FWI method needs to be implemented inparallel to exploit multiple CPU facilities, also with a large amountof available core-memory, compared to sequential execution.

4.1. Forward problem

The forward problem of FWI, as the resolution of the linearsystem of Eq. (3) for multiple sources (right-hand-side (RHS)), isperformed with the massively parallel direct solver MUMPS(Amestoy et al., 2006), which performs a LU decomposition ofthe matrix A, based on a multi-frontal approach (Duff and Reid,1983). The parallelization allows for the speeding up of thefactorization by more than one order of magnitude, compared tosequential execution, and for the distribution of the LU factors inthe core-memory over all of the processors (Sourbier et al., 2009b),which makes this quite efficient for solving large problems withoutintensive I/O resources. The MUMPS software is based on messagepassing interface (MPI) standards for applications with distributedmemory architectures. The workflow of the MUMPS algorithm canbe summarized as: (1) analysis of the A matrix pattern (note thatunknowns are treated independently in the current version of thealgorithm, without taking advantage of the block structure pro-vided by elements), followed by reordering of the matrix to limitthe fill-in during factorization. The analysis step is performedsequentially on the master processor with the current MUMPSversion (but does not modify the workflow if done in parallel); (2)parallel factorization of the A matrix with dynamic pivoting; (3)parallel forward and backward substitutions to obtain solutionvectors u from RHS vectors b.

Table 2Physical parameters in the background and heterogeneities for the canonical

inversion test.

Parameter Background

value

Heterogeneity

value

VP (m/s) 1500 1800

VS (m/s) 1000 800

r (kg/m3) 1000 1400

QP 100 10

QS 100 10

Fig. 7. Example of multi-parameter inversion. (a–e) True models: (a) VP, (b) VS, (c)r, (d) QP and (e) QS models. (f–j) Reconstructed FWI models: (f) l, (g)m, (h)r, (i) QP and (j) QS.


4.2. Overview of the parallel inverse problem

In the framework of iterative FWI, the physical properties of themodel change at each iteration, but the mesh geometry is notchanged. Therefore, the A matrix keeps the same pattern, allowing asingle analysis phase in the whole inversion procedure. This featureis interesting because the matrix analysis is the only time-demand-ing step performed sequentially on the master process. After thesubstitution step, the solution vectors u are returned by MUMPS in adistributed form over the in-core memory of the processors.However, the distribution of the solutions performed by MUMPSdoes not always provide a well-balanced load for the processors,because the MUMPS strategy is to optimize LU decomposition withaccurate dynamic pivoting and reduced fill in. The distribution istherefore not suitable for the application of the projection operatorPP0

PV to the local in-core solutions (i.e., the solutions at the differentnodes for one wavefield of a triangular cell are not systematicallystored in-core on the same processor). To overcome this limitation,the solutions are re-ordered following well-balanced mesh parti-tioning, performed with the METIS software (Karypis and Kumar,1999). The MUMPS-distributed solutions are mapped to the METISdecomposition with MPI point-to-point communications, before theapplication of operator PP0

PV . This mapping/projection step repre-sents about 25% of the MUMPS substitution time. The forward andback-propagated wavefields are stored in the in-core memory for allsources. The gradient and the diagonal of the Hessian are thereforeefficiently computed in parallel with local in-core wavefields for thelocal subdomain associated with each process, before being cen-tralized on the master processor for perturbation-model building. Ofnote, the assemblage and storage of the sparse A matrix areperformed by the master process, as the storage of the mesh-related

tables, the wavefield solutions at the receiver positions for comput-ing the objective function, and the composite residual sources foradjoint-wavefield computation. To avoid prohibitive memory allo-cation on the master processor, the master process is not involved inthe factorization and substitution/projection steps, the wavefieldstorage, and the gradient/Hessian computation.

4.3. Two parallelism levels

The scalability of parallel direct solvers is intrinsically limitedbecause of the large amount of communications between the MPIprocesses, and the memory overhead generated by the number of MPIprocess involved in the factorization (Sourbier et al., 2009b). There-fore, the single parallelization level with MPI cannot be adapted forlarge-scale applications. A second parallelism level, based on multi-threading, can, however, be efficiently implemented on multi-corearchitectures. The main computationally demanding parallel tasks ofthe FWI algorithm are the forward problem resolutions (factorizationand substitutions/projection) and the building of the gradient anddiagonal Hessian.

The MUMPS solver uses the Basic Linear Algebra Subroutines(BLAS) library mainly for level 2 and 3 operations (matrix/vector,and matrix/matrix operations). Several BLAS libraries are availablein multi-thread distributions, which allow an efficient low-level ofparallelization for multi-core architectures with shared memory.

The gradient and diagonal Hessian computations have severalnested loops embedded (Algorithm 2). In addition to the MPI-baseddomain-decomposition parallelism, the outer loop over the local modelparameters can be easily parallelized with shared-memory threadtechnology (as with standard OpenMP) to speed up the loop on theavailable computing-cores without duplicating or exchanging data.

An alternative to classical parallelization using one MPI processper computing-core is, therefore, to run one MPI process on n

computing-cores that share the same memory (physically on thesame node), and to launch n shared-memory threads per MPI-processduring the BLAS operations, and the gradient and diagonal Hessiancomputations. These two levels of parallelism can mitigate thememory overhead of the factorization, the network usage forintensive communications, and the computational time for LU factors.

A third level of parallelism could also be implemented overfrequencies and/or time-damping factors (Shin and Min, 2006).This third level of parallelism has, however, not been investigatedin the current algorithm.


4.4. Performance

Simulations are performed on a distributed-memory architec-ture composed of 18 nodes, each of which includes two quad-core2.7 GHz Opteron processors with 64 GB of shared memory. Thenodes are interconnected by an Infiniband ConnectX network. Thiscluster is located at Geoazur Institute, and it provides 144computing-cores and 1.125 TB of memory.

A first test corresponds to a middle-scale application that isfocused on the elastic Valhall model (Fig. 4). A 1 042 720-cell meshwas designed with the Triangle software (Shewchuk, 1996). The meshis composed of a thin unstructured layer on top for P1 interpolation,and a large zone of equilateral structured triangles for P0 in depth. Thismesh allows frequencies of up to 7 Hz to be modelled in the elasticcase. The substitution/projection phase is performed for 315 RHSs,corresponding to 315 explosive sources. Fig. 8 shows the evolution ofthe computational times and the memory usage for the maininversion steps as a function of the number of computing-cores,using one- and two-level parallelism.

We clearly see a decrease in the computational time when thecomputing-cores number increases, although the efficiency is limitedby the intrinsic scalability of the MUMPS direct solver. The best trade-off between performance and computational resources is obtainedwith 24 computing-cores, with one MPI-process per computing-coreused. When two levels of parallelism are used with shared-memorythreads, the in-core memory required by the factorization decreases,an appealing feature if the memory available is limited. However,the memory saving is at the expense of the computational time,particularly for the substitution/projection step. Here, I postulate thatfor this middle-scale application, the speed-up provided by the multi-thread computation is not significant with respect to the timeoverheads due to thread launching, leading to a limited timeperformance for this application.

0

20

40

60

80

100

120

140

160

10 20 30 40 50 60

Ela

psed

Tim

e (s

)

computing-cores number

Factorization time1 MPI-process/core, 1 Thread/MPI-process

1 MPI-process/2 cores, 2 Threads/MPI-process1 MPI-process/4 cores, 4 Threads/MPI-process

0

100

200

300

400

500

10 20 30 40 50 60

Ela

psed

Tim

e (s

)


Substitution/projection time for 315 RHS

1 MPI-process/core, 1 Thread/MPI-process1 MPI-process/2 cores, 2 Threads/MPI-process1 MPI-process/4 cores, 4 Threads/MPI-process

Fig. 8. Computational time and memory use for the main inversion tasks for a middle-

strategy, as indicated.

A second test focuses on a large-scale elastic crustal model(Fig. 6). An h-adaptive unstructured mesh was designed withTriangle for DG P2 modelling up to 6.5 Hz, leading to 528 080 cells.To mimic the SFJ-OBS experiment (Operto et al., 2006), 93 sourceswere considered, corresponding to 93 ocean-bottom seismometers,for the substitution/projection phase. Fig. 9 summarises the com-putational times and memory usage for this test. Note that the timefor gradient building is negligible compared to that of the MUMPSsteps, due to the small number of RHSs and the cell number in themesh. For this large-scale application that requires a large amount ofdistributed memory to store the 200 GB of LU factors, shared-memory-threads technology is the alternative choice to the one-level parallelism, to minimize the number of MPI processes, andtherefore the memory overheads during LU factorization. The dataobtained with multiple threads show that both the computationaltime for the factorization and the substitution/projection steps canbe decreased, while saving a significant amount of core memory.

The results of these two applications suggest that the two-levelparallelism over MPI processes and shared-memory threads that areembedded in my FWI algorithm provide an efficient and flexible tool totackle realistic size target imaging. Depending on the computingfacilities available and the size of the case study, a judicious combina-tion of the two levels of parallelism can be found for optimalperformances in terms of computational time and core-memory usage.

5. Conclusion

A massively parallel algorithm was developed to perform fre-quency-domain full-waveform modelling and inversion for imaging 2Dvisco-elastic media. The forward problem of inversion, as the resolutionof the elastodynamics equation in the frequency-domain, is solvedusing a low-order finite element discontinuous Galerkin method. This

15

20

25

30

35

10 20 30 40 50 60

Mem

ory

Gb


Total memory usage for factorization1 MPI-process/core, 1 Thread/MPI-process


0

50

100

150

200

10 20 30 40 50 60

Ela

psed

Tim

e (s

)


Gradient building time1 MPI-process/core, 1 Thread/MPI-process


scale application, as a function of the number of computing-cores, and the parallel

400

450

500

550

600

650

700

750

800

850

60 80 100 120 140

Ela

psed

Tim

e (s

)


Factorization time


150

200

250

300

350

60 80 100 120 140

Mem

ory

Gb


Total memory usage for factorization


60

80

100

120

140

160

180

200

60 80 100 120 140

Ela

psed

Tim

e (s

)


Substitution/projection time for 93 RHS1 MPI-process/core, 1 Thread/MPI-process


0

1

2

3

4

5

60 80 100 120 140

Ela

psed

Tim

e (s

)


Gradient building time1 MPI-process/core, 1 Thread/MPI-process


Fig. 9. Computational time and memory use for the main inversion tasks for a large-scale application, as a function of the number of computing-cores, and the parallel strategy,

as indicated.


allows for complex topographies and high velocity-contrasts to betaken into account, and to use unstructured h-adaptive triangularmeshes combined with p-adaptive interpolations. The linear system,resulting from this discretization is solved with the parallel MUMPSdirect solver, allowing the LU factors to be stored in a distributed formover the processors. The wavefield solutions are also stored in adistributed form, following mesh partitioning for a well-balancedworkload over the processors. The inverse problem of the FWI issolved with a non-linear local optimization scheme. The implementa-tion of the gradient of the objective function has been optimized forcomputational time and memory saving. A quasi-Newton L-BFGSoptimization has been implemented to estimate the inverse of theHessian matrix at a negligible computational cost, improving thereconstruction of several classes of parameter (P-wave and S-wavespeeds, density, and attenuation factors QP and QS). A two-parallelismlevel through MPI-processes and shared-memory threads has beenimplemented for optimal use of the computational time and core-memory resources, which will depend on the computer facilities.

This FWI algorithm will be used to tackle the imaging of realisticallysized targets at various scales, from near-surface geotechnic applica-tions to crustal-scale exploration. Ongoing studies include the exten-sion to anisotropic (vertically transverse isotropic) media and theadaptation to teleseismic configurations for lithospheric imaging.

Acknowledgements

This study was funded by the SEISCOPE consortium (http://seiscope.oca.eu), which is sponsored by BP, CGG-VERITAS, ENI,EXXON-MOBIL, SHELL and TOTAL, and by Agence Nationale de laRecherche (ANR) under project ANR-05-NT05-2-42427. The author

is grateful to S. Operto (Geoazur, CNRS) and J. Virieux (LGIT,Universite Joseph Fourier) for fruitful and stimulating discussionson full waveform modelling and inversion, on parallel implementa-tion, and for manuscript review. Many thanks also go to SergeGratton (CERFACS and CNES) for interesting discussions on quasi-Newton optimization methods. The LU factorization of the impe-dance matrix was performed with MUMPS (http://graal.ens-lyon.fr/MUMPS/index.html). The mesh generation was performed with thehelp of Triangle (http://www.cs.cmu.edu/�quake/triangle.html).The Garvin problem analytical solutions were computed with codeprovided by U. Iturraran and F. J. Sanchez-Sesma. Access to the high-performance computing facilities of the Mesocentre SIGAMM com-puter center, the IDRIS national center (project 082280) and theGEOAZUR computing center, provided the required computerresources. Many thanks go to J. Kommendal and L. Sirgue fromBP, for providing the elastic synthetic models of Valhall. I would liketo thank the two anonymous reviewers for their fruitful comments.

References

Amestoy, P.R., Guermouche, A., L’Excellent, J.Y., Pralet, S., 2006. Hybrid schedulingfor the parallel solution of linear systems. Parallel Computing 32, 136–156.

Berenger, J.-P., 1994. A perfectly matched layer for absorption of electromagneticwaves. Journal of Computational Physics 114, 185–200.

Bielak, J., Loukakis, K., Hisada, Y., Yoshimura, C., 2003. Domain reduction method forthree-dimensional earthquake modeling in localized regions, part I: theory.Bulletin of the Seismological Society of America 93 (2), 817–824.

Bohlen, T., Saenger, E.H., 2006. Accuracy of heterogeneous staggered-grid finite-difference modeling of Rayleigh waves. Geophysics 71, 109–115.

Brenders, A., Pratt, R., Charles, S., 2009. Waveform tomography of 2-d seismic data inthe canadian foothills—data preconditioning by exponential time-damping. In:Expanded Abstracts. EAGE, p. U041.

http://seiscope.unice.fr

http://seiscope.unice.fr

http://graal.ens-lyon.fr/MUMPS/index.html

http://graal.ens-lyon.fr/MUMPS/index.html

http://www.cs.cmu.edu/&sim;quake/triangle.html

http://www.cs.cmu.edu/&sim;quake/triangle.html


Brenders, A.J., Pratt, R.G., 2007a. Efficient waveform tomography for lithosphericimaging: implications for realistic 2D acquisition geometries and low frequencydata. Geophysical Journal International 168, 152–170.

Brenders, A.J., Pratt, R.G., 2007b. Full waveform tomography for lithosphericimaging: results from a blind test in a realistic crustal model. GeophysicalJournal International 168, 133–151.

Brossier, R., Operto, S., Virieux, J., 2009a. Seismic imaging of complex onshorestructures by 2D elastic frequency-domain full-waveform inversion. Geophy-sics 74 (6), WCC63–WCC76.

Brossier, R., Operto, S., Virieux, J., 2009b. Two-dimensional seismic imaging of theValhall model from synthetic OBC data by frequency-domain elastic full-waveform inversion. SEG Technical Program Expanded Abstracts, Houston 28(1), 2293–2297.

Brossier, R., Virieux, J., Operto, S., 2008. Parsimonious finite-volume frequency-domain method for 2-D P-SV-wave modelling. Geophysical Journal Interna-tional 175 (2), 541–559.

Bunks, C., Salek, F.M., Zaleski, S., Chavent, G., 1995. Multiscale seismic waveforminversion. Geophysics 60 (5), 1457–1473.

Chaljub, E., Capdeville, Y., Vilotte, J., 2003. Solving elastodynamics in a fluid-solidheterogeneous sphere: a parallel spectral element approximation on non-conforming grids. Journal of Computational Physics 187, 457–491.

Duff, I.S., Reid, J.K., 1983. The multifrontal solution of indefinite sparse symmetriclinear systems. ACM Transactions on Mathematical Software 9, 302–325.

Dumbser, M., Kaser, M., 2006. An arbitrary high order discontinuous Galerkinmethod for elastic waves on unstructured meshes II: the three-dimensionalisotropic case. Geophysical Journal International 167 (1), 319–336.

Epanomeritakis, I., Akc-elik, V., Ghattas, O., Bielak, J., 2008. A Newton-CG method forlarge-scale three-dimensional elastic full waveform seismic inversion. InverseProblems 24, 1–26.

Etienne, V., Chaljub, E., Virieux, J., Glinsky, N., 2010. An hp-adaptive discontinuousGalerkin finite-element method for 3D elastic wave modelling. GeophysicalJournal International 183 (2), 941–962.

Futterman, W., 1962. Dispersive body waves. Journal of Geophysics Research 67,5279–5291.

Garvin, W.W., 1956. Exact transient solution of the buried line source problem.Proceedings of Royal Society of London 234, 528–541.

Hesthaven, J.S., Warburton, T., 2008. Nodal Discontinuous Galerkin Method.Algorithms, Analysis, and Application. Springer, New York.

Hustedt, B., Operto, S., Virieux, J., 2004. Mixed-grid and staggered-grid finitedifference methods for frequency domain acoustic wave modelling. GeophysicalJournal International 157, 1269–1296.

Jo, C.H., Shin, C., Suh, J.H., 1996. An optimal 9-point, finite-difference, frequency-space 2D scalar extrapolator. Geophysics 61, 529–537.

Karypis, G., Kumar, V., 1999. A fast and high quality multilevel scheme forpartitioning irregular graphs. SIAM Journal on Scientific Computing 20 (1),359–392.

Kaser, M., Dumbser, M., 2006. An arbitrary high order discontinuous Galerkinmethod for elastic waves on unstructured meshes I: the two-dimensionalisotropic case with external source terms. Geophysical Journal International166, 855–877.

Kolsky, H., 1956. The propagation of stress pulses in viscoelastic solids. PhilosophicalMagazine 1, 693–710.

Komatitsch, D., Vilotte, J.P., 1998. The spectral element method: an efficient tool tosimulate the seismic response of 2D and 3D geological structures. BulletinSeismology Society of America 88, 368–392.

Luo, Y., Schuster, G.T., 1990. Parsimonious staggered grid finite-differencing of thewave equation. Geophysical Research Letters 17 (2), 155–158.

Marfurt, K., 1984. Accuracy of finite-difference and finite-elements modeling of thescalar and elastic wave equation. Geophysics 49, 533–549.

Moczo, P., Bystricky, E., Carcione, J.M., Bouchon, M., 1997. Hybrid modeling of P-SVseismic motion at inhomogeneous viscoelastic topographic structures. BulletinSeismology Society of America 87, 1305–1323.

Mulder, W.A., Hak, B., 2009. Simultaneous imaging of velocity and attenuationperturbations from seismic data is nearly impossible. In: Expanded Abstracts.p. S043.

Munns, J.W., 1985. The Valhall field: a geological overview. Marine and PetroleumGeology 2, 23–43.

Nocedal, J., 1980. Updating quasi-Newton matrices with limited storage. Mathe-matics of Computation 35 (151), 773–782.

Nocedal, J., Wright, S.J., 1999. Numerical Optimization. Springer, New York, US.Operto, S., Virieux, J., Dessa, J.X., Pascal, G., 2006. Crustal imaging from multifold

ocean bottom seismometers data by frequency-domain full-waveform tomo-graphy: application to the eastern Nankai trough. Journal of GeophysicalResearch 111 (B09306). doi:10.1029/2005JB003835.

Plessix, R.-E., 2006. A review of the adjoint-state method for computing the gradientof a functional with geophysical applications. Geophysical Journal International167 (2), 495–503.

Pratt, R.G., 1990. Inverse theory applied to multi-source cross-hole tomography. PartII: elastic wave-equation method. Geophysical Prospecting 38, 311–330.

Pratt, R.G., Shin, C., Hicks, G.J., 1998. Gauss–Newton and full Newton methods infrequency-space seismic waveform inversion. Geophysical Journal Interna-tional 133, 341–362.

Pratt, R.G., Worthington, M.H., 1990. Inverse theory applied to multi-sourcecross-hole tomography. Part I: acoustic wave-equation method. GeophysicalProspecting 38, 287–310.

Ravaut, C., Operto, S., Improta, L., Virieux, J., Herrero, A., dell’Aversana, P., 2004.Multi-scale imaging of complex structures from multi-fold wide-apertureseismic data by frequency-domain full-wavefield inversions: application to athrust belt. Geophysical Journal International 159, 1032–1056.

Remaki, M., 2000. A new finite volume scheme for solving Maxwell’s system.COMPEL 19 (3), 913–931.

Saenger, E.H., Bohlen, T., 2004. Finite-difference modelling of viscoelastic and aniso-tropic wave propagation using the rotated staggered grid. Geophysics 69, 583–591.

Sambridge, M.S., Tarantola, A., Kennett, B.L., 1991. An alternative strategy for non-linear inversion of seismic waveforms. Geophysical Prospecting 39, 723–736.

Sears, T., Singh, S., Barton, P., 2008. Elastic full waveform inversion of multi-component OBC seismic data. Geophysical Prospecting 56 (6), 843–862.

Shewchuk, J.R., May 1996. Triangle: engineering a 2D quality mesh generator anddelaunay triangulator. In: Lin, M.C., Manocha, D. (Eds.), Applied ComputationalGeometry: Towards Geometric Engineering. Lecture Notes in Computer Science,vol. 1148. Springer-Verlag, pp. 203–222 from the First ACM Workshopon Applied Computational Geometry.

Shin, C., Jang, S., Min, D.J., 2001. Improved amplitude preservation for prestack depthmigration by inverse scattering theory. Geophysical Prospecting 49, 592–606.

Shin, C., Min, D.-J., 2006. Waveform inversion using a logarithmic wavefield.Geophysics 71 (3), R31–R42.

Shin, C., Min, D.-J., Marfurt, K.J., Lim, H.Y., Yang, D., Cha, Y., Ko, S., Yoon, K., Ha, T.,Hong, S., 2002. Traveltime and amplitude calculations using the damped wavesolution. Geophysics 67, 1637–1647.

Sirgue, L., Pratt, R.G., 2004. Efficient waveform inversion and imaging: a strategy forselecting temporal frequencies. Geophysics 69 (1), 231–248.

Sourbier, F., Operto, S., Virieux, J., Amestoy, P., L’Excellent, J.-Y., 2009a. Fwt2d: a massivelyparallel program for frequency-domain full-waveform tomography of wide-apertureseismic data-part 1: algorithm. Computers & Geosciences 35 (3), 487–495.

Sourbier, F., Operto, S., Virieux, J., Amestoy, P., L’Excellent, J.-Y., 2009b. Fwt2d: amassively parallel program for frequency-domain full-waveform tomographyof wide-aperture seismic data-part 2: numerical examples and scalabilityanalysis. Computers & Geosciences 35 (3), 496–514.

Stekl, I., Pratt, R.G., 1998. Accurate viscoelastic modeling by frequency-domain finitedifference using rotated operators. Geophysics 63, 1779–1794.

Symes, W.W., 2008. Migration velocity analysis and waveform inversion. Geophy-sical Prospecting 56, 765–790.

Tarantola, A., 1984. Inversion of seismic reflection data in the acoustic approxima-tion. Geophysics 49 (8), 1259–1266.

Tarantola, A., 1987. Inverse Problem Theory: Methods for Data Fitting and ModelParameter Estimation. Elsevier, New York.

Toksoz, M.N., Johnston, D.H., 1981. Seismic Wave Attenuation. Geophysics reprintseries, No. 2. Society of Exploration Geophysicists, Tulsa, OK.

Virieux, J., Operto, S., 2009. An overview of full waveform inversion in explorationgeophysics. Geophysics 74 (6), WCC127–WCC152.

dx.doi.org/10.1029/2005JB003835

Two-dimensional frequency-domain visco-elastic full waveform inversion … · 2013-10-22 · Full...

Documents

Transcript of Two-dimensional frequency-domain visco-elastic full waveform inversion … · 2013-10-22 · Full...