
ANNALS OF GEOPHYSICS, VOL. 50, N. 1, February 2007

Key words  magnetotelluric method – inverse problem – controlled random search – Markov chain Monte Carlo – neighbourhood algorithm

1. Introduction

Interpretation of geoelectrical induction data in seismoactive and volcanic areas often requires a directed search for characteristic structural details and quantitative parameters of the deep structure that could be directly compared to results of other, generally non-electrical, investigations. Typical questions may be, e.g., correlation

Stochastic interpretation of magnetotelluric data, comparison of methods

Václav Červ (1), Michel Menvielle (2)(3) and Josef Pek (1)

(1) Geophysical Institute, Academy of Sciences of the Czech Republic, Prague 4, Czech Republic

(2) Centre d’Études des Environnements Terrestre et Planétaire, Saint Maur des Fosses Cedex, France
(3) Département des Sciences de la Terre, Université Paris Sud XI, Orsay Cedex, France

Abstract

Global optimization and stochastic approaches to the interpretation of measured data have recently attracted particular attention as tools for directed search for and/or verification of characteristic structural details and quantitative parameters of the deep structure, which is a task often arising when interpreting geoelectrical induction data in seismoactive and volcanic areas. We present a comparison of three common global optimization and stochastic approaches to the solution of a magnetotelluric inverse problem for thick layer structures, specifically the controlled random search algorithm, the stochastic sampling by the Monte Carlo method with Markov chains, and its newly suggested approximate, but largely accelerated, version, the neighbourhood algorithm. We test the algorithms on a notoriously difficult synthetic 5-layer structure with two conductors situated at different depths, as well as on the experimental COPROD1 data set standardly used to benchmark 1D magnetotelluric inversion codes. The controlled random search algorithm is a fast and reliable global minimization procedure if a relatively small number of parameters is involved and a search for a single target minimum is the main objective of the inversion. By repeated runs with different starting test model pools, a sufficiently exhaustive mapping of the parameter space can be accomplished. The Markov chain Monte Carlo gives the most complete information for the parameter estimation and their uncertainty assessment by providing samples from the posterior probability distribution of the model parameters conditioned on the experimental data. Though computationally intensive, this method shows good performance provided the model parameters are sufficiently decorrelated. For layered models with mixed resistivities and layer thicknesses, where strong correlations occur and even different model classes may conform to the target function, the method often converges poorly, and even very long chains do not guarantee fair distributions of the model parameters according to their probability densities. The neighbourhood resampling procedure attempts to accelerate the Monte Carlo simulation by approximating the computationally expensive true target function by a simpler, piecewise constant interpolant on a Voronoi mesh constructed over a set of pre-generated models. The method performs relatively fast but seems to suggest systematically larger uncertainties for the model parameters. The results of the stochastic simulations are compared with the standard linearized solutions, both for thick layer models and for smooth Occam solutions.

Mailing address: Dr. Václav Červ, Geophysical Institute, Academy of Sciences of the Czech Republic, Boční II/1401, 14131 Prague 4, Czech Republic; e-mail: [email protected]



of clusters of seismic sources with boundaries of blocks with a significant conductivity contrast, detection, spatial mapping and conductivity estimates of magmatic plumes, delineation of groundwater layers and of deep aquifers, etc. Due to the inherent ambiguity of the inverse problem solution with noisy and incomplete data, those structural features may be missed by single-model linearized inversions, or smeared over large intervals of depths if smoothness of the inverse model is emphasized by applying a roughness penalty for regularization.

Stochastic approaches to the interpretation of measured data have recently gained attention in geophysical applications (e.g., Mosegaard and Tarantola, 1995). They are particularly attractive as tools for estimating the model parameters from samples from their probability distributions, for quantifying uncertainties in the estimated model parameters, and for structural hypothesis testing. The advantage of the stochastic simulations is that they aim at performing the search throughout the parameter space and pick up the models according to their probabilities, measured by the misfit and by priorities for specific model features. The main difficulty of the stochastic inversions is that they are carried out via often extremely computer-intensive simulations.

Mapping the parameter space has long been an ambition of global minimization methods in geophysical inversions (e.g., Sen and Stoffa, 1995). Though these methods primarily aim at searching for a global minimum of a function, they always operate on a large population of models and rely on random factors within their optimization strategy to reduce the risk of being trapped at local minima of the target function. By this mechanism, the optimization procedure generates a topography map of the target function along a series of randomly perturbed paths in the parameter space, with usually dense coverage in the vicinity of the minima. The model population generated by a global optimization procedure does not, however, generally follow the true probability density function of the models.

In this paper we aim at demonstrating how ensembles of models produced by selected global optimization procedures compare with results of single-model linearized inversions as well as with results of a probabilistic sampling. As representatives of the global minimization procedures, we have selected the Controlled Random Search (CRS) method (Price, 1977) and the Neighborhood Algorithm (NA) by Sambridge (1999a), both being relatively fast global minimizers with only low demands on a model-conditioned fine-tuning of the optimization parameters. A fully stochastic sampling is based on a simple implementation of the Markov Chain Monte Carlo (MCMC) procedure adopted from Grandis et al. (1999) and modified for thick layer 1D MT inversion with both variable resistivities and layer thicknesses. The results are also compared with the standard Occam smooth inversion (Constable et al., 1987) and the layered linearized inversion by Weaver and Agarwal (1993).

As to the model structures, we restrict ourselves to 1D layered MT models here, which are better suited if structural details like sharp electrical boundaries, thin sheets or sandwich structures with a tectonic significance are targets of the search. Moreover, the ease of computing 1D MT direct solutions allows us to carry out all the computer-intensive tests more completely than would be possible under the serious limitations imposed by using 2D or even 3D direct modelling codes. Specifically, we test the selected optimization procedures on a simple COPROD1 benchmark model and then analyze in detail a synthetic model with two conductive layers separated by a non-conductor. The latter example presents a generally non-elementary problem, especially as regards the detection and resolution of the deeper conductor, and is typical of tectonic settings in many active areas.
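The ease of the 1D MT direct problem mentioned here comes from the standard layered half-space impedance recursion, which can be sketched as follows (a generic textbook implementation with our own function names, not the authors' code):

```python
import numpy as np

MU0 = 4e-7 * np.pi  # vacuum permeability [H/m]

def mt1d_impedance(periods, rho, h):
    """Surface impedance of a layered half-space via the standard
    bottom-up recursion.  periods in s, resistivities rho in Ohm.m
    (rho[-1] is the basement), thicknesses h in m (one per finite layer)."""
    omega = 2.0 * np.pi / np.asarray(periods, dtype=float)
    Z = np.empty(omega.size, dtype=complex)
    for i, w in enumerate(omega):
        z = np.sqrt(1j * w * MU0 * rho[-1])       # basement intrinsic impedance
        for rj, hj in zip(rho[-2::-1], h[::-1]):  # recurse upward
            z0 = np.sqrt(1j * w * MU0 * rj)       # layer intrinsic impedance
            k = np.sqrt(1j * w * MU0 / rj)        # propagation constant
            t = np.tanh(k * hj)
            z = z0 * (z + z0 * t) / (z0 + z * t)
        Z[i] = z
    return Z

def apparent_resistivity_phase(periods, rho, h):
    """Apparent resistivity [Ohm.m] and impedance phase [deg]."""
    Z = mt1d_impedance(periods, rho, h)
    omega = 2.0 * np.pi / np.asarray(periods, dtype=float)
    return np.abs(Z) ** 2 / (omega * MU0), np.degrees(np.angle(Z))
```

For a uniform half-space this reduces to the textbook result ρ_a = ρ and a 45° phase, which is a convenient sanity check of the recursion.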

2. Global optimization and stochastic sampling methods

2.1. Controlled Random Search

Controlled Random Search (CRS) is a simple yet often very efficient optimization algorithm first suggested by Price (1977). It starts with a sufficiently large pool of randomly generated models, say p_i, i = 1, ..., N_P, with the target values ϕ(p_i), where each vector p_i consists of N_S parameters that describe the physical model. The CRS algorithm then proceeds by employing a set of heuristic rules to generate a new test model from the current pool, say p_T = R(p_1, ..., p_NP, rand), where ‘rand’ symbolizes a random factor that diversifies the model population and reduces the risk of the algorithm being trapped in the local minima of the target function. If the new test model p_T is better, in terms of the target misfit, than the currently worst model in the pool, it is used to replace the latter. If, moreover, ϕ(p_T) is less than the currently best model misfit in the pool, then many heuristics suggest carrying out an additional detailed local search in the vicinity of this successful point. By repeatedly applying the above steps to the model population, the model pool develops and moves towards regions with better target values until a termination criterion is met, defined via a target threshold, by a minimum of a parameter change between successive iterations, or by setting a limit on the number of iterations.

Particular versions of the CRS algorithm largely differ by the heuristics used for generating the new test model. The original CRS algorithm by Price (1977) uses a simplex of N_S + 1 models randomly chosen from the current pool. The new test model is generated by mirroring the (N_S + 1)-st model from the simplex through the centroid of the first N_S models,

p_T = (2/N_S) Σ_{i=1}^{N_S} p_i − p_{N_S+1} .

This CRS version will be further referred to as CRS1.
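One iteration of the CRS1 scheme just described might be sketched as follows (a simplified illustration with our own names; the optional detailed local search around a new best model is omitted):

```python
import numpy as np

def crs1_step(pool, misfits, target, rng):
    """One CRS1 iteration: pick a random simplex of n_par+1 pool models,
    reflect its last member through the centroid of the others, and
    replace the worst pool model if the trial has a lower misfit.
    `target` maps a parameter vector to its misfit value."""
    n_pool, n_par = pool.shape
    idx = rng.choice(n_pool, size=n_par + 1, replace=False)
    simplex = pool[idx]
    centroid = simplex[:-1].mean(axis=0)
    trial = 2.0 * centroid - simplex[-1]      # p_T = 2*centroid - p_{NS+1}
    f_trial = target(trial)
    worst = np.argmax(misfits)
    if f_trial < misfits[worst]:              # replace currently worst model
        pool[worst] = trial
        misfits[worst] = f_trial
    return pool, misfits
```

Iterating this step drives the whole pool towards regions of low misfit, exactly as described in the text.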

A number of other heuristics are discussed and compared, e.g., in Tvrdík et al. (2002). One of the approaches showing good performance in 1D MT inversions is a heuristic proposed by Montaz et al. (1997), referred to as CRS6 in what follows. It generates the new test model by fitting, component by component, parabolas through the currently best model and two further models selected randomly from the current pool. The parameters of the new trial model p_T are then given by the minima of the individual parabolas.
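The component-wise parabolic step of CRS6 can be sketched with the standard vertex formula for a parabola through three points (a simplified illustration with our own names; the published heuristic includes further safeguards, e.g. for degenerate parabolas, that are omitted here):

```python
import numpy as np

def parabola_vertex(x1, y1, x2, y2, x3, y3):
    """Abscissa of the vertex of the parabola through (x1,y1),(x2,y2),(x3,y3).
    Assumes the three abscissae are distinct and the parabola is not degenerate."""
    num = (x2 - x1) ** 2 * (y2 - y3) - (x2 - x3) ** 2 * (y2 - y1)
    den = (x2 - x1) * (y2 - y3) - (x2 - x3) * (y2 - y1)
    return x2 - 0.5 * num / den

def crs6_trial(pool, misfits, rng):
    """CRS6-style trial model: per parameter, fit a parabola through the
    best model and two random pool models (misfit vs. parameter value)
    and take the vertex abscissa as the trial coordinate."""
    best = np.argmin(misfits)
    others = np.delete(np.arange(len(pool)), best)
    a, b = rng.choice(others, size=2, replace=False)
    trial = np.empty(pool.shape[1])
    for k in range(pool.shape[1]):
        trial[k] = parabola_vertex(pool[best, k], misfits[best],
                                   pool[a, k], misfits[a],
                                   pool[b, k], misfits[b])
    return trial
```

For a quadratic misfit the vertex formula recovers the minimum exactly, which explains the fast convergence of CRS6 near a well-behaved minimum.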

For a given number of model parameters to be estimated, the performance of the CRS algorithms in 1D MT inversions depends primarily on the size of the pool from which new test models are generated. The results indicate that CRS6 is largely superior to CRS1 as regards the convergence speed. However, the CRS1 simplex algorithm seems to explore the parameter space better and shows a lower tendency to end up in secondary local minima. In general, CRS with repeated runs presents a relatively fast and reliable global optimization algorithm, which rapidly provides us with information about the complete spectrum of the minima of the target function under study.

2.2. Neighborhood algorithm

Recently, a new global optimization procedure was suggested by Sambridge (1999a) for a seismic inversion, based on an iterative evolution of the initial model pool by randomly sampling the target function in the immediate vicinity of the pool models. The vicinity of a model is defined in a most natural way, as a subregion in the parameter space that comprises all the nearest points with regard to this particular model. Formally, the nearest neighborhood of a pool model p_i, i ∈ {1, ..., N_P}, consists of all models p which meet the condition ||p − p_i||_2 ≤ ||p − p_j||_2, j ∈ {1, ..., N_P}, j ≠ i. In this way, the whole parameter space is divided into a system of convex Voronoi cells, each of them representing the nearest neighborhood of a particular pool model.

The optimization by the Neighborhood Algorithm (NA) proceeds by carrying out N_P/N_r random steps within the N_r selected Voronoi cells with the best target values, thus generating N_P new models in the regions of minimum misfit in each algorithm step. The random walks within the Voronoi cells are carried out by using the standard Gibbs sampler (Gelman et al., 1995; also see next section) restricted to the selected cell. Sambridge (1999a) demonstrated both the optimization and numerical efficiency of the NA algorithm, which adapts fast to the shape of the underlying target function and relatively rapidly provides a map of the domains around the target minima. A particular choice of the factor N_r controls the preference of the algorithm for a local search or for global exploration.
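The cell-membership test and one NA step can be sketched as follows. Note the simplification: Sambridge's NA walks each cell exactly with a restricted Gibbs sampler, whereas this sketch uses plain rejection sampling around the cell centre, and all names are ours:

```python
import numpy as np

def in_cell(p, pool, i):
    """True if p lies in the Voronoi cell of pool model i, i.e.
    ||p - p_i|| <= ||p - p_j|| for all j != i."""
    d = np.linalg.norm(pool - p, axis=1)
    return d[i] <= d.min() + 1e-12

def na_iteration(pool, misfits, target, n_r, scale, rng):
    """One simplified NA step: add one new model inside each of the n_r
    best Voronoi cells, drawn by rejection sampling around the cell centre
    (the exact NA uses a Gibbs walk within the cell instead)."""
    best = np.argsort(misfits)[:n_r]
    for i in best:
        while True:
            cand = pool[i] + scale * rng.standard_normal(pool.shape[1])
            if in_cell(cand, pool, i):      # keep only points in cell i
                break
        pool = np.vstack([pool, cand])
        misfits = np.append(misfits, target(cand))
    return pool, misfits
```

Repeating this step concentrates new samples in the currently best cells, which is the mechanism that lets N_r trade local refinement against global exploration.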

Numerical tests for a 1D inverse MT problem indicate that the NA performs quite similarly to the CRS algorithms above.


2.3. Monte Carlo with Markov Chains

Stochastic sampling methods are used to generate model samples distributed proportionally to the likelihood of the models. The sample probability distribution can then be used to numerically approximate Bayesian integrals that provide single parameter estimates or information about their credibility. The Monte Carlo methods with Markov Chains (MCMC) are algorithms to generate sample ensembles for the Monte Carlo integration.
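As a minimal illustration of that Monte Carlo integration step, posterior means and credible intervals can be read off a sample ensemble directly (a sketch with our own names, not the authors' code):

```python
import numpy as np

def bayes_summaries(samples, lo=5.0, hi=95.0):
    """Monte Carlo approximation of Bayesian integrals from an MCMC
    sample ensemble (n_models x n_parameters): posterior mean and a
    [lo, hi] percentile credible interval for each parameter."""
    mean = samples.mean(axis=0)                         # approximates E[p | d]
    lo_q, hi_q = np.percentile(samples, [lo, hi], axis=0)
    return mean, lo_q, hi_q
```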

In the Bayesian approach, the inverse problem solution is represented by the posterior probability distribution of the model given the data and prior information on the model (e.g., Grandis et al., 1999)

Prob(p|d) ∝ Prob(d|p) Prob(p),

Prob(d|p) ∝ exp[ −(1/2) Σ_{l=1}^{N_D} ( (d_l^mod − d_l^obs) / δd_l^obs )² ]

where Prob(p) is the prior probability distribution, and the likelihood function Prob(d|p) in the form given above assumes that the data items follow a normal distribution law N(d_l^obs, δd_l^obs), l = 1, ..., N_D. If no specific prior information on the model parameters is available, we can assume, for a layered Earth model, that the prior densities of the logarithms of both the resistivity and the thickness are constant within sufficiently broad limits for each layer. We use this non-informative prior in all the MCMC simulations presented in Section 4.
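The posterior just described, with a Gaussian likelihood and a prior that is flat in the logarithms of the layer parameters within broad bounds, can be sketched as follows (function and argument names are ours; `forward` stands for any 1D MT direct solver):

```python
import numpy as np

def log_posterior(params, d_obs, d_err, forward, bounds):
    """Unnormalized log posterior log Prob(p|d) for a Gaussian likelihood
    and a non-informative prior: constant inside broad bounds on the
    log10 layer parameters, zero (-inf in log) outside.
    params : log10 of resistivities and thicknesses
    forward: maps params to predicted data d^mod"""
    lo, hi = bounds
    if np.any(params < lo) or np.any(params > hi):
        return -np.inf                       # prior probability zero
    resid = (forward(params) - d_obs) / d_err
    return -0.5 * np.sum(resid ** 2)         # log Prob(d|p) up to a constant
```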

Without going further into detail here (see, e.g., Gelman et al., 1995; Grandis et al., 1999, for reference), the idea behind the MCMC is to create a random walk, or Markov process, that has Prob(p|d) as its stationary distribution and then to run the process long enough so that the resulting sample closely approximates a sample from Prob(p|d).

The Gibbs sampler is one of the particular methods used to construct such a Markov process. The Markov chain relies on updating the model parameters in the process of successively scanning the parameter domain under study. In each scan, we update one single model parameter, say p_k, by drawing a new value from the one-dimensional conditional probability distribution Prob(p_k | d, p_1, ..., p_{k−1}, p_{k+1}, ..., p_{N_S}), i.e. with all parameters except the k-th one fixed at their current values. One MCMC step is completed after all N_S model coordinates have been updated. In this way, an ergodic Markov chain is designed with an invariant probability law identical with Prob(p|d). The posterior marginal probability distributions for the parameters, as well as various Bayesian integrals, are then estimated from the successive simulated values of this Markov chain.
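One scan of the sampler can be sketched as follows, with each conditional draw implemented on a discrete grid of candidate values, a common practical choice for non-analytic posteriors (a sketch with our own names; the authors' implementation may differ):

```python
import numpy as np

def gibbs_scan(p, log_post, grids, rng):
    """One Gibbs-sampler scan: each parameter p_k in turn is redrawn from
    its one-dimensional conditional Prob(p_k | d, all other parameters),
    evaluated on a grid of candidate values grids[k]."""
    p = p.copy()
    for k, grid in enumerate(grids):
        logp = np.empty(grid.size)
        for j, v in enumerate(grid):
            p[k] = v                        # all other coordinates fixed
            logp[j] = log_post(p)
        w = np.exp(logp - logp.max())       # subtract max for stability
        w /= w.sum()                        # normalized conditional weights
        p[k] = rng.choice(grid, p=w)        # draw the new value of p_k
    return p
```

Running many such scans after a burn-in period yields the sample ensemble from which the marginals and Bayesian integrals are estimated.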

2.4. Resampling with the neighborhood algorithm (NAR)

MCMC is a computer-intensive method since a series of direct problem solutions is required for each parameter scan within the Gibbs sampler. If an ensemble of direct solutions already exists from computations carried out previously, an effective resampling procedure can be applied to this model ensemble, as suggested by Sambridge (1999b). The models generated, e.g., by previous global optimization runs are used to approximate the misfit function by the neighborhood algorithm, i.e. by a piecewise constant function on a Voronoi mesh with cells centered at the available models. Then, the MCMC sampling is carried out for that surrogate target function in exactly the same way as the full MCMC simulation above, but without any additional direct problem solution required. Clearly, the success of this approach highly depends on how good a support the underlying model ensemble gives to the true misfit function, which is a delicate question in multidimensional parameter spaces, and might also be a problem-dependent issue.
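The piecewise constant surrogate amounts to a nearest-neighbour lookup over the pre-generated ensemble; a sketch (names ours):

```python
import numpy as np

def nar_surrogate(ensemble, misfits):
    """Piecewise constant NAR surrogate of the misfit on the Voronoi mesh
    of a pre-generated ensemble: any point inherits the misfit of its
    nearest ensemble model, so no new forward solutions are needed."""
    def surrogate(p):
        i = np.argmin(np.linalg.norm(ensemble - p, axis=1))
        return misfits[i]
    return surrogate
```

The returned `surrogate` can then be passed to any MCMC sampler in place of the true, forward-modelling-based misfit.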

3. Test data sets

We selected two specific data sets to analyze the performance of the stochastic optimization procedures in situations often encountered in various interpretations of electromagnetic induction data. The first Data Set (DSI) is synthetic and is derived from a model with a shallow conductor over a deeper conductive target. This general setting occurs in a variety of practical situations, especially if a layer of conductive sediments screens deeper layers containing fluids, but also in large-scale settings like those of an electric asthenosphere screened by a conductive lower crust. It is generally known that the deep conductor is difficult to resolve unless its conductance is considerably higher than that of the shallower screening layer.

The particular data generating model used here was adopted from Grandis et al. (1999). The model for DSI is shown in fig. 1a-d, marked by a gray-white contrast in all sub-panels, and its parameters are given in the figure caption. The conductances of the layers, S = h/ρ, are 2.4, 16, 20 and 25 Siemens, from top to bottom, and indicate that there is only little chance to resolve the deep thin conductor situated at a depth of 3 km. It is clear from fig. 1a-d that, for 5% of Gaussian noise added to the synthetic MT data, no common linearized inversion gives any indication that a deeper conductor exists beneath the shallow conductive layer. They all rather extend the intermediate third layer to greater depths and smear the deep thin conductor over a depth range of more than 2 km into the high-resistivity basement.
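The quoted conductances follow directly from S = h/ρ. Note that ρ3 is taken here as 100 Ωm, the value consistent with the quoted conductance of 20 S for a 2 km thick layer (a check, not the authors' code):

```python
# Layer conductances S = h / rho for the DSI model.
# rho3 is assumed to be 100 Ohm.m, the value consistent with S3 = 20 S.
h   = [600.0, 400.0, 2000.0, 250.0]   # layer thicknesses [m]
rho = [250.0, 25.0, 100.0, 10.0]      # layer resistivities [Ohm.m]
S = [hi / ri for hi, ri in zip(h, rho)]   # -> 2.4, 16, 20, 25 Siemens
```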

The second Data Set (DSII) is the experimental COPROD1 set introduced and analyzed in detail by Jones and Hutton (1979a,b), later used for demonstrating the performance of the Occam 1D inverse approach by Constable et al. (1987), and commonly employed as a benchmark data set for performance tests of 1D inverse techniques in magnetotellurics. The COPROD1 data represent a set of 15 pairs of log apparent resistivity and MT phase data items within the period range of 20 to 2000 s and

Fig. 1a-d. Results of four different linearized inversion routines applied to the synthetic data set DSI. The data generating model is marked by a gray-white contrast in all sub-panels, and its layer parameters are: layer 1 – h1 = 0.6 km, ρ1 = 250 Ωm; layer 2 – h2 = 0.4 km, ρ2 = 25 Ωm; layer 3 – h3 = 2 km, ρ3 = 100 Ωm; layer 4 – h4 = 0.25 km, ρ4 = 10 Ωm; basement 5 – ρ5 = 1000 Ωm. MT data were generated from this model for 41 log-regularly spaced periods within the range from 10^−3 to 10 s, and the synthetic impedances were further contaminated with Gaussian noise with a standard deviation of 5% of the maximum impedance element for each period. Inverse results in the sub-panels correspond to: a) Occam inversion with roughness penalty (Constable et al., 1987); b) Occam inversion with total variation penalty (Portniaguine and Zhdanov, 1999); c) Occam inversion with gradient support penalty (Portniaguine and Zhdanov, 1999); d) inversion for a minimum number of layers according to Weaver and Agarwal (1993). In the boxes below the panels, the respective regularization weights (reg), misfit (RMS2) values, and other routine-specific parameters are given.


correspond to a relatively simple structure with one pronounced conductive layer at a lower crustal/upper mantle depth. Nonetheless, the data set is far from elementary, especially owing to its error structure, which does not seem to follow any simple noise model.

4. Search in the parameter space: model simulations for the test data sets

4.1. Synthetic data set

In the subsequent numerical tests we concentrate on the comparison of the performance of the selected global and stochastic optimization procedures for parameter estimation and uncertainty quantification in simple but practically relevant magnetotelluric settings. The availability of relatively simple and reliable linearized inverse procedures for 1D MT problems makes it possible to assess the extra information we can infer from sampling the whole parameter space as compared to searching directly for one specific inverse solution.

A first series of numerical experiments aimed at comparing the selected methods applied to the synthetic data set DSI. The underlying model for DSI is a 5-layer structure with a deep thin conductor that is clearly hard to detect in the surface MT data. We experimented successively with models consisting of 3 to 6 layers, including the basement, to test the outputs of the individual optimization procedures. We intentionally used very broad a priori limits for all layer parameters, specifically 10^−3 ≤ h_i ≤ 10 km and 1 ≤ ρ_i ≤ 10^4 Ωm, i = 1, ..., number of layers, to test the convergence as well as the degree of risk of the individual methods ending up in false minima.

With 3-layer test models, the best attainable RMS2 misfit was 1.88 and none of the methods could fit the data satisfactorily. Both the CRS6 (pool of 100 models, 5000 iterations, 200 runs, all models with RMS2 ≤ 4 accepted) and the MCMC (50000 steps, burn-in 10000, thinning 100) could recover the resistivity of the first layer and that of the basement with high accuracy, but they were not able to satisfactorily simulate the fine resistivity structure in the intermediate part of the model section by one single layer. The second layer does, however, reproduce the true integral conductance of the three original conductive layers with good accuracy, to within less than 5% of the true value. Both methods compensate for the insufficient resistivity contrast between the first and second layer by systematically decreasing the depth to the conductor by about 100 m as compared to the true model. Figure 2 displays the mean values of the individual parameters, together with ranges comprising 50 and 90% of all the parameter values provided by the simulation techniques. Though not as comprehensive as complete parameter histograms, these plots qualitatively provide a good idea about the correctness of the recovered parameters, their uncertainty, and the skewness of the underlying histograms.

After having increased the number of layers to four, the fit to the data improved substantially. The parameters of the CRS6 and MCMC procedures were the same as those specified for the 3-layer case above; from the CRS outputs, only models with RMS2 ≤ 1 were now accepted for constructing the histograms.

In the 4-layer case, both the CRS6 and MCMC recover the top layer resistivity almost exactly, while the thickness of the first layer is systematically greater by about 50 m than the true value and is scattered over about 40 m for the MCMC estimate, if the interval between the 25 and 75% quartiles of the parameter samples is used to measure the uncertainty. As the CRS6 was always terminated after a fixed number of 5000 iterations, repeated values of h1 during the final stage of the iteration process lead to over-optimistic estimates of the parameter uncertainties, specifically 20 m for h1, as compared to the probabilistic MCMC sampling. In contrast, the parameters of the second layer, i.e. the shallow conductor, are recovered only with large uncertainty, especially as regards its thickness. Nevertheless, both methods produce relatively sharp estimates of the conductance of this layer, within 1.5 S, which are lower by about 4 S than the true value of S2 = 16 S. The deficit of conductance in the second layer is compensated by an increased conductance of the layer beneath, which tries to account for the summary effect of the third layer in the true model as well as for that of the deep thin conductor situated on top of the basement.


As the MCMC sampling is an extremely time-consuming procedure, even for a simple 4-layer MT model, we also tested the relatively fast neighborhood resampling algorithm (NAR) by Sambridge (1999b) on the present data set DSI. We applied this procedure to a complete set of models generated by the CRS6 algorithm in 20 successive runs with different initial pools, 100 models each, and with 5000 iteration steps in each run, which makes altogether about 100000 to 150000 individual models distributed all over the parameter space, though with most of them concentrated in close neighborhoods of the main minima of the target function. We performed 20000 iterations of the NAR to produce a resampled set of models for further stochastic inference. We can see from fig. 2 that the NAR resampling gives results similar to those of the MCMC, but with clearly more diffuse statistical ranges for those model parameters that have been recovered sharply by the MCMC sampling above. A comparison of the true target misfits with those obtained by the NA interpolation on the Voronoi mesh during the NAR runs shows that the NA interpolation may extend the domain of the target minimum widely. As a consequence, models with large misfits may be accepted into the resampled chain more frequently than would be appropriate given their true likelihood.

For the 4-layer MT model tested here, the difference between the MCMC and NAR outputs can be illustrated by the stacked model samples shown in fig. 3 (top row of panels). Here, the gray scale is used to visualize the relative number of models from the sample ensemble which pass through the individual cells in the log ρ − log z plane. The plots show the limits for the resistivity indicated by the sample models as a function of depth. The fuzziness of the NAR results could be partly reduced by using a more

Fig. 2. Mean values (vertical bars) and ranges comprising 50 and 90% of all parameter values (black and gray zones, respectively) provided by the various global optimization and stochastic sampling runs applied to the DSI data set with 5% of Gaussian noise. The vertical bars in the top row indicate the true values of the parameters of the synthetic model. For specific routine parameters see the text. For the 4- and 5-layer models, the row labeled LIN shows parameter estimates (long bars) and variances (short bars) obtained for classical linearized inverse solutions. The classical parameter variances for the 5-layer model from a truncated SVD analysis (only the seven largest of altogether nine singular values were used to compute the parameter variance-covariance matrix) are shown by full lines, and those obtained with the addition of the null space of the solution are indicated by dotted lines. Black rectangles show the restrictions to the parameter acceptability range for the non-linear model (models with RMS less than one here).


accurate, but also considerably more time-consuming, interpolation scheme employing a standard inverse distance power procedure (with a power of 4 here). Unfortunately, other experiments indicate that this improvement is not systematic and that the performance of the NAR with various interpolants depends highly on the dimension of the problem and on the particular distribution of the interpolated models throughout the model space.

For comparison with a classical single-model inverse solution, we also present in fig. 2 the results of an a posteriori linearized uncertainty analysis based on the Singular Value Decomposition (SVD) of the sensitivity matrix for the 4-layer model obtained earlier by the inversion algorithm of Weaver and Agarwal (1993) (see fig. 1a-d for the model structure). The structure of the singular vectors corresponding to the four largest singular values of the sensitivity matrix for this model, weighted by the data variances, indicates that the most significant parameters in the model are h1, S3 = h3/ρ3, ρ1, and S2 = h2/ρ2, while the two smallest singular values suggest the resistivity of the basement, ρ4, and the resistance of the second layer, T2 = h2ρ2, to be the least significant parameters. This ranking is also clearly manifested in fig. 2 by the individual parameters' error bars, computed as the square roots of the diagonal elements of the parameter covariance matrix (Menke, 1989). These error bounds approximately demarcate the range of models that are equivalent, in terms of their fit to the experimental data, to the original model analyzed. Though the classical variances are qualitatively in accord with the parameter uncertainties suggested by the stochastic simulations, their magnitudes are known to be underestimated (Menke, 1989).
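This truncated-SVD error analysis can be sketched in a few lines. The sketch below is illustrative, using a toy diagonal Jacobian rather than the actual MT sensitivity matrix: the covariance is assembled from the k retained singular values and vectors, and the error bars are the square roots of its diagonal, as in Menke (1989).

```python
import numpy as np

def truncated_svd_errors(J, k):
    """Linearized parameter covariance from the k largest singular values
    of the error-weighted sensitivity matrix J; errors are sqrt(diag)."""
    U, s, Vt = np.linalg.svd(J, full_matrices=False)
    V = Vt[:k].T                               # retained singular vectors
    cov = V @ np.diag(1.0 / s[:k]**2) @ V.T
    return cov, np.sqrt(np.diag(cov))

# Toy Jacobian: parameter 1 is ten times better resolved than parameter 2
J = np.diag([10.0, 1.0])
cov_full, err_full = truncated_svd_errors(J, 2)
cov_trunc, err_trunc = truncated_svd_errors(J, 1)
```

Truncation suppresses the contribution of the small singular values, which is exactly why the truncated variances stay bounded while the full-spectrum ones blow up for poorly resolved parameters.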

Fig. 3. Gray-shade plots of stacked models produced by the CRS, MCMC and NAR runs applied to the DSI data set with 5% Gaussian noise. The top row of panels shows the results for a 4-layer model approximation, the bottom panels are for inversion with 5-layer models. The gray intensity indicates the relative number of models that pass through 0.1×0.1 sized cells in the log ρ−log z plane.


Obtaining more realistic error bounds within the classical analysis would require more sophisticated approaches to estimating the parameter limits, such as those suggested, e.g., by Pous et al. (1985) for linear or by Meju and Hutton (1992) for non-linear problems, or simply performing a random search for acceptable models in a certain vicinity of the original inverse solution.

For test models with 5 or more layers, the inverse problem becomes ill-posed and most of the inverse methods suffer in some way from parameter indefiniteness or inter-dependence. Often, the physical sense of the parameters gets lost in multi-layered models, since characteristic conductive/resistive features of the real structure cannot be attributed to the model parameters in an unambiguous way. In particular, the histograms of the model parameters are difficult to interpret in terms of marginal distributions, since they can, in fact, mix together true parameters of more than one physical domain.

The bottom panels in figs. 2 and 3 show the results of the global and stochastic sampling for 5-layer test models. Except for the resistivities of the first and last layer, which are well constrained by the data, all the other parameter histograms are diffuse and, on a more detailed scale, even multimodal. The concentrated histograms resulting from the NAR algorithm with the inverse distance power (power 4) interpolant rather suggest that the method stayed trapped in the vicinity of one specific minimum dictated by the particular distribution of the CRS set of models.

The stacked model plots in fig. 3 provide a clearer visual representation of the model samples with respect to the resistivity-depth section. They show very similar conductivity patterns for the shallow part of the structure, down to the bottom of the first conductor, for both the 4- and 5-layer models. The third layer is recovered sharply, but already shows a typical high-resistivity 'shadow' in the 5-layer case, which is an artifact due to the low sensitivity of the MT field to resistive domains beneath the first, shallow conductor. As regards the manifestation of the deep thin conductor at the depth of 3 km, less than 10% of the models in both the CRS and MCMC patterns indicate a layer of increased conductivity between 2 and 3.5 km. The standard NAR algorithm gives a largely diffuse image of the resistivity structure, which indicates that the underlying set of CRS models (from 20 runs of CRS6, with a pool of 100 models and 5000 iteration steps) does not cover the parameter space properly. This situation did not improve even when the support models were generated by the NA minimization algorithm instead of CRS.
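The stacked gray-shade images of this kind can, in principle, be built by binning the sampled layered models on a depth-resistivity grid. The sketch below is illustrative (hypothetical names, linear grid edges for brevity, whereas the figures bin in the log ρ−log z plane): it counts the fraction of models whose resistivity profile passes through each cell.

```python
import numpy as np

def stack_models(models, z_edges, rho_edges):
    """Fraction of sampled layered models passing through each cell of a
    depth/resistivity grid; each model is (thicknesses, resistivities)."""
    counts = np.zeros((len(z_edges) - 1, len(rho_edges) - 1))
    for h, rho in models:
        interfaces = np.concatenate(([0.0], np.cumsum(h), [np.inf]))
        for i in range(len(z_edges) - 1):
            zc = 0.5 * (z_edges[i] + z_edges[i + 1])   # cell-centre depth
            layer = np.searchsorted(interfaces, zc, side='right') - 1
            j = np.searchsorted(rho_edges, rho[layer], side='right') - 1
            if 0 <= j < counts.shape[1]:
                counts[i, j] += 1
    return counts / len(models)

# One toy 2-layer model: 10 km of 100 Ohm.m over a 1 Ohm.m half-space
models = [(np.array([10.0]), np.array([100.0, 1.0]))]
image = stack_models(models, np.array([0.0, 5.0, 20.0]),
                     np.array([0.5, 50.0, 500.0]))
```

Stacking many such profiles, each weighted equally, yields the relative-frequency images of fig. 3.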

Due to the massive inter-dependence of the parameters in multilayer cases, the standard MCMC Gibbs sampler mixes poorly and requires very long chains, of the order of hundreds of thousands of samples, to walk throughout the parameter space with sufficient density. Nevertheless, the resistivity pattern in fig. 3 remains quite stable after a few tens of thousands of MCMC steps, though its detailed probabilistic interpretation may be questionable for chains this short.
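A component-wise sampler of this type can be sketched as a Metropolis-within-Gibbs loop, a common practical variant. The toy quadratic target and all names below are illustrative, not the authors' implementation: one parameter is perturbed per sub-step, moves are accepted with the usual exp(old misfit − new misfit) rule, and the chain is thinned after a burn-in period.

```python
import numpy as np

rng = np.random.default_rng(0)

def misfit(m):
    """Toy quadratic target standing in for the expensive MT misfit."""
    return float(np.sum((m - np.array([1.0, -2.0]))**2))

def gibbs_mcmc(m0, n_steps, step=0.5, burn_in=100, thin=5):
    """Metropolis-within-Gibbs: perturb one parameter at a time, accept
    via the Metropolis rule, and keep every thin-th post-burn-in sweep."""
    m = m0.astype(float).copy()
    f = misfit(m)
    chain = []
    for k in range(n_steps):
        for i in range(len(m)):        # one sweep over all parameters
            trial = m.copy()
            trial[i] += rng.normal(0.0, step)
            ft = misfit(trial)
            if rng.random() < np.exp(min(0.0, f - ft)):
                m, f = trial, ft
        if k >= burn_in and k % thin == 0:
            chain.append(m.copy())
    return np.array(chain)

chain = gibbs_mcmc(np.zeros(2), n_steps=2000)
```

When the parameters are strongly correlated, such one-at-a-time updates move along the axes only, which is precisely the poor mixing described above.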

The stacked model resistivity image shows stable structural features also for an increasing number of layers in the test models, though the high-resistivity artifacts, in particular, become more pronounced in runs with many layers.

The classical SVD analysis of the 5-layer linearized inverse solution allows similar conclusions to be drawn as regards the parameter uncertainty. Contrary to the 4-layer model studied earlier, the SVD spectrum for the 5-layer model indicates that at least two singular vectors form the null space of the inverse solution, specifically those corresponding to the resistances of the second and fourth layers, T2 = h2ρ2 and T4 = h4ρ4. Then the linearized variances for the parameters involved in these combinations are not bounded within the range of physical acceptability of the models. Figure 2 shows the parameters of one particular classical 5-layer inverse solution along with their linearized variances, both for the case of a truncated SVD spectrum, with only the seven largest of the nine singular values retained for the computation of the parameter covariance matrix, and for the case of the full SVD spectrum. In the latter case, the projection of the null-space solutions onto the space of the physical parameters gives very large, or almost infinite, error bounds for the poorly resolvable parameters. The predicted linearized parameter variances have to be further verified, and properly reduced, by an extra procedure so that they do not exceed the region of acceptable models of the underlying non-linear problem.

One additional difficulty of the uncertainty analysis based on a linearized single-model inverse solution is that the SVD spectra may differ non-negligibly between different inverse models due to the non-linearity of the underlying direct problem, especially if the experimental data are sufficiently noisy, thus implying substantially different null spaces for different inverse models. In the stochastic sampling procedures, this equivalence across various classes of models may substantially deteriorate the convergence properties, but for long enough chains all the model classes should be visited and properly sampled in the course of the stochastic walk.

4.2. Experimental COPROD1 data set

We carried out a similar series of numerical simulations for the experimental data set DSII (COPROD1), with the standard data errors adopted from Constable et al. (1987). This data set is specific especially due to its error structure. We show in fig. 4 a projection of all the RMS2 misfit values obtained from twenty CRS6 runs (3-layer test models, pool size 100, 3000 iterations) as a function of the resistivity of the first layer. The figure clearly shows the bimodal character of the misfit, with a deep minimum at ρ1 close to 170 Ωm, but with RMS2 > 1, and with another broad, but very flat zone, ρ1 > 300 Ωm, with misfits RMS2 < 1. The former target minimum attracts most of the solutions of both the CRS and MCMC runs and, especially for MCMC with 3-layer models, does not allow the chain to leave its 'gravity' in reasonable time, even if the chain has been started from the minimum of the target function provided by the CRS. In this case, the CRS map of the target minima provides valuable information for constraining the parameters for the MCMC simulation.
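The RMS2 statistic used throughout is the squared misfit normalized by the data errors, so that RMS2 = 1 marks a model fitting the data to within one standard deviation on average. A minimal sketch (an illustrative definition consistent with that threshold, not code from the paper):

```python
import numpy as np

def rms2(d_obs, d_pred, sigma):
    """Error-normalized squared misfit; RMS2 = 1 means the model fits
    the data to within one standard deviation on average."""
    return float(np.mean(((d_obs - d_pred) / sigma)**2))

perfect = rms2(np.array([1.0, 2.0]), np.array([1.0, 2.0]), np.array([0.1, 0.1]))
one_sigma = rms2(np.array([1.0]), np.array([2.0]), np.array([1.0]))
```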

Figure 5 summarizes the results of the CRS, MCMC and NAR simulations with 4-layer models for the DSII data set in the form of stacked log ρ−log z gray-shade plots. As in the synthetic case, very broad a priori parameter ranges were chosen to simulate minimum prior knowledge of the parameter limits. Specifically, we set the layer thicknesses to be between 3 and 300 km and the resistivities between 1 and 10^5 Ωm, except for the first layer, with ρ1 > 180 Ωm to avoid the 'target trap' described above.

The resistivity patterns from the CRS6 (200 runs, pool size 100, 3000 iterations) and MCMC (50000 steps, burn-in 10000, thinning 100) runs show very similar structures, though the CRS pattern is sharper due to an excessive accumulation of the solutions close to the target minima. The models seem to prefer very high resistivity values for the top layer, but constraining this parameter does not change the deeper resistivity pattern noticeably. As in the synthetic case above, the NAR resampling algorithm smears the resistivity patterns over a broader range of values. In the case studied here, no dramatic improvement was achieved when the inverse distance power interpolant was used within the NAR algorithm instead of the Voronoi tessellation. Figure 5 also presents the fit of the 4-layer model data from the above simulation runs to the experimental apparent resistivities and phases.

Fig. 4. Projection of all RMS2 misfits versus the resistivity of the top layer obtained in the course of 20 CRS6 runs applied to the COPROD1 (DSII) data set with 3-layer test models. Along the axes, the corresponding histograms of the RMS2 and ρ1 values are shown.

Fig. 5. Left column: gray-shade plots of stacked models obtained from the CRS, MCMC and NAR runs applied to the COPROD1 (DSII) data set with 4-layer test models. The dashed line indicates the preferred 4-layer solution of Jones and Hutton (1979b), who minimized the misfit function without considering the confidence weights. The full line shows a 4-layer solution from the inverse procedure of Weaver and Agarwal (1993) applied to the data supplemented with error bars according to Constable et al. (1987). The central and right columns show the data fit for the apparent resistivities and phases, respectively, for models generated by the CRS, MCMC and NAR runs. The experimental COPROD1 resistivities and phases are shown by white circles with error bars according to Constable et al. (1987); the black dots are solutions for the model of Jones and Hutton (1979b).

5. Discussion and conclusions

We applied three common methods of global and stochastic optimization, specifically the CRS, MCMC and NA/NAR, to simple 1D magnetotelluric models to test their performance in problems that aim at searching for, and verifying, distinct structural features in geoelectrical sections, such as boundaries with a large conductivity contrast or conductors screened by shallow conductive structures. Such structural elements are often typical of tectonically active areas, and their detection, as well as the estimation of their geometrical and electrical parameters, along with a proper specification of the uncertainty of these estimates, are indispensable prerequisites for carrying out any analysis of possible correlations between electrical structural factors and non-electrical indicators of tectonic activity.

CRS and NA are relatively simple, in terms of tuning complexity, but effective global optimization algorithms, suitable both for a quick inspection of the target function topography and for searching for the inverse problem solution. Repeated runs of these algorithms with different randomly generated initial model populations give not only a good idea of the character and multimodality of the underlying target function, but also draw an image of the parameter bounds (uncertainties), though not in quite fair proportion to the likelihood of the models. The parameter distributions provided by the CRS or NA are affected by the tuning of the minimization procedure and may be, e.g., excessively peaked if the procedure is terminated after a fixed (and large enough) number of iteration steps, or diffuse and shifted if the optimization is stopped after a certain portion of the models in the pool, or all of them, reach a predefined misfit threshold.

Stochastic sampling techniques aim at providing unbiased model samples distributed according to the likelihood of the models. For layered MT models with very broad a priori limits on the model parameters, the MCMC Gibbs sampler shows good convergence provided the underlying test models are not largely underdetermined. In the opposite case, the Markov chain may mix poorly unless closer and more realistic parameter bounds are specified, especially for the layer thicknesses. With broad parameter ranges, MCMC shows frequent irregular transitions between fine modes of the target function and does not provide stable marginal parameter distributions unless a very large number of MCMC steps is carried out. It can, however, provide relatively quickly a stable approximate image of the resistivity versus depth distribution via stacking the models from the chain.

Resampling model ensembles produced by relatively fast global optimization procedures, such as CRS or NA, is another way of estimating the model parameters and their true uncertainty. The NAR resampling, based on approximating the misfit function by constants within an ensemble-related Voronoi mesh, gives results similar to those of MCMC if sufficiently narrow bounds on the MT model parameters can be specified a priori. For broad parameter limits and highly underdetermined models, NAR tends to produce more diffuse posterior distributions for the parameters as compared to MCMC. One likely reason for this behavior is that models produced by a fast converging CRS procedure do not provide sufficient support for the NA interpolation throughout the parameter space. As a consequence, the high-probability region may expand via the NA interpolation far beyond its true limits. In some cases, using a more accurate interpolation technique (e.g., the inverse distance power interpolant here) could improve the results, but this cannot be generalized.
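The piecewise constant Voronoi approximation at the core of NAR reduces, for a single query point, to a nearest-neighbour lookup over the support ensemble. A minimal sketch (illustrative names, not the actual NAR code):

```python
import numpy as np

def voronoi_misfit(x, ensemble, misfits):
    """NAR-style surrogate: the misfit at x equals that of the nearest
    support model, i.e. it is constant over each Voronoi cell."""
    d = np.linalg.norm(ensemble - x, axis=1)
    return float(misfits[np.argmin(d)])

support = np.array([[0.0, 0.0], [1.0, 1.0]])
support_misfits = np.array([0.5, 2.0])
near_first = voronoi_misfit(np.array([0.1, 0.0]), support, support_misfits)
near_second = voronoi_misfit(np.array([0.9, 1.0]), support, support_misfits)
```

If the support models cluster around one minimum, every query far from the cluster inherits a stored misfit value, which is how the high-probability region can spread beyond its true limits.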

As compared to standard single-model linearized inversions, the global and stochastic inverse methods provide additional information, especially as regards the possible ranges of the model parameters and the uncertainty of the parameter estimates conditioned on the observed data, though at the price of higher, and sometimes enormous, computation times. They are of particular interest in situations such as, e.g., a directed search for specific structural features, inversion with various kinds of prior information, or jointly solving a problem of parameter estimation and model selection (Malinverno, 2002).

Acknowledgements

Financial assistance of the Grant Agency of the Czech Republic under grants No. 205/04/0740 and No. 205/04/0746, and of the Ministry of Education, Youth and Sports of the Czech Republic, grant No. ME 677, is gratefully acknowledged. We thank V. Spichak and an anonymous reviewer for helpful comments.

REFERENCES

CONSTABLE, S.C., R.L. PARKER and C.G. CONSTABLE (1987): Occam's inversion: a practical algorithm for generating smooth models from EM sounding data, Geophysics, 52, 289-300.

GELMAN, A., J.B. CARLIN, H.S. STERN and D.B. RUBIN (1995): Bayesian Data Analysis (Chapman & Hall, New York), pp. 552.

GRANDIS, H., M. MENVIELLE and M. ROUSSIGNOL (1999): Bayesian inversion with Markov chains, I. The magnetotelluric one-dimensional case, Geophys. J. Int., 138, 757-768.

JONES, A.G. and R. HUTTON (1979a): A multi-station magnetotelluric study in Southern Scotland, I. Fieldwork, data analysis and results, Geophys. J. R. Astron. Soc., 56, 329-349.

JONES, A.G. and R. HUTTON (1979b): A multi-station magnetotelluric study in Southern Scotland, II. Monte-Carlo inversion of the data and its geophysical and tectonic implication, Geophys. J. R. Astron. Soc., 56, 351-358.

MALINVERNO, A. (2002): Parsimonious Bayesian Markov chain Monte Carlo inversion in a nonlinear geophysical problem, Geophys. J. Int., 151, 675-688.

MEJU, M.A. and V.R.S. HUTTON (1992): Iterative most-squares inversion: application to magnetotelluric data, Geophys. J. Int., 108, 758-766.

MENKE, W. (1989): Geophysical Data Analysis: Discrete Inverse Theory (Academic Press, London), 2nd edition, pp. 289.

MONTAZ, A., A. TORN and S. VIITANEN (1997): A numerical comparison of some modified controlled random search algorithms, TUCS Technical Report No. 98.

MOSEGAARD, K. and A. TARANTOLA (1995): Monte Carlo sampling of solutions to inverse problems, J. Geophys. Res., 100, 12431-12447.

PORTNIAGUINE, O. and M.S. ZHDANOV (1999): Focusing geophysical inversion images, Geophysics, 64, 874-887.

POUS, J., X. LANA and A.M. CORREIG (1985): Generation of Earth stratified models compatible with both ellipticity and phase velocity observations of Rayleigh waves, Pure Appl. Geophys., 123, 870-881.

PRICE, W.L. (1977): A controlled random search procedure for global optimization, Computer J., 20, 367-370.

SAMBRIDGE, M. (1999a): Geophysical inversion with a neighbourhood algorithm, I. Searching a parameter space, Geophys. J. Int., 138, 479-494.

SAMBRIDGE, M. (1999b): Geophysical inversion with a neighbourhood algorithm, II. Appraising the ensemble, Geophys. J. Int., 138, 727-746.

SEN, M.K. and P.L. STOFFA (1995): Global Optimization Methods in Geophysical Inversion, Advances in Exploration Geophysics (Elsevier, Amsterdam), 4, pp. 294.

TVRDÍK, J., L. MISÍK and I. KRIVY (2002): Competing heuristics in evolutionary algorithms, in Intelligent Technologies: from Theory to Applications, edited by P. SINCÁK, V. KVASNICKA, J. VASCÁK and J. POSPÍCHAL (IOS Press, Amsterdam), 159-165.

WEAVER, J.T. and A.K. AGARWAL (1993): Automatic 1D inversion of magnetotelluric data by the method of modelling, Geophys. J. Int., 112, 115-123.