Comparing training-image based algorithms using an analysis of distance

Xiaojin Tan, Pejman Tahmasebi, and Jef Caers
Department of Energy Resources Engineering, Stanford University, USA

Abstract

As additional multiple-point statistical (MPS) algorithms are developed, there is an increased need for scientific ways of comparison beyond the usual visual inspection or simple metrics such as connectivity measures. In this paper we start from the general observation that any (not just MPS) geostatistical simulation algorithm represents two types of variability: 1) the within-realization variability, namely that realizations reproduce a spatial continuity model (variogram-, Boolean- or training-image-based), and 2) the between-realization variability, representing a model of spatial uncertainty. We argue that any comparison of algorithms needs, at a minimum, to be based on these two randomizations. In fact, for MPS algorithms, we illustrate with examples that there is often a trade-off: increased pattern reproduction entails reduced spatial uncertainty. We make the subjective choice that the best algorithm maximizes pattern reproduction while at the same time maximizing spatial uncertainty. To render these fundamental principles quantitative, we rely on a distance-based measure for both within-realization variability (pattern reproduction) and between-realization variability (spatial uncertainty). To compare any two (or more) algorithms, we first generate a set of realizations with each algorithm. Each realization is up-gridded into a set of successively coarser grids; a single realization is turned into a multi-resolution pyramid of grids. The same operation is performed on the training image. For each realization, we calculate the Jensen-Shannon divergence (a distance) between the pattern histograms of each multi-resolution grid of the realizations generated with each algorithm and the training image. A weighted average of the Jensen-Shannon divergences over all multi-resolution grids represents a single quantitative measure of the within-realization variability (pattern reproduction) of a given algorithm. The same distance, now calculated and averaged between sets of realizations of the algorithms, is a single quantitative measure of the between-realization variability. We illustrate in this paper that this method is efficient and effective for 2D, 3D, continuous and discrete training images.

Keywords: multiple-point statistics, spatial uncertainty, training images, pattern modeling


Introduction

Any presumed novel contribution in science requires evidence on a simple question: has the proposed methodology improved on the state of the art? Specifically for uncertainty and stochastic modeling, such proof is neither evident nor trivial. If we make a probabilistic assessment of an event certain to happen or not, e.g. whether it will rain tomorrow, then there is no trivial, objectifiable manner of assessing the correctness of that probabilistic assessment without additional assumptions, typically relating to stationarity. In complex higher-dimensional stochastic modeling, such as geostatistical simulation, the same question arises. The geostatistical literature contains many probabilistic (or other) models of spatial uncertainty through stochastic simulation. In this paper we study algorithms that rely on a training image as the source for specifying spatial continuity. More specifically, we investigate the following question: given k algorithms plus a training image deemed relevant for the spatial phenomenon to be modeled, how would one rank these algorithms?

We immediately need to acknowledge that such a question cannot be addressed in a fully objective manner. We will need to decide on some measure of "best", and such a decision can be questioned. However, what we do argue in this paper is that such a measure should, at a minimum, be based on two concepts that often counteract each other: 1) the reproduction of the statistics deemed relevant for the field and 2) the variability between the generated realizations (also termed the space of uncertainty). Specifically for geostatistics with training images, the question of algorithm performance is important and remains unaddressed: how do we measure progress in the development of new algorithms now that many researchers enter this field, each with their own approaches? Some papers rely on visual comparisons, "it looks better"; others rely on summary statistics that are checked against the training image (Soleng et al., 2006; De Iaco & Maggio, 2011). However, such summary statistics address one issue only, the reproduction of statistics, and ignore the question related to the space of uncertainty.

In this paper, we propose a practical methodology for ranking training image-based geostatistical algorithms regardless of the nature of the training image, namely, whether it is discrete or continuous, two- or three-dimensional, or of small or large grid dimensions. Our methodology relies on the following definition of best: an algorithm A is better than an algorithm B if the training image statistics are reproduced better while at the same time the space of uncertainty (the variability between realizations) is larger. To make this quantitative, we rely on the concept of distances between realizations and distances between the statistics of a realization and the target training image statistics. We propose a ratio of these two counteracting distances as a measure of performance and rank algorithms on that basis. We provide various examples of comparisons of some MPS algorithms.

Methodology

Why distance and not covariance?

The proposed methodology relies heavily on distances as a concept for variation, not on variance or covariance; hence this choice, often advocated by the authors (see Suzuki and Caers, 2008; Scheidt and Caers, 2009; Caers, 2011), requires some motivation. Given a set of L random vectors x, for example a set/ensemble of L geostatistical realizations of size N, typically L << N, gathered in a data matrix X of size N×L, the centered ensemble covariance is simply

$$C = \frac{1}{L}\,(XH)(XH)^T, \qquad H = I - \frac{1}{L}\mathbf{1}\mathbf{1}^T \qquad (1)$$

where C is of size N×N and H is the L×L centering matrix. We will use capital italics for matrices and bold for vectors. The dimensions of these covariances grow rapidly (with N); hence it is often preferable to work with dot-products (Shawe-Taylor and Cristianini, 2004), calculated from the same information as

$$D = (XH)^T (XH) \qquad \text{of size } L \times L \qquad (2)$$

Clearly, for all cases of practical interest in geostatistics we have L << N. More importantly, the dot-product is related to the Euclidean distance: each entry d_ij in D can be calculated as a function of Euclidean distances (Borg & Lingoes, 1987),

$$d_{ij} = \frac{1}{2}\left(e_{ki}^2 + e_{kj}^2 - e_{ij}^2\right) \qquad (3)$$

with e_ij the Euclidean distance between two realizations x_i and x_j; e_kj is then the Euclidean distance between realizations x_k and x_j. Dot-products and Euclidean distances have simple extensions to non-Euclidean spaces, such as the Manhattan distance, Minkowski distance, Hausdorff distance and many others. Kernels (Schölkopf & Smola, 2002; Shawe-Taylor and Cristianini, 2004) are used to transform dot-products into new dot-products that separate the space of X better. This is termed the "kernel trick" in the computer science literature and is commonly used for non-linear regression and classification problems. There are no trivial non-Euclidean extensions of covariance or variance, or such extensions do not address the dimensionality problem (the size N) of the covariance matrix, which is still a measure of linear relationship/variation. Higher-order cumulants (Dimitrakopoulos et al., 2010) may capture more complex variation, but their dimensionality explodes even faster than that of covariances. The same observation can be made for "cliques" in a Markov random field setting (Tjelmeland & Besag, 1998).
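To make the preceding relations concrete, the following minimal sketch (Python with numpy; the random ensemble and array sizes are illustrative, not from the paper) builds the L×L dot-product matrix of Eq. (2) directly and then recovers it from squared Euclidean distances alone, in the spirit of Eq. (3):

```python
import numpy as np

# A minimal sketch of Eqs. (1)-(3). L stand-in "realizations" of size N are
# stored as the columns of X; H centers the ensemble (removes the mean
# realization from every column).
rng = np.random.default_rng(0)
N, L = 2000, 50
X = rng.standard_normal((N, L))

H = np.eye(L) - np.ones((L, L)) / L   # centering matrix of Eq. (1)
D = (X @ H).T @ (X @ H)               # L x L dot-product matrix, Eq. (2)

# Recover D from Euclidean distances only: form the squared distances e_ij^2
# and double-center them (the ensemble mean plays the role of the reference
# realization k in Eq. (3)). No N x N covariance is ever formed.
G = X.T @ X
n2 = np.diag(G)
E2 = n2[:, None] + n2[None, :] - 2.0 * G   # e_ij^2 between realizations
assert np.allclose(D, -0.5 * H @ E2 @ H)
```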

Hence, while the subsequent methodology could be presented in terms of variances and covariances (typically termed "analysis of variance" or ANOVA), it would have neither the practicality nor the effectiveness of the same methodology presented using an "analysis of distance", which we will term ANODI.

A multi-resolution pyramid of realizations and training image

In this paper we consider that some target statistics are presented in an exhaustive training image, which is common to most multiple-point geostatistical modeling. We also recognize that statistical variation occurs at multiple scales; hence, any comparison methodology should include differences in such variation. For this purpose, we create, from any single realization, a pyramid of multiple-resolution views of the same realization (similar to Heeger & Bergen, 1995); see for example Figure 1.

The typical approach in MPS is a multi-grid approach, subsampling finer grids into a coarser grid, which was introduced in the snesim algorithm and is needed for practical applications (Strebelle, 2002; Caers et al., 2003). However, in doing so, finer-scale variability is simply lost at coarser scales. Instead, we rely on a multi-resolution method first proposed in MPS by Honarkhah (2011). Figure 1 shows a result for a simple training image. Instead of subsampling the image, we interpolate the coarse grid values from the finest-scale resolution. This interpolation relies on bi-cubic interpolation in 2D and tri-cubic interpolation in 3D (Lekien & Marsden, 2005). In a general context, tri-cubic interpolation is used to obtain values at arbitrary locations from a gridded data set. In 3D, the interpolated value v at any arbitrary location (x, y, z) is obtained from the 64 nearest-neighbor gridded values at local coordinates (u_x, u_y, u_z) as follows:

$$v(x,y,z) = \sum_{i=0}^{3}\sum_{j=0}^{3}\sum_{k=0}^{3} a_{ijk}\, u_x^i\, u_y^j\, u_z^k$$

The coefficients a_ijk are obtained by solving a set of 64 equations (see Lekien & Marsden, 2005 for details). In the case of binary variables, the interpolated value is no longer binary. In that case, we use Otsu's thresholding method (Otsu, 1979) to turn the continuous-valued tri-cubic interpolations back into binary variables (categorical values are basically a vector of binary indicator variables).

In terms of mathematical notation, this operation is summarized as follows. We consider a training image ti that expands into a set of multi-resolution grids g = 1, ..., G as

$$ti \xrightarrow{\text{expand}} \{ti_1, ti_2, \ldots, ti_G\} \qquad (4)$$

Each realization ℓ = 1, ..., L generated with algorithm k can be expanded likewise:

$$re_\ell^{(k)} \xrightarrow{\text{expand}} \{re_{\ell,1}^{(k)}, re_{\ell,2}^{(k)}, \ldots, re_{\ell,G}^{(k)}\} \qquad (5)$$

Should the realization and training image be created on different grids, then G is the maximum attainable coarse resolution common to both grids.
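The expansion of Eqs. (4)-(5) can be sketched in a few lines. The snippet below is a 2D illustration (assuming numpy, scipy and scikit-image are available; the function and variable names are ours, not from the paper), using cubic interpolation for the coarsening and Otsu's method to re-binarize each level:

```python
import numpy as np
from scipy.ndimage import zoom
from skimage.filters import threshold_otsu

def build_pyramid(image, n_levels):
    """Sketch of the expansion in Eqs. (4)-(5): each level halves the grid by
    cubic interpolation (order=3, i.e. bicubic in 2D) rather than subsampling;
    binary inputs are re-thresholded with Otsu's method so every level of the
    pyramid stays binary."""
    is_binary = set(np.unique(image)) <= {0, 1}
    pyramid = [image]
    for _ in range(1, n_levels):
        coarser = zoom(pyramid[-1].astype(float), 0.5, order=3)
        if is_binary:
            coarser = (coarser > threshold_otsu(coarser)).astype(int)
        pyramid.append(coarser)
    return pyramid

# e.g. a 101 x 101 realization yields grids of roughly 101, 51, 26, ...
# as in Figure 1.
```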

Summarizing pattern frequencies by means of clustering

We now consider the multi-resolution view of a training image or geostatistical realization and attempt to summarize its statistics. We propose two ideas: 1) relying on multiple-point histograms (MPH; see for example Deutsch and Gringarten, 2000; Lange et al., 2012), which only work for small cases and binary variables, and 2) relying on grouping patterns into clusters and therefrom deducing a cluster-based histogram of patterns (CHP).

A multiple-point histogram (MPH) records, for binary realizations and within a given fixed template, every possible pattern configuration and creates a frequency table of these patterns (see for example Lange et al., 2012). To create a meaningful table, one has to use a small template (for example 4 × 4 in 2D and 3 × 3 × 2 in 3D); otherwise each bin would contain either one or zero patterns. One then calculates an MPH for each grid in the pyramid. In terms of mathematical notation, this operation can be summarized as follows:

$$\{ti_1, ti_2, \ldots, ti_G\} \xrightarrow{\text{summarize}} \{MPH(ti_1), MPH(ti_2), \ldots, MPH(ti_G)\} \qquad (6)$$

where each MPH vector consists of counts MPH_i; hence for the training image at multi-resolution g we have the notation

$$MPH(ti_g) = \{MPH_1(ti_g), \ldots, MPH_{n_{pat,g}}(ti_g)\} \qquad (7)$$

with n_pat,g the number of different patterns for that resolution g. MPHs only apply to small cases (usually 2D) and binary variables because the frequency table explodes when going to 3D or when dealing with multi-category and continuous variables. We will therefore use it in this paper as a reference for comparison with the methodology presented next, which is applicable to any case.
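A direct implementation of the MPH of Eqs. (6)-(7) is straightforward for a small template. The sketch below is our illustrative code (not the authors' implementation) scanning a binary 2D grid, assumed to be a numpy array, with a 4 × 4 template:

```python
from collections import Counter

def multiple_point_histogram(binary_image, ty=4, tx=4):
    """Sketch of Eqs. (6)-(7): count every distinct pattern configuration
    seen through a small fixed template. With a 4 x 4 binary template there
    are at most 2**16 bins, which is why MPHs are feasible only for small
    templates and binary variables."""
    counts = Counter()
    rows, cols = binary_image.shape
    for i in range(rows - ty + 1):
        for j in range(cols - tx + 1):
            counts[tuple(binary_image[i:i + ty, j:j + tx].ravel())] += 1
    return counts   # the MPH of Eq. (7): pattern -> frequency
```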

In an alternative approach, we cluster patterns into groups using the methodology described in Honarkhah and Caers (2010); this methodology works for 3D and continuous values. For each cluster, we record the number of patterns and calculate the prototype, a representative of the cluster; the prototype could be the mean of the patterns within that cluster or a medoid pattern. Unlike in the MPH approach, the template size for each grid g is determined automatically using a so-called elbow plot (see Honarkhah and Caers, 2010), which determines a template size relevant to the pattern variation in the image. We repeat this exercise for all multi-resolution grids in the pyramid for which this is possible. We do the same for all realizations. However, for the realizations, the clustering is slightly different. Since we want to make sure that the clustering results between training image and realization are comparable for each multi-resolution grid, we cluster the realization patterns by calculating distances to the training image prototypes and simply assign each pattern to the closest prototype. In this fashion, we obtain the same number of groups/clusters with the same prototypes for realizations and training image. In the example section, we show that such clustering leads to results similar to using the full multiple-point histogram described above in the case of a simple 2D binary example.

In terms of mathematical notation, the above operations can be summarized as follows. Each pyramid is now summarized as a list of what we will term cluster-based histograms of patterns (CHP), itself a list of frequencies:

$$\{ti_1, ti_2, \ldots, ti_G\} \xrightarrow{\text{summarize}} \{CHP(ti_1), CHP(ti_2), \ldots, CHP(ti_G)\} \qquad (8)$$

and similarly for the realizations:

$$\{re_{\ell,1}^{(k)}, re_{\ell,2}^{(k)}, \ldots, re_{\ell,G}^{(k)}\} \xrightarrow{\text{summarize}} \{CHP(re_{\ell,1}^{(k)}), CHP(re_{\ell,2}^{(k)}), \ldots, CHP(re_{\ell,G}^{(k)})\} \qquad (9)$$

Each CHP vector consists of frequencies; hence for the training image at multi-resolution g we have the notation

$$CHP(ti_g) = \{CHP_1(ti_g), \ldots, CHP_{c_g}(ti_g), \ldots, CHP_{C_g}(ti_g)\} \qquad (10)$$

with C_g the number of clusters for the multi-resolution grid g.
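The realization-side CHP can be sketched as a nearest-prototype classification (again our illustrative code under stated assumptions; the paper obtains the prototypes themselves by kernel k-means on the training image patterns):

```python
import numpy as np

def chp_from_prototypes(image, prototypes, ty, tx):
    """Sketch of Eqs. (9)-(10): assign every pattern of a realization to the
    closest training-image prototype (Euclidean distance) and return the
    normalized cluster frequencies, so that realization and training image
    share the same C_g bins."""
    protos = np.stack([p.ravel() for p in prototypes])   # C_g x (ty * tx)
    freq = np.zeros(len(prototypes))
    rows, cols = image.shape
    for i in range(rows - ty + 1):
        for j in range(cols - tx + 1):
            pat = image[i:i + ty, j:j + tx].ravel()
            freq[np.argmin(np.sum((protos - pat) ** 2, axis=1))] += 1
    return freq / freq.sum()   # the CHP of Eq. (10)
```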

Calculating distances

As stated in the introduction, two distances need to be calculated: 1) a distance summarizing pattern reproduction and 2) a distance summarizing the space of uncertainty. We consider the former first. This distance needs to summarize the total difference between the CHPs of realizations generated with algorithm k and the training image. To do so, we propose a statistical measure of distance termed the Jensen-Shannon divergence (Cover and Thomas, 1991; Endres and Schindelin, 2003), which for two frequency distributions p and q is the average of two Kullback-Leibler divergences:

$$d_{JS}(p,q) = \frac{1}{2}\sum_i p_i \log\frac{2p_i}{p_i+q_i} + \frac{1}{2}\sum_i q_i \log\frac{2q_i}{p_i+q_i} \qquad (11)$$

Another statistical measure of difference that has been used in this context (Lange et al., 2012) is the chi-squared distance, but it is not robust in the context of CHPs whose frequencies vary widely. If we use the CHPs, then in terms of a single multi-resolution grid g, the JS distance between the ℓ-th realization and the training image is

$$d_{JS,g}\!\left(re_\ell^{(k)}, ti\right) = \frac{1}{2}\sum_{c_g=1}^{C_g} CHP_{c_g}\!\left(re_{\ell,g}^{(k)}\right)\log\frac{2\,CHP_{c_g}\!\left(re_{\ell,g}^{(k)}\right)}{CHP_{c_g}\!\left(re_{\ell,g}^{(k)}\right)+CHP_{c_g}(ti_g)} + \frac{1}{2}\sum_{c_g=1}^{C_g} CHP_{c_g}(ti_g)\log\frac{2\,CHP_{c_g}(ti_g)}{CHP_{c_g}\!\left(re_{\ell,g}^{(k)}\right)+CHP_{c_g}(ti_g)} \qquad (12)$$

where c_g is a counter over the C_g clusters obtained for each multi-resolution grid g. The same formula is applied to calculate differences between realizations ℓ and ℓ' for each multi-resolution grid g, in notation the distances d_JS,g(re_ℓ^(k), re_ℓ'^(k)). The same Eq. (11) can be used to calculate differences in MPHs.
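For reference, Eq. (11) in code form (a minimal sketch; the guard for empty bins is our choice):

```python
import numpy as np

def js_divergence(p, q):
    """Sketch of Eq. (11): Jensen-Shannon divergence between two frequency
    vectors, written as the average of two Kullback-Leibler divergences to
    the mixture m = (p + q) / 2."""
    p = np.asarray(p, dtype=float) / np.sum(p)
    q = np.asarray(q, dtype=float) / np.sum(q)
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0                    # convention: 0 * log(0) = 0
        return np.sum(a[mask] * np.log(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```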

For each multi-resolution grid g, given L realizations of each algorithm k and a training image, we can calculate k distance tables of size (L+1)×(L+1) (k is the number of algorithms being compared). Multi-dimensional scaling (Borg & Groenen, 1997; Caers, 2011) can be used to visualize the various differences. This visual appreciation may already reveal significant differences between the algorithms and should be used as supporting visual evidence. It may also reveal how algorithms compare as the resolution changes.

The between-realization variability (space of uncertainty) is now calculated as an average over all realizations per multi-resolution grid g:

$$d_{g,k}^{between} = \frac{1}{L(L-1)}\sum_{\ell=1}^{L}\sum_{\substack{\ell'=1 \\ \ell'\neq\ell}}^{L} d_{JS,g}\!\left(re_\ell^{(k)}, re_{\ell'}^{(k)}\right) \qquad (13)$$

The within-realization variability, namely how well a realization reproduces the target statistics, then has the following average per multi-resolution grid g:

$$d_{g,k}^{within} = \frac{1}{L}\sum_{\ell=1}^{L} d_{JS,g}\!\left(re_\ell^{(k)}, ti\right) \qquad (14)$$
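In code, both averages reduce to a few lines (a sketch for a single algorithm k and resolution g; names are illustrative):

```python
import numpy as np

def between_within(d_real_real, d_real_ti):
    """Sketch of Eqs. (13)-(14). `d_real_real` is the L x L table of JS
    divergences between the realizations of one algorithm at resolution g
    (zero diagonal), `d_real_ti` the L divergences to the training image."""
    L = len(d_real_ti)
    d_between = np.sum(d_real_real) / (L * (L - 1))   # diagonal terms are zero
    d_within = np.mean(d_real_ti)
    return d_between, d_within
```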

Ranking algorithm performance with ANODI

As stated in the introduction, we would like to obtain a ranking of the algorithms based on the distances calculated above. We emphasize the idea of ranking, meaning that only a relative ordering is needed; no absolute quantification of performance is required. This motivates our decision to use a ratio of distances between two algorithms k and m, namely, for the between-realization variability, the ratio

$$\frac{d_{g,k}^{between}}{d_{g,m}^{between}} \qquad (15)$$

The final value (as a ratio) that quantifies the between-realization differences between two algorithms k and m is a weighted average over the multi-resolution grids:

$$r_{k,m}^{between} = \sum_{g=1}^{G} w_g\,\frac{d_{g,k}^{between}}{d_{g,m}^{between}} \qquad (16)$$

We decided on the following fixed weights:

$$w_g = \frac{1}{2^g} \qquad (17)$$

meaning that higher-resolution grids get more weight than lower-resolution ones. This choice is based on two principles, namely 1) lower-resolution grids contain less information and have less variability than higher-resolution grids in the same pyramid, and 2) shorter-scale patterns are more critical than larger-scale patterns. For example, the channel disconnection seen in the bottom right of Figure 1 represents a short-scale pattern.

For the within-realization variability, we use the same idea, namely

$$r_{k,m}^{within} = \sum_{g=1}^{G} w_g\,\frac{d_{g,k}^{within}}{d_{g,m}^{within}} \qquad (18)$$

The ratio of both,

$$r_{k,m}^{total} = \frac{r_{k,m}^{between}}{r_{k,m}^{within}} \qquad (19)$$

can then be used to rank algorithms; the best algorithm has the largest ratio compared to all other algorithms.
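The full ANODI ranking of Eqs. (15)-(19) then takes one function (a sketch under the weighting of Eq. (17); the per-resolution distance arrays are assumed precomputed with Eqs. (13)-(14)):

```python
import numpy as np

def anodi_total_ratio(db_k, db_m, dw_k, dw_m):
    """Sketch of Eqs. (15)-(19): rank algorithm k against algorithm m.
    db_* are the between-realization distances of Eq. (13) and dw_* the
    within-realization distances of Eq. (14), each of length G; the weights
    w_g = 1/2**g of Eq. (17) favor the finer resolutions."""
    G = len(db_k)
    w = 1.0 / 2.0 ** np.arange(1, G + 1)
    r_between = np.sum(w * np.asarray(db_k) / np.asarray(db_m))   # Eq. (16)
    r_within = np.sum(w * np.asarray(dw_k) / np.asarray(dw_m))    # Eq. (18)
    return r_between / r_within   # Eq. (19): > 1 ranks k above m
```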

Application of the methodology

A 2D binary example

We first illustrate various elements of the methodology in considerable detail using a simple 2D channelized training image (Strebelle's training image, Strebelle (2002)) of size 101×101 (Figure 2). Three algorithms are considered: dispat (distance-based pattern simulation, Honarkhah and Caers, 2010), ccsim (cross-correlation-based pattern simulation, Tahmasebi et al., 2012; Tahmasebi and Caers, this issue) and sisim (sequential indicator simulation as implemented in SGeMS, Remy et al., 2009), where the variogram is calculated from the training image; see Figure 2. The latter case is included to illustrate the difference between multiple-point and two-point algorithms in terms of the space of uncertainty. 50 unconditional realizations are generated with each algorithm. ccsim is a pattern-based method that is an improvement on simpat, although only qualitative visual appreciations were made in Tahmasebi et al. (2012); hence, in this paper we aim to demonstrate this improvement quantitatively. An example of a multi-resolution pyramid of a single realization is shown in Figure 1. These pyramids are constructed for all realizations as well as the training image. We will present the details of the results for both the MPH approach, which is feasible for this simple case, and the CHP approach.

Using a 4 × 4 template, we create the MPHs of Eq. (6) for each multi-resolution grid up to g=10. Since we have 50 realizations of each algorithm and one training image, we can construct a table of 151 × 151 JS divergences. Multi-dimensional scaling (MDS) is used to visualize these distances (and hence realizations vs. training image); see Figure 3. The 151 × 151 Euclidean distances between the points in Figure 3 approximate the 151 × 151 JS divergences. MDS relies on eigenvalue decomposition; the x-axis in Figure 3 refers to the largest eigenvalue, the y-axis to the second largest eigenvalue. The units on these axes are not important; what matters are the relative distances between the points. However, we do plot the percentage contribution of each eigenvalue to the total sum of all eigenvalues. A large contribution in these first two eigenvalues means that higher dimensions (3, 4, 5, ...) contain only a small percentage of the total variance (or distance). Note that these plots are only used for visualization; we always use the actual JS divergences in the calculation of the ratios in Eq. (19).

From this visual appreciation, we observe that dispat and ccsim reproduce the statistics of the training image better than sisim (as expected) and that, among the MPS algorithms, ccsim seems to outperform dispat. We also notice that the cloud of points for sisim is larger than for dispat and ccsim, suggesting a larger space of uncertainty. Note, however, that this MDS plot is just for g=1. If, for example, the same plot is made for multi-resolution g=6, then the differences are much less pronounced, meaning that, at least at that coarser resolution, all three algorithms seem to have similar characteristics; see Figure 4.
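The MDS projection used throughout the figures can be sketched directly from its eigendecomposition description (our illustrative code; `d` stands for a symmetric distance table such as the 151 × 151 JS-divergence table):

```python
import numpy as np

def classical_mds(d, n_components=2):
    """Sketch of eigendecomposition-based (classical) MDS: double-center the
    squared distance table and keep the leading eigenvectors. Also returns
    each kept eigenvalue's share of the total, as annotated on the plots."""
    n = d.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * H @ (d ** 2) @ H                  # centered dot-products
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1]                 # largest eigenvalues first
    vals, vecs = vals[idx], vecs[:, idx]
    coords = vecs[:, :n_components] * np.sqrt(np.maximum(vals[:n_components], 0.0))
    share = vals[:n_components] / np.sum(np.maximum(vals, 0.0))
    return coords, share
```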

Consider now repeating the same analysis, but using CHPs instead of MPHs. In this case, a CHP is created for each multi-resolution grid g for each realization as well as the training image. To create such a CHP, we use a template (the template size obtained is 20 × 20) to scan each resolution of the training image for patterns. These patterns are then clustered into groups using kernel k-means clustering (see Honarkhah and Caers, 2010, for details). A class representative, termed the prototype, is calculated. These prototypes are then used to cluster the patterns of the realizations of the various algorithms, simply by assigning each pattern to the closest prototype. In Figure 5, 48 clusters are obtained for the first resolution g=1 of the training image. Above each prototype are the counts of patterns within that cluster for the training image and for a single dispat, ccsim and sisim realization.

We then calculate JS-divergence distances given these CHPs. Consider for example multi-resolution g=1, i.e. the original realizations and training image. We again create a table of 151×151 JS-divergence distances, and MDS is used to visualize them; see Figure 6. Figure 7 provides a plot for g=3. In the CHP approach, we use fewer multi-resolution grids, since the template determined by the elbow plot is large (20 × 20) compared to the fixed template used by the MPH (4 × 4).

Eq. (13) now calls for the calculation of an average of between-realization distances for each algorithm k and each multi-resolution grid g. We again consider the MPH first. The result is summarized in Table 1. From these distances we calculate the ratios in Eq. (15). These ratios are then weighted according to Eq. (16); Table 2A presents the resulting matrix. Note that these ratios compare spaces of uncertainty. We confirm what we suspected based on the MDS plots: sisim has the largest space of uncertainty, while dispat has a slightly larger space of uncertainty than ccsim. We now turn to the within-realization variability, meaning the reproduction of statistics. Table 3 summarizes the distances of Eq. (14), and Table 2B summarizes the ratios. Note that the algorithm with the best pattern reproduction is ccsim, significantly better than dispat, confirming our visual appreciation from the MDS plots as well as the realizations. Table 2C compares the final ratio of ratios, namely Eq. (19), from which it can be deduced that ccsim is the overall better algorithm, with a quantitative rank of

1 : 0.70 : 0.46 (ccsim : dispat : sisim),

where "1" denotes the best algorithm. We will use this notation from now on for all further comparisons.

Now consider the same calculations but based on the CHP. Tables 4, 5 and 6 summarize the distances and ratios for the CHP approach. When using CHPs, we find the following ranking:

1 : 0.67 : 0.35 (ccsim : dispat : sisim).

In other words, for this example, the clustering did not affect the ranking, only how much better or worse each algorithm fared in the comparison.
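As a consistency check on the sketches above, the dispat and ccsim columns of Table 1 can be pushed through the weighting of Eqs. (16)-(17); the result reproduces the 1.15 entry of Table 2A:

```python
import numpy as np

# Between-realization distances from Table 1 (MPH approach), g = 1..10.
db_dispat = np.array([0.0455, 0.1776, 0.3883, 0.6444, 0.8281,
                      0.8871, 0.9523, 0.9835, 0.9875, 0.9339])
db_ccsim = np.array([0.0319, 0.1915, 0.4882, 0.7741, 0.8968,
                     0.9451, 0.9772, 0.9860, 0.9945, 0.9922])
w = 1.0 / 2.0 ** np.arange(1, 11)                    # Eq. (17)
print(round(np.sum(w * db_dispat / db_ccsim), 2))    # 1.15, cf. Table 2A
```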

Trading uncertainty with pattern reproduction

In this example, we keep the same Strebelle training image but quantitatively study the impact of parameter choices for a single algorithm, namely snesim (single normal equation simulation, Strebelle, 2002). A key parameter in this algorithm is the size of the search template, expressed in one parameter ns. A larger ns leads to searching for a larger data event near the location being simulated, and hence more complete multiple-point statistics than with smaller data events. However, at some point, due to the limited size of the training image (and hence the limited amount of unique multiple-point statistics), any increase in this parameter will not improve the results any further. We use the snesim implementation in S-GeMS (Remy et al., 2009; the parameter is termed "# nodes in search template") and run the snesim algorithm for three choices of ns, namely 10, 50 and 200, generating 50 realizations each.

Two realizations for each value of ns are shown in Figure 8. Figure 9 shows the MDS plot for g=1. The ratios for space of uncertainty and pattern reproduction, as well as the total ratio, are

Space of uncertainty ("between"): 1.65 : 1 : 1 (ns=10 : ns=50 : ns=200)
Pattern reproduction ("within"): 2.75 : 1 : 1 (ns=10 : ns=50 : ns=200)
Total ("between/within"): 0.60 : 1 : 1 (ns=10 : ns=50 : ns=200)

Clearly, going from ns=50 to ns=200 does not constitute an improvement. It is also clear that there is a trade-off between space of uncertainty and pattern reproduction: using the same (limited) training image, better pattern reproduction leads to smaller variation between the realizations (going from ns=10 to ns=50). We therefore conjecture that any post-processing of snesim realizations (Strebelle & Remy, 2005) would lead to a further reduction in the uncertainty represented by the post-processed realizations.

Continuous-valued 2D cases

A continuous training image of size 130 × 100, shown in Figure 10, is used. This training image is sampled from a multi-Gaussian model with an isotropic Gaussian variogram with a range equal to 20. The aim here is to test how the methodology can be used to assess cases with continuous training images. In this section we compare two algorithms: ccsim and sgsim.

50 realizations were generated with each algorithm. For sgsim, we model the variogram from the training image. Some realizations are shown in Figure 10. QQ-plots (not shown) show excellent reproduction of the marginal Gaussian distribution, while the ergodic fluctuations of the variograms are quite similar for both methods; see Figure 12. Figure 11 shows an MDS plot based on the JS-divergence distances. We notice a wider spread for the sgsim realizations, at least for multi-resolution g=1. One also notices that the ccsim realizations look visually different from the sgsim realizations, yet have very similar variogram reproduction; see Figure 12.

In terms of ratios we obtain the following result:

Space of uncertainty ("between"): 1.48 : 1 (sgsim : ccsim)
Pattern reproduction ("within"): 1.10 : 1 (sgsim : ccsim)
Total ("between/within"): 1.24 : 1 (sgsim : ccsim)

Clearly, sgsim generates a larger space of uncertainty than ccsim, but their performance in reproducing the patterns of the given training image is similar. It should be understood that the aims of the two algorithms are very different: ccsim tries to reproduce the patterns of a limited-size training image, while sgsim attempts to sample a multi-Gaussian model with a given covariance. The limited size of the training image results in a small pattern database, and therefore in a smaller space of uncertainty.

3D binary case

Finally, we demonstrate a case with a 3D binary training image; see Figure 13. The training image is a binary sand/shale conceptual model with dimensions 69×69×39. Two MPS algorithms, ccsim and dispat, are used for comparison. Figure 13 depicts the projection of the constructed metric space by means of multi-dimensional scaling using the JS divergence for multi-resolution g=1. The comparison in terms of ratios confirms the earlier findings on the relative ranking between ccsim and dispat:

Space of uncertainty ("between"): 1.14 : 1 (dispat : ccsim)
Pattern reproduction ("within"): 1.90 : 1 (dispat : ccsim)
Total ("between/within"): 0.60 : 1 (dispat : ccsim)

Conclusions

In this paper we provide a methodology for comparing the performance of training image-based geostatistical algorithms in the case of unconditional simulation. Our methodology is based on the observation that in generating geostatistical realizations two types of variability are created: variability within the realizations and variability between the realizations. We establish a distance-based approach to measure the trade-off between these two variabilities. While one can debate the particular subjective choices made in the presented comparison methodology, the variety of examples illustrates that the results obtained from this methodology are consistent with expectations.

It should be understood that we are not researching the following question: which are the best statistics to reproduce? In other words, we do not investigate the choice of training image (or training images). Such a choice is important, and in making it, inconsistencies with respect to local data may be present. We also limit ourselves to unconditional simulation. In the case of conditional simulation, the space of uncertainty will evidently be affected by the conditioning data. Hence, one would need to investigate the "conditioning performance" of the algorithm. For example, in Caers (2012) it is argued that such performance needs to be assessed relative to a rejection sampler. Future research will focus on integrating this additional "conditioning" component into the framework proposed in this paper, possibly by adapting the weighting in Eq. (17) to favor short-scale pattern variability.

Acknowledgements

We appreciate the discussions on distances derived from frequency tables with Katrine Lange of the Technical University of Denmark during her visit to Stanford University.


References

Borg, I., & Lingoes, J., 1987. Multidimensional Similarity Structure Analysis. Springer-Verlag, New York.
Borg, I., & Groenen, P., 1997. Modern Multidimensional Scaling: Theory and Applications. Springer, New York.
Caers, J., Strebelle, S., & Payrazyan, K., 2003. Stochastic integration of seismic data and geologic scenarios: A West Africa submarine channel saga. The Leading Edge (Tulsa, OK), 22(3), 192-196.
Caers, J., 2011. Modeling Uncertainty in the Earth Sciences. Wiley-Blackwell, 246p.
Caers, J., 2012. On internal consistency, conditioning and models of uncertainty. In: Ninth International Geostatistics Congress, Oslo, Norway, June 11-15, 2012.
Chilès, J.P., & Delfiner, P., 1999. Geostatistics: Modeling Spatial Uncertainty. Wiley Series in Probability and Statistics.
Cover, T.M., & Thomas, J.A., 1991. Elements of Information Theory. Wiley-Interscience, New York, NY, USA.
Deutsch, C., & Gringarten, E., 2000. Accounting for multiple-point continuity in geostatistical modeling. In: 6th International Geostatistics Congress, Vol. 1, Cape Town, South Africa, 156-165.
De Iaco, S., & Maggio, S., 2011. Validation techniques for geological patterns simulations based on variogram and multiple-point statistics. Mathematical Geosciences, 43(4), 483-500.
Dimitrakopoulos, R., Mustapha, H., & Gloaguen, E., 2010. High-order statistics of spatial random fields: exploring spatial cumulants for modeling complex non-Gaussian and non-linear phenomena. Mathematical Geosciences, 42(1), 65-99.
Endres, D.M., & Schindelin, J.E., 2003. A new metric for probability distributions. IEEE Transactions on Information Theory, 49(7), 1858-1860.
Goovaerts, P., 1997. Geostatistics for Natural Resources Evaluation. Applied Geostatistics Series, Oxford University Press.
Heeger, D.J., & Bergen, J.R., 1995. Pyramid-based texture analysis/synthesis. In: Proceedings of the 22nd Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH), 229-238. ACM.
Honarkhah, M., & Caers, J., 2010. Stochastic simulation of patterns using distance-based pattern modeling. Mathematical Geosciences, 42, 487-517.
Honarkhah, M., 2011. Stochastic Simulation of Patterns Using Distance-Based Pattern Modeling. PhD dissertation, Stanford University, USA. (https://pangea.stanford.edu/ERE/pdf/pereports/PhD/Honarkhah2011.pdf)
Honarkhah, M., & Caers, J., 2012. Direct pattern-based simulation of non-stationary geostatistical models. Mathematical Geosciences, 44, 651-672.
Lange, K., Frydendall, J., Cordua, K.S., Hansen, T.M., Melnikova, Y., & Mosegaard, K., 2012. A frequency matching method: solving inverse problems by use of geologically realistic prior information. Mathematical Geosciences, available online.
Lekien, F., & Marsden, J., 2005. Tricubic interpolation in three dimensions. International Journal for Numerical Methods in Engineering, 63(3), 455-471.
Otsu, N., 1979. A threshold selection method from gray level histograms. IEEE Transactions on Systems, Man, and Cybernetics, 9, 62-66.
Remy, N., Boucher, A., & Wu, J., 2009. Applied Geostatistics with SGeMS: A User's Guide. Cambridge University Press, New York, NY.
Scheidt, C., & Caers, J., 2009. Representing spatial uncertainty using distances and kernels. Mathematical Geosciences, 41(4), 397-419.
Schölkopf, B., & Smola, A.J., 2002. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press.
Shawe-Taylor, J., & Cristianini, N., 2004. Kernel Methods for Pattern Analysis. Cambridge University Press.
Soleng, H.H., Syversveen, A.R., & Kolbjørnsen, O., 2006. Comparing facies realizations: defining metrices on realization space. In: ECMOR X, Proceedings of the 10th European Conference on the Mathematics of Oil Recovery, A014. European Association of Geoscientists & Engineers.
Strebelle, S., 2002. Conditional simulation of complex geological structures using multiple-point statistics. Mathematical Geology, 34(1), 1-22.
Strebelle, S., & Remy, N., 2005. Post-processing of multiple-point geostatistical models to improve reproduction of training patterns. In: Geostatistics Banff 2004, Quantitative Geology and Geostatistics, Vol. 14, 979-988.
Suzuki, S., & Caers, J., 2008. A distance-based prior model parameterization for constraining solutions of spatial inverse problems. Mathematical Geosciences, 40(4), 445-469.
Tahmasebi, P., Hezarkhani, A., & Sahimi, M., 2012. Multiple-point geostatistical modeling based on the cross-correlation functions. Computational Geosciences, 16, 779-797.
Tahmasebi, P., & Caers, J. (this issue). Fast stochastic pattern simulation using multi-scale search. Mathematical Geosciences, submitted.
Tjelmeland, H., & Besag, J., 1998. Markov random fields with higher-order interactions. Scandinavian Journal of Statistics, 25(3), 415-433.


Figure & Table captions

Figure 1: A single binary realization is decomposed into a pyramid of realizations at different resolutions.
Figure 2: 2D binary case: training image and two realizations each generated with dispat, ccsim and sisim. The variogram is calculated from the training image (x- and y-direction variograms shown).
Figure 3: MPH approach: 2D projection of realizations using multi-dimensional scaling (MDS). The distance is the JS divergence for multi-resolution g=1 using MPHs. The axes are omitted (as is typical for MDS plots, see Caers, 2011) since only the relative scatter of points is relevant. We also report the contribution, in %, of each eigenvalue to the total sum of all eigenvalues.
Figure 4: MPH approach, multi-resolution g=6: selected realizations and MDS plot.
Figure 5: List of 48 prototypes. Above each prototype is listed the number of patterns in the cluster for which the prototype is representative, in the following order: training image - ccsim realization - dispat realization - sisim realization.
Figure 6: CHP approach: 2D projection of realizations using multi-dimensional scaling (MDS). The distance is the JS divergence for multi-resolution g=1 based on CHPs. The axes are omitted (as is typical for MDS plots, see Caers, 2011) since only the relative scatter of points is relevant.
Figure 7: CHP approach, multi-resolution g=3: selected realizations and MDS plot.
Figure 8: snesim realizations for different numbers of nodes in the search template.
Figure 9: MDS plot of snesim realizations for multi-resolution g=1.
Figure 10: 2D continuous case: training image, ccsim and sgsim realizations.
Figure 11: MDS plot for multi-resolution g=1.
Figure 12: Variogram model (red) versus experimental variograms (blue).
Figure 13: 3D binary case: training image, ccsim and dispat realizations and MDS plot for multi-resolution g=1. Realizations are shown in two ways, with and without the background (blue) category removed.
Table 1: 2D binary case: averaged "between"-realization JS divergences for the MPH approach.
Table 2: Ratios of the within- and between-realization variability and total ratio for the MPH approach.
Table 3: 2D binary case: averaged "within"-realization JS divergences for the MPH approach.
Table 4: 2D binary case: averaged "between"-realization JS divergences for the CHP approach.
Table 5: Ratios of the within- and between-realization variability and total ratio for the CHP approach.
Table 6: 2D binary case: averaged "within"-realization JS divergences for the CHP approach.

[Figure 1: pyramid of one single image: multi-resolution 1 (101 × 101), multi-resolution 2 (51 × 51), multi-resolution 3 (26 × 26).]

[Figure 2: training image; variogram of the training image (lag distance 0-50); dispat, ccsim and sisim realizations.]

[Figure 3: MDS plot of ccsim, sisim and dispat realizations and the training image, with selected realizations (closest through 50th closest to the training image); eigenvalue contributions 20% and 10%.]

[Figure 4: multi-resolution g=6: training image, dispat, ccsim and sisim realizations and MDS plot; eigenvalue contributions 14% and 20%.]

[Figure 5: the 48 cluster prototypes, each annotated with its pattern counts in the order training image - ccsim realization - dispat realization - sisim realization.]

[Figure 6: CHP-based MDS plot of ccsim, sisim and dispat realizations and the training image, with selected realizations (closest through 50th closest to the training image); eigenvalue contributions 45% and 20%.]

[Figure 7: multi-resolution g=3: dispat, ccsim and sisim realizations and training image; eigenvalue contributions 20% and 14%.]

[Figure 8: training image and snesim realizations for ns=10, ns=50 and ns=200.]

[Figure 9: MDS plot of snesim realizations; eigenvalue contributions 16% and 8%.]

[Figure 10: training image, ccsim and sgsim realizations.]

[Figure 11: MDS plot for multi-resolution g=1 of ccsim and sgsim realizations and the training image, with selected sgsim and ccsim models (closest through 50th closest to the training image); eigenvalue contributions 56% and 10%.]

[Figure 12: variograms (normal scores vs. lag distance 0-60) of ccsim models and of sgsim models.]

[Figure 13: MDS plot for multi-resolution g=1 of dispat and ccsim realizations and the training image, with a dispat model and a ccsim model shown; eigenvalue contributions 46% and 19%.]

Table 1: averaged "between"-realization JS divergences of Eq. (13), MPH approach.

multi-resolution   dispat   ccsim    sisim
g=1                0.0455   0.0319   0.1384
g=2                0.1776   0.1915   0.4149
g=3                0.3883   0.4882   0.6092
g=4                0.6444   0.7741   0.7893
g=5                0.8281   0.8968   0.8889
g=6                0.8871   0.9451   0.9470
g=7                0.9523   0.9772   0.9729
g=8                0.9835   0.9860   0.9856
g=9                0.9875   0.9945   0.9892
g=10               0.9339   0.9922   0.9908

Table 2: ratios for the MPH approach; rows are algorithm k, columns are algorithm m.

(A) r_between(k,m):
         dispat   ccsim   sisim
dispat   1        1.15    0.38
ccsim    *        1       0.33
sisim    *        *       1

(B) r_within(k,m):
         dispat   ccsim   sisim
dispat   1        1.63    0.24
ccsim    *        1       0.15
sisim    *        *       1

(C) r_total(k,m) = r_between(k,m) / r_within(k,m):
         dispat   ccsim   sisim
dispat   1        0.70    1.58
ccsim    *        1       2.20
sisim    *        *       1

Table 3: averaged "within"-realization JS divergences of Eq. (14), MPH approach.

multi-resolution   dispat   ccsim    sisim
g=1                0.0496   0.0222   0.2529
g=2                0.1908   0.1672   0.5071
g=3                0.4140   0.4454   0.7201
g=4                0.6859   0.7430   0.9076
g=5                0.8441   0.8910   0.9778
g=6                0.9193   0.9452   0.9921
g=7                0.9566   0.9726   0.9941
g=8                0.9873   0.9824   0.9978
g=9                0.9923   0.9931   0.9990
g=10               0.9702   0.9904   0.9980

Table 4: averaged "between"-realization JS divergences of Eq. (13), CHP approach.

multi-resolution   dispat   ccsim    sisim
g=1                0.0358   0.0408   0.0416
g=2                0.1166   0.1124   0.1249
g=3                0.2515   0.2269   0.2198

Table 5: ratios for the CHP approach; rows are algorithm k, columns are algorithm m.

(A) r_between(k,m):
         dispat   ccsim   sisim
dispat   1        0.88    0.86
ccsim    *        1       0.98
sisim    *        *       1

(B) r_within(k,m):
         dispat   ccsim   sisim
dispat   1        1.31    0.43
ccsim    *        1       0.33
sisim    *        *       1

(C) r_total(k,m) = r_between(k,m) / r_within(k,m):
         dispat   ccsim   sisim
dispat   1        0.67    2.00
ccsim    *        1       2.90
sisim    *        *       1

Table 6: averaged "within"-realization JS divergences of Eq. (14), CHP approach.

multi-resolution   dispat   ccsim    sisim
g=1                0.0533   0.0405   0.1299
g=2                0.2346   0.2127   0.2620
g=3                0.3106   0.3015   0.3120