
Joint quantification of uncertainty on spatial and non-spatial reservoir parameters

Comparison between the Joint Modeling Method and Distance Kernel Method

Céline Scheidt and Jef Caers

Stanford Center for Reservoir Forecasting, Stanford University

Abstract

The experimental design methodology is widely used to quantify uncertainty in the oil and gas industry. This technique is well suited to uncertainty quantification on non-spatial parameters, such as bubble point pressure, oil viscosity, and aquifer strength. However, it is not well adapted to the case of geostatistical (spatial) uncertainty, due to the discrete nature of many input parameters as well as the potentially nonlinear response with respect to those parameters. One way to handle this type of uncertainty, derived from experimental design theory, is the joint modeling method (JMM). This method, originally proposed in a petroleum context by Zabalza (2000), incorporates both non-spatial and spatial parameters within an experimental design framework. It consists of the construction of two models: a mean model, which accounts for the non-spatial parameters, and a dispersion model, which accounts for the spatial uncertainty. Classical Monte Carlo simulation is then applied to obtain the probability density and quantiles of the response of interest (for example, the cumulative oil production). Another method to quantify spatial uncertainty is the distance kernel method (DKM), proposed recently by Scheidt and Caers (2007), which defines a realization-based model of uncertainty. Based on a distance measure between realizations, the methodology uses kernel methods to select a small subset of representative realizations which have the same characteristics as the entire set. Flow simulations are then run on the subset, allowing for an efficient and accurate quantification of uncertainty. In this work, we extend the DKM to address uncertainty in both spatial and non-spatial parameters, and propose it as an alternative to the JMM. Both methods are applied to a synthetic test case which has spatial uncertainty on the channel representation of the facies, and non-spatial uncertainties on the channel permeability, porosity, and connate water saturation. The results show that the DKM provides a more accurate quantification of uncertainty with fewer reservoir simulations. Finally, we propose a third method which combines aspects of the DKM and the JMM. This third method again shows improvement in efficiency compared to the JMM alone.


1. Introduction

Uncertainty in reservoir performance is often very significant due to the small amount of data available to describe the reservoir. Reservoirs are modeled using a combination of spatial and non-spatial parameters. Spatial parameters describe properties that are correlated spatially, such as facies type, and are modeled using geostatistical methods. Non-spatial parameters describe phenomena that do not vary spatially, such as the water-oil contact depth or the bubble point pressure of the oil. Uncertainty exists in both types of parameters, and due to their distinct nature, the quantification of uncertainty on these parameters is done using differing and often incompatible approaches.

Uncertainty in spatial parameters (which we will denote as spatial uncertainty) is traditionally assessed using ranking techniques (Ballin, 1992). Starting with multiple alternative realizations generated using any geostatistical algorithm, traditional ranking techniques aim at selecting the P10, P50 and P90 flow responses using static properties of the realizations (e.g. OOIP). Thus, spatial uncertainty is measured by evaluating multiple geostatistical realizations created by varying the input parameters of the geostatistical algorithms.

Uncertainty in non-spatial parameters is often quantified using the experimental design methodology. Experimental design selects optimal combinations of parameter values for flow simulation, and the response is then modeled using response surface methodology. Classical experimental design has been widely used in reservoir uncertainty management and has proved its efficiency in taking into account various reservoir parameters (Damsleth et al., 1992, Manceau et al., 2001, Venkataraman, 2000). This technique works well in the case of non-spatial parameters because the response (for example, the cumulative oil production) often behaves linearly.

The experimental design and ranking approaches are efficient in modeling non-spatial and spatial uncertainty, respectively. However, neither approach accounts for all types of uncertainty. Since ranking is often performed using static properties of the realizations, one cannot use this technique to model uncertainty in non-spatial parameters, such as possibly differing oil and water relative permeability curves. In addition, classical experimental designs are often not suited to model spatial uncertainty, due to the discrete nature of some of the input geostatistical parameters and the possibly non-linear variation of the production response. A method treating both spatial and non-spatial uncertainty simultaneously is therefore desirable for an effective uncertainty quantification of reservoir performance.

One approach which attempts to model both types of parameters simultaneously is the joint modeling method (JMM). The joint modeling method, initially proposed in a statistical context (McCullagh and Nelder, 1989), was applied in the context of petroleum engineering by Zabalza (2000) to incorporate both non-spatial and spatial parameters within an experimental design methodology for uncertainty quantification. Zabalza (2000) and Manceau et al. (2001) used the joint modeling methodology to quantify the effect of varying geostatistical realizations on uncertainty analysis.


Just as in experimental design, this approach requires running a set of simulations to capture the behavior of the production response as a function of the non-spatial parameters, using response surface methodology. Note that in the context of the JMM, non-spatial parameters are also denoted "deterministic" parameters, because they are the parameters varied in the deterministic experiment¹. Spatial uncertainty is often denoted "stochastic" uncertainty, and describes the uncertainty derived from stochastic modeling methods such as geostatistical algorithms. To be consistent with previous papers on the JMM, we employ the same terminology here. Thus, spatial uncertainty will be described more generally as stochastic uncertainty. Deterministic uncertainty is the uncertainty of the response with respect to the non-spatial parameters in a deterministic experiment (i.e. numerical flow simulation)².

Recently, the distance-kernel method (Scheidt and Caers, 2007) has been proposed as an improvement to ranking techniques. This method examines a large set of realizations and defines a realization-based model of uncertainty which is parameterized by distances. The goal is to select a small subset of representative reservoir models by analyzing the properties of the models as characterized by the distance. This subset has similar properties to the entire set of realizations. Uncertainty quantification can then be performed within that subset of realizations, thus significantly reducing computation time.

In this paper, we extend the distance-kernel method (DKM) to quantify uncertainty in both deterministic and spatial (stochastic) parameters. We compare the results of both JMM and DKM on a synthetic data set. Finally, we combine the advantages of the JMM and DKM to present a third method to quantify uncertainty in non-spatial (deterministic) and spatial (stochastic) parameters.

In the following section, we first provide an overview of experimental design (ED) theory and give an illustrative example to show how experimental design and response surface methods (RSM) are used to assess uncertainty for deterministic parameters. Then, we introduce the joint modeling method, which allows quantifying uncertainty not only for deterministic parameters but also for spatial/stochastic parameters. We then briefly review the theory of the distance-kernel method. Subsequently, JMM and DKM are compared using a facies model with both spatial (stochastic) and non-spatial uncertainty, totaling 2025 possible reservoir models. Results from these two different approaches are also compared with a random selection of models. We then present initial results using an approach which combines the JMM and DKM methods. The paper ends with some conclusions.

¹ Given the same input parameter values, deterministic experiments always result in the same response value. Numerical experiments, such as flow simulation, are deterministic experiments.
² McCullagh and Nelder used yet another terminology, describing the response of an experiment to include "systematic" effects and "random" effects.


2. Review of Experimental Design (ED) and Joint Modeling Methodology (JMM)

This section presents the basics of the ED and JMM theories, as well as simple examples to illustrate the principle of both techniques and their application. We first consider an example of a facies model with uncertainty on:

- spatial or stochastic parameters, treated with stochastic methods, such as the geological representation of the reservoir (multiple realizations obtained using geostatistical algorithms with differing training images, channel configurations, etc.)

- non-spatial or deterministic properties of the realizations (such as mean permeability and porosity of the facies, fluid properties, etc.).

The JMM extends the ED theory to treat spatial (stochastic) uncertainty, that is, to consider multiple geostatistical realizations. Before going into more detail on the JMM, we briefly summarize the theory of experimental design as it is applied to uncertainty quantification in reservoir simulation.

2.1. Experimental design methodology description

An experimental design defines n different configurations of parameter values – in our case, flow simulation parameters. Given those n configurations of the k (deterministic) parameters, n simulations are performed and used to fit a response surface model (RSM), usually a regression model (Myers and Montgomery, 2002). The aim of an experimental design is to define the minimum number of configurations that yields the best fit of the RSM. An experimental design is represented by a matrix D [n x k] giving the coordinates of each simulation: the rows represent the parameter values defining each simulation and the columns correspond to the individual parameters. A second-order regression model for the k factors x = (x_1, ..., x_k) can be written in the form:

Y(x) = \beta_0 + \sum_{i=1}^{k} \beta_i x_i + \sum_{i=1}^{k} \beta_{ii} x_i^2 + \sum_{i<j} \beta_{ij} x_i x_j + \varepsilon

Applied to all n simulations from the experimental design, we have the following equation:

Y = X\beta + \varepsilon


where:
- Y is the n x 1 vector of the production response of interest (for example, the cumulative oil production at a given time)
- X is the n x p model matrix, which depends on the experimental design and on the regression model
- \beta is an unknown p x 1 vector of regression coefficients
- \varepsilon is an n x 1 vector of errors, with E[\varepsilon] = 0 and Var[\varepsilon] = \sigma^2 I

An alternative way of writing this model is E[Y] = X\beta, since E[\varepsilon] = 0. In uncertainty quantification, the production response is usually modeled by a second-order polynomial model. To adjust a second-order polynomial model, the simulations are often performed following a Central Composite Design (CCD), and the coefficients \beta of the regression are usually obtained by the least squares method: \hat{\beta} = (X^T X)^{-1} X^T Y. The regression model can also be used to evaluate the response Y for a new configuration of the parameters, say x_{01}, x_{02}, ..., x_{0p}. If x_0 = [1, x_{01}, x_{02}, ..., x_{0p}], then a point estimate for the new observation is computed by:

\hat{Y}(x_0) = x_0 \hat{\beta}        (1)

The resulting proxy model \hat{Y} is used to obtain a probability density of the production through Monte Carlo sampling, where the production response is evaluated by Eq. 1. The ED methodology is widely used in reservoir management and has proved its efficiency when applied to deterministic parameters (Damsleth et al., 1992). We now apply ED to a simple case, assuming that there is no uncertainty in the geological representation of the reservoir.
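To make the least-squares step concrete, here is a minimal numpy sketch of fitting a second-order RSM for k = 2 factors and evaluating Eq. 1 at a new configuration. The design layout, variable names and placeholder response values are illustrative only, not values from this study.

```python
import numpy as np

def model_matrix(x):
    """Second-order model matrix for k = 2 factors: intercept, linear,
    quadratic and interaction terms (one row per configuration)."""
    x1, x2 = x[:, 0], x[:, 1]
    return np.column_stack([np.ones(len(x)), x1, x2, x1**2, x2**2, x1 * x2])

# 9-run CCD in coded units [-1, 1] (corner, axial and center points)
design = np.array([[ 1,  1], [ 1, -1], [-1,  1], [-1, -1],
                   [ 1,  0], [-1,  0], [ 0,  1], [ 0, -1], [ 0,  0]], float)
# illustrative placeholder responses, one per flow simulation
Y = np.array([25.8, 24.1, 24.9, 23.5, 25.2, 24.0, 25.3, 23.8, 24.4])

X = model_matrix(design)
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)   # beta_hat = (X'X)^-1 X'Y

def predict(x_new, beta=beta_hat):
    """Point estimate of the response at a new configuration (Eq. 1)."""
    return model_matrix(np.atleast_2d(x_new)) @ beta

print(predict([0.2, -0.5]))
```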

2.2. Application of the ED methodology to a simple case

To illustrate the uncertainty assessment workflow using the ED methodology, we apply the method to a synthetic case borrowed from Suzuki and Caers (2006). We consider a channel system composed of mud and sand. The channel sands have uniform porosity and permeability; however, these deterministic parameters are considered uncertain. The mud is treated as inactive cells. The reservoir model is an 80x80 2D grid containing 3 producers and 3 injectors, all penetrating channel sand. The well locations are given in Figure 1. For a more detailed description of the case, see Suzuki and Caers (2006). Note that this case is slightly different from the original one, since we assume that the permeability and the porosity are uncertain and that we have only one geostatistical realization of the reservoir (no spatial uncertainty).


Figure 1: Example of a reservoir model realization with well locations. Red is channel and blue is shale.

The channel configuration of the reservoir is presented in Figure 2A. The value of the channel permeability may vary between 250 mD and 750 mD, and the value of the porosity may vary between 0.15 and 0.3. A central composite design consisting of 9 simulations is constructed for these two parameters, illustrated in Figure 2B. Simulations of the cumulative oil production at 6000 days were performed at each point of the experimental design (Figure 2C), and a second-order polynomial model was fitted to the production response (Figure 2D). The last step of the method consists of using the response surface model through Monte Carlo sampling to compute the probability density of the cumulative oil production at 6000 days (Figure 2E). In this case, 1000 uniformly distributed values are generated for each parameter in order to compute the density function. For this example, the P10, P50 and P90 quantiles of the production at 6000 days are respectively 23.52, 24.35 and 25.81 Mm3.
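The Monte Carlo step (Figure 2E) can be sketched as follows; the proxy coefficients below are illustrative stand-ins for the fitted RSM of Eq. 1, not the values obtained in this study.

```python
import numpy as np

def proxy_fopt(perm, poro):
    # stand-in for the fitted second-order RSM in coded units; placeholder coefficients
    return 24.5 + 1.0 * perm + 0.7 * poro + 0.3 * perm * poro   # Mm3

rng = np.random.default_rng(0)
n_mc = 1000
perm = rng.uniform(-1.0, 1.0, n_mc)   # uniform prior on the coded interval [-1, 1]
poro = rng.uniform(-1.0, 1.0, n_mc)
samples = proxy_fopt(perm, poro)      # 1000 evaluations of the proxy (Eq. 1)

p10, p50, p90 = np.percentile(samples, [10, 50, 90])
print(f"P10 = {p10:.2f}, P50 = {p50:.2f}, P90 = {p90:.2f} Mm3")
```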

Figure 2 (panels A-E) shows the channel configuration, the 9-run central composite design in porosity (PORO) and channel permeability (PERMX), the simulated responses at the design points, the fitted second-order response surface, and the resulting density plot. The 9 CCD configurations are:

Run   PORO    PERMX (mD)
1     0.30    750
2     0.30    250
3     0.15    750
4     0.15    250
5     0.30    500
6     0.15    500
7     0.225   750
8     0.225   250
9     0.225   500

Figure 2: Workflow of the Experimental Design methodology


Note that the workflow described in Figure 2 is only valid for a single timestep. To compute the probability density of the production at different times, steps 2C to 2E need to be recomputed. The next section considers the same example, but we account for stochastic uncertainty in the channel configuration using more than one geostatistical realization and applying the joint modeling method. The JMM is similar to experimental design, except that it uses two regression models to model the variation of the production response as a function of the deterministic and stochastic parameters. It consists of the construction of a mean model to describe the production response as a function of deterministic parameters, and a dispersion model to account for the stochastic parameters. These two regression models are linked using generalized linear models (GLM). The production forecasts can then be quantified as a function of both deterministic and stochastic uncertainties.

2.3. Joint Modeling Methodology description

In the presence of two types of uncertainty (deterministic and stochastic), an experimental design is constructed with the deterministic parameters only. Then, the stochastic parameters are modeled by creating multiple geostatistical realizations for each configuration of deterministic parameters given by the experimental design. We introduce a few important notations for a better understanding of the theory. Let us denote:

− Y: the production response of interest (for example, cumulative oil production at a given time)

− k: the number of deterministic uncertain parameters (e.g. uniform channel permeability, porosity)

− n: the number of points of the experimental design; each point, or configuration of the k deterministic parameters, is denoted x_u, u = 1, ..., n

− r: the number of geostatistical realizations (number of replications for each point of the experimental design)

− Y_ui, i = 1, ..., r: the r evaluations of Y(x_u) for the different geostatistical realizations.

Contrary to the traditional approach, where the response is evaluated once at each of the n points of the experimental design, a series of simulations is performed at each point x_u: one for each of the r geostatistical realizations, resulting in n x r simulations in total. These repeated simulations allow the construction of a model of the dispersion of the response due to the geostatistical uncertainty.


Figure 3 illustrates this principle for the case of k = 2 deterministic parameters and r = 4 possible geostatistical realizations. We employ a Central Composite Design for the 2 deterministic parameters, resulting in 9 different parameter combinations. Thus, a total of 36 (4x9) simulations of the response (cumulative oil production at 6000 days in this example) were performed. Figure 3A represents the response in the 2D space formed by the two deterministic parameters, and Figure 3B is a 1D version of this 2D space, i.e. the x-axis represents the index u of the design points x_u, u = 1, ..., n.


Figure 3: Cumulative oil values at 6000 days for each point of the experimental design: (A) 2D view, (B) 1D view

The objective of the JMM is to construct intervals of possible values of the response at each point of the experimental design. The simulations Y_ui are required to be normally distributed. From the simulation results, the JMM constructs two models, a mean model and a dispersion model of the response. These two models are fitted using generalized linear models (GLM), whose expressions are given by Eq. 2 and Eq. 4. Appendix A provides more details about generalized linear models. For the mean model, we have the following equations (Zabalza, 2000):

E[Y_{ui}] = \mu_{ui} = X_{ui}\beta, \quad Cov[Y_{ui}] = diag(\sigma_{ui}^2), \quad u = 1,\ldots,n \text{ and } i = 1,\ldots,r        (2)

where:
- Y_ui is the response at point x_u for realization i
- \mu_ui is the mean of the response for realization i at point x_u
- \beta is an unknown vector of regression coefficients
- X = [X_ui, u = 1, ..., n and i = 1, ..., r] is the model matrix. Each row X_ui of X refers to a different observation, each column to a different covariate.

The variance model is a function of the dispersion D of the response, which is given by:


D(x_u) = (Y(x_u) - \mu(x_u))^2, \; u = 1,\ldots,n, \quad \text{or} \quad D_{ui} = (Y_{ui} - \mu_{ui})^2        (3)

Under the assumption that the Y_ui are normally distributed, it can be shown that the dispersion D_ui follows a Gamma law. The variance model is then given by Eq. 4:

E[D_{ui}] = \sigma_{ui}^2, \quad \ln(\sigma_{ui}^2) = H_{ui}\gamma, \quad Cov[D_{ui}] = diag(\tau\,\sigma_{ui}^4), \quad u = 1,\ldots,n \text{ and } i = 1,\ldots,r        (4)

where:
- D_ui is the dispersion of the response for realization i at point x_u
- \gamma is an unknown vector of regression coefficients
- H = [H_ui, u = 1, ..., n and i = 1, ..., r] is the model matrix. Each row H_ui of H refers to a different observation, each column to a different covariate
- \tau is a scaling parameter.

Here we use H_ui \gamma as opposed to X_ui \beta to emphasize the possible differences between the terms of the mean model and of the dispersion model. As we can see, the mean and dispersion models are dependent: the estimation of the mean model requires an estimate \hat{\sigma}^2(x) of the dispersion \sigma^2(x) to be used as a prior weight, while the dispersion model requires an estimate \hat{\mu}(x) of the mean \mu(x) in order to form the dispersion response variable D_ui. For fitting these models, an iterative algorithm is therefore required, whereby we alternate between fitting the mean model and fitting the dispersion model using the variable D_ui. Note that since the response Y_ui is assumed to be normal in the JMM, the dispersion D_ui follows D_ui ~ \sigma_{ui}^2 \chi_1^2, a scaled chi-squared distribution, which is a special case of the Gamma distribution (see Myers and Montgomery, 2002). The different steps of the methodology are as follows:

1. Use the ordinary least squares estimator \hat{\beta}^{(0)} for the mean model (Eq. 2): \hat{\beta}^{(0)} = (X'X)^{-1}X'Y, so that the mean model is \hat{\mu}_{ui}^{(0)} = X_{ui}\hat{\beta}^{(0)}.

2. Use \hat{\beta}^{(0)} to calculate the residuals, or dispersions: D_{ui}^{(1)} = (Y_{ui} - \hat{\mu}_{ui}^{(0)})^2 (Eq. 3).

3. Consider D_{ui}^{(1)} as a realization of the random variable D_{ui} representing the dispersion of Y_{ui}. Determine \hat{\gamma}^{(1)} by fitting the dispersion model by maximum likelihood for a Gamma model (McCullagh and Nelder, 1989).

4. Use \hat{\gamma}^{(1)} to calculate the estimator of the variance: \hat{\sigma}^{2(1)}(x) = \exp(H_x \hat{\gamma}^{(1)}) (Eq. 4). Use \hat{\sigma}^{2(1)}(x) to compute the weight matrix V used in step 5: V = diag(\hat{\sigma}^{2(1)}(x_1), \ldots, \hat{\sigma}^{2(1)}(x_u), \ldots).


5. Calculate \hat{\beta}^{(1)} using the weighted least squares method with V as a weight: \hat{\beta}^{(1)} = (X'V^{-1}X)^{-1}X'V^{-1}Y.

6. Go back to step 2 with \hat{\beta}^{(1)} replacing \hat{\beta}^{(0)}, i.e. calculate \hat{\mu}_{ui}^{(k)} = X_{ui}\hat{\beta}^{(k)}.

7. Continue until convergence, i.e. until \sum (1/\hat{\sigma}^{2(k)}(x_u)) (Y_{ui} - \hat{\mu}^{(k)}(x_u))^2 is stable.
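A compact numpy sketch of this iterative fit is given below. X and H are the mean and dispersion model matrices; the Gamma maximum-likelihood step (step 3) is approximated here by an ordinary regression of ln(D) on H, which is a simplification of the algorithm described above, not the exact procedure.

```python
import numpy as np

def fit_joint_model(X, H, Y, n_iter=10):
    """Alternate between the mean model and the dispersion model.
    Step 3 (Gamma maximum likelihood) is approximated by an ordinary
    least-squares fit of ln(D) on H -- a simplification."""
    beta = np.linalg.lstsq(X, Y, rcond=None)[0]               # step 1: OLS mean model
    gamma = None
    for _ in range(n_iter):
        mu = X @ beta                                         # current mean estimates
        D = (Y - mu) ** 2 + 1e-12                             # step 2: dispersions (Eq. 3)
        gamma = np.linalg.lstsq(H, np.log(D), rcond=None)[0]  # step 3 (approximate)
        sigma2 = np.exp(H @ gamma)                            # step 4: variance estimate (Eq. 4)
        W = 1.0 / sigma2                                      # inverse-variance weights
        Xw = X * W[:, None]
        beta = np.linalg.solve(X.T @ Xw, Xw.T @ Y)            # step 5: weighted least squares
    return beta, gamma
```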

Given the response surface models of the mean \hat{\mu}(x) and of the dispersion \hat{\sigma}^2(x), an interval of prediction of the response Y can be constructed. Zabalza (2000) has shown that the prediction interval for Y is given by:

I_x = [\, \hat{\mu}(x) - t_{1-\alpha/2}\, S(x) \;;\; \hat{\mu}(x) - t_{\alpha/2}\, S(x) \,]        (5)

with:
− S(x) = \sqrt{X_x\, Var(\hat{\beta})\, X_x' + \hat{\sigma}^2(x)}, X_x being the model matrix at point x
− t_{\alpha/2} and t_{1-\alpha/2} are the quantiles of order \alpha/2 and 1-\alpha/2 of the normal law N(0,1).

Remark: usually, \alpha is taken to be equal to 0.05, meaning that the probability that a prediction falls outside the interval of prediction is less than or equal to 5%.

A Monte-Carlo sampling technique is then applied simultaneously to both models, with a known prior distribution for each deterministic parameter, usually a uniform distribution. A normal law N(\hat{\mu}(x), \hat{\sigma}(x)) is applied to account for both deterministic and stochastic uncertainty. We thus obtain the probability density of the response and the production quantiles using these two models.
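The two-level Monte Carlo sampling can be sketched as below: uniform sampling of the deterministic parameters, then normal draws N(mu_hat(x), sigma_hat(x)) at each sampled point. The mu_hat and sigma_hat functions are illustrative stand-ins for the fitted mean and dispersion response surfaces.

```python
import numpy as np

# illustrative stand-ins for the fitted mean and dispersion response surfaces
def mu_hat(x):
    return 24.5 + 1.2 * x[:, 0] + 0.9 * x[:, 1]
def sigma_hat(x):
    return np.exp(0.3 + 0.2 * x[:, 0])

rng = np.random.default_rng(1)
n_points, n_draws = 1000, 50                      # 1000 uniform samples, 50 normal draws each
x = rng.uniform(-1.0, 1.0, size=(n_points, 2))    # deterministic parameters, coded units
draws = rng.normal(mu_hat(x)[:, None],            # normal law N(mu_hat(x), sigma_hat(x))
                   sigma_hat(x)[:, None],
                   size=(n_points, n_draws))
samples = draws.ravel()                           # 50 000 response values
p10, p50, p90 = np.percentile(samples, [10, 50, 90])
```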

2.4. Illustrative example of Joint Modeling Methodology

Similarly to the ED section, we present a simple illustration of the principle of the JMM. In the previous example (Figure 2), only 1 realization was taken into account. Now, as shown in Figure 4, the workflow is presented for the case of 4 different geostatistical realizations (Figure 4A) and 2 deterministic uncertain parameters: channel permeability and porosity. Similar to the example in Figure 3, a Central Composite Design for 2 parameters (Figure 4B) is used to account for these 2 deterministic parameters, and a total of 9x4 = 36 simulations are performed in this example, 9 for each geostatistical realization. Figure 4C shows the response for each geostatistical realization and for each of the 9 points of the experimental design (i.e. each possible configuration of permeability and porosity), as well as the interval of prediction I_x obtained from the RSM of the mean and the RSM of the dispersion.


Figure 4D presents the three response surfaces obtained by the two models: the middle surface represents the mean behavior of the production as a function of the 2 deterministic parameters, and the two bounding surfaces represent the dispersion due to geostatistical variability (Eq. 5). In order to calculate the probability density of the response (Figure 4E), a Monte-Carlo simulation is performed. In this case, 1000 values for each deterministic parameter are generated using a uniform distribution. In the classical case (without geostatistical variation), the response is directly obtained from the response surface of the mean, as illustrated in the previous section on ED. In our application, in order to account for the geostatistical variability, 50 response values are generated for each of the 1000 points using a normal law N(μ̂(x), σ̂(x)). The density is then computed using the 50 000 response values.

Figure 4 (panels A-E) shows the r = 4 geological realizations, the n = 9 points of the experimental design (the same CCD configurations as in Figure 2), the simulated responses at each design point together with the mean values μ_ui and the intervals of prediction, the mean and bounding response surfaces, and the density obtained by uniform sampling of the deterministic parameters followed by normal sampling.

Figure 4: Workflow of the Joint Modeling Methodology

As we can see in Figure 4, the JMM workflow is closely related to the ED workflow, except that ED considers only one response surface (the mean), whereas the JMM constructs a mean model and prediction intervals of the response. We now examine the impact of using prediction intervals (i.e. considering stochastic uncertainty) on the quantile and density estimates.


2.5. Comparison of the results of the two methods

To do so, we compare the probability densities and quantile estimates obtained with ED and JMM. Figure 5A shows a superposition of both probability densities of the cumulative oil production at 6000 days, the red curve representing the results from ED and the blue curve those from JMM. Similarly, Figure 5B shows a superposition of the cumulative oil production quantiles as a function of time obtained with the ED technique (red), without taking into account the geostatistical uncertainty, and the quantiles obtained by the JMM (blue). Recall that both ED and JMM assume that the response is independent of time. Thus, to calculate the quantiles of the production as a function of time, the entire workflow must be repeated for each time; in other words, a new response surface is constructed at each time. The assumption of independence is obviously incorrect for a flow response such as the cumulative oil production used in this work. However, time-dependent experimental designs and response surfaces are still an area of research and are not used in practice.


Figure 5: Comparison of the ED and JMM: (a) probability density estimations at 6000 days and (b) quantile estimation as a function of time

We can observe in Figure 5 that the uncertainty obtained from the JMM is larger than that obtained with the experimental design technique. For example, at 6000 days, the P10, P50 and P90 quantiles of the production are respectively 23.52, 24.35 and 25.81 Mm3 for the ED, and 19.21, 25.02 and 29.29 Mm3 for the JMM. Although this result is not surprising, it serves well as a reminder that stochastic uncertainty should be considered in any uncertainty quantification study. The distance-kernel methodology has been developed to take spatial (stochastic) uncertainty into account in uncertainty quantification. The DKM allows the selection, among many geostatistical realizations, of a few representative realizations for flow simulation. In the next section, we briefly present the main steps of the distance-kernel methodology.


3. Distance-Kernel Methodology: Review

The principle of the methodology, illustrated for facies models, is described in Figure 6. Starting with multiple (NR) realizations generated using any algorithm, a dissimilarity distance matrix is constructed (Figure 6a and 6b). This NR x NR matrix contains the "distance" between any two model realizations, which is a way to determine how similar two reservoir models are in terms of geological properties and flow behavior. The distance can be calculated in any manner; the only requirement is that it be well correlated with the flow response(s) of interest. Examples of such distances are the Hausdorff distance (Suzuki and Caers, 2006), time-of-flight based distances (Park and Caers, 2007), or flow-based distances using fast flow simulators. The distance matrix is then used to map all realizations into a Euclidean space R (Figure 6c), using a technique called multidimensional scaling (MDS). MDS translates the dissimilarity matrix into a configuration of points in n-dimensional Euclidean space (Borg and Groenen, 1997). Each point in this map represents a realization; the points are arranged so that their Euclidean distances correspond as closely as possible to the dissimilarity distances between the realizations. Since in most cases the structure of the points in the mapping space R is nonlinear, we propose the use of kernel methods in order to transform the Euclidean space R into a new space F, called the feature space (Figure 6d). The goal of the kernel transform (Schölkopf and Smola, 2002) is that points in this new space behave more linearly, so that standard linear tools for pattern detection (such as principal component analysis, cluster analysis, dimensionality reduction, etc.) can be used more successfully. These tools allow the selection of a few "typical" points, in our case reservoir models, among a potentially very large set. In this work, we use the classical k-means algorithm in the feature space F, also called kernel k-means (KKM), to determine a subset of points defined by their centroids. The subset of models selected by KKM is small enough to allow uncertainty quantification (e.g. P10, P50, P90 quantiles) through flow simulation. For more details about the methodology, refer to Scheidt and Caers (2007).
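As an illustration of the mapping step, a minimal classical (metric) MDS can be written directly from the eigendecomposition of the double-centered squared-distance matrix; this is a generic sketch, not the specific implementation used in Scheidt and Caers (2007).

```python
import numpy as np

def classical_mds(delta, dim=2):
    """Map NR realizations into a dim-dimensional Euclidean space from an
    NR x NR dissimilarity matrix delta (classical metric MDS)."""
    n = delta.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n              # centering matrix
    B = -0.5 * J @ (delta ** 2) @ J                  # double-centered squared distances
    eigval, eigvec = np.linalg.eigh(B)               # eigenvalues in ascending order
    keep = np.argsort(eigval)[::-1][:dim]            # keep the dim largest eigenvalues
    return eigvec[:, keep] * np.sqrt(np.maximum(eigval[keep], 0.0))
```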



Figure 6: Proposed workflow for uncertainty quantification: (a) distance between two models, (b) distance matrix, (c) models mapped in Euclidean space, (d) feature space, (e) pre-image construction, (f) P10, P50, P90 estimation

The k-means algorithm assigns the points to k clusters S_i by minimizing the squared distance between the points of each cluster and its center μ_i:

J = \sum_{i=1}^{k} \sum_{x_j \in S_i} \| x_j - \mu_i \|^2

The algorithm starts by randomly partitioning the input points into k initial sets S_i. It then calculates the mean point, or centroid μ_i, of each set. Then, every point is assigned to the cluster whose centroid is closest to it. These two steps are repeated until convergence, which is obtained when the points no longer switch clusters (or, alternatively, when the centroids no longer change location). The k-means algorithm often starts with a random partition of the input points to initialize the clusters. However, this approach is not always optimal and is subject to many local minima. In this work, we employ a two-step approach, which consists of initializing KKM with the results of an alternative to k-means called spectral clustering. This two-layer approach, first running spectral clustering to get an initial partitioning and then refining it by running KKM, typically results in a robust partitioning of the data (Dhillon, 2004). For details about spectral k-means, see Appendix B.


The DKM is well adapted to model uncertainty on spatial parameters; however, it can also be employed for deterministic parameters, or for both deterministic and stochastic parameters together. The next section proposes a comparison of the DKM and the JMM.
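To make the two-layer clustering concrete, here is a minimal sketch of kernel k-means refined from a spectral-clustering initialization. It assumes scikit-learn is available for the initialization, and the small random kernel matrix only stands in for the kernel computed on the actual MDS coordinates.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def kernel_kmeans(K, labels, n_iter=100):
    """Refine an initial partition with kernel k-means on a precomputed kernel K."""
    n_clusters = labels.max() + 1
    for _ in range(n_iter):
        dist = np.full((K.shape[0], n_clusters), np.inf)
        for c in range(n_clusters):
            idx = np.where(labels == c)[0]
            if idx.size == 0:
                continue
            # squared feature-space distance to the centroid of cluster c
            dist[:, c] = (np.diag(K)
                          - 2.0 * K[:, idx].mean(axis=1)
                          + K[np.ix_(idx, idx)].mean())
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

# toy kernel matrix standing in for the Gaussian kernel on the MDS map
rng = np.random.default_rng(2)
pts = rng.normal(size=(200, 2))
sq = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq / 2.0)

init = SpectralClustering(n_clusters=5, affinity="precomputed").fit_predict(K)
labels = kernel_kmeans(K, init)                      # refined partition
```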

4. Application to a synthetic case – comparison between both methods

In this section, we describe a synthetic case on which uncertainty assessment was performed using both DKM and JMM. Results are presented for both approaches, as well as results obtained by mixing both approaches in a new workflow.

4.1. Case description

The synthetic case used in this section is modified from the case presented in Suzuki and Caers (2006). In this synthetic case, we assume again that we have a facies model with two different types of uncertainties:

- Deterministic (non-spatial) uncertain parameters:
  o Permeability of channels: 250, 500, 750 mD
  o Porosity of channels: 0.15, 0.225, 0.3
  o Critical mobile water saturation (SWCR): 0.2, 0.25, 0.3

- Geostatistical (stochastic, spatial) uncertain parameters:
  o Orientation of channels: N45E, N, N45W
  o Proportion of facies: 20%, 30% or 40%
  o Sinuosity of the channels: small, medium, large

The objective is to estimate the quantiles and the densities of the cumulative oil production (FOPT) as a function of time. The uncertainty in the geostatistical parameters is taken into account by combining the 3 geostatistical uncertainties, which leads to 27 possible training images (TI), depicted in Figure 7. Five facies realizations per TI were generated using a multi-point geostatistical algorithm, giving a total of 135 possible geostatistical realizations which should be taken into account.


Figure 7: 27 Training Images used in this application

The uncertainty in the deterministic parameters is analyzed using an experimental design. Since a second order polynomial model is most often used in JMM to model the RSM (Eq. 2 and Eq. 4), a Central Composite Design (CCD) is employed. A CCD is constructed with the 3 deterministic parameters, giving 15 reservoir simulations. As a consequence, each of the 135 geostatistical realizations has 15 possible configurations of permeability, porosity and critical mobile water saturation (as given by the ED), leading to a total number of 2025 reservoir models (135x15). To compare the accuracy of both modeling approaches, flow simulation was performed on those 2025 models, an exercise that would not be possible in practice. The cumulative oil production for these 2025 simulations as a function of time is presented in Figure 8.

Figure 8: Cumulative oil production (FOPT) as a function of time for the 2025 simulations

The 2025 simulations of the cumulative oil production were performed using a standard finite difference simulator in order to serve as a reference to analyze the efficiency and the quality of the results of both methods, the JMM and the DKM.
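The combinatorics of the test case can be reproduced with a short enumeration; the parameter values are taken from the case description above.

```python
from itertools import product

orientations = ["N45E", "N", "N45W"]
proportions  = [0.20, 0.30, 0.40]
sinuosities  = ["small", "medium", "large"]
training_images = list(product(orientations, proportions, sinuosities))   # 27 TIs

n_real_per_ti = 5        # multi-point realizations generated per TI -> 135 realizations
n_ccd_points = 15        # CCD for the 3 deterministic parameters
n_models = len(training_images) * n_real_per_ti * n_ccd_points
print(len(training_images), n_models)   # 27 2025
```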


4.2. Application of the Joint Modeling Method

We first apply the JMM to the case presented previously. Since 15 flow simulations need to be performed for each geostatistical realization, the use of the JMM requires keeping the number of geostatistical realizations reasonably small. We are interested in assessing the accuracy of the JMM by varying the number of simulations used to obtain the response surfaces, while keeping a feasible number of flow simulations. Our first step is to reduce the number of simulations required by the JMM by selecting only one geostatistical realization per TI. Considering all 27 TIs leads to a total of 15x27 = 405 simulations, which would still be a significant number of flow simulations for real field cases. In order to vary (and reduce) the number of simulations to perform, we select geostatistical realizations from a subset of TIs. The JMM results that follow are compared with the results for the entire set of 2025 simulations. In the following figures, we present:

- (A): the P10, P50 and P90 quantile estimation of the cumulative oil production

- (B) and (C): the FOPT value at 3000 and 6000 days for each point of the ED and for each geostatistical realization (value obtained by the mean model is presented in red, interval of prediction in black)

- (D) and (E): density plots at 3000 and 6000 days for all the 2025 simulations (in red) and for the JMM (in blue). Note that in order to calculate the quantiles as a function of time, the mean and dispersion models must be constructed for each time step, and are thus incorrectly assumed to be independent.

We first apply the JMM by running a small number of flow simulations. Since the experimental design contains 15 points, only multiples of 15 are possible. Thus, to take into account the geostatistical uncertainty on facies, we consider 3 different geostatistical realizations, resulting in 45 simulations. To do so, we fix the sinuosity of the channel and proportion of facies to their mean values - the orientation of facies remains uncertain. The corresponding TIs are taken at the center of each of the 3 groups in Figure 7. Uncertainty assessment was then done on those 45 simulations. The resulting quantiles (A), intervals of prediction ((B) and (C)) and densities ((D) and (E)) are presented in Figure 9.



Figure 9: Results of JMM for 3 TIs, i.e. 45 simulations: (A) P10, P50 and P90 quantile estimations for all 2025 simulations (red) and for JMM (blue), (B) and (C) value of FOPT at 3000 and 6000 days for each point of the ED, (D) and (E) densities at 3000 and 6000 days

We observe in Figure 9A a large overestimation of the P50 and P90 quantiles given by the JMM. Also, for some points of the experimental design (8, 9, 10), the interval of prediction is too large, thus exaggerating the stochastic uncertainty (Figure 9B and 9C). The two probability density plots (Figure 9D and 9E) show that the cumulative oil production density predicted by the JMM is too narrow compared to the exhaustive set. This is understandable given that we have selected a very small number of geostatistical realizations to model the dispersion.


To improve the quality of the results obtained using the JMM, we gradually add more TIs. We now consider 9 TIs, again using 1 geostatistical realization per TI, corresponding to the first diagonal (from top left to bottom right) of each group of 9 TIs in Figure 7, totaling 135 simulations. The results are shown in Figure 10.


Figure 10: Results of JMM for 9 TIs, i.e. 135 simulations: (A) P10, P50 and P90 quantile estimations for all 2025 simulations (red) and for JMM (blue), (B) and (C) value of FOPT at 3000 and 6000 days for each point of the ED, (D) and (E) densities at 3000 and 6000 days

As we observe in Figure 10A, the results have significantly improved with the larger number of simulations. The estimation of the production quantiles is very different from the case of 45 simulations. Moreover, the P90 quantiles, which were well estimated in the previous example, are now overestimated, and the P10 quantiles are underestimated, contrary to the previous case where overestimation was detected.


In addition, the P50 quantiles are much more precise in this case. Another observation is that overestimated prediction intervals can still be observed (Figure 10C), and for some points of the experimental design (5-8 and 10-14 in Figure 10B), simulated points fall outside of the prediction interval. Finally, we observe for the JMM (Figure 10D and 10E) negative values of the cumulative oil production at 3000 and 6000 days, which is clearly unphysical. Figure 11 shows the results where 15 TIs were used, taken from both diagonals in Figure 7. The total number of simulations for this case is 225.


Figure 11: Results of JMM for 15 TIs, i.e. 225 simulations: (A) P10, P50 and P90 quantile estimations for all 2025 simulations (red) and for JMM (blue), (B) and (C) value of FOPT at 3000 and 6000 days for each point of the ED, (D) and (E) densities at 3000 and 6000 days


As we observe in Figure 11A, the results have significantly improved when using more simulations. However, the geostatistical uncertainty is still not completely captured: we can still observe an overestimation of the range of uncertainty (Figure 11B and 11C). Again, negative values of FOPT are observed in the probability density plots. Let us examine the results, in Figure 12, using all 27 TIs, which leads to a total of 405 flow simulations.


Figure 12: Results of JMM for 27 TIs, i.e. 405 simulations: (A) P10, P50 and P90 quantile estimations for all 2025 simulations (red) and for JMM (blue), (B) and (C) value of FOPT at 3000 and 6000 days for each point of the ED, (D) and (E) densities at 3000 and 6000 days


Again, the results have improved compared to the previous case. The results are now very accurate, but a large number of flow simulations is required, which in many cases may not be feasible. The JMM constructs response surfaces and uses the Monte Carlo method to construct the probability densities and quantile production curves. Alternatively, we can employ the DKM.

4.3. Application of the Distance Kernel Method (DKM)

In this section, we apply the DKM with the kernel k-means methodology (KKM) to the synthetic case presented previously. As mentioned in Section 3, the method relies on an estimation of the dissimilarity between all possible reservoir models (2025 in this case). Note that in this case a flow-based distance is required, since static distances, like the Hausdorff distance, do not take into account the differences in the deterministic parameters such as uniform channel permeability, porosity, and critical water saturation. The distance is calculated using streamline simulation, which is known to be a fast flow simulation technique due to its capability to take large time steps. Simulation was performed for only two time steps, up to 20000 and 30000 days. The distance between any two reservoir models is given as the average of the absolute differences in total field oil production rate at 20000 and 30000 days. Using the dissimilarity distance defined previously, all reservoir models are mapped into a 2D Euclidean space (cf. Figure 13A) using multidimensional scaling (MDS). The scatter plot in Figure 13B, representing the Euclidean distance between any two points in the 2D Euclidean space as a function of the dissimilarity distance given by streamline simulation, clearly shows that the two distances are similar. Thus, the 2D representation of the models is accurate enough, and the Euclidean distance in R is used for the rest of the study instead of the dissimilarity distance.
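A sketch of this flow-based dissimilarity is shown below; the two response vectors are random stand-ins for the streamline-simulated values at 20 000 and 30 000 days, and only the distance formula follows the description in the text.

```python
import numpy as np

# stand-ins for the streamline-simulated responses of the NR models at the two
# time steps (20 000 and 30 000 days); real values come from the fast streamline runs
rng = np.random.default_rng(3)
NR = 2025
resp_20k = rng.uniform(1.0e4, 3.0e4, NR)
resp_30k = rng.uniform(1.5e4, 4.0e4, NR)

# dissimilarity between any two models: average of the absolute differences
# of the two responses (NR x NR matrix)
delta = 0.5 * (np.abs(resp_20k[:, None] - resp_20k[None, :])
               + np.abs(resp_30k[:, None] - resp_30k[None, :]))
```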


Figure 13: (A) 2025 realizations mapped into a 2D space R, (B) scatter plot of the Euclidean distance between points in R versus the dissimilarity distance (correlation ρ = 0.9975)


The kernel employed to define the feature space F is Gaussian, with a parameter equal to 20% of the range of distances in the Euclidean space (σ = 2250). Its expression is:

k(x, y) = \exp\left( -\frac{\|x - y\|^2}{2\sigma^2} \right)

where x and y represent realizations in the MDS space R. The KKM algorithm is then applied, which consists of using the traditional k-means clustering algorithm directly in the feature space F. In this example, we select 45 models and perform simulations on each of them, which results in the same number of simulations as in the JMM application above and allows a comparison of the results. In order to initialize the cluster centroids, we use a two-step approach: first, spectral clustering is applied to obtain initial centroids, and then the kernel k-means (KKM) algorithm is run from those initial centroids (see Appendix B for details about this two-step approach). Flow simulations were then performed on the 45 cluster centroids, shown in blue in Figure 14A. The results of the simulations for FOPT are presented in Figure 14B. We can see that KKM permits the selection of representative realizations: the spread of uncertainty captured with only 45 simulations is quite good.
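The kernel matrix construction, with σ set to 20% of the range of distances in the MDS space, can be sketched as follows; the coordinates are illustrative stand-ins for the 2D MDS map, not the actual mapped realizations.

```python
import numpy as np

rng = np.random.default_rng(4)
coords = rng.normal(scale=2000.0, size=(2025, 2))     # stand-in for the 2D MDS map R

sq = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)   # squared Euclidean distances
sigma = 0.2 * np.sqrt(sq).max()                       # 20% of the range of distances
K = np.exp(-sq / (2.0 * sigma ** 2))                  # Gaussian kernel: feature space F
```

This kernel matrix would then be passed to a kernel k-means routine (such as the sketch in Section 3) to obtain the 45 cluster centroids.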


Figure 14: (A) Mapping space R: blue points represent the points selected by the methodology, (B) FOPT values for the 45 realizations corresponding to the blue points

The P10, P50 and P90 quantiles (Figure 15) were calculated by weighting each selected realization by the number of points in its cluster.
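A sketch of this cluster-weighted quantile computation is given below; values would be the simulated responses of the selected models and weights the number of realizations in each cluster.

```python
import numpy as np

def weighted_quantiles(values, weights, probs=(0.10, 0.50, 0.90)):
    """Quantiles of the responses of the selected models, each model
    weighted by the number of realizations in its cluster."""
    values = np.asarray(values, float)
    weights = np.asarray(weights, float)
    order = np.argsort(values)
    v, w = values[order], weights[order]
    cdf = np.cumsum(w) / w.sum()
    return [v[np.searchsorted(cdf, p)] for p in probs]

# e.g. values = FOPT at 6000 days of the 45 selected models,
#      weights = cluster sizes of the 45 kernel k-means clusters
```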



Figure 15: P10, P50 and P90 quantile estimation as a function of time


Figure 16: Density plot for all models and the 45 selected models at (a) 3000 and (b) 6000 days

As we can see in Figure 15, the quantile estimations are accurate while performing only 45 simulations (2.2% of the total number of simulations). The density estimations are also of good quality (Figure 16): the 45 simulations (blue) have similar properties to the entire set of 2025 (red). Note that the densities are not smooth, due to the small number of simulations and to the weights applied to each realization when computing those densities. In addition, comparison with the JMM quantile estimation based on 45 simulations shows that the DKM provides a better estimation of the quantiles and densities. To obtain equivalent results using the JMM, one must run a larger number of simulations.


4.4. Comparison with randomly chosen realizations

We now compare the results of DKM and JMM with a random selection of the same number of simulations. To do this, we generated 20 random sets of 45 and of 135 simulations, and we present comparisons with the best and the worst set of each size. In the case of 45 simulations, the comparison is made with both DKM and JMM, whereas for 135 simulations the comparison is made with JMM only.
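A sketch of this random-selection benchmark is given below. The data are placeholders, and ranking the sets by mean absolute quantile error at a single date is one plausible criterion, used here only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder for the exhaustive set: FOPT at one report date for all 2025 realizations.
fopt_all = rng.uniform(1.0e4, 4.0e4, size=2025)
ref = np.quantile(fopt_all, [0.10, 0.50, 0.90])      # reference P10/P50/P90

def quantile_error(subset):
    """Mean absolute P10/P50/P90 error of a subset against the exhaustive set."""
    return np.abs(np.quantile(fopt_all[subset], [0.10, 0.50, 0.90]) - ref).mean()

# 20 random sets of 45 realizations; keep the best and the worst for comparison.
subsets = [rng.choice(2025, size=45, replace=False) for _ in range(20)]
errors = np.array([quantile_error(s) for s in subsets])
best_set = subsets[errors.argmin()]
worst_set = subsets[errors.argmax()]
```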

Figure 17: P10, P50 and P90 values of FOPT as a function of time (days): (A) 45 simulations selected by JMM, (B) 45 simulations selected by DKM, (C) best results using 45 randomly chosen models, (D) worst results using 45 randomly chosen models

Figure 18: P10, P50 and P90 values of FOPT as a function of time (days): (A) JMM using 135 simulations, (B) best results using 135 randomly chosen models, (C) worst results using 135 randomly chosen models


We can see in Figure 17 that the DKM gives better quantile estimates than randomly chosen realizations, which is not always the case for the JMM (Figure 18). The main problem with the JMM is that, to be accurate, one must select the right realizations to properly capture the stochastic uncertainty. In the next section, we therefore propose to use the DKM procedure to select those realizations.

4.5. Combination of JMM and DKM

We now propose to combine the two approaches. Instead of arbitrarily choosing the training images on which to perform the JMM, we first perform DKM with the values of permeability, porosity and SWCR fixed at their means, and then perform JMM on the 3 geostatistical realizations representing the P10, P50 and P90 quantiles of the production. To do so, 12 geostatistical realizations were selected by the DKM. Flow simulation was performed on these realizations, and the P10, P50 and P90 realizations were identified. Subsequently, we apply the JMM using these 3 realizations to quantify the stochastic uncertainty. A total of 45 simulations is used for the JMM (54 for the combination of both methods). Results are presented in Figure 19; a sketch of the combined workflow is given below.
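Every ingredient in the following sketch is a hypothetical placeholder: the toy "simulator", the even-spread screening standing in for the DKM selection, and the small factorial design are illustrative only and are not the design or simulator used in this study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: each geostatistical realization is summarized by one scalar feature,
# and the "simulator" mixes spatial and non-spatial (perm, poro, SWCR) effects.
realizations = rng.normal(size=100)
def run_flow_simulation(real, props):          # placeholder for the reservoir simulator
    perm_mult, poro, swcr = props
    return 1.0e4 * (1.5 + real) * perm_mult * poro / 0.25 * (1.0 - swcr)

mean_props = (1.0, 0.25, 0.30)                 # non-spatial parameters fixed at their means

# 1. DKM screening at mean non-spatial values (here 12 evenly spread realizations
#    stand in for the kernel k-means selection of Section 4).
screened = realizations[np.argsort(realizations)][::len(realizations) // 12][:12]
fopt = np.array([run_flow_simulation(r, mean_props) for r in screened])

# 2. Realizations whose responses are closest to the P10, P50 and P90 of the screening runs.
targets = np.quantile(fopt, [0.10, 0.50, 0.90])
reps = [screened[np.abs(fopt - t).argmin()] for t in targets]

# 3. JMM step: run the experimental design on each of the 3 retained realizations, then fit
#    the mean and dispersion models (see the Appendix A sketch) and Monte-Carlo sample them.
design = [(p, phi, s) for p in (0.8, 1.0, 1.2) for phi in (0.20, 0.25, 0.30) for s in (0.25, 0.35)]
responses = np.array([[run_flow_simulation(r, x) for x in design] for r in reps])
```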


Figure 19: Quantiles and densities of FOPT (at 3000 and 6000 days) obtained by JMM on 45 models selected by DKM; the intermediate panels show FOPT at 3000 and 6000 days for the 3 selected realizations as a function of the simulation number

We can observe that the quantile estimates are much more accurate than in the case of 45 simulations using only the JMM. The DKM procedure here allows the selection of representative geostatistical realizations and thus improves the JMM. However, the results are still less accurate than those obtained by applying the DKM alone.


5. Conclusions

The aim of this paper is to propose the DKM as an alternative to the JMM for quantifying uncertainty on both deterministic and stochastic parameters. The method has been compared with the JMM on a synthetic dataset consisting of a facies model with stochastic uncertainty on the channel orientation, the proportion of facies and the channel sinuosity, and with deterministic uncertainty on porosity, permeability and critical water saturation. The comparative study shows that the DKM provides more accurate quantile estimates than the JMM with only 45 flow simulations, clearly reducing the number of flow simulations required for an accurate estimation of the densities and quantiles of the response.

The efficiency of the DKM depends on a measure of distance that correlates with the production response(s) of interest. In the case presented here, the DKM is applied using a flow-based distance; such a distance is required when flow parameters (such as critical water saturation) are uncertain, and static measures of distance such as the Hausdorff distance may not be as accurate. Note that the distance calculation in this case required a streamline simulation for each model, but the CPU time of each streamline simulation is negligible compared to a full flow simulation.

One limitation of the JMM is the number of geostatistical realizations that can be taken into account: to keep the number of simulations acceptable, only a few (2 to 5) geostatistical realizations can be used in practice. When there are many possible training images, or simply many geostatistical realizations, a method is needed to select a representative subset of realizations for use with the JMM. One way to do so, also presented in this paper, is to combine the JMM and DKM approaches. The results on the synthetic case were promising, showing a clear improvement over the case where the realizations were chosen randomly.

In addition, one drawback of the JMM is that non-physical results may be obtained, for example negative predictions of cumulative oil production. This is not the case with the DKM, which only considers responses from existing simulations and does not attempt to predict new responses from potentially inaccurate response-surface models.


References

Borg, I. and Groenen, P., 1997, Modern Multidimensional Scaling: Theory and Applications, Springer, New York.

Damsleth, E., Hage, A. and Volden, R., 1992, Maximum Information at Minimum Cost: A North Sea Field Development Study With an Experimental Design, JPT (Dec. 1992), p. 1350.

Dhillon, I. S., Guan, Y. and Kulis, B., 2004, Kernel k-means, Spectral Clustering and Normalized Cuts, KDD, August 22-25, 2004, Seattle, Washington, USA.

Manceau, E., Mezghani, I., Zabalza-Mezghani, I. and Roggero, F., 2001, Combination of Experimental Design and Joint Modeling Methods for Quantifying the Risk Associated With Deterministic and Stochastic Uncertainties: An Integrated Test Study, SPE Annual Technical Conference and Exhibition, New Orleans, SPE 71620.

Manceau, E., Zabalza-Mezghani, I. and Roggero, F., 2001, Use of Experimental Design to Make Decisions in an Uncertain Reservoir Environment: From Reservoir Uncertainties to Economic Risk Analysis, OAPEC seminar, 26-28 June 2001, Rueil-Malmaison, France.

McCullagh, P. and Nelder, J. A., 1989, Generalized Linear Models, 2nd edition, Chapman and Hall.

Myers, R. H. and Montgomery, D. C., 2002, Response Surface Methodology, Wiley & Sons.

Ng, A. Y., Jordan, M. and Weiss, Y., 2001, On Spectral Clustering: Analysis and an Algorithm, in Proc. of NIPS-14.

Scheidt, C. and Caers, J., 2007, A Workflow for Spatial Uncertainty Quantification Using Distances and Kernels, SCRF report 20, Stanford University.

Schölkopf, B. and Smola, A., 2002, Learning with Kernels, MIT Press, Cambridge, MA, 626 p.

Suzuki, S. and Caers, J., 2006, History Matching with an Uncertain Geological Scenario, SPE Annual Technical Conference and Exhibition, San Antonio, SPE 102154.

Venkataraman, R., 2000, Application of the Method of Experimental Design to Quantify Uncertainty in Production Profiles, SPE 59422, SPE Asia-Pacific Conference, Yokohama, Japan, April 25-26, 2000.

Zabalza, I., 2000, Analyse Statistique et Planification d'Expérience en Ingénierie de Réservoir, IFP Thesis.


Appendix A: Generalized Linear Models

Consider:

- $\mathbf{y} = (y_i)_{i=1,\dots,n}$, the vector of realizations $y_i$ of the random variables $Y_i$, independently distributed
- $\mu_i = E(Y_i)$
- $\mathbf{Y} = (Y_i)_{i=1,\dots,n}$ and $\boldsymbol{\mu} = (\mu_i)_{i=1,\dots,n}$

To define the generalized linear model associated with Y, one must assume that the probability densities of the $Y_i$ belong to the exponential family. The generalized linear model of Y is then defined by:

$$\begin{cases} E(\mathbf{Y}) = \boldsymbol{\mu} \\ \eta = g(\boldsymbol{\mu}) = \mathbf{X}\boldsymbol{\beta} \\ \operatorname{Cov}(\mathbf{Y}) = \operatorname{diag}\big(\tau\, V(\mu_i)\big), \quad i = 1, \dots, n \end{cases}$$

where:

- $\eta$ is the linear predictor of $\boldsymbol{\mu}$
- $\boldsymbol{\beta}$ is an unknown vector of regression coefficients
- $\mathbf{X}$ is the model matrix; each row $\mathbf{X}_i$ is defined such that $\mathbf{X}_i\boldsymbol{\beta}$ is a polynomial model
- $g$ is the link function, differentiable and monotonic
- $\tau$ is a scaling parameter and $V$ is the variance function of Y

Assuming that the response Y is normally distributed, we can easily specify the link function g, the variance function V and the scaling parameter τ (Zabalza, 2000):

$$\begin{cases} g = \mathrm{Id} \\ V = 1 \\ \tau = \sigma_0^2 \end{cases}$$

with $\sigma_0^2$ an estimate of the variance of Y.

The assumption of normality on the response Y induces a Gamma distribution for the dispersion $D(\mathbf{x}) = \big(Y(\mathbf{x}) - \mu(\mathbf{x})\big)^2$. For a Gamma distribution, using the ln function as the link ensures the positivity of the variance. In addition, the Gamma law allows the specification of the variance function:

$$\begin{cases} g(\boldsymbol{\mu}) = \ln(\boldsymbol{\mu}) \\ V(\mu_i) = \sigma_0^2\, \mu_i^2 \end{cases}$$
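A numpy-only sketch of fitting and sampling these two models is given below. It is an approximation rather than the exact GLM estimation described above: the mean model is fitted by ordinary least squares, and the Gamma/log-link dispersion model is approximated by regressing the logarithm of the squared residuals on the same polynomial design; all data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic design: 30 runs on two non-spatial factors scaled to [-1, 1].
X_raw = rng.uniform(-1.0, 1.0, size=(30, 2))
X = np.column_stack([np.ones(len(X_raw)), X_raw, X_raw**2])   # quadratic model matrix
# Heteroscedastic toy response: the spread depends on the first factor.
y = 2.0e4 + 6.0e3 * X_raw[:, 0] - 3.0e3 * X_raw[:, 1]**2 \
    + rng.normal(scale=1.0e3 * np.exp(X_raw[:, 0]))

# Mean model: identity link, eta = X beta (ordinary least squares).
beta_mean, *_ = np.linalg.lstsq(X, y, rcond=None)
mu = X @ beta_mean

# Dispersion model: D(x) = (y - mu)^2, fitted on a log scale so the predicted variance stays positive.
D = (y - mu)**2
beta_disp, *_ = np.linalg.lstsq(X, np.log(D + 1e-12), rcond=None)

# Monte-Carlo step of the JMM: sample the factors, predict the local mean and dispersion,
# and draw responses to build the density and quantiles of the production forecast.
Xmc_raw = rng.uniform(-1.0, 1.0, size=(100_000, 2))
Xmc = np.column_stack([np.ones(len(Xmc_raw)), Xmc_raw, Xmc_raw**2])
mu_mc = Xmc @ beta_mean
sigma_mc = np.sqrt(np.exp(Xmc @ beta_disp))
samples = rng.normal(mu_mc, sigma_mc)
p10, p50, p90 = np.quantile(samples, [0.10, 0.50, 0.90])
```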


Appendix B: Spectral Clustering

Using a random partition of the input points to initialize the clusters is not always optimal, and the k-means algorithm is subject to many local minima. In this work, we propose a two-step approach, which consists of initializing KKM with the results of an alternative to k-means called spectral clustering.

Spectral Clustering as initialization of KKM

A promising alternative to traditional clustering methods that has recently emerged in a number of fields is the use of spectral methods for clustering. These algorithms cluster points using eigenvectors of matrices derived from the data. The algorithm proposed by Ng et al. (2001), used here to initialize KKM, is as follows:

1. Form the affinity matrix $K_{ij} = \exp\left(-\|x_i - x_j\|^2 / 2\sigma^2\right)$

2. Define $\mathbf{D}$ as the diagonal matrix whose $(i,i)$ element is the sum of the $i$-th row of $\mathbf{K}$, and construct the matrix $\mathbf{L} = \mathbf{D}^{-1/2}\mathbf{K}\mathbf{D}^{-1/2}$

3. Find $v_1, \dots, v_k$, the $k$ largest eigenvectors of $\mathbf{L}$, and form the matrix $\mathbf{V} = [v_1, \dots, v_k]$ by stacking the eigenvectors in columns

4. Form the matrix $\mathbf{Y}$ from $\mathbf{V}$ by renormalizing each of $\mathbf{V}$'s rows to have unit length, i.e. $Y_{ij} = V_{ij} / \big(\sum_j V_{ij}^2\big)^{1/2}$

5. Treat each row of $\mathbf{Y}$ as a point in $\mathbb{R}^k$ and cluster them into $k$ clusters via k-means (the initial centroids are chosen to be 90 degrees apart)

6. Assign the original point $x_i$ to cluster $j$ if and only if row $i$ of the matrix $\mathbf{Y}$ was assigned to cluster $j$

The spectral approach solves a relaxed version of the clustering problem: compute the first k eigenvectors of the matrix $\mathbf{L} = \mathbf{D}^{-1/2}\mathbf{K}\mathbf{D}^{-1/2}$. This maps the original points to a lower-dimensional space, in which a discrete clustering solution is obtained. The resulting partitioning can be treated as a good initialization for kernel k-means on the full dataset. This two-step approach, first running spectral clustering to obtain an initial partitioning and then refining it by running KKM, typically results in a robust partitioning of the data (Dhillon et al., 2004). For an overview of clustering techniques, see Buhmann (1995), and see Shawe-Taylor and Cristianini (2004) for specific information about kernel clustering techniques.
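A numpy sketch of this initialization is given below. It follows the steps above, except that the k-means run in step 5 starts from randomly chosen rows rather than centroids 90 degrees apart; the coordinates are placeholders and the function name is illustrative.

```python
import numpy as np

def spectral_init(coords, n_clusters, sigma, n_iter=50, seed=0):
    """Spectral-clustering labels used to initialize kernel k-means (after Ng et al., 2001)."""
    rng = np.random.default_rng(seed)
    # 1. Affinity matrix K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)).
    sq = np.sum(coords**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * coords @ coords.T, 0.0)
    K = np.exp(-d2 / (2.0 * sigma**2))
    # 2. Normalized matrix L = D^{-1/2} K D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(K.sum(axis=1))
    L = d_inv_sqrt[:, None] * K * d_inv_sqrt[None, :]
    # 3. k largest eigenvectors of L, stacked in columns.
    eigval, eigvec = np.linalg.eigh(L)
    V = eigvec[:, np.argsort(eigval)[::-1][:n_clusters]]
    # 4. Renormalize each row of V to unit length.
    Y = V / np.linalg.norm(V, axis=1, keepdims=True)
    # 5. Ordinary k-means on the rows of Y (random start in this sketch).
    centroids = Y[rng.choice(len(Y), size=n_clusters, replace=False)]
    for _ in range(n_iter):
        labels = ((Y[:, None, :] - centroids[None, :, :])**2).sum(axis=2).argmin(axis=1)
        for c in range(n_clusters):
            if np.any(labels == c):
                centroids[c] = Y[labels == c].mean(axis=0)
    # 6. Point x_i inherits the cluster of row i of Y; these labels seed the KKM.
    return labels

# Example on placeholder MDS coordinates: initial labels for 45 clusters with sigma = 2250.
coords = np.random.default_rng(1).normal(scale=2000.0, size=(2025, 2))
init_labels = spectral_init(coords, n_clusters=45, sigma=2250.0)
```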