Math Geosci (2010) 42: 141–163. DOI 10.1007/s11004-009-9256-y

    The Value of Information in Spatial Decision Making

Debarun Bhattacharjya · Jo Eidsvik · Tapan Mukerji

    Received: 21 February 2009 / Accepted: 16 November 2009 / Published online: 5 December 2009

© International Association for Mathematical Geosciences 2009

Abstract Experiments performed over spatially correlated domains, if poorly chosen, may not be worth their cost of acquisition. In this paper, we integrate the decision-analytic notion of value of information with spatial statistical models. We formulate methods to evaluate the monetary values associated with experiments performed in the spatial decision making context, including the prior value, the value of perfect information, and the value of the experiment providing imperfect information. The prior for the spatial distinction of interest is assumed to be a categorical Markov random field, whereas the likelihood distribution can take any form depending on the experiment under consideration. We demonstrate how to efficiently compute the value of an experiment for Markov random fields of moderate size, with the aid of two examples. The first is a motivating example with presence–absence data, while the second application is inspired by seismic exploration in the petroleum industry. We discuss insights from the two examples, relating the value of an experiment to its accuracy, the costs and revenues of downstream decisions, and the design of the experiment.

Keywords Value of information · Value of experiment · Markov random field · Spatial decision making · Decision analysis · Recursive computation

D. Bhattacharjya (✉)

    Business Analytics and Math Sciences, IBM T.J. Watson Research Center, Yorktown Heights,

    NY, USA

    e-mail: [email protected]

    J. Eidsvik

    Department of Mathematical Sciences, NTNU, Trondheim, Norway

    T. Mukerji

    Department of Energy Resources Engineering, Stanford University, Stanford, CA, USA


    1 Introduction

Gathering the right kind and right amount of information is crucial for any decision making process. Auxiliary decisions regarding information gathering often arise when an important decision is to be made in the future. This downstream decision may have a lot at stake, and it may be worthwhile to obtain more information before an irrevocable allocation of resources. A crucial question (how much information should one purchase?) is related to the well-established concept of value of information in decision analysis (Howard 1966; Raiffa 1968; Lindley 1971; Matheson 1990). The value of information for a particular information gathering scheme is the maximum monetary amount that a decision maker should be willing to pay to acquire it. The value of information depends on several factors, including the prior probabilities, the quality of the test, and the decision maker's utility curve. In this paper, we present models that compute the value of information for experiments performed in the context of spatial decision making. We use the phrase spatial decision making to refer to decision problems with two important characteristics: the decision generally involves a choice of alternatives over space, and the distinction of interest is spatially correlated. There are several applications that are relevant within this context. Petroleum exploration and production is a natural contender as a possible application; here the distinction of interest is the presence or absence of economic accumulations of oil within a reservoir. The subsurface distribution of hydrocarbons is spatially correlated, and the decision maker must decide where to drill wells to recover oil and maximize profits. Information about the latent distinction of interest may be acquired (at a cost) in the form of seismic data, measured with noise. While seismic data are usually continuous variables, other applications use binary data acquired in the form of presence–absence, again possibly measured with noise through observation. We illustrate our methods with examples from both cases.

The value of information is a powerful tool in many decision making scenarios, including the medical sciences (e.g., Yokota and Thompson 2004; Welton et al. 2008) and the geosciences, particularly in the oil and gas industry, where it is used for the analysis of large decision problems involving reservoir appraisal and depletion (Begg et al. 2002; Cunningham and Begg 2008). Usually the application of value of information in medical decisions does not involve spatial dependence. In many geoscience applications at the global level, spatial dependence is usually not modeled directly. Our focus blends a local view with the global level, in the sense that the value of information is computed for a particular reservoir using a modeling approach that works at a finer granularity. We briefly discuss some recent literature on computing the value of information in spatial decision making, and compare their approaches to the general framework adopted in the current paper. Polasky and Solow (2001) present issues regarding the value of information in conservation biology, indicating that inferences about value of information can often be counter-intuitive. They assume that the distinction of interest is independent in the spatial domain. Houck and Pavlov (2006) estimate the value of information for electromagnetic surveys using a decision tree formulation. Bickel et al. (2006) use a Gaussian model to estimate the value of seismic attributes, but do not model spatial dependence between the target drilling sites. In Eidsvik et al. (2008), the value of information is computed for a spatially correlated continuous variable using a Gaussian model.


The value of information is only one of many approaches for making decisions about experiments. The decision-analytic view on experimentation is based on explicitly representing decisions and preferences in the model, and is very different from statistical approaches (Howard 1970). Our focus is on the value of the experiment, which should be distinguished from the question of experimental design. Decision-theoretical approaches have been successfully applied to experimental design. For example, Lindley (1971) and Clyde (2001) outline general approaches to experimental design, trying to answer the question of what is the best way to conduct the experiment. We attempt to answer a different question in a spatial setting, namely what is the value of the experiment. Apart from the value of information, other measures have been used to make decisions about experiments. Entropy is one of the most popular measures in the spatial design of experiments; see Zidek et al. (2000), Le and Zidek (2006), Marchant and Lark (2007), and Fuentes et al. (2007). The goal here is to choose the experiment that minimizes the entropy. Entropy has gained popularity because of its generality and ease of use, but it is fundamentally different as a measure for valuing information. We discuss some of these differences later, in Sect. 3.5. Another criterion used in statistics is minimum prediction variance. For instance, Diggle and Lophaven (2006) describe a Bayesian model for geostatistical design of monitoring sites using minimum integrated prediction variance as the criterion. As with entropy, it is hard to relate this criterion to the downstream decision and its associated (monetary) values. Note that most geostatistical applications focus both on prediction and on spatial covariance parameter estimation; see Müller (2001) and Marchant and Lark (2007). Diggle and Lophaven (2006) show that the best spatial design of an experiment depends on whether the spatial covariance parameters are fixed or included in a Bayesian model. In our framework, we treat the parameters as fixed and known from a priori assumptions. In the examples, we will show sensitivity to a spatial dependence parameter, but our focus is on valuing experiments rather than parameter estimation or optimal spatial design.

We introduce the value of information to situations that naturally exhibit spatial dependence through a categorical Markov random field (MRF) model (Besag 1974) for the distinction of interest. Moderate size MRFs can be assessed recursively (Reeves and Pettitt 2004), and this allows us to efficiently compute the value of information. One of the main contributions of the paper is to outline a specific decision-analytic method for computing the value of information in such a spatial setting. We also provide new insights via two examples. In the first example, we explore the effects of constraints in the decision maker's budget, and in the second example we illustrate important design issues regarding complete versus partial spatial experiments. We use a monetary value measure to ensure that the value of information corresponds directly to the maximum a decision maker should be willing to spend to purchase an experiment. For the sake of clarity, we assume throughout the paper that the decision maker is risk-neutral (Raiffa 1968). In other words, the decision maker is indifferent between a lottery of uncertain monetary prospects and the expected value of the lottery. Our models can incorporate risk-averse or risk-seeking behavior if required. Section 2 develops basic notation for the rest of the paper. In Sect. 3, we present the model assumptions and equations. This section explains how we integrate categorical MRFs with different approaches to valuing information. In Sect. 4, we describe an algorithm that couples Monte Carlo simulation and a recursive method for computing the value of information of an experiment. The model is demonstrated on two examples in Sect. 5 to develop critical insights. Finally, we discuss our conclusions and directions for further research in Sect. 6.

    2 Basic Notation

We consider a spatial field that lies on a lateral 2-dimensional grid composed of n_1 × n_2 cells, i.e., the grid has n_1 rows and n_2 columns with a total of N = n_1 n_2 cells. Besag (1974) refers to this kind of setting as a lattice system. We can index the cells in the grid from top to bottom and left to right as i = 1, 2, ..., N. Let x = {x_i : i = 1, 2, ..., N} be the random variable over the entire field for the distinction of interest to the decision maker. We assume that x_i is a categorical random variable that equals any one out of d possible colors (or states). When d = 2, x_i is a Bernoulli random variable. For this work, we will choose x_i = 1 as the favorable outcome at cell i. The joint prior probability of an outcome x is denoted by p(x), while the marginal probability of an outcome x_i is denoted by p_i(x_i). The marginal prior probability of the favorable outcome x_i = 1 is then p_i(1). We call the set created by these marginal probabilities {p_i(1) : i = 1, 2, ..., N} the prior probability map.

Let y = {y_j : j ∈ J} be the random variable for the result of an experiment, which may be conducted to obtain more information about x. The set J can be a subset of {1, 2, ..., N}, and thus the experiment need not be performed at all cells in the grid. In fact, it may be optimal to perform the experiment only at a few cells, depending on all the parameters and the cost of the experiment. The likelihood for the experimental result y given the outcome of the distinction of interest x is p(y|x). For a continuous random variable y, p(y|x) is a density function; it is a probability function if y is a discrete random variable. We inspect both cases with the help of examples, but in presenting our methods we treat y as continuous. The likelihood of the experimental outcome could be obtained by summing out all possible configurations of x, i.e., p(y) = Σ_x p(y|x) p(x), although often it is impractical to perform this summation due to the large number of possible configurations. The posterior probability of the outcome of the distinction of interest x given an experimental result y is p(x|y). The marginal posterior probability of an outcome x_i at cell i given an experimental result y is written as p_i(x_i|y). As the favorable outcome we use x_i = 1, which has marginal posterior probability p_i(1|y). For all i, this is called the posterior probability map.
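The indexing convention above can be made concrete in code. Below is a minimal sketch with hypothetical helper names of our own (not the paper's), assuming one consistent reading of "top to bottom and left to right": the index i runs down each column before moving to the next column.

```python
# Hypothetical helpers (our names, not the paper's) for the lattice indexing
# above: cells of an n1-row by n2-column grid are numbered i = 1, ..., N,
# running top to bottom within a column, then left to right across columns.

def cell_index(row, col, n1):
    """Map 1-based (row, col) to the 1-based cell index i."""
    return (col - 1) * n1 + row

def cell_coords(i, n1):
    """Inverse map: 1-based cell index i back to (row, col)."""
    return ((i - 1) % n1 + 1, (i - 1) // n1 + 1)

# Example: a 3 x 3 grid (N = 9) with a flat prior probability map p_i(1).
n1, n2 = 3, 3
prior_map = [0.5] * (n1 * n2)
```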

    3 Model Formulation

We now expand on the terminology and specify model assumptions regarding the probability distributions for x and y. The conceptual equations that determine measures for the value of an experiment are also explained in this section.


    3.1 Prior Spatial Model for the Distinction of Interest

Spatial dependence in the distinction of interest x is incorporated through the use of a categorical first-order MRF formulation (Besag 1974). This implies that the probability for a certain outcome in a given cell, given the outcome in all other cells, depends only on the outcome in the four neighboring cells. The spatial field is thus represented as an Ising model with β as the interaction parameter

p(x) = exp[ β Σ_{i∼j} I(x_i = x_j) + Σ_{i=1}^N α_i(x_i) ] / z.    (1)

I(A) is an indicator function taking value 1 if A is true and 0 otherwise. The first summation in (1) is over all pairs x_i and x_j that are closest neighbors in the grid, and z is a normalizing constant. The point-wise prior function α_i(x_i) is a function of the outcome at every cell. It provides the mechanism for including prior information about the outcome at a particular cell, based on expert opinion or previous data. On the other hand, β controls the spatial dependence of the latent variable. We assume that β and α_i(x_i) are known a priori. The special case when α_i(x_i) = 0 for all x_i and all i is known as the uninformative prior case. In this situation, the marginal probabilities are such that each state is equally likely at every cell. Note that for general α_i(x_i), the marginal probabilities p_i(x_i) will also depend on the interaction parameter β.
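For a grid small enough to enumerate, the prior in (1) can be evaluated by brute force, which is a useful sanity check on any recursive implementation. The sketch below (function names are ours, purely illustrative) uses a 2 × 2 binary grid with the point-wise prior term set to zero (the uninformative case) and β denoting the interaction parameter:

```python
import itertools
import numpy as np

# Brute-force evaluation of the MRF prior in (1) on a tiny 2 x 2 binary grid
# (d = 2, states coded 0/1): enumerate all 16 configurations, compute the
# normalizing constant z exactly, and return the prior probability map.

def ising_prior_marginals(beta, n1=2, n2=2):
    """Exact marginals p_i(x_i = 1) of the uninformative-prior Ising model."""
    N = n1 * n2
    # First-order (4-neighbor) pairs on the grid, row-major internal indexing.
    pairs = []
    for r in range(n1):
        for c in range(n2):
            if r + 1 < n1:
                pairs.append((r * n2 + c, (r + 1) * n2 + c))
            if c + 1 < n2:
                pairs.append((r * n2 + c, r * n2 + c + 1))
    weights = {}
    for x in itertools.product([0, 1], repeat=N):
        agree = sum(x[i] == x[j] for i, j in pairs)
        weights[x] = np.exp(beta * agree)   # unnormalized p(x)
    z = sum(weights.values())               # normalizing constant
    return np.array([sum(w for x, w in weights.items() if x[i] == 1) / z
                     for i in range(N)])
```

In the uninformative case, flipping all states leaves the agreement count unchanged, so every marginal is 0.5 regardless of β, exactly as stated above; only the joint distribution, and hence the spatial correlation, changes with β.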

    3.2 Likelihood Model for the Experiment

The random variable y is the result of an experiment performed in cells with indices in the set J. We assume that y_j, the result of the experiment at cell j, given the outcome over the entire field x, depends only on the marginal outcome x_j. By conditional independence, we get

p(y|x) = Π_{j∈J} p(y_j|x_j).    (2)

Each local likelihood distribution p(y_j|x_j) can be any probability distribution (discrete or continuous), depending on the kind of experiment being performed. The likelihood distribution is a measure of the accuracy of the experiment. We show results from an experiment with a binary outcome (Sect. 5.1) and an experiment with a continuous (Gaussian) outcome (Sect. 5.2).
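As a concrete instance of (2), the following sketch uses Gaussian local likelihoods; the class means and noise level are illustrative choices of ours, not values from the paper:

```python
import math

# The conditionally independent likelihood in (2): the experiment at each
# cell j in J returns y_j ~ N(mu_{x_j}, sigma^2), depending only on the
# local outcome x_j. mu and sigma are illustrative, not from the paper.

def gauss_pdf(y, mu, sigma):
    """Gaussian density N(mu, sigma^2) evaluated at y."""
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def likelihood(y, x, J, mu=(0.0, 1.0), sigma=0.5):
    """p(y | x) = prod over j in J of p(y_j | x_j), with Gaussian local terms."""
    p = 1.0
    for j, yj in zip(J, y):
        p *= gauss_pdf(yj, mu[x[j]], sigma)
    return p
```

A binary experiment fits the same template with a probability mass function in place of `gauss_pdf`.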

    3.3 A Simple Decision-analytic Framework

In this section, we present the decision-analytic concepts, independent of case-specific issues. Critical assumptions about the nature of the downstream decision are as follows. First, we assume that the main decision specifically involves a one-time selection of cells from the field. Sequential decisions regarding cell selection are not considered. The decision maker's goal is then to obtain value from the individual cells. Second, it is possible to isolate each cell and think about costs and revenues for each cell separately. In this way, we are only concerned with the marginal probabilities of the favorable outcome at each cell. Crucially, this assumption ensures that the value from the field is equal to the sum of the values from the cells. Finally, the decision for selection of cells is an unconstrained decision problem. The decision maker may choose as many cells as is profitable. The introduction of constraints would entail that the decision at a particular cell could not be made independently of those at other cells. A more general formulation in Sect. 3.4 relaxes the second and third assumptions.

Let the cost of selecting cell i be C_i and the revenue gained from a favorable outcome at that cell be R_i. No revenue is gained if any other outcome is observed. The outcome for a cell is not ascertained until and unless the cell is selected. As an example, C_i may be the cost associated with drilling a well at cell i, and R_i is the corresponding revenue obtained from discovering oil at that cell. Revenues and costs are external to the probability model, treated as fixed. The decision maker can make her decision without purchasing the experiment. In that case, the expected value from the ith cell, V_i, is given by

V_i = max{0, R_i p_i(1) − C_i},    (3)

where p_i(1), as mentioned earlier, denotes the prior marginal probability of the favorable outcome x_i = 1. It is optimal for the risk neutral decision maker to choose the ith cell only if the expected profit is positive. Hence the value is the maximum of 0, for the case where the cell is not selected, and the expected profit expression R_i p_i(1) − C_i, for the case where the cell is selected. The favorable outcome is seen when x_i = 1 and all other outcomes result in zero revenue. With the assumptions we have described, the prior value (PV), or the total value from the field based on the prior alone, is the sum of the expected values from the cells

PV = Σ_{i=1}^N V_i = Σ_{i=1}^N max{0, R_i p_i(1) − C_i}.    (4)

What if the decision maker had perfect information about the distinction of interest? In other words, what if a clairvoyant would be willing to reveal the outcome of x_i? How much should the decision maker pay for this information? If the outcome is indeed favorable, the value obtained is the maximum of 0 (cell is not selected) and R_i − C_i (cell is selected). The decision maker will choose not to select a cell if it is known that the outcome is not favorable, thereby making zero profit and equivalently ensuring no loss. The value with free clairvoyance (VFC) (see Raiffa 1968) on the distinction of interest over the whole field is the sum of the expected values (with free clairvoyance on the distinction of interest) obtained at the individual cells. Hence,

VFC = Σ_{i=1}^N p_i(1) max{0, R_i − C_i}.    (5)

Note that when there is clairvoyance, the decision maker can decide whether to select the cell or not after observing the value of x_i. In most situations, presumably, the revenues outweigh the costs throughout the field, i.e., R_i > C_i for all i. In those cases, (5) can be reduced to VFC = Σ_{i=1}^N p_i(1)(R_i − C_i). The value of perfect information (VOPI) on the distinction of interest, which is the most that the decision maker should pay for perfect information, is the increase in profitability from the prior situation to the one where clairvoyance is obtained without cost. Note that this is exactly true only for decision makers with an exponential or straight-line utility curve, but it is a reasonable approximation for other standard utility functions (Raiffa 1968). In our case, it is exact since we assume risk neutrality, which is equivalent to a straight-line utility function. Therefore,

VOPI = VFC − PV.    (6)
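Under the simple framework, the prior-based quantities in (3) through (6) are direct array computations over the prior probability map. A minimal numpy sketch, with illustrative numbers that are ours rather than the paper's:

```python
import numpy as np

# Prior value (4), value with free clairvoyance (5), and VOPI (6), given a
# prior probability map p_i(1), revenues R_i, and costs C_i per cell.

def prior_value(p1, R, C):
    return np.maximum(0.0, R * p1 - C).sum()                          # (3)-(4)

def value_free_clairvoyance(p1, R, C):
    return (p1 * np.maximum(0.0, R - C)).sum()                        # (5)

def vopi(p1, R, C):
    return value_free_clairvoyance(p1, R, C) - prior_value(p1, R, C)  # (6)

p1 = np.array([0.3, 0.5, 0.7])    # illustrative prior probability map
R = np.full(3, 10.0)              # revenue per cell
C = np.full(3, 4.0)               # cost per cell
# prior_value -> 4.0; value_free_clairvoyance -> 9.0; vopi -> 5.0
```

Note that only the first cell (with R_i p_i(1) − C_i = −1) is left unselected under the prior, and that VOPI is largest where the prior is most uncertain.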

In both the PV and VFC calculations, the value depends only on the marginal probabilities of observing the favorable outcome, since the value from a particular cell does not depend directly on other cells. If the experiment is performed, a result y is observed before the main decision is made. The computation of the conditional value of the ith cell is along the same lines as in (3), replacing the prior with the posterior. The expected value V_i is evaluated with the expectation over the experimental result y

V_i = ∫_y max{0, R_i p_i(1|y) − C_i} p(y) dy.    (7)

With the assumptions we have described, the value with the free experiment (VFE) is the sum of the expected values V_i over all cells

VFE = Σ_{i=1}^N V_i = Σ_{i=1}^N ∫_y max{0, R_i p_i(1|y) − C_i} p(y) dy.    (8)

Note that we do not assume that the experiment is free of cost. We claim that if one could indeed perform the experiment for free, then the value of the situation would be given by (8).

The value of information for the experiment, or, for short, the value of the experiment (VOE), can be computed as the difference between the value with the free experiment and the value from the prior

VOE = VFE − PV.    (9)

This is the gain in profitability from performing the experiment, and hence this is the maximum that should be spent on purchasing the experiment. In our opinion, the VOE is the best measure for the worth of an experiment because, by definition, it indicates how valuable the experiment is to the decision maker in monetary units. However poor the experiment is, one can always choose to ignore the results, make the same decisions as determined by (4) irrespective of what the experiment reveals, and end up being as well off as before. Therefore, for a worthless experiment, VOE = 0. Also, no matter how good the experiment is, it cannot be better than directly obtaining information about the latent variable, since this is the variable that is of ultimate interest to the decision maker. In this way, the VOPI acts as an upper bound on the VOE.


This naturally leads to another measure of the experiment, the chance of knowing (COK), or probability of knowing, a popular notion in the decision analysis community. It can be shown that for a decision maker with a straight-line or an exponential utility function,

COK = VOE/VOPI,  0 ≤ COK ≤ 1.    (10)

The chance of knowing is thus named for the following reason. Consider a lottery where, with probability p, a clairvoyant will provide perfect information on the distinction of interest for no charge, and with probability 1 − p will provide no information at all. The probability p that makes the decision maker indifferent between obtaining the experiment for free and playing this lottery is the chance of knowing for that experiment. For a decision maker with a straight-line or an exponential utility function, the chance of knowing is determined by (10). A good experiment would require a higher probability for a person to be indifferent, whereas for a poor experiment a smaller value of p would suffice. To summarize, COK is a number between 0 and 1 that rates the worth of an experiment in a certain context. VOE provides an actual monetary value on information, and COK presents an intuitive way to compare different experiments quantitatively.

    3.4 A General Decision-analytic Framework

We now develop a more general framework where we relax the second and third assumptions presented in Sect. 3.3. Rather than treating cells separately, the joint distribution of the distinction of interest is considered. Furthermore, the constrained case needs special attention, since the action (where to select sites) depends on the outcome of the distinction of interest. Let f(a, x) be the value derived from a realization of the field x when action a is taken. Here a is one of the actions in the set of alternatives A, or a ∈ A. The prior value, before any information is revealed, is the value derived from the optimal course of action based on the prior on x. It is optimal for the risk-neutral decision maker to choose the action that maximizes the expected value. Thus,

PV = max_{a∈A} Σ_x f(a, x) p(x).    (11)

For the case of perfect information on the distinction of interest, the realization x is known to the decision maker before the optimal action is chosen. The value of perfect information is given by the difference between the expected value with free clairvoyance on the distinction of interest and the prior value

VOPI = Σ_x [ max_{a∈A} f(a, x) ] p(x) − PV.    (12)

As before, the value of the experiment is the difference between the expected value given the result y and the prior value

VOE = ∫_y max_{a∈A} [ Σ_x f(a, x) p(x|y) ] p(y) dy − PV.    (13)


The COK can be obtained from (10). In many real-world problems, this general formulation might be preferred to our simplified version presented in Sect. 3.3, since a decision at one cell would affect the decision at another. In the petroleum example, deviated wells at one cell could drain oil from adjacent cells. Another issue is that of constraints in selecting sites; the budget for a field puts a limit on the number of wells that can be drilled. All these aspects must be modeled intelligently in the spatial decision making context, because the price for generality in the formulation is a severe increase in computation. For the simple framework presented in Sect. 3.3, the VOE calculation of (7) involves an integral over the experiment, which is approximated and summed over all N cells. In addition to this calculation, the general framework also requires a summation over x, as seen in (11) and (13).
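For very small fields, (11) and (12) can be evaluated by brute-force enumeration over actions and outcomes. The sketch below adds a budget constraint (at most k cells selected) and, to keep the enumeration transparent, uses independent Bernoulli priors; this toy setup and all names are ours, purely to make the general framework concrete:

```python
import itertools
import numpy as np

# Brute-force evaluation of (11) and (12) for a tiny constrained problem:
# at most k of N cells may be selected, so the per-cell argument of
# Sect. 3.3 no longer applies and the joint distribution must be used.

def f(a, x, R, C):
    """Value f(a, x): revenue R_i only when x_i = 1, cost C_i always paid."""
    return sum(R[i] * x[i] - C[i] for i in a)

def pv_and_vopi(p1, R, C, k):
    """PV from (11) and VOPI from (12) with the budget constraint |a| <= k."""
    N = len(p1)
    actions = [a for r in range(k + 1)
               for a in itertools.combinations(range(N), r)]
    outcomes = list(itertools.product([0, 1], repeat=N))

    def p(x):  # independent Bernoulli prior, for illustration only
        return np.prod([p1[i] if x[i] else 1 - p1[i] for i in range(N)])

    PV = max(sum(f(a, x, R, C) * p(x) for x in outcomes)
             for a in actions)                                        # (11)
    VFC = sum(max(f(a, x, R, C) for a in actions) * p(x)
              for x in outcomes)
    return PV, VFC - PV                                               # (12)
```

With k = N and independent priors, this reproduces the per-cell values of Sect. 3.3, which is a convenient consistency check; tightening k reduces the VOPI, since clairvoyance cannot be exploited beyond the budget.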

    3.5 Another Criterion: Entropy

In this section, we discuss entropy as another criterion for evaluating experiments. The notion of entropy, introduced from information theory, e.g., Ash (1965), has been used to measure the reduction in uncertainty of x on observing the outcome y of an experiment. It is also one of the most common measures for comparing spatial (Gaussian) experiments; see Zidek et al. (2000). The entropy is defined by

H(x) = −Σ_x p(x) log p(x),    (14)

which can be constructed sequentially as

H(x) = H(x_N) + H(x_{N−1}|x_N) + ··· + H(x_1|x_2, ..., x_N).    (15)

The normalized expected reduction in entropy from the experiment y is

NEMI = [ H(x) − ∫_y H(x|y) p(y) dy ] / H(x),    (16)

where NEMI is the normalized expected mutual information between x and y, and is a number between 0 and 1; a NEMI of 1 indicates the most relevant experiment possible. In (16), H(x) is the prior entropy and ∫_y H(x|y) p(y) dy is the expected posterior entropy. There is a wide gulf between the philosophies of VOE and NEMI as measures for valuing information. NEMI provides a sense of how much uncertainty can be reduced by performing an experiment, but it cannot directly imply how much the decision maker should pay for it. See, for instance, the discussion in Le and Zidek (2006), Chap. 11, on estimating the best bang for the buck by entropy. It is hardly surprising that the decision-analytic notion of VOE is tied inexorably to decisions and the preferences of the decision maker. VOE is a more complete measure for valuing information, and possibly also more difficult to obtain. For an experiment to be economic for the decision maker, it must be material to the decision that brings value. A material experiment is one that can affect the decision, i.e., the action chosen by the decision maker is not identical for different outcomes of the experiment. For an experiment to be material, it must be relevant to the distinction of interest. Entropy-based parameters such as NEMI only address aspects of the relevancy of the experiment. Mutual information measures may be used as a guide in designing the most relevant experiment, e.g., Mukerji et al. (2001).
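The entropy bookkeeping in (14) through (16) can be checked on the smallest possible case: a single binary cell with prior p(x = 1) = p and a binary test that reports the true state with accuracy gamma (a symmetric channel). This toy setup is ours, purely to make NEMI concrete:

```python
import math

# NEMI from (16) for one binary cell and a symmetric binary test: compute
# the prior entropy (14), the expected posterior entropy, and their
# normalized difference. Entropies are in nats.

def H(probs):
    """Entropy (14) of a discrete distribution."""
    return -sum(q * math.log(q) for q in probs if q > 0)

def nemi_binary(p, gamma):
    prior = [1 - p, p]
    h_prior = H(prior)
    exp_post = 0.0
    for y in (0, 1):
        # Likelihood p(y | x), marginal p(y), and posterior p(x | y) by Bayes.
        like = [gamma if y == x else 1 - gamma for x in (0, 1)]
        py = sum(l * q for l, q in zip(like, prior))
        post = [l * q / py for l, q in zip(like, prior)]
        exp_post += py * H(post)
    return (h_prior - exp_post) / h_prior
```

A coin-flip test (gamma = 0.5) leaves the posterior equal to the prior, so NEMI = 0; a perfect test (gamma = 1) drives the posterior entropy to zero, so NEMI = 1. Note that neither number says anything about revenues or costs, which is exactly the gap between NEMI and VOE discussed above.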

    4 Computational Issues

The joint distributions for the distinction of interest and the experimental result are over the entire field with N cells and are high dimensional. Solving (7) analytically may not be possible in general, so we use Monte Carlo simulation by generating M realizations of the experimental result. For the mth sample, let w_i(y^m) = max{0, R_i p_i(1|y^m) − C_i}. Now we can approximate (7) as

V_i ≈ (1/M) Σ_{m=1}^M w_i(y^m).    (17)

An exposition on Monte Carlo methods can be found in Liu (2001). The Monte Carlo error is not large if a sufficient number of realizations is generated. In our simple framework from Sect. 3.3, we first generate i.i.d. realizations x^1, x^2, ..., x^M of MRFs from (1). Next, we draw realizations of the experiment y^1, y^2, ..., y^M from the likelihood p(y|x). These experiments are marginally from the distribution p(y). For each of these conceptual datasets y^m we compute the marginal posterior for the favorable outcome, denoted by p_i(1|y^m). We use recursive forward and backward techniques to draw the realizations x^m and to compute the marginals p_i(1|y^m). We will refer to the two methods as RecGenerate and RecCompute, respectively, to specify where exactly they are used in the main algorithm. The recursive method itself is based on Reeves and Pettitt (2004) and outlined in the Appendix. The algorithm for computing the value of an experiment is as follows.

1. Find the marginal prior probability for x using RecCompute.
2. Solve (3) through (6) to find the PV and the VOPI.
3. Generate a realization x from the prior with RecGenerate.
4. Generate a conditional realization y of the experimental result from the likelihood of the experiment, given the realization x.
5. With the current realization of the experiment y, use RecCompute to evaluate the marginal posterior probability of the favorable outcome x_i = 1, denoted as p_i(1|y), and compute the associated value w_i(y). This is done for all cells i = 1, ..., N.
6. Repeat Steps 3 through 5 M times, and approximate the integral for V_i, i = 1, ..., N, in (7) with the average value from the simulations, shown in (17).
7. Solve (8) through (10) to get the VOE and COK.

    Crucial tasks in Steps 1 through 7 of the algorithm are RecGenerate and RecCom-

    pute, using recursive computing on the field of size n1n2 = N. The recursive method

    presented in the Appendix is of order O(N), but for each step of the recursion we

    need to evaluate and store terms of size dn1 ; this is the computationally intensive part

    of the algorithm. Therefore, the smallest grid dimension n1 should not be too large


(say, around n1 = 10 for d = 3), although memory and storage concerns will become less of an issue with the growing prevalence of 64-bit machines. Note that the algorithm above only applies to the simple framework of Sect. 3.3. For the more general setting of Sect. 3.4, one may require a function of the joint distribution over the entire field p(x|y), and this calls for a more complicated and time-consuming Monte Carlo method. The algorithm of choice would typically depend on the particular situation.

    5 Examples

First, we present a motivating example with presence–absence data, illustrated with

    site selection and the effect of spatial dependence. Then we turn to an example for

    seismic experiments, comparing complete and partial tests.

5.1 Value of Presence–Absence Data and Constrained Decisions

Consider a region of land modeled as a 3 × 3 grid (N = 9). The distinction of interest is an asset that varies spatially. We consider conducting an experiment to check if the asset is present/absent. The experiment contains noise, so even though data indicates presence, the distinction might be absent, and vice versa. Applications include mining (mineral present), agriculture (plant present), or conservation biology (Polasky and Solow 2001). For instance, in the latter case, a decision maker might be interested in selecting (a constrained number of) sites from the grid to set up natural wildlife reserves with the goal of conserving an endangered species, but there is uncertainty regarding the presence of the species (Carroll et al. 1999). The decision maker can purchase an experiment in the form of a survey team exploring every cell and indicating whether they believe the species is present at each cell. The decision maker is interested in the value of information of surveys with different accuracies. If the decision maker selects a cell, she must pay the cost C (assumed to be the same across cells). If the asset is present at a cell that she chooses, she obtains revenue R (again assumed to be the same across cells). Obtaining dollar values for conservation applications may be challenging in practice; this may be more natural in the domain of the example presented in the next section, oil exploration. Let x be the actual presence or absence (d = 2) of the asset at all cells in the grid. The categorical outcome xi = 1 indicates presence. The experimental result yj = 1 implies that the survey team believes the asset is present at cell j. In general, the sensitivity of the test (p(yj = 1|xj = 1)) may be different from the specificity (p(yj = 0|xj = 0)), but here we assume that they are the same. The accuracy of the binary test is therefore defined by the following likelihood equation:

p(yj = 0 | xj = 0) = p(yj = 1 | xj = 1) = γ; j = 1, . . . , 9. (18)

A value of γ close to 1 indicates a good test. We choose to model the prior for x as an uninformative MRF, i.e., there is an equal prior chance of presence or absence in every cell. Spatial correlation for x is determined by the interaction parameter β. Furthermore, C = 1 monetary unit (so that revenue can now be written in units of C).


Fig. 1 Sensitivity analysis in the unconstrained case. There are 3 plots of VOE and VOPI vs. β: (left) revenue = 2 units; (middle) revenue = 5 units; and (right) revenue = 10 units. Each plot shows VOPI (dashed line) and VOE for tests with accuracy γ = 0.5, 0.7, and 0.9 (solid/dashed lines). There are 3 VOE lines for each parameter setting, representing the mean VOE and the 25th and 75th percentiles. These estimates are based on 20 VOE computations, each using 10000 Monte Carlo samples

First, we assume that the cell selection is unconstrained, i.e., the decision maker can select as many cells as is profitable. Sensitivity analysis on the parameters γ and β can provide insights into general trends. Figure 1 shows VOE as a function of these parameters in three plots, for revenue R = 2, 5, and 10 monetary units from left to right, respectively. Each display shows VOE results obtained for γ = 0.5, 0.7, and 0.9, as a function of β. In each case, we compute the VOE from M = 10000 Monte Carlo simulations, and this is repeated 20 times. The three lines for each parameter setting represent the mean over the 20 replications, the 5th smallest (25th percentile) and the 15th smallest (75th percentile). The trends within each plot show that, as expected, the accuracy γ is a critical parameter. The curve for γ = 0.9 shows the highest VOE. In all three graphs, γ = 0.5 has VOE = 0, as the experiment provides no information about the distinction of interest. The VOPI is a horizontal line and has the same value (4.5 = NC/2 units in this case) for all β and for all R ≥ 2. It only depends on the marginal probability of success in each cell, which is 0.5 in the case of the uninformative prior. Note that the VOE increases as β increases. This is because the chance that the entire grid will either contain the asset at all cells (jackpot) or in no cells (huge loss) becomes higher as β increases. As there are no constraints on the number of cells that can be selected, the decision maker is free to choose all the cells or none; for large values of β, this all-or-nothing policy is optimal. VOE is not very


Fig. 2 Prior entropy (top), posterior entropy (middle) and NEMI (bottom). The test accuracy is γ = 0.7 (dashed) and γ = 0.9 (solid)

sensitive to β for lower values, which is useful to know if one has prior knowledge indicating that the interaction parameter is close to 0. The Monte Carlo errors in our estimates of VOE can be recognized through the differences between the 25th and 75th percentiles computed from the 20 batches, of size 10000 each. The error is largest for high R and β (right display), as can also be seen from the fluctuations in the plots of the 25th and 75th percentiles.
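For β = 0 the cells decouple, and the quantities in Fig. 1 can be checked in closed form. The snippet below is our sanity check under the uninformative prior p = 0.5, not part of the authors' code; each cell contributes the expectation over the two test outcomes of max(0, pi(1|y)R − C).

```python
def per_cell_values(p, gamma, R, C):
    # Prior value, value of perfect information, and value of the
    # experiment for a single cell when beta = 0 (independent cells).
    pv = max(0.0, p * R - C)
    vopi = p * (R - C) - pv           # perfect information: drill only on success
    # Posterior after a positive / negative test result (Bayes' rule).
    p_pos = p * gamma / (p * gamma + (1 - p) * (1 - gamma))
    p_neg = p * (1 - gamma) / (p * (1 - gamma) + (1 - p) * gamma)
    py1 = p * gamma + (1 - p) * (1 - gamma)  # marginal chance of a positive result
    voe = py1 * max(0.0, p_pos * R - C) + (1 - py1) * max(0.0, p_neg * R - C) - pv
    return pv, vopi, voe

# Nine independent cells, C = 1, uninformative prior p = 0.5.
_, vopi, voe = per_cell_values(0.5, 0.7, 2.0, 1.0)
print(9 * vopi, 9 * voe)  # VOPI = 4.5 = NC/2, and VOE is close to 2 units
```

Note that vopi = C(1 − p) whenever pR ≥ C, which is why the VOPI line in Fig. 1 is flat at 4.5 = NC/2 for all R ≥ 2. With R = 5 and γ = 0.7 the same function returns VOE = 0, since both posteriors (0.7 and 0.3) still favor drilling.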

The plots also show that experiments do not automatically become more valuable when the decision situation is more lucrative (large R). Consider the case of γ = 0.7 and β = 0 in the graphs for R = 2 and R = 5. For this choice of γ and β, VOE = 0 for R = 5, whereas for R = 2 the VOE is almost 2 units. When R = 2, there is a chance for the experimental result to change the main decision. Hence the experiment is valuable, as it is able to affect the decision. On the other hand, if R = 5 (and β = 0), it is worthwhile to select all cells no matter the experiment outcome. The impact of the experiment on the decision is a fundamental issue in the decision-analytic approach to valuing information.

Figure 2 shows sensitivity analysis for NEMI, presented in Sect. 3.5. We can observe the trend in NEMI with β, for the case of R = 2 and for γ = 0.7 and γ = 0.9. Note that both the prior and the posterior entropy decrease with increasing β, indicating that there is less disorder for high β. NEMI increases as β increases, suggesting that the experiment becomes more relevant for fields with greater spatial dependence. NEMI is larger for γ = 0.9. It is difficult to relate the NEMI of an experiment with how much one should be willing to pay for it.


Fig. 3 Sensitivity analysis in the constrained case. The accuracy of the experiment is γ = 0.9. Two plots of VOE vs. β: (left) revenue = 2 units; and (right) revenue = 5 units. Each plot shows VOE for 3 values of k (the maximum number of cells that can be selected): k = 1 (dotted line), k = 5 (solid line) and k = 9 (dashed line)

What if there are constraints on the number of cells that can be selected for developing the asset? We then use ideas from both Sects. 3.3 and 3.4 to solve the equations for VOE in the following way. We retain the assumption that the cells act as separate units, while introducing another parameter k, the maximum number of cells that can be selected (based on a budget). Once we have a particular probability map (prior or posterior), we maximize profits by choosing the best k prospects, if profitable. Figure 3 demonstrates results from sensitivity analysis on β for a test with an accuracy of γ = 0.9. We compare the VOE for k = 1, 5, and 9 for R = 2 (left) and R = 5 (right) in Fig. 3. The graph on the left is in accordance with our initial reaction; VOE is highest when k = 9 and lowest when k = 1. The case with k = 9 corresponds to the unconstrained case. As we observed in Fig. 1, VOE increases with β in the unconstrained situation. However, for k = 5, and to some extent also for k = 1, VOE decreases as β increases. This trend is even more prominent in the graph on the right. We explain this tendency as follows. The experiment has a certain facet that can be relatively more valuable when there is both low spatial dependence and a limit on the number of cells that can be selected: it tells you which cells are likely to be favorable, thereby guiding the decision maker about the choice of site location. There is a little more leeway for the decision maker regarding selection of an appropriate location when there is high spatial dependence. Thus VOE can be relatively smaller when β is high.
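Under the separate-units assumption, the constrained value of a probability map reduces to picking the best k prospects that are individually profitable. A minimal sketch (our illustration, not the authors' code):

```python
def constrained_value(probs, R, C, k):
    # Expected profit when at most k cells may be selected:
    # rank cells by expected gain p*R - C and keep the profitable ones.
    gains = sorted((p * R - C for p in probs), reverse=True)
    return sum(g for g in gains[:k] if g > 0)

# Example posterior map on a 3 x 3 grid with R = 2, C = 1.
probs = [0.9, 0.8, 0.7, 0.4, 0.3, 0.3, 0.2, 0.2, 0.1]
print(constrained_value(probs, 2.0, 1.0, 5))  # keeps only the three profitable cells
```

Applying this to the prior map and to each simulated posterior map, and averaging as in (17), gives the constrained VOE estimates plotted in Fig. 3.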

There is another seemingly counterintuitive trend in the plot for R = 5. The VOE curve for k = 5 is higher than the curve for k = 9 for small β. It is indeed possible for the experiment to be more valuable in the situation when there are constraints, as opposed to when there are none.

    5.2 Value of Complete and Partial Seismic Experiments

Here we present an example from the petroleum industry. The distinction of interest is lithology and fluid composition along the top of the reservoir. We assess the value of different attributes obtained from seismic reflection data (Avseth et al. 2005), using the decision-analytic framework outlined earlier. The two seismic attributes are R0 = reflection amplitude at zero incidence angle and G = gradient in reflection amplitude as a function of incidence angle, i.e., offset (distance between shot and receiver). These attributes are typically obtained from a seismic amplitude versus offset (AVO) analysis. If these attributes are purchased, the decision maker will have to pay to conduct AVO processing, analyses, and interpretation. We compare the value of partial tests to that of acquiring AVO attributes along the entire field. The example is motivated by previous work on AVO seismic data from the Glitne field in the North Sea, for example, Eidsvik et al. (2004) and Avseth et al. (2005). See also Eidsvik et al. (2008) for an example with a continuous distinction of interest. The focus of attention is an n1 = 5 by n2 = 20 grid (N = 100); a part of a reservoir that may be the lobe of a turbidite system and hence of main interest for exploration. The distinction of interest takes on any one of d = 3 states: oil saturated sand (xi = 1), brine saturated sand (xi = 2), or shale (xi = 3). The only profitable outcome is that of oil sand.

For the MRF prior model in (1) we use α_i(xi) ≠ 0, representing an informed prior based on previous knowledge and expert geologic opinion. The experimental result yj, j ∈ J, is a continuously distributed random variable. With an experiment that measures only one AVO attribute, R0, the measurement at site j is yj = R0,j ∈ ℝ, whereas for a situation where both AVO attributes are acquired the multivariate data is indicated by yj = (R0,j, Gj) ∈ ℝ². We use a hierarchical statistical relationship for rock physics variables (Eidsvik et al. 2004) in a Monte Carlo setting to fit a Gaussian likelihood model to the AVO seismic data. For the case with both attributes R0 and G, the likelihood equals

p(yj | xj) = Normal( (μ1(xj), μ2(xj)), Σ ),   Σ = ( 0.06²   0.007
                                                    0.007   0.17² ),   (19)

where yj = (R0,j, Gj), j ∈ J, and the means over the three states are μ1 = (0.03, 0.08, 0.02) and μ2 = (0.21, 0.15, 0). When we just consider R0, only μ1 and the variance 0.06² are needed.
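As a concrete illustration of how (19) converts an AVO measurement into pointwise class probabilities, the sketch below evaluates the bivariate Gaussian likelihood for each of the three states and normalizes under a flat prior. It is our illustration, using the means and covariance exactly as listed above; in the paper the likelihood enters the MRF posterior p(x|y) through the recursive scheme instead.

```python
import math

# Class means (oil sand, brine sand, shale) for (R0, G), as listed in (19).
MU = [(0.03, 0.21), (0.08, 0.15), (0.02, 0.0)]
# Shared 2x2 covariance matrix entries.
S11, S12, S22 = 0.06**2, 0.007, 0.17**2

def gauss2(y, mu):
    # Bivariate normal density with covariance [[S11, S12], [S12, S22]].
    det = S11 * S22 - S12**2
    dx, dy = y[0] - mu[0], y[1] - mu[1]
    q = (S22 * dx**2 - 2 * S12 * dx * dy + S11 * dy**2) / det
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(det))

def class_posterior(y):
    # Pointwise posterior over the three states under a flat prior.
    lik = [gauss2(y, mu) for mu in MU]
    tot = sum(lik)
    return [l / tot for l in lik]

post = class_posterior((0.03, 0.21))  # an observation at the oil-sand mean
```

With a shared covariance, an observation at a class mean always assigns that class the highest pointwise probability; the spatial coupling through β can still overturn such pointwise evidence in the full posterior.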

Figure 4 shows the AVO seismic data from a 5 × 20 grid from the North Sea (Eidsvik et al. 2004). This data is somewhat upscaled from the original dataset, and each grid cell is sized at 100 m × 100 m. For estimation of the interaction parameter β we assume that this field is similar to the one under consideration and use the data for this purpose. We estimate β in (1) based on these AVO seismic data and fixed values of α_i(xi). The maximum likelihood estimate of β is computed by evaluating the marginal likelihood p(y) for a set of β values, see the Appendix. In Fig. 5, we display log p(y) as a function of β, and see that the maximum likelihood estimate is about 0.9.


Fig. 4 Seismic data from the Glitne reservoir in the North Sea. The display shows two seismic attributes, R0 (top) and G (below). Each grid cell is 100 m × 100 m and the domain covers an area of 500 m × 2000 m

Fig. 5 Maximum likelihood estimation of β. The graph is obtained by evaluating the marginal likelihood p(y) using the Glitne dataset for several β values between 0 and 1.5. The maximum is near β = 0.9


Fig. 6 Prior probability of oil sand. The image shows pi(1), i = 1, . . . , 100. Note that oil sand is more likely in the central parts and in the southeast where the pointwise prior terms α_i(1) are larger

Fig. 7 Cell locations for two kinds of partial tests. Partial test 1 covers 12 cells in the central part, while partial test 2 covers 15 cells in the southeast. In comparison, the full test covers all 100 cells

Note that we do not have the extracted AVO attributes for the reservoir. However, we assume that some prior geologic knowledge is available, from which we can specify the point-wise prior function α_i(xi). This knowledge assigns high α_i(1) for central cells and the southeast flank (indicating a higher chance of oil sand in these areas) and high α_i(2) and α_i(3) in other areas. In Fig. 6, we show the associated prior probability map of oil sand. These marginal probabilities pi(1) are computed from the fixed values of β and α_i(xi). The choice of α_i(xi) for MRFs is not a trivial problem, but rather a separate estimation problem on its own. The analyst may treat α_i(xi) as expert knowledge interpreted as log p(y0i|xi), with expert input data y0i and where p(xi) is the uninformative prior. Another method would be to assign a low dimensional parametric form to α_i(xi), and then estimate the resulting few parameters along with β using maximum marginal likelihood. We demonstrate the value of information for the full test, i.e., extracting seismic attributes at every grid cell j = 1, . . . , N, and for two partial tests covering only parts of the domain. The two partial tests are shown in Fig. 7. Partial test 1 covers the central parts of the domain, while partial test 2 mostly covers the southeastern parts. These two spatial domains are believed to be the most likely candidate areas for oil, as specified by the informative prior in Fig. 6. For each of the tests, we consider using only the R0 seismic attribute, or both (R0, G) seismic attributes. Altogether this entails six testing configurations. For numerical calculations, we assume nominal values for cost and revenue. We assume a


Table 1 Value of experiment (VOE) and chance of knowing (COK) for seismic AVO attributes R0 and (R0, G) and for different experimental configurations: complete test, partial tests 1 and 2

            Complete test            Partial test 1           Partial test 2
            VOE (million $)  COK     VOE (million $)  COK     VOE (million $)  COK
R0          4.91             0.07    0.48             0.007   2.11             0.03
(R0, G)     35.94            0.51    1.93             0.03    10.61            0.15

cost of C = $2 million for drilling a well at a cell, and a revenue of R = $5 million if a well discovers oil sand at a cell. The absolute values should not be taken in earnest; only the relative values are of interest for comparison. We use the volume of a cell, the assumed porosity and fraction of recoverable oil, and the price of oil to estimate the revenue. Realistic numbers should be obtained from management and petroleum engineers in a real-world case study.

Results of the tests and attribute selection schemes are displayed in Table 1. The

specific course of action regarding experimentation depends on the cost of each configuration, which would depend on factors such as whether the reservoir is offshore, etc. It seems likely that in this case, purchasing both attributes over the entire field may be most beneficial. The VOE and COK values are high in our example, particularly when both the attributes (R0, G) are acquired over the entire field. This is because decisions regarding several cells in the field can be strongly affected by the experimental result. The field itself seems to be very lucrative, and has a prior value (PV) of around US $12 million. We would expect COK values to be much lower in practice because the magnitude of revenues and costs typically results in a very high VOPI in the petroleum industry. In our example, VOPI is around US $70 million.

Depending on the price of partial experiments, partial test 2 may be preferred over partial test 1; the numbers indicate that intelligent experimental positioning can be extremely valuable. Also, a partial test with both attributes can be better than a complete test with only one attribute. The processing time of seismic data will typically vary with the size of the test. The decision about data collection, processing and analysis would depend on this price and the VOE for each test.

    6 Conclusions

In this paper, we propose a decision-analytic approach to valuing experiments performed in situations that naturally exhibit spatial dependence. We incorporate dependence by modeling the system as a two dimensional grid, and by treating the joint prior distribution of the categorical distinction of interest as a Markov random field. We illustrate our methodology with the help of two examples. The first example (Sect. 5.1) indicates that spatial dependence can play a significant role in determining the value of an experiment, and that the effects of constraints in site selection can be counterintuitive. The second example (Sect. 5.2) compares the value of partial tests with complete tests, in terms of the data attributes and the spatial domain. The results suggest that intelligent experimental design can add substantial value to the decision situation. High likelihood experiments are useful, but they should be able to affect


the downstream decision. In practice, decisions regarding the choice and location of experiments should be analyzed by a multi-disciplinary taskforce. Our methodology inspires a collaborative effort by aggregating the experimental likelihood, prior spatial model, and downstream decisions and associated values into a single meta-model.

Decision trees and similar tools have previously been used to compute the monetary value of information in spatial decision making. These techniques aggregate distinctions to a higher level and therefore excessively simplify several important aspects of the decision problem, including the alternatives, experimental likelihood and value assessments. Minimizing entropy is another common approach for spatial design of experiments, but our focus is on valuing experiments rather than site selection, and this necessarily implies explicitly modeling future decisions, revenues and costs. While VOE is a more complete measure for valuing information, it can be challenging to compute for complex large-scale problems. Another issue is that of the time horizon. The value of measurements from an experiment may have impacts on future decisions that were not foreseen in the current analyses. The value of an experiment may change over time with new or different technology or different economic situations.

There are several possible directions for further research. One direction involves statistical issues. For instance, other prior models could be investigated. Hierarchical Bayesian models could also be captured within our framework, such as a prior on β. One might consider including a probabilistic model for revenues Ri and costs Ci, rather than treating them as fixed. From the computational point of view, algorithms for larger grid sizes could be explored. In our framework, the recursive method works best for moderate size grids, say, n1 < 10. Block updating using Gibbs sampling is a possible way to estimate value of information for larger grids. Other simulation techniques could also be considered, see Friel and Rue (2007). Specific techniques may be suitable for specific models. Finally, the decision-analytic assumptions can be relaxed or modified. In general, the value from the field can depend on the joint distribution of the distinction of interest, not only marginals. Sequential decisions and even sequential experimentation are other avenues that can be examined. Decisions regarding experimentation should be subjected to intensive analysis in all domains, particularly in realms where there is the additional complexity of spatial dependence. The approach naturally fosters a multi-disciplinary outlook for valuing information in spatial decision making, and stimulates ideas for creative alternatives in decisions related to experimentation.

    Appendix: Recursive Computations for Markov Random Fields (MRFs)

Let x = (x1, . . . , xN) be a categorical MRF on the two dimensional regular grid. Here, N = n1·n2 is the total number of grid nodes, n1 is the shorter direction (say North) and n2 is the longer direction (say East). Assume that each xi ∈ {1, . . . , d}, i = 1, . . . , N, i.e., the categorical values can take d possible colors. Suppose the grid nodes are numbered sequentially from northwest so that x1 is the categorical value in the northwest, x_{n1} in the southwest, x_{n1(n2−1)+1} in the northeast, and xN in the


Fig. 8 Illustration of a 3 × 3 grid and the indexing of cells. For an Ising model, the full conditional probability at cell 5 depends only on the outcomes at the four nearest neighbors

southeast. See Fig. 8 for an example of a 3 × 3 grid and the indexing of cells. The MRF probability function is written as

p(x) = exp[β Σ_{i∼j} I(xi = xj) + Σ_i α_i(xi)] / z = h(x)/z, (20)

where i ∼ j denotes all neighboring pairs, and β and α_i(l), i = 1, . . . , N, l = 1, . . . , d, are model parameters. This is the simplest MRF and is known as the Ising model, where the neighbors of an interior node i are defined by i + 1, i − 1, i − n1, and i + n1, see Fig. 8. If node i is on the edge or is a corner node, some of these neighbors vanish. The normalizing constant z is given by

z = Σ_{x1=1}^{d} · · · Σ_{xN=1}^{d} h(x). (21)

Note that this normalizing constant is a function of the model parameters β and α_i(l), i = 1, . . . , N. Location specific model parameters α_i(l) can also be functions of data yi via the likelihood term, i.e., α_i(xi) = log p(yi|xi), xi ∈ {1, . . . , d}. The density in (20) is then a posterior p(x|y), and the normalizing constant depends both on model parameters and the data. With data y available, the marginal likelihood of data (with x summed out) is given by

p(y) = p(y|x)p(x) / p(x|y) = [∏_j p(yj|xj) · h1(x)/z1] / [h2(x)/z2] = z2/z1, (22)

where h1(x) and z1 are defined by the β and α_i(xi) terms including only prior information, while h2(x) and z2 are defined by both the prior information in β and α_i(xi) along with the log p(yi|xi) part from the likelihood terms. Hence, the functional expressions depending on x cancel and the marginal likelihood in (22) equals the ratio of the normalizing constants in posterior and prior. For parameter estimation of β, we evaluate the marginal likelihood for a set of parameter values and find the maximum likelihood over this set.


The probability function for x in (20) can be written sequentially as

p(x) = p(x1|x2, . . . , xN) p(x2|x3, . . . , xN) · · · p(x_{N−1}|xN) p(xN)
     = p(x1|x2, . . . , x_{1+n1}) p(x2|x3, . . . , x_{2+n1}) · · · p(x_{N−1}|xN) p(xN)
     = h(x1|x2, . . . , x_{1+n1}) h(x2|x3, . . . , x_{2+n1}) · · · h(x_{N−1}|xN) h(xN) / z, (23)

in which we use the Markov property, conditioning only on buffer variables in a sequential line up going from northwest to southeast. The buffer variables of an internal node i are (x_{i+1}, x_{i+2}, . . . , x_{i+n1}). This buffer of length n1 allows us to take the required neighboring southern node and eastern node into the conditioning. The buffer gets shorter in the last (easternmost) column. The h(·) terms in (23) are uniquely defined by the sequential line up. For the first location, this term is h(x1|x2, . . . , x_{1+n1}) = exp{β[I(x1 = x2) + I(x1 = x_{1+n1})] + α_1(x1)}, and similarly for x2, x3, and so on until h(x_{N−1}|xN) = exp{β I(x_{N−1} = xN) + α_{N−1}(x_{N−1})} and h(xN) = exp{α_N(xN)}.

We first illustrate a method for recursive forward computation of the normalizing constant z in (21) and (23). This method follows Reeves and Pettitt (2004), and z is computed by summing out one variable at a time in a sequential manner, going from northwest to southeast. The recursion starts with

z1(x2, . . . , x_{n1+1}) = Σ_{x1=1}^{d} h(x1|x2, . . . , x_{1+n1}), (24)

since x1 is only involved in the first term of the sequential formulation in (23). The recursive calculation continues by using, for general i ≤ N − n1,

z_i(x_{i+1}, . . . , x_{i+n1}) = Σ_{xi=1}^{d} h(xi|x_{i+1}, . . . , x_{i+n1}) z_{i−1}(xi, . . . , x_{i+n1−1}). (25)

The terms z_i(x_{i+1}, . . . , x_{i+n1}) take one value for every buffer configuration, and with d colors we have d^{n1} possible configurations. The z_i values must be stored for all these configurations, since they are required for the next step in the recursion. This storage means that the recursive computation method is only suitable for moderate size rows in the grid (n1 < 20 for d = 2). As the buffer length gradually decreases in the easternmost column, the number of possible buffer configurations gets smaller, and at the final step N we calculate

z = zN = Σ_{xN=1}^{d} h(xN) z_{N−1}(xN). (26)

    We have then summed out all variables as required in (21).
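The forward recursion can be sketched compactly by summing out one column at a time rather than one node at a time; the buffer is then exactly one column of n1 cells, with the same d^{n1} storage as in (24)–(26). The code below is our illustrative variant (with all α_i = 0), checked against brute-force enumeration on a small grid; it is not the authors' implementation.

```python
import itertools
import math

def ising_logz_recursive(n1, n2, beta, d=2):
    # Column-wise forward recursion: the state is one column configuration,
    # giving d**n1 buffer states, analogous to (24)-(26).
    cols = list(itertools.product(range(d), repeat=n1))
    def within(c):        # vertical neighbor pairs inside a column
        return beta * sum(c[r] == c[r + 1] for r in range(n1 - 1))
    def between(a, b):    # horizontal neighbor pairs between adjacent columns
        return beta * sum(a[r] == b[r] for r in range(n1))
    f = [math.exp(within(c)) for c in cols]
    for _ in range(n2 - 1):
        f = [math.exp(within(c)) *
             sum(math.exp(between(cp, c)) * fp for cp, fp in zip(cols, f))
             for c in cols]
    return math.log(sum(f))

def ising_logz_bruteforce(n1, n2, beta, d=2):
    # Direct summation over all d**(n1*n2) grid configurations, as in (21).
    z = 0.0
    for x in itertools.product(range(d), repeat=n1 * n2):
        e = 0.0
        for c in range(n2):
            for r in range(n1):
                i = c * n1 + r        # column-major numbering, as in Fig. 8
                if r + 1 < n1:
                    e += beta * (x[i] == x[i + 1])    # southern neighbor
                if c + 1 < n2:
                    e += beta * (x[i] == x[i + n1])   # eastern neighbor
        z += math.exp(e)
    return math.log(z)
```

On a 3 × 3 grid with d = 2 the two routines agree to machine precision, while the recursion scales to long grids (large n2) where enumeration is infeasible.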

We next demonstrate a recursive backward sampling algorithm for drawing x from the probability function in (20). The value of xN is sampled from the marginal probability vector of state xN, given by

pN(xN) = (1/z) Σ_{x1=1}^{d} · · · Σ_{x_{N−1}=1}^{d} h(x) = (1/z) h(xN) z_{N−1}(xN), xN ∈ {1, . . . , d}, (27)

where the sequential normalizing constants evaluated in (25) and (26) are used. We continue in a backward manner, generating x_{N−1} conditional on the sampled xN from the probability vector

p(x_{N−1}|xN) = [Σ_{x1=1}^{d} · · · Σ_{x_{N−2}=1}^{d} h(x)] / [z pN(xN)] = h(x_{N−1}|xN) z_{N−2}(x_{N−1}, xN) / z_{N−1}(xN), x_{N−1} ∈ {1, . . . , d}, (28)

and so on for all i = N − 2, . . . , 1.

We finally present a backward evaluation scheme for the marginal probabilities pi(xi), i = N, . . . , 1. The marginal for xN is given directly in (27). For N − 1, we first arrange the joint density p(x_{N−1}, xN) and then sum out xN:

p_{N−1}(x_{N−1}) = Σ_{x1=1}^{d} · · · Σ_{x_{N−2}=1}^{d} Σ_{xN=1}^{d} h(x)/z
                 = Σ_{xN=1}^{d} h(xN) h(x_{N−1}|xN) z_{N−2}(x_{N−1}, xN) / z,   x_{N−1} ∈ {1, . . . , d}. (29)

For a general node i, this evaluation consists of a backward construction of the joint probability for the buffer of length n1. The marginal for node i is then evaluated by summing over all buffer configurations x_{i+1}, . . . , x_{i+n1} for each possible value of xi ∈ {1, . . . , d}. A similar backward formula is used when computing the entropy defined by the sequential formula in (15).
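The backward evaluation of marginals can likewise be sketched column-wise: combining forward messages with backward messages gives the marginal of each column, from which node marginals follow by summation over the rest of the column. Again this is our illustrative variant with all α_i = 0, not the authors' code; location-specific α_i terms would enter as additional per-column factors.

```python
import itertools
import math

def ising_node_marginals(n1, n2, beta, d=2):
    # Exact marginals p_i(xi) via column-wise forward-backward messages.
    cols = list(itertools.product(range(d), repeat=n1))
    def within(c):        # vertical pairs inside a column
        return beta * sum(c[r] == c[r + 1] for r in range(n1 - 1))
    def between(a, b):    # horizontal pairs between adjacent columns
        return beta * sum(a[r] == b[r] for r in range(n1))
    # Forward messages f[t][c]: sum over columns 0..t-1 with column t fixed.
    f = [[math.exp(within(c)) for c in cols]]
    for _ in range(n2 - 1):
        f.append([math.exp(within(c)) *
                  sum(math.exp(between(cp, c)) * fp for cp, fp in zip(cols, f[-1]))
                  for c in cols])
    # Backward messages b[t][c]: sum over columns t+1..n2-1.
    b = [[1.0] * len(cols)]
    for _ in range(n2 - 1):
        b.insert(0, [sum(math.exp(between(c, cn) + within(cn)) * bn
                         for cn, bn in zip(cols, b[0])) for c in cols])
    z = sum(f[-1])
    # Column marginal f*b/z, then node marginals by summing over the column.
    marg = [[0.0] * d for _ in range(n1 * n2)]
    for t in range(n2):
        for c, fc, bc in zip(cols, f[t], b[t]):
            p = fc * bc / z
            for r in range(n1):
                marg[t * n1 + r][c[r]] += p
    return marg
```

A handy check: with all α_i = 0 the model is symmetric under a global relabeling of colors, so every node marginal must be uniform (0.5 per color for d = 2) for any β; the function reproduces this exactly.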

References

Ash RB (1965) Information theory. Dover, New York
Avseth P, Mukerji T, Mavko G (2005) Quantitative seismic interpretation. Cambridge University Press, Cambridge
Besag J (1974) Spatial interaction and the statistical analysis of lattice systems. J R Stat Soc Ser B 36:192–225
Bickel JE, Gibson RL, McVay DA, Pickering S, Waggoner J (2006) Quantifying 3D land seismic reliability and value. Soc Pet Eng J SPE 102340
Begg S, Bratvold RB, Campbell JM (2002) The value of flexibility in managing uncertainty in oil and gas investments. SPE 77586
Carroll C, Zielinski WJ, Noss RF (1999) Using presence–absence data to build and test spatial habitat models for the fisher in the Klamath region, USA. Conserv Biol 13:1344–1359
Clyde MA (2001) Experimental design: A Bayesian perspective. In: Smelser et al (eds) International encyclopedia in social and behavioral sciences. Elsevier, Amsterdam
Cunningham P, Begg S (2008) Using the value of information to determine optimal well order in a sequential drilling program. AAPG Bull 92:1393–1402
Diggle P, Lophaven S (2006) Bayesian geostatistical design. Scand J Stat 33:53–64
Eidsvik J, Avseth P, Omre H, Mukerji T, Mavko G (2004) Stochastic reservoir characterization using prestack seismic data. Geophysics 69:978–993
Eidsvik J, Bhattacharjya D, Mukerji T (2008) Value of information of seismic amplitude and CSEM resistivity. Geophysics 73:R59–R69
Friel N, Rue H (2007) Recursive computing and simulation-free inference for general factorizable models. Biometrika 94:661–672
Fuentes M, Chaudhuri A, Holland DM (2007) Bayesian entropy for spatial sampling design of environmental data. Environ Ecol Stat 14:323–340
Houck RT, Pavlov DA (2006) Evaluating reconnaissance CSEM survey designs using detection theory. Lead Edge 25:994–1004
Howard R (1966) Information value theory. IEEE Trans Syst Sci Cybern 2(1):22–26
Howard R (1970) Decision analysis: Perspectives on inference, decision and experimentation. Proc IEEE 58:632–643, IEEE Special Issue, Detection Theory and Applications
Le ND, Zidek JV (2006) Statistical analysis of environmental space-time processes. Springer, Berlin
Lindley DV (1971) Making decisions. Wiley, New York
Liu JS (2001) Monte Carlo strategies in scientific computing. Springer, New York
Marchant BP, Lark RM (2007) Optimized sample schemes for geostatistical surveys. Math Geol 39:113–134
Matheson JE (1990) Using influence diagrams to value information and control. In: Oliver RM, Smith JQ (eds) Influence diagrams, belief nets and decision analysis. Wiley, New York, pp 25–48
Mukerji T, Avseth P, Mavko G, Takahashi I, Gonzalez E (2001) Statistical rock physics; combining rock physics, information theory, and geostatistics to reduce uncertainty in seismic reservoir characterization. Lead Edge 20:313–319
Müller WG (2001) Collecting spatial data. Springer, Heidelberg
Polasky S, Solow AR (2001) The value of information in reserve site selection. Biodivers Conserv 10:1051–1058
Raiffa H (1968) Decision analysis. Addison-Wesley, Reading
Reeves R, Pettitt AN (2004) Efficient recursions for general factorisable models. Biometrika 91:751–757
Welton NJ, Ades AE, Caldwell DM, Peters TJ (2008) Research prioritization based on expected value of partial perfect information: a case-study on interventions to increase uptake of breast cancer screening. J R Stat Soc A 171:807–841
Yokota F, Thompson KM (2004) Value of information analysis in environmental health risk management decisions: past, present and future. Risk Anal 24:635–650
Zidek JV, Sun W, Le ND (2000) Designing and integrating composite networks for monitoring multivariate Gaussian pollution fields. Appl Stat 49:63–79