Nonparametric Predictive Utility Inference

Accepted Manuscript

Decision Support

Nonparametric Predictive Utility Inference

B. Houlding, F.P.A. Coolen

PII: S0377-2217(12)00233-0

DOI: 10.1016/j.ejor.2012.03.024

Reference: EOR 11026

To appear in: European Journal of Operational Research

Received Date: 18 April 2011

Revised Date: 30 January 2012

Accepted Date: 13 March 2012

Please cite this article as: Houlding, B., Coolen, F.P.A., Nonparametric Predictive Utility Inference, European

Journal of Operational Research (2012), doi: 10.1016/j.ejor.2012.03.024

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers

we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and

review of the resulting proof before it is published in its final form. Please note that during the production process

errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

http://dx.doi.org/10.1016/j.ejor.2012.03.024

http://dx.doi.org/10.1016/j.ejor.2012.03.024

Nonparametric Predictive Utility Inference

B. Houldinga,§, F.P.A. Coolenb

aDiscipline of Statistics, Trinity College Dublin, Ireland.bDepartment of Mathematical Sciences, Durham University, UK.

Abstract

We consider the natural combination of two strands of recent statistical research, i.e., that of decision makingwith uncertain utility and that of Nonparametric Predictive Inference (NPI). In doing so we present theidea of Nonparametric Predictive Utility Inference (NPUI), which is suggested as a possible strategy for theproblem of utility induction in cases of extremely vague prior information. An example of the use of NPUIwithin a motivating sequential decision problem is also considered for two extreme selection criteria, i.e., arule that is based on an attitude of extreme pessimism and a rule that is based on an attitude of extremeoptimism.

Key words: Decision Analysis, Nonparametrics, Predictive Inference, Exchangeability, Uncertain Utility.

1. Introduction

The Bayesian paradigm, e.g., [14], coupled with the expected utility hypothesis of [6], provides a transpar-ent and attractive methodology for solving problems of decision making under uncertainty. In this approachpreferences over a set of possible decisions are reconstructed by taking into account both the probabilitythat each decision leads to a particular outcome, and the relative preference for obtaining that outcome asmeasured by its utility value. Furthermore, if the outcome that pertains from a particular decision dependson the value of an unknown random quantity, then the probabilities associated with the set of possible de-cision outcomes are typically assumed to be subject to an assigned prior parametric distribution. Learningthen occurs following observation of data that has a probabilistic dependence with the unknown randomquantity of interest, and the usual ‘posterior is proportional to likelihood times prior’ of Bayes’ Theorem isemployed.

However, implicit within this theory (and hence necessary for its application) is the assumption that theDecision Maker (DM) knows her preferences, meaning that she can assign an appropriate utility function(with domain the full set of all possible decision outcomes) for use within the problem. In applicationsthis is usually achieved by either assuming a fixed utility form, e.g., a logarithmic utility function formonetary returns, or by selecting specific utility values for particular and relevant decision outcomes. Assuch, classical Bayesian subjective expected utility theory does not permit inherent uncertainty in preferencesover decisions. It also does not allow the learning of utility and specifies that the DM will never be surprisedby the utility of a realized outcome.

Nevertheless, not for all situations is the assumption of a known preference relation over outcomesdeserved, and often a DM may instead need to learn her preferences through suitable experimentation.Indeed, a DM may consider it inappropriate to assign a particular and fixed utility value for any outcomethat is novel or unfamiliar, choosing instead to only do so after direct experience or exposure. Such cases ofutility uncertainty motivate so-called adaptive utility theory, e.g., [13], [25], and [24, 26], which generalizesthe traditional utility concept by only requiring the utility function be known up to the value of some

§Corresponding author, address: Rm 105, Discipline of Statistics, Lloyd’s Institute, Trinity College Dublin, Dublin 2,Ireland. E-mail: [email protected], tel: +353 (0) 1896 2933, fax: +353 (0) 1 677 0711.

uncertain utility parameter. The principal idea of adaptive utility is then to treat the uncertain utilityparameter in the same manner that unknown random quantities are typically treated in standard Bayesianstatistical inference, i.e., they are subjected to a parametric learning model in accordance with Bayes’Theorem. Yet, and despite adaptive utility theory explicitly permitting a DM to remain uncommitted toa presumed known and correct utility function, its previous use has required knowledge of a precise andmeaningful prior distribution concerning true preferences, something that is unlikely to be considered eitherreasonable or justifiable when selecting from outcomes that include (initially) new and foreign possibilities.

Rather than assuming a precise prior distribution over an uncertain utility parameter, interest here isin the use of the Nonparametric Predictive Inference (NPI) technique of [8, 10], and [3], which is a lowstructure statistical technique arising naturally as a result of Hill’s A(n) assumption, [21, 22, 23]. Given aknown ordered series of utility values that are considered subject to a post-data exchangeability assumption,Nonparametric Predictive Utility Inference (NPUI) proceeds by assigning equal mass to the probability thata new utility value falls within any of the intervals formed by the known ordered utility, leading to thequantification of such utility uncertainty through interval probability.

An outline of the remainder is as follows; Section 2 reviews the concept of uncertain utility and brieflydescribes how this can be taken into account through adaptive utility theory, whilst Section 3 reviews thestatistical technique of NPI. In Section 4 the two strands of uncertain utility and nonparametric predictiveinference are combined to formally introduce the NPUI model, with an illustrative example being given inSection 5. Finally, Section 6 concludes with a discussion of possible future directions.

2. Uncertain Utility

The assumption that a DM can accurately identify the utility value of any considered decision outcomeis prevalent within the theory and application of Bayesian decision making under uncertainty. Often thiswill be through the use of an assumed model for the utility of outcomes from a continuous domain, e.g., alogarithmic model for monetary returns. Alternatively, specific utility values may be considered appropriatefollowing suitable introspection concerning the decision problem. In either case, the possibility that a DMmay not a priori know their preference for a given outcome is often ignored.

In contrast, the theory of adaptive utility, as introduced by [13] and further developed by [25] and [24, 26],generalises classical Bayesian decision theory, suggesting a methodology for decision selection even when theDM is unable to fully specify her preferences. In this setting the DM is permitted to be uncertain over hertrue preferences, with the appropriate utility function over decision outcomes only being known up to thevalue of some uncertain parameter µ. Such a parameter is referred to as the DM’s state of mind, and it canbe used to model uncertainty over any aspect of the DM’s preferences, with interesting examples includingvectors of unknown trade-oÆ weights or unknown level of risk aversion. Notationally, such a dependencebetween the utility function u(·) and the state of mind µ was displayed via the inclusion of a conditioningargument, e.g., u(·|µ).

Instead of assuming that the utility function is fully known, adaptive utility theory makes use of aprobabilistic specification concerning the uncertain state of mind µ. Bayesian updating can then occur oncethe DM receives additional information concerning her true preferences, and previous examples consideredfor such utility related information include noise corrupted observations of the true utility value, or the signof the diÆerence between the prior expectation of the utility value for a given outcome and the utility valuethat was actually received (i.e., an indication of elation or disappointment). An adaptive utility functionau(·) was then defined as the expectation of the possible utility values with respect to beliefs over the stateof mind, i.e., au(·) = Eµ[u(·|µ)].

In essence, the use of adaptive utility theory is analogous to the use of a hierarchical prior within robustBayesian analysis, e.g., [5]. A utility value, once scaled to fall within the interval [0, 1], corresponds to aprobability, with the utility of decision outcome o being that probability p which makes the DM indiÆerentbetween receiving o for sure, or selecting the decision which results in most preferred outcome o

§ withprobability p and least preferred outcome o§ otherwise, see [15]. Adaptive utility theory allows the DM tobe uncertain of the value of p that results in her indiÆerence, instead considering a non-degenerate priorsubjective probability distribution for its value.

2

That such a probability distribution concerning utility values may change following updating in lightof additional information motivates the name adaptive utility theory, for if the probability distribution didalter, then so would the expected utility return, i.e., the adaptive utility value. In other words, the adaptiveutility value of an unfamiliar decision outcome will ‘adapt’ in light of additional information concerningpreferences. This is in contrast to classical utility theory in which the utility values of all decision outcomesare considered known and fixed. Yet, explicitly permitting utility uncertainty does not alter the suggesteddecision selection within a one-oÆ decision problem, as the strategy arising from the adaptive utility settingis the same as that which would arise from traditional theory if the adaptive utility values were assumedequal to the true utility values. However, in a sequential decision problem the acceptance that certaindecision outcomes do not have known utility can alter the optimal selection strategy.

Notable alternative theories that similarly seek to incorporate uncertain preference within a decisionmaking paradigm include the approaches of [18] and [4]. Rather than expressing uncertainty over an unknownutility parameter through the use of a precise prior probability distribution, Farrow and Goldstein allow theDM to remain non-committed and to instead only provide an upper and lower bound for the true utilityvalue (through the declaration of lower and upper bounds on trade-oÆ parameters in a multi-attribute utilityhierarchy). In this respect the approach of Farrow and Goldstein is similar to the NPUI approach presentedhere. However, diÆerences result in that the use of the NPI statistical technique would appear to be asimpler approach that is data driven. In particular, the NPUI approach considered here does not requireany explicit declaration of subjective judgments and/or expressions concerning the utility of a novel outcomeother than that of a usually reasonable and objective post-data exchangeability assumption.

In contrast, the approach of Ben-Haim et al. is to instead specify an Info-Gap model of utility uncertainty,and is relevant for situations of severe uncertainty whereby, other than the specification of a best point-estimate guess, nothing else can be elicited. In particular, there is no probabilistic specification of howaccurate such a best point-estimate guess may be. Instead a nested subset of possible horizons of uncertaintyis specified and the decision selected which is deemed most robust in that it guarantees a specified minimumcritical return for the largest horizon of uncertainty. Unlike the imprecise theory of Farrow and Goldstein,or the NPI method used here, the Info-Gap approach of Ben-Haim et al. is non-probabilistic, and cannot quantify, or even give bounds on, the probability of an outcome for any quantity that is subject to‘Info-Gapping’.

There is also a long and developed literature on theories that seek to take into account the descriptivebehaviour of decision makers following observations such as that identified in the Ellsberg Paradox, see forexample [16, 34, 17, 19, 27, 20] and the references therein. These typically deal with what has been describedas ‘ambiguity aversion’, which has arisen following observational studies where individuals are found to beless prepared to take part in bets when the stated potential outcomes depended on the occurrence or non-occurrence of a vague or unfamiliar event, compared to other bets whose stated potential outcomes dependedon the occurrence or non-occurrence of events with which the individuals held greater experiences.

Here we do not consider situations where the likely outcome of a decision is unknown (though thatwould be a straightforward generalisation of the material presented), but instead with regards uncertaintyin the relative preference for that outcome, i.e., its utility value. As such, we do not deviate from thesubjective expected utility maximisation framework, only instead generalising it so as to take into accountthe potential for utilities of various outcomes as being uncertain, and return to the precise expected utilitysetting in situations when the relative preferences of all possible decision outcomes are a priori known. Moregenerally, however, we explicitly seek to permit use of the limited data available under the NonparametricPredictive Inference framwework, where a decision maker is not totally ignorant of their likely preferencefor a decision outcome (due to the post-data exchangeability assumption, i.e., having some experience ofwhat they may consider related events or outcomes), but neither are they totally sure of their preference,and hence the utility, of all available outcomes due to one or more of them never having been previouslyexperienced. In doing so the Nonparametric Predictive Inference framework quantitatively provides boundson possible ranges in a very flexible manner that require only minimal assumptions to be agreed with. Yet wewould argue that this described motivating situation for the use of the Nonparametric Predictive Inferenceframework forms the norm, rather than the exception.

Finally, due to the bounds provided for expected utilities of unfamiliar outcomes, the proposed solution3

to the decision problem is closely related to the Æ-maxmin expected utility model, where Æ is a parametercontrolling the decision maker’s level of pessimism, e.g. [19]. However, the application of this inferentialprocess within a sequential analysis, where we have to take into account the possibility that an unfamiliaroutcome may be experienced and that such an eventually needs to be considered from the outset, alsodiÆerentiates the work from the general literature.

3. Nonparametric Predictive Inference

The implicit assumption underlying adaptive utility, i.e., that there exists an available and precise priordistribution representing the DM’s possible preferences, may be unreasonable in situations where there existsan opportunity of receiving unexperienced or novel outcomes. Instead it may be more appropriate to permita less restricting belief model that is closer to resembling a state of ignorance, and one such possibility isthat of Nonparametric Predictive Inference (NPI).

NPI is a low structure statistical technique that is predictive by nature and arises naturally as a resultof Hill’s A(n) assumption, which in turn only assumes exchangeability at the level of the pre-observationrandom variables associated with the data. This is in contrast to alternative interval probability techniquessuch as that of [36], which more closely follows the Bayesian, [14], Representation Theorems and assumesexchangeability of an infinite sequence of events. Note that two random variables X and Y are exchangeableif and only if, for all values x and y, the relation P (X = x, Y = y) = P (X = y, Y = x) holds (the conceptgeneralizes straightforwardly for more than two random variables). As such, exchangeability is an assumptionof symmetry that is particularly suitable in situations where draws are randomly taken from a population(which can be finite or infinite), hence making it an appropriate assumption in many situations. Indeed, thecommon statistical assumption of sampling independent and identically distributed random variables wouldclearly result in exchangeability, but so would sampling without replacement, which is not independent.

The assumption A(n) is as follows: Let real-valued x(1) < · · · < x(n) be the ordered statistics of datax1, . . . , xn, and let Xi be the corresponding pre-data random quantities, then under A(n):

1. The observable random quantities X1, . . . , Xn are exchangeable. Note this aspect was not included in[21], but here interest lies in the more specific case in which it is true.

2. Ties have probability 0, so xi 6= xj for all i 6= j, almost surely (generalization to include ties isstraightforward, but requires more awkward notation).

3. Given data x1, . . . , xn, the probability that the next observation Xn+1 falls in the open interval Ij =(x(j°1), x(j)) is 1/(n + 1) for j = 1, . . . , n + 1, with the definition that x(0) = °1 and x(n+1) =1.

As such, A(n) is a nonparametric (distribution free) post-data assumption that is inductive by nature andwhich may be relevant when there is a lack of additional information further to the data itself. Its use may beconsidered a more measured or less presumptuous alternative for making inference over a future observationthan the direct specification of conditional independencies and specific distributional forms, especially ifthese are di±cult to justify due to the existence of only extremely vague prior information concerning theunderlying distribution.

Hence as a rule for statistical inference, NPI is ‘frequentist’ by nature, allowing predictions to be madeover future events without the specification of a prior distribution. Indeed, as NPI is a data driven approach,it is a natural tool for consideration in situations where the specification of subjective beliefs would bedi±cult or inappropriate. For example, [11] considered NPI for inference in studying aspects of the size andcomposition of juries so as to determine their representativeness of the views of society.

Nevertheless, [22, 23] demonstrated that A(n) does fully coincide with the general framework of Bayesianstatistics and the use of a finitely additive prior, whilst [3] were able to relate A(n) and the resulting NPI tothe theory of interval or imprecise probability. In the case of interval probability, A(n) was used to providebounds on the probability of a future event (rather than the specification of a precise value), with the‘subjectivist’ interpretation that such bounds represent lower and upper prices for a bet where the payoutdepends on whether or not the event of interest is seen to occur.

4

Further extensions include [8] where NPI was considered for cases of two finitely exchangeable populations(where exchangeability is only assumed within populations and not between them), and [1] where algorithmswere developed for obtaining maximum entropy for multinomial data. In an Operations Research context,NPI was previously applied to problems of decision modelling in multiple queues situations, e.g., [9], and tosupport replacement decisions e.g., [12], where the fact that the NPI approach adapts fully to the availabledata is an attractive feature. Finally, NPI has also been considered in relation to the concept of objectiveBayesianism, e.g., as an attractive option for risk analysis where objectivity of the inferences is often anexplicit aim, see [32], and by demonstrating in [10] that the assumption A(n) (and its variants) supports thefollowing widely supported inferential norms or axioms of objective inference1:

Empirical: Objective inferences should not disagree with empirical evidence.

Logical: If there is no information suggesting that one possible outcome is more likely than another, thenthis should be reflected by identical uncertainty quantifications for these outcomes.

4. Nonparametric Predictive Utility Inference

Concerning the use of an exchangeability assumption for the pre-observation utility values of a collectionof outcomes, it seems sensible to assume exchangeability at least at the level of collections of outcomeswhich are sensibly grouped, or which fall under the same taxonomic category, e.g., cereal brands, or fruitsetc. Indeed, exchangeability is simply a formalization of the notion that the future is predictable on thebasis of past experience, and would be well-suited for any situation where the DM believes that their pastexperience of joy or dislike from experiencing certain outcomes grouped under some categorization, is likelyto inform them of the possibility of similarly experiencing joy or dislike from a new and novel outcome thatis also listed under the same categorization.

However, whilst A(n) concerns the prediction of a random variable with domain R, utility values areinstead bound to a finite interval [a, b], and this is to prevent any outcome from being deemed infinitelybetter (worse) than an alternative, meaning a DM would always accept (reject) any strategy that hadpositive probability of that outcome, regardless of how small that probability would be. In addition, as autility function is only unique up to a positive linear transformation, this finite interval is often consideredto be the unit interval [0, 1].

The resulting problem is that, if it is agreed that the utility scaling should be on the unit interval[0, 1], how should the utility values for experienced outcomes be placed on that scale? If it is known thatthe collection of experienced outcomes includes within it the best and worst possible outcomes from thetaxonomic collection which is deemed to have exchangeable utility values, then this is achieved by specifyingthose outcomes to have utilities 1 and 0, respectively. The remaining experienced outcomes are then scoredthrough the previously described process of hypothetical comparisons between obtaining that outcome withcertainty, or to engage in the lottery in which the best outcome is obtained with probability p and the worstoutcome otherwise. Furthermore, as we are discussing the experienced outcomes we assume that the DM isable to achieve this.

Nevertheless, once we allow the possibility of novel outcomes with unknown utility, then it would appearrestricting to argue that no future outcome could be considered better (worse) than all previously experiencedoutcomes. This is a general problem of modelling uncertain utility, and to the best of our knowledge, there isnot yet any practical solution that does not rely on similarly unreasonable assumptions. Previous possibilitieshave been to include scaling subject to commensurability assumptions in which the same outcome is alwaysdeemed most preferable, e.g., in [7] and [26], but this does not allow uncertainty as to which outcome ismost preferred. As such, we here suggest the interpretation that the utilities of experienced outcomes havebeen placed on the [0, 1] scale by considering ‘hypothetical’ best and worst possible outcomes that could

1For more information on NPI see www.npi-statistics.com

5

exist within the exchangeable collection and by assigning those hypothetical outcomes to have utilities 1and 0, respectively.

Under the above assumption, let u(1), u(2), . . . , u(n), with 0 < u(i) ∑ u(j) < 1 for i < j, be the orderedvalues of the known utilities u1, . . . , un representing preferences over the set of distinct decision outcomesOn = {o1, o2, . . . , on}. Furthermore, denote by Un = {U1, . . . , Un} the set of pre-observation randomquantities which represent the utilities of the elements within On before they are experienced, and supposethat the elements within Un are considered exchangeable. Then given a new and novel outcome onew whoseutility value Unew 2 (0, 1) is unknown but considered exchangeable with the elements of Un, the NPUImodel considered here states only the following:

P

≥Unew 2 [u(i), u(i+1)]

¥= P

≥Unew 2 (0, u(1)]

¥(1)

= P

≥Unew 2 [u(n), 1)

¥

=1

n + 1

In contrast to Hill’s A(n) assumption, which concerns observations on the entire real line, the NPUImodel discussed here refers to observations that are restricted to the open interval (0, 1). To make clearsuch a distinction we will denote the assumptions underlying NPUI by UA(n), where the inclusion of the ‘U ’can be understood as indicating both that we are concerned with utility inference and that we are restrictedto the unit interval.

Now, denoting the lower and upper bounds for the expected utility of outcome onew as E[Unew] andE[Unew], respectively, and noting that the sum of the ordered utility values is the same as the sum of theun-ordered utility values, i.e.,

Pni=1 u(i) =

Pni=1 ui,the NPUI model and Equation (1) leads to the following:

E[Unew] =1

n + 1

nX

i=1

u(i) =1

n + 1

nX

i=1

ui (2)

E[Unew] =1

n + 1

≥1 +

nX

i=1

u(i)

¥=

1n + 1

≥1 +

nX

i=1

ui

¥(3)

=1

n + 1+ E[Unew]

A straightforward result arising from Equations (2) and (3) is that the diÆerence between the upper andlower expected utility bounds for the new outcome onew, to be denoted as ¢

≥E[Unew]

¥, is the following:

¢≥E[Unew]

¥= E[Unew]° E[Unew] =

1n + 1

(4)

Equation (4) is an intuitive result that shows how the NPUI model decreases the range of possible valuesfor the expected utility of a new and novel outcome as the number of known utility values for alternativeoutcomes is increased, or in other words, as past experience is increased. At the extreme of having observedno alternative outcomes whose utility could be considered exchangeable with the utility of the new outcome,the model simply states that the expected utility for the new outcome can take any value in the range (0, 1),whilst in the limiting case of having an infinite number of known and exchangeable alternatives, the boundsfor the expected utility value for the new outcome coincide. Of course that is not to say the actual utilityfor the new outcome becomes known, only that the expected utility value is then identified.

The NPUI model of Equation (1) oÆers a transparent and robust method for representing utility un-certainty that does not require a presumed known and possibly arbitrary prior distribution. In situationswhere most of the known utility values are relatively small, the NPUI model gives low valued expectedutility bounds for a novel outcome, suggesting that it may not be appropriate to experiment in a pairwisechoice against outcomes with known high utility. Similarly, in situations where most of the known utilities

6

are relatively high, with only a few outcomes known to have small utility, the NPUI model gives large valuedexpected utility bounds.

In the case of a sequential decision problem in which there is more than one novel outcome available forselection, the results of [3] for updating within the NPI setting become applicable. Let onew2 be a secondnovel outcome whose unknown utility Unew2 is assumed exchangeable with the set Un

S{Unew}, and denote(0, u(1)] as I1, [u(i), u(i+1)] as Ii+1, and [u(n), 1) as In+1. Then if it is discovered that Unew has value unew

which falls in the interval Ij , this interval is split into the two distinct intervals with either lower or upperboundary unew. Denoting these additional intervals as Ilj and Iuj , the assumption UA(n) is replaced by thestronger assumption UA(n+1) stating that Unew2 is equally likely, with probability 1/(n + 2), to fall in anyof I1, . . . , Ij°1, Ilj , Iuj , Ij+1, . . . , In+1.

A result of such updating is the following:

E[Unew2 |unew] =1

n + 2

≥ nX

i=1

ui + unew

¥(5)

=n + 1n + 2

E[Unew] +unew

n + 2

E[Unew2 |unew] =1

n + 2

≥1 +

nX

i=1

ui + unew

¥(6)

=n + 1n + 2

E[Unew] +unew

n + 2

=1

n + 2+

n + 1n + 2

E[Unew] +unew

n + 2

=1

n + 2+ E[Unew2 |unew]

Furthermore,

¢≥E[Unew2 |unew]

¥= E[Unew2 |unew]° E[Unew2 |unew] (7)

=n + 1n + 2

E[Unew] +unew

n + 2° n + 1

n + 2E[Unew]° unew

n + 2

=n + 1n + 2

¢≥E[Unew]

¥=

1n + 2

Equations (5) and (6) show how the updated expected utility bounds are both a weighted sum oftheir value before knowledge of unew and of the actual value unew is seen to take. Such updating rulesappear intuitive. For example, if unew 2 (0, E[Unew]), then both the lower and upper expected utilitybounds are reduced as a result of the updating. Alternatively, if unew 2 (E[Unew], 1), then the resultof updating is to increase both the lower and upper expected utility bounds, whilst in the situation ofunew 2 (E[Unew], E[Unew]), updating leads to an increase in the lower expected utility bound and a decreasein the upper expected utility bound. Finally, it should be noted that the weights in Equations (5) and(6) indicate that new observations have diminishing eÆect on the updated expected utility bounds as thenumber of known utilities increases.

In a sequential decision context, myopic decision making is avoided by taking into account the influencethat early decision choices will have upon later decisions. From the view point of the first decision in thesequence, the actual value of currently uncertain utilities will be unknown. However, by the time of a laterdecision epoch t, some novel outcomes may have been experienced and the uncertainty concerning theirutilities removed. If indeed novel outcomes had been experienced by epoch t, then the expected utilitybounds of residual novel outcomes will not be governed by Equations (2) and (3), but instead by Equations(5) and (6). Unfortunately, at the planning stage before any decision is made, all novel outcomes will haveunknown utilities; hence to consider the possibility that some of these will be known in the future, the

7

following is required:

E[Unew2 |Unew 2 Ij ] =1

n + 2

≥ nX

i=1

ui + inf(Ij)¥

(8)

E[Unew2 |Unew 2 Ij ] =1

n + 2

≥1 +

nX

i=1

ui + sup(Ij)¥

(9)

The conditioning arguments of Equations (8) and (9) reflect the fact that initially onew has uncertainutility Unew, but that when onew is obtained, Unew will be observed to fall in one of the intervals I1 toIn+1. The actual value that Unew will take within any particular interval Ij will of course influence futuredecisions, but the NPUI model makes no specification on this prior to its observation (as we wish to avoidmaking any a priori distributional assumptions for which there is no apparent justification), instead onlyspecifying P (Unew 2 Ij).

An internal consistency result that arises straightforwardly from Equations (8) and (9) is the following:

E[Unew2 ] =n+1X

j=1

E[Unew2 |Unew 2 Ij ]P (Unew 2 Ij) (10)

E[Unew2 ] =n+1X

j=1

E[Unew2 |Unew 2 Ij ]P (Unew 2 Ij) (11)

5. Example

We now illustrate how the NPUI concept may be incorporated within a sequential decision problem, andso as to highlight the diÆerences that may occur as a result of the choice rule employed, we consider the useof a rule that is based on an attitude of Extreme Pessimism (EP), and a rule that is based on an attitudeExtreme Optimism (EO):

EP: Under extreme pessimism the DM will always select the outcome or sequential decision path whoselower expected utility bound is greatest. Furthermore, if in a sequential problem a strategy has to be decidedfor making decisions based on future knowledge of the interval to which a currently uncertain utility belongsto, then it will always be assumed that the uncertain utility will equal the infimum of the interval underconsideration.

EO: Under extreme optimism the DM will always select the outcome or sequential decision path whoseupper expected utility bound is greatest. Furthermore, if in a sequential problem a strategy has to bedecided for making decisions based on future knowledge of the interval to which a currently uncertain utilitybelongs to, then it will always be assumed that the uncertain utility will equal the supremum of the intervalunder consideration.

The EP and EO choice rules can be identified with the usual Maximin and Maximax decision selectioncriteria, respectively, see for example [35], or the Æ-maxmin expected utility model, e.g. [19]. Nevertheless,there does exist a slight diÆerence in that both EP and EO are sequential in nature and make additionalassumptions concerning the treatment of decisions that are to be made following the future observation ofan initially uncertain utility. Alternative sequential choice rules that are based on the characteristics ofcriteria such as opportunity loss (minimax regret) [33], the Hurwicz criterion, or satisficing etc., will not beconsidered here. [28] oÆer a review of such alternative non-probabilistic selection criteria.

Now suppose a DM is facing a situation in which she must sequentially select a fruit for consumption fromamongst three available options. Only one of the three available options has previously been experienced bythe DM, with both alternatives assumed unfamiliar. In addition, the DM’s experience of fruit consumption is

8

such that she has knowledge over her preferences for a collection of five diÆerent fruits with ordered labellingsf1, . . . , f5 (ordered so that fruit fi has utility value u(i)), and where the known utility values u(1), . . . , u(5)

are 0.3, 0.35, 0.4, 0.5, and 0.7, respectively. Figure 1 shows the DM’s previous utility experience on a [0, 1]utility line.

The two alternative and unexperienced fruits will be labeled fnew and fnew2 , and will be assumed tohave exchangeable subjective utility (meaning the labellings are unimportant and that either can be takenif a decision is made to select one of them).

Figure 1: Assumed previous utility experience

Consider first the solution to this example if it were treated as a one-oÆ solitary decision problem, ratherthan as a sequential problem. In such a setting, and using Equations (2) and (3), the lower expected utilitybound for either of fnew or fnew2 is found to be E[Ufnew ] = 3/8 = 0.375, whilst the upper expected utilitybound is E[Ufnew ] = 13/24 º 0.542. As such, under the EP choice rule the DM should only try one of thenew fruits if the known alternative option is f1, whilst the EO choice rule requires selection of an unfamiliarfruit unless the known alternative option is f5. That is to say, the result of the NPUI model is that onlyf1 is known to have a utility less than the lower expected utility bound of fnew, whilst only f5 is known tohave a utility greater than the upper expected utility bound of fnew.

Now suppose the situation is formally treated as a sequential problem with a planning horizon of threedecision epochs. Furthermore, suppose that the DM’s objective is to maximise the overall sum of utilityreturns generated by fruit consumption. A decision tree of this setting is provided in Figure 2. In sucha sequential problem the DM must consider her possible updated expected utility bounds in light of thepossibility of experiencing one of the new fruits at either the first or second decision epoch, and the way thatthis is performed under the NPUI model is provided for by Equation (8) in the case of the lower expectedutility bound (the EP choice rule), and by Equation (9) in the case of the upper expected utility bound (theEO choice rule).

The solution to such a sequential decision problem can be determined given a choice rule for comparinguncertain utilities with known utilities, but once this has been identified, e.g., by EP or EO, the usual ‘roll-back’ algorithm for solving a decision tree can be implemented. Furthermore, the number of calculationsrequired in performing this task can be reduced given the understanding that it would be irrational (neverof positive benefit in terms of increased utility) to delay the selection of an uncertain utility outcome overa known utility outcome if such a selection was indeed to be made within the planning horizon. This isbecause we assume the utility for known outcomes to be fixed over the duration of the planning horizon, andare not altered by the sequence in which outcomes are experienced. As such, there would be a non-negativedecrease to the resulting value of the information gained from selecting an uncertain utility outcome if it isselected later in the decision sequence, for if a novel outcome was found to have utility greater than thoseoutcomes already experienced, its continued selection would provide additional utility in each subsequentepoch within the planning horizon. For the decision tree in Figure 2, this means that the following pathscan all be ignored:

• Take known ! Take known ! Try new1

• Take known ! Try new1 ! · · ·• Try new1 ! Take known ! Try new2

• Try new1 ! Repeat new1 ! Try new2

9

Figure 2: Decision tree for sequential selection problem with gray shadowing indicating a range of possible outcomes for thepreceding choice node

Two additional paths can also be ignored due to the irrationality in their selection:

• Try new1 ! Take known ! Repeat new1

• Try new1 ! Repeat new1 ! Take known

Figure 3 displays the reduced decision tree which only includes paths that are not a priori irrational.The solution to this decision tree is summarised in Table 1, which considers all possibilities for the availableknown fruit, with the actual known fruit available being indicated by the first column. The second columnlists the lower bound of the expected utility for the optimal decision strategy under the EP choice rule, whilstthe third column lists the upper bound of the expected utility for that decision strategy. The fourth andfifth columns list the lower and upper bounds for the expected utility of the optimal decision strategy underthe EO choice rule, respectively, whilst finally columns six and seven indicate whether or not the optimaldecision strategy under either the EP or EO choice rules includes the initial selection of an unfamiliar option.Note that when it is found optimal to not select a novel option, the lower and upper expected utility boundsfor the choice rule under consideration will coincide, and this is because in each period the DM will selectthe experienced outcome that has known utility.

10

Figure 3: Reduced Decision tree for the selection problem with irrational paths omitted

Bounds on Expected Utility Sum for Optimal Decision Strategy over 3 Periods

Pessimistic Optimistic Select a Novel OptionAvailable Lower Upper Lower Upper Pessimistic Optimistic

f1 1.298 1.817 1.298 1.817 Yes Yesf2 1.305 1.819 1.305 1.819 Yes Yesf3 1.323 1.785 1.319 1.826 Yes Yesf4 1.500 1.500 1.367 1.855 No Yesf5 2.100 2.100 2.100 2.100 No No

Table 1: Expected utility bounds for sequential selection problem and optimal selection strategy

At this point a note should be made concerning how the upper bound was found for the EP choice rule,and similarly, how the lower bound was found for the EO choice rule. Whilst the EP (EO) choice rule isspecifically designed to find the decision strategy that has maximum lower (upper) expected utility bound,it does not take into consideration the upper (lower) expected utility bound of that decision strategy. Todetermine the upper (lower) bound for the EP (EO) choice rule, the decision strategy suggested remainsfixed, but we consider the resulting utility from the selection of a novel outcome to be the supremum(infimum) of the utility interval Ij that it is assumed to fall in.

In addition, there are occasions when equality exists between the utility value for the known option andeither the lower or upper utility bound for a novel option, yet the actual choice that is made will aÆect theupper expected utility bound under the EP choice rule and the lower expected utility bound for the EOchoice rule. To account for such problems an additional rule was included stating that an uncertain utilityoption is selected in cases of indiÆerence under the EP decision strategy, whilst a known utility option isselected in cases of indiÆerence for the EO strategy. This would appear instinctive, as under the EP choicerule the lower expected utility bound for the novel option is calculated on a worst case scenario, and soif indiÆerence only exists in this worst case scenario, the upper bound should be calculated by assumingselection of the novel outcome. Similarly, the upper bound used in the EO choice rule is based on a bestcase situation, and so it would appear most appropriate to select a known option in the case of indiÆerence.

11

As is to be expected, the results given in Table 1 show that the lower bound of the expected utilitycorresponding to the EO choice rule is never greater than the lower bound for the EP choice rule, whilstthe upper bound for the EP choice rule is never greater than the upper bound for the EO choice rule.Indeed, there can be no other choice rule that will generate a larger lower expected utility bound than thatresulting from the EP choice rule, nor can there exist an alternative choice rule that will generate a largerupper expected utility bound than that resulting from the EO choice rule. However, when the consideredknown available fruit was f1, f2 or f5, both EP and EO resulted in the same expected utility bounds. Forf1 and f2 this occurs because the available known option has a utility value that is small enough to makeexperimentation the preferred strategy under both EP and EO, and so both choice rules give rise to thesame optimal decision strategy. The EO choice rule is of course more readily disposed to experimentationin any case, but even the EP choice rule is supportive of experimenting when the alternatives are f1 or f2.

In the case of available alternative f5, both the EP and EO strategies again coincide, but now bysuggesting experimentation be avoided and that the DM should instead repeatedly select the availableknown option. Furthermore, as there is no uncertainty in the utility of option f5, both the lower and upperexpected utility bounds for the EP and EO choice rules also coincide. However, under the EO choice rule,the expected value of information gained following the initial experimentation of an unfamiliar outcomewill increase as the length of the planning horizon is also increased, and this is because if the unfamiliaroutcome were found to have greater utility, then it would be re-selected in every future epoch, see forexample [31] and [26]. Hence, if the planning horizon of the strategy was extended beyond three decisionepochs, then there would exist a horizon for which the EO choice rule would suggest initial experimentation.In contrast though, the EP choice rule will never suggest experimentation against the selection of the bestknown outcome regardless of the length of the planning horizon, and that is because even if the unfamiliaroutcome is found to be better, the EP choice rule will assume that its utility will equal the infimum of theinterval [u(n), 1).

The EP and EO choice rules do disagree, however, when either fruit f3 or f4 is considered as the avail-able known option. In the case of f4 this is because, under a planning horizon of three periods, the EPchoice rule suggests avoiding experimentation, whilst this duration is indeed considered su±cient for initialexperimentation under the trial seeking EO choice rule. Finally, both EP and EO suggest initial experi-mentation against an initial selection of f3, but diÆerences arise in the expected utility bounds because theEP and EO choice rules suggest alternate strategies depending on the resulting observation from the initialexperimentation, i.e., the EO choice rule would suggest continued experimentation for some observationsthat would lead to the EP choice rule reverting to known outcomes.

6. Concluding Remarks

There has been limited discussion on the idea that preferences over decision outcomes may be uncertain,even though such scenarios have empirical support in life, e.g., when a new brand becomes available forpurchase. Here we introduced the possibility of using NPUI as a normative decision theory for such situations,and have discussed its virtues for when a DM faces a decision problem that includes the possibility ofobtaining novel or unfamiliar outcomes. The inclusion of the NPI statistical technique appears natural forsuch situations, and is a simple inferential procedure that avoids making any unjustifiable distributional orconditional independence assumptions.

The decision selection criteria of EP and EO oÆer bounds on the expected utility resulting from a decisionstrategy, and were specifically designed for use within a sequential problem. Either could be employed todetermine a specific strategy that is selected before the first decision epoch, and both have their merits ineither maximizing the minimum expected utility stream, or maximizing the maximum possible expectedutility stream. In selecting one over the alternative, a DM will indicate a level of trial aversion, with EPindicating extreme trial aversion and a preference for the known, whilst EO indicates extreme trial seekingbehaviour and a preference for the unknown. The theory of trial aversion was introduced by [26] and is ananalogous, but orthogonal, concept to the usual risk aversion of [30] and [2], but only arising in theories ofdecision making under uncertain preferences.

12

A further area that could be addressed is the possibility of there being two (or more) distinct setsof outcome types, with exchangeability of outcome utility only being assumed applicable within outcometype. Following [8], given observation sets u(1), u(2), . . . , u(n) and v(1), v(2), . . . , v(m), the lower and upperprobabilities of the event that vm+1 ∏ un+1, are as follows:

P (vm+1 ∏ un+1) =P

i,j I{v(i)∏u(j)}

(m + 1)(n + 1)(12)

P (vm+1 ∏ un+1) =n + m + 1 +

Pi,j I{v(i)∏u(j)}

(m + 1)(n + 1)(13)

In terms of utility inference, this extension of NPI is relevant when there is one group of outcomes that areconsidered to have exchangeable utility values (say fruits) and another distinct group of outcomes that alsohave their own exchangeable utility values (say vegetables), and the decision is a selection from amongst theunion of the two. Unfortunately, it would appear to be first necessary to determine how diÆering outcomeclasses can be placed on a common utility scaling, for the suggestion used here of a ‘hypothetical’ best andworst outcome from within a single exchangeable class is no longer su±cient. The obvious extension wouldbe to elicit relative preference between a hypothetical best outcome from one exchangeable class and thehypothetical best outcome from a second (or third etc.) exchangeable class, and this returns to the situationof commensurability discussed in [26], yet such an assumption would be di±cult to implement in practice.

Finally, we believe there is scope for future work by consideration of the methodology presented to areal life innovative application, and one possibility for this is the problem of deciding how long to testsoftware before its release, e.g., [29]. Here there is a clear motivation that, as software bugs and theirrelative impacts are discovered sequentially in the testing process, the expected bounds on the loss fromany remaining undiscovered bugs (due perhaps to customer dissatisfaction resulting from faulty software)will evolve. There is then potential for developing a stopping criterion, based on the inferential frameworkdeveloped here, that seeks to find an optimal trade-oÆ between maximising reliability of the software andbetween reducing lost market opportunity or market share due to delayed release.

Acknowledgments

The authors wish to express their grateful thanks to the supportive comments and suggestions provided byYakov Ben-Haim, and for the feedback from three anonymous referees whose comments have been beneficialin aiding presentation. B. Houlding is funded by the STATICA project, a Principle Investigator program ofScience Foundation Ireland, Grant Number 08/IN.1/I1879.

References

[1] Abellan, J., Baker, R.M., & Coolen, F.P.A., (2011). Maximising Entropy on the Nonparametric Predictive Inference Modelfor Multinomial Data. European Journal of Operational Research, 212, 112-122.

[2] Arrow, K.J., (1971). Essays in the Theory of Risk-Bearing. Markham, Chicago.[3] Augustin, T., & Coolen, F.P.A., (2004). Nonparametric Predictive Inference and Interval Probability. Journal of Statistical

Planning and Inference, 124, 251-272.[4] Ben-Haim, Y., Dasco, C.C., Carrasco, J., & Rajan, N., (2009). Heterogeneous Uncertainties in Cholesterol Management.

International Journal of Approximate Reasoning, 50, 1046-1065.[5] Berger, J.O., (1993). Statistical Decision Theory and Bayesian Analysis. Springer.[6] Bernoulli, D., (1738). Exposition of a New Theory on the Measurement of Risk (Translation). Econometrica, 22, 22-36.[7] Boutilier, C., (2003). On the Foundations of Expected Expected Utility. 18th International Joint Conference on AI, 712-717.[8] Coolen, F.P.A., (1996). Comparing Two Populations Based on Low Stochastic Structure Assumptions. Statistics & Proba-

bility Letters, 29, 297-305.[9] Coolen, F.P.A., & Coolen-Schrijner, P. (2003). A Nonparametric Predictive Method for Queues. European Journal of

Operational Research, 145, 425-442.[10] Coolen, F.P.A., (2006). On Nonparametric Predictive Inference and Objective Bayesianism. Journal of Logic, Language

and Information, 15, 21-47.[11] Coolen, F.P.A., Houlding, B., & Parkinson, G., (2011). Considerations on Jury Size and Composition using Lower Prob-

abilities. Journal of Statistical Planning and Inference, 141, 382-391.

13

[12] Coolen-Schrijner, P., Coolen, F.P.A., & Shaw, S.C., (2006). Nonparametric Adaptive Opportunity-Based Age ReplacementStrategies. Journal of the Operational Research Society, 57, 63-81.

[13] Cyert, R.M., & DeGroot, M.H., (1975). Adaptive Utility. In Day, R.H., Groves, T., (Eds.), Adaptive Economic Models(pp. 223-246). Academic Press, New York.

[14] de Finetti, B., (1974). Theory of Probability (volume 1). John Wiley and Sons.[15] DeGroot, M.H., (1970). Optimal Statistical Decisions. McGraw-Hill.[16] Ellsberg, D., (1961). Risk, Ambiguity, and the Savage Axioms. The Quarterly Journal of Economics, 75, 643-669.[17] Epstein, L.G., (1999). A Definition of Uncertainty Aversion. Review of Economic Studies, 66, 579-608.[18] Farrow, M., & Goldstein, M., (2006). Trade-OÆ Sensitive Experimental Design: A Multicriterion, Decision Theoretic,

Bayes Linear Approach. Journal of Statistical Planning and Inference, 136, 498-526.[19] Ghiradato, P., Maccheroni, F., & Marinacci, M., (2004). DiÆerentiating Ambiguity and Ambiguity Attitude. Journal of

Economic Theory, 118, 133-173.[20] Halevy, Y., (2007). Ellsberg Revisited: An Experimental Study. Econometrica, 75, 503-536.[21] Hill, B.M., (1968). Posterior Distribution of Percentiles: Bayes’ Theorem for Sampling from a Population. Journal of the

American Statistical Association, 63, 677-691.[22] Hill, B.M., (1988). De Finetti’s Theorem, Induction, and A(n) or Bayesian Nonparametric Predictive Inference. In

Bernardo, J.M. et al. (Eds.), Bayesian Statistics 3 (pp. 211-241). Oxford University Press.[23] Hill, B.M., (1993). Parametric Models for A(n): Splitting Processes and Mixtures. Journal of the Royal Statistical Society

B, 55, 423-433.[24] Houlding, B., & Coolen, F.P.A., (2007). Sequential Adaptive Utility Decision Making for System Failure Correction.

Journal of Risk and Reliability, Proceedings of the IMech E Part O, 221, 285-295.[25] Houlding, B., (2008). Sequential Decision Making with Adaptive Utility. PhD Thesis, Department of Mathematical Sci-

ences, Durham University.[26] Houlding, B., & Coolen, F.P.A., (2011). Adaptive Utility and Trial Aversion. Journal of Statistical Planning and Inference,

141, 734-747.[27] KlibanoÆ, P., Marinacci, M., & Mukerji, S., (2005). A Smooth Model of Decision Making under Ambiguity. Econometrica,

73, 1849-1892.[28] Kmietowicz, Z.W., & Pearman, A.D., (1981). Decision Theory and Incomplete Knowledge. Aldershot Gower.[29] McDaid, K., & Wilson, S.P., (2001). Deciding How Long to Test Software. Journal of the Royal Statistical Society: Series

D (The Statistician), 50, 117-134.[30] Pratt, J.W., (1964). Risk Aversion in the Small and in the Large. Econometrica, 32, 122-136.[31] RaiÆa, H., & Schlaifer, R., (1961). Applied Statistical Decision Theory. Harvard University, Boston.[32] Roelofs, V.J., Coolen, F.P.A., & Hart, A.D.M., (2011). Nonparametric Predictive Inference for Exposure Assessment. Risk

Analysis, 31, 218-227.[33] Savage, L.J., (1954). The Foundations of Statistics. John Wiley and Sons.[34] Schmeidler, D., (1989). Subjective Probability and Expected Utility without Additivity. Econometrica, 57, 571-587.[35] Wald, A., (1950). Statistical Decision Functions. John Wiley & Sons.[36] Walley, P., (1991). Statistical Reasoning with Imprecise Probabilities. Chapman and Hall.

14

A normative decision theory for when preferences are uncertain. Development/Application of low-structure `data driven' inferential belief model. Suggestive (bounds) for choice rule in a sequential problem with uncertain preferences.

Nonparametric Predictive Utility Inference

Documents

Transcript of Nonparametric Predictive Utility Inference