The Reasonableness of Possibility From the Perspective of Cox

Computational Intelligence, Volume 17, Number 1, 2001

Paul Snow

The possibility calculus is shown to be a reasonable belief representation in Cox’s sense, even though possibility is formally different from probability. So-called linear possibility measures satisfy the equations that appear in Cox’s theorem. Linear possibilities are known to be related to the full range of possibility measures through a method for representing belief based on sets that is similar to a technique pioneered by Cox in the probabilistic domain. Exploring the relationship between possibility and Cox’s belief measures provides an opportunity to discuss some of the ways in which Cox dissented from Bayesian orthodoxy, especially his tolerance of partially ordered belief and his rejection of prior probabilities for inference which begins in ignorance.

Key words: Cox’s theorem, non-Bayesian probability, defaults, possibility.

1. INTRODUCTION

Richard Threlkeld Cox’s motivation for the use of probability in belief models is well known in the artificial intelligence (AI) community. See, for instance, Cheeseman (1988) and the many commentaries published with it. Although Cox has long been described as an orthodox Bayesian writer (Jaynes 1963), Cox’s own thinking on probability and belief modeling was highly nuanced and heterodox. His chief acknowledged influence was John Maynard Keynes (1921), who wrote outside the Bayes-Laplace tradition. Keynes was skeptical about whether numerical probabilities had any role in belief modeling except in some highly circumscribed circumstances. In his view, only ordinal comparisons between some beliefs were possible, and for other beliefs, even ordinal comparison was impossible.

Cox’s compact body of probabilistic writing, only about 175 pages published over three decades (1946, 1961, 1978), can be read as an intergenerational conversation with Keynes. Cox demonstrated in 1946 that, contrary to Keynes’ conjecture, numerical degrees of belief could be used to derive the rules of probability from assumptions about reasonable features of beliefs. Cox’s 1961 book discussed how sets of probability distributions exhibited the partial orderings desired by Keynes. In 1978, Cox showed that an abstract variable measure of belief, closer to Keynes’ original nonnumeric proposal, also could be used as the starting point in deriving the same probability rules.

The immediate occasion of Cox’s writing on probability in 1946 was a trio of articles that had appeared in The American Journal of Physics (Bergmann 1941; Kemble 1942; Margenau 1942). These authors discussed whether there was any role for probability in the natural sciences except as a measure of relative frequencies in some ensemble of possible events.

Bergmann was firmly frequentist. He rejected the gambling behavior arguments of Ramsey (1926) and was particularly dismissive of Keynes, finding the economist’s theory “irreconcilable with the most basic ideas of ... modern logic.” Kemble agreed that a frequentist interpretation of probability seemed appropriate for the hard sciences, but he found that interpretation incoherent for some applications in statistical mechanics. Margenau suggested that subjective probability could be useful if it was properly distinguished from frequentist probability. To this end, he suggested that the term probability be reserved for the relative frequency measure and that the term likelihood be used for the subjective calculus.

Correspondence to Paul Snow, P.O. Box 6134, Concord NH 03303-6134 USA. e-mail: [email protected].

© 2001 Blackwell Publishers, 350 Main Street, Malden, MA 02148, USA, and 108 Cowley Road, Oxford, OX4 1JF, UK.

Cox’s 1946 article continues this debate. As if to answer Bergmann’s remarks about Keynes, Cox derives the rules of probability with an emphasis on constraints derived from basic ideas of Boolean logic. Cox also avoids any use of the notion of frequency in an ensemble, not even the indirect reference implicit in gambling arguments. This allows him to resolve cleanly the dilemma pointed out by Kemble. Cox even adopts Margenau’s awkward term likelihood for a generic degree of belief. Of course, Cox then goes on to argue that probabilities based on degrees of belief are as worthy of use in physical science as those based on relative frequencies.

In such a resolutely probabilistic context, it would be easy to overlook that there are likelihoods that are reasonable according to Cox’s criteria which obey nonprobabilistic rules of disjunctive combination. An interesting example is the formally nonprobabilistic belief calculus introduced by Lotfi Zadeh (1978) called possibility.

The axioms of probability that Cox derived are satisfied by a specific kind of possibility measure. Cox’s ideas about using sets of belief measures to represent partially ordered beliefs recover the full range of possibility measures. A specific interpretation of possibility measures suggests a mechanism of belief change other than Bayesian conditionalization. Cox’s endorsement of conditionalization was highly restricted and presents no impediment to this account of belief change. In particular, Cox did not subscribe to a probabilistic representation of complete ignorance, and he offered no theory of how the viewing of evidence would overcome initial ignorance.

2. COX’S PROBABILITY AXIOMS AND THEIR DERIVATION

Cox’s subject was the belief in one proposition on the assumption that some other proposition is true. His own notation for this, following Keynes, is terse:

b|a

and may lead to confusion when Cox considers functions like

f(b|a)

which appears to be a function of an ordered pair of propositions but is actually a function of the scalar quantity b|a. It does not help that Cox then sometimes uses the original b|a symbology for f(b|a) on the grounds that f(b|a) is the same kind of object as b|a, a measure of conditional belief. In hopes of sorting this out, I shall adopt the notation

q(b|a)

for Cox’s b|a, and when a function of this measure is discussed, I shall write

f(q(b|a))

Cox’s three works differ about just what q(b|a) is supposed to denote. In 1946, q(·) was introduced as an assignment of a specific scalar to each element in a set of ordered pairs of propositions. Its meaning was not more specific than “some measure of the reasonable credibility of the proposition b when a is known to be true.” The point of the 1946 version of the famous theorem was to show that under modest assumptions, there would exist some function f(·) where f(q(·)) would obey the rules of probability. The arguments of 1946 also permit inferences about what, besides probability and functions of it, might be reasonable q(·) in Cox’s view.

Nonprobabilistic belief measures do not come up in either of the later works. In 1961, q(·) was introduced as some function of a probability. Cox then argues that a notion of probability that represents beliefs is “harmonious” with the kinds of probabilities favored by “other schools.” This agenda is explained in the first chapter. In 1978, q(·) was again a function of a belief-modeling probability from the beginning.

Consequently, I shall follow the conventions of 1946 in this section. Assuming that the beliefs to be modeled have been captured by some scalar valued q(·), Cox requires that there be a relationship among the three belief measures q(cb|a), q(c|ba), and q(b|a). That is, he seeks some function F(·) where

q(cb|a) = F(q(c|ba), q(b|a))    (1)

Cox argues that F(·) should be an associative function. His concern was that some conjunctions might be subdivided in more than one way, e.g.,

q(dcb|a) = F(q(dc|ba), q(b|a)) = F(F(q(d|cba), q(c|ba)), q(b|a))
q(dcb|a) = F(q(d|cba), q(cb|a)) = F(q(d|cba), F(q(c|ba), q(b|a)))

and both routes must yield the same value.

The functional equation that expresses associativity, i.e.,

F(F(x, y), z) = F(x, F(y, z))    (2)

is assumed to hold for all real values x, y, and z. Although there has been a controversy recently about whether Cox disclosed the requirement that Equation (2) hold for all x, y, and z, the author of that controversy has exonerated Cox on this point (Halpern 1999).

Equation (2) implies that

f(F(F(x, y), z)) = f(F(x, F(y, z)))    (2′)

where f(·) is any scalar function. A particular solution for Equation (2′) is

f(F(x, y)) = f(x) ∗ f(y)    (3)

for any f(·) that displays the multiplicative relationship in Equation (3).

There has been much discussion by Cox (e.g., note 10 of 1978) and others (e.g., Aczel, 1966) about the conditions needed for a multiplicative f(·) to exist and for Equation (3) to be a general solution of Equation (2′) subject to the constraint of Equation (2). It is easily verified that Equation (3) is a particular solution of Equation (2′) for any f(·) that does satisfy Equation (3) and, of course, Equation (2) is satisfied by any associative F(·).

Since this article is concerned principally with a specific belief representation, the questions of generality are beside the point. I shall simply choose a suitable f(·) and verify that Equation (2) holds for the F(·) in question when the time comes. It is sufficient for my purposes that the chosen functions are a particular solution of Cox’s expression of conjunctive reasonableness. The solution (Equation 3) was satisfactory to Cox, without restrictions on f(·) except that it be a function of one variable (1946, p. 6).

Rewriting Equation (3) by substitution based on Equation (1), we have

f(q(cb|a)) = f(q(c|ba)) ∗ f(q(b|a))    (4)

Cox notes that if c is a tautology T, Equation (4) implies that

f(q(T|a)) = 1

Cox goes on to argue that it is also reasonable to have a function that relates degrees of belief in a conditional proposition and its negation, i.e., some function S(·) where

f(q(b|a)) = S(f(q(¬b|a)))

Since double negation in Boolean logic returns the original proposition, S(·) also should recover the starting value when applied twice:

f(q(b|a)) = S(S(f(q(b|a))))

This requirement, when combined with Equation (4) and some further Boolean argumentation, yields a functional equation whose general solution is

f(q(b|a))^m + f(q(¬b|a))^m = 1

where m is an arbitrary constant; i.e., any positive value of m solves the functional equation constraints. Note that Equation (4) is unaffected by raising both sides to a power:

f(q(cb|a))^k = f(q(c|ba))^k ∗ f(q(b|a))^k

There is therefore no loss of generality in taking k = m. Since f(·) is an unspecified function, there is also no loss of generality in choosing m = 1, yielding the negation rule:

f(q(b|a)) + f(q(¬b|a)) = 1    (5)

To derive the usual additive probabilistic rule for disjunctions,

f(q(c ∨ b|a)) = f(q(c|a)) + f(q(b|a)) − f(q(cb|a))    (6)

Cox notes that Equation (4) implies

f(q(bc|a)) + f(q(b¬c|a)) = f(q(b|a)) ∗ {f(q(c|ba)) + f(q(¬c|ba))}

The factor in braces is equal to one by Equation (5), so

f(q(bc|a)) + f(q(b¬c|a)) = f(q(b|a))

This equation is then applied twice to the measure for ¬(c ∨ b), which sentence is equivalent to ¬c ∧ ¬b by De Morgan’s rule:

f(q(¬c¬b|a)) = f(q(¬b|a)) − f(q(c¬b|a))
             = f(q(¬b|a)) − f(q(c|a)) + f(q(cb|a))

Equation (6) is obtained by subtracting both ends of this chain of equations from one and simplifying using Equation (5).
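The chain of equalities above can be spot-checked numerically. The following is only a minimal sketch: it lets an ordinary probability over four hypothetical "worlds" stand in for the composition f(q(·)), and the weights, the names `WEIGHTS`, `events`, `p`, and `check_rules` are all assumptions introduced for illustration, not anything taken from Cox.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical weights on four mutually exclusive "worlds" (an assumption of this sketch).
WEIGHTS = {"w1": Fraction(1, 2), "w2": Fraction(1, 4),
           "w3": Fraction(1, 8), "w4": Fraction(1, 8)}
WORLDS = frozenset(WEIGHTS)

def events():
    """Every proposition, represented as the set of worlds in which it is true."""
    names = sorted(WORLDS)
    return [frozenset(c) for r in range(len(names) + 1)
            for c in combinations(names, r)]

def p(x, given=WORLDS):
    """Stand-in for f(q(x|given)); `given` must have positive weight."""
    total = sum(WEIGHTS[w] for w in given)
    return sum(WEIGHTS[w] for w in x & given) / total

def check_rules():
    """Return True if Equations (4), (5), and (6) hold for every pair of events."""
    for b in events():
        not_b = WORLDS - b
        assert p(b) + p(not_b) == 1                        # Equation (5)
        for c in events():
            if b:                                          # skip the empty hypothesis
                assert p(c & b) == p(c, given=b) * p(b)    # Equation (4)
            assert p(b & c) + p(b - c) == p(b)             # step used in the derivation
            # De Morgan chain: measure of (not c and not b), computed two ways
            assert p(not_b - c) == p(not_b) - p(c) + p(c & b)
            assert p(c | b) == p(c) + p(b) - p(c & b)      # Equation (6)
    return True
```

Calling `check_rules()` walks all 16 events and confirms each identity with exact rational arithmetic, so no floating-point tolerance is involved.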


Cox’s mathematics does not show that the original scalar degrees of belief q(·) must obey the laws of probability if his premises about reasonableness are accepted. It is sufficient for a scalar belief representation q(·) to be reasonable in Cox’s sense if the assignment respects Equation (2) and there exists any f(·) that, when applied to the q(·) values, satisfies Equations (4), (5), and (6).

Cox’s concern was to show that the rules of probability could be as legitimately applied to belief as to the relative frequency of events in an ensemble. If those rules are applicable to some function of reasonable degrees of belief, then his goal is achieved. What rules, if any, the original q(·) might follow beyond transitive functional conjunction seems not to have interested Cox, except for a brief discussion of the specific case about which Kemble had raised a question. This lack of interest is unsurprising because the now popular nonprobabilistic calculi were either less well known or not yet invented during the working lifetime of Cox, who was born in 1898.

3. ZERO MEASURE CONDITIONING IN COX’S PROBABILITY

Equation (4) can be compared with the usual Łukasiewicz-Kolmogorov rule that it implies, replacing the composition f(q(·)) with p(·), i.e.,

p(c|ba) = p(cb|a) / p(b|a)    (7)

when p(b|a) > 0. The difference, of course, is just that restriction. Equation (4) is well defined if f(q(b|a)) = 0, whereas the usual rule (Equation 7) is undefined in that case.

Equation (7) and its exception can be interpreted in two different ways. One interpretation is that the conditional probability p(c|ba) should be undefined when p(b|a) = 0. Łukasiewicz (1913) meant his definition to be taken in this sense. His domain of probabilistic application was an abstract version of inference about proportions in a mixture. The conditional p(c|ba) would be the answer to a question such as “In population a, what proportion of the bs are also cs?” If there are no bs in a, then this question has no sensible answer. Kolmogorov (1933) adopts the same convention, but with little or no motivation.

The other reading is that Equation (7) states a constraint and that the value of p(c|ba) is unconstrained by the other probabilities when p(b|a) = 0. The conditional could then take a value of the believer’s choice. Cox’s Equation (4) says just this, since if p(b|a) = 0, then p(cb|a) = 0 as well. By the ordinary rules of probability,

0 ≤ f(q(cb|a)) ≤ f(q(b|a)) = 0

because cb ⇒ b, and the “constraint” of Equation (7) is the vacuous

0 ∗ p(c|ba) = 0
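The contrast between the two readings can be put in a few lines of code. This is a small sketch only; the function names `satisfies_product_rule` and `ratio_definition`, and the list of candidate values, are hypothetical choices made for illustration.

```python
from fractions import Fraction

def satisfies_product_rule(p_cb, p_c_given_b, p_b):
    """Equation (4) read as a constraint on three given numbers."""
    return p_cb == p_c_given_b * p_b

def ratio_definition(p_cb, p_b):
    """Equation (7) read as a definition; returns None where it is undefined."""
    return None if p_b == 0 else p_cb / p_b

# With p(b|a) = 0 (and hence p(cb|a) = 0), every candidate value passes Equation (4).
candidates = [Fraction(k, 4) for k in range(5)]
unconstrained = all(satisfies_product_rule(0, v, 0) for v in candidates)
```

Here `unconstrained` is `True` while `ratio_definition(0, 0)` is `None`: the constraint reading leaves the conditional free, whereas the ratio reading has nothing to say. When p(b|a) is positive the two readings agree, e.g. `ratio_definition(Fraction(1, 4), Fraction(1, 2))` is the only value that satisfies the product rule.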

Cox considered some conditioning on zero-measure hypotheses to be sensible. Briefly in 1961 and at length in 1978, Cox distinguished between a conditioning hypothesis that is a contradiction and a noncontradictory hypothesis that happens to be false. Contradictions are excluded as hypotheses, but “it is permissible logically and often worth while to consider the probability of an inference on an hypothesis which is contrary to fact in one respect or another” (1961, p. 17).


To put the thought in something closer to a Keynesian formulation, the probability p(c|ba) would measure what b ∧ a tells the believer about c. If b ∧ a is a contradiction, then it presumably tells the believer very little about anything. If, on the other hand, the believer happens not to know of any bs that are in a, but there is any sense in which there could be, then thinking about how belief in c might be affected by knowledge about b could be useful. The circumstance that p(b|a) is zero, which moots questions about proportions within b when p(·) is a physical proportion, fails to distinguish between useful and silly hypotheses for reflection about beliefs.

It is interesting to note that Bruno de Finetti (1974) arrives at the more permissive Equation (4) using arguments based on his own idealized model of admissible gambling behavior. Both de Finetti and his contemporary followers (e.g., Coletti and Scozzafava, 1997) find Equation (4), and not something stronger, congenial for the resolution of some foundational questions in mathematical probability, such as the feasibility of countable additivity and the crafting of a robust definition of conditional independence.

For Cox, the choice between the two interpretations of Equation (7) reflects a principled difference between the requirements of a model of proportions in a mixture and those of a model of beliefs. While Cox acknowledged the usefulness of mixture analogies in probabilistic reasoning (1961, p. 90), he was obviously looking for something else as a basis for belief modeling. The equation that Cox developed has solutions for zero values of the conditioning event. He was aware that this was so, and he declined to add any other assumptions to achieve agreement with Łukasiewicz and Kolmogorov, except to exclude contradictory conditioning sentences.

4. A SIMPLE SOLUTION TO COX’S EQUATIONS

Another mode of reasoning about beliefs that is not directly analogous to proportions in a mixture is inference about defaults or the identification of best guesses as to the true state of affairs. Cox briefly contemplated this kind of reasoning with approval (1961, p. 34):

[W]e do not, in reasonable discourse, dispense with the rules of probability, although we may use them so familiarly as to be unaware of them. When we employ probable inference as a guide to reasonable decisions, it is by these rules that we judge that ... some inference is so nearly certain that we can take it for granted or some contingency so nearly impossible that we can leave it out of our calculation.

Suppose we have some background information a and a finite set of mutually exclusive and collectively exhaustive propositions B = {b1, b2, b3}. Note that a is outside of B. We choose three alternatives in B for simplicity, but all points made will hold for any finite plural domain.

One solution of Equations (4) through (6) that takes advantage of zero-measure conditioning allowed by Equation (4) is

f(q(b1|a)) = 1    f(q(b2|a)) = f(q(b3|a)) = 0

f(q(b2|a ∧ ¬b1)) = 1    f(q(b1|a ∧ ¬b1)) = f(q(b3|a ∧ ¬b1)) = 0

f(q(b3|a ∧ b3)) = 1    f(q(b1|a ∧ b3)) = f(q(b2|a ∧ b3)) = 0

In words, our best guess when b1 is possible is b1, indicated by assigning to it and all disjunctions that include it the value of unity. If we learn that b1 is false, then b2 is our best guess. Proposition b3 is our best guess only when there is no unfalsified alternative.


For any particular conditioning hypothesis, f(q(·)) resembles a Boolean distribution over the elementary propositions and their disjunctions, where exactly one elementary proposition has a value of 1, and all others have values of 0. As the conditioning hypothesis varies, different quasi-Boolean distributions are adopted. The value of zero, of course, does not necessarily correspond to falsehood but is overloaded with the identification of options that are neither false nor the default best guess on the current information.

It is easy to verify that a conditional Boolean distribution obeys the constraint of Equation (4), i.e.,

f(q(cb|a)) = f(q(c|ba)) ∗ f(q(b|a))

where b and c are disjunctions formed from the sentences in B, and b is not empty. If f(q(b|a)) = 0, then f(q(cb|a)) = 0 as well, as discussed in the preceding section, which leaves f(q(c|ba)) unconstrained. When f(q(b|a)) = 1, the values of the other two measures that appear in Equation (4), f(q(cb|a)) and f(q(c|ba)), are both determined by whether or not the “top” proposition in b is also in cb. The two values are always the same; i.e., either both values are one or else both values are zero.
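The verification described above can also be carried out exhaustively on the three-element domain. The sketch below is one possible encoding, not the article’s own: disjunctions are modeled as sets of atoms, the conditioning hypothesis a ∧ d is represented by the set d of atoms it leaves open, and the names `ORDER`, `top`, `fq`, and `check_conditional_boolean` are hypothetical.

```python
from itertools import combinations

ORDER = ("b1", "b2", "b3")      # default ranking: b1 first, then b2, then b3
ATOMS = frozenset(ORDER)

def disjunctions():
    """All disjunctions of atoms, including the empty one, as frozensets."""
    return [frozenset(c) for r in range(len(ORDER) + 1)
            for c in combinations(ORDER, r)]

def top(d):
    """The best guess among the atoms left open by the nonempty hypothesis d."""
    return next(atom for atom in ORDER if atom in d)

def fq(x, d=ATOMS):
    """Stand-in for f(q(x|a ∧ d)): 1 if x contains the best guess under d, else 0."""
    return 1 if top(d) in x else 0

def check_conditional_boolean():
    for d in disjunctions():
        if not d:
            continue                                   # contradictions are excluded
        for b in disjunctions():
            assert fq(b, d) + fq(ATOMS - b, d) == 1    # Equation (5)
            for c in disjunctions():
                assert fq(c | b, d) == fq(c, d) + fq(b, d) - fq(c & b, d)   # Equation (6)
                if b & d:
                    assert fq(c & b, d) == fq(c, b & d) * fq(b, d)          # Equation (4)
                else:
                    # zero-measure hypothesis: both sides vanish, so the
                    # conditional value is left free by Equation (4)
                    assert fq(c & b, d) == 0 and fq(b, d) == 0
    return True
```

`check_conditional_boolean()` returns `True`, and the three defining rows of the solution are recovered directly: `fq({"b1"})`, `fq({"b2"}, ATOMS - {"b1"})`, and `fq({"b3"}, {"b3"})` each evaluate to 1.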

5. LINEAR POSSIBILITY

In this section we examine a decomposition of the conditional Boolean f(q(·)) in order to identify a particular kind of possibility measure as a valid instance of q(·) and therefore as a reasonable measure of belief in Cox’s sense.

Suppose that f(·) were an indicator function on [0, 1] where

f(·): [0, 1] → {0, 1}

f(1) = 1

f(x) = 0    x < 1

Among the choices for q(·) that would achieve our simple model of default reasoning would be, for any nonempty disjunction d formed from the set {b1, b2, b3},

q(b1|a) = 1    q(b2|a) = 1/2    q(b3|a) = 1/4    (8)

q(bi|a ∧ d) = 0          if d ∧ bi = ∅                        (9a)
            = 1          if q(bi|a) = q(d|a) when bi ⇒ d      (9b)
            = q(bi|a)    otherwise when bi ⇒ d                (9c)

q(d|a) = max {q(b|a) : b ⇒ d}    (10)

Expression (10) is, of course, the general disjunctive combination rule for the possibility calculus. The “conditioning” rule (Equations 9a–c) restates the usual Dubois and Prade (1986) convention for that operation in possibility. The F(·) function that corresponds to this specification is

q(cb|a) = min(q(c|ba), q(b|a))

The min(·) function is associative, and so Equation (2) is satisfied.


It is straightforward to verify that the indicator function f(·) applied to the q(·) described in Equations (8) through (10) does yield a conditional Boolean distribution as described in the preceding section:

f(q(b|ad)) = 1 iff b contains the highest possibility atom in d
           = 0 otherwise
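A short exhaustive check of this claim, and of the min rule for F(·), is sketched below. It is an illustration under stated assumptions: disjunctions are encoded as sets of atoms, conditioning on a ∧ d is represented by the set d, and the helper names `Q_ATOM`, `q`, `f`, and `check_linear_possibility` are introduced here rather than drawn from the article.

```python
from fractions import Fraction
from itertools import combinations

Q_ATOM = {"b1": Fraction(1), "b2": Fraction(1, 2), "b3": Fraction(1, 4)}   # Equation (8)
ATOMS = frozenset(Q_ATOM)

def disjunctions():
    names = sorted(ATOMS)
    return [frozenset(c) for r in range(len(names) + 1)
            for c in combinations(names, r)]

def q(x, d=ATOMS):
    """q(x|a ∧ d) for a disjunction x and a nonempty disjunction d."""
    peak = max(Q_ATOM[b] for b in d)                 # q(d|a) by Equation (10)
    def atom_value(b):
        if b not in d:
            return Fraction(0)                       # Equation (9a)
        if Q_ATOM[b] == peak:
            return Fraction(1)                       # Equation (9b)
        return Q_ATOM[b]                             # Equation (9c)
    return max((atom_value(b) for b in x), default=Fraction(0))   # max rule

def f(value):
    """Indicator on [0, 1]: unity maps to 1, everything below it to 0."""
    return 1 if value == 1 else 0

def check_linear_possibility():
    for d in disjunctions():
        if not d:
            continue
        best = max(d, key=Q_ATOM.get)                # highest possibility atom in d
        for b in disjunctions():
            assert f(q(b, d)) == (1 if best in b else 0)
            for c in disjunctions():
                if b & d:
                    assert q(c & b, d) == min(q(c, b & d), q(b, d))   # the F rule
    return True
```

`check_linear_possibility()` returns `True`. As a spot check of the conditioning rule, `q({"b2"}, {"b2", "b3"})` is 1 while `q({"b3"}, {"b2", "b3"})` stays at 1/4.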

The value assignments in Equation (8) are an example of a “linear” possibility distribution (Benferhat et al. 1997). That is, each elementary proposition conditioned on a has a distinct positive value, and as is conventional with possibility, the largest elementary value is unity. Linear possibility measures are a special case of the popular possibility calculus, a case that Benferhat and coauthors find useful for explaining the semantics of default reasoning.

Any other assignment of values for b2 and b3 in the same order as Equation (8) would serve equally well to capture the default reasoning pattern with the chosen f(·). Note that distinct values are needed to satisfy the Cox requirement that f(·) be a function of the value of its argument, as opposed to a function of the ordered pair of sentences in question or of other information in the problem. For example, if b1 and b2 had the same value, unity, when conditioned on a, i.e.,

q(b1|a) = 1    q(b2|a) = 1

then, conditioning on ¬b3,

f(q(b1|a ∧ ¬b3)) + f(q(b2|a ∧ ¬b3)) = 2

contrary to the Cox criterion of Equation (5). Similar problems would arise for any tie in a general possibility measure.
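The failure of Equation (5) under a tie is easy to reproduce. The sketch below compares a linear assignment with the tied one; the dictionaries `LINEAR` and `TIED` and the helpers `conditional`, `f`, and `negation_sum` are hypothetical names, and the value 1/4 for b3 in the tied case is an assumption made only so that the example is complete.

```python
from fractions import Fraction

def conditional(values, atom, d):
    """q(atom|a ∧ d) under Equations (9a)-(9c); d is a nonempty set of atoms."""
    if atom not in d:
        return Fraction(0)
    peak = max(values[b] for b in d)
    return Fraction(1) if values[atom] == peak else values[atom]

def f(value):
    """Indicator on [0, 1]: 1 at unity, 0 below it."""
    return 1 if value == 1 else 0

def negation_sum(values, atom, d):
    """f(q(atom|a ∧ d)) + f(q(¬atom|a ∧ d)), which Equation (5) requires to be 1."""
    others = [b for b in values if b != atom]
    q_not_atom = max(conditional(values, b, d) for b in others)    # max rule
    return f(conditional(values, atom, d)) + f(q_not_atom)

LINEAR = {"b1": Fraction(1), "b2": Fraction(1, 2), "b3": Fraction(1, 4)}
TIED = {"b1": Fraction(1), "b2": Fraction(1), "b3": Fraction(1, 4)}
NOT_B3 = {"b1", "b2"}           # the atoms left open by ¬b3
```

With these definitions `negation_sum(LINEAR, "b1", NOT_B3)` is 1, as Equation (5) requires, while `negation_sum(TIED, "b1", NOT_B3)` is 2.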

Linear possibility distributions, however, clearly satisfy the sufficient conditions for reasonableness that arise from Cox’s theorem. This is attributable to the reasonableness of an elementary representation of completely ordered default reasoning that is permissible under the Cox conditioning rule (Equation 4).

It is interesting that the conditional Boolean f(q(·)) inherits all the expressive power of linear possibility in the domain of default reasoning. Thus, while it is a simple model of defaults, it can perform preferential entailment and displays rational monotonicity, as linear possibilities do (Benferhat et al. 1997).

6. SETS AND GENERAL POSSIBILITY MEASURES

Like Keynes, Cox did not think that complete ordering was a plausible attribute of a reasonable model of belief. A brief remark in 1946 (p. 9), “It is hardly to be supposed that every reasonable expectation should have a precise numerical value,” records his view, but Cox offers no explanation in 1946 about how nonscalar belief measures are supposed to work.

The subject is addressed in the sixth chapter of Cox’s 1961 book. Cox explains that while some probabilities are well defined, “others are scarcely defined at all except that they are limited, as all probabilities are, by the extremes of certainty and impossibility” (p. 29). As to which situation is typical, Cox believes (p. 33), “Most of the time we are limited ... to approximations or judgments of more or less.”


Cox then goes on to show how one might derive constraints on some probabilities from imprecise knowledge of others. The present-day reader recognizes the technique as an early instance of the “set-based Bayesian” method (Kyburg and Pittarelli, 1996). Belief is represented by a set of probability distributions rather than by a single one. Ordinal and numerical relationships common to all the distributions in the set are the beliefs being represented.

General possibility measures, i.e., those in which tie values are allowed, can be motivated as a brief encoding of sets of the Cox-reasonable default functions for linear possibility measures. For example, the possibility distribution

Π(b1|a) = 1    Π(b2|a) = 1    Π(b3|a) = 1/4    (11)

can be taken to represent a set of two default measures. If q(·) obeys Equations (8) through (10), the two measures are

{f(q(b|a ∧ d)): q(b1|a) = 1; q(b2|a) = 1/2; q(b3|a) = 1/4; b, d ∈ B}    (12)

and

{f(q(b|a ∧ d)): q(b1|a) = 1/2; q(b2|a) = 1; q(b3|a) = 1/4; b, d ∈ B}    (13)

That is, the tie in Equation (11) is broken both possible ways. The basic idea of representing general possibility distributions as sets of linear possibility distributions that break ties in this fashion is already familiar in possibility theory (reviewed in Benferhat et al. 1997).

In the possibility distribution Equation (11),

Π(c|a) > Π(d|a)

for c and d occurs just when

f(q(c|a ∧ (c ∨ d))) = 1 and f(q(d|a ∧ (c ∨ d))) = 0

is unanimous in Equations (12) and (13). Thus Equation (11) encodes what the set containing Equations (12) and (13) asserts about ordinal default beliefs, and from the information in Equation (11), the set can be constructed algorithmically.
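The algorithmic construction and the unanimity test can be sketched as follows. The sketch reads the conditioning hypothesis as the disjunction c ∨ d, which is an assumption of this illustration, and it represents each tie-break by a strict ranking of the atoms rather than by particular numerical values; the names `PI`, `tie_breaks`, `default_value`, `unanimous_preference`, and `check_encoding` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, permutations

PI = {"b1": Fraction(1), "b2": Fraction(1), "b3": Fraction(1, 4)}   # Equation (11)
ATOMS = sorted(PI)

def disjunctions():
    return [frozenset(c) for r in range(len(ATOMS) + 1)
            for c in combinations(ATOMS, r)]

def possibility(x):
    """Possibility of a disjunction by the max rule; 0 for the empty disjunction."""
    return max((PI[b] for b in x), default=Fraction(0))

def tie_breaks():
    """Every strict ranking of the atoms that never contradicts the order in PI."""
    return [order for order in permutations(ATOMS)
            if all(PI[hi] >= PI[lo] for hi, lo in zip(order, order[1:]))]

def default_value(x, d, order):
    """Conditional Boolean value of x given nonempty d under one strict ranking."""
    best = next(atom for atom in order if atom in d)
    return 1 if best in x else 0

def unanimous_preference(c, d):
    """True if every tie-break makes c, and not d, the default given c ∨ d."""
    return all(default_value(c, c | d, order) == 1 and
               default_value(d, c | d, order) == 0
               for order in tie_breaks())

def check_encoding():
    for c in disjunctions():
        for d in disjunctions():
            if c | d:                                  # skip the contradictory hypothesis
                assert (possibility(c) > possibility(d)) == unanimous_preference(c, d)
    return True
```

For Equation (11), `tie_breaks()` yields exactly two rankings, corresponding to Equations (12) and (13), and `check_encoding()` returns `True`. The pair b1, b3 is decided unanimously, while b1, b2 is not, which is how the tie shows up at the level of the set.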

The utility of a compact representation to the default reasoner is obvious: If there were several elementary propositions instead of just three and several tied propositions in the default ordering to be represented, then the size of the indicator set would grow correspondingly. Polling such a set for unanimity is a waste of time when inspection of just one possibility distribution would yield the same ordinal information about the defaults being represented.

The existence of a set composed of individually reasonable value assignments means that the pattern of reasoning captured by their unanimous assertions is itself reasonable within Cox’s criteria. Since some such set can be constructed for every possibility distribution, the entire possibility calculus is an instance of a reasonable belief representation in the Cox sense.


7. COX’S PERMISSIVENESS

The results of the preceding sections could be avoided if Cox’s assumptions were strengthened in some way to forbid linear possibility distributions. One strategy, of course, would be to rule out the conditional Boolean distribution itself by adopting the Łukasiewicz interpretation of Equation (7) and so forbid zero-measure conditioning. This would be at odds with Cox’s conception of what is acceptable as a probabilistic model of belief and what kinds of reasoning are within the scope of probabilistic regulation.

A less drastic approach would be to require more of F(·) than associativity and to restrict f(·) to be a special kind of function. Other authors besides Cox have suggested stronger conditions on F(·) than associativity. For example, Horvitz et al. (1986) assumed that the function also should be continuous and strictly increasing in each place except for values of zero. Aczel (1966, Section 6.2) showed that such conditions on F(·) implied a circumstance from which the existence of some continuous and strictly increasing f(·) can be easily inferred. As Dubois and Prade (1989) have observed, the Horvitz group’s additional requirements exclude possibility measures.

The assumption that F(·) be strictly increasing in each place does square with a probabilist’s idea of a good belief representation (Snow 1995). As the linear possibility example shows, however, it is possible to have a belief representation that does not satisfy the condition, and yet a function of the representation nevertheless has multiplicative conjunctive combination. It is unclear why the believer would not be free to use either the q(·) or the f(q(·)) level of description as one saw fit.

A third strategy to rule out linear possibility would be to require functional negation of the q(·) values rather than just the f(q(·)). After all, Cox was willing to impose functional conditioning (Equation 2) directly on the q(·). If functional negation also were applied to the q(·), i.e.,

q(¬b|a) = N(q(b|a))

then possibility would be eliminated in a stroke.

One difficulty with this approach is that Cox provided little motivation for what functional negation has to do with reasonableness. In 1961 and 1978, where the q(·)s were already functions of probabilities at the time of their first appearance, functional negation simply was assumed to hold, as indeed it does for probabilities, and so no motivation was really needed. In 1946, the only explanation offered is that the proposition ¬b is determined when the proposition b is specified. This may be a reason for saying that q(¬b|a) should be a definite number, invariant under Boolean identities. Possibility satisfies this requirement but also illustrates that functional negation does not follow from the explanation.

This desultory motivation contrasts with Cox’s discussions with examples in each of his works arguing for the existence of the function F(·), which, if granted, leads to associativity on pain of getting two different values for the same ordered pair. Cox is explicit in 1946 (p. 6) that some F(·) should be assumed to bind “whatever measure be chosen.” That there be some functional relationship among the three q(·)s appears fundamental to Cox’s intuition about what belief measures should be, perhaps related to the central role that the conditional character of any belief-modeling probability plays in the Keynesian theory.

Functional negation appears less important to Cox, and he did not propose that it hold “whatever measure be chosen.” Given a choice of how strong a form of the property to assume, Cox chose the weaker option available to him, requiring only the f(q(·)) to display the property.

For some readers there may be a sense that Cox missed an opportunity to craft a more exclusively probabilistic account of credal reasonableness than he did, perhaps even an impression that Cox made some mistake, or that he is being read too literally here. It turns out, however, that there is an independent argument for the reasonableness of linear possibility based on Cox’s Keynesian ideas about ordinal belief.

The linear possibility described by Equation (8), i.e.,

Π(b1|a) = 1    Π(b2|a) = 1/2    Π(b3|a) = 1/4

exactly represents the ordinal structure of the probability distribution, i.e.,

p(b1|a) = 4/7    p(b2|a) = 2/7    p(b3|a) = 1/7    (14)

according to the following rules. For disjunctions c, d, and e formed from the elements of B where e is not empty,

p(c|ae) ≥ p(d|ae) ⇔ Π(c ∧ ¬d|ae) ≥ Π(d ∧ ¬c|ae)    (15)

Π(c|ae) ≥ Π(d|ae) ⇔ p(c|ae) ≥ p(d ∧ ¬c|ae)    (16)

The rules are straightforward to check on the three-element domain.

To show that rules of the same form hold for other sized finite domains, note that a probability distribution such as Equation (14) can be constructed from the powers of 2 for any finite plural number of atoms. It is simple to verify that for any such probability, any linear possibility with the same ordering of atoms, and any exclusive sentences v and w,

p(v|a) ≥ p(w|a) ⇔ Π(v|a) ≥ Π(w|a)    (17)

since both ordinal comparisons turn on whether the highest valued atom in v ∨ w implies v. Equivalence (15) follows from Equation (17) and the well-known probabilistic property of quasi-additivity. That is, for any probability p(·) and arbitrary sentences x and y,

p(x|a) ≥ p(y|a) ⇔ p(x ∧ ¬y|a) ≥ p(y ∧ ¬x|a)

Equivalence (16) follows from Equation (17) and the possibilistic property that for any sentences x and y,

Π(x|a) ≥ Π(y|a) ⇔ Π(x|a) ≥ Π(y ∧ ¬x|a)

which expresses in symbols that x has at least as high possibility as y if an atom of highest possibility that implies x ∨ y also implies x.

Cox thought that a probability-agreeing ordering was a reasonable state of belief. The probability ordering and the possibility ordering of Equations (15) and (16) describe one another exactly. If one expresses a reasonable state of belief, then the other does, too.
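The mutual description of the two orderings can be checked by brute force on small domains. The sketch below is illustrative only: atoms are numbered 0 to n-1 with atom 0 ranked highest, probabilities are built from powers of 2 (4/7, 2/7, 1/7 when n = 3), possibilities are taken to be 2^(-i) as one convenient linear assignment, and possibilistic conditioning follows Equations (9a)-(9c). The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def subsets(n):
    """All disjunctions over atoms 0..n-1 (atom 0 ranks highest), as frozensets."""
    return [frozenset(c) for r in range(n + 1) for c in combinations(range(n), r)]

def prob(x, e, n):
    """p(x|ae) when atom i has weight 2^(n-1-i); e must not be empty."""
    def weight(s):
        return sum(2 ** (n - 1 - i) for i in s)
    return Fraction(weight(x & e), weight(e))

def poss(x, e):
    """Pi(x|ae) when atom i has value 2^(-i), conditioned as in Equations (9a)-(9c)."""
    live = x & e
    if not live:
        return Fraction(0)
    best = min(live)                         # lowest index = highest possibility
    return Fraction(1) if best == min(e) else Fraction(1, 2 ** best)

def ordinal_rules_hold(n):
    """True if Equations (15), (16), and (17) hold on the n-atom domain."""
    everything = frozenset(range(n))
    for c in subsets(n):
        for d in subsets(n):
            if not (c & d):                  # exclusive sentences: Equation (17)
                if ((prob(c, everything, n) >= prob(d, everything, n))
                        != (poss(c, everything) >= poss(d, everything))):
                    return False
            for e in subsets(n):
                if not e:
                    continue                 # e must not be empty
                eq15 = ((prob(c, e, n) >= prob(d, e, n))
                        == (poss(c - d, e) >= poss(d - c, e)))
                eq16 = ((poss(c, e) >= poss(d, e))
                        == (prob(c, e, n) >= prob(d - c, e, n)))
                if not (eq15 and eq16):
                    return False
    return True
```

`ordinal_rules_hold(n)` returns `True` for n = 2, 3, and 4. Exact fractions are used throughout so that the ordinal comparisons are not disturbed by rounding.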

Cox’s 1946 theorem as written supports the same conclusion about the reasonableness of linear possibility. Tinkering with the assumptions in his theorem will not render linear possibility unreasonable in Cox’s view. Thus there are no grounds to think that Cox was mistaken in choosing his assumptions cautiously. A narrow and literal reading of Cox (1946) is consistent with his other ideas about reasonable belief.


8. NONBAYESIAN EVIDENTIARY REASONING

Formally, any zero-measure conditioning allowed by Equation (4), such as default reasoning using the conditional Boolean functions of the preceding sections, is belief change that does not rely on Bayesian revision. There is nothing in Cox’s work, however, that makes Equation (4) the exclusive regulator for belief change. This section considers a situation where Cox’s principles would comport with a mechanism for belief change that is outside the scope of Equation (4).

If one has beliefs about both b and bc on background a, then in the absence of zeroes, Equation (4) does bind belief in c on the hypothesis ab. But like Keynes, Cox does not assume that one always experiences all the beliefs that appear in the equation. Cox’s theory is about how some beliefs are related to others. If one lacks some beliefs, then the missing beliefs participate in no relationships, and there is nothing for the theory to explain.

An interesting case is when the starting information a is uninformative about the hypotheses of interest, often called prior ignorance elsewhere. An orthodox Bayesian has the option, and perhaps feels the obligation, to adopt an “uninformative” probability distribution for use as a prior in such a situation. This maneuver is unavailable to Cox, even as an option, since “precise uninformative probability” is an oxymoron for him.

An early statement of this Keynesian view appears on page 6 of Cox (1946):

It is not to be supposed that a relation of likelihood exists between any two propositions. If a is the proposition “Caesar invaded Britain” and b is “Tomorrow will be warmer than today,” there is no likelihood b | a, because there is no reasonable connection between the two propositions.

To assert that there is literally “no likelihood” in such circumstances is, of course, out of step with typical modern views about probabilistic belief modeling. Many would say instead that the likelihood b|a exists but that it is equal to, say, b|¬a. By 1961, Cox did soften his position slightly and allowed the bare existence of probability on irrelevant hypotheses. His newer view is hardly more useful for the would-be user of Equation (4), however.

Not only must the hypothesis of a probability assert something, if the probability is to be defined within any limits narrower than the extremes of certainty and impossibility, but also what it asserts must have some relevance to the inference.

This is followed by another example of the Caesar-weather variety on page 36 of Cox (1961).

Prior ignorance is not much of a problem in the Cox theory. When the believer eventually gets some information about the propositions of interest, then and only then is there any nontrivial sort of probability to talk about. The formation of the first beliefs in an inference episode is simply placed outside the theory. As noted, the theory is about the relationships among beliefs, and there are no earlier beliefs for the first beliefs to have a relationship with.

A similar situation also can arise in a partially ordered belief formalism when the second piece of relevant information arrives. Again, Equation (4) may not provide any guidance for belief revision, since some of the beliefs that appear in the equation may not be experienced. This situation is not much different from having started in ignorance and learned both pieces of information simultaneously rather than sequentially. What beliefs happened to have been entertained in the meantime need not constrain the final state of belief.


If one would like to introduce some regularity into the situation, then although Cox provides little guidance, neither does he erect many impediments. On any given state of information, the conditions for reasonableness (Equations 4–6) must be satisfied. Any new information that eliminates some of the propositions of interest must obey the constraints imposed by Equation (4). Apart from that, a new conditioning hypothesis is a new inference problem. A similar position is argued by Kyburg (1978).

A traditional interpretation of possibility values is as the conditional probability of some evidence given the truth of the elementary proposition whose possibility is being assessed (Dubois and Prade 1992). Conventionally, the conditional probabilities are normalized so that the possibility of the best-supported elementary proposition is 1. If this particular operation is performed by dividing all the conditional probabilities by the largest one, then the resulting possibility values are likelihoods in the usual technical sense (a multiple of a conditional probability) as well as Cox’s sense.

Thus the linear possibility in Equation (8) might have arisen by viewing some evidence a whose conditional probabilities given each of the hypotheses might have been, say,

p(a|b1) = 0.4    p(a|b2) = 0.2    p(a|b3) = 0.1

The values are then normalized by dividing through by the biggest value, 0.4 in the example. New extrinsic information a′ that brings with it new conditional probabilities causes a reassessment of the possibilities based on

p(a′ ∧ a|bi)

and thus updates the default guesses outside of Equation (4) and of Bayes’ rule.

Conditioning on the sentences of interest is handled by the rules for possibilistic default reasoning already discussed and shown to obey the constraints of Equation (4). On any specific state of information, f(q(·)) satisfies Equations (4) through (6).
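The normalization and reassessment steps described above can be illustrated with a short Python sketch. The first set of conditional probabilities is the one given in the text. The joint values for p(a′ ∧ a|bi) are purely hypothetical numbers chosen for illustration, and the function name is likewise an assumption.

```python
from fractions import Fraction


def normalize_to_possibility(likelihoods):
    """Divide each conditional probability by the largest one.

    The best-supported hypothesis then has possibility 1, and every value
    is a multiple of the original conditional probability.
    """
    top = max(likelihoods.values())
    if top <= 0:
        raise ValueError("at least one hypothesis must give the evidence positive probability")
    return {h: value / top for h, value in likelihoods.items()}


# Conditional probabilities p(a|b_i) from the text.
first_evidence = {
    "b1": Fraction(4, 10),
    "b2": Fraction(2, 10),
    "b3": Fraction(1, 10),
}
poss_first = normalize_to_possibility(first_evidence)

# Hypothetical values for p(a' and a | b_i); not taken from the text.
joint_evidence = {
    "b1": Fraction(4, 100),
    "b2": Fraction(10, 100),
    "b3": Fraction(5, 100),
}
poss_second = normalize_to_possibility(joint_evidence)

for name, poss in (("after a", poss_first), ("after a' and a", poss_second)):
    print(name, {h: str(v) for h, v in poss.items()})
```

With the values from the text, the possibilities come out as 1, 1/2, and 1/4. With the hypothetical joint values, the best-supported hypothesis changes from b1 to b2, so the default guess is revised even though the step from the first assessment to the second is not an application of Equation (4) or of Bayes’ rule to the earlier possibility values.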

The patterns of belief change allowed by the conditional probabilistic interpretation of possibility are no doubt specialized in their application, just as defaults are a special kind of belief. Nevertheless, these patterns fully comply with the requirements for reasonableness demanded by Cox. As such, they illustrate that Cox’s writing cannot be interpreted as a warrant for the exclusive use of Bayesian techniques in the management of belief change.

9. CONCLUSIONS

This article continues recent work that explores the close relationship between probabilistic and possibilistic representations of belief. Earlier efforts have concentrated on the zero-free probability distributions, of which Equation (14) is a simple example. The Cox criteria, especially that zero-measure conditioning is well defined according to Equation (4), allow a new perspective on this issue and a different class of possibilistic probabilities. The relationship is mediated by a numerical function rather than by a logical operation such as Equation (15) on sentences being compared. The conditional Boolean distribution also may provide the basis of a clear expression of probability and possibility’s shared semantics for defaults (Benferhat et al. 1997; Snow 1999).

The other principal conclusion concerns the reputation of Richard Threlkeld Cox, a thoughtful and rigorous critic of Bayesian orthodoxy. While Cox’s theorem provides a justification for using probability in belief models, of which the Bayesian models are one kind, Cox provides no support for Jaynes (1963) when he disparages nonBayesian techniques for belief management as “inconsistent.”

It is worth noting that Cox never claimed to show that any belief-management scheme was unreasonable. His concerns were to show that the use of probability in belief models was reasonable on weak assumptions and that belief modeling was just as fit a subject for probabilistic treatment as mass phenomena are.

It is a disservice to deny Cox credit for an original approach to belief modeling, and it is no compensation to credit him instead with justifying the supposed exclusive reasonableness of an existing method to which he did not subscribe. In particular, Cox is entitled to recognition for establishing a quantitative foundation for reasonable nonlaplacean belief representations. This is a “prediction” as physicists use that term in connection with their models. The prediction was fulfilled by Lotfi Zadeh, whose possibility calculus appeared archivally in 1978, the same year as Cox’s last probabilistic writing.

REFERENCES

Aczel, J. 1966. Lectures on Functional Equations and Their Applications. Academic Press, New York.

Benferhat, S., D. Dubois, and H. Prade. 1997. Possibilistic and standard probabilistic semantics of conditional knowledge. In Proceedings of AAAI, pp. 70–75.

Bergmann, G. 1941. The logic of probability. American Journal of Physics, 9:263–272.

Cheeseman, P. 1988. An inquiry into computer understanding (with commentaries). Computational Intelligence, 4:58–142; also 6:179–192 (1990).

Coletti, G., and R. Scozzafava. 1998. Null events and stochastical independence. Kybernetika, 34:69–78.

Cox, R. T. 1946. Probability, frequency, and reasonable expectation. American Journal of Physics, 14:1–13.

Cox, R. T. 1961. The Algebra of Probable Inference. Johns Hopkins Press, Baltimore.

Cox, R. T. 1978. Of inference and inquiry: an essay in inductive logic. In The Maximum Entropy Formalism, pp. 119–167. Edited by R. D. Levine and M. Tribus. MIT Press, Cambridge, Mass.

de Finetti, B. 1974. Theory of Probability. Wiley, New York.

Dubois, D., and H. Prade. 1986. Possibilistic inference under matrix form. In Fuzzy Logic in Knowledge Engineering, pp. 112–126. Edited by H. Prade and C. V. Negoita. Verlag TUV, Rhineland.

Dubois, D., and H. Prade. 1989. Measure-free conditioning, probability, and non-monotonic reasoning. In Proceedings of IJCAI, pp. 1110–1114.

Dubois, D., and H. Prade. 1992. Belief change and possibility theory. In Belief Revision, pp. 142–182. Edited by P. Gärdenfors. Cambridge University Press, Cambridge, England.

Halpern, J. Y. 1999. Cox’s theorem revisited. Journal of Artificial Intelligence Research, 11:429–435.

Horvitz, E. J., D. E. Heckerman, and C. P. Langlotz. 1986. A framework for comparing alternative formalisms for plausible reasoning. In Proceedings of AAAI, pp. 210–214.

Jaynes, E. T. 1963. Review of the algebra of probable inference. American Journal of Physics, 31:66–67.

Kemble, E. C. 1942. Is the frequency theory of probability adequate for all scientific purposes? American Journal of Physics, 10:6–16.

Keynes, J. M. 1921. A Treatise on Probability. Macmillan, London.

Kolmogorov, A. N. 1933. Grundbegriffe der Wahrscheinlichkeitsrechnung. Ergebnisse der Mathematik, 3. English translation by N. Morrison. 1956. Foundations of the Theory of Probability. Chelsea, New York.

Kyburg, H. 1978. Subjective probability: Criticisms, reflections, and problems. Journal of Philosophical Logic, 7:157–180.

Kyburg, H. E., Jr. and M. Pittarelli. 1996. Set-based bayesianism. IEEE Transactions on Systems, Man, and Cybernetics, 26:324–339.

Łukasiewicz, J. 1913. Logical foundations of probability theory. In Jan Łukasiewicz, Selected Works, pp. 16–63. Edited by L. Borkowski (1970). North-Holland, Amsterdam.

Margenau, H. 1942. The role of definitions in physical science, with remarks on the frequency definition of probability. American Journal of Physics, 10:224–232.

Ramsey, F. P. 1926. Truth and probability. In The Foundations of Mathematics and Other Logical Essays. Edited by R. B. Braithwaite (1950). The Humanities Press, New York.

Snow, P. 1995. An intuitive motivation of bayesian belief models. Computational Intelligence, 11:449–459.

Snow, P. 1999. Diverse confidence levels in a probabilistic semantics for conditional logics. Artificial Intelligence, 113:269–279.

Zadeh, L. A. 1978. Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets and Systems, 1:3–28.
