
Aggregation of Epistemic Uncertainty: A New Interpretation of the Certainty Factor with

Possibility Theory and Causation Events

Koichi Yamada Department of Information and Management Systems Engineering

Nagaoka University of Technology Nagaoka, Niigata, JAPAN

[email protected]

Abstract—Information aggregation has a long history of study. It has been used in decision-making, sensor fusion, information retrieval, affective intelligence and many other applications for combining certainties, reliabilities, sentiments and other degrees of information to judge something in the real world. The paper dares to revisit a traditional and seemingly forgotten representation of uncertainty called the Certainty Factor, and discusses a new interpretation with Possibility theory and causation events. It then develops several aggregation functions for uncertainties derived from distinct pieces of evidence. Certainty Factors have been criticized for lacking a sound mathematical interpretation from the viewpoint of Probability theory. Thus, the paper first establishes a sound interpretation using Possibility theory, and then examines aggregation based on that interpretation. It proposes four combination functions with a sound theoretical basis, one of which is exactly the same as the combination rule criticized for so long.

Keywords—uncertainty combination, Certainty Factors, Possibility theory, causation events, information aggregation.

I. INTRODUCTION

Information aggregation has been used in decision-making [1], sensor fusion [2], information retrieval [3] and many other applications to aggregate uncertain information derived from distinct pieces of evidence and to obtain final degrees of certainty, reliability, sentiment, conformity, etc. In general, aggregation methods for uncertain information are classified into three groups: t-norm (product), t-conorm (sum) and averaging, and they are used in different situations depending on the reliability of the evidence [3-6]. However, the reliability itself is uncertain or unknown in many applications such as medical diagnosis, sentiment analysis, etc. It is therefore not an easy task to choose an appropriate method of aggregation; in many cases we have to choose one based on intuition or rules of thumb.

MYCIN [7], a traditional expert system frequently cited during the second AI boom of the 1970s and 80s, proposed an uncertainty representation called the Certainty Factor (CF) [8] and a way to combine/update CFs of hypotheses derived from multiple pieces of evidence. The CF was highly evaluated for practical usage because it seemed to work well without caring about implicit assumptions of independence and reliability, while theoreticians criticized it for a theoretical deficit: the combination rule cannot be derived from the definition of the CF [9]. Since then, many papers have been published to solve the problem [6,9-11], many of them based on Probability theory. However, the efforts to solve the theoretical issue led to more complexity and less practicability [9,11]. The CF seems almost forgotten these days in the study of uncertainty, even as probability has prevailed in the field of intelligent systems.

This paper recalls the CF, which was said to be "practical" but not theoretically "sound", and interprets it with Possibility theory [12, 17], an epistemic theory of uncertainty, together with causation events [13-16]. It then discusses combination methods for CFs in the framework of Possibility theory and proposes several combination rules with a sound basis, one of which is exactly the same as the combination rule used in MYCIN.

II. EXISTING THEORIES

A. Certainty Factors

A CF is a representation of certainty/uncertainty given by a real value in the interval [-1,1], where "1" represents perfect affirmation, "-1" perfect negation, and "0" means neither is supported. This is clearly a bipolar scale as mentioned in [5]. The Certainty Factor Cf(h,e) of hypothesis h given evidence e was defined in [8] as

Cf(h,e) = MB(h,e) − MD(h,e).  (1)

Later it was redefined by the next equation [7]:

Cf(h,e) = (MB(h,e) − MD(h,e)) / (1 − min(MB(h,e), MD(h,e))),

where MB(h,e) represents the degree to which belief in h is revised by e toward affirmation, and MD(h,e) the degree toward negation. These are values in [0,1] defined using probabilities as follows:

MB(h,e) = (P(h|e) − P(h)) / (1 − P(h)),
MD(h,e) = (P(h) − P(h|e)) / P(h).

It should be noted that the above definition presupposes modularity of evidence, meaning that the CF of hypothesis h depends only on the evidence e, regardless of what other evidence is or is not present. In addition, the CF does not represent a belief state in h given e, but the degree of belief revision from the prior to the posterior.

Let x be the current belief in h, and y be the CF of h given by new evidence ey, that is, y = Cf(h, ey). Then the current belief x is revised by the next equation:

fM(x, y) =
  x + y − xy,  if x, y ≥ 0,
  x + y + xy,  if x, y ≤ 0,
  (x + y) / (1 − min(|x|, |y|)),  otherwise.  (2)

It is undefined when x=1 and y=-1, and vice versa. The equation satisfies commutativity, associativity, continuity,


2018 Joint 10th International Conference on Soft Computing and Intelligent Systems and 19th International Symposium on Advanced Intelligent Systems. 978-1-5386-2633-7/18/$31.00 ©2018 IEEE. DOI 10.1109/SCIS-ISIS.2018.00081


and monotonicity, but not idempotency. Thanks to commutativity and associativity, the result of revision does not depend on the order of evidence when multiple pieces of evidence are given. This might be one of the reasons for the "practicality." In addition, since fM(0, x) = x for any x, there is virtually no problem in interpreting Cf(h, e) as the belief state of h given only e (with no other possible evidence present). In that case, eq. (2) can be understood as a combination rule rather than a revision rule; that is, fM(x, y) = Cf(h, ex∧ey) is understood as the belief state given only ex and ey.
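As a concrete reference, the revision rule of eq. (2) can be sketched in Python (a minimal illustration of the formula above; the function name f_m is ours, not from the paper):

```python
def f_m(x: float, y: float) -> float:
    """MYCIN-style revision/combination of two Certainty Factors, eq. (2)."""
    if (x, y) in ((1.0, -1.0), (-1.0, 1.0)):
        raise ValueError("f_M is undefined for (1, -1) and (-1, 1)")
    if x >= 0 and y >= 0:
        return x + y - x * y                      # probabilistic sum (affirmation)
    if x <= 0 and y <= 0:
        return x + y + x * y                      # probabilistic sum (negation)
    return (x + y) / (1 - min(abs(x), abs(y)))    # opposite signs: compromise
```

Since fM(0, x) = x, a sequence of CFs can be folded in starting from the neutral belief 0.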

The reason why the CF was criticized in spite of the practicality is that the combination rule could not be derived from the definition of CF under appropriate conditions in the framework of probability [9]. Thus, [10] and [11] discussed the mathematical properties and a method of belief update respectively, beyond probability theory.

The combination rule represented by eq. (2) takes the probabilistic sum in the case of x·y ≥ 0, leading to an increase in the absolute value of the CF both in affirmation and negation. In the case of x·y < 0, it takes an in-between value. So the combination rule could be used as a model of group decision-making in the sense that the opinion is intensified when two decision-makers are on the same positive/negative side; otherwise they compromise and weaken the opinion.

Now, when x·y < 0, fM(x, y) satisfies the following four conditions required for an averaging function [4].

C.1: min(x, y) ≤ fa(x, y) ≤ max(x, y).
C.2: fa(x, y) = fa(y, x).  (commutativity)
C.3: fa(x, y) ≤ fa(z, w), if x ≤ z, y ≤ w.  (monotonicity)
C.4: fa(x, y) is continuous.  (continuity)

[Proposition-1] fM(x, y) = (x + y) / (1 − min(|x|, |y|)), x·y < 0, x, y ∈ [-1,1], is an averaging function.
Proof: C.2 and C.3 are proved in [6]. C.4 is obvious. C.1 is proved below. Since fM(x, y) is commutative, only the case of x > 0 and y < 0 is treated, separating it into A) x ≥ |y| and B) x < |y|. Note that min(x, y) = y and max(x, y) = x.
A) When x ≥ |y|, fM = (x+y)/(1+y). Then max(x, y)·(1+y) − (x+y) = x(1+y) − (x+y) = y(x−1) ≥ 0. Also (x+y) − min(x, y)·(1+y) = (x+y) − y(1+y) = x − y² ≥ x − x² = x(1−x) ≥ 0. Thus C.1 holds in this case.
B) When x < |y|, fM = (x+y)/(1−x). Then max(x, y)·(1−x) − (x+y) = x(1−x) − (x+y) = −x² − y ≥ 0. Also (x+y) − min(x, y)·(1−x) = (x+y) − y(1−x) = x(1+y) ≥ 0. Thus C.1 also holds in this case. (End of Proof)
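Proposition-1 can also be spot-checked numerically; the sketch below is our own illustration, verifying condition C.1 on a grid of mixed-sign pairs:

```python
import itertools

def f_m_mixed(x: float, y: float) -> float:
    """Mixed-sign branch of f_M: (x + y) / (1 - min(|x|, |y|))."""
    return (x + y) / (1 - min(abs(x), abs(y)))

# C.1: min(x, y) <= f_M(x, y) <= max(x, y) whenever x*y < 0
grid = [i / 10 for i in range(-9, 10)]
for x, y in itertools.product(grid, grid):
    if x * y < 0:
        assert min(x, y) <= f_m_mixed(x, y) <= max(x, y)
print("C.1 holds on the grid")
```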

B. Possibility Theory [12]

A possibility measure is a function Π from the power set of the universal set U to the interval [0,1] satisfying the axioms Π(∅) = 0, Π(U) = 1 and Π(A∪B) = max(Π(A), Π(B)) for A, B ⊆ U, where ∅, U and ∪ are the empty set, the universal set and the union operator, respectively. The dual function N, called the necessity measure, is defined by N(A) = 1 − Π(A^c), where A^c is the complement of A. It satisfies N(∅) = 0, N(U) = 1 and N(A∩B) = min(N(A), N(B)), where ∩ is intersection. In general, Π(A) ≥ N(A) holds.

Possibility and necessity have distribution functions π(u) = Π({u}) and ν(u) = N({u}), u ∈ U, respectively. When a possibility distribution is given, the corresponding possibility and necessity measures are obtained by the following:

Π(A) = max_{u∈A} π(u),
N(A) = 1 − Π(A^c).

The uncertainty of an elementary event u ∈ U is represented by a possibility and necessity pair (π(u), ν(u)), or by a pair (π(u), π(¬u)) of the possibilities of u and its negation, where π(¬u) = 1 − ν(u).
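The measures above can be illustrated with a small sketch (ours, not from the paper), computing Π and N from a possibility distribution over a finite universe:

```python
def possibility(pi: dict, event: set) -> float:
    """Pi(A) = max over u in A of pi(u); Pi of the empty set is 0."""
    return max((pi[u] for u in event), default=0.0)

def necessity(pi: dict, event: set) -> float:
    """N(A) = 1 - Pi(A^c), the complement taken within the universe."""
    return 1.0 - possibility(pi, set(pi) - event)

# a normalized possibility distribution (max value 1) on U = {a, b, c}
dist = {"a": 1.0, "b": 0.6, "c": 0.2}
A, B = {"a"}, {"b", "c"}
assert possibility(dist, A | B) == max(possibility(dist, A), possibility(dist, B))
assert necessity(dist, A & B) == min(necessity(dist, A), necessity(dist, B))
assert possibility(dist, B) >= necessity(dist, B)
```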

III. INTERPRETATION OF CF WITH POSSIBILITY THEORY

As mentioned in the Introduction, attempts at a sound interpretation of CFs with probability were not successful. This section discusses a new interpretation using causation events [13-16] and Possibility theory [12, 17].

A. Causation Events and Conditional Causal Possibility

The paper interprets Cf(h,e), "the certainty of hypothesis h given only evidence e," using the epistemic certainty of the causation event h:e given only e. A causation event h:e is an event originally meaning that "e causes h," but we interpret it as "e supports h" in this paper. The event satisfies the following logical formulae:

h:e ≡ (h:e)∧e ≡ (h:e)∧h ≡ (h:e)∧e∧h,  (3a)
(h:e)∨e ≡ e,  (3b)
(h:e)∨h ≡ h,  (3c)

where ≡, ∨ and ∧ represent equivalence, disjunction and conjunction. In the case of probability, P(h|e) ≥ P(h:e|e) holds [13,14]. The paper also introduces the concept of the opposite hypothesis of h, called the o-hypothesis and represented by k. The o-hypothesis k is different from the negation of h; it satisfies k → ¬h and h → ¬k. It is a concept included in ¬h in the sense of Set theory. It satisfies h∧k ≡ h∧¬h ≡ False, but h∨k is not a tautology (neither ¬h → k nor ¬k → h holds), and ¬h∧¬k is not a contradiction either. Let h and k be "hot" and "cold", respectively; this example satisfies all the above logical formulae. Even in the case of binary hypotheses such as "female" (h) and "male" (k), the formulae hold, because hypotheses are intrinsically epistemic, complying with the closed world assumption, which insists that what is not known to be true is false. Under this assumption, both hypotheses are regarded as false, ¬h∧¬k, when no evidence is available as to female or male; this actually means "unknown" or "no evidence."

We define a causation event k:d for the o-hypothesis k and its evidence d as follows:

k:d ≡ (k:d)∧d ≡ (k:d)∧k ≡ (k:d)∧d∧k,  (4a)
(k:d)∨d ≡ d,  (4b)
(k:d)∨k ≡ k.  (4c)

Clearly, it holds that k:d → ¬(h:d) and h:e → ¬(k:e), because k → ¬h and h → ¬k, respectively. From (3a) and (4a), (h:e)∧(k:e) ≡ False is also derived. That is, a piece of evidence e cannot support both a hypothesis and its o-hypothesis.

Now, we assume the following as to a hypothesis, the o-hypothesis and causation events.

h ≡ (⋁_{e∈Eh} h:e) ∧ (⋀_{d∈Ek} ¬(k:d)),  (5a)
k ≡ (⋁_{d∈Ek} k:d) ∧ (⋀_{e∈Eh} ¬(h:e)),  (5b)



where Eh and Ek are the sets of all pieces of possible evidence that support h and k, respectively. Eh ∩ Ek = ∅, because (h:e)∧(k:e) ≡ False. Equation (5a) means that hypothesis h is present if and only if one or more of the possible causation events supporting h is present and all causation events supporting the o-hypothesis k are absent. Let us now negate both the hypothesis and the o-hypothesis:

¬h ≡ (⋀_{e∈Eh} ¬(h:e)) ∨ (⋁_{d∈Ek} k:d),  (6a)
¬k ≡ (⋀_{d∈Ek} ¬(k:d)) ∨ (⋁_{e∈Eh} h:e).  (6b)

We can confirm that the assumptions (5a) and (5b) are consistent with the above definition of k: k → ¬h and h → ¬k.

From now on, we will not differentiate evidence for the hypothesis and for the o-hypothesis, and use the same symbol e ∈ E = Eh ∪ Ek. Then, we define the conditional possibility of a causation event given only ei by the next equations:

π(h:ei ‖ ei) = π(h:ei | ¬e1 ∧ ... ∧ ¬ei−1 ∧ ei ∧ ¬ei+1 ∧ ... ∧ ¬en),  (7a)
π(k:ei ‖ ei) = π(k:ei | ¬e1 ∧ ... ∧ ¬ei−1 ∧ ei ∧ ¬ei+1 ∧ ... ∧ ¬en),  (7b)

where ei ∈ E = Eh ∪ Ek = {e1,...,en}. The symbol "‖" represents the condition that only ei is present in the set E of all possible evidence and the others are absent. Similar to the case that ¬h∧¬k is not a contradiction, we suppose that ei∧ej is never a contradiction even if ei ∈ Eh and ej ∈ Ek, because evidence is also an epistemic concept.

From the definition above, we know that π(k:e ‖ e) = 0 if e ∈ Eh, and π(h:e ‖ e) = 0 if e ∈ Ek. In the case where causation events are conditionally independent of the other evidence, it should hold that π(h:e ‖ e) = π(h:e | e) and π(k:e ‖ e) = π(k:e | e).

[Proposition-2] The following equations hold:

π(h ‖ e) = π(h:e ‖ e),  (8a)
π(¬h ‖ e) = π(¬(h:e) ‖ e),  (8b)
π(k ‖ e) = π(k:e ‖ e),  (8c)
π(¬k ‖ e) = π(¬(k:e) ‖ e).  (8d)

Proof: (1) Proof of (8a): Let Eh = {e1,...,en}, Ek = {d1,...,dm}, A ≡ e1∧¬e2∧...∧¬en, B ≡ ¬d1∧...∧¬dm. The conjunction of the RHS of formula (5a) with A and B is
(⋁_{i=1,n} h:ei) ∧ (⋀_{j=1,m} ¬(k:dj)) ∧ A ∧ B ≡ (h:e1) ∧ A ∧ B,
because h:ei → ei, so ¬ei in A eliminates h:ei for i ≥ 2, and k:dj → dj, so ¬dj in B entails ¬(k:dj). This must be equivalent to the conjunction of the LHS of (5a) with A and B, that is, h∧A∧B. Thus π(h:e1 | A∧B) = π(h | A∧B), which is nothing but (8a). (8c) is proved in the same way.
(2) Proof of (8b): The conjunction of the RHS of (6a) with A and B is
{(⋀_{i=1,n} ¬(h:ei)) ∨ (⋁_{j=1,m} k:dj)} ∧ A ∧ B ≡ ¬(h:e1) ∧ A ∧ B,
because ¬ei in A already entails ¬(h:ei) for i ≥ 2, and ¬dj in B eliminates every k:dj. This must be equivalent to the conjunction of the LHS of (6a) with A and B, that is, ¬h∧A∧B. Thus π(¬h | A∧B) = π(¬(h:e1) | A∧B), which is nothing but (8b). (8d) is proved in the same way. (End of Proof)

B. Transformation between CF and Possibility

Let us represent the uncertainty of an event by a possibility distribution (π_T, π_F) on {T, F}, where T and F are True and False, respectively. From the property of possibility, π_T = 1 or π_F = 1 must be satisfied. Then the next proposition holds.

[Proposition-3] Let f(π_T, π_F) = π_T − π_F. Then the function f is a bijection (one-to-one onto mapping) from D = {1}×[0,1] ∪ [0,1]×{1} to [-1,1] under the condition that max(π_T, π_F) = 1.
Proof: (Surjection, onto) Let Cf ∈ [-1,1]. When -1 ≤ Cf ≤ 0, we give π_F = 1 and π_T = 1 + Cf. Then (π_T, π_F) ∈ D and f(π_T, π_F) = Cf. When 0 < Cf ≤ 1, we give π_T = 1 and π_F = 1 − Cf. Then (π_T, π_F) ∈ D and f(π_T, π_F) = Cf. Thus f is a surjection from D onto [-1,1].
(Injection, one-to-one) Suppose f is not injective; then multiple (π_T, π_F) ∈ D exist such that f(π_T, π_F) = Cf for some Cf ∈ [-1,1], where one of π_T and π_F must be one and the other a value in [0,1]. But such a combination of π_T and π_F is clearly unique, as shown below:
if Cf ≥ 0, then π_T = 1 and π_F = 1 − Cf;
if Cf < 0, then π_T = 1 + Cf and π_F = 1.
This contradicts the multiple existence. (End of Proof)

The Proposition shows that such a possibility distribution can be transformed uniquely into a value in [-1,1], and vice versa. Thus, we represent the conditional possibility distribution of h given only e with a single value:

qh(h ‖ e) = π(h ‖ e) − π(¬h ‖ e).  (9a)

Inversely, the original representation is recovered using the following equations.

π(h ‖ e) = 1.0, if qh(h ‖ e) ≥ 0; 1.0 + qh(h ‖ e), otherwise.  (9b)

π(¬h ‖ e) = 1.0 − qh(h ‖ e), if qh(h ‖ e) ≥ 0; 1.0, otherwise.  (9c)

Similarly, the transformation of the conditional possibility distribution of k given only e is done with the next equations:

qk(k ‖ e) = π(k ‖ e) − π(¬k ‖ e).  (10a)

π(k ‖ e) = 1.0, if qk(k ‖ e) ≥ 0; 1.0 + qk(k ‖ e), otherwise.  (10b)

π(¬k ‖ e) = 1.0 − qk(k ‖ e), if qk(k ‖ e) ≥ 0; 1.0, otherwise.  (10c)

The necessity distributions are obtained with the following;

ν(h ‖ e) = 1 − π(¬h ‖ e), ν(¬h ‖ e) = 1 − π(h ‖ e), ν(k ‖ e) = 1 − π(¬k ‖ e), ν(¬k ‖ e) = 1 − π(k ‖ e). Thus, the single-valued distribution is also obtained using the next equations: qh(h ‖ e) = ν(h ‖ e) − ν(¬h ‖ e), qk(k ‖ e) = ν(k ‖ e) − ν(¬k ‖ e).

Table 1. Relations among qh(h ‖ e), possibilities and necessities

qh(h ‖ e)   (π(h ‖ e), π(¬h ‖ e))   (ν(h ‖ e), ν(¬h ‖ e))
 1.0        (1.0, 0.0)              (1.0, 0.0)
 0.7        (1.0, 0.3)              (0.7, 0.0)
 0.3        (1.0, 0.7)              (0.3, 0.0)
 0.0        (1.0, 1.0)              (0.0, 0.0)
-0.3        (0.7, 1.0)              (0.0, 0.3)
-0.7        (0.3, 1.0)              (0.0, 0.7)
-1.0        (0.0, 1.0)              (0.0, 1.0)
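Proposition-3 and eqs. (9a)-(9c) amount to a simple encoding, sketched below (our illustration; function names ours); the pairs it produces match the rows of Table 1:

```python
def cf_to_pair(cf: float) -> tuple:
    """CF -> (pi(h||e), pi(not-h||e)) by eqs. (9b) and (9c)."""
    if cf >= 0:
        return (1.0, 1.0 - cf)
    return (1.0 + cf, 1.0)

def pair_to_cf(pi_t: float, pi_f: float) -> float:
    """Inverse direction, eq. (9a): Cf = pi_T - pi_F, assuming max(pi_T, pi_F) = 1."""
    return pi_t - pi_f

# round-trip check over the CF values of Table 1
for q in (1.0, 0.7, 0.3, 0.0, -0.3, -0.7, -1.0):
    assert abs(pair_to_cf(*cf_to_pair(q)) - q) < 1e-9
```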




Finally, we define Certainty Factors with the next equation:

Cf(h,e) = qh(h ‖ e) = π(h ‖ e) − π(¬h ‖ e), if e ∈ Eh;
Cf(h,e) = −qk(k ‖ e) = π(¬k ‖ e) − π(k ‖ e), if e ∈ Ek.  (11)

The definition reflects the symmetric structure of hypothesis and o-hypothesis, which creates the discriminative property of this bipolar scale of uncertainty.

IV. COMBINATION OF CERTAINTY FACTORS

A. Combination of Possibility Distributions

Let x and y be CFs of hypothesis h (or k) supported by two pieces of evidence ex and ey, respectively. Their possibilistic expressions are derived from eqs. (9), (10) and (11). We now discuss ways to combine these two possibility distributions into a new one, dividing the problem into three cases: 1) x, y ≥ 0; 2) x, y ≤ 0 (excluding x = y = 0); and 3) x > 0 > y or x < 0 < y.

(1) Case 1: x, y ≥ 0. In this case, the evidence ex, ey of x, y respectively supports h, meaning ex, ey ∈ Eh. From eqs. (9b) and (9c) we get π(h ‖ ex) = 1.0, π(¬h ‖ ex) = 1 − x, π(h ‖ ey) = 1.0, and π(¬h ‖ ey) = 1 − y. The conditional possibility of h given only ex and ey is derived using eqs. (5a) and (8a), assuming non-interactivity [17] between causation events and independence of the causation events given the evidence.

π(h ‖ ex,ey) = π((⋁_{e∈Eh} h:e) ∧ (⋀_{e∈Ek} ¬(k:e)) ‖ ex,ey)
= π((⋁_{e∈Eh} h:e) ‖ ex,ey) = π((h:ex) ∨ (h:ey) ‖ ex,ey)
= max(π(h:ex ‖ ex,ey), π(h:ey ‖ ex,ey))
= max(π(h:ex ‖ ex), π(h:ey ‖ ey))
= max(π(h ‖ ex), π(h ‖ ey)) = 1.0.  (12a)

The possibility of negation of h given only ex and ey is obtained using eq. (6a), (8b) and the same assumptions as the above.

π(¬h ‖ ex,ey) = π((⋀_{e∈Eh} ¬(h:e)) ∨ (⋁_{e∈Ek} k:e) ‖ ex,ey)
= π((⋀_{e∈Eh} ¬(h:e)) ‖ ex,ey) = π(¬(h:ex) ∧ ¬(h:ey) ‖ ex,ey)
= min(π(¬(h:ex) ‖ ex,ey), π(¬(h:ey) ‖ ex,ey))
= min(π(¬(h:ex) ‖ ex), π(¬(h:ey) ‖ ey))
= min(π(¬h ‖ ex), π(¬h ‖ ey)) = min(1 − x, 1 − y).  (12b)

The CF value is calculated from the results of eqs. (12a) and (12b) using eq. (11): Cf(h, ex∧ey) = max(x, y).

(2) Case 2: x, y ≤ 0 (excluding x = y = 0). The evidence ex, ey of x, y respectively supports k in this case, meaning ex, ey ∈ Ek. Thus, from (10b) and (10c), we get π(k ‖ ex) = 1.0, π(¬k ‖ ex) = 1 + x, π(k ‖ ey) = 1.0 and π(¬k ‖ ey) = 1 + y. The possibility of k given only ex and ey is obtained using eqs. (5b) and (8c) with the same assumptions as in (1).

π(k ‖ ex,ey) = π((⋁_{e∈Ek} k:e) ∧ (⋀_{e∈Eh} ¬(h:e)) ‖ ex,ey)
= π((⋁_{e∈Ek} k:e) ‖ ex,ey) = π((k:ex) ∨ (k:ey) ‖ ex,ey)
= max(π(k:ex ‖ ex,ey), π(k:ey ‖ ex,ey))
= max(π(k:ex ‖ ex), π(k:ey ‖ ey))
= max(π(k ‖ ex), π(k ‖ ey)) = 1.0.  (13a)

The possibility of negation of k given only ex and ey is obtained using (6b), (8d) with the same assumptions as the above.

π(¬k ‖ ex,ey) = π((⋀_{e∈Ek} ¬(k:e)) ∨ (⋁_{e∈Eh} h:e) ‖ ex,ey)
= π((⋀_{e∈Ek} ¬(k:e)) ‖ ex,ey) = π(¬(k:ex) ∧ ¬(k:ey) ‖ ex,ey)
= min(π(¬(k:ex) ‖ ex,ey), π(¬(k:ey) ‖ ex,ey))
= min(π(¬(k:ex) ‖ ex), π(¬(k:ey) ‖ ey))
= min(π(¬k ‖ ex), π(¬k ‖ ey)) = min(1 + x, 1 + y).  (13b)

The CF value is calculated from the results of (13a) and (13b) using eq. (11): Cf(h, ex∧ey) = min(x, y).

(3) Case 3: x > 0 > y (or x < 0 < y). Evidence ex supports h while ey supports k when x > 0 > y, which means ex ∈ Eh and ey ∈ Ek. From eqs. (9) and (10), we get π(h ‖ ex) = 1.0, π(¬h ‖ ex) = 1 − x, π(k ‖ ey) = 1.0 and π(¬k ‖ ey) = 1 + y. Since the two pieces of evidence ex and ey look in the opposite directions of h and k respectively, we combine the non-contradictory parts of their assertions; i.e., we calculate the possibilities of h∧¬k and ¬h∧k given only ex and ey, under the condition that the evidence is reliable.

We get the following from eq. (5a), (6b) assuming the non-interactivity and the independence.

π(h∧¬k ‖ ex,ey) = π(h ‖ ex,ey)
= π((⋁_{e∈Eh} h:e) ∧ (⋀_{e∈Ek} ¬(k:e)) ‖ ex,ey)
= min(π(⋁_{e∈Eh} h:e ‖ ex,ey), π(⋀_{e∈Ek} ¬(k:e) ‖ ex,ey))
= min(π(h:ex ‖ ex,ey), π(¬(k:ey) ‖ ex,ey))
= min(π(h:ex ‖ ex), π(¬(k:ey) ‖ ey))
= min(π(h ‖ ex), π(¬k ‖ ey)) = 1 + y.  (14a)

Similarly, from eqs. (5b) and (6a), we get

π(¬h∧k ‖ ex,ey) = π(k ‖ ex,ey)
= π((⋁_{e∈Ek} k:e) ∧ (⋀_{e∈Eh} ¬(h:e)) ‖ ex,ey)
= min(π(⋁_{e∈Ek} k:e ‖ ex,ey), π(⋀_{e∈Eh} ¬(h:e) ‖ ex,ey))
= min(π(k:ey ‖ ex,ey), π(¬(h:ex) ‖ ex,ey))
= min(π(k:ey ‖ ey), π(¬(h:ex) ‖ ex))
= min(π(k ‖ ey), π(¬h ‖ ex)) = 1 − x.  (14b)

Taking (14a) as the possibility π(h ‖ ex,ey) (or π(¬k ‖ ex,ey)) and (14b) as the possibility π(¬h ‖ ex,ey) (or π(k ‖ ex,ey)), respectively, we get the CF value from eq. (11): Cf(h, ex∧ey) = (1 + y) − (1 − x) = x + y.
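The Case-3 computation can be sketched as follows (our own illustration; the function name is ours): build the distribution of eqs. (14a)/(14b), optionally normalize it, and read off the CF via eq. (11).

```python
def case3_cf(x: float, y: float, normalize: bool = False) -> float:
    """CF of h given only ex, ey when x > 0 > y, from eqs. (14a), (14b) and (11)."""
    pi_h = 1.0 + y          # pi(h || ex, ey), eq. (14a)
    pi_not_h = 1.0 - x      # pi(not-h || ex, ey), eq. (14b)
    if normalize:
        # max(1 + y, 1 - x) = 1 - min(|x|, |y|) when x > 0 > y
        m = max(pi_h, pi_not_h)
        pi_h, pi_not_h = pi_h / m, pi_not_h / m
    return pi_h - pi_not_h  # eq. (11)
```

For example, x = 0.8 and y = -0.5 give approximately 0.3 (= x + y) without normalization and approximately 0.6 (= 0.3/0.5) with it.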

Since this equation is symmetric in x and y, we get the same result in the case of y > 0 > x. Now, summing up the above three cases, the combination of CFs derived from distinct pieces of evidence ex, ey is obtained using the following equation:



fmin(x, y) =
  max(x, y),  if x, y ≥ 0,
  min(x, y),  if x, y ≤ 0 (excluding x = y = 0),
  x + y,  otherwise.  (15)

In the above, the possibility distribution of Case 3 given by (14a) and (14b) does not satisfy the condition that the maximum of a possibility distribution should be one. Thus, by normalizing the distribution we get π(h ‖ ex,ey) = (1 + y) / (1 − min(|x|, |y|)) and π(¬h ‖ ex,ey) = (1 − x) / (1 − min(|x|, |y|)), since max(1 + y, 1 − x) = 1 − min(|x|, |y|) when x > 0 > y. In this case, the combination is given by the following:

fmin-N(x, y) =
  max(x, y),  if x, y ≥ 0,
  min(x, y),  if x, y ≤ 0 (excluding x = y = 0),
  (x + y) / (1 − min(|x|, |y|)),  otherwise.  (16)
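Eqs. (15) and (16) can be written down directly; the sketch below is our own illustration (function names ours):

```python
def f_min(x: float, y: float) -> float:
    """Combination with max/min operations, eq. (15)."""
    if x >= 0 and y >= 0:
        return max(x, y)
    if x <= 0 and y <= 0:
        return min(x, y)
    return x + y                                  # opposite signs

def f_min_n(x: float, y: float) -> float:
    """Eq. (16): as f_min, but with the Case-3 distribution normalized."""
    if x >= 0 and y >= 0:
        return max(x, y)
    if x <= 0 and y <= 0:
        return min(x, y)
    return (x + y) / (1 - min(abs(x), abs(y)))
```

These reproduce, e.g., the first steps of Table 2: f_min(0.3, -0.5) = -0.2 and f_min_n(0.3, -0.5) ≈ -0.29.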

B. Combination with Algebraic Sum and Product

In the previous section, max and min were used for the possibilistic calculation of logical sum and product of events, respectively. These are the standard operations in Possibility theory, but there are cases where other operations are used. This section applies another set of popular operations, the algebraic sum and product, instead. Note that the conditional independence and non-interactivity assumed in the previous section should be replaced by probability-like conditional independence in this case.

Now we replace the max and min operations in eqs. (12), (13) and (14) by the algebraic sum and product, respectively.

(1) Case 1: x, y ≥ 0.
π(h ‖ ex,ey) = π(h ‖ ex) + π(h ‖ ey) − π(h ‖ ex)·π(h ‖ ey) = 1.0.
π(¬h ‖ ex,ey) = π(¬h ‖ ex)·π(¬h ‖ ey) = (1 − x)(1 − y).
Cf(h, ex∧ey) = 1 − (1 − x)(1 − y) = x + y − xy.

(2) Case 2: x, y ≤ 0 (excluding x = y = 0).
π(k ‖ ex,ey) = π(k ‖ ex) + π(k ‖ ey) − π(k ‖ ex)·π(k ‖ ey) = 1.0.
π(¬k ‖ ex,ey) = π(¬k ‖ ex)·π(¬k ‖ ey) = (1 + x)(1 + y).
Cf(h, ex∧ey) = (1 + x)(1 + y) − 1 = x + y + xy.

(3) Case 3: x > 0 > y (or x < 0 < y).
π(h∧¬k ‖ ex,ey) = π(h ‖ ex)·π(¬k ‖ ey) = 1 + y.
π(¬h∧k ‖ ex,ey) = π(¬h ‖ ex)·π(k ‖ ey) = 1 − x.
Cf(h, ex∧ey) = (1 + y) − (1 − x) = x + y.
When x < 0 < y in Case 3, ex and ey should be exchanged, as well as x and y, in the above equations. Summing up the above three cases gives us the next combination rule of CFs:

falg(x, y) =
  x + y − xy,  if x, y ≥ 0,
  x + y + xy,  if x, y ≤ 0 (excluding x = y = 0),
  x + y,  otherwise.  (17)

If we normalize the possibility distribution in the case of x > 0 > y or x < 0 < y, we get the following combination rule.

falg-N(x, y) =
  x + y − xy,  if x, y ≥ 0,
  x + y + xy,  if x, y ≤ 0 (excluding x = y = 0),
  (x + y) / (1 − min(|x|, |y|)),  otherwise.  (18)

This is exactly the same as eq. (2). It means that the combination rule of CFs, which has long been criticized as not being theoretically sound, can be justified through an interpretation with Possibility theory and causation events, together with some assumptions of conditional independence among causation events given their evidence.
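For reference, eqs. (17) and (18) in code (our sketch; function names ours); the associativity of falg-N, discussed below, can be spot-checked numerically:

```python
def f_alg(x: float, y: float) -> float:
    """Combination with algebraic sum/product, eq. (17)."""
    if x >= 0 and y >= 0:
        return x + y - x * y
    if x <= 0 and y <= 0:
        return x + y + x * y
    return x + y

def f_alg_n(x: float, y: float) -> float:
    """Eq. (18), identical to MYCIN's rule, eq. (2)."""
    if x >= 0 and y >= 0:
        return x + y - x * y
    if x <= 0 and y <= 0:
        return x + y + x * y
    return (x + y) / (1 - min(abs(x), abs(y)))

# associativity spot check: holds for f_alg_n, fails for f_alg
a, b, c = 0.3, -0.5, 0.8
assert abs(f_alg_n(f_alg_n(a, b), c) - f_alg_n(a, f_alg_n(b, c))) < 1e-9
assert abs(f_alg(f_alg(a, b), c) - f_alg(a, f_alg(b, c))) > 0.01
```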

C. Some Properties and Discussion

The previous sections proposed four different equations to combine multiple CFs derived from distinct pieces of evidence, interpreting them with Possibility theory: one pair uses max/min operations and the other uses algebraic sum/product operations for the possibility calculation, each with and without normalization of the possibility distribution. From the viewpoint of Possibility theory, the equations with max/min operations might be preferable, because they are the standard operations of Possibility theory. A merit of using max/min operations is that the independence requirement between ex and ey becomes moderate compared with the case of algebraic sum/product, because the possibility becomes an ordinal scale. No independence is required for the max operation in the possibility calculation.

Looking at the mathematical properties of the combination rules, all of them satisfy commutativity and continuity, but none is idempotent. fmin and fmin-N are monotonically non-decreasing, and falg and falg-N are monotonically increasing. As to associativity, only falg-N, equivalent to MYCIN's combination, satisfies it. If we give much weight to associativity, which lets us combine multiple CFs in any order, falg-N should be judged the most practical.

The assumption that all four combinations rely on is the conditional independence and non-interactivity of causation events given their evidence, though the mathematical details are slightly different between the cases of max/min operations and algebraic sum/product operations as mentioned above. This assumption could be a constraint of applicability, because in some applications evidence ex and ey may have a synergy effect on h, and some unknown evidence that affects h may be present. However, as long as CF (and possibility) represents the epistemic uncertainty in human knowledge, this might be a realistic constraint considering the bounded rationality of humans. Complex conditional effects with many possible pieces of evidence might be beyond the cognitive capability of humans. In addition, it is widely known that Naive Bayes classifiers with similar assumptions show satisfactory results even in the case of applications that do not satisfy them.

V. NUMERICAL EXAMPLES

Suppose we have five pieces of uncertain evidence regarding whether a news article is true or fake. The CFs are given in the order in which we obtained the evidence: x1=0.3, x2=-0.5, x3=0.8, x4=0.4, and x5=-0.7. Table 2 shows the combination results in this order using the combination rules of eqs. (15) to (18). For example, each entry (e.g. -0.20) in the fmin row is the result of combining the value in the column to its left (0.3) with the xi (-0.5) in its own column. The values in the first column are the combination results of x1=0.3 with 0. Table 3 shows the results when the CFs are combined in descending order, and Table 4 in ascending order.
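As a concrete check, the four rules can be sketched from the possibility interpretation developed earlier: a CF x is mapped to the possibility pair (π(h|e), π(¬h|e)) = (min(1, 1+x), min(1, 1-x)); pairs are combined componentwise by min or by algebraic product; the -N variants renormalize so that the larger possibility is 1; and the combined pair is mapped back by CF = π(h|e) - π(¬h|e). This mapping and all function names are my reconstruction inferred from the tables, not code from the paper; the printed rows reproduce Table 2.

```python
import operator

def to_pair(cf):
    # CF x -> possibility pair (pi(h|e), pi(not-h|e)) = (min(1, 1+x), min(1, 1-x))
    return min(1.0, 1.0 + cf), min(1.0, 1.0 - cf)

def make_rule(op, normalize):
    def rule(cf_x, cf_y):
        (ph1, pn1), (ph2, pn2) = to_pair(cf_x), to_pair(cf_y)
        ph, pn = op(ph1, ph2), op(pn1, pn2)  # componentwise min or product
        if normalize:                        # renormalize: largest possibility = 1
            m = max(ph, pn)
            ph, pn = ph / m, pn / m
        return ph - pn                       # CF = pi(h|e) - pi(not-h|e)
    return rule

rules = {
    "fmin":   make_rule(min, False),
    "fmin-N": make_rule(min, True),
    "falg":   make_rule(operator.mul, False),
    "falg-N": make_rule(operator.mul, True),
}

cfs = [0.3, -0.5, 0.8, 0.4, -0.7]  # x1..x5 from the example
results = {}
for name, f in rules.items():
    acc, row = 0.0, []
    for x in cfs:
        acc = f(acc, x)          # combine with the next CF, left to right
        row.append(round(acc, 2))
    results[name] = row
    print(name, row)
```

Feeding the same loop the descending and ascending sequences of Tables 3 and 4 reproduces those rows as well.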



Table 2. Combination Results

Orders i |    1  |    2  |    3  |    4  |    5
xi       |  0.3  | -0.5  |  0.8  |  0.4  | -0.7
fmin     |  0.30 | -0.20 |  0.60 |  0.60 | -0.10
fmin-N   |  0.30 | -0.29 |  0.72 |  0.72 |  0.07
falg     |  0.30 | -0.20 |  0.60 |  0.76 |  0.06
falg-N   |  0.30 | -0.29 |  0.72 |  0.83 |  0.44

Table 3. Combination Results in Descending Order

Orders i |    1  |    2  |    3  |    4  |    5
xi       |  0.8  |  0.4  |  0.3  | -0.5  | -0.7
fmin     |  0.80 |  0.80 |  0.80 |  0.30 | -0.40
fmin-N   |  0.80 |  0.80 |  0.80 |  0.60 | -0.25
falg     |  0.80 |  0.88 |  0.92 |  0.42 | -0.28
falg-N   |  0.80 |  0.88 |  0.92 |  0.83 |  0.44

Table 4. Combination Results in Ascending Order

Orders i |    1  |    2  |    3  |    4  |    5
xi       | -0.7  | -0.5  |  0.3  |  0.4  |  0.8
fmin     | -0.70 | -0.70 | -0.40 |  0.00 |  0.80
fmin-N   | -0.70 | -0.70 | -0.57 | -0.29 |  0.72
falg     | -0.70 | -0.85 | -0.55 | -0.15 |  0.65
falg-N   | -0.70 | -0.85 | -0.79 | -0.64 |  0.44

As shown in the tables, only falg-N yields the same final result (the value in the last column) in all three orders, thanks to its associativity; the other rules do not. What should be noted is that the final results of fmin, fmin-N and falg are smaller than that of falg-N in the descending order (Table 3) and larger in the ascending order (Table 4). This shows that, for the non-associative rules fmin, fmin-N and falg, values combined later have a larger effect than values combined earlier.
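This order sensitivity can be checked with pairwise closed forms. For falg-N the form is MYCIN's well-known function; for fmin, a pairwise form consistent with the fmin rows of Tables 2-4 takes the signed maximum of the absolute values when the signs agree and the sum otherwise (this fmin closed form is my reading of the tables, not an equation quoted from the paper):

```python
from functools import reduce

def f_alg_n(x, y):
    """MYCIN's combining function (= falg-N)."""
    if x >= 0 and y >= 0:
        return x + y - x * y
    if x <= 0 and y <= 0:
        return x + y + x * y
    return (x + y) / (1 - min(abs(x), abs(y)))

def f_min(x, y):
    """Pairwise form consistent with the fmin rows of Tables 2-4."""
    if x >= 0 and y >= 0:
        return max(x, y)          # same sign: signed maximum of absolute values
    if x <= 0 and y <= 0:
        return min(x, y)
    return x + y                  # opposite signs: sum

given = [0.3, -0.5, 0.8, 0.4, -0.7]
orders = {
    "given":      given,
    "descending": sorted(given, reverse=True),
    "ascending":  sorted(given),
}
for name, seq in orders.items():
    final_alg_n = reduce(f_alg_n, seq, 0.0)  # left-to-right combination
    final_min = reduce(f_min, seq, 0.0)
    print(name, round(final_alg_n, 2), round(final_min, 2))
```

falg-N yields 0.44 in every order, while fmin gives -0.10, -0.40 and 0.80 for the given, descending and ascending orders, respectively.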

The property of associativity is convenient in real applications, and this could be one of the reasons why the combination rule with this property is regarded as practical. However, from the viewpoint of epistemic or cognitive uncertainty, the combinations fmin, fmin-N and falg may resemble human judgment more closely than falg-N does, in the sense of oblivion: more recently acquired evidence weighs more heavily.

VI. CONCLUSIONS

Certainty Factor was first used in MYCIN, developed in the 1970s, and became popular for representing uncertainty in human experts' knowledge because of its simplicity and practicality. However, due to the difficulty of finding a sound interpretation with probability, it was long criticized by theoreticians. Now that Bayesian networks [18] have become a major tool and statistical machine learning has become the mainstream of AI, the world of uncertainty has been governed by probability theory. One of the differences between the intelligence of knowledge systems and that of current statistical machine learning systems seems to lie in the way they deal with uncertainty: the former deals with heuristic knowledge that humans have, and the uncertainty contained in that knowledge is epistemic, while the latter derives patterns or functions from data, and the uncertainty contained in data is objective. Recalling that Fuzzy logic, Possibility theory and Certainty Factor were all devised to deal with epistemic knowledge in a world where probability was already the major theory of uncertainty, the attempts to interpret CF with probability theory seem to have been unreasonable. This paper examined and built a sound interpretation of CFs with Possibility theory and causation events, in which the definition of CFs and the combination rules were examined very carefully and integrated with no logical gap. Thus we can use MYCIN's simple and easy combination rule without the excuse "... though it is not sound theoretically."

REFERENCES

[1] J. L. Marichal, "Aggregation functions for decision making," in Decision-making process: concepts and methods, D. Bouyssou et al., Eds. ISTE Ltd., 2010, pp. 673-721.

[2] C. Martin, E. Schaffernicht, A. Scheidig, and H.-M. Gross, "Sensor fusion using a probabilistic aggregation scheme for people detection and tracking," ECMR 2005, Ancona, Italy, 2005, pp. 176-181.

[3] S. Marrara, G. Pasi, and M. Viviani, "Aggregation operators in information retrieval," Fuzzy Sets and Systems, vol. 324, 2017, pp. 3-19.

[4] D. Dubois and H. Prade, "A review of fuzzy set aggregation connectives," Information Sciences, vol. 36, 1985, pp. 85-121.

[5] D. Dubois and H. Prade, "On the use of aggregation operations in information fusion processes," Fuzzy Sets and Systems, vol. 142, 2004, pp. 143-161.

[6] A. K. Tsadiras and K. G. Margaritis, "The MYCIN certainty factor handling function as uninorm operator and its use as a threshold function in artificial neurons," Fuzzy Sets and Systems, vol. 93, 1998, pp. 263-274.

[7] B. G. Buchanan and E. H. Shortliffe, Rule-based expert systems - the MYCIN experiments of the Stanford Heuristic Programming Project, MA, Addison-Wesley, 1984.

[8] E. H. Shortliffe and B. G. Buchanan, "A model of inexact reasoning in medicine," Mathematical Biosciences, vol. 23, Issues 3–4, 1975, pp. 351-379.

[9] D. Heckerman, "Probabilistic interpretations for MYCIN's Certainty Factors," Machine Intelligence and Pattern Recognition, vol. 4, 1986, pp. 167-196.

[10] P. Hajek, "Combining functions for certainty degrees in consulting systems," Int. J. Man-Machine Studies, vol. 22, 1985, pp. 59-76.

[11] G.P. Amaya Cruz and G. Beliakov, "On the interpretation of certainty factors in expert systems," Artificial Intelligence in Medicine, vol. 8, 1996, pp. 1-14.

[12] D. Dubois and H. Prade, Possibility Theory, Plenum Pub., 1988.

[13] Y. Peng and J. A. Reggia, "A probabilistic causal model for diagnostic problem solving—Part I: integrating symbolic causal inference with numeric probabilistic inference," IEEE Trans. Systems, Man, Cybernet., vol. 17, 1987, pp. 146-162.

[14] Y. Peng and J. A. Reggia, Abductive inference models for diagnostic problem-solving, Springer-Verlag, 1990.

[15] K. Yamada, "Possibilistic causality consistency problem based on asymmetrically-valued causal model," Fuzzy Sets and Systems, vol. 132, 2002, pp. 33-48.

[16] K. Yamada, "Diagnosis under compound effects and multiple causes by means of the conditional causal possibility approach," Fuzzy Sets and Systems, vol. 145, 2004, pp. 183-212.

[17] L. A. Zadeh, "Fuzzy sets as a basis for a theory of possibility," Fuzzy Sets and Systems, vol. 1, 1978, pp. 3-28.

[18] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann Pub. 1988.
