An inductive learning procedure to identify fuzzy systems

Fuzzy Sets and Systems 55 (1993) 121-132 121 North-Holland

An inductive learning procedure to identify fuzzy systems* Migue l D e l g a d o and A n t o n i o G o n z a l e z Departamento de Ciencias de la Computaci6n e lnteligencia Artificial, Universidad de Granada, 18071, Granada, Spain

Received May 1992 Revised July 1992

Abstract: In many real cases, a system may not be described by a functional input-output description and the only way to describe it is through a finite set of fuzzy rules. Even in some cases where exact identification is possible, looking for that rule description may be useful for the sake of ease and effectiveness. The automatic identification of a system through the analysis of an input-output data set is the aim of this work. In order to identify a system from raw data, obtained in repeated instances of its own system, the frequency of appearance of some kind of patterns may be particularly interesting. Nevertheless, when the base domain is fuzzy and the data can be crisp, statistical methods are not very appropriate and it is necessary to study a new model for frequencies on fuzzy domains. The proposed model for identifying systems is based on this definition.

Kevwords: Automatic learning; system identilication; fuzzy domains; frequency on fuzzy domains; evidence theory.

1. Introduction

Any system can be described through the existing relation between its input variables and its output variables. In order to identify such relation, a functional input-output description may be available, but in the case of many complex processes this is not feasible and we need to look for alternative methods. The use of fuzzy models, and more particularly those described through fuzzy rules, has been shown to be successful.

A basic problem associated with many systems is that of the knowledge acquisition or knowledge elicitation, that is, the transfer of knowledge from some source into the knowledge base. For many systems the usual way of knowledge acquisition has been the application of techniques for eliciting data from experts, but now an interesting alternative is appearing which consists in applying automatic techniques for the identification of fuzzy systems. Such techniques greatly simplify the design of fuzzy logic control or expert systems and add a capability for adaptation. Several approaches for that identification have been proposed in literature. For example, Pedrycz [12] used fuzzy discretization and clustering techniques to identify fuzzy relational systems. Other authors, e.g. Takagi and Sugeno [17] and Sugeno and Tanaka [16], investigate the identification of a system where the consequent of the fuzzy rule is a linear input-output relation. The use of the matrix product to express the induced degree of uncertainty for generating linguistic rules had been proposed by Binaghi [1] and Peng and Wang [13]. Lee proposed [8] a self-learning controller where the learning capability is provided by neuron-like elements, which are derived from animal conditioning theory. Sestito and Dillon [14] proposed a method which automatically allows us to extract production rules by single-layered networks trained by Hebb's rule. Finally, Nomura et al. presented in [10] a learning of fuzzy rules based on the descent method.

We are interested in the identification problem for systems that can be described through fuzzy rules.

Correspondence to: Dr. A. Gonzalez, Departamento de Ciencias dc la Computaci6n e Inteligencia Artificial, Universidad de Granada, 18071, Granada, Spain. e-mail: [email protected].

* This work has been supported by the CICYT under Project TIC92-0665.

0165-0114/93/$06.00 © 1993--Elsevier Science Publishers B.V. All rights reserved

122 M. Delgado, A. Gonzalez / An inductive learning procedure to identify fuzzy systems

In particular, we are interested in investigating the use of frequencies in the identification of a system, and in the use of uncertainty approaches to represent the current belief of a fuzzy rule.

Basically, the problem consists in finding a finite set of fuzzy rules able to reproduce the input-output system behaviour

R]: if X1 isAl] a n d . . , and X,, isAn] then Y1 is B1] and . . , and Ym is Bmj (1)

where R] (j = 1 . . . . , p ) denotes the j-th rule of implication, p is the number of fuzzy rules, Aii and Bq are fuzzy sets, Xi are the input variables, and Y~ are the output variable.

This identifiction problem has two steps: structure identification and parameter identification. The identification of the structure of a system consists in finding the input variables which affect the outputs. In this work, we do not consider this problem. The aim of this work is that of parameter identification, that is, to find the antecedent and consequent values in each rule already obtained by a structure identification.

Let us suppose we know, from the previous process, that a certain rule contains the variables Xi, i = 1 , . . . , n, and Yk, k = 1 . . . . . m, being the input and output variables respectively. As usual, in the following we will assume that the fuzzy sets Aq (values of Xi in the rules) and Bkj (values of Yk in the rules) are to be taken from fixed sets of labels

Ls={L~l , Ls2 . . . . . LsN,}, s = l , 2 . . . . ,n, and Tr={Trl, Tr2 . . . . . TrN,} , r = l , 2 . . . . . m,

respectively. Under these hypotheses, the set of all possible rules associated with the disclosed structure is just the

Cartesian product

.qt =L, X L 2 X ' " • x L , X T 1 X T 2 X " " • X T m (2)

and the identification of a system occurs by selecting the subset of 91 composed by the 'true rules'. The property 'to be a true rule' for a system may be considered as a crisp (Boolean) one. From our

point of view and according to the nature of the problem, each possible rule in 9t will fulfill that property with a degree (which may be 0). This degree may be interpreted as the strength of the implication and taken as a simple weight of the fuzzy rule. However, in this work we consider that the identification is to be formulated in terms of associating with each possible rule some kind of belief or uncertainty measure about the statement "this rule is a true rule for the system". This uncertainty will be assigned to each rule by means of the concrete examples available for the learning procedure:

example file ~ I LEARNING PROCEDURE I ~ uncertainty measure on 9t.

Obviously two tasks are to be done: (1) To select the measure to be used. (2) To establish a procedure to compute it for each rule from raw data. There are several models for representing the uncertain information, such as probability theory,

possibility theory [7, 18] and Dempster-Shafer's theory of belief functions [6, 15]. In order to identify a system from raw data, obtained in repeated instances of its own system, the frequency of appearance of some kind of pattern seems to be particularly appropriate. Thus, from our point of view, the Bayesian methods could be very useful for solving this problem, and we will use the concept of frequency on fuzzy domain we have developed in an earlier paper [5].

The paper is divided into four sections. In Section 2 we summarize the result about the frequency on fuzzy domains which will be needed. This concept is the base for the learning model shown in Section 3. Finally, Section 4 shows some examples that illustrate the use of the presented learning model.

2. The frequency on fuzzy domains

Let us suppose we have a finite domain D = { a l , a 2 . . . . . aN} of a variable X and a sequence Eh = {el, e2 . . . . . eh} of h observations about X. There is no problem in calculating the (crisp)

M. Delgado, A. Gonzalez / An inductive learning procedure to identify fuzzy systems

t level 1 Negat i ve

level 2 Sm 11 Med um H i g h

level 3 fuzzy sets on e

level 4 n e g a t i v e rea l numbers

Zero

Zero

fuzzy se t s on [-1,11 fuzzy r I I

[- 1,1] pos l t i v e

P o s i t i v e

Sm! l ~ - ~ ! i u m ~ H , gh

sets on IR + !

real numbers

Fig. 1. Hierarchy of labels.

123

frequency of xi when e/is a value of D. But, if we consider D as a domain composed of fuzzy sets representing labels and Eh composed of elements belonging to either D or any domain related with D, computing the frequency is not straightforward.

Let us remember an example from [5]. Consider the following hierarchical structure of (linguistic) values or labels (where ~+ and ~ denote the positive and negative real numbers respectively) given in Figure 1.

Let us assume a variable X taking its values on the domain D is equal to the level 2 of this tree, i.e.

D = {HN, MN, SN, Z, SP, MP, HP}, (3)

where MN = Medium_Negative, HN = High_Negative, SN = Small_Negative, Z = Zero, SP = Small_ Positive, M P = Medium_Positive, H P = High_Positive, and suppose the example set E~ contains elements from any level of the hierarchy.

This situation may appear in those problems where a concrete level for the rules is fixed, but the data can be obtained in many different ways, from experts (usually in terms of fuzzy labels) or by the simulation of the systems (as real numbers). Any domain being related with D in a hierarchy as that given in Figure 1 will be called domain coherent with D in the following. For example, level 3 which is composed of many fuzzy subsets on the real line is coherent with the domain given by level 2 (Figure 2).

Because classical frequencies are quite inappropriate to deal with system under the aforementioned circumstances, in [5] we introduced the following extended definition of frequency.

Defini t ion 2.1. Let D = {al, a2 . . . . . ax} be the domain of a variable X, where ai are fuzzy sets in a referential set U. Let Eh = {e~, e2 . . . . . eh} be a sequence of examples of the variable X obtained through some kind of sampling or trial process and taking values on D or any coherent domain with D.

HN MN SN Z SP MP HP

- 5 0 - 2 5 - 1 0 0 10 2 5 50

Fig. 2. Domain at level 2.

1 2 4 M. Delgado, A. Gonzalez / An inductive learning procedure to identify fuzzy systems

The frequency of any subset A of D through the set Eh will be

fh(A) = /TA(e;)

h

i fA =0,

otherwise, (4)

where YIz(e;) = SHpa~A II.(ej) and H.(e;) is a normalized non-negative compatibility degree between the D-element a and the example e#

The compatibility degree/7.( . ) can be defined in several ways, for example

H.(ej) = Z.(ej)/sup,,~D Z.(ej) (5)

with rc.(ej)= sup,, {/x.(u)*/x~j(u)}, where /z.(.) and /Xej(') are the membership functions of a and e; respectively and * is a t-norm. When e; is a crisp example then z.(e;) coincides with/x.(e;).

With the above formulation Ha@i) can be interpreted as the possibility measure of A given the evidence ej, that is,

HA(ej) = Poss(A l ej). (6)

Therefore the frequency defined by (4) is the average of all possibility measures of a success A given each example of a example set Eh.

Obviously, we must accept that all the examples are covered by the domain D, that is

sup II,(ej) > 0, j = 1 . . . . . h; (7) a e D

in other cases, if this condition is not verified, the compatibility degree definition does not make sense. The following results hold (see [5] for proofs), for the frequency fh(').

Proposition 2.1. (1) ~.EDfi , (a) >>- 1; (2) fh(A) ~>0VA c_D; (3) = 1; (4) L(A)<~fi,(B) irA =_B; (5) fi,(A U B) + fi,(A f3 B) <~fh(A) + fh(B) VA, B c_D.

Proposition 2.2. The frequency fi, is a plausibility function.

Since fi, is a plausibility, its dual measure is defined by

gh(A) = 1 -- j~(J,) = 1 -- ~ SUpa~A H.(ej) (8) j l h

is a belief and gh(A)<~fh(A)VA~_D. Thus, for any subset A of the domain D, the interval [gh(A), fi,(A)] measures the uncertainty about A that lies in the raw data set:

Example set Eh ~ [frequency calculus[ belief of Z [gh(A), fh(Z)].

Theorem. The pair of dual measures (gh, fi,) is a belief-plausibility pair in the sense of Dempster- Shafer's theory.

This result provides a theoretical framework for frequencies on fuzzy domains that allows us to consider them as an appropriate generalization of the classical ones. When the domain and the

M. Delgado, A. Gonzalez / A n inductive learning procedure to identify f u z zy systems 125

examples are at the same level, then fh is a classical frequency which induces a probability measure, a particular case of plausibility function such as fh =- gh, and the uncertainty is assessed by a single real value. Otherwise, when the data values are on a different level to the domain set, we obtain a frequency whose associated measure model is a general belief-plausibility pair and the uncertainty is assessed by means of a real interval.

Once we have a frequency concept on fuzzy domains available and when we have identified the rules as points in a Cartesian product, we are interested in extending the concept of frequency to multidimensional spaces. On the other hand, this extension is needed to define the conditioning operator, a basic tool to update information which constitutes the core of our learning procedure.

In short, we are interested in obtaining the frequency of a vector of fuzzy subsets of the base domain, from a set of examples, also vectors, i.e. points in the Cartesian product of certain domains being coherent with the base one. The following definitions were proposed in 15].

Definition 2.2. Let D = D~ × D 2 X • - • X D M be the Cartesian product of domains Dj = {au, a2j , . . . , a%j},

where each domain Dj corresponds to a variable Xj and aij is a fuzzy set on a referential set U/. Let Eh ={e~, e2 . . . . . eh} be a sequence of examples of the multidimensional variable (X~, X2 . . . . . XM) obtained through some sampling or trial process and taking values on D = DI × D2 × • • • x D~I or any coherent domain with D (that is, the Cartesian product of domains being coherent with each Di). We define the f r e q u e n c y of any B = (B~, B2 . . . . . BM), with Bj ~_ Dj, through the set E~ as

f i , (B, . . . . . BM) -- (9) h

where ej = (e U . . . . . eMj) , j = 1 . . . . , h, and *i stands for the repeated application of * over i, * being a t-norm representing the Cartesian product operator.

One of the most important tasks in any learning procedure is that of updating information. The basic tool to update information is the conditioning. Since fh is a upper measure we can use the different conditional definitions proposed in literature. First, in the theory of evidence, there is a direct generalization of the conditioning in probability theory given by Dempster [6]. According to it the conditional plausibility and belief are

PID(A I B) PI(A, B) PI(B) - PI(~A, B) - P I ( B ~ ' Be lD(AIB)= (10)

PI(B)

This formula has been widely used, but there are other possible alternatives (see [9] for details), as for example

PI(A, B) BeI(A, B) P1u(A I B) - PI(A, B) + Bel(.,~, B ) ' Bel t jA I B) - Bel(A, B) + PI(/[, B)" (11)

In the next section we will use the conditioning given by (10), but nevertheless, if we change the conditioned plausibility definition then a similar result could be obtained.

3. The learning model

Once the frequency on fuzzy domain has been defined and included within the theory of evidence, we can use this concept to build a learning model.


3.1. De f in i t ion o f the m o d e l

As we stated in the introduction, the problem we are trying to solve consists in parameter identification (after the structure identification) of a set of rules, a problem which consists in computing for each possible rule, the degree of that rule being a 'true rule'. Let us note that the structure identification, which can be solved by using the causal network theory [11], among other alternatives, and the definition of frequency on fuzzy domains, will not be studied here.

Thus, let us suppose we have identified the rules to be like (1), and that from a set of examples, we are interested in estimating for any possible rule with this structure a parameter representing the power (degree of truth) of the rule. In this work we interpret this parameter as an uncertainty degree obtained from the evidence which lies in the data.

In order to simplify the description of the learning procedure we will consider that the rules have both a n-dimensional antecedent and an m-dimensional consequent for which the following notation will be used

Rij =- if X is A~ then Y is B] [OLi] ., •i]] (12)

with X = (X1, Xz . . . . , An), Y = (Y1, Y2, . . . , Ym), A~ = ( A l i , Az~, . . . , Ani), B = (BIj , B2j . . . . . Bmj). Let us suppose the example sequence is (as in former sections) Eh ={e~, e 2 , . . . , eh}, where

ek = (eXlk, eX2k . . . . . exnk, eylk, ey2k . . . . , eymk), ] = 1, 2 . . . . . h, and eXrk and eysk are examples of X r and Ys respectively. According to our basic assumptions, eXrk and eysk belong to a coherent domain with Lr and Ts respectively. Now the example ek will be written by ek =(exk, eyk) with eXk = (eXxk, eX2k . . . . . eXnk) and eyk = (eylk, ey2k, • • •, eymk).

The calculus of the frequency needs to use the compatibility degree between the value of a variable and an example. By using Definition 2.2 we take the following compatibilities:

HAi(eXk) = * HAs,(eXsk) and Hsj(eyk) = * HB~j(eysk) (13)

with * a t-norm. Taking into account our previous developments it seems obvious to think about employing

conditional frequencies to assess [a~j,/3ij], but let us remember frequencies on fuzzy domains belong to the plausibility-belief pair model and thus by using them we are obtaining an interval assessment for ~//. Let us consider

[3ij = fh(Bj ] Ai) (14) o~,j : g ~ ( B j I A,),

where

Y~k HA,(eXk) * H~(eyk) ~,k l l z , (eXk) * HBj(eyk) gh(Bj ]Ai) = 1 Ek Hz, (eXk) ' fh(Bj I Ai) . . . . . . . . . . ~k Hz,(eXk) (15)

We are now interested in obtaining an interpretation of this model and in explaining the exact kind of calculus we need to do.

3.2. In trepre ta t ion a n d calculus

The interval [aij,/3ij] assesses the information contained in the examples about the event "the rule Rij is a true rule for the system". A direct interpretation of that assessment is the usual one in a lower-upper measure model, that is, aij and /3/j are respectively the lower and upper bound of the evidence contained in the set of examples about the aforementioned property. A deeper analysis of these bounds provides a more significant interpretation. For that we will introduce the measurement of the 'adequateness' between a certain example and a rule, that is an assessment of the ability of that example to match with the rule or equivalently the ability of that rule to describe the given example.

M. Delgado, A. Gonzalez I An inductive learning procedure to identify fuzzy systems 127

Obviously, HA,(exk) and HB,(eyk) respectively measure the 'adequateness' between the example ek and the antecedent and the consequent of the rule R 0 respectively. On the other hand, Hg(eyk) measures the 'adequateness' between the example ek and the complement of the consequent of the rule R~j which can be interpreted as the best adjustment between the example and any other consequent different from Bj. From these values we can build the whole learning model and interpret it.

First of all, let us remark that an example has an effect on the rule R 0' in the sense of contributing to the corresponding interval assessment [aij,/3~j], iff I1A,(exk) > 0 because in this case this ek matches to some extent the antecedent of R~j. Now by using HB,(eyk) and //~j(eyk) we can split the examples having effect on R~j into two categories:

Positive examples: those such as HA,(exk)> 0 and HBj(eyk)> 0 (they match with both the antecedent and the consequent of the rule),

Negative examples: those such as IIz,(eXk)> 0 and HB,(eyk)= 0 (they match with the antecedent of R~ but do not match with its consequent, in fact they match with some element in the complement of the rule, since it can be easily proved that if Hs~(eyk) = 0 then Hg(eyk) = 1).

Although this classification could help to decide which rules are to represent the data, the above definitions are over simplified as the adequacy is considered to be a crisp property and thus it is not possible to distinguish between, for example, very positive examples to the rule (e.g. H~,(exk) = 1 and HBj(eyk) = 1) and lightly positive examples (e.g. Hz,(eXk) = 0.2 and HB~(ey~) = 0.001). The alternative is to consider the adequacy to be a vague property and thus to obtain the positive and negative example sets as fuzzy sets.

Definition 3.1. The set of examples that have an effect on the rule R~j is the fuzzy subset of Eh characterized by

E(Rij) = {(ek, HA,(exk)) [ e, E Eh}.

Definition 3.2. The set of the positive examples to R 0 is the following fuzzy subset set of Eh:

E+(Rij) = {(ek, HA,(exk) * HB,(eyk)) ] ek ~ Eh}.

Definition 3.3. The negative example set for R 0 is defined as the following fuzzy sets of Eh:

E (Rij) = {(ek, Hz,(ex~) * HB,(eyk)) ] ek E Eh}.

An example ej such as Hz,(exk)= 1 and HB,(eyk)= 1 has a membership of 1 to E+(Ri), and an example ej such as HA,(eXk)= 0.2 and HB,(eyk)= 0.001 has a membership of 0 to E+(Ri) if the Lukasiewicz t-norm is taken.

An example is considered negative for a rule when it matches better with some other rule having the same antecedent but different consequent. It is easy to show that an example ek such as Hz,(exk) = 1 and HBj(eyk) = 0 belongs to E-(Ri) with membership degree 1, that is, ej ought to match with some rule with the same antecedent and different consequent.

In the general case, the relation between HBj(ey~) and HB,(eyk) is very complex, but in the usual case of consequent with a single variable this relation is very significant, and it can be easily proved that max{HBj(eyk), //g(eyk)} = 1, and hence the following result is straightforward

max{HA,(exk) * HB,(eyk), Hz,(exk) * / /g (eyk)} = Hzi(ex~). (16)

Therefore if the max t-conorm is used for defining the set union, then the following equality holds:

E(Ri) ~- E+(Ri) tO E-(Ri). (17)

From the above subsets of Eh we can interpret the interval assessment for each rule given by (14). Let us denote ]A I = ~ , E v t-tA(U), as the cardinality of a fuzzy subset. It is obvious that the quotient

IE+(R,j)I/IE(&~)I (18)


represents the proportion of positive examples (among those having some effect on the rule) and, on the other hand it is easy to show that it coincides with /3q. Therefore the upper assessment of uncertainty to the rule quantifies the proportion of positive examples.

Similarly the quotient

IE-(Ks)I/IE(Rq)I (19)

represents the proportion of negative examples and now it coincides with

fh(Bj /Ai) (20 )

which measures the global compatibility of the examples with the labels A~ = ( A l l , . . . , A . i ) and Bj = (B v, B2j . . . . . Bmj), and thus it assesses to what extent Rq is rejected (negatively supported) by the examples. According to (8),

1 - a q = 1 --gh(Bj I A i ) = f h ( B j IA,). (21)

Therefore, the interval [aq, ~q] has the following interpretation: each rule is affected by the fuzzy proportion of the positive examples/3q (that is, examples adequate to the rule in a certain degree) and the fuzzy percentage of non negative examples c~q (that is, one minus the percentage of examples adequate to the complement of the rule).

For instance, the interval [0.73, 0.93] indicates that 27% of the examples which affect the rule increase our certainty about other rules with the same antecedent and different consequent, and the 93% completely represent the rule. A rule with assessment value equal to [0, 1] has 100% positive examples and 100% negative examples, that is, 100% examples do not affect the rule (we have no information about the rule) and this interval, as usual, represents the ignorance about the sentence "this rule represents the data set". In its turn, if the interval is [1, 1] the positive examples are 100% and the negative examples 0% and the rule perfectly describes the data set. The interval [0,0] represents 0% positive examples and 100% negative examples, in which case all the information relative to the rule is concentrated in other rules with the same antecedent but different consequent.

Let us observe that (18) and (19) may be adopted to directly define ~xq and/3q instead of being used to interpret these two values. On the other hand it must be remarked that Definitions 3.1, 3.2, 3.3 are not the only possible ones to characterize E(Rq), E+(Rq) and E-(Rq), as the adequateness could be assessed in several different ways.

If one uses (18) to (21) with some different characterizations of E(Rq), E+(Rq) and E (Rq), then a frequency-based learning procedure is obtained again, but a different approach to the concept of the frequency on fuzzy domains is actually used.

Once we have an interesting interpretation of the learning model, for calculating the uncertainty assessment of each rule, we can directly use formulas (18) and (19).

This learning procedure may be used either incrementally or not incrementally. In the first case the frequencies are updated when each example arrives whereas in the second the frequency is computed after all examples are collected. The two options may be integrated into a very effective procedure which may be summarized in the following steps:

(S1) Collect a large enough set of examples, ($2) Compute the evidence of the rules from this set, ($3) Update this evidence when new examples are available.

The proposed learning method is very simple to use for these kinds of processes, since step $3 (or incremental way) can be easily made. Thus, let us suppose we dispose of a certain evidence of a rule Rq obtained by processing an example set, and therefore we have the following values:

ao = fE+(Rj)], bo = IE-(Rij)l and c,,= IE(Rq)b (22)

M. Delgado, A. Gonzalez / An inductive learning procedure to identify fitzzy svstems 129

and evidence

old-evidence(Rij)=[1 bo a,,] (23) C 0 ' C 0 3"

If a new example e is considered by learning procedure, we only need to calculate the following values:

(24) x = HA,(ex), y = H/~,(ey), z - H~,(ey)

and to modify the interval assessment of uncertainty in the following way:

new-evidence(Rs~)=[1 b'2 +- x~*z,a°-+-x*Y 1. Co + X Co + X

(25)

If x = 0 then this evidence does not modify the old belief of the rule since it does not affect it and old-evidence(R0) = new-evidence(Rii). If x # 0 then the modification (increase or decrease) depends on the values y and z, which are obviously related.

The resultant model, which has combined the evidence contained in the example with the old evidence of the rule, is very simple. When we do not have any previous information, that is e is the first example considered by the learning procedure, the initial evidence can be taken as [0, 1], that is, the ignorance within the evidence theory.

Let us come back to the learning procedure. After this process, the system is described by a set of rules each being affected by an evidence interval. From a formal point of view this representation must consider all possible rules and all possible intervals. In practice, the rules with low evidence may not be considered. The threshold for that ought to be established in each case.

3.3. Using the learning procedure

Although we have introduced and interpreted the uncertainty interval in a simple case, the model is general enough to cope with a very broad spectrum of problems.

First of all let us remember that any general rule with a multidimensional variable in the antecedent and/or the consequent, may be changed in the single one we have considered only by constructing the joint possibility distribution of the multidimensional variables (by some t-norm), that is just by considering them as points of a Cartesian product space.

On the other hand, rules with disjunctions of variables in the antecedant and/or the consequent may be dealt with by using fuzzy frequencies relative to a set of labels. For instance, for the rule "if X is A or B then Y is C" we will use [gh(C I {A, B}), fh(C I {14, B}) I.

By combining the described use of Cartesian products and the set of labels, we may manage rules with conjunctions and disjunctions in the antecedent and/or the consequent. For instance for "if X is A or B and Y is C then Z is D or F" we will use [gh({D, F} I {A, B}, C}), fh({D, F} I {A, B}, C})].

Moreover, by taking complete set of labels in a domain, the learning model allows us to restrict the variable numbers in a rule, for example, the rule

i f X i s A and Y i s D ~ , t h e n Z i s C

where D, is the domain of Y, has the same uncertainty assessment as

i f X i s A t h e n Z i s C .

It is obvious that any system identification is done with prediction and control purposes. Any case inference is to be made on the rules and thus the question about how to combine and propagate the interval assessment we have introduced arises at once.

We have shown that the theoretical model for frequencies on fuzzy domains is that of a plausibility-belief pair which may be included in the more general one of lower and upper probabilities. This is a very general frame of representation, including in particular cases probabilities, possibilities [7, 18], belief functions [6, 15] and Choquet capacities of order two [2, 4].


The inclusion of our frequency concept in the upper and lower probabilities model will allow us to make inference from our learned rules, to do that, we only need to use an inference model, on the basis of the approach proposed in [3].

The proposed learning method and this inference approach will be combined for testing and tuning the rule description of a system. If a large example set is available, then a part of it may be used for learning purposes (training set), whereas the remainder is reserved to check the learned rules (test set). For that an inference on the x-part of the examples and after, a comparison between the obtained and the observed y-part are to be made.

4. Examples

1. Let us suppose the domain D ={HN, MN, SN, Z, SP, MP, HP} is defined by the following trapezoidal fuzzy number (see Figure 2):

HN = (-1000, -50, 1, 25), MN = (-25, -25, 25, 15), SN = (-10, -10, 15, 10), Z -- (0, 0, 10, 10),

SP = (10, 10, 10, 15), MP = (25, 25, 15, 25), HP = (50, 1000, 25, 1).

The structure identification, from the database shown in Table 1, has chosen the following rule:

if X3 is Z and X7 is MP then X9 is HP is 3/. (26)

By using za,(ej)= sup, {/xa,(u)*/Xej(U)} as the compatibility degree, with * the Lukasiewicz t-norm, the parameter identification calculates the following values:

1 - a = fk({HN, MN, SN, Z, SP, MP}/Z, MP) = 0.27, /3 = fk(He/Z, MP) = 0.93

and the power assigned to (26) is -/= [0.73, 0.93], a result obtained by combining the adaptation of each example to the rule (Z, MP) ~ HP: [0.73, 0.93].

A simple analysis of the examples in Table 1 can explain this result. Examples el, e4, e7 and e9 can be considered positive for the rule (Z, M P ) ~ HP since the compatibility degree between each example and the antecedent and consequent of the rule is positive. Example e 3 can be considered negative for this rule since the compatibility degree with the antecedent of the rule is positive. However, the compatibility degree between e3 and the consequent of (Z, MP) ~ HP is zero. Finally, five examples, e2, es, e6, e8 and elo, have no influence on this rule since the compatibility degree between each one and the antecedent of the rule is zero.

2. This second example tries to show the effect of the t-norm in the learning method. Thus, we use the same domain as in the above example and we represent fuzzy rule

i f X i s A a n d Y i s B t h e n Z i s C

T a b l e l

X 1 X 2 X 3 X4 X 5 X 6 X 7 X 8 X 9 X10

ej 0.1 28 300

e 2 10 23 70

e 3 0.5 13 10

e 4 - 0 . 4 48 800

e5 0 3 45

e 6 - 0 . 3 9 44

e 7 - 0 . 7 29 300

e8 - 2 6 - 4 0 100

e9 0.2 25 40 e , ) - 3 8 9 10

M. Delgado, A. Gonzalez / An inductive learning procedure to identify fuzzy systems 131

as (A, B, C)

and the example X=x~, Y=y~ and Z = z ~ as (xl, yt, zl). From the example (11,24,26), the rule (MP, SP, HP) takes the following uncertainty intervals for the selected t-norms:

* = Lukasiewicz t-norm--> [0, 1] ignorance,

* = product t-norm --> [0, 0.04] slight belief,

* = min t-norm --* [0, 0.58] moderate belief.

Really, this example is a very poor representation of the rule (MP, SP, HP), and the chosen t-norms reflect two postures: we don't know anything about this rule since this example either doesn't represent this rule, or it represents the rule but with a very reduced credibility or a moderate credibility (here we only use this example to know the belief of the rule).

The role of the t-norm is basic to the learning procedure as shown in this example. In order to concentrate the evidence in the most representative rules, it may be convenient to use the Lukasiewicz t-norm.

5. Further remarks

1. Procedures to filter outliers as well as a robustness analysis of the method ought to be carried out.

2. This learning procedure is basically introduced here as a non-incremental one, but we have pointed out before that an incremental version may be developed without special difficulties. However, the incremental version (or with capability for adaptation) needs a deeper analysis including at least a sensibility analysis (related to the ones referred to in the previous remark) as well as the stopping problem.

Section 3 has shown a learning model based on the conditioning defined by (10). Alternative conditional measures can be used for obtaining new models.

These questions will be dealt with in forthcoming papers.

References

[1] E. Binaghi, Learning of uncertainty classification rules in medical diagnosis, in: R. Kruse, P. Siegel, Eds., Symbolic and Quantitative Approaches to Uncertainty, Lecture Notes in Computer Science 548 (Springer, Berlin, 1991) 115-119.

[2] L.M. de Campos and M.J. Bolafios, Representation of fuzzy measures through probabilities, Fuzzy Sets and Systems 31 (1989) 23-36.

[3] L.M. de Campos and A. Gonz~lez, A fuzzy inference model based on an uncertainty forward propagation approach, in: R. Lowen, M. Roubens, Eds., IFSA'91 Artificial Intelligence (1991) 13-16.

[4] G. Choquet, Theory of capacities, Ann. Inst. Fourier 5 (1953) 131-295. [5] M. Delgado and A. Gonzfilez, A frequency model to assess Dempster/Shafer's evidences, Tech. Report no. 92-1-1,

Department of Computer Science and Artificial Intelligence (DECSAI), University of Granada (1992); also submitted to Internat. J. Approximate Reasoning.

[6] A.P. Dempster, Upper and lower probabilities induced by a multivalued mapping, Ann. Math. Statistics 38 (1967) 325-339. [7] D. Dubois and H. Prade, Possibility Theory: An Approach to Computerized Processing of Uncertainty (Plenum Press, New

York, 1988). [8] C.C. Lee, A self-learning rule-based controller employing approximate reasoning and neural net concepts, lnternat. J.

Intelligent Systems 6 (1991) 71-93. 19] S. Moral and L.M. de Campos, Updating uncertain information, in: B. Bouchon-Meunier, R.R. Yager, L.A. Zadeh, Eds.,

Uncertainty in Knowledge Bases. Lecture Notes in Computer Science 521 (Springer, Berlin, 1991) 58-67. [10] H. Nomura, I. Hayashi and N. Wakami, A learning method of fuzzy inference rules by descent method, Proceedings of the

IEEE International Conference on Fuzzy Systems (1992) 203-210.


[111 J. Pearl, Probabilistic Reasoning in Intelligent Systems. Networks of Plausibility Inference (Morgan & Kaufmann, San Mateo, CA, 1988).

[12] W. Pedrycz, An identification algorithm in fuzzy relational systems, Fuzzy Sets and Systems 13 (1984) 153-167. [13] X. Peng and P. Wang, On generating linguistic rules for fuzzy models, in: B. Bouchon, L. Saitta, R.R. Yager, Eds.,

Uncertainty and Intelligent Systems (Springer, Berlin, 1988) 185-192. [14] S. Sestito and T. Dillon, Using single-layered neural networks for the extraction of conjunctive rules and hierarchical

classifications, J. AppL Intelligence 1 (1991) 157-173. [15] G. Sharer, A Mathematical Theory of Evidence (Princeton University Press, Princeton, NJ, 1976). [16] M. Sugeno and K. Tanaka, Successive identification of a fuzzy model and its applications to prediction of a complex system,

Fuzzy Sets and Systems 42 (1991) 315-334. [17] T. Takagi and M. Sugeno, Fuzzy identification of systems and its applications to modeling and control, IEEE Trans. Systems,

Man Cybernet. 15 (1985) 116-132. [18] L.A. Zadeh, Fuzzy sets as a basis for a theory of possibility, Fuzzy Sets and Systems 1 (1978) 3-28.

An inductive learning procedure to identify fuzzy systems

Documents

Transcript of An inductive learning procedure to identify fuzzy systems