Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


Transcript of Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications

Page 1: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications

NATO ASI Series

Advanced Science Institutes Series

A series presenting the results of activities sponsored by the NATO Science Committee, which aims at the dissemination of advanced scientific and technological knowledge, with a view to strengthening links between scientific communities.

The Series is published by an international board of publishers in conjunction with the NATO Scientific Affairs Division.

A  Life Sciences                          Plenum Publishing Corporation
B  Physics                                London and New York

C  Mathematical and Physical Sciences     Kluwer Academic Publishers
D  Behavioural and Social Sciences        Dordrecht, Boston and London
E  Applied Sciences

F  Computer and Systems Sciences          Springer-Verlag
G  Ecological Sciences                    Berlin Heidelberg New York Barcelona
H  Cell Biology                           Budapest Hong Kong London Milan Paris
I  Global Environmental Change            Santa Clara Singapore Tokyo

Partnership Sub-Series

1. Disarmament Technologies               Kluwer Academic Publishers
2. Environment                            Springer-Verlag / Kluwer Academic Publishers
3. High Technology                        Kluwer Academic Publishers
4. Science and Technology Policy          Kluwer Academic Publishers
5. Computer Networking                    Kluwer Academic Publishers

The Partnership Sub-Series incorporates activities undertaken in collaboration with NATO's Cooperation Partners, the countries of the CIS and Central and Eastern Europe, in Priority Areas of concern to those countries.

NATO-PCO Database

The electronic index to the NATO ASI Series provides full bibliographical references (with keywords and/or abstracts) to about 50 000 contributions from international scientists published in all sections of the NATO ASI Series. Access to the NATO-PCO Database is possible via the CD-ROM "NATO Science & Technology Disk" with user-friendly retrieval software in English, French and German (© WTV GmbH and DATAWARE Technologies Inc. 1992).

The CD-ROM can be ordered through any member of the Board of Publishers or through NATO-PCO, B-3090 Overijse, Belgium.

Series F: Computer and Systems Sciences, Vol. 162

Page 2: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications

Springer-Verlag Berlin Heidelberg GmbH

Page 3: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications

Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications

Edited by

Okyay Kaynak
Faculty of Engineering
Boğaziçi University
TR-80815 Istanbul, Turkey

Lotfi A. Zadeh
Computer Science Division
University of California at Berkeley
Berkeley, CA 94720-1776, USA

Burhan Türkşen
University of Toronto
Ontario M5S 3G8, Canada

Imre J. Rudas
Bánki Donát Polytechnic
Népszínház u. 8
H-1081 Budapest, Hungary

Springer Published in cooperation with NATO Scientific Affairs Division

Page 4: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications

Proceedings of the NATO Advanced Study Institute on Soft Computing and Its Applications held at Manavgat, Antalya, Turkey, August 21-31, 1996

Library of Congress Cataloging-in-Publication Data

Computational intelligence : soft computing and fuzzy-neuro integration with applications / edited by Okyay Kaynak ... [et al.].
p. cm. -- (NATO ASI series. Series F, Computer and systems sciences ; vol. 162)
"Proceedings of the NATO Advanced Study Institute on Computational Intelligence (Fuzzy-Neural Integration) held at Antalya, Turkey, August 21-31, 1996"--CIP verso t.p.
Includes bibliographical references and index.
ISBN 978-3-642-63796-4
ISBN 978-3-642-58930-0 (eBook)
DOI 10.1007/978-3-642-58930-0
1. Soft computing--Congresses. 2. Neural networks (Computer science)--Congresses. 3. Fuzzy systems--Congresses. I. Kaynak, Okyay, 1948- . II. NATO Advanced Study Institute on Computational Intelligence (Fuzzy-Neural Integration) (1996 : Antalya, Turkey) III. Series: NATO ASI series. Series F, Computer and systems sciences ; no. 162.
QA76.9.S63C66 1998
006.3--dc21
98-25071
CIP

ACM Subject Classification (1998): I.2, J.2, I.5, F.1, C.1, J.6

ISBN 978-3-642-63796-4

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable for prosecution under the German Copyright Law.

© Springer-Verlag Berlin Heidelberg 1998 Originally published by Springer-Verlag Berlin Heidelberg New York in 1998

Softcover reprint of the hardcover 1 st edition 1998

Typesetting: Camera-ready by authors/editors. Printed on acid-free paper. SPIN: 10552716 45/3142 - 5 4 3 2 1 0

Page 5: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications

Preface

This book is a collection of some of the papers that were presented during a NATO Advanced Study Institute (ASI) on Soft Computing and Its Applications that was held in Manavgat, Antalya, Turkey, on 21-31 August 1996. The ASI had the goal of creating an opportunity for engineers and scientists working in the emerging field of soft computing (SC) to come together in an informal atmosphere and to discuss and disseminate knowledge on the application aspects of soft computing techniques, especially in intelligent control systems and mechatronics. Two areas on which the institute placed special emphasis are (1) how to achieve a synergistic combination of the main constituents of soft computing and (2) how such a combination can be applied to achieve a high machine intelligence quotient (MIQ).

In the opening stages of the institute, it was stated that soft computing is a consortium of computing methodologies which provides a foundation for the conception, design, and deployment of intelligent systems, and which aims at the formalization of the remarkable human ability to make rational decisions in an environment of uncertainty and imprecision. It was pointed out that soft computing provides a better and more natural foundation for intelligent systems than traditional hard computing does.

The institute asserted that traditional "hard computing" (HC), based on binary logic, crisp systems, numerical analysis, and crisp software, has the characteristics of precision and categoricity, while soft computing has those of approximation and dispositionality. Although in the former, imprecision and uncertainty are undesirable properties, in the latter the tolerance for imprecision and uncertainty is exploited to achieve tractability, lower cost, high MIQ, and economy of communication.

The papers presented during the institute considered the principal constituents of soft computing, namely fuzzy logic (FL), neurocomputing (NC), genetic computing (GC), and probabilistic reasoning (PR), the relation between them and their fusion in industrial applications. In this perspective, it was discussed that the principal contribution of fuzzy logic relates to its provision of a foundation for approximate reasoning, while neural network theory provides an effective methodology for learning from examples, and probabilistic reasoning systems furnish computationally effective techniques for representing and propagating probabilities and beliefs in complex inference networks. Several presentations described a number of practical applications ranging from helicopter control, fault diagnosis, and smart appliances to speech and pattern recognition and planning under uncertainty.

Page 6: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


Novel concepts such as computing with words and information granulation were also discussed during the ASI. Prof. Zadeh argued that information granularity lies at the center of human reasoning and concept formation, and plays a pivotal role in fuzzy logic and computing with words.

A number of presentations concentrated on the connections between neural networks and fuzzy logic. Use of vectoral neural networks in soft computing and modelling fuzzy reasoning by neural networks were among the topics of discussion. On the fuzzy logic side, fuzzy data analysis and fuzzy decision support systems were discussed in depth.

Some speakers concentrated on the software and hardware architectures for soft computing. Computer vision was a major topic of discussion as an application area. Robotics and mechatronics applications were also discussed. It was agreed that the successful applications of soft computing and the resulting rapid growth of interest in this emerging field indicate to us that, using Zadeh's words, "soft computing is likely to play an important role in science and engineering, but eventually its influence may extend much farther. In many ways, soft computing represents a significant paradigm shift in the aims of computing - a shift which reflects the fact that the human mind, unlike present day computers, possesses a remarkable ability to store and process information which is pervasively imprecise, uncertain and lacking in categoricity".

Some of the papers of the book are not exactly the same as they were presented during the ASI. The authors had ample time to modify the contents of their contributions and to put them into a more appropriate form for a book. Additionally, the book also contains two papers by Prof. Dubois, who could not participate in the ASI as originally planned.

The title of the book is slightly different from the title of the institute, and the book itself is divided into six main parts, namely (i) computational intelligence, (ii) foundations of fuzzy theory, (iii) fuzzy systems, (iv) neural networks, (v) data analysis, and (vi) applications. Each part has a number of papers authored by leading experts of the field. The first part starts with a paper by Prof. Zadeh himself in which he expresses his views on the roles of soft computing and fuzzy logic in the conception, design, and deployment of information/intelligent systems.

Finally, on behalf of all the editors of the book, I would like to thank the NATO Scientific Affairs Division for their support of the ASI. I hope that the readers will find the resulting volume interesting and beneficial. Additionally, I would like to acknowledge the facilities provided by the National University of Singapore during the final editing stages of this book.

March 1998 Okyay Kaynak

Page 7: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications

Contents

Preface .................................................................................................................. v

Part 1 Computational Intelligence

Roles of Soft Computing and Fuzzy Logic in the Conception, Design and Deployment of Information/Intelligent Systems ............................................... 1 L. A. Zadeh

Computational Intelligence Defined - By Everyone! ........................................... 10 J. C. Bezdek

Computational Intelligence: Extended Truth Tables and Fuzzy Normal Forms .................................................................................................................... 38 I. B. Türkşen

Uncertainty Theories by Modal Logic .................................................................. 60 G. Resconi

Part 2 Foundations of Fuzzy Theory

Sup-T Equations: State of the Art ......................................................................... 80 B. De Baets

Measures of Specificity ......................................................................................... 94 R. R. Yager

What's in a Fuzzy Membership Value? ............................................................... 114 S. Kundu

New Types of Generalized Operations ............................................................... 128 I. J. Rudas, O. Kaynak

Page 8: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


Part 3 Fuzzy Systems

Intelligent Fuzzy System Modeling ..................................................................... 157 I. B. Türkşen

Fuzzy Inference Systems: A Critical Review ...................................................... 177 V. Cherkassky

Fuzzy Decision Support Systems ........................................................................ 198 H-J. Zimmermann

Neuro-Fuzzy Systems .......................................................................................... 230 R. Kruse, D. Nauck

Fuzzified Petri-Nets and Their Application to Organising Supervisory Controller ............................................................................................................ 260 G. M. Dimirovski

Part 4 Neural Networks

A Review of Neural Networks with Direct Learning Based on Linear or Non-linear Threshold Logics ......................................................................... 283 D. M. Dubois

The Morphogenetic Neuron ................................................................................ 304 G. Resconi

Boolean Soft Computing by Non-linear Neural Networks with Hyperincursive Stack Memory ............................................................................ 333 D. M. Dubois

Part 5 Data Analysis

Using Competitive Learning Models for Multiple Prototype Classifier Design ................................................................................................................. 352 J. C. Bezdek, S. G. Lim, and T. Reichherzer

Fuzzy Data Analysis ............................................................................................ 381 H-J. Zimmermann

Probabilistic and Possibilistic Networks and How To Learn Them from Data ............................................................................................................ 403 C. Borgelt, R. Kruse

Page 9: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


Part 6 Applications

Image Pattern Recognition Based on Fuzzy Technology .................................... 427 K. Hirota, Y. Arai, Y. Nakagawa

Fuzzy Sets and the Management of Uncertainty in Computer Vision ................. 434 J. M. Keller

Intelligent Robotic Systems Based on Soft Computing - Adaptation, Learning and Evolution ................................................................. 450 T. Fukuda, K. Shimojima

Hardware and Software Architectures for Soft Computing ................................. 482 R. Paluzzi

Fuzzy Logic Control for Design and Control of Manufacturing Systems ................................................................................................................ 496 B. Tan

Applications of Intelligent Multiobjective Fuzzy Decision Making .................... 514 E. H. Ruspini

A Product Life Cycle Information Management System Infrastructure with CAD/CAE/CAM, Task Automation, and Intelligent Support Capabilities .......................................................................................................... 521 H. P. Frisch

Page 10: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications

Roles of Soft Computing and Fuzzy Logic in the Conception, Design and Deployment of Information/Intelligent Systems¹

Lotfi A. Zadeh

University of California at Berkeley, Berkeley, CA 94720-1776, USA

Abstract. The essence of soft computing is that, unlike traditional hard computing, it is aimed at an accommodation with the pervasive imprecision of the real world. Thus, the guiding principle of soft computing is: '... exploit the tolerance for imprecision, uncertainty and partial truth to achieve tractability, robustness, low solution cost and better rapport with reality.' In the final analysis, the role model for soft computing is the human mind.

Soft computing is not a single methodology. Rather, it is a partnership. The principal partners at this juncture are fuzzy logic, neurocomputing, genetic computing and probabilistic computing, with the latter subsuming chaotic systems, belief networks and parts of learning theory.

In coming years, the ubiquity of intelligent systems is certain to have a profound impact on the ways in which man-made intelligent systems are conceived, designed, manufactured, employed and interacted with. It is in this perspective that the basic issues relating to soft computing and intelligent systems are addressed in this paper.

1. Introduction

To see the evolution of fuzzy logic in a proper perspective, it is important to note that we are in the throes of what is popularly called the information revolution. The artifacts of this revolution are visible to all. The Internet, World Wide Web, cellular phones, facsimile machines and portable computers with powerful information processing capabilities have all become a part of everyday reality. The centrality of information in almost everything that we do is a fact that few would care to challenge.

Much less visible, but potentially of equal or even greater importance, is what might be called the intelligent systems revolution. The artifacts of this revolution are man-made systems which exhibit an ability to reason, learn from experience and make rational decisions without human intervention. I coined the term MIQ (machine intelligence quotient) to describe a measure of intelligence of man-made

¹ A slightly different version of this paper has previously been published under a slightly different title in BT Technology Journal, 14, No 4, pp 32-36 (October 1996).


Page 11: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


systems. In this perspective, an intelligent system is a system which has a high MIQ.

I will have more to say about MIQ at a later point. A question that I should like to raise now is the following: We have been talking about artificial intelligence (AI) for over four decades. Why did it take AI so long to yield visible results?

Let me cite an example that bears on this question. When I was an instructor at Columbia University, I wrote a paper entitled 'Thinking Machines - A New Field in Electrical Engineering' which was published in a student magazine [1]. In the opening paragraph of that article, I quoted a number of headlines which appeared in the popular press of the time. One of the headlines read: 'An Electric Brain Capable of Translating Foreign Languages Is Being Built'. The point is that my article was published in January 1950, about six years before the term 'artificial intelligence' was coined. What is obvious today is that a translation machine could not have been built in 1950 or earlier. The requisite technologies and methodologies were not in place.

We are much more humble today than we were at that time. The difficulty of building systems that could mimic human reasoning and cognitive ability turned out to be much greater than we thought. Even today, with a vast array of powerful tools at our disposal, we are still incapable of building machines that can do what many children can do with ease, e.g. understand a fairy tale, peel an orange, or eat food with a knife and a fork.

At this point, let me return to the concept of MIQ. A basic difference between IQ and MIQ is that IQ is more or less constant, whereas MIQ changes with time and is machine-specific. Furthermore, the dimensions of MIQ and IQ are not the same. For example, speech recognition might be an important dimension of MIQ but in the case of IQ, it is taken for granted.

At this juncture, we do not have as yet an agreed set of tests to measure the MIQ of a man-made system, e.g. a camcorder; but I believe that such tests will be devised at some point in the future and that eventually the concept of MIQ will play an important role in defining and measuring machine intelligence.

In realistic terms, we are just beginning to enter the age of intelligent systems. Why did it take so long for this to happen?

In my view, there are three main reasons. First, until recently the principal tools in AI's armamentarium were centred on symbol manipulation and predicate logic, while the use of numerical techniques was looked upon with disfavor. What is more obvious today than it was in the past is that symbol manipulation and predicate logic have serious limitations in dealing with real-world problems in the realms of computer vision, speech recognition, handwriting recognition, image understanding, multimedia database search, motion planning, common-sense reasoning, management of uncertainty and many other fields which relate to machine intelligence.

Page 12: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


2. Soft Computing and Fuzzy Logic

During the past several years, our ability to conceive, design and build machines with a high MIQ has been greatly enhanced by the advent of what is now referred to as soft computing (SC). Soft computing is not a single methodology. Rather, it is a consortium of computing methodologies which collectively provide a foundation for the conception, design and deployment of intelligent systems. At this juncture, the principal members of soft computing are fuzzy logic (FL), neurocomputing (NC), genetic computing (GC), and probabilistic computing (PC), with the last subsuming evidential reasoning, belief networks, chaotic systems, and parts of machine learning theory. In contrast to the traditional hard computing, soft computing is tolerant of imprecision, uncertainty and partial truth. The guiding principle of soft computing is: ' ... exploit the tolerance for imprecision, uncertainty and partial truth to achieve tractability, robustness, low solution cost and better rapport with reality.'

What is important about soft computing is that its constituent methodologies are for the most part synergistic and complementary rather than competitive. Thus, in many cases, a higher MIQ can be achieved by employing FL, NC, GC, and PC in combination rather than singly. Furthermore, there are many problems which cannot be solved if the only tool is fuzzy logic, neuro-computing, genetic computing or probabilistic reasoning. This challenges the position of those who claim that their favourite tool, be it FL, NC, GC, or PC, is capable of solving all problems. The proponents of such views will certainly shrink in number once a better understanding of soft computing becomes widespread.

Within SC, each of the constituent methodologies has a set of capabilities to offer. In the case of fuzzy logic, it is a body of concepts and techniques for dealing with imprecision, information granulation, approximate reasoning and, most importantly, computing with words. In the case of neurocomputing, it is the capability for learning, adaptation and identification. In the case of genetic computing, it is the capability to employ systematized random search and achieve optimal performance. And in the case of probabilistic computing, it is a body of concepts and techniques for uncertainty management and evidential reasoning.

Systems in which FL, NC, GC, and PC are used in some combination are called hybrid systems. Among the most visible systems of this type are the so-called neuro-fuzzy systems. We are beginning to see fuzzy-genetic systems, neuro-genetic systems and neuro-fuzzy-genetic systems. In my view, eventually, most high-MIQ systems will be hybrid systems. In the future, the ubiquity of hybrid systems will have a profound impact on the ways in which intelligent systems are designed, built and interacted with.

What is the place of fuzzy logic in soft computing? First, I should like to clarify a common misconception about what fuzzy logic is and what it has to offer.

A source of confusion is that the label fuzzy logic is used in two different senses. In a narrow sense, fuzzy logic is a logical system which is an extension of multi-valued logic. However, even in its narrow sense the agenda of fuzzy logic is very

Page 13: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


different both in spirit and in substance from the agendas of multi-valued logical systems.

In its wide sense - which is the sense in predominant use today - fuzzy logic is coextensive with the theory of fuzzy sets, that is, classes with unsharp boundaries [2]. In this perspective, fuzzy logic in its narrow sense is a branch of fuzzy logic in its wide sense.

What is important about fuzzy logic is that any theory, T, can be fuzzified - and hence generalized - by replacing the concept of a crisp set in T with that of a fuzzy set. In this way, one is led to a fuzzy T, e.g. fuzzy arithmetic, fuzzy topology, fuzzy probability theory, fuzzy control and fuzzy decision analysis. What is gained from fuzzification is greater generality and better rapport with reality. However, fuzzy numbers are more difficult to compute with than crisp numbers. Furthermore, the meanings of most fuzzy concepts are context- and/or application-dependent. This is the price that has to be paid for a better rapport with reality.
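To make the crisp-to-fuzzy generalization concrete, here is a minimal Python sketch contrasting a crisp set with a fuzzified counterpart; the trapezoid breakpoints chosen for 'middle-aged' are illustrative assumptions of mine, not values given in the text.

    # A minimal sketch of the fuzzification step described above. The
    # breakpoints for 'middle-aged' are illustrative assumptions only.

    def crisp_middle_aged(age):
        """Indicator function of a crisp set: sharp boundaries."""
        return 1.0 if 40 <= age <= 55 else 0.0

    def fuzzy_middle_aged(age):
        """Membership function of a fuzzy set: unsharp boundaries."""
        if age <= 30 or age >= 65:
            return 0.0
        if 40 <= age <= 55:
            return 1.0
        if age < 40:
            return (age - 30) / 10.0   # rising edge, 30..40
        return (65 - age) / 10.0       # falling edge, 55..65

    print(crisp_middle_aged(39), fuzzy_middle_aged(39))   # 0.0 0.9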

3. Information Granulation

There is a point of fundamental importance which lies at the base of ways in which humans deal with fuzzy concepts. The point in question has to do with information granulation and its role in human reasoning, communication and concept formation. In what follows, I will attempt to explain why information granulation plays an essential role in dealing with fuzzy concepts and, in particular, in reasoning and computing with words rather than numbers.

The concept of information granulation motivated most of my early work on fuzzy sets and fuzzy logic. Basically, the point that I stressed is that most human concepts are fuzzy because they are the result of clumping of points or objects which are drawn together by similarity. The fuzziness of such clumps, then, is a direct consequence of the fuzziness of the concept of similarity. Simple examples of clumps are the concepts of 'middle-aged,' 'downtown,' 'partially cloudy,' 'obtuse,' etc. To underscore its role, a clump will be referred to as a granule.

In a natural language, words play the role of labels of granules. In this role, words serve to achieve data compression. The achievement of data compression through the use of words is a key facet of human reasoning and concept formation.

In fuzzy logic, information granulation underlies the concepts of linguistic variable and fuzzy if-then rules [3,4]. These concepts were formally introduced in my paper 'Outline of a New Approach to the Analysis of Complex Systems and Decision Processes' in 1973 [5]. Today, almost all applications of fuzzy logic employ these concepts. It is of historical interest to note that my introduction of these concepts was met with scepticism and hostility by many eminent members of the scientific establishment.

The importance of fuzzy rules stems from the fact that such rules are close to human intuition. In fuzzy logic, fuzzy rules play a central role in what is called

Page 14: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


fuzzy dependency and command language (FDCL). In an informal way, it is this language that is used in most of the applications of fuzzy logic.

In comparing fuzzy logic with other methodologies, a point that is frequently unrecognized is that, typically, the point of departure in a fuzzy logic solution is a human solution. Thus, a fuzzy logic solution is usually a human solution expressed in FDCL. An easily understood example of this point is the car parking problem in which the objective is to place the car near the curb and almost parallel to it. A fuzzy logic solution of the parking problem would be a collection of fuzzy if-then rules which describe how a human parks a car. The parking problem is hard to solve in the context of classical control. In this context, the point of departure is not a human solution but a description of the final state, the initial state, the constraints and the equations of motion.

A further example which illustrates the essentiality of information granularity is the following. Consider a situation in which a person A is talking over the phone to a person B whom A does not know. After a short time, say 10-20 seconds, A can form a rough estimate of the age of B expressed as:

the probability that B is very young is very low

the probability that B is young is low

the probability that B is middle-aged is high

the probability that B is old is low

the probability that B is very old is very low

These estimates may be interpreted as a granular representation of the probability distribution, P, of B's age. In a symbolic form, P may be represented as a fuzzy graph:

P = very low\very young + low\young + high\middle-aged + low\old + very low\very old

In this expression, '+' is the disjunction operator and a term such as 'low\old' means that low is the linguistic probability that B is old.

The important point is that humans can form such estimates using linguistic, i.e. granulated, values of age and probabilities. However, humans could not come up with numerical estimates of the form 'the probability that B is 25 is 0.012.'

It should be observed that in many cases a human would estimate the age of B as middle-aged omitting the associated probability. The omission of probabilities may be justified if there exists what might be called a p-dominant value in a probability distribution, i.e. a value whose probability dominates the probabilities of other values. The omission of probabilities plays a key role in approximate reasoning [6].
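The granular distribution P and the notion of a p-dominant value can be sketched in a few lines of Python. The plain mapping below is my own illustration of the fuzzy graph, not Zadeh's formalism; treating 'high' as the only dominating label is likewise an assumption made for the example.

    # Granular probability distribution P as a mapping from age granules
    # to linguistic probabilities; 'low\old' corresponds to P['old'] == 'low'.
    P = {
        "very young":  "very low",
        "young":       "low",
        "middle-aged": "high",
        "old":         "low",
        "very old":    "very low",
    }

    def p_dominant(dist):
        """Return the granule whose linguistic probability dominates the
        others, if there is exactly one (here: the label 'high')."""
        dominant = [g for g, p in dist.items() if p == "high"]
        return dominant[0] if len(dominant) == 1 else None

    print(p_dominant(P))   # middle-aged: its probability may be omitted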

Page 15: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


A question which arises is: 'Could the use of a methodology within soft computing provide an estimate of the age of B without human intervention?' In my view, the answer is no. More specifically, neurocomputing and genetic computing techniques would fail because of the complexity of input/output pairs, while fuzzy logic would fail - even though a human solution exists - because humans would not be able to articulate the rules by which the age estimate is arrived at.

In summary, information granulation lies at the center of human reasoning, communication and concept formation. Within fuzzy logic, it plays a pivotal role in what might be called computing with words, or CW for short. CW may be viewed as one of the most important contributions of fuzzy logic. What is CW? As its name suggests, in CW the objects of computing are words rather than numbers, with words playing the role of labels of granules. Very simple examples of CW are:

Dana is young and Tandy is a few years older than Dana,

∴ Tandy is (young + few) years old

Most students are young and most young students are single,

∴ most² students are single

In these examples, young, few and most are fuzzy numbers; + is the operation of addition in fuzzy arithmetic, and most² is the square of most in fuzzy arithmetic.
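A rough Python sketch of the fuzzy arithmetic behind these examples, representing each fuzzy number by its alpha-cut intervals; the particular membership shapes assumed for young, few and most are for illustration only.

    # Fuzzy numbers as alpha-cut intervals: addition (for 'young + few')
    # is interval addition level by level, and squaring (for 'most**2')
    # squares the endpoints of a fuzzy number with non-negative support.
    import numpy as np

    ALPHAS = np.linspace(0.0, 1.0, 11)

    def tri_cuts(a, b, c):
        """Alpha-cuts [lo, hi] of a triangular fuzzy number (a, b, c)."""
        return [(a + (b - a) * h, c - (c - b) * h) for h in ALPHAS]

    def add(cuts1, cuts2):
        """Fuzzy '+': interval addition at each alpha level."""
        return [(l1 + l2, u1 + u2) for (l1, u1), (l2, u2) in zip(cuts1, cuts2)]

    def square(cuts):
        """Square of a fuzzy number with non-negative support."""
        return [(lo * lo, hi * hi) for lo, hi in cuts]

    young = tri_cuts(20, 25, 35)      # illustrative 'young' (years)
    few = tri_cuts(2, 4, 7)           # illustrative 'few' (years)
    most = tri_cuts(0.6, 0.85, 1.0)   # illustrative 'most' (proportion)

    print(add(young, few)[-1])   # core of 'young + few': (29.0, 29.0)
    print(square(most)[0])       # support of 'most**2': (0.36, 1.0)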

In Western cultures, there is a deep-seated tradition of according more respect to numbers than to words; but, as is true of any tradition, a time comes when the rationale for a tradition ceases to be beyond question. In my view, the time has come to question the validity of this tradition.

What we need at this juncture is a system which allows the data to be expressed as propositions in a natural language. This is what CW attempts to provide. The point of departure in CW is a collection of propositions expressed in a natural language. This collection is referred to as the initial data set (IDS). The desired answers or conclusions are likewise expressed as a collection of propositions expressed in a natural language. This collection is referred to as the terminal data set (TDS). The problem is to arrive at TDS starting with IDS. A very simple example is one where the IDS is the proposition 'most Swedes are tall,' and the TDS is the answer to the query 'what is the average height of Swedes?' The answer is expected to be of the form 'the average height of Swedes is A, where A is a linguistic value of height.' In this example, the aim of CW is to compute A from the information provided by the IDS.

In CW, words play the role of fuzzy constraints and a proposition is interpreted as a fuzzy constraint on a variable. For example, the proposition 'Mary is young' is interpreted as a fuzzy constraint on Mary's age. In symbols:

Page 16: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


Mary is young → Age(Mary) is young

In this expression, '→' represents the operation of explicitation; 'Age(Mary)' is the constrained variable; and 'young' is a fuzzy relation which constrains 'Age(Mary).'

More generally, if p is a proposition in a natural language, the result of explicitation of p is what is called the canonical form of p. Basically, the canonical form of a proposition p makes explicit the implicit fuzzy constraint in p, and thus serves to define the meaning of p as a constraint on a variable. In a more general setting, the canonical form of p is represented as:

X isr R

where X is the linguistically constrained variable, e.g. Age(Mary); R is the constraining fuzzy relation, e.g. young; and isr is a variable in which r is a discrete variable whose values define the role of R in relation to X. In particular, if r = d, isd is abbreviated to 'is' and the constraint 'X is R' is said to be disjunctive. In this case, R defines the possibility distribution of X. What is the reason for treating r as a variable? The richness of natural languages necessitates the use of a wide variety of constraints to represent the meaning of a proposition expressed in natural language. In CW, the principal types of constraints that are employed in addition to the disjunctive type are: conjunctive, probabilistic, usuality, random set, rough set, fuzzy graph, and functional types. Each of these types corresponds to a particular value of r.
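As a data-structure sketch, the canonical form 'X isr R' might be rendered as below; only the disjunctive code d is taken from the text, so the other role codes in the table are hypothetical labels of mine.

    # Hypothetical encoding of the canonical form 'X isr R'. Only r = 'd'
    # (disjunctive) is abbreviated to plain 'is', as in the text; the
    # remaining role codes are invented for illustration.
    from dataclasses import dataclass

    ROLES = {
        "d": "disjunctive (possibilistic)",
        "c": "conjunctive",
        "p": "probabilistic",
        "u": "usuality",
        "fg": "fuzzy graph",
    }

    @dataclass
    class CanonicalForm:
        X: str   # constrained variable, e.g. "Age(Mary)"
        r: str   # role of R in relation to X
        R: str   # constraining fuzzy relation, e.g. "young"

        def __str__(self):
            rel = "is" if self.r == "d" else "is" + self.r
            return f"{self.X} {rel} {self.R}"

    # Explicitation of 'Mary is young':
    print(CanonicalForm("Age(Mary)", "d", "young"))   # Age(Mary) is young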

In CW, the first step in computing the terminal data set is that of explicitation, i.e. the representation of propositions in IDS in their canonical forms. The second step involves constraint propagation, which is carried out through the use of the rules of inference in fuzzy logic. In effect, the rules of inference in fuzzy logic may be interpreted as the rules of constraint propagation.

The third and final step in the computation of the terminal data set involves a retranslation of induced constraints into propositions expressed in a natural language. In fuzzy logic, this requires the use of what is referred to as linguistic approximation.

What is important to recognize is that the steps sketched above may require an extensive use of computing with numbers. However, as a stage of CW, computing with numbers takes place behind a curtain, hidden from the view of a user.
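The three steps can be lined up as a pipeline. The toy below is schematic only: the vocabulary, the single intersection rule standing in for fuzzy constraint propagation, and the nearest-label retranslation are all placeholders I invented to show the shape of the computation, not Zadeh's machinery.

    # Schematic CW pipeline: explicitation -> constraint propagation ->
    # retranslation (linguistic approximation). Everything here is a toy.
    VOCAB = {"young": (0, 35), "middle-aged": (35, 60)}   # crude age cuts

    def explicitate(prop):
        """Step 1: 'Mary is young' -> ('Age(Mary)', 'is', 'young')."""
        subject, _, label = prop.partition(" is ")
        return ("Age(%s)" % subject, "is", label)

    def propagate(constraints):
        """Step 2: intersect the numeric ranges behind the labels, a
        stand-in for the rules of inference in fuzzy logic."""
        lo = max(VOCAB[r][0] for _, _, r in constraints)
        hi = min(VOCAB[r][1] for _, _, r in constraints)
        return (lo, hi)

    def retranslate(interval):
        """Step 3: nearest linguistic label, a crude linguistic
        approximation."""
        mid = sum(interval) / 2
        return min(VOCAB, key=lambda w: abs(sum(VOCAB[w]) / 2 - mid))

    ids = ["Mary is young"]
    print(retranslate(propagate([explicitate(p) for p in ids])))   # young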

So what is it that CW has to offer? The ability to infer from an IDS in which information is conveyed by propositions expressed in a natural language opens the door to the formulation and solution of many problems in which the available information is not precise enough to justify the use of conventional techniques. To illustrate this, suppose that the problem is that of maximizing a function which is described in words through the fuzzy if-then rules:

Page 17: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications

if X is small then Y is small

if X is medium then Y is large

if X is large then Y is small


in which small, medium and large are defined through their membership functions. Another problem in this vein is the following. Assume that a box contains ten balls of various sizes of which several are large and a few are small. What is the probability that a ball drawn at random is neither small nor large?

In these examples, the propositions in the IDS are quite simple. The real challenge is to develop CW to a point where it could cope with propositions of much greater complexity which express real-world knowledge.
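A crude, runnable sketch of the first problem, under assumed triangular membership functions on [0, 1]: the three rules are interpolated as a fuzzy graph, the output y(x) is the firing-strength-weighted average of the consequent centroids, and the maximum is located by grid search. All numbers are illustrative, and this interpolation is my own devising rather than Zadeh's FDCL.

    # Maximizing a function described only by fuzzy if-then rules.
    import numpy as np

    def tri(x, a, b, c):
        """Triangular membership function peaking at b."""
        return max(0.0, min((x - a) / (b - a), (c - x) / (c - b)))

    def small(x):  return tri(x, -0.5, 0.0, 0.5)
    def medium(x): return tri(x, 0.0, 0.5, 1.0)
    def large(x):  return tri(x, 0.5, 1.0, 1.5)

    CENTROIDS = {"small": 0.0, "large": 1.0}   # centroids of the Y labels
    RULES = [(small, "small"), (medium, "large"), (large, "small")]

    def y(x):
        """Fuzzy-graph interpolation of the three rules."""
        w = np.array([ante(x) for ante, _ in RULES])
        c = np.array([CENTROIDS[cons] for _, cons in RULES])
        return float(w @ c / w.sum())

    xs = np.linspace(0.0, 1.0, 101)
    best = max(xs, key=y)
    print(best, y(best))   # maximum near x = 0.5, where Y is 'large'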

At this juncture, CW is a branch of fuzzy logic. In my view, in coming years it is likely to evolve into an important methodology in its own right, providing a way of coping with the pervasive imprecision and uncertainty of the real world [6]. In this perspective, the role model for CW, fuzzy logic, and soft computing is the human mind.

4. Conclusions

The conception, design and deployment of information/intelligent systems presents a great challenge to those of us who are engaged in the development and applications of fuzzy logic and soft computing. Hopefully, our efforts will contribute to the creation of a society in which information/intelligent systems will serve to enhance human welfare and intellectual freedom.

Page 18: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


References

1. Azarmi, N. and Nwana, H.S. (eds.) (1997) Software Agents and Soft Computing: Towards Enhancing Machine Intelligence. Springer-Verlag, Berlin.

2. Bouchon-Meunier, B., Yager, R.R. and Zadeh, L.A. (eds.) (1995) Fuzzy Logic and Soft Computing. Advances in Fuzzy Systems - Applications and Theory, Vol. 4. World Scientific, Singapore.

3. Chen, Y.-Y., Hirota, K. and Yen, J.-Y. (eds.) (1996) Soft Computing in Intelligent Systems and Information Processing. Proceedings of the 1996 Asian Fuzzy Systems Symposium, IEEE.

4. Dubois, D., Prade, H. and Yager, R.R. (eds.) (1993) Readings in Fuzzy Sets for Intelligent Systems. Morgan Kaufmann, San Mateo.

5. Dubois, D., Prade, H. and Yager, R.R. (eds.) (1997) Fuzzy Information Engineering: A Guided Tour of Applications. John Wiley & Sons, New York.

6. Jang, J.-S.R., Mizutani, E. and Sun, C.-T. (1997) Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence. Prentice Hall, Upper Saddle River, NJ.

7. Lee, C.S.G. and Lin, C.-T. (1996) Neural Fuzzy Systems: A Neuro-Fuzzy Synergism to Intelligent Systems. Prentice Hall, Upper Saddle River, NJ.

8. Zadeh, L.A. (1950) Thinking Machines - A New Field in Electrical Engineering. Columbia Engineering Quarterly, No. 3.

9. Zadeh, L.A. (1971) Toward a Theory of Fuzzy Systems. In: Aspects of Network and System Theory. Holt, Rinehart and Winston, New York.

10. Zadeh, L.A. (1973) Outline of a New Approach to the Analysis of Complex Systems and Decision Processes. IEEE Trans. Syst. Man Cybernet., SMC-3, No. 1.

11. Zadeh, L.A. (1975) The Concept of a Linguistic Variable and Its Application to Approximate Reasoning. Inf. Sci., 8.

12. Zadeh, L.A. (1991) The Calculus of Fuzzy If-Then Rules. AI Expert, 7, 23-27.

13. Zadeh, L.A. and Yager, R.R. (eds.) (1991) Uncertainty in Knowledge Bases. Springer-Verlag, Berlin.

14. Zadeh, L.A. (1996) Fuzzy Logic = Computing with Words. IEEE Transactions on Fuzzy Systems, 4, No. 2, 103-111.

15. Zadeh, L.A. (1996) Fuzzy Logic and the Calculi of Fuzzy Rules and Fuzzy Graphs: A Precis. Multiple-Valued Logic, 1, 1-38.

Page 19: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications

Computational Intelligence Defined - By Everyone!

James C. Bezdek

Department of Computer Science, University of West Florida, Pensacola, FL 32514, USA [email protected]

Abstract. Here is the abstract from my 1992 paper about Computational Intelligence (CI) [1]:

This paper concerns the relationship between neural-like computational networks, numerical pattern recognition and intelligence. Extensive research that proposes the use of neural models for a wide variety of applications has been conducted in the past few years. Sometimes the justification for investigating the potential of neural nets (NNs) is obvious. On the other hand, current enthusiasm for this approach has also led to the use of neural models when the apparent rationale for their use has been justified by what is best described as "feeding frenzy". In this latter instance there is at times a concomitant lack of concern about many "side issues" connected with algorithms (e.g., complexity, convergence, stability, robustness and performance validation) that need attention before any computational model becomes part of an operational system. These issues are examined with a view towards guessing how best to integrate and exploit the promise of the neural approach with other efforts aimed at advancing the art and science of pattern recognition and its applications in fielded systems in the next decade. A further purpose of the present paper is to characterize the notions of computational, artificial and biological intelligence; my hope is that a careful discussion of the relationship between systems that exhibit each of these properties will serve to guide rational expectations and development of models that exhibit or mimic "human behavior".

This article adds to the growing and pretty amusing literature that tries to explain what I meant. I will add my own opinion to the many others that offer an explanation for the current popularity of the term CI.

Keywords. Artificial intelligence, computational intelligence, evolutionary computation, fuzzy logic, neural networks, pattern recognition.

1. Chronology of the Term

Computational intelligence is not a new term, and I was definitely not the first person to use it. The earliest well documented use of CI is, as far as I know, the


Page 20: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


title of the Canadian journal Computational Intelligence¹. I often cite Peter Cheeseman's article about fuzziness versus probability, which appeared in this journal in 1988 [2]. I do not know whether the founding editor of this journal explained the choice of its name, but I doubt if it was closely related to the set of meanings that are attached to the term nowadays. There must be some (perhaps much) additional published discussion about the term CI, but I don't know the citations to give you.

Bob Marks wrote an editorial in 1993 about the difference between computational intelligence and Artificial Intelligence (AI) [3]. Marks gave his own interpretation of CI, and he added some interesting data about the disciplines that might be in CI and AI. Bob's editorial was partly occasioned by the choice (again and alas, mine) of the name World Congress on Computational Intelligence (WCCI). This 1994 IEEE meeting combined three annual international conferences on neural networks (ICNN), fuzzy systems (FUZZ-IEEE) and evolutionary computation (ICEC) that are sponsored by the IEEE Neural Networks Council (NNC).

The IEEE press published a book of invited papers that were presented at the 1994 WCCI symposium called Computational Intelligence: Imitating Life [4]. The introduction to this book, co-authored by Zurada, Marks and Robinson, expanded on the material in [3]. The first chapter in this book was my paper "What is computational intelligence?" [5]. Here is the abstract of that paper:

This note is about neural-like computational networks, numerical pattern recognition and intelligence. Its purpose is to record my ideas about the notions of computational, artificial and biological intelligence, and the semantics we use to describe them. Perhaps a dialog about the term "intelligent systems" will guide us towards rational expectations about models that try to mimic this aspect of human behavior.

Since then, there has been an amazing proliferation of usage of the term CI, as well as authors that offer their own explanation of it [6-9]. There have been about 15 international CI conferences; there are dozens of organizational CI units in academia and industry such as "Institutes of CI", or "CI laboratory", or whatever; at least one textbook with CI in its title is in press [9]; and there is even a university in London that is awarding master's degrees in the subject! (Royal Holloway, a college of the University of London, awards the MSCI degree).

Why? Well, I'm not really sure. But I suspect that there are two main reasons. First, the technical community is somewhat disenchanted with (perceptions, anyway, of) the basis of AI research. I will argue here that AI tackles really hard problems, and that goals may have been set unrealistically high in the early days of AI. And second, scientists and engineers have a certain hunger - maybe even a justifiable need - for new terms that will spark interest and help sell papers, grant proposals, research and development programs and even products. These

¹ Computational Intelligence has the alternate title Intelligence Informatique. It is a quarterly journal published in Ottawa by the National Research Council of Canada; issue 1(1) is dated February 1985. I have been told that it is an AI journal in disguise.

Page 21: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


are the defining characteristics of the so-called buzzword, of which CI is currently a prime example. After all, computational neural networks in their best known form have been around since 1943 [10], evolutionary computation since 1954 [11]², and fuzzy sets since 1965 [12]. Funding entities and journal editors get tired of the same old terms. Is my attitude about this a little too cynical? Probably. But I think it's pretty accurate.

2. A Quick Tour of my View

For those who have not followed the meteoric rise of the term CI, this section reviews (well, actually it pretty much plagiarizes) briefly what I said in my 1992 and 1994 papers. First, I introduced the ABCs:

A  Artificial      Non-biological (man-made)
B  Biological      Physical + chemical + (??) = organic
C  Computational   Mathematics + computers

The centerpiece of [1] and [5] was the diagram which is repeated here (but with an important modification discussed below) as Fig. 2.1. This figure illustrates my view of the relationship between my ABCs and Neural Nets (NN), Pattern Recognition (PR) and Intelligence (I). References [1] and [5] discuss the 9 nodes in the middle of Fig. 2.1. In the first place, I won't disagree if you argue that the distinction between the bottom and middle rows of Fig. 2.1 is pretty fuzzy (some of you will feel that the difference between A and C is artificial). Some think that artificial intelligence (AI) is a subset of CI, and there are many who feel that AI and CI are the same thing. CI occupies the lower right hand corner in my diagram.

I think that A, B, and C correspond to three very different levels of system complexity, which increases from left to right, and from bottom to top in this sketch. I have skewed the nodes so that visual distances between them correspond in a loose way to the disparity between the terms they represent. Horizontally, e.g., distinctions between Computational Neural Nets (CNNs) and Computational Pattern Recognition (CPR) are slight, in my opinion, compared to the separation between Biological Neural Nets (BNNs) and Biological Pattern Recognition (BPR). The vertical scale is similarly skewed; e.g., I think that CI is much closer in some ill-defined sense to AI than AI is to Biological Intelligence (BI).

² Dave Fogel, an ardent student and chronicler of the history of computational models that emulate evolution, tells me, in his words, that: "I think it would be a fair designation to say that evolutionary computation dates back to 1954 (Barricelli had a paper in the journal Methodos - it's in Italian - but the paper's intro is reprinted in a later paper in Methodos in 1957 that's in English)". History aficionados can do no better than Chap. 3 of [13] for a more complete discussion.

Page 22: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


[Figure: a 3 x 3 grid of nodes with complexity increasing upward and to the right. Bottom row: CNN, CPR, CI (computation + sensor data); middle row: ANN, APR, AI (knowledge tidbits + computation + sensor data; symbolic); top row: BNN, BPR, BI (organic; human knowledge + sensory inputs). Horizontal axis: input complexity.]

Fig. 2.1. Commuting through the ABCs [1, 5]: my new view

The BNN is one of the physiological systems that facilitates BPR. A key input to the BNN that helps it do this is sensory data; another is "knowledge". In turn, BPR is but one aspect of BI. At the other end of the complexity spectrum, and, I believe, in an entirely analogous way, CNNs that depend solely on sensor data to complete their assigned tasks are (but one!) facilitator of CPR, which in turn is but one aspect of CI. The term CNN as I use it here stands for any computational model that draws its inspiration from biology. CNNs include, but are not limited to: feed forward classifier networks, self-organizing feature maps, learning vector quantization, neocognitrons, adaptive resonance theories, genetic algorithms, Hebbian, Hopfield and counter propagation networks, evolutionary computing, and so on. Keep this in mind especially when you get to my discussion of Dave Fogel's opinions.

Familiar terms in Fig. 2.1 include ANN, AI and the three biological notions in the upper row. The symbol ⊂ in this figure means "is a subset of" in the usual mathematical sense. For example, I am suggesting along the bottom row that CNNs ⊂ CPR ⊂ CI. In [1, 5], I used inclusion symbols along all of the vertical paths too, instead of arrows such as (→) which are now used between the middle and top rows of Fig. 2.1. I have switched to arrows to indicate non-mathematical, ill-defined relationships such as "needs" or "helps enable" or "leads to" or "provides inspiration for". This important change may help you understand that I do not mean inclusion in the mathematical sense vertically when passing from A to B. This also avoids the obvious and embarrassing logical inaccuracy that strict vertical inclusion would imply, e.g., that everything in the A category (man-made, by my definition) is automatically biological (which is obviously false). Thus, I think that CI is a proper subset of AI, but that AI is not a subset of BI; rather, BI is used to guide AI (and thus CI) models of it. This oversight became clear to me during the writing of this paper, because other authors in fact suggest different inclusion/exclusion relationships

Page 23: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


for A, B and C, and who's to say they are wrong? Not me! This is an opinions paper, so facts - when and if they exist - are only of secondary importance³.

As defined then, every computational system is artificial, but not conversely. So, I am definitely suggesting that CI and AI are not synonyms. CI is in my view a proper subset of AI. Bob Marks suggested the following example to me, which is contrary to my view of CI. Consider the human intelligence which is implicitly embedded into a low-level, pixel-based image segmentation algorithm when its inventor nominates and uses particularly clever features extracted from the raw image. OK, this is certainly an intelligent thing to do. But is this program a computationally intelligent entity? If it is, then by analogy, all computer programs possess the intelligence of their creators. Does this entitle us to call any computer program a knowledge-based system (Eberhart et al. [9] call this intelligent behavior)? In some broad sense I guess it does. But I bet you would get laughed out of the room if you stood up at the next AAAI and announced that your newest matrix inversion routine was knowledge-based because you had the knowledge to create the program. Figure 2.1 suggests that the CNN is at best a building block for computationally intelligent systems. I don't think the CNN or any other low-level algorithm deserves a stronger designation.

I think an intelligent system is one that attempts to get to (that is, perform like) BI in Fig. 2.1, and the question is - how do we do it? We want our models to move upwards and to the right in Fig. 2.1, towards BI. I call Fig. 2.1 commuting through the ABCs by analogy to commutative diagrams in mathematics. What, if any, paths are open? Which ones provide the quickest access, the best approximation to BI? I am suggesting that we need the middle row (A=artificial), and that it definitely involves more than the bottom row, and far less than the top row, for it offers us a means of extending computational algorithms upwards towards their biological inspirations through symbolic representation and manipulation of non-numeric data. Fuzzy models seem particularly well suited for a smooth transition from C to A because they can accommodate both numerical and semantic information in a common framework.

Figure 2.1 illustrates other differences between my B, A, and C levels of complexity. For example, (strictly) computational systems depend on numerical data supplied by man-made sensors, and do not rely upon encoding knowledge. Matrix inversion and pixel-based image segmentation, e.g., fall into this category. Let me illustrate, by describing how to make ANNs and APR from their lower level progenitors.

Let

X = {x_1, ..., x_n} ∪ {x_(n+1), ..., x_(n+m)}
        (apples)            (pears)

³ I remember reading in Herb Caen's column, circa 1963, in the San Francisco Chronicle that "Having an opinion is an art - any clod can have the facts". I don't know if it was his own line, or he was simply repeating a previous quote of someone else. But it's a good line to remember - words to live by.

Page 24: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


be 2-class labeled training data for classifier design.

This data might be used to train a feed-forward CNN to classify apples and pears, converting the CNN to CPR. Suppose the j-th feature for each vector is the number of bumps on the bottom of the fruit. Adding a rule such as <if x_jk = 5 then x_k is a (red) delicious apple> to the j-th node of the input layer of the

CNN is an example of adding what I call a Knowledge Tidbit (KT) to the CNN, thereby rendering it more like what I want an ANN to be. Similarly, adding a rule like <if the area of this blob is very large, it is probably not a tank> to an image segmentation algorithm might qualify the method as APR (moving from CPR type image processing towards image understanding). More generally, I think syntactic pattern recognition deserves APR status, since it usually uses numerical techniques to extract structural features from data, and then adds knowledge tidbits about structural relationships that are often dealt with at the symbolic level.
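A hypothetical sketch of the 5-bump example: a symbolic rule (the knowledge tidbit) is layered over a numerical classifier. The stand-in classifier and the feature index are invented so the fragment runs; they are not from the text.

    # Knowledge Tidbit (KT) layered over a trained numerical classifier.
    BUMP_FEATURE = 3   # index j of the 'bumps on the bottom' feature

    def trained_cnn(x):
        """Stand-in for a trained feed-forward classifier."""
        return "apple" if x[0] > 0.5 else "pear"

    def classify_with_kt(x):
        """Numerical classifier augmented with a symbolic rule."""
        if x[BUMP_FEATURE] == 5:   # <if x_jk = 5 then ... delicious apple>
            return "(red) delicious apple"
        return trained_cnn(x)

    print(classify_with_kt([0.2, 0.9, 0.1, 5]))   # the KT fires
    print(classify_with_kt([0.2, 0.9, 0.1, 3]))   # falls back to the CNN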

I hope these examples convince you that it is important and useful, in the context of the relationship between NNs and PR, to distinguish carefully what is meant by the terms artificial and knowledge. The word artificial seems more properly applied in its usual context in AI than as used in NNs. The difference I propose between the lower and middle rows of Fig. 2.1 involves reserving the term artificial for systems that use knowledge tidbits. And what, you are wondering, can this curious term possibly mean? Imagine that someone asks you to close your eyes, hands you an apple, and requests the identity of the object in your hand. Almost everyone will correctly identify the apple within a few seconds. I have seen 3 failures of this test in about 200 attempts during talks I have given about this. Can you imagine NOT being able to do it? This is BPR, done with sensory data and real knowledge invoked via associative recall.

Moreover, you can, at the instant of recognition, also answer dozens (hundreds!) of questions related to this apple - where it grows and will not, what its colors may be and are not, what vitamins it provides and does not, what diseases it prevents and cannot, how much it probably costs, etc. And your mind's eye "knows" what it looks like, what it smells like, how it tastes, etc. How many training sessions did you have before you knew all these things? Perhaps five or six, or maybe a dozen. I guess this is what workers in AI would call "deep knowledge". Certainly it is one indicant of BI. Perhaps the most important aspect of your intelligence is associative memory; your ability to instantly link subdomains of your BNN to recall this knowledge.

Imagine asking a computer to identify an apple in the same way. Using what sensor data? Using what "facts about apples" stored in its memory? I contend that at best you can only store a few knowledge tidbits - pieces of relevant information, but not the whole story - about this simple idea. Which ones? How many? How to "train"? This distinction creates the middle row in Fig. 2.1, which separates low-level computational models from biological (role) models. I prefer to reserve the word artificial for attempts to incorporate non-numerical knowledge tidbits into computational models.


Since knowledge tidbits are knowledge, I would be pretty surprised if anyone in AI found this distinction interesting - it is workers in NNs and PR that I address here. I contend that artificial system models utilize sensor data, and also try to capture and exploit incomplete and often unconnected pieces of non-numerical information, rules, heuristics and constraints that humans possess and can imbed in computer programs. This is generally not the case in NNs. I know that the CNN can, given the right data and architecture, represent rules and reasoning pretty well. But I am talking about augmenting the already trained CNN with non-numerical rules and knowledge tidbits such as in the 5-bump example given above.

The distinction between A, B and C is also important because our semantic descriptions of models, their properties, and our expectations of their performance should be tempered by the kind of systems we want, and the ones we can build. For example, you often read that a feed-forward CNN learns from examples. Semantically, this is nonsense. The CNN is a computational model; it learns (its parameters) in exactly the same way that the expectation-maximization (EM) algorithm for finding maximum likelihood estimators from labeled data does. So in this context learning means acquiring model parameters via iterative improvement. CNN models are optimized by some computational strategy; acquisition of their parameters via learning = training = iterative improvement has nothing explicit to do with biological knowledge or intelligence. And evaluation on test data is often called recall. What's wrong with evaluation - the technically correct term? Well, it just doesn't sound very neural, does it? Now I will jump to the conclusion section of [5].

Conclusions from [5]

Here is the definition of AI given in Webster's New World Dictionary of Computer Terms [14]:

Definition W ([14]): Artificial Intelligence

The branch of computer science that studies how smart a machine can be, which involves the capability of a device to perform functions normally associated with human intelligence, such as reasoning, learning, and self-improvement. See EXPERT SYSTEMS, HEURISTIC, KNOWLEDGE-BASED SYSTEMS, and MACHINE LEARNING. Abbreviated AI.

So, what is computational intelligence? I am still not sure that a formal definition of computational intelligence is useful or desirable, but I did publish one in [1], so I will conclude this section by summarizing what I have said in Table 2.1. According to Table 2.1 computational intelligence is "low-level cognition in the style of the mind". CI is distinguished from AI only by the lack of KTs. Mid-level systems include knowledge (tidbits); low-level systems do not. How can you determine if your system is computationally intelligent using this definition? You can't. But then you can't use Webster's Definition W to show that your mid-level system is artificially intelligent either.

Let me extend my definition of CI to make it a little more specific. I have characterized computational models as low-level architectures that utilize sensor data, and have asked that we reserve the term artificial for architectures that have a clearly identifiable non-numerical component of knowledge. According to Definition W, this would involve things such as reasoning, learning and self-improvement, which I view as high-level operations in humans but low-level operations by computers unless and until human knowledge tidbits are somehow encapsulated by the scheme. Webster's definition seems pretty much in agreement with what I have said. And I have discussed the hypothesis implied by Fig. 2.1, that neural networks are but one facilitator for pattern recognition, and that pattern recognition bears the same relationship to the notion of intelligence - at all three levels, A, B and C.

Table 2.1. Defining the ABCs [1, 5]

BNN | Your hardware: the brain | Processing of your sensory inputs
ANN | Mid-level models: CNN (+) Knowledge Tidbits | Mid-level processing in the style of the brain
CNN | Low-level, biologically inspired models | Sensor data processing in the style of the brain
BPR | Your search for structure in sensory data | Recognition of structure in your perceptual environment
APR | Mid-level models: CPR (+) Knowledge Tidbits | Mid-level numeric and syntactic processing
CPR | Computational search for structure in sensor data | All CNNs + fuzzy, statistical, and deterministic models
BI | Your software: the mind | Cognition, memory and action: you have them!
AI | Mid-level models: CI (+) Knowledge Tidbits | Mid-level cognition in the style of the mind
CI | Low-level algorithms that reason computationally | Low-level cognition in the style of the mind

Assume that we have reasonable quantitative definitions of computational adaptivity and computational fault tolerance to go with current notions of speed and error rate optimality. If these four properties are hallmarks of biologically intelligent systems (certainly there are many others), then I suggest that these should also be used to qualify computational intelligence. Thus,

Definition B (Bezdek 1994 in [5]): Computational Intelligence

A system is computationally intelligent when it: deals only with numerical (low-level) data, has a pattern recognition component, and does not use knowledge in the AI sense; and additionally, when it (begins to) exhibit (i) computational adaptivity; (ii) computational fault tolerance; (iii) speed approaching human-like turnaround; and (iv) error rates that approximate human performance.

An artificially intelligent (AI) system is a CI system whose added value comes from incorporating knowledge (tidbits) in a non-numerical way. Now can you test your system for computational intelligence? Of course not. But if you describe its properties with terms such as these so that we can see what they mean, measure them, compare them, and correlate them with our understanding of their more commonly held usage, you will have done a real service to science.

Well, there it is. To get right down to it, the purpose of this article (i.e., [5]) was simply to get you thinking about how we use terms such as "intelligent system"4. If it has done this, I have succeeded. I want to discourage the use of seductive semantics in algorithmic descriptions, and to encourage strict, verifiable definitions of computational properties. There is little doubt that CNNs will find an important place in pattern recognition, and in CI and AI systems. I hope that the ideas put forth here have some utility for travelers along the way.

4 I have an advertisement for a new journal titled Intelligent Data Analysis. What do you think the articles in it are about? Can you imagine doing unintelligent data analysis, and asking anyone to publish your results? Of course, this happens anyway!


My remarks in [1, 5] were limited to models as they are used for pattern recognition (feature analysis, clustering and classifier design). I wanted to emphasize what I believe to be the very great difference between CNNs and their ultimate role model, BI, as we currently understand these two terms. In this context, CNNs are but one of many alternatives for computational (or numerical) pattern recognition. CPR includes deterministic, fuzzy and statistical models that do not offer biological rationales (e.g., hard, fuzzy and possibilistic c-means, k-nearest neighbor rules, Bayesian discriminant functions, etc.). Eberhart et al. contend in [9] that all computational models are biologically inspired. Do you agree? Start thinking about this - I will return to it later. Finally, it is clear that you can construct diagrams like Fig. 2.1 for other disciplines (e.g., control); later I will show you one (Fig. 5.1).
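As a concrete instance of one of the CPR methods just named, here is a sketch (my example, not the chapter's) of hard c-means (k-means) clustering. It is a purely computational search for structure in sensor data, with no biological rationale, and its "learning" is nothing more than iterative improvement of the prototypes.

```python
# Hard c-means: iteratively improve c prototypes V by alternating
# nearest-prototype assignment and cluster-mean updates.
import numpy as np

def hard_c_means(X, c, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    V = X[rng.choice(len(X), size=c, replace=False)]      # initial prototypes
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2)
        labels = d.argmin(axis=1)                         # nearest-prototype partition
        V = np.array([X[labels == k].mean(axis=0) if np.any(labels == k) else V[k]
                      for k in range(c)])                 # recompute cluster means
    return V, labels
```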

My WCCI talk based on [5] contained examples of each of the 9 nodes shown in Fig. 2.1 and Table 2.1. Those examples were not published in the 1994 paper, so I want to record them here. After all, without at least one example of each node, you can rightly argue that the node is not needed. Figures 2.2-2.4 contain illustrations of each node. All three figures have the main title "Refining the ABCs", and are further organized by increasing complexity, progressing from NNs in Fig. 2.2 to PR in Fig. 2.3 to intelligence in Fig. 2.4. (Alternatively, think of these as refinements, left to right, of the three columns in the middle of Fig. 2.1)

Figure 2.2 begins with the CNN node that is at the bottom left side of Fig. 2.1. This is the lowest and least complex level in my diagram, both horizontally and vertically, and corresponds to many of the computational learning models that we all know and love so well. Perhaps the canonical example is the standard feed-forward neural network. It knows only what the data you use to find its parameters can supply. In this sense, it is no better (or worse) than simple statistical models. One set of data provides one estimate; the next observation may or may not be well characterized by the model. A nice example is the use of the CNN to approximate functions. If you construct training data by sampling the function f(x,y) = 2x² - 4y² over a regular lattice on, say, the unit square in ℜ², it is easy to find a feed-forward, back-propagation CNN that provides a remarkably good approximation to this function over the domain of training (i.e., sensor) inputs.
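Here is a minimal sketch of that experiment in plain NumPy (the architecture, step size and iteration count are my assumptions, not the chapter's): a small tanh network fit to lattice samples of f(x,y) = 2x² - 4y² by the usual iterative parameter improvement.

```python
# Fit a one-hidden-layer network to f(x,y) = 2x^2 - 4y^2 sampled on a
# regular lattice over the unit square; "learning" here is nothing more
# than gradient-descent improvement of the parameters.
import numpy as np

rng = np.random.default_rng(0)

g = np.linspace(0.0, 1.0, 11)
X = np.array([(a, b) for a in g for b in g])     # 121 lattice "sensor" inputs
t = 2.0 * X[:, 0]**2 - 4.0 * X[:, 1]**2          # target function values

W1 = rng.normal(0, 0.5, (2, 16)); b1 = np.zeros(16)   # hidden layer (tanh)
w2 = rng.normal(0, 0.5, 16);      b2 = 0.0            # linear output

lr = 0.05
for epoch in range(5000):
    h = np.tanh(X @ W1 + b1)                     # forward pass
    y = h @ w2 + b2
    err = y - t                                  # squared-error residual
    dh = np.outer(err, w2) * (1 - h**2)          # backprop through tanh
    W1 -= lr * (X.T @ dh) / len(X); b1 -= lr * dh.mean(axis=0)
    w2 -= lr * (h.T @ err) / len(X); b2 -= lr * err.mean()

print("RMSE over the training lattice:", np.sqrt((err**2).mean()))
```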

[Fig. 2.2 contents: three rows. BNN (structure: the brain) processes your sensory inputs; ANN (CNN (+) KTs) processes sensor inputs and KTs in the style of the brain; CNN (biologically inspired models) processes sensor inputs in the style of the brain. The CNN and ANN rows are illustrated with the function-approximation example in x and y.]

Fig. 2.2. Refining the ABCs: neural networks

The middle row in Fig. 2.2 shows the ANN - any CNN with knowledge tidbits (facts, rules, rules of thumb, heuristics, constraints, etc.) added to help it perform its task more efficiently. For example, the teacher in Fig. 2.2 knows that the function being approximated is a sum of two quadratic terms. You might enhance the ability of the CNN to approximate it by separating the network into 2 subnets, one for each term, as shown in the figure. If you knew the signs of the coefficients (a and b), imposing further constraints on the architecture might yield an even better approximation. This illustrates transformation of the CNN to an ANN by incorporation of tidbits about the problem being solved into the architecture chosen to represent the function.
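A sketch of that knowledge-tidbit version (my reading of the figure, not code from the chapter): the tidbit "f is a sum of one term in x and one term in y" is built into the architecture as two single-input subnets whose outputs are added. It reuses X, t from the previous sketch; the sizes and step length are again my assumptions.

```python
# Structured ANN: one subnet per quadratic term, outputs summed.
import numpy as np

def make_subnet(rng, hidden=8):
    return {"W1": rng.normal(0, 0.5, (1, hidden)),
            "b1": np.zeros(hidden),
            "w2": rng.normal(0, 0.5, hidden)}

def forward(p, u):                               # u: (n, 1), one input feature
    h = np.tanh(u @ p["W1"] + p["b1"])
    return h, h @ p["w2"]

rng = np.random.default_rng(1)
px, py = make_subnet(rng), make_subnet(rng)      # one subnet per quadratic term
lr = 0.05
for epoch in range(5000):
    hx, fx = forward(px, X[:, :1])               # X, t: lattice data from the sketch above
    hy, fy = forward(py, X[:, 1:])
    err = (fx + fy) - t                          # KT: the output is the SUM of the subnets
    for p, h, u in ((px, hx, X[:, :1]), (py, hy, X[:, 1:])):
        dh = np.outer(err, p["w2"]) * (1 - h**2)
        p["W1"] -= lr * (u.T @ dh) / len(X); p["b1"] -= lr * dh.mean(axis=0)
        p["w2"] -= lr * (h.T @ err) / len(X)
```

If the signs of a and b were also known, one could go further and constrain each subnet's output weights to the corresponding sign, which is the extra architectural constraint mentioned above.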

Finally, the top row of Fig. 2.2 shows the BNN. I don't need to elaborate on the hardware that processes your sensory inputs, but we have only a very, very rough idea of its functional and physical structure.

Figure 2.3 is a refinement of pattern recognition, the middle column in Fig. 2.1. The lowest level in this diagram is computational pattern recognition. In the example shown, the addition of labels for each subclass (tanks and trucks) in the data indicates that the training data can be used for classifier design. The use of CNNs for this purpose is well documented, and the only significant difference between the CNN and CPR nodes of Fig. 2.1 lies with the task assigned to the model - here, identification, classification, and perhaps prediction. These are slightly more complicated tasks than function approximation, the task used to illustrate the basic CNN.

The middle row of Fig. 2.3 can be realized from the bottom row by adding knowledge tidbits to the model. For example, the blackboards in this row show logical rules such as "If the object has a barrel-like structure, it is not a truck"; and structural rules such as "the barrel [of a tank is] connected to the turret". The computer must know what barrel-like structures are, and this leads to syntactic pattern recognition. In my view syntactic pattern recognition uses knowledge tidbits and operates at a much higher conceptual level than numerical pattern recognition, which is based on object and relational data alone. To repeat, I think syntactic pattern recognition is a good example of what I would call Artificial Pattern Recognition (APR). Note that I show two ways to get to APR: CPR+KTs or ANN+PR. This illustrates the commutative aspect of Fig. 2.1.
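One hedged way to picture this (my reading of the blackboard rules; every predicate and field below is hypothetical) is APR as rule-based post-processing of CPR outputs: symbolic structural knowledge vetoes or reinforces the numeric class scores.

```python
# APR sketch: structural Knowledge Tidbits applied on top of CPR scores.
def apr_classify(scores, parts):
    """scores: dict class -> numeric CPR score; parts: dict of detected structures."""
    if "barrel" in parts:
        scores = {c: s for c, s in scores.items() if c != "truck"}   # KT: barrel => not a truck
    if {"barrel", "turret"} <= parts.keys() and parts["barrel"].get("connected_to") == "turret":
        scores["tank"] = scores.get("tank", 0.0) + 1.0               # structural rule: barrel connects to turret
    return max(scores, key=scores.get)

# Usage (hypothetical detections):
# apr_classify({"tank": 0.4, "truck": 0.5}, {"barrel": {"connected_to": "turret"}, "turret": {}})
```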

As in Fig. 2.2, the top row of Fig. 2.3 is self-explanatory. How does the bee recognize where to find the nectar it seeks? I don't know, but this is certainly biological pattern recognition, and it must involve perception which is cued by sensory inputs as well as rudimentary memory.

Figure 2.4 is a refinement of the three levels of intelligence shown as the rightmost column in Fig. 2.1. The lowest level in this diagram is the object of this article - namely, computational intelligence. What would I ask of a computationally intelligent system? It would be able to perform low-level cognitive tasks that humans can do with at least some success. The bottom panel of Fig. 2.4 shows multiple copies of CNN structures that are somehow organized by low-level control to perform multi-platform automatic target recognition. There are labeled data for 3 classes of vehicles (ships, planes and ground vehicles) that are further subdivided as military or civilian. You can look at such vehicles and label them into classes 1-6 quite easily. The model shown would be computationally intelligent if it could do the same thing with relatively good accuracy - say, 70% correct. One of the key ingredients of this system would be some rudimentary form of adaptivity. For example, control of parallel structures might evolve via evolutionary computation as more data become available. The CNNs provide some unspecified form of fault tolerance for noisy and confusing data (fuzzy CNNs do this automatically, but in a non-specific way), and speed and error rate optimality are clearly present. Thus, this system has all the ingredients I want for it to be called computationally intelligent. Note that this system is also missing the key ingredient of AI - the explicit use of imbedded knowledge tidbits to help the system do its job.
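The control mechanism is left unspecified above; as one hedged possibility, here is a sketch of a (1+1)-style evolutionary loop that adapts the mixing weights combining several already-trained CNN classifiers as new labeled data arrive. `nets` is a hypothetical list of functions, each returning an (n, classes) array of scores.

```python
# Rudimentary adaptivity via evolutionary computation: mutate the
# ensemble weights and keep any candidate that does not hurt accuracy.
import numpy as np

def ensemble_accuracy(w, nets, X, y):
    votes = sum(wi * net(X) for wi, net in zip(w, nets))   # weighted class scores
    return np.mean(votes.argmax(axis=1) == y)

def evolve_weights(nets, X, y, steps=200, sigma=0.1, seed=0):
    rng = np.random.default_rng(seed)
    w = np.ones(len(nets)) / len(nets)
    best = ensemble_accuracy(w, nets, X, y)
    for _ in range(steps):
        cand = np.abs(w + rng.normal(0, sigma, w.shape))   # mutate the weights
        cand /= cand.sum()
        acc = ensemble_accuracy(cand, nets, X, y)
        if acc >= best:                                    # selection: keep improvements
            w, best = cand, acc
    return w
```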

[Fig. 2.3 contents: three rows. BPR: your search for structure in sensory data; APR: CPR + KTs, or ANN + PR, illustrated with structural rules such as "barrel connects to turret"; CPR: the search for structure in sensor data (CNN + PR), with a labeled example.]

Fig. 2.3. Refining the ABCs: pattern recognition

The only change between the lower and center panels of Fig. 2.4 is the addition of knowledge tidbits, shown here as sets of instructions supplied by its operators to the system. For example, the CNNs might be under the control of a Takagi-Sugeno fuzzy system, with linguistic rules about the classes of vehicles providing overall control. Knowledge of vehicle properties would enhance the performance of the system, as well as that of each component of it. This moves the system from CI to AI through the vertical path in Fig. 2.4.
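A minimal zero-order Takagi-Sugeno sketch (my illustration of the idea, not the chapter's actual controller): fuzzy linguistic antecedents fire to degrees in [0, 1], and crisp consequents are blended by a weighted average. The rule base and membership points below are invented.

```python
# Zero-order Takagi-Sugeno inference: fire fuzzy antecedents, then
# defuzzify by the weighted average of the crisp consequents.
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function with feet at a and c, peak at b."""
    return float(np.clip(min((x - a) / (b - a), (c - x) / (c - b)), 0.0, 1.0))

def ts_output(x, rules):
    """rules: list of (membership_fn, crisp consequent)."""
    w = np.array([mf(x) for mf, _ in rules])      # rule firing strengths
    z = np.array([zc for _, zc in rules])
    return float((w * z).sum() / w.sum())         # weighted-average defuzzification

# e.g., blob size sets the weight given to the ground-vehicle subnets
rules = [(lambda x: tri(x, -1, 10, 50), 1.0),     # size is small  -> weight 1.0
         (lambda x: tri(x, 25, 50, 75), 0.5),     # size is medium -> weight 0.5
         (lambda x: tri(x, 50, 90, 101), 0.1)]    # size is large  -> weight 0.1
print(ts_output(30.0, rules))                     # a blend of the small/medium rules
```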

[Fig. 2.4 contents: three rows. BI: your software, the mind; AI: mid-level models, CI + KTs, or APR + I; CI: low-level models that "reason", CPR + I, illustrated with banks of parallel CNN structures.]

Fig. 2.4. Refining the ABCs: intelligence


Another path to AI is realized by adding logical and structural rules as discussed in connection with Fig. 2.3 to each of three systems for ships, planes and ground vehicles. In this case I interpret the AI system in the center panel of Fig. 2.4 as being realized by binding three APR systems together with an intelligence ingredient (I) comprising system control. This again emphasizes commutativity through the ABCs: there is more than one path to AI. Finally, the top panel in Fig. 2.4 represents an example of BI - no comment necessary. OK, this is the end of my own view of CI; now let's see what others have been saying.

3. The Party Lines

My definition for CI is, I suppose, philosophical in nature, and I was led to propose it for very different reasons than the reasons on our immediate horizon. If you tell me what you do, and ask me "am I in CI?", my response will be - I don't know, and why is it important anyway? If you do what you are interested in well, I will be interested in it too. But societal pressure runs counter to my personal tastes. Many want to know where they fit into the fabric of our profession. Since the 1994 WCCI was planned as (and was) a compendium of conferences on neural networks, fuzzy systems and evolutionary computation, it was a short conceptual step for Bob Marks to take when he stated ([3] p. 737):

3.1 Definition M (Marks 1993 in [3]): Computational Intelligence

Neural networks, genetic algorithms, fuzzy systems, evolutionary programming, and artificial life are the building blocks of CI.

Fig. 3.1. The umbrella of CI according to Marks [3]

Figure 3.1 shows CI as an umbrella that collects these three (well, Bob named five, but that's a minor point) loosely defined fields into a single "superfield". This is a far cry from my position, which was really not very specific about what fields might be involved in CI (with the exception of NNs, which are explicitly discussed in connection with Fig. 2.1), but rather, about what concepts should be included in each level of the ABCs of NNs, PR and intelligence. In fact, my view precludes artificial (A) as an adjective that modifies the word "life" (B). Nonetheless, both definitions are descriptive, and others will follow these first two.

The structural organization in Fig. 3.1 is the working definition of CI taken by the authors of [3,4, 7], and more generally, by many leaders of the IEEE NNC. Walter Karplus, the (1995-1996) president of the NNC, presented a chart at the June 2, 1996 ADCOM meeting of the NNC that reaffirmed this clearly, but added several interesting new twists:

3.2 Definition K (Karplus 1996): Computational Intelligence

CI substitutes intensive computation for insight into how the system works. NNs, FSs and EC were all shunned by classical system and control theorists. CI umbrellas and unifies these and other revolutionary methods.

This definition affirms the umbrella of Fig. 3.1, but adds a little more in two ways. First is the idea that formal methods are not the only coin of engineering and science. We in fuzzy sets know only too well how repugnant the abandonment of Newton's laws in favor of expert opinions, rules and judgment is to classical control theorists. But fuzzy controllers work (very well), and there is no need in 1997 to prove it (but I would argue with Karplus about his use of the term insight - I think this is a hallmark of fuzzy models, not one of its missing elements). Similarly, NNs and EC provide success stories that are often hard to justify with formal mathematical models (which are, I emphasize, but a subset of all computational models, some of which are based on mathematics, and some of which are not). Second, the last four words in Definition K admit other disciplines under the CI umbrella - for example, virtual reality, which is certainly computational in nature but definitely not grounded in the classical laws of physics.

Table 3.1. Keywords for INSPEC/CASSIS database searches

Artificial Intelligence (AI) | Computational Intelligence (CI)
artificial intelligence | neural nets
expert systems | neural networks
machine intelligence | neurocomputers
intelligent systems | fuzzy (anything)
  | genetic algorithms
  | evolutionary programming
  | artificial life

Articles [3, 4, 7, 15] all display curves that plot AI against CI measured in terms of numbers of papers/year since 1989 and patents/year since 1986. The data are relevant to the topic at hand, and are interesting, so I will summarize them here. Table 3.1 lists the keywords used in [3, 4, 7, 15] to extract data about relative numbers of papers and patents. The choice of keywords clearly affects the extracted data, and there is little doubt that AI researchers would lodge a valid objection to the keywords chosen for AI. For example, the phrases machine learning and knowledge-based in Definition W should fall under AI. (Some influential AI people that I asked about this strongly insisted that NNs, FSs and even EC are already in AI, and should appear under it!) This agrees with my view of the relationship between AI and CI as seen in Fig. 2.1, where CI is shown as a proper subset of AI.

The first database discussed in [3] is the Information Service for Physics and Engineering Communities (INSPEC) database compiled by the IEE and the IEEE. INSPEC was founded in 1989, and lists papers by titles, abstracts and authors from over 4000 journals. INSPEC is also augmented with data about books, reports and conference records, and it supports keyword searches over both titles and abstracts. This database focuses on physics, computer science, electronics and electrical engineering. About 2 million entries have been logged since 1989.

The second graph comparing AI with CI presented by Marks in [3] was based on US patent data obtained from the Classification for Search Support Information System (CASSIS). Keyword searches in this database are also performed on both titles and abstracts, and Marks searched CASSIS back to 1986 for his original article.

Figures 3.1 and 3.2 show, respectively, the numbers of papers/year in INSPEC and the number of US patents/year in CASSIS since 1989 as found by keyword searches against titles and abstracts using the phrases in Table 3.1. The totals for both AI and CI are the UNION of the hits for each keyword (and not the sum). Many papers that are retrieved for some keyword under one heading are also retrieved under the other.

Figures 3.1 and 3.2 are somewhat subjective, being at the mercy of the keywords chosen as the search criteria. For the keywords in Table 3.1, CI is clearly growing while AI is declining in terms of both papers/year published and US patents/year issued. According to Zurada et al. [4] the intersection of CI with AI at the 1994 sampling of INSPEC was about 14% (that is, roughly 14% of the papers retrieved under AI were also retrieved under CI). And the intersection in terms of CASSIS patents was about 33%. The authors of [4] cite these statistics as evidence that CI and AI, at least in terms of the Table 3.1 keywords, are clearly different disciplines, and they are experiencing opposite growth trends. I disagree with this interpretation. Since I think CI is a set of enabling technologies for AI, I take these data as evidence that more recent effort is being devoted to methods that can be used to eventually arrive at solutions of hard AI problems.

Data for Figs. 3.1 and 3.2 for 1993 and 1994 were added by Palaniswami et al. [7] to the original graphs published in [3, 4]. These data correspond to sampling the databases in October, 1995. The (total) INSPEC entries reproduced from [7] are: 11,423 fuzzy papers, 29,243 NN papers, 39,866 CI papers, and 45,791 AI papers. Almost all of the patents comprise either NN or FS devices. Marks recently updated these cumulative INSPEC totals. In [15] Marks lists 12,605 fuzzy papers, 34,839 NN papers, 45,966 CI papers, and 48,916 AI papers. Overlap data from the INSPEC search process led to the following statistics:

NN ∩ AI = 8,670 ⇒ 19% of AI or 25% of NN papers
FL ∩ AI = 3,228 ⇒ 7% of AI or 26% of FL papers
CI ∩ AI = 10,948 ⇒ 22% of AI or 24% of CI papers


Fig. 3.1. Numbers of papers/year in the INSPEC database in AI and CI [3, 4, 7, 15]

Marks' most current figures are: 14,008 fuzzy papers, 38,219 NN papers, 50,907 CI papers, and 50,710 AI papers. At least for the keywords being used by Marks, the total number of CI papers indexed by INSPEC since 1989 has now surpassed the total number of AI papers written in the same period. This measure is probably biased a little towards the party line fields (NNs, FSs, EC). What it offers me is strong evidence that many people are working in one or more of these fields. From this Marks draws two conclusions: (i) CI has now passed AI; and (ii) CI is NOT contained in AI. Here of course, containment is in the very well-defined sense of INSPEC searches based on the keywords in Table 3.1.


Fig. 3.2. Numbers of patents/year in the CASSIS database in AI and CI [3, 4, 7, 15]

3.3 The Z-Man's View

Lotfi Zadeh offered a slightly different view about the meaning of and relationship between AI and CI to participants of the NATO ASI. During his talk, he presented the chart which I have reproduced with his permission as Fig. 3.3.

Zadeh feels that traditional (hard) computing is the computational paradigm that underlies artificial intelligence, whereas soft computing is the basis of computational intelligence. And Zadeh agrees with the party line about which fields together comprise the basis of soft computing, viz. fuzzy logic, neural networks, and evolutionary computation. For Zadeh, the essential distinction between CI and AI lies with the type of reasoning employed, crisp logic and rules in AI, and fuzzy logic and rules in CI. Notice that Zadeh does not show a relationship between AI and CI, and therefore disagrees with my interpretation of CI as a subset of and enabling technology for AI.


Fig. 3.3. Zadeh's interpretation of AI and CI

4. An Evolving Definition

David Fogel [6] wrote a wonderfully readable review of [4] that was published in the November, 1995 issue of the IEEE Transactions on Neural Networks. I will excerpt a few passages from this review, and reply to his assessment of Definition B. Fogel began by restating the (perceived) party line (Definition M), describing the 1994 WCCI this way ([6], p. 1562).

These technologies of neural, fuzzy and evolutionary systems were brought together under the rubric of Computational intelligence, a relatively new term5 offered to generally describe methods of computation that can be used to adapt solutions to new problems and do not rely on explicit human knowledge.

The last six words in this quote acknowledge the distinction I made between A and C in Fig. 2.1. After obliging his charge admirably (viz. to review the papers in the book), Fogel asserts that the appearance of the three fields in the same volume allows you to assess their relative maturity. He opines that EC is clearly the least mature of the three, even though, by his reckoning, it predates fuzzy sets by 11 years [11]. Fogel then offers his view on CI. He begins by discussing AI this way ([6], p. 1564):

It can be argued with some conviction that an AI program that cannot solve new problems in new ways is emphasizing the "artificial" and not the "intelligence." The vast majority of AI programs have nothing to do with learning. They may play excellent chess, but they cannot learn how to play checkers, or anything else for that matter. In essence, they are complicated calculators. They may outperform humans in certain circumstances, but I do not anticipate any agreement in calling calculators, no matter how many fixed rules they have, or how many symbols they manipulate, intelligent.

5 Fogel's footnote in [6]: The term computational intelligence has at least a 10-year history.

You will notice a shift here from defining CI or AI to instead defining intelligence itself (something I did not do). This quote seems to indicate that Fogel strongly disagrees with Definition W, since Webster's identifies learning as a key component of AI. Fogel uses the word learning without assigning a specific meaning to it. Moreover, it seems from the above that in his view, CI is in fact superior to AI (and not inferior, as implied by my inclusion relation in Fig. 2.2). Fogel continues in this vein, and without explicitly stating it, ends up with what is, in my view, his definition of intelligence:

Definition F (Fogel 1995 in [6]): Intelligent Behavior

Any system, whether it is carbon-based or silicon-based, whether it is an individual, a society, or a species, that generates adaptive behavior to meet goals in a range of environments can be said to be intelligent [13]. In contrast, any system that cannot generate adaptive behavior and can only perform in a single limited environment demonstrates no intelligence.

There are some important and interesting differences between this and previous definitions. First, Fogel identifies my B-systems as carbon-based, and my A and C systems as silicon-based, and then lumps all three together in Definition F. This places his philosophy about intelligence in the same camp as Hofstadter [16], and I guess I have to admit at this point that I tend to lean towards Searle's [17] view of intelligence - I do think there are facets of human behavior enabled by intelligence that cannot be realized by man-made devices. But I certainly don't want to digress about Turing tests (see [18] if you like discussions about this kind of stuff - the authors of [18] assert that passing the Turing test is not a sensible goal for AI).

I very much like Fogel's insistence that one hallmark - indeed, the defining hallmark in his view - of intelligence is adaptation to the system's environment. If you pressed me to name the most important characteristic of intelligent behavior, I might agree with Fogel about this. Unfortunately, saying this still leaves me with the dilemma I discussed in [5] - viz. how can I rate or measure or assess the adaptivity (and therefore, Machine Intelligence Quotient (MIQ), to use Zadeh's term) of an artificial system? I still contend that without precise definitions and quantitative measures to work with, readers of papers about "adaptive" or "intelligent" systems are left in the quagmire of what I called seductive semantics in [5].

Fogel concludes with a critique of Definition B (my definition) of CI. His specific complaint is that I asserted that no computational pattern recognition algorithm is adaptive in the way that humans are (and, I think he agrees with me to a certain extent by saying "and perhaps not, at least not in isolation"). Fogel then states ([6], p. 1564):

Evolutionary computation, however, has generated applications that are adaptive in this sense6, as offered in Fogel [19], Holland [20] and others.

Fogel states in a footnote to this quote that my use of the word requirement in my definition of computational adaptivity is ambiguous. [Fogel's footnote in [6]: The term requirements is somewhat ambiguous in this context and could be taken to mean "constraints" or "alternative tasks". If the latter, then the requirement for intelligence would appear to necessitate some degree of consciousness as well, and ability to self-model so as to change behavior (i.e., change tasks) in light of presumed deficiencies in current performance. The methods of computational intelligence have yet to demonstrate practical efforts in artificial consciousness.]

Here is the quote from [5] he reproduced in [6], and I also reprint the very next sentence from my paper, which he did not quote:

"An algorithm is computationally adaptive if and only if it can adjust local processing parameters and global configurations of processors to accommodate changes in inputs or requirements without interruption of current processing" (end of Fogel's quote from [5]).

(next sentence in [5]): If a CNN (or other) algorithm is able to alter its parameters, and reconfigure its network structure "on the fly" - that is, without interruption of on-line service, and can also assign itself new tasks when the demand exists, I would be happy to call that algorithm computationally adaptive.

I don't think there is any question that I used the words requirement and task interchangeably, and I still don't think any computational system exhibits this property in the sense I discussed. I can make this a little clearer by describing a typical 30-second sequence of events for a human driving a car that involves adaptive task-switching: tune the radio, sing along with the song, apply brakes, park car, exit car, lock car, put money in parking meter, adjust tie in mirror. This human has just completed a variety of very different tasks, and has relied on her or his intelligence - mostly as background processing, in fact - to (sub?)consciously switch from one task to another. This is adaptivity in my sense. What would an equivalent example for a computer program be? How about a program that automatically knows when to switch from matrix inversion to spell-checking to playing checkers to target recognition - on the fly, without being told to do so. I don't agree with Dave about the success of EC in this respect, but he has done me a service in focusing on an aspect of intelligent behavior that I may not have paid enough attention to - consciousness.

Summarizing Fogel's view then: he thinks that adaptive behavior is the primary ingredient of intelligence (I agree), that B and A systems can both exhibit it (I agree), that consciousness is an important aspect of task switching (I agree), that CI subsumes AI (I disagree), and that EC has already exhibited computational adaptivity in my sense (I disagree, and Dave's footnote leaves him some wriggling room on this point). Now we move on to Eberhart et al., who have an even greater dislike for my definition of CI.

6 This refers to my definition of computational adaptivity in [5], which Fogel quoted in [6].

5. Eberhart, Dobbins and Simpson's Definition of CI

Eberhart offers this forceful conclusion to [8], which is partially excerpted from [9]:

It is the author's belief that a new age is dawning: the Age of Computational Intelligence.

[Fig. 5.1 contents: inputs from the environment feed "The Intelligent System (carbon- or silicon-based)", whose response to those inputs is intelligent behavior; recognition and clustering appear among its internal nodes.]

Fig. 5.1. Eberhart et al.'s CI (after Figure 9.2 of [9])

Let's see what lies underneath this ringing endorsement. Eberhart's view is captured in Fig. 5.1, which is an adaptation of Fig. 9.2 in [9]. Eberhart's figure is more sophisticated than my reproduction of it in that his arrowhead sizes are proportioned to show his estimate of the relational strength between various edges connecting the five nodes contained in the box labeled "The Intelligent System". Figure 5.1 will do for this article. First I will review Eberhart et al.'s discussion of the relevant aspects of Fig. 5.1, and then I will address four criticisms he levels at my Fig. 2.1.

Figure 5.1 shows the intelligent system imbedded in an environment. The system shown is either carbon-based or silicon-based, so this aspect of Fig. 5.1 is very similar to the position taken by Fogel. The response of the system to its inputs is called intelligent behavior, the same term that appears in Definition F. However, Fig. 5.1 has much more detail than Fogel's discussion, beginning with a set of five interacting nodes that comprise the intelligent system.

Like Fogel, Eberhart et al. [9] discuss intelligence itself, and while they are not explicit about making a definition of it, I believe that the following states their position.

Definition EDS1 (Eberhart et al. 1996 in [9], Chap. 9): Intelligent Behavior

If there is no action or communication [from the system in response to its inputs] that affects the environment, there is no intelligent behavior.

This is a very different statement than Definition F, which places responsibility for intelligent behavior squarely on the shoulders of adaptation to the environment. In Definition EDS1 the hallmark of intelligent behavior seems to be the ability to alter or act on the environment, not adapt to it. To me there are some fairly obvious problems with this position, not the least of which is exemplified by what I would consider very unintelligent behavior by humans towards their environment (pesticides, pollution, nuclear weapons testing, etc.). All of the examples I just cited are signs of intelligent behavior in Fogel's sense (pesticides are one way humans attempt to adapt to their environment, just as nuclear weapons are). But the use of pesticides and nuclear weapons to alter the environment is, for many of us anyway, very unintelligent behavior.

This seems to add a new dimension to the discussion (value judgments), but actually I don't think it does. I would not deny that MY computationally intelligent vehicle recognition system could be put to unintelligent uses (like killing people). Definitions F and EDS1 are wonderful evidence that my original argument about semantic mischief is important. These two definitions of intelligent behavior seem very nearly opposite to each other, and lead us (me, anyway) far, far away from the original goals in [1, 5] of stimulating sensible discussions about algorithmic descriptors - how do we choose words that accurately describe properties of computational engines without entering the abyss of misinterpretation caused by the imprecision of natural language? Rather than clear the air, I think these two definitions prove my point.

Returning to Fig. 5.1, Eberhart et al. show CI as an internal node of the intelligent system. According to this figure, adaptation is the hallmark of CI only (and not, as in Definition F, of intelligent behavior itself). Indeed, Eberhart et al. state:

Definition EDS2 (Eberhart et al. 1996 in [9], Chap. 9): Computational Intelligence

In this book, computational intelligence is defined as a methodology involving computing (whether with a computer, wetware, etc.) that exhibits an ability to adapt to and/or deal with new situations, such that the system is perceived to possess one or more attributes of reason, such as generalization, discovery, association and abstraction. The output of a computationally intelligent system often includes predictions and/or decisions.

This is a long definition that involves many different ideas. Eberhart et al. shorten it considerably after discussing Figure 9.2 in [9] and the notion of learning. While Fogel avoids an explicit discussion of this term, Eberhart et al. carefully distinguish between their views of adaptation and learning. In short, they assert that learning is what the entire intelligent system in Fig. 5.1 does, whereas adaptation mainly applies to the area where computational intelligence is relevant. This leads them to state that:

Definition EDS3 (Eberhart et al. 1996 in [9], Chap. 9): Computational Intelligence

In summary, adaptation is arguably the most appropriate term for what computationally intelligent systems do. In fact, it is not too much of a stretch to say that computational intelligence and adaptation are synonymous.

This is a very compact definition: CI is adaptation. Do you like this definition? If you do, the next thing you will need is a working definition of adaptation. I discussed this in some detail in [5], and won't repeat my opinion here, other than to state that there is a pretty diverse set of opinions in our engineering and scientific literature, e.g., about what an adaptive algorithm is.

Eberhart et al. also assert that "computational intelligence systems in silicon often comprise hybrids of paradigms such as artificial neural networks, fuzzy systems and evolutionary computation systems, augmented with knowledge elements". This suggests that they accept the party line umbrella in Fig. 3.1 for disciplines that afford CI capability, and their usage of the term ANN is in agreement with the structure I proposed in Fig. 2.1. On the other hand, the authors draw particular attention in the introduction of Chap. 9 of [9] to four points of disagreement they have about Fig. 2.1, and I will turn to these now.

EDS point 1. Eberhart et al. disagree with, in their words, my dichotomy between biological and computational systems. Instead, they side with Fogel in making no distinction between carbon-based and silicon-based intelligence. Well, this implies that there is an indisputable test for the possession of intelligence, and of course there is not. Whether there should be a distinction or not is entirely a matter of opinion, and in this instance EDS and F hold a different opinion than I do. Neither stance is (verifiably) correct - they are simply different.

EDS point 2. Eberhart et al. disagree with my statement that some computational models do not have biological equivalents, and offer this to prove their point:


All computational models implemented by humans must have biological analogies, since humans conceived of, designed, developed and tested them. We can implement only what we create out of our consciousness. [ ... ] It is likely therefore, that intelligence exists that has no biological equivalent, but computational models developed by humans must have biological analogies.

What? This is pretty deep stuff. Surely EDS do not mean to suggest that mathematical models have biological analogs simply because we thought of them, and we are biological. I don't think there is a biological analogy for, say, the irrational numbers. If there is, I would like to know what it is.

EDS point 3. Eberhart et al. disagree with my characterization of nodes in Fig. 2.2 as subsets of other nodes. I partially agree with them about this, and this specific objection led me to alter the original figure to the more imprecise one now shown as Fig. 2.1 and explained above. However, I still think that the inclusion relationships for the two bottom rows are correct, and this is based on my knowledge-tidbits distinction between the words computational and artificial.

EDS point 4. Eberhart et al. object to [their perception] of my requirement that nodes such as CI pass through nodes such as AI to get to BI. Either this is really objection 1 in disguise (obviously this point is moot if the B level is given equal status with the C and A levels), or EDS misunderstood my use of the word "commutative". The point of calling Fig. 2.1 commutative is to suggest that, for example, if nodes 1 and 2 are connected, and nodes 2 and 3 are connected, there is a direct path from node 1 to node 3. Perhaps my use of the term commutative in its mathematical sense was a poor choice. I have tried to clarify this in discussions about Figs. 2.3 and 2.4, by writing, for example, that there are two ways to get to APR from CNNs: (CNN + CPR + KTs) or (CNN + ANN + PR), etc.

Summarizing the views of Eberhart et al.: they think that adaptive behavior is the primary ingredient of computational intelligence (I agree), but not of intelligence (I don't know), that B and A systems can both exhibit intelligence (I agree), and that CI is an integral part of every intelligent system (I disagree: I don't think human intelligence is computational at all).

6. Conclusions

A paper like this doesn't really need conclusions, but let me offer one anyway. My purpose in [1, 5] was directed towards elimination of the use of seductive semantics in scientific writing. I don't think it is useful (in fact, I think it is unintentionally misleading) to read that your algorithm "learns" or is "adaptive" unless you tell me in a technical way that is specific to your model what you mean by these words. Everyone seems to have such models - whose learns best? Which are more adaptive? Writing that uses words such as these would be greatly improved if the meaning intended by authors was clearly specified, and that is really what I wanted to focus on - crisp technical writing. Again, I point out that I am not exempt from this criticism myself, having used the word adaptive without specifying a meaning for it in [21]. My distinction between computational and artificial models, based on the injection of knowledge tidbits from humans, arose from a desire to specifically point out that feed-forward neural networks do not "learn" in the sense I think humans do (EDS think otherwise, and I think they are wrong because I don't think biological intelligence in the whole can be replicated by non-biological systems). But [1, 5] have inspired many authors to jump into the "what is intelligence?" fray - a jump that for me - a low-level nuts and bolts kind of guy - is fraught with peril. I don't think there is a safe landing place - at least not one we will ever agree to.

So, is computational intelligence more than just a buzzword? Others seem to think so, but I still don't know. Let me pass along this quote that Bob Marks used for another purpose in [15] that displays the value of powerful buzzwords:

So we went to Atari and said, 'Hey, we've got this amazing thing, even built with some of your parts, and what do you think about funding us? Or we'll even give it to you. We just want to do it. Pay our salary, we'll come work for you.' And they said 'No'. So then we went to Hewlett-Packard, and they said 'Hey, we don't need you. You haven't got through college yet.'

Steve Jobs, founder of Apple Computer Inc., on attempts to get Atari and H-P interested in his and Steve Wozniak's personal computer.

Perhaps if the two Steves had told Atari and H-P that they had a design for a computationally intelligent system, their proposal would have been funded. This emphasizes the utility of a good buzzword. But isn't there more to it than that? Well, sure. I think the real point of using the term is that it places the emphasis on models and methods that try to solve realistic problems, as opposed to "teaching machines how to think". I emphasize again that for me, models and methods such as FL, NNs and EC are enabling technologies for AI. Certainly many AI researchers have come to think so.

Artificial intelligence sets some pretty lofty goals, and the realization that most of them are just not attainable with our current understanding of the way the BNN enables intelligence or intelligent behavior has left a terminology vacuum for those who want to back away from such grand objectives. The word computational is much less provocative than the word artificial, and really connotes a "feet on the ground" approach to problem solving. I think this is the real appeal of the term, and I think this is a good way to use it.

Acknowledgment. Supported by ONR Grant # N00014-96-1-0642.


References

1. Bezdek, J. (1992). On the relationship between neural networks, pattern recognition and intelligence, Int. J. Approx. Reasoning, 6(2), 85-107.

2. Cheeseman, P. (1988). An Inquiry into Computer Understanding, Comp. Intell., 4, 57-142.

3. Marks, R. (1993). Intelligence: Computational versus Artificial, IEEE Trans. Neural Networks, 4(5), 737-739.

4. Zurada, J., Marks, R. and Robinson, C. (1994). Introduction to Computational Intelligence: Imitating Life, ed. J. Zurada, R. Marks and C. Robinson, IEEE Press, Piscataway, NJ, v-xi.

5. Bezdek, J.C. (1994). What is Computational Intelligence? in Computational Intelligence: Imitating Life, ed. J. Zurada, R. Marks and C. Robinson, IEEE Press, Piscataway, NJ, 1-12.

6. Fogel, D. (1995). Review of Computational Intelligence: Imitating Life, ed. J. Zurada, R. Marks and C. Robinson, IEEE Press, Piscataway, NJ, IEEE Trans. Neural Networks, 6(6), 1562-1565.

7. Palaniswami, M., Attikiouzel, Y., Marks, R.J., Fogel, D. and Fukuda, T. (1995). Introduction to Computational Intelligence: A Dynamic System Perspective, ed. M. Palaniswami, Y. Attikiouzel, R.J. Marks, D. Fogel and T. Fukuda, IEEE Press, Piscataway, NJ, 1-5.

8. Eberhart, R. (1995). Computational intelligence: a snapshot, in Computational Intelligence: A Dynamic System Perspective, ed. M. Palaniswami, Y. Attikiouzel, R.J. Marks, D. Fogel and T. Fukuda, IEEE Press, Piscataway, NJ, 9-15.

9. Eberhart, R., Dobbins, R.W. and Simpson, P.K. (1996). Computational Intelligence PC Tools, in press, Academic Press Professional (APP), NY.

10. McCulloch, W. and Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity, Bull. Math. Biophysics, 5, 115-133.

11. Barricelli, N.A. (1954). Esempi Numerici di Processi di Evoluzione, Methodos, 6(21-22), 45-68.

12. Zadeh, L.A. (1965). Fuzzy Sets, Information and Control, 8, 338-352.

13. Fogel, D.B. (1995). Evolutionary Computation: Toward a New Philosophy of Machine Intelligence, IEEE Press, Piscataway, NJ.

14. Webster's New World Dictionary of Computer Terms, 3rd ed., Prentice-Hall, Englewood Cliffs, NJ, 1988, 13.

15. Marks, R. (1996). Neural Network Evolution: Some Comments on the Passing Scene, in Proc. IEEE ICNN (Plenary, Panel & Special Sessions Volume), IEEE Press, Piscataway, NJ, 1-6.

16. Hofstadter, D. (1981). The Turing Test: A Coffeehouse Conversation, in The Mind's I, ed. D. Hofstadter and D. Dennett, Bantam, NY, 69-91.

17. Searle, J. (1981). Minds, Brains and Programs, in The Mind's I, ed. D. Hofstadter and D. Dennett, Bantam, NY, 353-372.

18. Hayes, P. and Ford, K. (1995). Turing test considered harmful, in Proc. 1995 IJCAI, (1), Morgan Kaufmann, San Mateo, CA, 972-977.

19. Fogel, L.J., Owens, A.J. and Walsh, M.J. (1966). Artificial Intelligence Through Simulated Evolution, Wiley, NY.

20. Holland, J.H. (1975). Adaptation in Natural and Artificial Systems, U. of Michigan Press, Ann Arbor, MI.

21. Lee, J.S.J. and Bezdek, J.C. (1988). A Feature Projection Based Adaptive Pattern Recognition Network, Proc. IEEE ICNN, I, IEEE Computer Society Press, 497-505.


Computational Intelligence: Extended Truth Tables and Fuzzy Normal Forms

I. Burhan Türkşen

Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, Ontario M5S 3G8, Canada [email protected]

Abstract. Our native intelligence captures and encodes our knowledge into our biological neural networks, and communicates it to the external world via linguistic expressions of a natural language. These linguistic expressions are naturally constrained by the syntax and semantics of a given natural language and its cultural base of abstractions. Next, an accepted scientific paradigm and its language further restrict these linguistic expressions when propositional and predicate expressions are formulated in order to express either assumed or observed relationships between elements of a given domain of concern. Finally, the symbols are represented with numbers in order to enable a computational apparatus to execute those assumed or observed relationships within the uniqueness of a given numerical scale. In this manner, our knowledge of a particular system's behavior patterns is first expressed in a linguistic form and then transformed into computational expressions through at least the two sets of major transformations stated above, i.e., first from language to formulae, and next from formulae to numbers.

In this context, fuzzy normal-form formulae of linguistic expressions are derived with the construction of "Extended Truth Tables". Depending on the set of axioms that are exhibited by, and/or that we are willing to impose on, the linguistic expressions of our native intelligence, we might arrive at different computational intelligence expressions for purposeful, goal-oriented control of systems. In particular, it is shown that derivation of normal-form formulae for the "Fuzzy Middle" and "Fuzzy Contradiction" leads to unique and enriched interpretations in comparison to the classical "Excluded Middle" and "Crisp Contradiction" expressions.

1. Introduction

In this paper, we discuss the transformations of our knowledge, captured and encoded by our native intelligence which is expressed initially in linguistic expressions of a natural language, and then transformed to some numerical expressions of computational intelligence.

Within a linguistic perspective of intelligent systems, it is appropriate to unify the views of the 19th century philosopher and logician C.S. Peirce [1], the 20th century systems scientist L. von Bertalanffy [2] and the 20th century author of "General Semantics", A. Korzybski [3], in a framework.



In "A Systems View of Man", it is stated that a human is a "denizen of two worlds" [2]: (i) a biological organism and (ii) a universe of symbols. Besides certain biological differences, the prime distinguishing characteristic of humans appears to be their creation of a universe of symbols in thought and language. Clearly humans live not just in a world of things but of symbols. Humans with their native intelligence create, develop and apply, i.e., on the one hand, dominate but, in turn, are dominated by such a universe of symbols. In this perspective, languages, arts, sciences, and other cultural forms are able to attain a relatively autonomous existence trancending the personalities and lifetimes of their individual creators. From anthropology, we learn that the degrees of socio-cultural developments of different civilizations depend on their capacity to produce higher and higher abstractions which eventually culminate in a general consciousness of abstracting. As A. Whitehead [4] noted, we are gradually recognizing and coming to a state of comprehension that civilizations can only rejuvinate themselves by pushing beyond their current state of abstractions; otherwise they are doomed to sterility after a brief period of progress. In general, abstractions become codified into paradigms such as "Aristotelian", "non-Aristotelian", etc., systems [3]. In this regard, theories of fuzzy sets and logics have emerged as a result of many attempts to break out of the constraints of Aristotalian system of though that have dominated most of western culture and its specific thought processes and patterns. J. Lukasiewicz's "three-valued logic"[5], M. Black's "vagueness" [6], A. Korzybski's "general semantics" [3] are only a few amongst so many attempts that have culminated eventually in L.A. Zadeh's seminal work on "Fuzzy Sets" [7]. From the perspective of semiotics and Peirce's [1] interpretation of signs, we find that linguistic expressions and logical formulae are intermidiary referends to our native intelligence and to our repository of knowledge that are our mental models.

In this sense, Zadeh's [1965-1996] contributions are very significant in proposing a new representation for vague and imprecise data, information and knowledge, with a new encoding of signs in terms of information granules determined with fuzzy set theory.

In general, knowledge is encoded initially in atomic or compound linguistic expressions of a natural language; it is then first transformed into axiomatic expressions, known as propositions and predicates, with set theories, membership functions and their connectives, and next transformed into computational expressions with the assignment of numbers to the symbols determined in the first transformation (see Table 1). In our framework, human intelligence as encoded in biological neural networks is omitted, since we are to discuss here the transformations of human intelligence from linguistic to computational expressions.

In this framework, we start out with linguistic neural networks, LNN, which are all the linguistic expressions interconnected in a given universe of discourse. They are formally known as semantic nets. This is the domain of any spoken or written words, known as "text" in the discussions of the postmodernists [8]. There is a mutual interaction between LNN and linguistic pattern recognition, LPR. LPR expressions contain vague and ambiguous linguistic values of linguistic terms, such as


"If the inventory is low and the demand is high, then the production should be high" (1)

where low, high are the linguistic values of linguistic variables, i.e., inventory, demand and production, respectively. It should be observed that low, high are linguistic information granules (summaries) and are known as "fuzzy sets" [7].

Table 1: Transformations of linguistic intelligence to computations

Representation                                 Paradigm-Complexity Level

                  LNN <--> LPR <--> LI         L - Linguistic
                   |        |        |
Set Theoretic:    ANN <--> APR <--> AI         A - Axiomatic
                   |        |        |
Numerical:        CNN <--> CPR <--> CI         C - Computational

Such linguistic pattern recognition expressions are then used to construct syllogistic reasoning and hence are a part of our linguistic intelligence, LI. That is, if we observe in a production system that "the inventory is a bit low and the demand is a bit high", then, given that our knowledge is encoded in a linguistic pattern recognition expression such as rule (1) stated above, we would deduce through our linguistic reasoning that "the production should be just a bit high".

Naturally, this is a restricted version of our native human knowledge and intelligence encoded in our biological neuronal constructs. But it appears that most people in everyday life reason in a somewhat similar manner, where information granules are identified and processed with linguistic terms of a natural language via the human information processing capability. Biophysics and psycholinguistics research attempts to discover the hidden mysteries of human information processing, but in the main our knowledge is not sufficient to know and understand this process explicitly. Instead, we turn our attention to scientific abstractions and formalisms with axioms and hypotheses. For this purpose, we transform linguistic neural networks, LNN, linguistic pattern recognition, LPR, and linguistic intelligence, LI, into axiomatic neural networks, ANN, axiomatic pattern recognition, APR, and axiomatic intelligence, AI.

Within the scope of scientific abstraction, we generate shorthand notations to represent the linguistic variables, such as inventory, demand, and production, with the symbols X, Y, Z, respectively, and to represent the linguistic values, such as "low", "high" and "high", with the fuzzy set symbols A, B, and C, respectively. Thus the transformation of the linguistic expression (1) becomes:

"If X isr A and Y isr B, then Z isr C" (2)

where "and", "if ... then" are linguistic connectives and "isr" is a multi-valued, many-to-many relational mapping [9]; that is, from membership function to membership function mappings. This many-to-many relational mapping operator is to be interpreted depending on context as "belongs to", "is compatible with", "is similar to", etc. [10]

As an example of these linguistic transformations, we will first investigate the expressions of fuzzy normal forms by constructing fuzzy truth tables. A derivation of fuzzy normal forms will then be given for "A OR B" and "A AND B". This will form the basis for the derivation of the normal form formulae for the "Fuzzy Middle" and the "Fuzzy Contradiction". We will first examine the fundamental atomic expressions, such as A and B, then their propositional expressions with affirmations and negations, and then their particularization, in terms of the elements of a set and their affirmation and negation, arriving at predicate expressions. We will discover that while the expressions of "Fuzzy Middle" and "Fuzzy Contradiction" become "Laws" in the absolute sense in the Boolean Theory, they are graded "Laws" in a relative sense in Fuzzy Theories. Furthermore, there are the Laws of Conservation to be recognized in all logic theories, whether Boolean or Fuzzy.

2. Symbols, Propositions and Predicates

In our man-made world, every knowledge tidbit or every observation is expressed with a symbol. The term 'symbol' stands for a variety of things: words in a natural language, or letters X, Y, Z, ..., A, B, C, ..., etc., in scientific expressions. Furthermore, a 'symbol' is defined as a "sign" which stands for something. If it does not stand for something, then it is not a symbol but a meaningless sign [1]. Thus symbols have their associated semantics. Usually, the reality behind any symbol is a biological mental state, which is the most precious characteristic of human beings.

Let, for example, the set symbol A stand for our mental state that is linguistically expressed as "the set of inventory levels that are high", and the symbol B stand for our mental state that is linguistically expressed as "the set of demand rates that are low". Let also X denote "the set of inventory levels" and Y denote "the set of demand rates". Furthermore, let X' ⊂ X be the subset of inventory levels that are assigned to the inventory levels that are high, i.e., A; and Y' ⊂ Y be the subset of demand rates that are assigned to the demand rates that are low, i.e., B.

In studies of logic, we investigate propositions of such linguistic expressions with the symbols of their set representations. It should be recalled that a proposition is "an expression in a language that is either true or false". For the example case stated above, the propositions of the linguistic expressions with their symbols defined above are stated below:

(1.1) "X' is in the set of inventory levels that are high, A", is true, T; i.e., X' isr A is T;

(1.2) "X' is not in the set of inventory levels that are high, A", is false, F; i.e., X' isr not A is F;

(2.1) "Y' is in the set of demand rates that are low, B", is true, T; i.e., Y' isr B is T;

(2.2) "Y' is not in the set of demand rates that are low, B", is false, F; i.e., Y' isr not B is F.

where "isr" is a short hand notation that stands for "is in", "belongs to", "compatible with",etc., depending on context [9].

At times, we need to specify a particular inventory level or demand rate with a predicate. Again it should be recalled that a predicate is "something that is affirmed or denied of the object in a proposition in logic".

For example, the predicates of the two propositions (1.1) and (1.2) stated above are:

(1.1)' "A particular inventory level say x=100, or any other X EX', in the set of the inventory levels, that is assigned to the set A with the membership value ll(x,A)=a" is true, T, i.e., "X E X' isr A with a" is T;

(1.2)' "A particular tinventory level say X= 1 00, or any X EX', in the set of the inventory levels, that is assigned to the set A with the membership value ll(x,A)=a", is false, F, i.e., "x E X' isr A, with a" is F.

In a similar manner the predicates of the two propositions (2.1) and (2.2) stated above are:

(2.1)' "A particular demand rate say y=50, or any other y E Y' , in the set of the demand rates, that is assigned to the set B with the membership value ll(y,B)=b" is true, T, i.e., "y E Y' isr B with b" is T;

(2.2)' "A particular demand rate say y=50, or any other y E Y' , in the set of the demand rates, that is assigned to the set B with the membership value ll(y,B)=b" is false, F, i.e., "y E Y' isr B with b" is F.

2.1 Fuzzy Truth Tables

Naturally, there are two possible cases in which these predicate expressions hold valid, i.e., the case where a ≤ b and the case where a > b. We can construct the Extended Truth Table, i.e., the Truth Table constructed over the fuzzy sets A and B with the two-valued logic {T,F}, in a manner similar to the classical Truth Table construction, but realizing the fact that there will have to be 8 entries: four entries corresponding to the case a ≤ b and another four entries corresponding to a > b, where a, b are the membership values for the particular x ∈ X' and y ∈ Y' that identify the predicate assignments of the objects x and y to the fuzzy sets A and B, respectively.

Table 2. Extended Truth Table representing the linguistic predicate expressions (1.1)', (1.2)', (2.1)' and (2.2)' for the cases a ≤ b and a > b. This is a Truth Table formed over the Fuzzy Sets A and B with the two-valued logic {T, F}, i.e., TLFS.

Predicate Labels      A    B    Membership Values
(1.1)'(2.1)'          T    T
(1.1)'(2.2)'          T    F        a ≤ b
(1.2)'(2.1)'          F    T
(1.2)'(2.2)'          F    F
(1.1)'(2.1)'          T    T
(1.1)'(2.2)'          T    F        a > b
(1.2)'(2.1)'          F    T
(1.2)'(2.2)'          F    F

It should be noted that in this Extended Truth Table, i.e., the Truth Table constructed over the fuzzy sets A and B with the two-valued logic {T,F}, we have separated the set membership values and the truth assignments to the linguistic expressions. If the linguistic expression is "affirmed", the corresponding set label is assigned a "T", true; on the other hand, if it is "negated", then the corresponding set label is assigned an "F". Thus, the Truth Tables constructed in this manner represent a two-valued logic formed over an infinite (fuzzy) valued set theory. This is the first extension of two-valued logic formed over two-valued set theory. The next extension of Truth Tables requires the formation of an infinite-valued logic over an infinite-valued set theory. This is a future topic of research. For clarity, let us identify these three classes of logic as follows: (i) two-valued logic over two-valued sets, TLTS, (ii) two-valued logic over fuzzy-valued sets, TLFS, and (iii) fuzzy-valued logic over fuzzy-valued sets, FLFS.

2.2 Fuzzy Normal Forms

With the Extended Truth Tables proposed above, constructed with those principles and shown in Table 2, we can now derive the Fuzzy Normal Forms of combined concepts such as "A OR B" and "A AND B", etc.

For this purpose, let us consider two linguistic expressions as follows:

OR - "the set of inventory levels that are high", A, or "the set of demand rates that are low", B, i.e., "A OR B".

AND - "the set of inventory levels that are high", A, and "the set of demand rates that are low", B, i.e., "A AND B".

We show in Table 3 the definition of the combined concepts "A OR B" and "A AND B". It is to be noted that the first four and the last four entries of this table are exactly equivalent, in form only, to the two-valued logic and two-valued set based Truth Table entries of "A OR B" and "A AND B", except that there is duplication due to the fact that a, b ∈ [0,1]: we have the two possibilities a ≤ b and a > b. Observing the definitions of Table 3 and recalling the "Normal Form" generation algorithm of two-valued logic (Appendix), we can now write the Fuzzy Disjunctive Normal Form, FDNF, and the Fuzzy Conjunctive Normal Form, FCNF, for "A OR B" and "A AND B" as follows:

FDNF(A OR B) = (A ∩ B) ∪ (A ∩ N(B)) ∪ (N(A) ∩ B) ∪ (A ∩ B) ∪ (A ∩ N(B)) ∪ (N(A) ∩ B)

FCNF(A OR B) = N[(N(A) ∩ N(B)) ∪ (N(A) ∩ N(B))] = (A ∪ B) ∩ (A ∪ B)

FDNF(A AND B) = (A ∩ B) ∪ (A ∩ B)

FCNF(A AND B) = N[(A ∩ N(B)) ∪ (N(A) ∩ B) ∪ (N(A) ∩ N(B)) ∪ (A ∩ N(B)) ∪ (N(A) ∩ B) ∪ (N(A) ∩ N(B))]
             = (N(A) ∪ B) ∩ (A ∪ N(B)) ∩ (A ∪ B) ∩ (N(A) ∪ B) ∩ (A ∪ N(B)) ∩ (A ∪ B)

Again, it is clear that these normal forms are equivalent to the two-valued logic and two-valued set based normal form expressions, except for the fact that there are duplicate terms due to the two-valued logic being formed over an infinite-valued set.
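The role of the duplicated terms can be checked numerically. The following short sketch is an addition of ours (Python and the helper names are assumptions, not part of the text); it evaluates the normal forms above on membership values a, b ∈ [0,1] for a chosen t-norm/t-conorm pair:

# Illustrative sketch (ours): evaluating FDNF/FCNF of "A OR B".
def fold(op, terms):
    out = terms[0]
    for z in terms[1:]:
        out = op(out, z)
    return out

def fdnf_or(a, b, t, s, n):
    # six conjuncts, three of them duplicated by the a <= b and a > b rows
    return fold(s, [t(a, b), t(a, n(b)), t(n(a), b)] * 2)

def fcnf_or(a, b, t, s, n):
    # FCNF(A OR B) = (A u B) n (A u B)
    return t(s(a, b), s(a, b))

neg = lambda x: 1 - x
# Zadehean min/max connectives are idempotent, so duplication is harmless:
print(fdnf_or(0.3, 0.6, min, max, neg), fcnf_or(0.3, 0.6, min, max, neg))
# Algebraic product/sum are non-idempotent; the duplicated terms now matter:
tprod, ssum = lambda x, y: x * y, lambda x, y: x + y - x * y
print(fdnf_or(0.3, 0.6, tprod, ssum, neg), fcnf_or(0.3, 0.6, tprod, ssum, neg))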

It is also clear that these normal forms are given, in general, for the t-norm-conorm and standard negation connectives, which are commutative and associative, but not idempotent. For the special case of fuzzy set theory known as Zadehean fuzzy set theory, with Max-Min and Standard Negation based connectives, the FCNF and FDNF collapse into a form equivalent to those of two-valued logic and two-valued sets. But this equivalence is in form only, due to the fact that in fuzzy set theory a, b ∈ [0,1], whereas in two-valued set theory a, b ∈ {0,1}.

In two-valued logic over fuzzy set theory, FDNF(·) ≠ FCNF(·) for the general case of t-norm-conorm and standard negation [10]. It is shown, however, that FDNF(·) ⊆ FCNF(·) [10,15] for the case of Zadehean fuzzy theory, while it is known that DNF(·) = CNF(·) in two-valued logic and two-valued set based normal forms with Boolean set operators.

With these preliminaries, we are now ready to re-investigate the concepts of "Excluded Middle" and "Crisp Contradiction" in the Boolean Theory, and then re-assess the "Fuzzy Middle" and the "Fuzzy Contradiction" in the Fuzzy Theory.

Table 3. The Extended Truth Table definitions for the combined concepts "A OR B" and "A AND B" in two-valued logic formed over infinite (fuzzy) valued sets, i.e., TLFS.

Predicate Labels      A    B    A OR B    A AND B    Membership Values
(1.1)'(2.1)'          T    T      T          T
(1.1)'(2.2)'          T    F      T          F            a ≤ b
(1.2)'(2.1)'          F    T      T          F
(1.2)'(2.2)'          F    F      F          F
(1.1)'(2.1)'          T    T      T          T
(1.1)'(2.2)'          T    F      T          F            a > b
(1.2)'(2.1)'          F    T      T          F
(1.2)'(2.2)'          F    F      F          F

3. Excluded Middle and Crisp Contradiction

At the origin of concept formation, the notions of the "Excluded Middle" and its dual, the "Crisp Contradiction", are abstract mental models. They are: (i) first expressed linguistically in a natural language with the linguistic "OR" and "AND" connectives, respectively, as combinations of two independent statements, which are: (1) the definition of a concept label as a set, and (2) the negation of the same concept label as the complement of the set defined in (1); (ii) secondly, these concept definitions are re-stated as propositions with assertions about their truth; and (iii) thirdly, they are turned into predicates, i.e., truth-qualified statements for individual items, objects, subjects, etc., in a given proposition. Predicate statements require assignments of membership values to the elements of a given set. Hence, if we choose the two-valued set membership valuation in {0,1}, then we find that the classical Laws of the Excluded Middle and Crisp Contradiction are upheld by the expressions of "Excluded Middle" and "Crisp Contradiction". On the other hand, if we choose the infinite-valued (fuzzy) set membership valuation in [0,1], the classical Laws of the Excluded Middle and Crisp Contradiction are no longer laws in the classical sense. Instead, we find that there are new expressions which we call "Fuzzy Middle" and "Fuzzy Contradiction". In fuzzy set theory, they are valid expressions that hold as a matter of degree and are bound by the fuzzy normal forms known as the Fuzzy Disjunctive and Conjunctive Normal Forms, FDNF and FCNF. It is discovered that there are also Laws of "Conservation" that are upheld between these FDNF and FCNF expressions. These we discuss next.

With the preliminary concepts and definitions given in Section 2 and the FDNF(·) and FCNF(·) determined in Section 2.2, we are now ready to determine the FDNF's and FCNF's of "A OR N(A)" and "A AND N(A)" for both the Boolean and the Fuzzy Logics.

3.1 Boolean DNF's and CNF's

In order to obtain the "A OR N(A)" and "A AND N(A)" expressions, all we have to do is substitute N(A) in place of B and A in place of N(B) in the FDNF and FCNF expressions that were derived in Section 2.2. Thus we obtain first the fuzzy set formulae for these expressions, and then, with the application of the Boolean set theory axioms, we get the expected results. Hence, we start with the FDNF and FCNF expressions of "A OR N(A)" as:

FDNF₁²(A OR N(A)) = (A ∩ N(A)) ∪ (A ∩ A) ∪ (N(A) ∩ N(A)) ∪ (A ∩ N(A)) ∪ (A ∩ A) ∪ (N(A) ∩ N(A))
                  = (A ∩ N(A)) ∪ A ∪ N(A)       (Commutativity, Idempotency)
DNF₁²(A OR N(A))  = (A ∩ N(A)) ∪ X = X          (LEM, Absorption)

FCNF₁²(A OR N(A)) = N{(N(A) ∩ A) ∪ (N(A) ∩ A)}   (Involutive Negation)
                  = (A ∪ N(A)) ∩ (A ∪ N(A))      (Idempotency)
                  = A ∪ N(A)
CNF₁²(A OR N(A))  = X                            (LEM)

It is to be noted that these are 1-variable and 2-dimensional expressions [13]; hence the superscript 2 and the subscript 1.


Hence, with the application of the Boolean set theory axioms, i.e., Idempotency and LEM, it is found, as it should be, that

DNF₁²(A OR N(A)) = CNF₁²(A OR N(A)) = X

In a similar manner, we derive the DNF and CNF expressions of "A AND N(A)" in the Boolean Theory. It should be noted that, again, the expressions are originally derived for fuzzy set theory with the application of the normal form derivation algorithm (see Appendix) to Table 3, and only after the application of the LEM, LC and Idempotency axioms of the Boolean theory do we get the two-valued set theory results; hence FDNF becomes DNF and FCNF becomes CNF as:

FDNF₁²(A AND N(A)) = (A ∩ N(A)) ∪ (N(A) ∩ A)
                   = A ∩ N(A)                    (Commutativity, Idempotency)
DNF₁²(A AND N(A))  = ∅                           (LC)

FCNF₁²(A AND N(A)) = N{(A ∩ A) ∪ (N(A) ∩ N(A)) ∪ (N(A) ∩ A) ∪ (A ∩ A) ∪ (N(A) ∩ N(A)) ∪ (N(A) ∩ A)}   (Involutive Negation)
                   = (N(A) ∪ N(A)) ∩ (A ∪ A) ∩ (A ∪ N(A)) ∩ (N(A) ∪ N(A)) ∩ (A ∪ A) ∩ (A ∪ N(A))
CNF₁²(A AND N(A))  = (N(A) ∩ A) ∩ (A ∪ N(A))     (Commutativity, Idempotency)
                   = (N(A) ∩ A) ∩ X              (LEM)
                   = N(A) ∩ A
                   = ∅                           (LC)

Again, as expected, it is found that

DNF₁²(A AND N(A)) = CNF₁²(A AND N(A)) = ∅

3.2 Fuzzy DNF's and CNF's

When we relax the restriction of the reductionist philosophy and allow shades of gray to exist between the black and the white, we arrive at the infinite (fuzzy) valued set theory, where the set symbols {A, N(A)} become values (labels) of linguistic variables, such as "the inventory levels that are high" and "the inventory levels that are not high" in our example. This relaxation, i.e., letting a, n(a) be in [0,1], resolves the Russell paradox and its other varieties, amongst other paradoxes.

Without any of the simplifications that were applied in the Boolean case above in Section 3.1, we get the FDNF and FCNF expressions of "Fuzzy Middle" directly from Table 3 with the substitution of N(A) in place of B and A in place of N(B):

FDNF₁²(A OR N(A)) = (A ∩ N(A)) ∪ (A ∩ A) ∪ (N(A) ∩ N(A)) ∪ (A ∩ N(A)) ∪ (A ∩ A) ∪ (N(A) ∩ N(A))


FCNF₁²(A OR N(A)) = (A ∪ N(A)) ∩ (A ∪ N(A))   (Involutive Negation)

But now, it is known that the operators of fuzzy set theory, i.e., t-norms and t-conorms, are in general non-idempotent. Therefore, FDNF₁²(A OR N(A)) and FCNF₁²(A OR N(A)) cannot be simplified for the general class of fuzzy sets that are combined with t-norm and t-conorm operators. Furthermore, the Laws of Excluded Middle and Contradiction are no longer applicable, since A ∪ N(A) ⊆ X and A ∩ N(A) ⊇ ∅. Therefore we get FDNF₁²(A OR N(A)) ≠ FCNF₁²(A OR N(A)).

In a similar manner, we then derive the FDNF and FCNF expressions for the "Fuzzy Contradiction" as:

FDNF₁²(A AND N(A)) = (A ∩ N(A)) ∪ (A ∩ N(A))

FCNF₁²(A AND N(A)) = (N(A) ∪ N(A)) ∩ (A ∪ A) ∩ (A ∪ N(A)) ∩ (N(A) ∪ N(A)) ∩ (A ∪ A) ∩ (A ∪ N(A))   (Involutive Negation)

Once again, t-norms and t-conorms are non-idempotent, and A ∩ N(A) ⊇ ∅ and A ∪ N(A) ⊆ X in general in all fuzzy set theories; therefore no simplification of these terms is possible. Hence, we find that

FDNF₁²(A AND N(A)) ≠ FCNF₁²(A AND N(A))

3.3 Zadehean FCNF's and FDNF's

Zadehean Fuzzy Logic is a special subclass of the fuzzy theories in which the axioms of "Distributivity", "Absorption" and "Idempotency" are applicable in both the propositional and the predicate domain expressions. Thus, we have the following normal forms for the Zadehean Fuzzy Logic:

FDNF₁²(A OR N(A))  = A ∪ N(A)    (Commutativity, Idempotency, Max-Absorption)

FCNF₁²(A OR N(A))  = A ∪ N(A)    (Idempotency)

FDNF₁²(A AND N(A)) = A ∩ N(A)    (Idempotency)

FCNF₁²(A AND N(A)) = A ∩ N(A)    (Commutativity, Idempotency, Min-Absorption)

It is to be observed that these are similar to their Boolean equivalents of "A OR N(A)" and "A AND N(A)" in form only, i.e., before the application of LEM and LC in the Boolean theory.


Furthermore, it is to be realized that the "Fuzzy Middle" and the "Fuzzy Contradiction" are upheld only as a matter of degree, specified by a ∨ n(a) ∈ [0,1] and a ∧ n(a) ∈ [0,1]. For example, this can be realized either from two different sensors, or from the same sensor at two different time periods, when we receive two sources of data such that one claims A and the other N(A).

3.4 T and S Normed FCNF's and FDNF's

In the t-norm-conorm class of fuzzy theories, let us investigate two well-known subclasses, formed by Algebraic Product and Sum and by Bold Intersection and Union, for the case of "A OR N(A)" and "A AND N(A)", where the axioms of "Distributivity", "Absorption", and "Idempotency" are not applicable either in the propositional or in the predicate domain expressions.

3.4.1 Algebraic Product and Sum

In the predicate domain, the well-known operators of Algebraic Product and Sum are:

T(a,b) = a ⊙ b = ab   and   S(a,b) = a ⊕ b = a + b − ab

where a, b ∈ [0,1] are the generic membership values for every x ∈ X. We rewrite the propositional expressions obtained above with the application of commutativity and associativity, and we get, by rearranging:

μ[FDNF₁²(A OR N(A))]  = (a ⊙ n(a)) ⊕ (a ⊙ n(a)) ⊕ (a ⊙ a) ⊕ (a ⊙ a) ⊕ (n(a) ⊙ n(a)) ⊕ (n(a) ⊙ n(a))

μ[FCNF₁²(A OR N(A))]  = (a ⊕ n(a)) ⊙ (a ⊕ n(a))

μ[FDNF₁²(A AND N(A))] = (a ⊙ n(a)) ⊕ (a ⊙ n(a))

μ[FCNF₁²(A AND N(A))] = (a ⊕ n(a)) ⊙ (a ⊕ n(a)) ⊙ (a ⊕ a) ⊙ (a ⊕ a) ⊙ (n(a) ⊕ n(a)) ⊙ (n(a) ⊕ n(a))

We observe the Type II representation of "A OR N(A)" and "A AND N(A)", respectively. Hence the interpretation of the meta-linguistic expressions of "Fuzzy Middle" and "Fuzzy Contradiction" is to be stated as a relative matter of degree in the intervals defined by μ[FDNF₁²(A OR N(A))] and μ[FCNF₁²(A OR N(A))], and by μ[FDNF₁²(A AND N(A))] and μ[FCNF₁²(A AND N(A))], respectively. Therefore, we cannot conclude an absolute contradiction when we observe both A and N(A) in two identical but independent experiments, and/or when we receive two pieces of input information, one for A and the other for N(A), from two identical but independent sensors.
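As a numerical illustration (a sketch of ours; Python and the helper names are assumptions), the four predicate expressions above can be evaluated for a sample membership value a:

# Illustrative sketch (ours): Type II bounds for the algebraic pair.
def t(x, y): return x * y                 # algebraic product
def s(x, y): return x + y - x * y         # algebraic sum
def fold(op, terms):
    out = terms[0]
    for z in terms[1:]:
        out = op(out, z)
    return out

a = 0.3
na = 1 - a                                # standard negation
fdnf_or  = fold(s, [t(a, na)] * 2 + [t(a, a)] * 2 + [t(na, na)] * 2)
fcnf_or  = t(s(a, na), s(a, na))
fdnf_and = s(t(a, na), t(a, na))
fcnf_and = fold(t, [s(a, na)] * 2 + [s(a, a)] * 2 + [s(na, na)] * 2)
print(fcnf_or, fdnf_or)     # the two bounds for "Fuzzy Middle"
print(fdnf_and, fcnf_and)   # the two bounds for "Fuzzy Contradiction"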


3.4.2 Bold Intersection and Union

In the predicate domain, the well-known operators of Bold Intersection and Union are:

T(a,b) = a t b = Max{0, a + b − 1}   and   S(a,b) = a s b = Min{1, a + b}

where a, b ∈ [0,1] are the generic membership values for every x ∈ X. Again, we rewrite the propositional expressions obtained above with the application of commutativity and associativity, and we get, by rearranging:

μ[FDNF₁²(A OR N(A))]  = (a t n(a)) s (a t n(a)) s (a t a) s (a t a) s (n(a) t n(a)) s (n(a) t n(a))

μ[FCNF₁²(A OR N(A))]  = (a s n(a)) t (a s n(a))

μ[FDNF₁²(A AND N(A))] = (a t n(a)) s (a t n(a))

μ[FCNF₁²(A AND N(A))] = (a s n(a)) t (a s n(a)) t (a s a) t (a s a) t (n(a) s n(a)) t (n(a) s n(a))

Again we observe the Type II representation of "A OR N(A)" and "A AND N(A)", respectively. Again, the interpretations of the meta-linguistic expressions of "Fuzzy Middle" and "Fuzzy Contradiction" are to be stated as a relative matter of degree in the intervals defined by μ[FDNF₁²(A OR N(A))] and μ[FCNF₁²(A OR N(A))], and by μ[FDNF₁²(A AND N(A))] and μ[FCNF₁²(A AND N(A))], respectively. Here, again, we cannot conclude an absolute contradiction when we observe both A and N(A) in two identical but independent experiments, and/or when we receive two pieces of input information from two identical but independent sensors, one for A and the other for N(A).

4. Laws of Conservation

In two-valued logic over two-valued sets, i.e., TLTS, we find that

μ[DNF₁²(A OR N(A)) = CNF₁²(A OR N(A))] + μ[DNF₁²(A AND N(A)) = CNF₁²(A AND N(A))] = 1

This is known as the Law of Conservation. It is found that in two-valued logic formed over fuzzy sets, i.e., TLFS, there are also "Laws of Conservation", but they are formed in a modified manner. It is found that, in TLFS, the Laws of Conservation hold between FDNF₁²(A OR N(A)) and FCNF₁²(A AND N(A)), as well as between FDNF₁²(A AND N(A)) and FCNF₁²(A OR N(A)). That is, we have:


μ[FDNF₁²(A OR N(A))] + μ[FCNF₁²(A AND N(A))] = 1, and

μ[FDNF₁²(A AND N(A))] + μ[FCNF₁²(A OR N(A))] = 1

For example, it can be shown that the Laws of Conservation hold for Max-Min as well as for Algebraic Sum and Product, and for Bold Union and Intersection operator sets with Standard Negation.

4.1 Zadehean Laws of Conservation

For Max-Min and Standard Negation, we have from above:

μ[FDNF₁²(A OR N(A))] = a ∨ n(a),   μ[FCNF₁²(A AND N(A))] = a ∧ n(a)

Therefore, a ∨ n(a) + a ∧ n(a) = 1.

On the other hand, we have:

μ[FDNF₁²(A AND N(A))] = a ∧ n(a),   μ[FCNF₁²(A OR N(A))] = a ∨ n(a)

and hence a ∧ n(a) + a ∨ n(a) = 1.
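The Zadehean law can be checked mechanically; a minimal sketch of ours (Python assumed):

# Sketch (ours): with standard negation n(a) = 1 - a, max and min split 1.
for a in [0.0, 0.2, 0.5, 0.9, 1.0]:
    na = 1 - a
    assert abs((max(a, na) + min(a, na)) - 1.0) < 1e-12   # first law
    assert abs((min(a, na) + max(a, na)) - 1.0) < 1e-12   # dual law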

4.2 Laws of Conservation in T and S Normed Fuzzy Logic

In the t-norm-conorm class of fuzzy theories, let us investigate the laws of conservation for the two well-known subclasses formed by the Algebraic Product and Sum and by the Bold Intersection and Union operators, for the case of "A OR N(A)" and "A AND N(A)", where the axioms of "Distributivity", "Absorption", and "Idempotency" are not applicable in either the propositional or the predicate domain expressions.

4.2.1 Laws of Conservation in Algebraic Product and Sum

Laws of Conservation can be written directly with the results obtained above for the Algebraic Product and Sum and Standard Negation. Rewriting the predicate expressions of these normal forms, we have:

μ[FCNF₁²(A OR N(A))] = (a ⊕ n(a)) ⊙ (a ⊕ n(a)),

μ[FDNF₁²(A AND N(A))] = (a ⊙ n(a)) ⊕ (a ⊙ n(a))

Therefore, it is straightforward to show that:

μ[FCNF₁²(A OR N(A))] + μ[FDNF₁²(A AND N(A))] = 1


At the same time, we have

μ[FDNF₁²(A OR N(A))] = (a ⊙ n(a)) ⊕ (a ⊙ n(a)) ⊕ (a ⊙ a) ⊕ (a ⊙ a) ⊕ (n(a) ⊙ n(a)) ⊕ (n(a) ⊙ n(a))

μ[FCNF₁²(A AND N(A))] = (a ⊕ n(a)) ⊙ (a ⊕ n(a)) ⊙ (a ⊕ a) ⊙ (a ⊕ a) ⊙ (n(a) ⊕ n(a)) ⊙ (n(a) ⊕ n(a))

Hence, it can also be shown that

μ[FDNF₁²(A OR N(A))] + μ[FCNF₁²(A AND N(A))] = 1

The Laws of Conservation for the Algebraic Product and Sum do not present a direct closed-form solution; for these laws, one can show the result with a numerical calculation. This is left for the reader to verify.

4.2.2 Laws of Conservation in Bold Intersection and Union

Laws of Conservation can be written directly with the results obtained above for the Bold Intersection and Union operators and Standard Negation. Rewriting the predicate expressions of these normal forms, we have:

μ[FCNF₁²(A OR N(A))] = (a s n(a)) t (a s n(a)),

μ[FDNF₁²(A AND N(A))] = (a t n(a)) s (a t n(a)).

Therefore, it is straightforward to show that:

(a s n(a)) t (a s n(a)) = Max{0, Min{1, a + n(a)} + Min{1, a + n(a)} − 1} = 1

(a t n(a)) s (a t n(a)) = Min{1, Max{0, a + n(a) − 1} + Max{0, a + n(a) − 1}} = 0

Therefore,

μ[FCNF₁²(A OR N(A))] + μ[FDNF₁²(A AND N(A))] = 1

In a similar manner, it can be shown that

μ[FDNF₁²(A OR N(A))] + μ[FCNF₁²(A AND N(A))] = 1

by substituting the Bold Intersection and Union operators where appropriate.


5. Normal Forms of Re-Affirmation and Re-Negation

An affirmative expression such as "temperature is cold" may be re-affirmed by two independent sources of information, say, by two independent sensors. Similarly, a negative expression such as "temperature is not cold" may also be re-negated by two independent sources of information, again, say, by two independent sensors. Such expressions are at times known as sub-expressions [12, 13]. Let us next investigate the fuzzy normal forms of such expressions.

5.1 Normal Forms of Re-Affirmation

At the linguistic level a re-affirmation is a combination of an affirmative statement by another affirmative statement with one of the linguistic connectives "OR", "AND".

5.1.1 Re-Affirmation with "OR"

Such expressions may be stated in general in meta-linguistic form as "A OR A". However, they would be expressed in detailed linguistic form in predicate expressions as follows:

"xEX' C X isr AI, with a l E [0,1], is T" or "xEX' C X isr A 2 , with

a2 E [0,1] is T", is T

"xEX' C X isr AI, with a l E [0,1], is T" or "xEX' C X isr A 2 , with

a2 E [0,1] is F", is T

"xEX' C X isr AI, with a l E [0,1], is F" or "xEX' C X isr A 2 , with

a2 E [0,1] is T", is T

"xEX' C X isr AI, with a l E [0,1], is F" or "xEX' C X isr A 2 , with

a2 E [0,1] is F", is F

where "E" means "which is a subset of' and A I and A 2 represent the two affirmations that are received from the first and the second independent sources of information.

Clearly, we can reconstruct the Extended Truth Tables of TLFS and obtain the fuzzy normal forms for the meta-linguistic expression "A₁ OR A₂", which stands for the expression that represents in detail the realization of the re-affirmation expression "A OR A", where in the above detailed statements we have labeled the first A as A₁ and the second A as A₂ in order to emphasize the fact that they are received from distinct independent sources of information, and/or that they are received at different moments in time, as t = 1, 2, from the same source, which could come from some instrument readings. But without re-producing these tables, we can write the fuzzy disjunctive and conjunctive normal form expressions, again substituting, this time, A in place of B and N(A) in place of N(B) in the original expressions obtained in Section 2.2 above, as follows:


FDNF₁²(A OR A) = (A ∩ A) ∪ (A ∩ N(A)) ∪ (N(A) ∩ A) ∪ (A ∩ A) ∪ (A ∩ N(A)) ∪ (N(A) ∩ A)

FCNF₁²(A OR A) = (A ∪ A) ∩ (A ∪ A)

Clearly if we are in two-valued theory, i.e., in TLTS, we would get:

μ[DNF₁²(A OR A)] = μ[CNF₁²(A OR A)] = a ∈ {0,1}

That is, the re-affirmation would be true absolutely, as expected. However, if we are in the Zadehean Max-Min and Standard Negation theory, then we get:

μ[FDNF₁²(A OR A)] = μ[FCNF₁²(A OR A)] = a ∈ [0,1]

But now, the re-affirmation would be true only to a degree! Finally, if we are in a t-Norm-Conorm and Standard Negation theory, we would realize a separation between the FDNF and FCNF expressions. For example, for Bold Union and Intersection, and Standard Negation, we would obtain:

μ[FDNF₁²(A OR A)] = Min{1, 2 Max{0, 2a − 1}}

μ[FCNF₁²(A OR A)] = Max{0, 2 Min{1, 2a} − 1}

Therefore, the re-affirmation "A OR A" would be true in an interval of membership degrees specified within the interval [μ[FDNF₁²(A OR A)], μ[FCNF₁²(A OR A)]], where μ[FDNF₁²(A OR A)] ≤ μ[FCNF₁²(A OR A)].
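A quick check of these Bold bounds, as a sketch of ours (Python assumed):

# Sketch (ours): re-affirmation "A OR A" under Bold Union/Intersection.
def bold_t(x, y): return max(0.0, x + y - 1)   # Bold Intersection
def bold_s(x, y): return min(1.0, x + y)       # Bold Union

for a in [0.0, 0.3, 0.6, 0.9, 1.0]:
    na = 1 - a
    # FDNF(A OR A): conjuncts (A n A), (A n N(A)), (N(A) n A), duplicated
    terms = [bold_t(a, a), bold_t(a, na), bold_t(na, a)] * 2
    fdnf = terms[0]
    for z in terms[1:]:
        fdnf = bold_s(fdnf, z)
    fcnf = bold_t(bold_s(a, a), bold_s(a, a))  # (A u A) n (A u A)
    assert abs(fdnf - min(1.0, 2 * max(0.0, 2 * a - 1))) < 1e-12
    assert abs(fcnf - max(0.0, 2 * min(1.0, 2 * a) - 1)) < 1e-12
    assert fdnf <= fcnf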

5.1.2 Re-Affirmation with "AND"

On the other hand, such expressions of re-affirmation may also be stated in general in meta-linguistic form as "A AND A". Their detailed linguistic expressions in predicate form are stated as follows:

"xEX' C X isr AI, with a l E [0,1], is T" and "xEX' C X isr A 2 , with

a 2 E [0,1] is T", is T

"xEX' C X isr AI, with a l E [0,1], is T" and "xEX' C X isr A 2 , with

a 2 E [0,1] is F", is F

"xEX' eX isr AI, with a l E[O,I], is F" and "xEX' C X isr A 2 , with

a 2 E [0,1] is T", is F

"xEX' C X isr AI, with al E [0,1], is F" and "xEX' C X isr A 2 , with

a2 E [0,1] is F", is F Again, we can write the FDNF and FCNF expressions by re-constructing the

Extended Truth Table of TLFS, which is again left for the reader to do. But

Page 64: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications

55

without reproducing these tables here, we can write the fuzzy disjunctive and conjunctive normal form expressions with appropriate substitution as follows:

FDNF₁²(A AND A) = (A ∩ A) ∪ (A ∩ A)

FCNF₁²(A AND A) = (A ∪ N(A)) ∩ (N(A) ∪ A) ∩ (A ∪ A) ∩ (A ∪ N(A)) ∩ (N(A) ∪ A) ∩ (A ∪ A)

Again clearly, if we are in two-valued theory, i.e., TLTS, we would get:

μ[FDNF₁²(A AND A)] = μ[FCNF₁²(A AND A)] = a ∈ {0,1}

Hence the re-affirmation would be true absolutely as expected.

In the Zadehean Max-Min and Standard Negation theory, again we get:

μ[FDNF₁²(A AND A)] = μ[FCNF₁²(A AND A)] = a ∈ [0,1]

Therefore, the re-affirmation would be true only to a degree! Next, if we are in a t-Norm-Conorm and Standard Negation theory, again we would realize a separation between the FDNF and FCNF expressions. For example, for Bold Intersection-Union, we would obtain:

μ[FDNF₁²(A AND A)] = Min{1, 2 Max{0, 2a − 1}}

μ[FCNF₁²(A AND A)] = Max{0, 2 Max{0, Min{1, 2a}} − 1}

Therefore, the re-affirmation "A AND A" would be true in an interval of membership degrees specified within the interval [μ[FDNF₁²(A AND A)], μ[FCNF₁²(A AND A)]], where μ[FDNF₁²(A AND A)] ≤ μ[FCNF₁²(A AND A)].

5.2 Normal Forms of Re-Negation

At the linguistic level, re-negation is a combination of a negative statement with another negative statement via one of the linguistic connectives "OR", "AND". We leave the writing of such expressions to the reader and move directly to the propositional expressions of the normal forms for re-negation.

5.2.1 Re-Negation with "OR"

Without re-constructing the Truth Tables, we can write the Fuzzy Disjunctive and Conjunctive Normal Forms of "N(A) OR N(A)" as follows:


FDNF₁²(N(A) OR N(A)) = (N(A) ∩ N(A)) ∪ (N(A) ∩ A) ∪ (A ∩ N(A)) ∪ (N(A) ∩ N(A)) ∪ (N(A) ∩ A) ∪ (A ∩ N(A))

FCNF₁²(N(A) OR N(A)) = (N(A) ∪ N(A)) ∩ (N(A) ∪ N(A))

Clearly, if we are in two-valued theory, we would get:

μ[DNF₁²(N(A) OR N(A))] = μ[CNF₁²(N(A) OR N(A))] = n(a) ∈ {0,1}

That is, the re-negation would be true absolutely, as expected. If, however, we are in the Zadehean Max-Min and Standard Negation theory, we would then get:

μ[FDNF₁²(N(A) OR N(A))] = μ[FCNF₁²(N(A) OR N(A))] = n(a) ∈ [0,1]

But now, the re-negation would be true only to a degree! Finally, if we are in a t-Norm-Conorm and Standard Negation theory, we would realize a separation between the FDNF and FCNF expressions. For example, for Bold Union and Intersection and Standard Negation, we would obtain:

μ[FDNF₁²(N(A) OR N(A))] = Min{1, 2 Max{0, 2n(a) − 1}}

μ[FCNF₁²(N(A) OR N(A))] = Max{0, 2 Min{1, 2n(a)} − 1}

Therefore, the re-negation "N(A) OR N(A)" would be true in an interval of membership degrees specified within the interval [μ[FDNF₁²(N(A) OR N(A))], μ[FCNF₁²(N(A) OR N(A))]], where μ[FDNF₁²(N(A) OR N(A))] ≤ μ[FCNF₁²(N(A) OR N(A))].

5.2.2 Re-Negation with "AND"

Again without re-constructing the Truth Tables, we can write Fuzzy Disjunctive and Conjunctive Normal Forms of "N(A) AND N(A)" as follows:

FDNF₁²(N(A) AND N(A)) = (N(A) ∩ N(A)) ∪ (N(A) ∩ N(A))

FCNF₁²(N(A) AND N(A)) = (A ∪ N(A)) ∩ (N(A) ∪ A) ∩ (N(A) ∪ N(A)) ∩ (A ∪ N(A)) ∩ (N(A) ∪ A) ∩ (N(A) ∪ N(A))

Clearly, if we were in two-valued theory, we would get:


μ[DNF₁²(N(A) AND N(A))] = μ[CNF₁²(N(A) AND N(A))] = n(a) ∈ {0,1}

That is re-negation would be true absolutely as expected.

If however, we are in Zadehean Max-Min Standard Negation theory, we would then get:

μ[FDNF₁²(N(A) AND N(A))] = μ[FCNF₁²(N(A) AND N(A))] = n(a) ∈ [0,1]

But now, the re-negation would be true only to a degree!

Finally, if we were in t-Norm-Conorm and Standard Negation theory, we would again realize a separation between FDNF and FCNF expressions. For example, for Bold Union and Intersection and Standard Negation, we would obtain:

μ[FDNF₁²(N(A) AND N(A))] = Min{1, 2 Max{0, 2n(a) − 1}},

μ[FCNF₁²(N(A) AND N(A))] = Max{0, 2 Min{1, 2n(a)} − 1}.

Therefore, the re-negation "N(A) AND N(A)" would be true in an interval of membership degrees specified within the interval [μ[FDNF₁²(N(A) AND N(A))], μ[FCNF₁²(N(A) AND N(A))]], where μ[FDNF₁²(N(A) AND N(A))] ≤ μ[FCNF₁²(N(A) AND N(A))].

6. Conclusion

We have demonstrated that one should start out one's discussion in logic initially with a conceptual linguistic statement, transform it to meta-linguistic expressions, then to propositional expressions and finally to predicate expressions, before launching onto numerical computations either in the {0,1} or in the [0,1] domain of membership assignments. It was also pointed out that we must distinguish and separate membership assignments from the truth qualifications attributed to propositions, and in particular to predicates, in the re-construction of the Truth Tables. This distinction was not needed in the two-valued theory and its associated Truth Table constructions. The results obtained by the re-constructed Truth Tables produce the same well-known results of the two-valued theory under the axioms applicable to that theory. But the advantages of the re-constructed Truth Tables are that:

1) We get more expressive information content from the "Fuzzy Middle" and the "Fuzzy Contradiction" expressions in the infinite-valued theory, as opposed to the "Excluded Middle" and "Crisp Contradiction" expressions of the two-valued theory.


2) We can still get results for sub-concepts such as "Re-affirmation" and "Re-negation", but again with more expressive information content.

These results suggest that the power of expression in the infinite (fuzzy) valued theory helps us discern more information content for better systems analyses and decision-making.

In the final analysis, it is important to note that the valuation of sets and the valuation of logic expressions must be separated in order to have a better understanding of the strengths and weaknesses of two-valued set and logic theory versus infinite-valued set and two-valued logic theories. Furthermore, there is still the open question of what the normal form expressions for infinite-valued set and infinite-valued logic theories would be!

Appendix

a) First, assign truth values T, F to the meta-linguistic values (labels, variables) A and B, and then assign truth values T, F to the meta-linguistic expression of concern, say "A AND B", in order to define its meaning.

b) Next, construct primary conjunctions of the set symbols A, B corresponding to the linguistic values such that, in a given row:

i) if a T appears, then take the set affirmation symbol of that meta-linguistic variable; otherwise

ii) if an F appears, then take the set complementation symbol of that meta-linguistic variable;

iii) next, conjunct the two symbols.

c) Then construct the disjunctive normal form of the meta-linguistic expression of concern:

i) first, take the conjunctions corresponding to the T's of the truth assignment made under the column of the meta-linguistic expression, such as "A AND B";

ii) next, combine these conjunctions with disjunctions.

d) Next, construct the conjunctive normal form of the meta-linguistic expression of concern:

i) first, take the conjunctions corresponding to the F's of the truth assignment made under the column of the meta-linguistic expression, such as "A AND B", and

ii) then combine these conjunctions with disjunctions, and

iii) next take the complement of these disjuncted conjunctions.


References

1. C. Peirce, Reasoning and the Logic of Things, Harvard University Press, Cambridge, Mass., 1992.

2. L. von Bertalanffy, A Systems View of Man, P.A. LaViolette (ed), Westview Press, Boulder, Colorado, 1981.

3. A. Korzybski, Science and Sanity, Fifth Edition, Institute of General Semantics, Englewood, New Jersey, 1995.

4. A.N. Whitehead, Science and the Modern World, Macmillan, New York, 1995.

5. J. Lukasiewicz, On Three-Valued Logic, in N. Rescher (ed), Many-Valued Logic, McGraw-Hill, 1969.

6. M. Black, "Vagueness: An Exercise in Logical Analysis", Philosophy of Science, 4, 427-455, (1937).

7. L.A. Zadeh, "Fuzzy Sets", Information and Control, 8, 338-353, (1965).

8. S.K. White, The Recent Work of Jürgen Habermas, Cambridge University Press, 1988.

9. L.A. Zadeh, "Fuzzy Logic = Computing with Words", IEEE Transactions on Fuzzy Systems, 4, 103-111, (1996).

10. I.B. Türkşen, "Fuzzy Normal Forms", Fuzzy Sets and Systems, 69, 319-346, (1995).

11. I.B. Türkşen, A. Kandel, Y.-Q. Zhang, "Universal Truth Tables and Normal Forms", (submitted).

12. I.B. Türkşen, "Fuzzy Truth Tables and Normal Forms", Proceedings of BUFL '96, December 15-18, TIT, Nagatsuta, Yokohama, Japan, 7-12, (1996).

13. G.W. Schwede and A. Kandel, "Fuzzy Maps", IEEE Trans. on Systems, Man, and Cybernetics, SMC-7, 669-674, (1977).

14. P.N. Marinos, "Fuzzy Logic and its Application to Switching Systems", IEEE Trans. Comput., C-18, 4, 343-348, (1969).

15. I.B. Türkşen, "Interval-Valued Fuzzy Sets Based on Normal Forms", Fuzzy Sets and Systems, 20, 191-210, (1986).


Uncertainty Theories by Modal Logic

Germano Resconi

Department of Mathematics, Catholic University, via Trieste 17, Brescia, Italy [email protected]

Abstract. In a series of papers initiated by Resconi et al. [1], interpretations for various uncertainty theories were proposed, including fuzzy set theory, Dempster-Shafer theory, and possibility theory, using models of modal logic [1-3]. There were two main reasons for pursuing research in this direction. The first reason was to offer the standard semantics of modal logic as a unifying framework within which it would be possible to compare and relate uncertainty theories to each other. Since, from time to time, some of the uncertainty theories are questioned regarding their internal adequacy, the second reason was to support them by developing interpretations for them in a relatively well-established area: in our case, modal logic. This paper is a summary of these efforts. To avoid unnecessary repetition of previous material, we will not repeat all the basic definitions and properties; the reader is referred to the relevant literature for fuzzy set theory [5], for possibility theory [3], for Dempster-Shafer theory [4], and for modal logic [8]. A more thorough treatment of the material summarised in this paper is covered in the above-mentioned papers [4,5].

1. Basics of Modal Logic

Modal logic is an extension of classical propositional logic. Its language consists of a set of atomic propositions, or propositional variables; the logical connectives ¬, ∧, ∨, ⇒, ⇔; the modal operators of necessity □ and possibility ◇; and supporting symbols ( , ) , { , }. The other objects of interest are formulas:

1. An atomic proposition is a formula.

2. If A and B are formulas, then so are ¬A, A ∨ B, A ∧ B, A ⇒ B, A ⇔ B, □A, ◇A.

When developed formally, different modal systems are characterized by different rules. Since we are not interested in developing formal systems here, we have omitted a discussion of this matter.

The meaning of a formula is its truth value in a given context. Various contexts are usually expressed in terms of models of modal logic. A Kripke model M of a modal logic is the triple



M = ⟨W, R, V⟩   (1)

where W, R, V denote a set of possible worlds, a binary relation on W, and a set of value assignment functions (one for each world in W), respectively, by which truth (T) or falsity (F) is assigned to each atomic proposition. Value assignment functions are inductively extended to all formulas in the usual way, the only interesting cases being

vᵢ(□A) = T iff for all wⱼ ∈ W, ⟨wᵢ, wⱼ⟩ ∈ R implies vⱼ(A) = T   (2)

vᵢ(◇A) = T iff there is some wⱼ ∈ W such that ⟨wᵢ, wⱼ⟩ ∈ R and vⱼ(A) = T   (3)

The relation R is usually called an accessibility relation; we say that world wⱼ is accessible to world wᵢ when ⟨wᵢ, wⱼ⟩ ∈ R. If not specified otherwise, we always assume that W is finite and that its cardinality is denoted by n (|W| = n). It is convenient to denote

W = {w₁, w₂, ..., wₙ}   (4)

and to represent the relation R by the n × n matrix R = [rᵢⱼ], where

rᵢⱼ = 1 if ⟨wᵢ, wⱼ⟩ ∈ R, and rᵢⱼ = 0 if ⟨wᵢ, wⱼ⟩ ∉ R   (5)

and to define, for each world wⱼ ∈ W and each formula p,

(p)ⱼ = 1 if vⱼ(p) = T, and (p)ⱼ = 0 if vⱼ(p) = F   (6)
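A minimal sketch of ours (Python assumed; the world and proposition names are illustrative) of a Kripke model and the evaluation rules (2) and (3):

# Sketch (ours): a Kripke model M = <W, R, V> and the box/diamond rules.
W = ["w1", "w2", "w3"]
R = {("w1", "w2"), ("w1", "w3"), ("w2", "w2")}        # accessibility relation
V = {"w1": {"p": True}, "w2": {"p": True}, "w3": {"p": False}}

def box(p, wi):
    """v_i([]p) = T iff p holds in every world accessible from wi -- rule (2)."""
    return all(V[wj][p] for wj in W if (wi, wj) in R)

def diamond(p, wi):
    """v_i(<>p) = T iff p holds in some world accessible from wi -- rule (3)."""
    return any(V[wj][p] for wj in W if (wi, wj) in R)

print(box("p", "w1"), diamond("p", "w1"))   # False True: w3 refutes p, w2 supports it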

2. Interpretation of Fuzzy Set Theory

Mamdani remarks on contradiction and modal logic:


When information is incomplete or missing, symbolic techniques have arisen out of attempts to increase the expressive power of classical logic in these ways:

a) extend the reasoning ability of the logic:

The reasoning system is aware that some of the statements it reasons about may not actually be TRUE, and that these may give rise to contradictions in the conclusions drawn. While classical logic would be unable to cope with any form of contradiction, a system with a reasoner that is aware of this possibility can allow recovery from contradiction. Only a subset of the statements are definitely known to be TRUE. Such systems exhibit the technical property of nonmonotonicity, so that as more definitely TRUE information becomes available, the conclusions do not grow monotonically; some of them may need retraction.

b) Modal Logic

Instead of making the reasoner wise to deficiencies in the categorical truth of the statements, one may increase the vocabulary for writing the statements, which would allow the expression of a statement as well as its attitude (or modality). The reasoning system also needs to be extended in order to reason not just about the statements that are TRUE, but also about their modalities. When we increase the vocabulary, we use the possible world, defined in this way:

Mamdani's definition of a possible world:

A possible world is a collection of mutually consistent statements, where consistency requires that a mathematical theory be free from contradiction.

In this paper we suggest the possibility of relating the two different extensions into one.

We begin with the modal logic representation of vagueness, and afterwards we come back to many-valued logic.

First part: Given a model with n possible worlds and the proposition

ax : " x belong to a given set A " (7)

where x ∈ X and A denotes a subset of X that is based on a vague concept. The vagueness involved in defining A can be captured by a model based on multiple worlds of some modal system, in which the proposition aₓ is, for some x, valued differently in different worlds.

In the model M we have the valuation vector

V(aₓ) = ((aₓ)₁, (aₓ)₂, ..., (aₓ)ₙ)   (8)

When the set X is a finite set, we have the tensor form of the valuation

T = [Tₖ,ᵢ]   (9)

where Tₖ,ᵢ = (aₖ)ᵢ.


Second part: From the modal logic model, we move to many-valued logic by the weighted additive composition of the valuation vector:

μ_A(x) = Σᵢ₌₁ⁿ ωᵢ (aₓ)ᵢ   (10)

where μ_A(x) is a logic value between True and False (many-valued logic), and also the membership function of the fuzzy set A [5]. For the weights we have

Σᵢ₌₁ⁿ ωᵢ = 1   (11)

Remark 1: It is useful to remark that the weighted sum (10) is well known in the neuron model, so this passage from modal logic to many-valued logic can be modelled by the ordinary approach to the neuron.
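A small sketch of ours (Python assumed) of the passage (10)-(11) from a modal valuation vector to a membership value:

# Sketch (ours): mu_A(x) = sum_i w_i (a_x)_i, with (a_x)_i in {0,1}.
def membership(valuation, weights):
    assert abs(sum(weights) - 1.0) < 1e-12   # equation (11)
    return sum(w * v for w, v in zip(weights, valuation))

a_x = [1, 1, 0, 1]                  # (a_x)_i from equation (6), one per world
omega = [0.4, 0.3, 0.2, 0.1]        # weights satisfying equation (11)
print(membership(a_x, omega))       # approximately 0.8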

3. Operations on Fuzzy Sets

3.1 Complement [5]

c_g(μ) = g⁻¹(1 − g(μ))   (12)

μ       --g-->   g(μ)
 |                 |
c_g                C
 |                 |
 v                 v
c_g(μ)  --g-->   g(1) − g(μ)                    (13)

When g(μ) = μ we have the ordinary complement

C(μ) = 1 − μ   (14)


with the modal logic model

C(μ) = Σᵢ₌₁ⁿ ωᵢ (¬aₓ)ᵢ = Σᵢ₌₁ⁿ ωᵢ [1 − (aₓ)ᵢ] = 1 − μ   (15)

Remark 2: For the ordinary complement we have the conservative property

C(μ) + μ = constant   (16)

For the general fuzzy complement we lose the conservative property; in fact, for the Sugeno complement

c_λ(μ) = (1 − μ) / (1 + λμ)   (17)

we obtain

H(λ, μ) = c_λ(μ) + μ = (1 − μ) / (1 + λμ) + μ   (18)

with

H(λ, 0) = 1,   H(λ, 1) = 1

For λ > 0 and 0 < H < 1 we have the minimum value

μ_min = (√(1+λ) − 1) / λ,   H_min = 2(√(1+λ) − 1) / λ

and for −1 < λ < 0 and H > 1 we have the maximum value

μ_max = (√(1+λ) − 1) / λ,   H_max = 2(√(1+λ) − 1) / λ

In conclusion, the value of H is not constant as the fuzzy measure μ changes.
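For concreteness, a sketch of ours (Python assumed; the formulas are the standard Sugeno complement as reconstructed in (17)-(18)):

# Sketch (ours): H(lam, mu) = c(mu) + mu is not constant for Sugeno's complement.
import math

def sugeno_complement(mu, lam):
    return (1 - mu) / (1 + lam * mu)

lam = 2.0
H = lambda mu: sugeno_complement(mu, lam) + mu
mu_min = (math.sqrt(1 + lam) - 1) / lam          # interior extremum derived above
print(H(0.0), H(1.0))                            # both 1.0 at the endpoints
print(mu_min, H(mu_min), 2 * mu_min)             # H attains 2*mu_min < 1 for lam > 0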

In the modal logic model we obtain

c_g(μ) = Σᵢ₌₁ⁿ ωᵢᶜ (¬aₓ)ᵢ = Σᵢ₌₁ⁿ ωᵢᶜ [1 − (aₓ)ᵢ] = 1 − Σᵢ₌₁ⁿ ωᵢᶜ (aₓ)ᵢ   (19)


where the weights are transformed by the complement operator c_g; under the complement operator the weights are not invariant.

Proposition 1: The complement operator transforms the weights ωᵢ into new weights ωᵢᶜ (20).

Proof: For the modal logic model we have

H(λ, μ) = c_g(μ) + μ = 1 − Σᵢ₌₁ⁿ ωᵢᶜ (aₓ)ᵢ + Σᵢ₌₁ⁿ ωᵢ (aₓ)ᵢ = 1 − Σᵢ₌₁ⁿ (ωᵢᶜ − ωᵢ)(aₓ)ᵢ   (21)

So

Σᵢ₌₁ⁿ (ωᵢᶜ − ωᵢ)(aₓ)ᵢ = 1 − H(λ, μ)   (22)

One of the possible solutions of equation (22) gives the transformed weights (23). •

Example 1: For the Sugeno complement we obtain the corresponding transformation of the weights from (22).

3.2 Fuzzy Intersection

Modal logic model [6]:

μ_{A∩B}(x) = Σᵢ₌₁ⁿ ωᵢ (aₓ ∧ bₓ)ᵢ   (24)


Definition 1: The two vectors (aₓ)ᵢ and (bₓ)ᵢ are consonant when

(25)

Example 2: The two vectors

a = (T, F, F, T) and b = (F, F, F, T)

are consonant; in fact, cᵢⱼ = always True.

The two vectors

a = (T, F, F, T) and b = (F, F, T, T)

are dissonant; in fact, c₁,₃ = False: the two vectors are not consonant in positions 1 and 3.

Proposition 2: When two vectors are consonant then

μ_{A∩B}(x) = min(μ_A(x), μ_B(x)), where μ_A(x) = Σᵢ₌₁ⁿ ωᵢ (aₓ)ᵢ and μ_B(x) = Σᵢ₌₁ⁿ ωᵢ (bₓ)ᵢ   (26)

Proof: Formula (26) is true because (aₓ ∧ bₓ)ᵢ = False in exactly those worlds where (aₓ)ᵢ = True and (bₓ)ᵢ = False, where (aₓ)ᵢ = False and (bₓ)ᵢ = True, or where both are False. For consonant vectors, the True-set of one vector contains that of the other, so the weighted sum over (aₓ ∧ bₓ)ᵢ reproduces the smaller of the two membership values, and

μ_{A∩B}(x) = min(μ_A(x), μ_B(x))  •

Proposition 3: The consonance relation between two vectors is an equivalence relation.

Proof: From formula (25) it is easy to obtain that:

1. any vector is consonant to itself (reflexive property)

2. when a vector v is consonant to the vector w, then w is consonant to v (symmetric property)

3. when v is consonant with w and w is consonant with z, then v is consonant with z (transitive property)

So the consonance relation is an equivalence relation. •
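The consonance property and Proposition 2 can be illustrated with a short sketch of ours (Python assumed; we read Definition 1, whose formula (25) is not reproduced here, as requiring the True-set of one vector to contain the other's):

# Sketch (ours): consonant vectors make the weighted intersection (24) the min of (26).
def consonant(a, b):
    return all(x <= y for x, y in zip(a, b)) or all(y <= x for x, y in zip(a, b))

def mu(v, w):                      # weighted sum over worlds, as in (24)
    return sum(wi * vi for wi, vi in zip(w, v))

w = [0.4, 0.3, 0.2, 0.1]
a = [1, 0, 0, 1]                   # (T, F, F, T)
b = [0, 0, 0, 1]                   # (F, F, F, T): consonant with a
ab = [x * y for x, y in zip(a, b)] # (a_x AND b_x)_i
assert consonant(a, b)
assert abs(mu(ab, w) - min(mu(a, w), mu(b, w))) < 1e-12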

4. Scalar Product of the Vectors and Fuzzy Intersection (Interaction)

For

μ_A(x) = Σᵢ₌₁ⁿ αᵢ (aₓ)ᵢ,   μ_B(x) = Σᵢ₌₁ⁿ βᵢ (bₓ)ᵢ   (27)

The scalar product p of the two vectors αᵢ(aₓ)ᵢ and βᵢ(bₓ)ᵢ is

p = Σᵢ₌₁ⁿ αᵢβᵢ (aₓ)ᵢ(bₓ)ᵢ = Σᵢ₌₁ⁿ ωᵢ (aₓ ∧ bₓ)ᵢ   (28)

Remark 3:

In the space of worlds we have a valued field of weights, and any point (world) of the space is labelled by a True-False value. The fuzzy set measure is the sum of all the weights whose logical value is True. The same fuzzy set measure can be obtained with different fields of weights. These fields are equivalent.

Fuzzy intersection and union are given by the interaction of fields and the superposition of fields.

4.1 Scalar Product and Consonant Condition

Two vectors are consonant when one vector is the projection of the other. With the scalar product for consonant vectors we obtain:

Proof: The proof is a consequence of formula (25).

Fig. 1. Example of the projection of the vector v: w = (True, False, True) is the projection of the vector v.

4.2 Fuzzy Set Intersection and Scalar Product p

In previous works [6], and in formula (24), we gave a particular modal logic interpretation of the fuzzy logic intersection. We want to stress that extensions of the previous definitions are possible. Fuzzy intersection can be obtained by the scalar product in this way:

K Σᵢ₌₁ⁿ αᵢβᵢ (aₓ ∧ bₓ)ᵢ = Σᵢ₌₁ⁿ ωᵢ (aₓ ∧ bₓ)ᵢ = μ_{A∩B}   (30)


The scalar product definition of fuzzy intersection and modal logic reflects the interaction among the fuzzy measures μ_A, μ_B [7].

Remark 4: When αᵢ = βᵢ = 1, the membership of the fuzzy set intersection given by the scalar product is equal to the fuzzy set intersection given previously in [6] or in (24).

4.3 Tensorial Product of the Vectors and Fuzzy Intersection

The tensorial product Mᵢⱼ of the two vectors αᵢ(aₓ)ᵢ and βⱼ(bₓ)ⱼ is

Mᵢⱼ = αᵢβⱼ (aₓ)ᵢ(bₓ)ⱼ   (31)

The fuzzy intersection by tensorial product is

μ_{A∩B} = K Σᵢ₌₁ⁿ αᵢ β_f(i) (aₓ)ᵢ (bₓ)_f(i)   (32)

The membership function μ depends on the function f(i) = j.

Remark 5: When f(i) = i, the membership value given by the tensorial product is equal to the membership value obtained by the scalar product.

Example 2 (continued): Given the vectors a = (T, F, F, T) and b = (F, F, T, T), the tensorial product is the matrix M = a ⊗ b, whose only nonzero entries are α₁β₃, α₁β₄, α₄β₃ and α₄β₄.

For j = f(i) = i we take only the elements on the principal diagonal of M, so we obtain the diagonal terms αᵢβᵢ(aₓ)ᵢ(bₓ)ᵢ.


When f(i) is given by j = f(1) = 3, j = f(2) = 4, j = f(3) = 1, j = f(4) = 2, we obtain the corresponding off-diagonal entries of M.

4.4 Commutative and Noncommutative Fuzzy Intersection

When we use the scalar product definition of fuzzy intersection (30), the fuzzy intersection commutes; but when we use the tensorial product (32), the fuzzy intersection operation does not always commute.

In fact

$$K\sum_{i=1}^{n}\alpha_i\beta_{f(i)}\,(a_x)_i\otimes(b_x)_{f(i)} = \sum_{i=1}^{n}\omega_i\,(a_x)_i\wedge(b_x)_{f(i)} = \mu_{A\cap B} \qquad (33)$$

where $\omega_i = \alpha_i\,\beta_{f(i)}$.

When we commute the vector a with the vector b, the analogous expression has weights $\omega_i^* = \alpha_{f(i)}\,\beta_i$. When $\alpha_{f(i)}(a_x)_{f(i)} \neq \alpha_i(a_x)_i$ we have, in general, different membership values.

Example 3: take $j = f(1) = 2$, $j = f(2) = 3$, $j = f(3) = 1$.


The two membership values are not always equal.
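The following sketch (ours, with invented weights and labels, and with K = 1) implements the tensorial-product intersection of equation (33) for an arbitrary index map f and shows that exchanging the roles of the two vectors can change the result:

```python
# Illustrative sketch of the tensorial-product intersection (33) and
# its possible non-commutativity. All numbers are invented.

def tensor_intersection(alpha, a, beta, b, f, K=1.0):
    """mu = K * sum_i alpha_i * beta_{f(i)} * (a_i AND b_{f(i)})."""
    return K * sum(alpha[i] * beta[f[i]] * (1 if (a[i] and b[f[i]]) else 0)
                   for i in range(len(alpha)))

alpha = [0.5, 0.3, 0.2]; a = [True, True, False]
beta  = [0.4, 0.4, 0.2]; b = [True, False, True]
f = [1, 2, 0]                 # the permutation j = f(i) of Example 3 (0-based)

mu_ab = tensor_intersection(alpha, a, beta, b, f)   # A then B -> 0.06
mu_ba = tensor_intersection(beta, b, alpha, a, f)   # B then A -> 0.22
print(mu_ab, mu_ba)           # generally different: non-commutative
```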

4.5 Idempotence

For the definition of fuzzy intersection by tensorial product, idempotence does not always hold; in fact, when

$$\mu_A = \sum_{i=1}^{n}\alpha_i\,(a_x)_i$$

for the tensorial product we obtain

$$\mu_{A\cap A} = K\sum_{i=1}^{n}\alpha_i\alpha_{f(i)}\,(a_x)_i\otimes(a_x)_{f(i)}$$

and there exist cases for which

$$\mu_{A\cap A} \neq \mu_A$$

4.6 Associative Law

The associative law is not always true for (33) when we define the fuzzy intersection by the tensorial product.

In fact, for the tensor product we have

where $\omega_i = \alpha_i\,\beta_{f(i)}\,\gamma_{g(i)}$, and

where $\omega_i' = \alpha_{g(i)}\,\beta_i\,\gamma_{f(i)}$, and we can have

$$\mu_{A\cap(B\cap C)} \neq \mu_{(A\cap B)\cap C}$$

Example 4: the tensorial product $a\otimes b\otimes c$ is the $3\times 3\times 3$ array with entries $\alpha_i\,\beta_j\,\gamma_k$, $i,j,k = 1,2,3$.


For $f(1) = 2$, $f(2) = 3$, $f(3) = 1$ and $g(1) = 2$, $g(2) = 3$, $g(3) = 1$ we obtain, in general, the non-associative property

$$\mu_{A\cap(B\cap C)} \neq \mu_{(A\cap B)\cap C}$$

5. Fuzzy Union (Superposition)

From previous work on the modal logic model,

$$\mu_{A\cup B}(x) = \sum_{i=1}^{n}\omega_i\,(a_x\vee b_x)_i \qquad (36)$$

We can enlarge rule (36) in the following ways.

5.1 Fuzzy Union and Probability Superposition

For

$$\mu_A(x) = K\sum_{i=1}^{n}\alpha_i\,(a_x)_i,\qquad \mu_B(x) = K\sum_{i=1}^{n}\beta_i\,(b_x)_i \qquad (37)$$

we have the definition of the fuzzy union by the probability superposition of the weights:

$$\mu_{A\cup B}(x) = K\sum_{i=1}^{n}(\alpha_i+\beta_i-\alpha_i\beta_i)\,(a_x\vee b_x)_i = \sum_{i=1}^{n}\omega_i\,(a_x\vee b_x)_i$$

for $\omega_i = K(\alpha_i+\beta_i-\alpha_i\beta_i)$.
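A short sketch of the probabilistic superposition union (again our own illustration, with invented weights and labels; K is left as a free normalisation constant):

```python
# Sketch of the probabilistic-superposition union above. Weights and
# labels are invented; K would be chosen to normalise the measure.

def union_prob(alpha, a, beta, b, K=1.0):
    """mu_{A u B} = K * sum_i (alpha_i + beta_i - alpha_i*beta_i) * (a_i OR b_i)."""
    return K * sum((alpha[i] + beta[i] - alpha[i] * beta[i]) *
                   (1 if (a[i] or b[i]) else 0)
                   for i in range(len(alpha)))

alpha = [0.6, 0.2, 0.2]; a = [True, False, False]
beta  = [0.1, 0.5, 0.4]; b = [True, True, False]
print(union_prob(alpha, a, beta, b))   # 0.64 + 0.6 = 1.24 before normalisation
```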


5.2 Fuzzy Union and Quantum Superposition

(38)

5.3 Tensorial Sums of the Vectors and Fuzzy Union

The tensorial sum $S_{ij}$ of the two vectors $\alpha_i(a_x)_i$ and $\beta_j(b_x)_j$ is

$$S_{ij} = (\alpha_i + \beta_j - \alpha_i\beta_j)\,(a_x)_i \vee (b_x)_j \qquad (39)$$

Fuzzy union by probability tensorial superposition:

$$\mu_{A\cup B}(x) = K\sum_{i=1}^{n}(\alpha_i+\beta_{f(i)}-\alpha_i\beta_{f(i)})\,(a_x)_i\vee(b_x)_{f(i)} \qquad (40a)$$

Fuzzy union by quantum tensorial superposition:

(40b)

We can prove that for the fuzzy union by tensorial sum there are cases in which the idempotence, commutative and associative laws do not hold.

6. Türkşen Interval Valued Fuzzy Sets

When we have two different types of fuzzy set measures we have

$$\mu_{1A}(x) = K\sum_{i=1}^{n}\alpha_{1,i}\,(a_x)_{1,i},\qquad \mu_{1B}(x) = K\sum_{i=1}^{n}\beta_{1,i}\,(b_x)_{1,i} \qquad (41)$$

(42)

With the previous rules we can obtain the fuzzy intersection and union by the modal logic for the Türkşen interval valued fuzzy sets.


7. Identification of t-norm and t-conorm by the Modal Logic Model

$$T(X,Y) = K\sum_{i=1}^{n}\alpha_i\beta_{f(i)}\,(a_x)_i \otimes (b_x)_{f(i)} \qquad (43)$$

$$S(X,Y) = K\sum_{i=1}^{n}(\alpha_i+\beta_{f(i)}-\alpha_i\beta_{f(i)})\,(a_x)_i\vee(b_x)_{f(i)} \qquad (44)$$

or

(45)

T(X,Y) and S(X,Y) correspond to one of the t-norm and t-conorm families, such as

1. Schweizer and Sklar's family

2. Yager's family

3. the λ family

4. Frank's family

or other t-norm or t-conorm families. When we solve equations (43-45) we can identify the t-norm and t-conorm measures by modal logic.

Simple Example 5: In the Schweizer and Sklar family for p = 1 we have

In the modal interpretation we take only two worlds, for which we choose the true-false values

for

$$\mu_{1A}(x) = K\sum_{i=1}^{2}\alpha_{1,i}\,(a_x)_{1,i},\qquad \mu_{1B}(x) = K\sum_{i=1}^{2}\beta_{1,i}\,(b_x)_{1,i}$$


we obtain

For

$$\mu_A+\mu_B-\mu_A\mu_B = \mu_{A\cup B}(x) = K\sum_{i=1}^{2}(\alpha_i+\beta_i-\alpha_i\beta_i)\,(a_x\vee b_x)_i$$

we obtain, for $\mu_A\mu_B = 0$,

In conclusion, the modal logic model of the Schweizer and Sklar family for p = 1 is

and

when we use the probabilistic superposition definition of the fuzzy union.

8. Interpretation of the Dempster-Shafer Theory [4]

Let X again denote the universe of discourse or frame of discernment, and let P(X) denote the power set of X. To model the Dempster-Shafer theory (DST) [4] in terms of modal logic, we employ propositions of the form:

$e_A :=$ "A given incompletely characterised element $\varepsilon$ is classified in set A"

where $\varepsilon \in A$ and $A\in P(X)$. Because of the inner structure of these propositions, it is sufficient to consider as atomic propositions (or propositional variables) only propositions $e_B$, where B belongs to the equivalence classes obtained by intersecting all the sets A contained in X.

Example: If in the universal set X we have two sets $A_1$ and $A_2$, the atomic propositions are associated with the sets


with $B_1\cap B_2 = \emptyset$.

Under these assumptions, each of the four basic functions in Dempster-Shafer theory has a modal logic interpretation. That is, any finite model of modal logic that contains n worlds yields the following equations [4]:

$$\mathrm{Bel}(A) = \sum_{i=1}^{n}\alpha_i\,(\Box\, e_{B_i}) \qquad (47)$$

$$\mathrm{Pl}(A) = \sum_{i=1}^{n}\alpha_i\,(\Diamond\, e_{B_i}) \qquad (48)$$

where Bel and Pl are the Belief and Plausibility measures [5], $\Box$ is the modal logic operator "necessity", $\Diamond$ is the modal logic operator "possibility", and the index $i = 1,\dots,n$ runs over all the sets $B_i \subset A$.
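The next sketch (ours) shows how equations (47) and (48) combine the weights with modal truth values; here the truth of $\Box e_{B_i}$ and $\Diamond e_{B_i}$ in each world is supplied directly rather than derived from an accessibility relation:

```python
# A small sketch of equations (47)-(48): Bel(A) sums the weights where
# "necessarily e_B" holds, Pl(A) those where "possibly e_B" holds.
# Weights and modal truth values are invented for illustration.

def bel(alpha, nec):   # nec[i] = truth of (box e_{B_i})
    return sum(a for a, t in zip(alpha, nec) if t)

def pl(alpha, poss):   # poss[i] = truth of (diamond e_{B_i})
    return sum(a for a, t in zip(alpha, poss) if t)

alpha = [0.5, 0.3, 0.2]
nec   = [True, False, False]   # necessity is stronger ...
poss  = [True, True, False]    # ... so nec[i] implies poss[i]
print(bel(alpha, nec), pl(alpha, poss))  # 0.5 0.8: Bel <= Pl, as expected
```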

8.1 Belief Measure

From the theory [4], the belief measure is a non-additive measure:

$$\mathrm{Bel}(A\cup B) \ge \mathrm{Bel}(A) + \mathrm{Bel}(B)\quad\text{with } A\cap B = \emptyset \qquad (49)$$

Proof of (49) by modal logic:

In the belief measure Bel(A∪B) there is

the accessibility relation inside A and B, and

the accessibility relation between A and B.

When the accessibility relation between A and B is different from zero we have

$$\mathrm{Bel}(A\cup B) > \mathrm{Bel}(A) + \mathrm{Bel}(B)\quad\text{with } A\cap B = \emptyset \qquad (50)$$

Fig. 2. Accessibility relation between the sets A and B


In Fig. 2, the arrows are the accessibility relations among the worlds associated with elements inside the sets.

The set between the two sets A and B is accepted as true because its two worlds, one in the part of A and the other in the part of B, are in A∪B.

For separate sets A and B we have the situation in Fig. 3. The set C is divided in two parts associated with two worlds. One part (world) assumes the logical value true because it belongs to A; the other part (world) is false because it does not belong to A.

Fig. 3. The set C is divided in two parts (worlds), with two different logical values for the proposition p = "x belongs to the set C and A"

In Fig. 4, for the part of C included in A, the proposition p = "x belongs to the set C and A" is true but not necessarily true. The set C must be excluded from the set A in the belief measure.

Fig. 4. The case when a part of C is included in A

In conclusion, for separate sets A and B only internal sets are considered. Sets at the boundary, such as C, are introduced only in A∪B. So we prove with modal logic that

$$\mathrm{Bel}(A\cup B) > \mathrm{Bel}(A) + \mathrm{Bel}(B)\quad\text{with } A\cap B = \emptyset$$

When no sets join A and B, as in Fig. 5, the accessibility relation between A and B is equal to zero and


$$\mathrm{Bel}(A\cup B) = \mathrm{Bel}(A) + \mathrm{Bel}(B)\quad\text{with } A\cap B = \emptyset \qquad (51)$$

Fig. 5. The case when no sets join A and B

8.2 Plausibility Measure

For the plausibility measure we have the property

$$\mathrm{Pl}(A\cup B) \le \mathrm{Pl}(A) + \mathrm{Pl}(B)\quad\text{with } A\cap B = \emptyset \qquad (52)$$

Proof of (52) by modal logic:

For the sets A and B we have the situation shown in Fig. 6.

Fig. 6. The case when the sets A and B have the same set C at their boundaries

In Fig. 7, the worlds in C are possibly true under the possibility modal operator. For the part of C external to A, the proposition p = "x belongs to the set C and A" is false but possibly true. Similarly, for the part of C external to B, the proposition p = "x belongs to the set C and B" is false but possibly true. The set C must be included in the sets A and B for the plausibility measure.

The set C is counted twice, once in Pl(A) and once in Pl(B), but only once in Pl(A∪B). So we have

$$\mathrm{Pl}(A\cup B) \le \mathrm{Pl}(A) + \mathrm{Pl}(B)\quad\text{with } A\cap B = \emptyset$$


Fig. 7. The case when the worlds in C are possibly true under the possibility modal operator

9. Conclusion

In this paper we presented different types of uncertainty related to set theory and to the Dempster-Shafer theory of evidence, by means of the modal logic structure. We showed that modal logic can be used as a meta-language to unify different types of uncertainty.

References

1. G. Resconi, G.J. Klir, and Ute St. Clair, Hierarchical uncertainty metatheory based upon modal logic. Internat. J. Gen. Systems 21 (1992) 23-50

2. G. Resconi, G.J. Klir, U. St. Clair and D. Harmanec, On the integration of uncertainty theories. Internat. J. Uncertainty Fuzziness Knowledge-Based Systems 1 (1993) 1-18

3. G.J. Klir and D. Harmanec, On modal logic interpretation of possibility theory. Internat. J. Uncertainty Fuzziness Knowledge-Based Systems 2 (1994) 237-245

4. G. Shafer, A Mathematical Theory of Evidence. Princeton Univ. Press, Princeton, NJ (1976)

5. G.J. Klir and B. Yuan, Fuzzy Sets and Fuzzy Logic: Theory and Applications. Prentice-Hall, Englewood Cliffs, NJ (1995)

6. G. Resconi, G.J. Klir, D. Harmanec, Ute St. Clair, Interpretation of various uncertainty theories using models of modal logic: a summary. Fuzzy Sets and Systems 80 (1996) 7-14

7. I.B. Türkşen, Interval valued fuzzy sets based on normal forms. Fuzzy Sets and Systems 20 (1986) 191-210

8. G.E. Hughes and M.J. Cresswell, A Companion to Modal Logic. Methuen, London (1984)


Sup-T Equations: State of the Art

Bernard De Baets*

Department of Applied Mathematics and Computer Science University of Gent, Krijgslaan 281 (S9), 9000 Gent, Belgium [email protected]

Abstract. The study of fuzzy relational equations is one of the most appealing subjects in fuzzy set theory, both from a mathematical and a systems modelling point of view. The basic fuzzy relational equations are the sup-T equations, with T a t-norm. In this overview paper, we deal with these equations in a general lattice-theoretic framework. We consider t-norms on bounded ordered sets, and in particular on complete lattices. We then solve sup-T equations on distributive, complete lattices of which all elements are either join-irreducible or join-decomposable. Solution sets are represented by means of root systems. Some additional necessary and sufficient solvability conditions are listed. Also systems of sup-T equations are discussed. The theoretical results presented are applied to the real unit interval and to the real unit hypercube. In the latter case, particular attention is paid to pointwise extensions of t-norms defined on the real unit interval and the corresponding residual operators.

Keywords. Distributive lattice, join-decomposable, join-irreducible, residual implicator, root system, sup-T equation, t-norm, unit hypercube.

1. Introduction

The study of fuzzy relational equations was initiated by Sanchez in 1974 [13]. The first equations considered were systems of max-min equations [14]. The more general max-T equations, with T a t-norm, are discussed in [8]. It is well known how to solve these equations on the real unit interval [7]. In 1987, Zhao [18] showed how to solve sup-∧ equations on complete Brouwerian lattices of which all elements have an irredundant finite decomposition in join-irreducible elements. In this paper, we generalize various aspects of the latter results. We consider t-norms on bounded ordered sets, and in particular on complete lattices. We then solve sup-T equations on distributive, complete lattices of which all elements are either join-irreducible or join-decomposable.

* Post-Doctoral Fellow of the Fund for Scientific Research - Flanders (Belgium).



The morphism behaviour of the partial maps of the t-norm T plays an important role in this study. We also discuss the notion of a maximally surjective t-norm and indicate where the solution procedures can be simplified for such a t-norm. We then solve systems of sup-T equations on distributive, complete lattices of which all elements are either join-irreducible or have a finite join-decomposition. The most important asset of our approach consists in the representation of the solution sets by means of the order-theoretic concept of a root system. The paper concludes with a discussion of sup-T equations on the real unit interval $([0,1],\le)$ and on the real unit hypercube $([0,1]^m,\le)$, $m\in\mathbb{N}_0$. In the latter case, it is shown that the solution procedures can be simplified when working with pointwise extensions of t-norms defined on the real unit interval.

2. Some Notions From Order Theory

We assume the reader to be familiar with basic order-theoretic notions such as (bounded) ordered sets, chains, antichains, (complete) lattices, ... [2,3]. The smallest and greatest element of a bounded ordered set $(P,\le)$ will be denoted by $0_P$ and $1_P$. For an element $\alpha$ of an ordered set $(P,\le)$ we define $\downarrow\alpha = \{\beta \mid \beta\in P \wedge \beta\le\alpha\}$. The notation $I_n$, $n\in\mathbb{N}_0$, stands for $\{1,\dots,n\}$.

We will only recall some results concerning distributive lattices. Notice that the meet (resp. join) operation of a lattice $(L,\le)$ is denoted by $\wedge$ (resp. $\vee$).

Definition 1. [2] A lattice $(L,\le)$ is called distributive if and only if the following property holds:

$$(\forall(\alpha,\beta,\gamma)\in L^3)\;\big(\alpha\wedge(\beta\vee\gamma) = (\alpha\wedge\beta)\vee(\alpha\wedge\gamma)\big)$$

Notice that any chain is distributive and that a Cartesian product of lattices is distributive if and only if all of these lattices are distributive.

Definition 2. [2] An element $\alpha$ of a lattice $(L,\le)$ is called join-irreducible if and only if

$$(\forall(\beta,\gamma)\in L^2)\;\big(\beta\vee\gamma = \alpha \Rightarrow (\beta = \alpha \vee \gamma = \alpha)\big)$$

Notice that all elements of a chain are join-irreducible.

Proposition 3. [2] Let $\alpha$ be a join-irreducible element of a distributive lattice $(L,\le)$ and $(\alpha_i \mid i\in I_n)$ be a finite family in L; then the following property holds:

$$\alpha \le \sup_{i\in I_n}\alpha_i \;\Leftrightarrow\; (\exists i\in I_n)(\alpha\le\alpha_i)$$


Definition 4. [4] An element $\alpha$ of a lattice $(L,\le)$ is called join-decomposable if and only if there exists a set A of join-irreducible elements of L, with $\#A\ge 2$, such that $\alpha = \sup A$.

In the foregoing definition, the set A is called a join-decomposition of $\alpha$. This join-decomposition is called irredundant if and only if $(\forall B\in P(L))(B\subset A \Rightarrow \sup B \neq \alpha)$.

3. Root Systems

In the description of the solution sets of (systems of) sup-T equations, root systems are of extreme importance.

Definition 5. [5] A subset R of an ordered set $(P,\le)$ is called a root system of $(P,\le)$ if and only if there exists an element $\alpha$ in P and an antichain O in $\downarrow\alpha$ such that

$$R = \bigcup_{\omega\in O}[\omega,\alpha]$$

For a root system R, the corresponding element $\alpha$ and the corresponding antichain O are unique. The element $\alpha$ is called the stem of the root system. The elements of the antichain O are called the offshoots of the root system. A root system is called finitely generated if the set of offshoots is finite. The stem is the greatest element and the offshoots are the minimal elements of the root system.

We will further explain that, under certain conditions, the solution set of a sup-T equation, if not empty, is a root system. These results are based on the following theorems. Also in view of solving systems of sup-T equations, these theorems are indispensable.

Theorem 6. [5] Let $(R_i \mid i\in I)$ be a family of finitely generated root systems of a complete lattice $(L,\le)$ with stems $\alpha_i$ and sets of offshoots $O_i$. If the intersection $\bigcap_{i\in I} R_i$ is not empty, then it is a root system of $(L,\le)$ with stem $\alpha = \inf_{i\in I}\alpha_i$ and as offshoots the minimal elements of the set

$$\Big\{\sup_{i\in I}\omega_i \;\Big|\; (\forall i\in I)(\omega_i\in O_i \wedge \omega_i\le\alpha)\Big\}$$

Theorem 7. [5] If the intersection of a finite family of finitely generated root systems of a complete lattice is not empty, then it is a finitely generated root system.

4. Triangular Norms

Triangular norms (or t-norms) were introduced in 1963 by Schweizer and Sklar, in the framework of their study of probabilistic metric spaces [15,16].


They were first considered in fuzzy set theory for defining the intersection of fuzzy sets by Alsina et al. [1] and Prade [12]. T-norms are increasing, commutative and associative binary operations on the real unit interval, satisfying some additional boundary conditions. De Cooman and Kerre [9] have noticed that none of the defining properties is typical for operations on [0,1]. They can therefore easily be generalized to describe certain classes of operations on bounded ordered sets.

Definition 8. [9] A t-norm T on a bounded ordered set $(P,\le)$ is a $P^2\to P$ map that satisfies:

(i) boundary condition: $(\forall\alpha\in P)(T(1_P,\alpha)=\alpha)$;
(ii) T is increasing w.r.t. $(P^2,\le)$ and $(P,\le)$;
(iii) commutativity: $(\forall(\alpha,\beta)\in P^2)(T(\alpha,\beta) = T(\beta,\alpha))$;
(iv) associativity: $(\forall(\alpha,\beta,\gamma)\in P^3)(T(\alpha,T(\beta,\gamma)) = T(T(\alpha,\beta),\gamma))$.

For example, the meet operation $\wedge$ on a bounded lattice $(L,\le)$ (defined by $\alpha\wedge\beta = \inf\{\alpha,\beta\}$) is a t-norm on $(L,\le)$.

With a t-norm T on a complete lattice $(L,\le)$ we associate two binary operators $\mathcal{I}_T$ and $\mathcal{L}_T$ on L defined by

$$\mathcal{I}_T(\alpha,\beta) = \sup\{\gamma \mid \gamma\in L \wedge T(\alpha,\gamma)\le\beta\}$$
$$\mathcal{L}_T(\alpha,\beta) = \inf\{\gamma \mid \gamma\in L \wedge T(\alpha,\gamma)\ge\beta\}$$

The operator $\mathcal{I}_T$ is usually called the residual implicator of T. Its restriction to $\{0_L,1_L\}^2$ coincides with the Boolean implication: $\mathcal{I}_T(0_L,0_L) = \mathcal{I}_T(0_L,1_L) = \mathcal{I}_T(1_L,1_L) = 1_L$ and $\mathcal{I}_T(1_L,0_L) = 0_L$. Moreover, $\mathcal{I}_T$ is hybrid monotonous: its first partial maps are decreasing and its second partial maps are increasing.

Definition 9. Let $(L,\le)$ be a complete lattice. A transformation f of L is called:

(i) a supremum-morphism if and only if

$$(\forall A\in P(L)\setminus\{\emptyset\})\;(f(\sup A) = \sup f(A))$$

(ii) an infimum-morphism if and only if

$$(\forall A\in P(L)\setminus\{\emptyset\})\;(f(\inf A) = \inf f(A))$$

(iii) a homomorphism if and only if it is both a supremum-morphism and an infimum-morphism.

The boundary conditions of a t-norm T on a bounded ordered set $(P,\le)$ imply that the range of the partial map $T(\alpha,\cdot)$ always is a subset of $[0_P,\alpha]$.


Definition 10. [4] Let T be a t-norm on a bounded ordered set $(P,\le)$ and $\alpha\in P$. The partial map $T(\alpha,\cdot)$ is called maximally surjective if and only if $\mathrm{rng}(T(\alpha,\cdot)) = [0_P,\alpha]$. The t-norm T is called maximally surjective if and only if all of its partial maps are maximally surjective.

For example, the meet operation $\wedge$ on a bounded lattice is maximally surjective.

Proposition 11. [4] Let T be a t-norm on a complete lattice $(L,\le)$ and $(\alpha,\beta)\in L^2$. If the partial map $T(\alpha,\cdot)$ is a maximally surjective homomorphism, then the solution set of the equation $T(\alpha,x)=\beta$ is given by

$$\begin{cases} [\mathcal{L}_T(\alpha,\beta),\, \mathcal{I}_T(\alpha,\beta)] & \text{if } \beta\le\alpha\\ \emptyset & \text{elsewhere}\end{cases}$$

5. Sup-T Equations

We consider a t-norm T on a complete lattice $(L,\le)$. Let $(\alpha_i \mid i\in I)$ be an arbitrary family in L and $\beta\in L$; then we want to know the solution set of the equation

$$\sup_{i\in I} T(\alpha_i, x_i) = \beta$$

in the family of unknowns $(x_i \mid i\in I)$ in L. The families $(\alpha_i \mid i\in I)$ and $(x_i \mid i\in I)$ can be seen as $(L,\le)$-fuzzy sets in the index set I. The problem can therefore be reformulated as follows: given $A\in\mathcal{F}_{(L,\le)}(I)$ (with $\mathcal{F}_{(L,\le)}(I)$ the class of $(L,\le)$-fuzzy sets in I [11]) and $\beta\in L$, determine the solution set of the equation

$$\sup_{i\in I} T(A(i), X(i)) = \beta$$

in the unknown $(L,\le)$-fuzzy set X in I. Recall that $(\mathcal{F}_{(L,\le)}(I),\subseteq)$ is a complete lattice, with order relation $\subseteq$ (inclusion) defined by

$$A\subseteq B \;\Leftrightarrow\; (\forall i\in I)(A(i)\le B(i)),$$

and as infimum and supremum the intersection and union defined by

$$(A\cap B)(i) = A(i)\wedge B(i),\qquad (A\cup B)(i) = A(i)\vee B(i)$$

The solution set of the above equation then is a subset of the complete lattice $(\mathcal{F}_{(L,\le)}(I),\subseteq)$. Notice that a crisp subset A of I will be identified with its characteristic map $\chi_A$, defined by

$$\chi_A(i) = \begin{cases} 1_L & \text{if } i\in A\\ 0_L & \text{elsewhere}\end{cases}$$


In order to be able to describe the solution set of a sup-T equation, a few restrictions should be imposed. Firstly, the index set I should be finite. Secondly, the complete lattice $(L,\le)$ should be distributive, and its elements should either be join-irreducible or join-decomposable. Thirdly, the partial maps of the t-norm T should be homomorphisms.

The following theorems state that, under these rather general restrictions, the solution set of a sup-T equation, if not empty, always is a root system. For a join-irreducible right-hand side, the offshoots of this root system can be written down immediately. For a right-hand side with a finite join-decomposition, the offshoots of this root system can be effectively computed.

Theorem 12. [4] Let T be a t-norm on a distributive, complete lattice $(L,\le)$, $A\in\mathcal{F}_{(L,\le)}(I_n)$ and $\beta$ be a join-irreducible element of L. If the partial maps of T are homomorphisms and the solution set of the equation

$$\sup_{i\in I_n} T(A(i), X(i)) = \beta$$

is not empty, then it is a finitely generated root system of $(\mathcal{F}_{(L,\le)}(I_n),\subseteq)$ with stem G defined by $G(i) = \mathcal{I}_T(A(i),\beta)$ and as offshoots the elements of the set

$$\{M_k \mid M_k\in\mathcal{F}_{(L,\le)}(I_n) \wedge \beta\le A(k) \wedge M_k\subseteq G\}$$

with $M_k$ defined by $M_k(i) = \begin{cases}\mathcal{L}_T(A(k),\beta) & \text{if } i = k\\ 0_L & \text{elsewhere}\end{cases}$

If T is also maximally surjective, then the set of offshoots can be written as

$$\{M_k \mid M_k\in\mathcal{F}_{(L,\le)}(I_n) \wedge \beta\le A(k)\}$$

A necessary and sufficient solvability condition therefore is that G is a solution.

Theorem 13. [4] Let T be a t-norm on a distributive, complete lattice $(L,\le)$, $A\in\mathcal{F}_{(L,\le)}(I_n)$ and $\beta$ be a join-decomposable element of L with join-decomposition $\beta = \sup_{j\in J}\beta_j$. If the partial maps of T are homomorphisms and the solution set of the equation

$$\sup_{i\in I_n} T(A(i), X(i)) = \beta$$

is not empty, then it is a root system of $(\mathcal{F}_{(L,\le)}(I_n),\subseteq)$ with stem G defined by $G(i) = \mathcal{I}_T(A(i),\beta)$ and as offshoots the minimal elements of the set

$$\Big\{\bigcup_{j\in J} M_k^j \;\Big|\; M_k^j\in\mathcal{F}_{(L,\le)}(I_n) \wedge \beta_j\le A(k) \wedge M_k^j\subseteq G\Big\}$$

with $M_k^j$ defined by $M_k^j(i) = \begin{cases}\mathcal{L}_T(A(k),\beta_j) & \text{if } i = k\\ 0_L & \text{elsewhere}\end{cases}$


If the join-decomposition is finite, then this root system is also finitely generated, and if $\mathcal{L}_T\le\mathcal{I}_T$, then the offshoots are the minimal elements of the set

$$\Big\{\bigcup_{j\in J} M_k^j \;\Big|\; M_k^j\in\mathcal{F}_{(L,\le)}(I_n) \wedge \beta_j\le A(k)\Big\}$$

A necessary and sufficient solvability condition therefore is that G is a solution.

Note that when $(L,\le)$ is a complete chain and T is also maximally surjective, then $\mathcal{L}_T\le\mathcal{I}_T$ always holds.

Theorems 12 and 13 have been obtained by subsequently studying the following inequalities and equation:

(i) $T(\alpha,x)\le\beta$, $T(\alpha,x)\ge\beta$ and $T(\alpha,x)=\beta$,
(ii) $\sup_{i\in I} T(A(i),X(i))\le\beta$,
(iii) $\sup_{i\in I_n} T(A(i),X(i))\ge\beta$,

and appropriately combining the results obtained. Notice that the equation

$$\sup_{i\in I} T(A(i),X(i)) = 0_L$$

is in fact equivalent to the inequality $\sup_{i\in I} T(A(i),X(i))\le 0_L$, and can therefore be solved under less stringent conditions.

Proposition 14. [4] Let T be a t-norm on a complete lattice $(L,\le)$ and $A\in\mathcal{F}_{(L,\le)}(I)$. If the partial maps of T are supremum-morphisms, then the solution set of the equation

$$\sup_{i\in I} T(A(i),X(i)) = 0_L$$

is the root system $[\chi_\emptyset, G]$ of $(\mathcal{F}_{(L,\le)}(I),\subseteq)$ with stem G defined by $G(i) = \mathcal{I}_T(A(i),0_L)$.

The foregoing discussion was concerned with the determination of the complete solution set of a sup-T equation. In some cases, one may be interested in the greatest solution only. Obviously, the imposed conditions will be less restrictive then. A related, more important question is whether there exist simpler necessary and sufficient solvability conditions, other than constructing the potential greatest solution and verifying whether it is indeed a solution. Some important results in that direction are listed next. Proposition 15 shows that the necessary and sufficient solvability condition of Theorems 12 and 13 holds in a more general setting. Propositions 16 and 17 present, for a complete chain or complete Brouwerian lattice, a readily verified solvability condition.


Proposition 15. [6] Let T be a t-norm on a complete lattice $(L,\le)$, $A\in\mathcal{F}_{(L,\le)}(I)$ and $\beta\in L$. If the partial maps of T are supremum-morphisms, then the solution set of the equation

$$\sup_{i\in I} T(A(i),X(i)) = \beta$$

is not empty if and only if the $(L,\le)$-fuzzy set G in I defined by $G(i) = \mathcal{I}_T(A(i),\beta)$ is a solution. If the solution set is not empty, then G is the greatest solution.

Proposition 16. [6] Let T be a t-norm on a complete chain $(L,\le)$, $A\in\mathcal{F}_{(L,\le)}(I)$ and $\beta\in L$. If the partial maps of T are homomorphisms and T is maximally surjective, then the solution set of the equation

$$\sup_{i\in I} T(A(i),X(i)) = \beta$$

is not empty if and only if $\beta\le\sup_{i\in I} A(i)$.

Proposition 17. [6] Let $(L,\le)$ be a complete Brouwerian lattice, $A\in\mathcal{F}_{(L,\le)}(I)$ and $\beta\in L$. The solution set of the equation

$$\sup_{i\in I} A(i)\wedge X(i) = \beta$$

is not empty if and only if $\beta\le\sup_{i\in I} A(i)$.

6. Systems of sup-T Equations

We consider a t-norm T on a complete lattice $(L,\le)$. Let $(A_j \mid j\in J)$ be an arbitrary family in $\mathcal{F}_{(L,\le)}(I)$ and $(\beta_j \mid j\in J)$ be an arbitrary family in L; then we want to know the solution set of the system $(E_j \mid j\in J)$ of equations

$$E_j:\ \sup_{i\in I} T(A_j(i),X(i)) = \beta_j$$

in the unknown $(L,\le)$-fuzzy set X in I.

In order to be able to describe the solution set of a system of sup-T equations, again a few restrictions should be imposed. Firstly, the index set I should be finite. Secondly, the complete lattice $(L,\le)$ should be distributive, and its elements should either be join-irreducible or have a finite join-decomposition. Thirdly, the partial maps of the t-norm T should be homomorphisms.


Theorem 18. [6] Let T be a t-norm on a distributive, complete lattice $(L,\le)$, $(A_j \mid j\in J)$ be a family in $\mathcal{F}_{(L,\le)}(I_n)$ and $(\beta_j \mid j\in J)$ be a family in L such that all $\beta_j$ are either join-irreducible or have a finite join-decomposition. If the partial maps of T are homomorphisms and the solution set of the system $(E_j \mid j\in J)$ of equations

$$E_j:\ \sup_{i\in I_n} T(A_j(i),X(i)) = \beta_j$$

is not empty, then it is a root system of $(\mathcal{F}_{(L,\le)}(I_n),\subseteq)$. Moreover, if the index set J is finite, then this root system is finitely generated. Suppose that the solution set is not empty, and that the solution sets of the equations $E_j$ are the root systems with stem $G_j$ and set of offshoots $O_j$; then the solution set of the system $(E_j \mid j\in J)$ is the root system with stem $G = \bigcap_{j\in J} G_j$ and as offshoots the minimal elements of the set

$$\Big\{\bigcup_{j\in J} N_j \;\Big|\; N_j\in O_j \wedge N_j\subseteq G\Big\}$$

A necessary and sufficient solvability condition therefore is that G is a solution.

7. The Real Unit Interval

It is well known that the real unit interval is a complete chain. Hence, it is also distributive and all elements of it are join-irreducible. Moreover, a t-norm on $([0,1],\le)$ is continuous if and only if all of its partial maps are homomorphisms. Also, any continuous t-norm T on $([0,1],\le)$ is maximally surjective and it holds that $\mathcal{L}_T\le\mathcal{I}_T$. Theorem 12 can then be restated as follows. The class $\mathcal{F}_{([0,1],\le)}(I_n)$ is denoted as $\mathcal{F}(I_n)$.

Theorem 19. [7] Let T be a t-norm on $([0,1],\le)$, $A\in\mathcal{F}(I_n)$ and $\beta\in[0,1]$. If T is continuous and the solution set of the equation

$$\sup_{i\in I_n} T(A(i),X(i)) = \beta$$

is not empty, then it is a finitely generated root system of $(\mathcal{F}(I_n),\subseteq)$ with stem G defined by $G(i) = \mathcal{I}_T(A(i),\beta)$ and as offshoots the elements of the set

$$\{M_k \mid M_k\in\mathcal{F}(I_n) \wedge \beta\le A(k)\}$$

with $M_k$ defined by $M_k(i) = \begin{cases}\mathcal{L}_T(A(k),\beta) & \text{if } i = k\\ 0 & \text{elsewhere}\end{cases}$

On $([0,1],\le)$ the following are the most important continuous t-norms:

(i) the minimum operator (meet) M: $M(x,y) = \min(x,y)$
(ii) the product P: $P(x,y) = xy$
(iii) the Lukasiewicz t-norm W: $W(x,y) = \max(x+y-1,\, 0)$


The corresponding residual implicators $\mathcal{I}_T$ are given by:

(i) the Gödel-Brouwer implicator: $\mathcal{I}_M(x,y) = \begin{cases}1 & \text{if } x\le y\\ y & \text{elsewhere}\end{cases}$

(ii) the Goguen implicator: $\mathcal{I}_P(x,y) = \begin{cases}1 & \text{if } x\le y\\ y/x & \text{elsewhere}\end{cases}$

(iii) the Lukasiewicz implicator: $\mathcal{I}_W(x,y) = \min(1-x+y,\, 1)$

The corresponding operators $\mathcal{L}_T$ are given by:

(i) the operator $\mathcal{L}_M$: $\mathcal{L}_M(x,y) = \begin{cases}1 & \text{if } x<y\\ y & \text{elsewhere}\end{cases}$

(ii) the operator $\mathcal{L}_P$: $\mathcal{L}_P(x,y) = \begin{cases}1 & \text{if } x<y\\ y/x & \text{if } 0<y\le x\\ 0 & \text{elsewhere}\end{cases}$

(iii) the operator $\mathcal{L}_W$: $\mathcal{L}_W(x,y) = \begin{cases}1 & \text{if } x<y\\ 1-x+y & \text{if } 0<y\le x\\ 0 & \text{elsewhere}\end{cases}$
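For reference, the three t-norms and their operators $\mathcal{I}_T$ and $\mathcal{L}_T$ transcribe directly into Python (a plain transcription of the formulas above; only the function names are ours):

```python
# The three continuous t-norms on [0,1] and their residual operators,
# transcribed from the closed forms listed in this section.

def t_M(x, y): return min(x, y)               # minimum (meet)
def t_P(x, y): return x * y                   # product
def t_W(x, y): return max(x + y - 1.0, 0.0)   # Lukasiewicz

def I_M(x, y): return 1.0 if x <= y else y            # Goedel-Brouwer
def I_P(x, y): return 1.0 if x <= y else y / x        # Goguen
def I_W(x, y): return min(1.0 - x + y, 1.0)           # Lukasiewicz

def L_M(x, y): return 1.0 if x < y else y
def L_P(x, y): return 1.0 if x < y else (y / x if y > 0 else 0.0)
def L_W(x, y): return 1.0 if x < y else (1.0 - x + y if y > 0 else 0.0)
```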

Example 20. Consider the t-norm P and the sup-P equation

$$\sup_{i\in I_5} P(A(i),X(i)) = \beta$$

with $A = (0.8\ \ 0.4\ \ 0.1\ \ 1\ \ 0)$ and $\beta = 0.4$. According to Proposition 16, the solution set is not empty. The greatest solution G is given by

$$G = (0.5\ \ 1\ \ 1\ \ 0.4\ \ 1)$$

According to Theorem 19, there are three minimal solutions $M_1$, $M_2$ and $M_4$, given by:

$$M_1 = (0.5\ \ 0\ \ 0\ \ 0\ \ 0)$$
$$M_2 = (0\ \ 1\ \ 0\ \ 0\ \ 0)$$
$$M_4 = (0\ \ 0\ \ 0\ \ 0.4\ \ 0)$$
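Example 20 can be checked mechanically with these operators; the following sketch (helper names are ours) constructs the stem G and the minimal solutions of Theorem 19:

```python
# Reproducing Example 20 numerically. G(i) = I_P(A(i), beta) is the
# potential greatest solution; M_k exists for each k with beta <= A(k).

def t_P(x, y): return x * y
def I_P(x, y): return 1.0 if x <= y else y / x
def L_P(x, y): return 1.0 if x < y else (y / x if y > 0 else 0.0)

A, beta = [0.8, 0.4, 0.1, 1.0, 0.0], 0.4

G = [I_P(a, beta) for a in A]                    # [0.5, 1, 1, 0.4, 1]
assert abs(max(t_P(a, g) for a, g in zip(A, G)) - beta) < 1e-12  # G solves it

# k runs over 0-based indices 0, 1, 3, matching M1, M2, M4 in the text
minimal = {k + 1: [L_P(A[k], beta) if i == k else 0.0 for i in range(len(A))]
           for k in range(len(A)) if beta <= A[k]}
print(G)
print(minimal)   # {1: [0.5,0,0,0,0], 2: [0,1,0,0,0], 4: [0,0,0,0.4,0]}
```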

8. The Real Unit Hypercube

In this final section, we consider the product lattice $([0,1]^m,\le)$, $m\in\mathbb{N}_0$, called the real unit hypercube of dimension m. As a Cartesian product of chains, the lattice $([0,1]^m,\le)$ is distributive and complete. All elements of the real unit hypercube are either join-irreducible or have an irredundant finite join-decomposition. Consider $\alpha\in[0,1]^m$, $\alpha\neq(0,\dots,0) = 0_m$; then $\alpha$ can be decomposed as

$$\alpha = \sup_{\alpha(i)\neq 0} \underline{\alpha}_i$$

with $\underline{\alpha}_i = (0,\dots,\alpha(i),\dots,0)$.

Theorems 12 and 13 can then be written in combined form as follows. For the case $\beta = 0_m$, we refer to Proposition 14.


Theorem 21. Let T be a t-norm on $([0,1]^m,\le)$, $A\in\mathcal{F}_{([0,1]^m,\le)}(I_n)$ and $\beta\in[0,1]^m$, $\beta\neq 0_m$. If the partial maps of T are homomorphisms and the solution set of the equation

$$\sup_{i\in I_n} T(A(i),X(i)) = \beta$$

is not empty, then it is a finitely generated root system of $(\mathcal{F}_{([0,1]^m,\le)}(I_n),\subseteq)$ with stem G defined by $G(i) = \mathcal{I}_T(A(i),\beta)$ and as offshoots the minimal elements of the set

$$\Big\{\bigcup_{\substack{j\in J\\ \beta(j)\neq 0}} M_k^j \;\Big|\; M_k^j\in\mathcal{F}_{([0,1]^m,\le)}(I_n) \wedge \beta(j)\le A(k)(j) \wedge M_k^j\subseteq G\Big\}$$

with $M_k^j$ defined by $M_k^j(i) = \begin{cases}\mathcal{L}_T(A(k),\underline{\beta}_j) & \text{if } i = k\\ 0_m & \text{elsewhere}\end{cases}$

If $\mathcal{L}_T\le\mathcal{I}_T$, then the offshoots are the minimal elements of the set

$$\Big\{\bigcup_{\substack{j\in J\\ \beta(j)\neq 0}} M_k^j \;\Big|\; M_k^j\in\mathcal{F}_{([0,1]^m,\le)}(I_n) \wedge \beta(j)\le A(k)(j)\Big\}$$

From an implementational point of view, the real unit hypercube is the ultimate ordinal and numerical working environment; it allows for incomparability and offers at the same time the possibility to fall back on the underlying real unit interval.

Theorem 21 can be simplified considerably when working with a particular class of t-norms, namely the pointwise extensions of t-norms on the real unit interval.

Proposition 22. [4] Let T be a t-norm on a bounded ordered set $(P,\le)$ and $m\in\mathbb{N}_0$; then the $P^m\times P^m\to P^m$ map $T_m$ defined by

$$T_m((\alpha_1,\dots,\alpha_m),(\beta_1,\dots,\beta_m)) = (T(\alpha_1,\beta_1),\dots,T(\alpha_m,\beta_m))$$

is a t-norm on $(P^m,\le)$.

Proposition 23. [4] Let T be a t-norm on a complete lattice $(L,\le)$ and $m\in\mathbb{N}_0$; then the operators $\mathcal{I}_{T_m}$ and $\mathcal{L}_{T_m}$ corresponding to the t-norm $T_m$ on $(L^m,\le)$ are given by

$$\mathcal{I}_{T_m}((\alpha_1,\dots,\alpha_m),(\beta_1,\dots,\beta_m)) = (\mathcal{I}_T(\alpha_1,\beta_1),\dots,\mathcal{I}_T(\alpha_m,\beta_m))$$

$$\mathcal{L}_{T_m}((\alpha_1,\dots,\alpha_m),(\beta_1,\dots,\beta_m)) = \begin{cases}(\mathcal{L}_T(\alpha_1,\beta_1),\dots,\mathcal{L}_T(\alpha_m,\beta_m)) & \text{if } (\forall i\in I_m)(\beta_i\le\alpha_i)\\ (1_L,\dots,1_L) & \text{elsewhere}\end{cases}$$


Notice that for a t-norm T on $([0,1],\le)$ the property $\mathcal{L}_T\le\mathcal{I}_T$ does not imply that also $\mathcal{L}_{T_m}\le\mathcal{I}_{T_m}$.

From the foregoing sections it should be clear that the morphism behaviour of the partial maps of the t-norm T plays an important role.

Proposition 24. [4] Let T be a t-norm on a complete lattice $(L,\le)$ and $m\in\mathbb{N}_0$; then the following properties hold:

(i) if the partial maps of T are infimum-morphisms, then also the partial maps of $T_m$ are infimum-morphisms;

(ii) if the partial maps of T are supremum-morphisms, then also the partial maps of $T_m$ are supremum-morphisms.

Proposition 25. [4] Let T be a t-norm on a bounded ordered set $(P,\le)$ and $m\in\mathbb{N}_0$. If T is maximally surjective, then also $T_m$ is maximally surjective.

The foregoing propositions imply that for a continuous t-norm T on $([0,1],\le)$, the t-norm $T_m$ on $([0,1]^m,\le)$ is maximally surjective and its partial maps are homomorphisms. Theorem 21 can then be stated in the following simplified manner. The most important difference is that in Theorem 26 the offshoots can be written down immediately, and do not have to be determined as minimal elements of an auxiliary set.

Theorem 26. [4] Let T be a t-norm on $([0,1],\le)$, $A\in\mathcal{F}_{([0,1]^m,\le)}(I_n)$ and $\beta\in[0,1]^m$, $\beta\neq 0_m$. If T is continuous and the solution set of the equation

$$\sup_{i\in I_n} T_m(A(i),X(i)) = \beta$$

is not empty, then it is a finitely generated root system of $(\mathcal{F}_{([0,1]^m,\le)}(I_n),\subseteq)$ with stem G defined by $G(i) = \mathcal{I}_{T_m}(A(i),\beta)$ and as offshoots the elements of the set

$$\Big\{\bigcup_{\substack{j\in J\\ \beta(j)\neq 0}} M_k^j \;\Big|\; M_k^j\in\mathcal{F}_{([0,1]^m,\le)}(I_n) \wedge \beta(j)\le A(k)(j)\Big\}$$

with $M_k^j$ defined by $M_k^j(i) = \begin{cases}\mathcal{L}_{T_m}(A(k),\underline{\beta}_j) & \text{if } i = k\\ 0_m & \text{elsewhere}\end{cases}$

Notice that $\mathcal{L}_{T_m}(A(k),\underline{\beta}_j) = (0,\dots,\mathcal{L}_T(A(k)(j),\beta(j)),\dots,0)$.

Example 27. Consider the t-norm $W_2$ on $([0,1]^2,\le)$ and the sup-$W_2$ equation

$$\sup_{i\in I_3} W_2(A(i),X(i)) = \beta$$

with $A = ((0.4,0.6)\ \ (0.7,0.4)\ \ (0.6,0.5))$ and $\beta = (0.3,0)$.


We construct the potential greatest solution G:

$G(1) = \mathcal{I}_{W_2}((0.4,0.6),(0.3,0)) = (\mathcal{I}_W(0.4,0.3),\,\mathcal{I}_W(0.6,0)) = (0.9,0.4)$
$G(2) = \mathcal{I}_{W_2}((0.7,0.4),(0.3,0)) = (\mathcal{I}_W(0.7,0.3),\,\mathcal{I}_W(0.4,0)) = (0.6,0.6)$
$G(3) = \mathcal{I}_{W_2}((0.6,0.5),(0.3,0)) = (\mathcal{I}_W(0.6,0.3),\,\mathcal{I}_W(0.5,0)) = (0.7,0.5)$

and hence $G = ((0.9,0.4)\ \ (0.6,0.6)\ \ (0.7,0.5))$. One easily verifies that G is a solution. Since $(\forall k\in I_3)(\beta(1)\le A(k)(1))$, it follows from Theorem 26 that there are three minimal solutions $M_1^1$, $M_2^1$ and $M_3^1$, given by:

$$M_1^1 = ((0.9,0)\ \ (0,0)\ \ (0,0))$$
$$M_2^1 = ((0,0)\ \ (0.6,0)\ \ (0,0))$$
$$M_3^1 = ((0,0)\ \ (0,0)\ \ (0.7,0))$$

For instance, we have that

$$M_1^1(1) = \mathcal{L}_{W_2}((0.4,0.6),(0.3,0)) = (\mathcal{L}_W(0.4,0.3),\,\mathcal{L}_W(0.6,0)) = (0.9,0)$$
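The computation of G and of the offshoot components can again be checked mechanically; the sketch below (our own helper names) uses the componentwise form of $\mathcal{I}_{W_2}$ and $\mathcal{L}_{W_2}$, which is valid here since the evaluated arguments satisfy the componentwise condition of Proposition 23:

```python
# Verifying part of Example 27 with the pointwise extension W_2 of the
# Lukasiewicz t-norm. The numbers are those of the example.

def I_W(x, y): return min(1.0 - x + y, 1.0)
def L_W(x, y): return 1.0 if x < y else (1.0 - x + y if y > 0 else 0.0)

# componentwise forms; L_W2 in this form assumes beta <= x componentwise
def I_W2(x, y): return tuple(I_W(a, b) for a, b in zip(x, y))
def L_W2(x, y): return tuple(L_W(a, b) for a, b in zip(x, y))

A = [(0.4, 0.6), (0.7, 0.4), (0.6, 0.5)]
beta = (0.3, 0.0)

print([I_W2(a, beta) for a in A])  # stem G ~ ((0.9,0.4),(0.6,0.6),(0.7,0.5))
print(L_W2(A[0], beta))            # ~ (0.9, 0.0), the nonzero part of M_1^1
```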

Example 28. Consider the t-norm $\wedge$ on $([0,1]^3,\le)$, i.e. $\wedge = M_3$, and the sup-$\wedge$ equation

$$\sup_{i\in I_3} A(i)\wedge X(i) = \beta$$

with $A = ((0.2,0.6,0.5)\ \ (0.7,0.5,0.9)\ \ (1,0.8,0.7))$ and $\beta = (0.7,0.5,0.7)$. The potential greatest solution G is given by:

$$G = ((1,0.5,1)\ \ (1,1,0.7)\ \ (0.7,0.5,1))$$

One easily verifies that G is a solution. According to Theorem 26, the minimal solutions are given by

$$M_2^1\cup M_1^2\cup M_2^3 = ((0,0.5,0)\ \ (0.7,0,0.7)\ \ (0,0,0))$$
$$M_2^1\cup M_1^2\cup M_3^3 = ((0,0.5,0)\ \ (0.7,0,0)\ \ (0,0,0.7))$$
$$M_2^1\cup M_2^2\cup M_2^3 = ((0,0,0)\ \ (0.7,0.5,0.7)\ \ (0,0,0))$$
$$M_2^1\cup M_2^2\cup M_3^3 = ((0,0,0)\ \ (0.7,0.5,0)\ \ (0,0,0.7))$$
$$M_2^1\cup M_3^2\cup M_2^3 = ((0,0,0)\ \ (0.7,0,0.7)\ \ (0,0.5,0))$$
$$M_2^1\cup M_3^2\cup M_3^3 = ((0,0,0)\ \ (0.7,0,0)\ \ (0,0.5,0.7))$$
$$M_3^1\cup M_1^2\cup M_2^3 = ((0,0.5,0)\ \ (0,0,0.7)\ \ (0.7,0,0))$$
$$M_3^1\cup M_1^2\cup M_3^3 = ((0,0.5,0)\ \ (0,0,0)\ \ (0.7,0,0.7))$$
$$M_3^1\cup M_2^2\cup M_2^3 = ((0,0,0)\ \ (0,0.5,0.7)\ \ (0.7,0,0))$$
$$M_3^1\cup M_2^2\cup M_3^3 = ((0,0,0)\ \ (0,0.5,0)\ \ (0.7,0,0.7))$$
$$M_3^1\cup M_3^2\cup M_2^3 = ((0,0,0)\ \ (0,0,0.7)\ \ (0.7,0.5,0))$$
$$M_3^1\cup M_3^2\cup M_3^3 = ((0,0,0)\ \ (0,0,0)\ \ (0.7,0.5,0.7))$$


References

1. C. Alsina, E. Trillas and L. Valverde, On non-distributive logical connectives for fuzzy sets theory, BUSEFAL 3 (1980), 18-29.
2. G. Birkhoff, Lattice Theory, AMS Colloquium Publications Volume XXV, American Mathematical Society, Providence, R.I., 1967.
3. B. Davey and H. Priestley, Introduction to Lattices and Order, Cambridge University Press, 1990.
4. B. De Baets, An order-theoretic approach to solving sup-T equations, Fuzzy Set Theory and Advanced Mathematical Applications (D. Ruan, ed.), Kluwer Academic Publishers, 1995, pp. 67-87.
5. B. De Baets, Crowns and root systems (submitted).
6. B. De Baets, Systems of sup-T equations (submitted).
7. B. De Baets and E. Kerre, A primer on solving fuzzy relational equations on the unit interval, Int. J. Uncertainty, Fuzziness and Knowledge-Based Systems 2 (1994), 205-225.
8. B. De Baets and E. Kerre, A representation of solution sets of fuzzy relational equations, Cybernetics and Systems '94 (R. Trappl, ed.), Proceedings of the Twelfth European Meeting on Cybernetics and Systems Research, World Scientific Publishing, Singapore, 1994, pp. 287-294.
9. G. De Cooman and E. Kerre, Order norms on bounded partially ordered sets, J. Fuzzy Mathematics 2 (1994), 281-310.
10. A. Di Nola, S. Sessa, W. Pedrycz and E. Sanchez, Fuzzy Relation Equations and their Applications to Knowledge Engineering, Theory and Decision Library, Series D: System Theory, Knowledge Engineering and Problem Solving, Kluwer Academic Publishers, 1989.
11. J. Goguen, L-fuzzy sets, J. Math. Anal. Appl. 18 (1967), 145-174.
12. H. Prade, Unions et intersections d'ensembles flous, BUSEFAL 3 (1980), 58-62.
13. E. Sanchez, Equations de Relations Floues, These Biologie Humaine, Faculte de Medecine de Marseille, 1974.
14. E. Sanchez, Resolution of composite fuzzy relation equations, Information and Control 30 (1976), 38-48.
15. B. Schweizer and A. Sklar, Associative functions and abstract semi-groups, Publ. Math. Debrecen 10 (1963), 69-84.
16. B. Schweizer and A. Sklar, Probabilistic Metric Spaces, Elsevier, New York, 1983.
17. E. Trillas and L. Valverde, On implication and indistinguishability in the setting of fuzzy logic, Management Decision Support Systems Using Fuzzy Sets and Possibility Theory (J. Kacprzyk and R. Yager, eds.), Verlag TÜV Rheinland, Köln, 1983, pp. 198-212.
18. C. Zhao, On matrix equations in a class of complete and completely distributive lattices, Fuzzy Sets and Systems 22 (1987), 303-320.


Measures of Specificity

Ronald R. Yager

Machine Intelligence Institute, Iona College, New Rochelle, NY 10801, USA

Abstract. Specificity is introduced and shown to be a measure of the information contained in a fuzzy subset. A characterization of this measure is provided as well as a number of manifestations of the measure.

Keywords. Uncertainty, measures, information, fuzzy sets.

1. Introduction

The concept of specificity [1-10] plays a fundamental role in information engineering [11] by providing a measure of the amount of information contained in a fuzzy subset or possibility distribution. Specificity plays a role in fuzzy set and possibility theory analogous to the role that entropy plays in probability theory. The specificity measure evaluates the degree to which a fuzzy subset points to one and only one element as its member. It is closely related to the inverse of the cardinality of a set. Klir [12-14] has discussed a related idea which he calls non-specificity. The concept of granularity introduced by Zadeh [15] is highly correlated with the concept of specificity.

We must emphasize the distinction between specificity and fuzziness. Fuzziness is generally related to the lack of clarity, relating to membership in some set, whereas specificity is related to the exact knowledge of some attribute. For example, knowing that the length of a river is between fifty and sixty miles is not fuzzy, for we know with clarity what are the possible values for the length of the river; however it is not specific, for we don't know the exact length of the river. In most cases these types of uncertainty appear together, as in the knowledge that the river is approximately 50 miles long.

Many applications involve the use of this measure. Kacprzyk [16] describes its use in a system for inductive learning. Specificity has important applications in decision making, where Yager [1] has shown its usefulness as a measure of the tranquility of making a decision. The more specific the set of choices, the easier,


the less anxiety provoking, the decision. Another important area of application of this concept is in the measurement of performance of fuzzy expert systems. In this environment the specificity concept plays a central role in the determination of the usefulness of the information provided by an expert system. In this regard we note that an increase in specificity of information provided generally tends to increase the usefulness of the information. Consider an expert system which is used to predict the weather. Assume that the system says that the temperature will be above zero degrees Fahrenheit. While this system will in most cases be correct, the information it provides will not be very useful if the output of the system is to be used to determine what kind of clothes we should wear. This example focuses on a very fundamental uncertainty principle in information theory which we call the specificity-correctness tradeoff. What this principle says is that in providing information we generally must make a tradeoff between being very specific and running the risk of being incorrect, or being unspecific and in turn assuring ourselves of being correct. In expert and other knowledge-based systems we desire both correctness and specificity, with its effect of providing more useful information. Thus the performance of a system should be judged by both these measures, specificity and correctness [3].

Another area where specificity is seen to play a fundamental role is in deductive reasoning systems. A central principle in reasoning is the entailment principle introduced by Zadeh [17, 18]. This principle allows us to infer, for example, from the knowledge that John is 23 years old the fact that John is over 20 years old. It is a manifestation of the principle that one can always infer less specific information from more specific information. We note that normally we can't reason the other way, going from less specific to more specific: we can't conclude from the fact that John is over 20 that he is 23 years old. The principle of minimal specificity introduced by Dubois and Prade [19, 20] is a manifestation of this concept. Dubois and Prade, using the concept of minimal specificity, show the central role of specificity in the theory of approximate reasoning [21]. Many operations used in approximate reasoning are described by indicating required properties allowing for many manifestations; for example, the intersection operation must be a t-norm. Since the principle of minimal specificity says to select the manifestation resulting in an output with the least specificity, we are essentially selecting the operation which introduces the least unjustified additional information. In [9] Yager has suggested the use of specificity in default reasoning. However, in default reasoning we go from a less specific piece of knowledge to a more specific one. Thus in default logic the reasoning mechanism is essentially a process contradicting this rule of minimal specificity, which is part of the reason for the great difficulty in dealing with these systems.

2. Characterization of Specificity

The concept of specificity was originally introduced by Yager [1-4] to measure the degree to which a fuzzy subset contains one and only one element. Higashi


and Klir [12] introduced a closely related idea called non-specificity. In [14] the authors provide a comprehensive discussion of this concept. In many applications of fuzzy subsets, especially those based on a possibilistic interpretation [22] of a fuzzy subset, specificity can be seen as measuring the amount of information contained in the fuzzy subset. Consider the following three fuzzy subsets expressing information relating to the age of a person:

A1: 35 years old

A2: about 35

A3: middle age

It should be clear that A1 provides more information than A2, which in turn provides more information than A3. The concept of specificity plays a role in possibility theory comparable to the concept of entropy in probability theory. Both of these measure the amount of information contained in the associated distribution by calculating the degree to which the distribution points to one and only one element as its manifestation.

In [10] we provided a characterization of the measure of specificity over a finite universe X. In the following we shall provide an alternative and more general characterization of the measure of specificity over a finite universe. In this definition we shall let A be a fuzzy subset over X and let $a_j$ be the jth largest membership grade in A.

Definition: A measure

$$Sp: I^X \to I \quad (I = [0,1])$$

is called a measure of specificity if it has the following properties:

1) Sp(A) = 1 if and only if A is a singleton set, A = {x};
2) Sp(∅) = 0;
3) i. $\dfrac{\partial Sp(A)}{\partial a_1} > 0$;
   ii. $\dfrac{\partial Sp(A)}{\partial a_j} \le 0$ for all $j \ge 2$.

We see that condition one requires that the specificity is maximal, equal to one, only for sets that are singletons. Condition two provides the second boundary condition: a measure of specificity assumes its minimal value for the null set. It should be noted that, unlike the first property, we have not required this to be the only case in which this happens. The third condition imposes the characteristic that specificity increases as the largest membership grade increases and decreases as any of the other membership grades increases.

From the above definition a number of basic properties about measures of specificity can be easily obtained.

Theorem: Assume A and B are two normal fuzzy subsets of X (normal fuzzy subsets have at least one element with membership grade one). Let $a_j$ and $b_j$ be the ordered membership grades in these sets. If $a_j \ge b_j$ for all j then $Sp(B) \ge Sp(A)$.

Proof: Since starting from B we can obtain A by increasing the j = 2 to n membership grades, and since a measure of specificity is non-increasing with respect to changes in these elements, we obtain the result.

From this theorem we get some other basic properties of specificity.

Corollary 1: If A and B are normal fuzzy subsets of X and $A \subset B$ then $Sp(B) \le Sp(A)$.

Proof: If $A \subset B$ then $a_j \le b_j$ for all $j \ge 2$ and the result follows from the above theorem.

Corollary 2: If A and B are two non-null crisp subsets of X where $\mathrm{card}(A) \ge \mathrm{card}(B)$ then $Sp(B) \ge Sp(A)$.

Proof: Assume $\mathrm{card}(A) = n_1$ and $\mathrm{card}(B) = n_2$, where $n_1 > n_2$. In this case

$a_j = 1$ for all $j \le n_1$, and $a_j = 0$ for all $j > n_1$,

and

$b_j = 1$ for all $j \le n_2$, and $b_j = 0$ for all $j > n_2$.

Since $n_1 > n_2$ we have $a_j \ge b_j$ for all j.

The definition of specificity allows many distinct manifestations of the measure of specificity. A particular manifestation may be useful for different applications. In discussing individual definitions of specificity we shall find the following definitions useful.

Definition: Assume Sp and $\hat{Sp}$ are two definitions for measures of specificity on the space X. We shall say that Sp is a stricter measure of specificity than $\hat{Sp}$, denoted $Sp \le \hat{Sp}$, if for all fuzzy subsets A of X, $Sp(A) \le \hat{Sp}(A)$.

Definition: A measure of specificity will be called regular if for all fuzzy subsets in which the membership grade is constant, A(x) = c for all x, we have Sp(A) = 0.

3. Measures of Specificity

As noted, the characterization of the measure of specificity given in the previous section allows many different manifestations. These measures play a particularly important role in the development of procedures and algorithms for manipulating and reducing uncertainty. With this measure we have a tool which can guide us in the correct direction by telling us when information content is increasing or decreasing. In most cases the actual value of the degree of uncertainty is not as important as the relative uncertainty. This situation gives us considerable freedom in selecting the actual form of the measure to be used. One important consideration in the selection of a measure of specificity for particular applications is the ease with which we can manipulate the specificity measure under the operations needed in that application. For example, in applications involving learning, we may desire a measure which is easily differentiable. Simple measures are always desirable. This situation supports efforts to find many different manifestations of this measure so that we can select the appropriate one for a given application.

In [1] Yager introduced a measure of specificity as

$$Sp(A) = \int_0^{a_{\max}} \frac{1}{\mathrm{card}(A_\alpha)}\, d\alpha$$

where $a_{\max}$ is the largest membership grade in A, $A_\alpha$ is the $\alpha$-level set of A, $A_\alpha = \{x \mid A(x) \ge \alpha\}$, and $\mathrm{card}(A_\alpha)$ is the number of elements in $A_\alpha$.
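For a finite universe the integrand $1/\mathrm{card}(A_\alpha)$ is piecewise constant between consecutive sorted membership grades, so the integral reduces to a finite sum. A small Python sketch of this computation (our own illustration):

```python
# Sketch of the integral specificity measure above for a finite universe.
# Between consecutive sorted grades the alpha-cut has constant cardinality.

def sp_integral(grades):
    a = sorted(grades, reverse=True) + [0.0]   # a[0] >= a[1] >= ... >= 0
    # for alpha in (a[j+1], a[j]] the alpha-cut contains exactly j+1 elements
    return sum((a[j] - a[j + 1]) / (j + 1) for j in range(len(a) - 1))

print(sp_integral([1.0, 0.0, 0.0]))   # singleton: specificity 1
print(sp_integral([1.0, 1.0, 1.0]))   # crisp 3-element set: 1/3
print(sp_integral([1.0, 0.5, 0.2]))   # 0.5 + 0.3/2 + 0.2/3 ~ 0.717
```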

In [9] Yager introduced a class of specificity measures which he called linear specificity measures, which we define below.

Definition: Assume X is a finite set of cardinality n and let A be a fuzzy subset of X. A linear specificity measure is defined as

$$Sp(A) = a_1 - \sum_{j=2}^{n} w_j\, a_j$$

where $a_j$ is the jth largest membership grade in A and the $w_j$'s are a set of weights satisfying:

1) $w_j \in [0,1]$;
2) $\sum_{j=2}^{n} w_j = 1$;
3) $w_j \ge w_i$ for $j < i$.

First we show that this measure satisfies our three required conditions for a specificity measure. If A is a singleton set then $a_1 = 1$ and $a_j = 0$ for $j \ge 2$, and it follows that Sp(A) = 1. Assume Sp(A) = 1. Since $a_j \in [0,1]$, the only way for this to happen is if $a_1 = 1$ and $\sum_{j=2}^{n} w_j a_j = 0$, which requires that $a_j = 0$ for all $j \ge 2$. If A = ∅ then $a_j = 0$ for all $j \ge 1$ and hence Sp(A) = 0. Finally we see that

$$\frac{\partial Sp(A)}{\partial a_1} = 1 \quad\text{and, for } j > 1,\quad \frac{\partial Sp(A)}{\partial a_j} = -w_j \le 0.$$


It should be noted that this is a regular measure of specificity. Assume $a_j = \alpha$ for all j; then $Sp(A) = \alpha - \sum_{j=2}^{n} w_j\,\alpha = \alpha - \alpha\sum_{j=2}^{n} w_j = \alpha - \alpha = 0$.

The most strict and least strict members of this class can easily be identified. Consider the measure in which $w_2 = 1$ and $w_j = 0$ for all $j > 2$, giving us

$$Sp(A) = a_1 - a_2.$$

Consider any other measure of specificity with weights $\hat{w}_j$. In this case $\hat{Sp}(A) = a_1 - \sum_{j=2}^{n}\hat{w}_j\, a_j$. Since $a_r \le a_2$ for all $r \ge 3$, then

$$\hat{Sp}(A) \ge a_1 - \sum_{j=2}^{n}\hat{w}_j\, a_2 = a_1 - a_2\sum_{j=2}^{n}\hat{w}_j = a_1 - a_2.$$

Thus we see that $Sp(A) = a_1 - a_2$ is the most strict measure of specificity, and it occurs when $w_2 = 1$ and $w_j = 0$ for all $j > 2$.

Consider now the case when $w_j = \frac{1}{n-1}$ for all $j \ge 2$. In this case

$$Sp(A) = a_1 - \frac{1}{n-1}\sum_{j=2}^{n} a_j.$$

This can be seen simply as

Sp(A) = largest membership grade − average of the others.

It can be shown [9] that this is the least strict measure of specificity in this class.
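A short sketch of the linear specificity measure (our own illustration) with the two extreme weight choices just discussed:

```python
# Sketch of the linear specificity measure Sp(A) = a1 - sum_j w_j a_j,
# with the strictest and least strict weight choices from the text.

def sp_linear(grades, weights):
    a = sorted(grades, reverse=True)           # a[0] is the largest grade
    return a[0] - sum(w * x for w, x in zip(weights, a[1:]))

A = [0.2, 1.0, 0.6, 0.3]
n = len(A)
strict = [1.0] + [0.0] * (n - 2)               # w2 = 1: Sp = a1 - a2
least  = [1.0 / (n - 1)] * (n - 1)             # largest minus the average
print(sp_linear(A, strict), sp_linear(A, least))   # 0.4 and ~0.633
```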

It is interesting to consider the formulation of a measure of specificity from the perspective of a multi-criteria decision making problem. The requirement for a measure of specificity is that there exists one element with membership grade one and all others are zero. Consider the formulation

$$Sp(A) = a_1(1-a_2)(1-a_3)\cdots(1-a_n) = a_1\prod_{j=2}^{n}(1-a_j)$$

where $a_j$ is the jth largest membership grade in the fuzzy subset A. We easily see that this equals one iff $a_1 = 1$ and $a_j = 0$ for all j = 2 to n. If A is the null set then $a_1 = 0$ and hence Sp(A) = 0. It is easy to see that $\frac{\partial Sp(A)}{\partial a_1} > 0$ and $\frac{\partial Sp(A)}{\partial a_j} \le 0$ for all j = 2 to n.

This measure of specificity is not regular: it doesn't have zero specificity when all membership grades are equal. In addition we see that if the second highest membership grade is one then this measure has Sp(A) = 0.

A slightly more general form can be obtained if we use

$$Sp(A) = a_1\prod_{j=2}^{n}\big(k\,a_j + (1-a_j)\big)$$

where $k \in [0,1]$. It is easy to show that this satisfies the necessary conditions for a specificity measure. Let us look at the effect of k. When k = 0 we get the measure introduced above. As k increases, the specificity value increases.

4. Distance Related Measures of Specificity

In this section we shall provide another view of measures of specificity which makes use of the hypercube view of fuzzy subsets introduced by Kosko [23, 24]. Let X be a set of cardinality n. We can, as suggested by Kosko, represent a fuzzy subset A of X as a vector of dimension n in which the ith component of the vector is $A(x_i)$, the membership grade of $x_i$ in A. Thus any fuzzy subset can be represented as a point in the space $I^n$. We shall let $E_i$ indicate the singleton fuzzy subset where $A(x_i) = 1$ and all other membership grades are zero. The $E_i$ can be seen as a collection of basis vectors. In this framework we can conjecture the measure of specificity of a fuzzy subset A as being related to its distance from the closest one of the $E_i$'s.

We recall that a metric d on the space $I^n$ is a mapping [25]

$$d: I^n \times I^n \to \mathbb{R}^+ \ \text{(non-negative real numbers)}$$

such that

1. $d(A,B) = 0$ if and only if A = B;
2. $d(A,B) + d(B,E) \ge d(A,E)$.

It can easily be shown [25] that a metric is symmetric, $d(A,B) = d(B,A)$. We shall call a metric normalized if the range of the metric is the unit interval [0,1], that is $d(A,B)\in[0,1]$.

If A has n elements denoted $a_i$, $i = 1,\dots,n$, and B has n elements denoted $b_i$, the prototypical metric is

$$d_p(A,B) = \Big(\sum_{i=1}^{n}|a_i - b_i|^p\Big)^{1/p}.$$

It should be noted that this is not a normalized metric. Let $d_p$ be a metric of the preceding class. Consider the transformation

$$F(d_p(A,B)) = d_p(A,B) \quad\text{if } d_p(A,B) \le 1$$
$$F(d_p(A,B)) = 1 \quad\text{if } d_p(A,B) > 1$$

It can be shown that $F(d_p)$ is also a metric, as follows. $F(d_p(A,B)) = 0$ iff $d_p(A,B) = 0$, and since $d_p(A,B) = 0$ iff A = B, condition one is satisfied. Next we must show

$$F(d_p(A,B)) + F(d_p(B,E)) \ge F(d_p(A,E))$$

Two cases must be considered.

1. At least one of $d_p(A,B)$ or $d_p(B,E)$ is greater than one. In this case, assuming it is $d_p(A,B)$, we get

$$F(d_p(A,B)) + F(d_p(B,E)) = 1 + F(d_p(B,E)) \ge 1 \ge F(d_p(A,E))$$

2. Both $d_p(A,B)$ and $d_p(B,E)$ are less than or equal to one.


In this case $F(d_p(A,B)) + F(d_p(B,E)) = d_p(A,B) + d_p(B,E) \ge d_p(A,E) \ge F(d_p(A,E))$.

Thus we see that $F(d_p)$ is a normalized metric.

In the following we shall let $m_p$ be the metric obtained from $F(d_p)$. Consider now the measure of specificity defined as

$$Sp(A) = 1 - \min_j\big(m_p(A, E_j)\big).$$

Thus we are measuring the specificity as the complement of the distance from the closest basis vector to A. We now show that this is a measure satisfying our three basic conditions. First let A be the fuzzy subset with membership grade one for the kth element and zero for all others. In this case $m_p(A, E_k) = 0$, so $\min_j(m_p(A,E_j)) = 0$ and hence Sp(A) = 1. Assume $A \neq E_j$ for all j; then $m_p(A,E_j) \neq 0$ for all j, so $\min_j(m_p(A,E_j)) > 0$ and Sp(A) < 1. Consider now A = ∅. For any p, $d_p(\emptyset, E_i) = 1$ and therefore $\min_j(m_p(A,E_j)) = 1$ and hence Sp(∅) = 0.

Before proving the validity of the last necessary condition for specificity we provide a useful lemma.

Lemma: Let A be any fuzzy subset of X whose largest membership grade occurs at the kth component; then

$$\min_j\big(d_p(A,E_j)\big) = d_p(A,E_k)$$

Proof: Without loss of generality we shall assume k = 1. Consider

$$\big(d_p(A,E_1)\big)^p = (1-a_1)^p + \sum_{j=2}^{n} a_j^p$$

$$\big(d_p(A,E_i)\big)^p = (1-a_i)^p + \sum_{j\neq i} a_j^p$$

$$\big(d_p(A,E_i)\big)^p - \big(d_p(A,E_1)\big)^p = (1-a_i)^p - (1-a_1)^p + a_1^p - a_i^p$$

Since $a_1 \ge a_i$, then $(1-a_i)^p - (1-a_1)^p \ge 0$ and $a_1^p - a_i^p \ge 0$, and therefore $d_p(A,E_i) \ge d_p(A,E_1)$. From this it follows that $\min_j(m_p(A,E_j)) = m_p(A,E_k)$, where $a_k$ is the largest membership grade in A. Thus

$$Sp(A) = 1 - m_p(A,E_k)$$

when the largest membership grade in A occurs at the kth component. Without loss of generality assume $a_1$ is the largest membership grade in A. In this case

$$Sp(A) = 1 - \Big((1-a_1)^p + \sum_{j=2}^{n} a_j^p\Big)^{1/p}$$

Here we see that if $a_1$ increases then $(1-a_1)$ decreases and Sp(A) increases, while if $a_j$, for $j \neq 1$, increases then Sp(A) decreases.

Thus we see that our proposed measure is indeed a measure of specificity. In particular we have shown that if A is a fuzzy subset with largest membership grade $a_k$ then


Sp(A) = 1 − m_p(A, E_k) is a measure of specificity.
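To make the construction concrete, the following minimal sketch (our own Python/NumPy rendering; the name sp_metric is not from the text) computes Sp(A) for a finite space:

```python
import numpy as np

def sp_metric(a, p=2):
    """Sp(A) = 1 - min_j m_p(A, E_j), where m_p is the Minkowski metric
    d_p clipped to the unit interval.  `a` is the membership vector of A."""
    a = np.asarray(a, dtype=float)
    basis = np.eye(len(a))                 # the basis vectors E_1, ..., E_n
    d = np.sum(np.abs(basis - a) ** p, axis=1) ** (1.0 / p)
    m = np.minimum(d, 1.0)                 # the normalized metric F(d_p)
    return 1.0 - m.min()

print(sp_metric([1.0, 0.0, 0.0]))  # 1.0: a singleton is maximally specific
print(sp_metric([0.0, 0.0, 0.0]))  # 0.0: the empty set
```

By the lemma, the minimum is always attained at the basis vector of the component carrying the largest membership grade.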

5. Specificity of Probability-Qualified Statements

Assume V is a variable taking its value in the set X. In the preceding we have considered measures of specificity associated with statements of the form V is A, where A is a fuzzy subset of X. In the following we shall consider measuring the specificity of statements of the form

V is A is λ probable.

That is, we allow a probabilistic qualification to be associated with our knowledge.

In order to appropriately deal with this kind of knowledge we must introduce the Dempster-Shafer belief structure as a representative framework [26-29]. The Dempster-Shafer belief structure provides a very general structure which allows for the representation of different kinds of uncertainty. In particular, it allows for the representation of probabilistic as well as non-specific types of uncertainty.

A belief structure m defined over the space X has an associated collection of non-null crisp subsets of X, B_1, ..., B_n, called focal elements, and a set of non-negative weights m(B_j) such that Σ_{j=1}^{n} m(B_j) = 1. As suggested by Yen [30] and other researchers [2, 28, 31−35], one can extend this idea to fuzzy belief structures by allowing the B_j to be fuzzy subsets of X. In the following, unless otherwise indicated, we shall assume this more general setting.

One interpretation that can be associated with this belief structure is that of random sets [36]. In this interpretation a belief structure corresponds to a random experiment whose outcomes are subsets of X, fuzzy or otherwise, and m(B_i) is the probability that B_i is the outcome of the experiment. Here we note that the outcome of the experiment, rather than being a specific point from X, is a set; it is nonspecific.

Two measures can be associated with these belief structures. The first measure was called the measure of plausibility by Shafer [37]. Dempster [26], who takes a view more in the spirit of random sets, called this the upper probability. Let D be any subset of X; then

Pl(D) = Σ_{j=1}^{n} Poss[D|B_j] m(B_j)

where Poss[D|B_j] = Max_x[D(x) ∧ B_j(x)]. The second measure was called by Shafer the measure of belief and by Dempster the lower probability; it is defined as

Bel(D) = Σ_{j=1}^{n} Cert[D|B_j] m(B_j)

where Cert[D|B_j] = 1 − Poss[D̄|B_j] and D̄ is the complement of D. It can be shown that for any subset D

Bel(D) ≤ Prob(D) ≤ Pl(D);

here we clearly see the justification of the terminology of upper and lower probability used by Dempster.
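A minimal sketch of the two measures for a finite X, with fuzzy subsets represented as membership vectors (the function name pl_bel is our own):

```python
import numpy as np

def pl_bel(d, focals, weights):
    """Plausibility Pl(D) and belief Bel(D) of a fuzzy subset D under a
    fuzzy belief structure with focal elements B_j and weights m(B_j)."""
    d = np.asarray(d, dtype=float)
    pl = bel = 0.0
    for b, w in zip(focals, weights):
        b = np.asarray(b, dtype=float)
        pl += w * np.max(np.minimum(d, b))                 # Poss[D|B_j] m(B_j)
        bel += w * (1.0 - np.max(np.minimum(1.0 - d, b)))  # Cert[D|B_j] m(B_j)
    return float(pl), float(bel)

# Crisp illustration: X = {x1, x2, x3}, one focal element B = {x1, x2}.
print(pl_bel([1, 0, 0], [[1, 1, 0]], [1.0]))   # (1.0, 0.0)
```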

Viewing m(B_j) as the probability that B_j occurs, we see that the measure of plausibility is the expected possibility and the measure of belief is the expected certainty. Generalizing this idea, Yager in [2] suggested a measure of specificity associated with a belief structure m with focal elements B_1, ..., B_n. In particular,

Sp(m) = Σ_{j=1}^{n} Sp(B_j) m(B_j).

Thus the specificity of a belief structure is the expected specificity of the focal elements.

Consider now the piece of knowledge

V is A is λ probable

where A is a fuzzy subset of X and λ ∈ [0, 1]. We shall assume that A is normal and that there exists at least one element having zero membership grade. Consider now the belief structure m which has two focal elements B_1 and B_2 for which

m(B_1) = λ and m(B_2) = 1 − λ

and the focal elements are defined as follows:

B_1(x) = 1 for A(x) = 1
B_1(x) = 0 for A(x) ≠ 1
B_2(x) = 1 for A(x) = 0
B_2(x) = 0 for A(x) ≠ 0

First we note that in the case when A is a crisp set, B_1 = A and B_2 = Ā. Let us show that for this belief structure Prob(A) = λ. Consider

Pl(A) = Poss(A|B_1) λ + Poss(A|B_2)(1 − λ).

Here we note that

Poss(A|B_1) = Max_x[A(x) ∧ B_1(x)] = 1
Poss(A|B_2) = Max_x[A(x) ∧ B_2(x)] = 0

hence Pl(A) = λ. Consider now Bel(A) = Cert(A|B_1) λ + Cert(A|B_2)(1 − λ). Since

Cert(A|B_1) = 1 − Poss(Ā|B_1) = 1 − 0 = 1
Cert(A|B_2) = 1 − Poss(Ā|B_2) = 1 − 1 = 0

we get Bel(A) = λ. Since Bel(A) ≤ Prob(A) ≤ Pl(A), we get Prob(A) = λ.

Using our definition for specificity we get

Sp(m) = λ Sp(B_1) + (1 − λ) Sp(B_2).

Let us consider the situation when our measure of specificity is defined as

Sp(F) = max membership grade in F − average of the other membership grades in F.

We shall assume n is the cardinality of the underlying space. We shall let n_1 be the number of elements having membership grade one in A and n_2 the number of elements having membership grade zero in A. In this case

Sp(B_1) = 1 − (n_1 − 1)/(n − 1) = (n − 1 − n_1 + 1)/(n − 1) = (n − n_1)/(n − 1)

Sp(B_2) = 1 − (n_2 − 1)/(n − 1) = (n − n_2)/(n − 1)

Sp(m) = λ (n − n_1)/(n − 1) + (1 − λ)(n − n_2)/(n − 1)

Sp(m) = (n − n_2 − λ(n_1 − n_2))/(n − 1).

Here we see that if λ = 1 then Sp(m) = (n − n_1)/(n − 1), and if λ = 0 then Sp(m) = (n − n_2)/(n − 1).
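The closed form is easy to evaluate; a small sketch with exact rational arithmetic (the name sp_prob_qualified is ours):

```python
from fractions import Fraction

def sp_prob_qualified(lam, n, n1, n2):
    """Sp(m) = lam*(n - n1)/(n - 1) + (1 - lam)*(n - n2)/(n - 1) for the
    two-focal-element belief structure above; n = |X|, n1 = #{A(x) = 1},
    n2 = #{A(x) = 0}."""
    return lam * Fraction(n - n1, n - 1) + (1 - lam) * Fraction(n - n2, n - 1)

# The two extremes recover Sp(B1) and Sp(B2) (here n = 5, n1 = 1, n2 = 2):
print(sp_prob_qualified(Fraction(1), 5, 1, 2))   # 1
print(sp_prob_qualified(Fraction(0), 5, 1, 2))   # 3/4
```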

6. Specificity Under Similarity Relations

In some environments we may need a somewhat modified version of our measure of specificity. Consider the problem of deciding which jacket to wear. Assume we know the temperature will be above 80 degrees. While this is not very specific information, for the purposes at hand the information is specific enough, since all these temperatures lead to the same choice. In order to capture this idea we need a generalized measure of specificity based upon the similarity of the underlying elements. In order to construct this measure we need to discuss the concept of a similarity relation introduced by Zadeh [38].

Definition: A similarity relation S is a fuzzy relation on X × X which is
(1) reflexive: S(x, x) = 1 for all x ∈ X
(2) symmetric: S(x, y) = S(y, x)
(3) transitive: S(x, z) ≥ Max_y(S(x, y) ∧ S(y, z))

Assume S is a similarity relation on X × X. Let S_α, 0 ≤ α ≤ 1, be the α-level set of S; then each S_α is an equivalence relation on X. We shall let π_α denote the set of equivalence classes of S for a given level α. For any α_1 > α_2, π_{α_1} is a refinement of π_{α_2}; that is, if x and y are in the same equivalence class of π_{α_1}, then they are in the same equivalence class of π_α for all α < α_1. Furthermore it can be shown that x and y belong to the same equivalence class of π_α if S(x, y) ≥ α. Thus S(x, y) equals the largest value of α for which x and y

are in the same equivalence class.

An important idea introduced by Zadeh in [38] is that of similarity classes. Assume X is a set and S is a similarity relation on X × X. With each x ∈ X we can associate a similarity class denoted S[x]. This similarity class is a fuzzy subset of X characterized by the membership function S[x](y) = S(x, y). If S[x] is a similarity class, then the α-level set of S[x], S[x]_α, consists of the set of elements that are in the same equivalence class as x in the π_α partition.

Assume X is a set of elements, let S be a similarity relation on X × X, and let A be a fuzzy subset of X. We define Sp(A|S) as the specificity of A under the similarity relation S. Essentially Sp(A|S) measures the degree to which A contains one class of similar elements.


We first recall the definition for the specificity of A introduced in [1]:

Sp(A) = ∫_0^{α_max} (1/Card(A_α)) dα.

We also recall that for each α, S_α is an equivalence relation. Let π_α be the partition induced by S_α. Each π_α consists of a collection of objects, denoted π_α(j), each a subset of elements of X which are equivalent at that level. Thus π_α(j) ∩ π_α(i) = ∅ for i ≠ j and ∪_j π_α(j) = X.

We now define a new object denoted A_α/S. Each A_α/S consists of a subset of objects from the partition π_α. In particular, π_α(j) ∈ A_α/S if there exists an element x contained in both π_α(j) and A_α. Thus the membership grade of π_α(j) in A_α/S equals Max_x[A_α(x) ∧ π_α(j)(x)]. We note that each x is in only one π_α(j). We now can define Sp(A|S) as

Sp(A|S) = ∫_0^{α_max} (1/Card(A_α/S)) dα.

The following example illustrates the use of this definition.

Example: Assume X = {x1, x2, x3, x4, x5, x6} and S is a similarity relation on X × X where

S =
 1.0  0.2  1.0  0.6  0.2  0.6
 0.2  1.0  0.2  0.2  0.8  0.2
 1.0  0.2  1.0  0.6  0.2  0.6
 0.6  0.2  0.6  1.0  0.2  0.8
 0.2  0.8  0.2  0.2  1.0  0.2
 0.6  0.2  0.6  0.8  0.2  1.0

The equivalence classes for this relation at the various levels are the following:

L1 = {x1, x2, x3, x4, x5, x6}                                  0 < α ≤ 0.2
L2 = {x1, x3, x4, x6}, L3 = {x2, x5}                           0.2 < α ≤ 0.6
L4 = {x1, x3}, L5 = {x4, x6}, L6 = {x2, x5}                    0.6 < α ≤ 0.8
L7 = {x1, x3}, L8 = {x4}, L9 = {x6}, L10 = {x2}, L11 = {x5}    0.8 < α ≤ 1

Figure 1 shows these equivalence classes.

In addition, assume A is a fuzzy subset of X where

A = {.1/x1, .4/x2, .6/x3, .7/x4, 1/x5, 1/x6}.

The level sets of A are the following:

A_α = X                        0 ≤ α ≤ 0.1
A_α = {x2, x3, x4, x5, x6}     0.1 < α ≤ 0.4
A_α = {x3, x4, x5, x6}         0.4 < α ≤ 0.6
A_α = {x4, x5, x6}             0.6 < α ≤ 0.7
A_α = {x5, x6}                 0.7 < α ≤ 1


[Fig. 1. Equivalence classes for S: L7-L11 for 0.8 < α ≤ 1, L4-L6 for 0.6 < α ≤ 0.8, L2-L3 for 0.2 < α ≤ 0.6, and L1 for 0 < α ≤ 0.2.]

We now calculate A_α/S:

A_α/S = {L1}         0 ≤ α ≤ 0.1
A_α/S = {L1}         0.1 < α ≤ 0.2
A_α/S = {L2, L3}     0.2 < α ≤ 0.4
A_α/S = {L2, L3}     0.4 < α ≤ 0.6
A_α/S = {L5, L6}     0.6 < α ≤ 0.7
A_α/S = {L5, L6}     0.7 < α ≤ 0.8
A_α/S = {L9, L11}    0.8 < α ≤ 1

From this we see that

Card(A_α/S) = 1      0 < α ≤ 0.2
Card(A_α/S) = 2      0.2 < α ≤ 1

hence

Sp(A|S) = ∫_0^1 (1/Card(A_α/S)) dα = ∫_0^{.2} 1 dα + ∫_{.2}^{1} (1/2) dα

Sp(A|S) = (1)(.2) + (.5)(.8) = .2 + .4 = .6
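This calculation can be reproduced numerically by sweeping α and counting, at each level, the equivalence classes of S_α that A_α meets (a sketch; sp_under_similarity is our own name):

```python
import numpy as np

def sp_under_similarity(a, s, steps=1000):
    """Numerical Sp(A|S): integrate 1/Card(A_alpha/S) over alpha."""
    a, s = np.asarray(a, float), np.asarray(s, float)
    total, d_alpha = 0.0, 1.0 / steps
    for k in range(1, steps + 1):
        alpha = k * d_alpha
        members = np.where(a >= alpha)[0]
        if len(members) == 0:
            break
        # Two elements are S_alpha-equivalent iff S(x, y) >= alpha, so the
        # row pattern of S_alpha identifies each element's class.
        classes_hit = {tuple(s[i] >= alpha) for i in members}
        total += d_alpha / len(classes_hit)
    return total

S = [[1,.2,1,.6,.2,.6], [.2,1,.2,.2,.8,.2], [1,.2,1,.6,.2,.6],
     [.6,.2,.6,1,.2,.8], [.2,.8,.2,.2,1,.2], [.6,.2,.6,.8,.2,1]]
A = [.1, .4, .6, .7, 1, 1]
print(round(sp_under_similarity(A, S), 2))   # 0.6, as computed above
```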

We now indicate some properties of this generalized definition of specificity based upon similarity; proofs of these results can be found in [8].

Property 1. For the similarity relation S where S(x, x) = 1 and S(x, y) = 0 for y ≠ x, we get Sp(A|S) = Sp(A).

Thus the original specificity measure is a special case of this more general definition under the assumption that all the elements are completely dissimilar at all levels. We shall denote the above similarity relation as I.

Property 2. For any S, if A is a singleton set then Sp(A|S) = 1.

The above indicates that under any measure of similarity a one-point set attains the maximal specificity, one. However, as we next indicate, one-element sets are not necessarily the only situation in which we get maximal specificity.

Property 3. Sp(A|S) = 1 iff (1) there exists at least one element x in A such that A(x) = 1, and (2) S(x, y) ≥ A(x) ∧ A(y) for all x and y.

The following relates to a very special case of the above theorem.

Property 4. A crisp set A has specificity one under S if for all y and x in A it is the case that S(x, y) = 1.

Essentially the above implies that if the membership grades of two elements are such that one disappears from A_α before they become distinguishable, then we never see these as more than one element in the calculation of Sp(A|S).

7. Specificity in the Continuous Domain

Assume X is a continuous space such as some interval of the real line. In [10] we suggested a measure of specificity useful in this continuous environment. Before introducing this measure we must provide some necessary definitions. We recall that if A is a fuzzy subset of X, the α-level set of A is a crisp subset of X denoted A_α and defined as A_α = {x | A(x) ≥ α}. A fuzzy measure [39, 40] μ is a set function defined on X,

μ: 2^X → [0, 1],

having at least the following properties:

(1) μ(∅) = 0
(2) μ(X) = 1
(3) if A ⊂ B then μ(A) ≤ μ(B).

Furthermore, we shall consider only measures for which μ(A) = 0 iff A is a singleton set or empty.

In [10] a general class of specificity measures in this continuous domain was defined as

Sp(A) = ∫_0^{α_max} F(μ(A_α)) dα

where α_max is the maximal membership grade in A and F is a function F: [0, 1] → [0, 1] having the following properties:

(1) F(0) = 1
(2) F(1) = 0
(3) F(x) ≤ F(y) for x > y.

Let us see how this definition respects the desired properties of a specificity measure. It is easy to show that if A(x) = 0 for all x, then Sp(A) = 0. The requirement for Sp(A) = 1 is a little more subtle. To satisfy this condition we need

Sp(A) = ∫_0^{α_max} F(μ(A_α)) dα = 1.

This requires first that μ(A_α) = 0 for all α (and, of course, that α_max = 1). Thus for each α, A_α must be a nonnull set of measure zero, a singleton. From this it follows that A must be a singleton set. If the largest membership grade increases then α_max increases and Sp(A) increases. If any other membership grade increases, it can be seen that the value of Sp(A) decreases.

The following example will illustrate the use of this kind of measure. We note that in the example we use a Lebesgue-Stieltjes measure for μ [41].

Example: We shall assume that X = [0, 1] and that μ is defined for intervals in this space as μ([a, b]) = b − a. Furthermore we shall assume that F is defined as F(z) = 1 − z.

1) Consider the possibility distribution shown in Figure 2. This fuzzy set is defined by:

A(x) = 2x           0 ≤ x ≤ 0.5
A(x) = −2x + 2      0.5 ≤ x ≤ 1
A(x) = 0            elsewhere.

For any α, A_α = [α/2, 1 − α/2] and hence μ(A_α) = (1 − α/2) − α/2 = 1 − α. Since α_max = 1, then

Sp(A) = ∫_0^1 F(μ(A_α)) dα = ∫_0^1 (1 − (1 − α)) dα = ∫_0^1 α dα = 0.5.

[Fig. 2. Triangular membership function on [0, 1], peaking at 0.5.]

2) Consider next the fuzzy subset A shown in Fig. 3.

[Fig. 3. Second triangular fuzzy subset, supported on [0.3, 0.7] with peak at 0.5.]

In this case

A(x) = 0              x ≤ 0.3
A(x) = 5x − 1.5       0.3 ≤ x ≤ 0.5
A(x) = −5x + 3.5      0.5 ≤ x ≤ 0.7
A(x) = 0              0.7 ≤ x ≤ 1.

At the α level, A_α = [(α + 1.5)/5, (3.5 − α)/5] and therefore

μ(A_α) = (3.5 − α)/5 − (α + 1.5)/5 = (2 − 2α)/5 = (2/5)(1 − α).

Therefore

Sp(A) = ∫_0^1 F(μ(A_α)) dα = ∫_0^1 (1 − (2/5)(1 − α)) dα = ∫_0^1 (3/5 + (2/5)α) dα = 3/5 + 1/5 = 4/5.

A very useful special form of the specificity measure in the continuous environment can be obtained. Assume our space X is some interval of the real line, [a, b]. We shall let F(z) = 1 − z. In this case

Sp(A) = ∫_0^{α_max} F(μ(A_α)) dα = ∫_0^{α_max} (1 − μ(A_α)) dα

Sp(A) = α_max − ∫_0^{α_max} μ(A_α) dα.

We shall use for μ a normalized Lebesgue measure, μ(A) = Length(A)/(b − a). We recall that if E is a subset of X then E can always be expressed as E = ∪_{i=1}^{q} E_i, where each of the E_i is an interval of X, E_i = [a_i, b_i]. In this case the Lebesgue measure of E is

μ(E) = (1/(b − a)) Σ_{i=1}^{q} (b_i − a_i).

Recalling that

Sp(A) = α_max − ∫_0^{α_max} μ(A_α) dα

and representing A_α as

A_α = ∪_{i=1}^{q_α} A_{αi}

where A_{αi} = [a_{αi}, b_{αi}] and denoting Len(A_{αi}) = b_{αi} − a_{αi}, we get

μ(A_α) = (1/(b − a)) Σ_{i=1}^{q_α} Len(A_{αi}),

and using this we get

Sp(A) = α_max − (1/(b − a)) ∫_0^{α_max} (Σ_i Len(A_{αi})) dα.

The term ∫_0^{α_max} (Σ_i Len(A_{αi})) dα can be seen to be the area under the fuzzy subset A. Hence we see that

Sp(A) = α_max − (area under A)/(b − a).

Since (area under A)/(b − a) is the average membership grade in A, which we shall denote as Av-mg(A), we get

Sp(A) = α_max − Av-mg(A).

Using this formulation we obtain a very simple and useful form for the measure of specificity in the continuous domain: the difference between the maximum membership grade in A and the average membership grade of A. It is important to note the role that the domain [a, b] plays in the process. Since we divide the area under A by (b − a), as (b − a) increases our specificity increases even if the area of the fuzzy subset remains the same.
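A numerical sketch of this simple form (names ours), checked against the triangular set of Fig. 2:

```python
import numpy as np

def sp_continuous(membership, a, b, samples=100_001):
    """Sp(A) = alpha_max - Av-mg(A) on the domain [a, b]."""
    x = np.linspace(a, b, samples)
    mu = membership(x)
    return mu.max() - mu.mean()

# The triangular fuzzy set of Fig. 2 on X = [0, 1]:
tri = lambda x: np.where(x <= 0.5, 2 * x, 2 - 2 * x)
print(round(sp_continuous(tri, 0.0, 1.0), 3))   # 0.5, matching the example
```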

8. Application of Specificity to Validation of Expert Systems

In the following we shall briefly describe an application of the specificity measure to evaluating the performance of fuzzy expert systems (see Fig. 4).

[Fig. 4. Fuzzy expert system: input data → fuzzy expert system → output value.]


Assume for the ith input to the expert system the output value is the fuzzy set A_i and the correct output value is x_i. One can consider first as a measure of performance of this expert system the degree of correctness of the system for this case. As a measure of correctness we can use the membership grade of x_i in A_i, A_i(x_i). However, we must be careful in using only this measure. For example, if the output A_i = X, the whole output space, we get as our measure of correctness A_i(x_i) = 1. The problem here is that while we did indeed give a correct answer, the answer was useless; we effectively said that the answer was any value. In order to avoid this kind of problem of providing too general answers, we must also associate with the answer provided by the system a measure of its usefulness or informativeness. This second measure can be captured by using the measure of specificity of A_i, Sp(A_i). Combining these two measures we can obtain a measure of performance of the expert system on the ith case as

Per(i) = correctness × specificity = A_i(x_i) Sp(A_i).

More generally, if we have n observations on this system, the overall performance of the system, Per, can be obtained as the average of the performances of each trial,

Per = (1/n) Σ_{i=1}^{n} Per(i).

More generally, if the known output of the system is a fuzzy subset B_i instead of a specific value x_i, we must calculate the measure of correctness in a slightly different manner. One way to obtain this is to use the possibility measure

Correctness(B_i/A_i) = Max_x[B_i(x) ∧ A_i(x)].

9. Conclusion

We have discussed the measure of specificity. We described the characterizing features of this measure and introduced some particular manifestations of it.

References

[1] Yager, R.R., "Measuring tranquility and anxiety in decision making: An application of fuzzy sets," Int. J. of General Systems 8, 139-146, 1982.

[2] Yager, R.R., "Entropy and specificity in a mathematical theory of evidence," Int. J. of General Systems 9, 249-260, 1983.

[3] Yager, R.R., "Measuring the quality of linguistic forecasts," Int. J. of Man-Machine Studies 21, 253-257, 1984.

[4] Yager, R.R., "Measures of specificity for possibility distributions," in Proc. of IEEE Workshop on Languages for Automation: Cognitive Aspects in Information Processing, Palma de Mallorca, Spain, 209-214, 1985.

[5] Yager, R.R., "Toward a general theory of reasoning with uncertainty part I: nonspecificity and fuzziness," Int. Journal of Intelligent Systems 1, 45-67, 1986.

[6] Yager, R.R., "Ordinal measures of specificity," Int. J. of General Systems 17, 57-72, 1990.


[7] Yager, R.R., "Specificity measures of possibility distributions," Proceedings of the Tenth NAFIPS Meeting, U. of Missouri, Columbia, MO, 240-241, 1991.

[8] Yager, R.R., "Similarity based specificity measures," International Journal of General Systems 19, 91-106, 1991.

[9] Yager, R.R., "Default knowledge and measures of specificity," Information Sciences 61, 1-44, 1992.

[10] Yager, R.R., "On the specificity of a possibility distribution," Fuzzy Sets and Systems 50, 279-292, 1992.

[11] Dubois, D., Prade, H. and Yager, R.R., Fuzzy Information Engineering: A Guided Tour of Applications, John Wiley & Sons: New York, 1997.

[12] Higashi, M. and Klir, G.J., "Measures of uncertainty and information based on possibility distributions," Int. J. of General Systems 9, 43-58, 1983.

[13] Klir, G. J. and Folger, T.A., Fuzzy Sets, Uncertainty and Information, Prentice-Hall: Englewood Cliffs, N.J., 1988.

[14] Klir, G.J. and Bo, Y., Fuzzy Sets and Fuzzy Logic: Theory and Applications, Prentice Hall: Upper Saddle River, NJ, 1995.

[15] Zadeh, L.A., "Toward a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic," Fuzzy Sets and Systems 90, 111-127, 1997.

[16] Kacprzyk, J., "Inductive learning from considerably erroneous examples with a specificity based stopping rule," Proceedings of the Int. Conference on Fuzzy Logic and Neural Networks, Iizuka, Japan, 819, 1990.

[17] Zadeh, L.A., "A theory of approximate reasoning," in Machine Intelligence, Vol. 9, Hayes, J., Michie, D., and Mikulich, L.I. (eds.), New York: Halstead Press, 149-194, 1979.

[18] Yager, R.R., Ovchinnikov, S., Tong, R. and Nguyen, H., Fuzzy Sets and Applications: Selected Papers by L. A. Zadeh, John Wiley & Sons: New York, 1987.

[19] Dubois, D. and Prade, H., "A note on measures of specificity for fuzzy sets," International Journal of General Systems 10, 279-283, 1985.

[20] Dubois, D. and Prade, H., "The principle of minimum specificity as a basis for evidential reasoning," Uncertainty in Knowledge-Based Systems, Bouchon, B. and Yager, R.R. (Eds.), Springer-Verlag: Berlin, 75-84, 1987.

[21] Dubois, D. and Prade, H., "Fuzzy sets in approximate reasoning Part I: Inference with possibility distributions," Fuzzy Sets and Systems 40, 143-202, 1991.

[22] Zadeh, L.A., "Fuzzy sets as a basis for a theory of possibility," Fuzzy Sets and Systems 1, 3-28, 1978.

[23] Kosko, B., "Fuzzy entropy and conditioning," Information Sciences 40, 165-174, 1986.

[24] Kosko, B., Fuzzy Engineering, Prentice Hall: Upper Saddle River, NJ, 1997.

[25] Moore, T.O., Elementary General Topology, Prentice-Hall: Englewood Cliffs, NJ, 1964.


[26] Dempster, A.P., "Upper and lower probabilities induced by a multi-valued mapping," Ann. of Mathematical Statistics 38, 325-339, 1967.

[27] Dempster, A.P., "A generalization of Bayesian inference," Journal of the Royal Statistical Society, 205-247, 1968.

[28] Shafer, G., "Belief functions and possibility measures," in Analysis of Fuzzy Information, Vol 1: Mathematics and Logic, edited by Bezdek, J. C., CRC Press: Boca Raton: Florida, 1987.

[29] Smets, P., "Belief Functions," in Non-Standard Logics for Automated Reasoning, Smets, P., Mamdani, E.H., Dubois, D. and Prade, H. (eds.), London: Academic Press, 253-277, 1988.

[30] Yen, J., "Generalizing the Dempster-Shafer theory to fuzzy sets," IEEE Transactions on Systems, Man and Cybernetics 20, 559-570, 1990.

[31] Dubois, D., "Belief structures, possibility theory and decomposable confidence measures on finite sets," Computers and Artificial Intelligence (Bratislava), Soumis, L. (ed.), 403-416, 1986.

[32] Dubois, D. and Prade, H., "A set-theoretic view of belief functions," International Journal of General Systems 12, 193-226, 1986.

[33] Yager, R.R., "Reasoning with uncertainty for expert systems," Proc. of the Ninth International Joint Conference on Artificial Intelligence, Los Angeles, 1295-1297, 1985.

[34] Yager, R.R., "The entailment principle for Dempster-Shafer granules," Int. J. of Intelligent Systems 1, 247-262, 1986.

[35] Yager, R.R., "Arithmetic and other operations on Dempster-Shafer structures," Int. J. of Man-Machine Studies 25, 357-366, 1986.

[36] Goodman, I.R. and Nguyen, H.T., Uncertainty Models for Knowledge-Based Systems, North-Holland: Amsterdam, 1985.

[37] Shafer, G., A Mathematical Theory of Evidence, Princeton University Press: Princeton, N.J., 1976.

[38] Zadeh, L.A., "Similarity relations and fuzzy orderings," Information Sciences 3, 177-200, 1971.

[39] Sugeno, M., "Theory of fuzzy integrals and its application," Doctoral Thesis, Tokyo Institute of Technology, 1974.

[40] Sugeno, M., "Fuzzy measures and fuzzy integrals: a survey," in Fuzzy Automata and Decision Processes, Gupta, M.M., Saridis, G.N. and Gaines, B.R. (eds.), Amsterdam: North-Holland, 89-102, 1977.

[41] Halmos, P.R., Measure Theory, D. Van Nostrand: Princeton, NJ, 1950.


What's in a Fuzzy Membership Value?

Sukhamay Kundu

Computer Science Department, Louisiana State University, Baton Rouge, LA 70803, USA [email protected]

Abstract. Although a good amount of theory has evolved over the last 30 years about the desirable or "good" properties of membership values for fuzzy sets and relations, relatively little is known about how to define the membership values in an objective way to satisfy those properties. As a result, one often ignores those properties when developing applications of fuzzy sets. We present here some successful designs of membership functions having good properties.

Keywords. Min-transitivity, representation theorem, membership in a fuzzy group.

1. Introduction

The key to the notion of a fuzzy set is its membership function μ_A(x). Each fuzzy entity (a set, a relation, a rule, or a formula) has a membership function on a suitable domain associated with that entity. The physical concept or relationship that is modeled by a fuzzy entity should directly influence both its membership values and the logical (and/or/not) and other algebraic operations defined on those values. When we use concrete membership functions in a given application, we need to verify the relevant properties for those membership functions. Otherwise, we lose the connection between the "theory" and the "reality" that is being modeled. It is one thing to say, for example, that we define "tall and pretty" to mean μ_tall-and-pretty(x) = min{μ_tall(x), μ_pretty(x)}, and it is another thing to say that the above equality follows from what we know about the relationship among the notions "tall", "pretty", and "tall and pretty", and the way the membership values for these concepts are formed. It is not a common occurrence in the fuzzy literature, however, that a membership function (which is often defined based on intuition alone) is analyzed thoroughly to determine whether it satisfies the relevant properties or not.

We review below some recent successful examples of concrete membership functions with good properties. In particular, we consider a successful method for defining a min-transitive fuzzy relation [7, 8, 9] that has broad applications. We also present in this connection a new representation theorem for a general min-transitive relation. As another example, we present a successful method of constructing a membership function for a fuzzy group. This last example illustrates both the effect of the group structure on the membership values and the role of min-transitivity [10].



2. Leftness Relationship

There have been many attempts [2, 6, 11, 14] at defining a suitable fuzzy notion of "leftness" between two objects in a 2-dimensional scene. Note that any such method can be used directly to define the notions of "rightness", "aboveness", etc. There have also been other efforts [4, 12] at defining a fuzzy preference relation between fuzzy membership functions. In each case, the core of the problem lies in defining the leftness (or preference relation) between two one-dimensional objects, i.e., two intervals. Interestingly, not much attention has been paid to this special case, although a good notion of leftness between 2-dimensional objects must give rise to a good notion of leftness between two intervals A and B, since we can regard an interval A as the limiting case of the rectangle A×[0, t] as t ↓ 0.

leftness_1(A, B) = lim_{t↓0} leftness_2(A×[0, t], B×[0, t]) (1)

However, none of the existing definitions of leftness for 2-dimensional objects yield a leftness relation for intervals via (1) that satisfies the min-transitivity property (2). (The min-transitivity property plays a fundamental role in the theory of fuzzy sets and its applications.)

Left(A, C) ≥ min{Left(A, B), Left(B, C)} (2)

We can view the leftness relation as a preference relation, where more left means more preferred. Preference relations are used in fuzzy decision making [12, 13]. The preference relations defined in [4, 12] do not satisfy the min-transitivity property. The relation Left(A, B) defined in [7] is shown in the 2nd column of Table 1; it satisfies the min-transitivity property [8]. The extensions of Left(A, B) to two and three dimensional objects are also considered in [7]. An alternative and simpler notion of leftness for intervals is Left_N(A, B), shown in the 3rd column of Table 1; it too satisfies min-transitivity [9]. However, it is less sensitive than Left(A, B) because it does not depend on b = the length of A∩B; in fact, Left_N(A, B) is obtained by putting b equal to 0 in Left(A, B).

We briefly review below the method used in [7] for constructing Left(A, B); a similar method might be useful in defining membership values for other concepts. The notion of leftness between two points is easy to define. The same is true if we regard the real line (−∞, +∞) as a juxtaposition of unit-length bricks and we consider the leftness between two such bricks. In each case, we get a boolean 0-1 valued notion of leftness. Now, consider two intervals made of these bricks as shown in Fig. 2. We can take the proportion of queries "A_i is to the left of B_j" that are true as a first approximation to the notion of leftness. To make this definition independent of the length of the bricks, we subtract from it the proportion of the queries "A_i is to the right of B_j" that are true. The final value of Left(A, B) is shown in (3), expressed in terms of probabilities (corresponding to brick length ↓ 0). Here, P(x < y) = the probability of x being less than y, where x and y are random variables with uniform distribution in the intervals A and B, respectively.

Left(A, B) = max{0, P(x < y) − P(x > y)} (3)


Table 1. The values of Left(A, B) and Left_N(A, B).

Cases (each of a, b, c ≥ 0)                              Left(A, B)                Left_N(A, B)

(0°) A = [0, a], B = [a+b, a+b+c], b ≥ 0                 1                         1

(1°) A = [0, a+b], B = [a, a+b+c], as in Fig. 1(i)       1 − b²/((a+b)(b+c))       1

(2°) A = [0, a+b+c], B = [a, a+b], as in Fig. 1(ii)      max{0, (a−c)/(a+b+c)}     max{0, (a−c)/(a+c)}

(3°) A = [a, a+b], B = [0, a+b+c], as in Fig. 1(iii)     max{0, (c−a)/(a+b+c)}     max{0, (c−a)/(a+c)}

(4°) A = [a, a+b+c], B = [0, a+b], as in Fig. 1(iv)      0                         0

[Fig. 1. The non-trivial cases of overlapping intervals A and B.]

[Fig. 2. Defining the notion of Left(A, B): the intervals A and B regarded as juxtapositions of bricks A_i and B_j.]


Among the 24 queries "A_i is to the left of B_j" in Fig. 2, only 21 queries evaluate to true, giving the initial value leftness(A, B) = 21/24. Then, only 1 of the 24 queries "A_i is to the right of B_j" being true, we get the modified value leftness(A, B) = 21/24 − 1/24 = 5/6. Finally, as the brick length ↓ 0, the limit gives the same value 5/6. Thus, for A = [0, 2] and B = [1, 4], which have the same overlapping structure as in Fig. 2 (the right half of A equals the left one-third of B), we get Left(A, B) = P(x < y) − P(x > y) = 5/6.
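This value can be reproduced from (3) numerically (a sketch; p_less and left are our own helper names):

```python
import numpy as np

def p_less(lo1, hi1, lo2, hi2):
    """P(u < v) for u ~ U[lo1, hi1], v ~ U[lo2, hi2]: average, over a fine
    grid of u, of P(v > u) = clip((hi2 - u)/(hi2 - lo2), 0, 1)."""
    u = np.linspace(lo1, hi1, 20001)
    return float(np.mean(np.clip((hi2 - u) / (hi2 - lo2), 0.0, 1.0)))

def left(A, B):
    """Left(A, B) = max{0, P(x < y) - P(x > y)}, as in (3)."""
    (a1, a2), (b1, b2) = A, B
    return max(0.0, p_less(a1, a2, b1, b2) - p_less(b1, b2, a1, a2))

print(round(left((0, 2), (1, 4)), 3))   # 0.833 = 5/6, as in the text
```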

2.1 Related AND/OR Operations

Given any leftness relationship, it is natural to define the related concept "left of A" for a fixed A by (4).

μ_left-A(X) = Left(X, A) (4)

Clearly, the goodness of a leftness relation can be judged in terms of the properties of its associated μ_left-A(X). In particular, one may ask how to define the "AND"/"OR" of two such concepts. For instance, is there an interval C such that left-C = (left-A) AND (left-B)? To be more precise, what is a suitable notion of "AND" for which such an interval C would exist, and in that case how is C determined by the intervals A and B? Following the abstract theory of fuzzy sets, one would like to have for the AND-operation Left(X, C) = min{Left(X, A), Left(X, B)} (Zadeh's AND-operation) or Left(X, C) = max{0, Left(X, A) + Left(X, B) − 1} (Lukasiewicz's AND-operation). Similarly, for the OR-operation, one would like to have Left(X, C) = max{Left(X, A), Left(X, B)} (Zadeh's OR-operation) or Left(X, C) = min{1, Left(X, A) + Left(X, B)} (Lukasiewicz's OR-operation). But one rarely achieves these equalities when the AND/OR operations are defined based on the structures of the objects and the fuzzy concept in question, rather than simply "defining" them abstractly to satisfy the equalities. In reality, one can at best hope to have the inequalities shown in (5) for Zadeh's case (and similar inequalities for Lukasiewicz's case, which are weaker than (5)). For the notion of leftness of points, the equalities for the AND/OR operations in (5) can be achieved [7] (cf. Theorem 2 in Section 3).

max{μ_left-A(X), μ_left-B(X)} ≤ μ_left-(A-OR-B)(X)
μ_left-(A-AND-B)(X) ≤ min{μ_left-A(X), μ_left-B(X)} (5)

It is shown in [7] that any notion of Left(A, B) which satisfies some reasonable postulates, including the scale invariance property mentioned earlier and the monotonicity postulate shown below, satisfies (5). Moreover, the definitions for the intervals A⊓B and A⊔B in Fig. 3 serve for the AND and OR operations. We point out that there is no such interval corresponding to the NOT operation, i.e., there is no interval B corresponding to the membership function μ_left-B(X) = Left(X, B) = 1 − μ_left-A(X). This is not surprising because even for the notion of leftness between points, one does not have a point q such that left(x, q) = 1 − left(x, p) for all points x [7].


Monotonicity postulate: If 0 < Left(A, B) < 1, then moving an endpoint of A to the left strictly increases the value of Left(A, B), and moving an endpoint of A to the right strictly decreases Left(A, B). The opposite holds when the endpoints of B are moved to the left or to the right.

Theorem 1 [7]. The fuzzy relation Left(A, B) defined by (3) satisfies min-transitivity. The inequalities in (5) hold if we define A⊓B = A-AND-B and A⊔B = A-OR-B as in Fig. 3. The usual distributive properties hold for the operations {⊓, ⊔}: A⊓(B⊔C) = (A⊓B)⊔(A⊓C) and A⊔(B⊓C) = (A⊔B)⊓(A⊔C). □

The endpoints of A⊓B and A⊔B are given by

leftend(A⊓B) = min{leftend(A), leftend(B)}
rightend(A⊓B) = min{rightend(A), rightend(B)}
leftend(A⊔B) = max{leftend(A), leftend(B)}
rightend(A⊔B) = max{rightend(A), rightend(B)}

[Fig. 3. Illustration of A⊓B and A⊔B when A and B partially overlap or one is contained in the other; in the disjoint case (i), A⊓B = A and A⊔B = B.]
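A minimal sketch of these two interval operations (meet and join are our names for ⊓ and ⊔):

```python
def meet(A, B):
    """A ⊓ B (A-AND-B): componentwise minimum of the endpoints."""
    return (min(A[0], B[0]), min(A[1], B[1]))

def join(A, B):
    """A ⊔ B (A-OR-B): componentwise maximum of the endpoints."""
    return (max(A[0], B[0]), max(A[1], B[1]))

# For disjoint intervals with A to the left of B: meet = A and join = B.
A, B = (0.0, 1.0), (2.0, 3.0)
print(meet(A, B), join(A, B))   # (0.0, 1.0) (2.0, 3.0)
```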

In [7], we give several interesting properties of A⊓B and A⊔B. If A and B are disjoint intervals and A is to the left of B, then A⊓B = A and A⊔B = B; for A = B, A⊓B = A⊔B = A. For all A and B, we have A∩B = [A⊓B] ∩ [A⊔B] and A∪B = [A⊓B] ∪ [A⊔B]. The operations ⊓ and ⊔ are commutative and associative. Let I_0 denote the formal interval [−∞, −∞] (= the empty set); I_0 is the limiting case of an interval which is moved to the far left. Then A⊓I_0 = I_0 and A⊔I_0 = A, and thus I_0 acts like the 0-element with respect to ⊓ and ⊔. Similarly, if we denote by I_1 the formal interval [+∞, +∞] (= the empty set, also), then A⊓I_1 = A and A⊔I_1 = I_1, and thus I_1 acts like the 1-element. For any reasonable definition of Left(A, B), we may assume that Left(A, I_1) = 1 and Left(A, I_0) = 0. If we write −A = [−b, −a] when A = [a, b], then −(−A) = A, −[A⊓B] = [−A]⊔[−B], and −[A⊔B] = [−A]⊓[−B]. However, there is no connection between "−B" and the negation of the concept "left of B", i.e., Left(A, −B) ≠ 1 − Left(A, B).

3. Representation of Min-Transitive Relations

We now show that all min-transitive relations can be constructed, in an abstract sense, using a single general principle. We begin by considering the problem of defining a min-transitive dominance relation μ_D(A, B) among fuzzy sets A and B on an arbitrary domain X. The intervals are special kinds of sets for two reasons: (1) they are crisp sets, and (2) they have a particular form related to the linear ordering of the real numbers. The fuzzy relation Left(A, B) makes use of both of these features. We now use a different approach for defining μ_D(A, B), which reflects in some way the fuzzy truth value of "A dominates B", i.e., A ⊇ B or, equivalently, μ_A(x) ≥ μ_B(x) for all x. In the new approach, we do not compute the proportion of the queries "μ_A(x) ≥ μ_B(x)" that are true. The importance of our definition of μ_D(A, B) lies in its special properties given in Theorems 2, 3, and 4. We pay a cost, however, for the generality of μ_D(A, B) compared to Left(A, B). If we consider the intervals A and B as fuzzy subsets of X = (−∞, +∞), then μ_D(A, B) is a crisp binary relationship; μ_D(A, B) = 1 if A ⊇ B, and 0 otherwise.

Let μ_A(x) and μ_B(x) be the membership functions of two fuzzy sets in X. For 0 ≤ α ≤ 1, let B|α denote the fuzzy set with the reduced membership function α∧μ_B(x), obtained by thresholding μ_B(x) at the level α. As α becomes smaller, it is more likely that the condition A ⊇ B|α, which is equivalent to A|α ⊇ B|α, will hold; for example, it always holds for α = 0. This leads to the following definition.

μ_D(A, B) = sup{α: A|α ⊇ B|α} (6)

Thus, a value such as μ_D(A, B) = 0.2 means that μ_A(x) ≥ μ_B(x) whenever μ_B(x) < 0.2 (or ≤ 0.2, if X is finite), and at the remaining points μ_A(x) ≥ 0.2, where we also have μ_B(x) ≥ 0.2. For crisp subsets A and B of X, μ_D(A, B) takes boolean values; it equals 1 if and only if A ⊇ B as crisp sets. In this sense, μ_D(A, B) is a proper generalization of the crisp notion of "superset". Theorem 2 below, which is easily proved, provides further support in this regard. In particular, if we write μ_dom-A(X) = μ_D(X, A), then it shows that (dom-A)-AND-(dom-B) = dom-(A∧B) and (dom-A)-OR-(dom-B) = dom-(A∨B), with equalities holding throughout (5).

Theorem 2. Given two fuzzy sets μ_A(x) and μ_B(x) of X, there is a unique smallest fuzzy set μ_C(x) such that μ_D(C, A) = 1 = μ_D(C, B), and it is given by C = A∨B, i.e., μ_C(x) = max{μ_A(x), μ_B(x)}. Similarly, there is a unique largest fuzzy set μ_C′(x) such that μ_D(A, C′) = 1 = μ_D(B, C′), and it is given by C′ = A∧B, i.e., μ_C′(x) = min{μ_A(x), μ_B(x)}. Also, μ_D(A, A) = 1 for all μ_A(x). □

Example 1. Fig. 4 illustrates the notion of μ_D(A, B). A fuzzy set A = a/x_1 + b/x_2 on X = {x_1, x_2} is represented here by the point (a, b) in the unit square. The crisp subsets of X correspond to the 4 corner points (0, 0) = ∅, (1, 0) = {x_1}, etc. As Fig. 4(ii) shows, μ_D(A, B) is not symmetric, in general. □

Theorem 3. The fuzzy domination relation μ_D(A, B) satisfies the min-transitivity: μ_D(A, C) ≥ min{μ_D(A, B), μ_D(B, C)}.

Proof. Suppose α = μ_D(A, B) and β = μ_D(B, C). We may assume without loss of generality that α > β > 0. For all sufficiently small ε > 0, we have

μ_A(x) ≥ (α − ε)∧μ_B(x) for all x, and μ_B(x) ≥ (β − ε)∧μ_C(x) for all x.

Thus, μ_A(x) ≥ (β − ε)∧μ_C(x) for all x, and hence μ_D(A, C) ≥ β = min{α, β}. □

[Fig. 4. Illustration of A|α, μ_D(A, B), A∧B, and A∨B on X = {x_1, x_2}: (i) A = 0.4/x_1 + 0.9/x_2, with A|0.6 = (0.4, 0.6); (ii) B = 0.7/x_1 + 0.6/x_2, with μ_D(A, B) = 0.4 and μ_D(B, A) = 0.6.]

Note that μ_D(A, B) is not related to the fuzzy subsethood (or fuzzy supersethood) relation defined in [5]; in particular, the latter is not min-transitive. If we use linear scaling instead of thresholding, i.e., define B|α to have the membership function αμ_B(x), then the resulting transitivity takes the product form: μ_D(A, C) ≥ μ_D(A, B)μ_D(B, C), which is weaker than (implied by) the min-transitivity. On the other hand, if we use the probabilistic approach, where we let μ_D(A, B) = the proportion of the queries "μ_A(x) ≥ μ_B(x)" that are true (for a finite set X, say), then the resulting transitivity takes Lukasiewicz's product form μ_D(A, C) ≥ max{0, μ_D(A, B) + μ_D(B, C) − 1}, which is even weaker than the product form.

For a singleton set X = {x_1}, each fuzzy set on X is a number in the interval [0, 1] and μ_D(A, B) is given by (7). It is easy to see that μ_D(a, b) is min-transitive. The relationship between (6) and (7) is given by (8), which also immediately shows that μ_D(A, B) is min-transitive. The fuzzy sets A and B in Fig. 4 give μ_D(μ_A(x_1), μ_B(x_1)) = μ_D(0.4, 0.7) = 0.4, μ_D(μ_A(x_2), μ_B(x_2)) = μ_D(0.9, 0.6) = 1.0, and μ_D(A, B) = 0.4∧1.0 = 0.4.

For a, b ∈ [0, 1]: μ_D(a, b) = 1 if a ≥ b, and μ_D(a, b) = a if a < b. (7)

μ_D(A, B) = min_x μ_D(μ_A(x), μ_B(x)) (8)
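A sketch of the pointwise computation (6)-(8) for a finite X (the name mu_d is ours), reproducing the values of Fig. 4:

```python
import numpy as np

def mu_d(a, b):
    """Domination mu_D(A, B) = sup{alpha : A|alpha contains B|alpha},
    computed pointwise via (7)-(8); `a`, `b` are membership vectors."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    pointwise = np.where(a >= b, 1.0, a)   # (7): 1 if a >= b, else a
    return float(pointwise.min())          # (8): minimum over x

print(mu_d([0.4, 0.9], [0.7, 0.6]))   # 0.4 = mu_D(A, B)
print(mu_d([0.7, 0.6], [0.4, 0.9]))   # 0.6 = mu_D(B, A): not symmetric
```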

We now state our key result on the representation of min-transitive relations.


Theorem 4. Let R(x, y) be any abstract min-transitive fuzzy relation on a domain X, where R(x, x) = 1 for all x. Then, there is a family of fuzzy sets {μ_x̃(ω): x ∈ X} on a suitable domain Ω such that R(x, y) = μ_D(x̃, ỹ) for all x and y.

Proof. Let Ω = X. We associate with each element x ∈ X the fuzzy set x̃ on Ω defined by μ_x̃(ω) = R(x, ω). Since μ_x̃(ω) must somehow reflect the relationships R(x, y) of x to all y, it is not surprising that we define μ_x̃(ω) in this way. Now, let R(x, y) = α. The min-transitivity of R(x, y) implies that for all ω ∈ Ω, μ_x̃(ω) = R(x, ω) ≥ min{R(x, y), R(y, ω)} = α ∧ μ_ỹ(ω), which shows that μ_D(x̃, ỹ) ≥ α. By putting ω = y, we get μ_x̃(ω) = α and μ_ỹ(ω) = 1, and therefore μ_D(x̃, ỹ) cannot be > α. □

We point out that Theorem 4 by no means implies that the problem of defining the membership values for a min-transitive relation is solved. It only shows that this problem is intimately connected with defining certain related fuzzy sets (cf. the notion of μ_left-A(X) vs. Left(A, B)). We remark that the association x → μ_x̃ used in the proof of Theorem 4 may not be one to one, in general. Clearly, μ_x̃(ω) = μ_ỹ(ω) for all ω if and only if R(x, ω) = R(y, ω) for all ω. If we define μ_x̃(ω) = R(ω, x), then it is easy to see that μ_D(x̃, ỹ) = R(y, x), and once again μ_D(x̃, ỹ) is reflexive and min-transitive. In particular, if R(x, y) is symmetric, then the same is true for μ_D(x̃, ỹ). The similarity relation μ_S(a, b) = μ_D(a, b)∧μ_D(b, a) = min{a, b} for a, b ∈ [0, 1] is used extensively in [16].

Note that although we have Left(A, A) = 0, unlike μ_D(A, A) = 1, this is not a shortcoming of Left(A, B) or μ_D(A, B). If we consider the strict dominance relation μ_D′(A, B) obtained from μ_D(A, B) by (9), following Orlovsky [13], then μ_D′(A, A) = 0; μ_D′(A, B) will be min-transitive by Orlovsky's theorem [13] since μ_D(A, B) is min-transitive. For a, b ∈ [0, 1], we have from (7) and (9) that μ_D′(a, b) = 1 − b if a > b, and 0 otherwise. However, if we use μ_D′(a, b) in (8), we do not get μ_D′(A, B); μ_D′(a, b) is typically too small to produce non-trivial values via (8). Note that Left(A, B) given in (3) is the strict form obtained by (9) from the fuzzy relation μ(A, B) = P(x < y); this μ(A, B) is not, however, min-transitive [8].

μ_D′(A, B) = max{0, μ_D(A, B) − μ_D(B, A)} (9)

We can define an alternate domination relationship μ̂_D(a, b) on the interval [0, 1] by first mapping each a ∈ [0, 1] to the interval L_a = [0, a], so that larger values of a correspond to larger intervals, and then using the relation Left(A, B) as shown in Fig. 5. Note that while μ_D(a, b) is particularly sensitive for a < b, μ̂_D(a, b) is sensitive for the opposite case a > b, as was the case for μ_D′(a, b). If we use μ̂_D(a, b) in (8) to define μ̂_D(A, B), we run into the same problem that we encountered with μ_D′(a, b); it often does not give non-trivial values for μ̂_D(A, B).

[Fig. 5. A domination relation on [0, 1] using Left(A, B).]

4. Membership Function for a Fuzzy Group

Finally, we consider the problem of defining membership values in the presence of an algebraic structure, specifically the group structure. We first show that the membership function μ(x) of a fuzzy group G has a close connection with that of a similarity relation on G; in particular, with the min-transitivity of a similarity relation [10]. We then show that, under some natural assumptions, the membership values μ(x) in a fuzzy group can be given a concrete representation: namely, when we view each group element x as a permutation of a suitable universe Ω, then μ(x) = the proportion of the elements of Ω which are fixed points of x.

A fuzzy group G is a group with a membership function μ(x) ≥ 0 such that the conditions (G1)-(G2) below are satisfied [15]. If e is the identity element of G, then putting y = x⁻¹ in (G2), we get μ(e) ≥ μ(x) for all x. Henceforth, we assume μ(e) = 1. We sometimes denote the fuzzy group G by the pair (G, μ).

(G1) μ(x) = μ(x⁻¹) for all x
(G2) μ(xy) ≥ min{μ(x), μ(y)} for all x and y

A similarity relation σ(x, y) is a reflexive (σ(x, x) = 1), symmetric (σ(x, y) = σ(y, x)), and min-transitive fuzzy relation. The likeness of the min-transitivity property (2) with (G2) suggests a possible connection between similarity relations and fuzzy groups. We say a similarity relation σ(x, y) on G is right invariant if (10) holds. Left invariance is defined in a similar way.

right invariance: σ(x, y) = σ(xz, yz) for all x, y, z in G (10)

Theorem 5 [10]. If (G, μ) is a fuzzy group and μ(e) = 1, then σ(x, y) = μ(xy⁻¹) gives a right invariant similarity relation on G, with μ(x) = σ(e, x). Conversely, if σ(x, y) is a right invariant similarity relation on a group G, then μ(x) = σ(e, x) defines a fuzzy group (G, μ). A similar result holds if we replace right invariance by left invariance throughout and let σ(x, y) = μ(x⁻¹y). □

Clearly, σ(x, y) = μ(xy⁻¹) is both left and right invariant if and only if μ(xy) = μ(yx) for all x, y. In that case, we say μ(x) has the commutative property. If G is a commutative (abelian) group, then μ(x) is commutative. For each t ≥ 0, G[t] = {x ∈ G: μ(x) ≥ t} is a subgroup of G, called a level-subgroup of G. If μ is commutative, then each G[t] is a normal subgroup of G. The converse is also true.

It is well known [15] that given a similarity relation σ(x, y), we obtain a family of nested (crisp) equivalence relations R_t = {(x, y): σ(x, y) ≥ t}, t ≥ 0. The relation σ(x, y) is uniquely determined from the R_t's by σ(x, y) = sup{t: (x, y) ∈ R_t, t ≥ 0}. The following lemma is immediate.


Lemma 1 [10]. The level subgroup G[t] of a fuzzy group (G, μ) is the equivalence class [e]_t in the crisp equivalence relation R_t, where σ is the similarity relation σ(x, y) = μ(xy⁻¹) (or σ(x, y) = μ(x⁻¹y)). Moreover, [y]_t equals the right-coset (resp., left-coset) of y with respect to the subgroup G[t]. □

Example 2. The similarity relation σ(x, y) in Fig. 6 on the commutative group G = {e, b, c, bc} is right invariant, and it gives the membership function μ(e) = 1.0, μ(b) = 0.3, μ(c) = μ(bc) = 0.2. There are three distinct level subgroups of G, as shown in Fig. 6. For t = 1.0, there are 4 right-cosets {e}, {b}, {c}, and {bc} of the level-subgroup G[1.0], which are the equivalence classes of R_1.0 = {(e, e), (b, b), (c, c), (bc, bc)}. For t = 0.3, there are 2 right-cosets {e, b} and {c, bc} of G[0.3], which are the equivalence classes of R_0.3 = R_1.0 ∪ {(e, b), (b, e), (c, bc), (bc, c)}. Finally, for t = 0.2, there is only 1 right-coset {e, b, c, bc} = G of G[0.2], which is the equivalence class of R_0.2 = G×G. □

σ(x, y)    e    b    c    bc
   e      1.0  0.3  0.2  0.2
   b      0.3  1.0  0.2  0.2
   c      0.2  0.2  1.0  0.3
   bc     0.2  0.2  0.3  1.0

G[1.0] = {e} = G[t] for 1 ≥ t > 0.3
G[0.3] = {e, b} = G[t] for 0.3 ≥ t > 0.2
G[0.2] = G = G[t] for 0.2 ≥ t ≥ 0

Fig. 6. A right invariant similarity relation σ(x, y) on a group G = {e, b, c, bc}, where b² = e = c² and bc = cb
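The claims of Theorem 5 can be checked exhaustively on this small group (a sketch; mul, sigma, and the generator-set encoding are our own choices; every element here is its own inverse, so xy⁻¹ = xy):

```python
from itertools import product

G = ['e', 'b', 'c', 'bc']
REP = {'e': frozenset(), 'b': frozenset({'b'}),
       'c': frozenset({'c'}), 'bc': frozenset({'b', 'c'})}
INV = {v: k for k, v in REP.items()}

def mul(x, y):
    """Multiplication in the group of Fig. 6 (b^2 = e = c^2, bc = cb),
    realized as symmetric difference of generator sets."""
    return INV[REP[x] ^ REP[y]]

mu = {'e': 1.0, 'b': 0.3, 'c': 0.2, 'bc': 0.2}
sigma = lambda x, y: mu[mul(x, y)]   # sigma(x, y) = mu(x y^-1) = mu(xy) here

assert all(sigma(x, y) == sigma(mul(x, z), mul(y, z))      # right invariant
           for x, y, z in product(G, repeat=3))
assert all(sigma(x, z) >= min(sigma(x, y), sigma(y, z))    # min-transitive
           for x, y, z in product(G, repeat=3))
print('sigma(x, y) = mu(x y^-1) is right invariant and min-transitive')
```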

An important consequence of Lemma 1 is that each equivalence class in R_t has the same size. As shown in Theorem 6, this is all that one can say about a fuzzy similarity relation σ(x, y) on a finite group [10]. In particular, the values taken by σ(x, y) themselves do not have any role whatsoever (cf. Theorem 7).

Theorem 6 [10]. Suppose σ(x, y) is a fuzzy similarity relation on a finite set G. Then one can regard G as a fuzzy commutative group with a suitable membership function μ(x) and a suitable group operation on G such that σ(x, y) = μ(xy⁻¹) (or = μ(x⁻¹y)) if and only if the equivalence classes in the crisp equivalence relation R_t = {(x, y): σ(x, y) ≥ t} have the same size for each t ≥ 0. □

Corollary 1 [10]. Given a membership function μ(x) on a finite set G, with the values (0 ≤) t_1 < t_2 < ... < t_n (= 1), there exists a group operation on G which makes (G, μ) a fuzzy group if and only if |G[t_j]| divides |G[t_{j−1}]| for 2 ≤ j ≤ n. □

4.1 Representation of a Group Membership Function

An important question that remains to be answered is to what extent the membership function μ(x) of a fuzzy group (G, μ(x)) can represent concrete and realistic properties of the group elements. If we regard each x ∈ G as a permutation Π_x on a universe Ω and we try to define μ(x) in a direct way by looking at the properties of Π_x, then it is not always easy to satisfy the property (G2). For example, assume that Ω is finite and we let

μ(x) = |φ(x)|/|Ω|, where φ(x) = {ω: Π_x(ω) = ω}, the fixed points of Π_x. (11)

In general, we can only say that φ(xy) ⊇ φ(x)∩φ(y), and hence μ(xy) ≥ max{0, μ(x) + μ(y) − 1}, which is much weaker than (G2); this definition does satisfy the properties μ(e) = 1 and (G1).

If we assume that for each x, y ∈ G either φ(x) contains φ(y) or vice versa, then (G2) holds because |φ(xy)| ≥ |φ(x)∩φ(y)| = min{|φ(x)|, |φ(y)|}. We show below that, under some natural assumptions, we can indeed find a representation of the elements of G as permutations on a suitable Ω such that the above subset property of the φ(x)'s holds. Note that the similarity relation σ(x, y) associated with μ(x) given by (11) equals |{ω: Π_x(ω) = Π_y(ω)}|/|Ω|, which is the proportion of the elements of Ω where the permutations Π_x and Π_y agree.

If μ(x) = 1 for some x ≠ e, then necessarily Π_x = Π_e = the identity mapping on Ω, and thus the association x → Π_x is not a group isomorphism from G to the group of permutations of Ω. Henceforth, we assume that μ(x) takes at least two distinct values. We say μ(x) satisfies the identity property if μ(x) < 1 for all x ≠ e.

Theorem 7 [10]. Suppose (G, μ) is a finite fuzzy group with values (0 ≤) t_1 < t_2 < ... < t_n (= 1 = μ(e)), n ≥ 2. Then, there exists a representation of G as permutations {Π_x: x ∈ G} on a finite universe Ω such that (11) holds if and only if each t_j is a rational number and μ(xy) = μ(yx) for all x and y. The Π_x's are distinct if μ(x) has the identity property. □

The assumption in Theorem 7 that each t_j is a rational number is not a serious restriction, since each irrational number may be approximated arbitrarily closely by a rational number. We point out, however, that the size of the universe Ω may depend on the sizes of the numerators and denominators in the t_j's, and thus a closer approximation to an irrational number t_j by a rational number may increase the size of Ω. But this is not of major concern here because the primary role of Theorem 7 lies in the conceptual understanding of the membership values. The basic idea of the proof of Theorem 7 is as follows (see [10] for details). If n = 2 and t_1 = p/q > 0, then we first form the left-translations Π_x^(1)(z) = L_x(z) = xz on a copy of G, and take q − p identical copies of this. Then, on an additional p copies of G, we let Π_x^(2) be the identity map for each x. Finally, let Π_x consist of the mappings Π_x^(1) and Π_x^(2), on a total of q copies of G. It is easy to see that the mappings Π_x, on Ω = q copies of G, satisfy the theorem.
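A sketch of this n = 2 construction (the names and the (copy, element) encoding of Ω are ours), verifying μ(x) = p/q on the group Z_2:

```python
from fractions import Fraction

def n2_representation(G, mul, p, q):
    """Permutations Pi_x on Omega = q copies of G: left-translation
    L_x(z) = xz on the first q - p copies, identity on the last p."""
    omega = [(k, z) for k in range(q) for z in G]
    def pi(x):
        return {(k, z): (k, mul(x, z)) if k < q - p else (k, z)
                for (k, z) in omega}
    return omega, pi

def mu(x, omega, pi):
    """mu(x) = |fixed points of Pi_x| / |Omega|, as in (11)."""
    px = pi(x)
    return Fraction(sum(1 for w in omega if px[w] == w), len(omega))

# G = Z_2 under addition mod 2, with t_1 = p/q = 1/3:
omega, pi = n2_representation([0, 1], lambda a, b: (a + b) % 2, 1, 3)
print(mu(1, omega, pi), mu(0, omega, pi))   # 1/3 1
```

A left-translation by x ≠ e has no fixed points, so each x ≠ e fixes exactly the p identity copies, giving μ(x) = p/q as required.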

For the general case, the theorem is proved by induction, and for that purpose we consider the two membership functions on G shown in (12). Each μ_i(x) makes G a fuzzy group with only n − 1 distinct membership values {t_1, t_2, ..., t_{n−2}, 1}, because G_i[t] = {x: μ_i(x) ≥ t} is a subgroup of G for all t ≥ 0 and i = 1, 2. Also, both μ_1(x) and μ_2(x) have the commutative property, but only μ_1(x) has the identity property. Consider representations of (G, μ_i) as permutations Π_x^(i) on a finite domain Ω_i such that μ_i(x) = |φ_i(x)|/|Ω_i|, where φ_i(x) is the set of fixed points of Π_x^(i) in Ω_i; the mappings Π_x^(1) are distinct. Now choose two integers N_1 and N_2 such that N_1 t_{n−2}|Ω_1| + N_2|Ω_2| = t_{n−1}(N_1|Ω_1| + N_2|Ω_2|). Let Ω consist of N_1 copies of Ω_1 and N_2 copies of Ω_2; copy the mappings Π_x^(i) on each copy of Ω_i, and let Π_x be Π_x^(1) on the copies of Ω_1 and Π_x^(2) on the copies of Ω_2. One can show that the mappings Π_x satisfy Theorem 7.

μ_1(x) = 1 for x = e; t_{n−2} if μ(x) = t_{n−1}; μ(x) otherwise

and

μ_2(x) = 1 for x = e; 1 if μ(x) = t_{n−1}; μ(x) otherwise. (12)

Example 3. We illustrate the above construction using the fuzzy group G = {e, b, c, d}, where b² = e = c² and bc = d = cb, with the membership function μ(e) = 1, μ(b) = 3/10, and μ(c) = 2/10 = μ(d). Fig. 7 shows the left-translation mappings L_x(z) = xz on G. Figures 8(i)-(ii) show the representations of G based on the membership functions μ_1(x) and μ_2(x), respectively. The final representation is obtained by choosing N_1 = 7 copies of Fig. 8(i) and N_2 = 2 copies of Fig. 8(ii). This gives |Ω| = 7(4×4 + 4) + 2(2×4 + 2) = 160. Since |φ(b)| = 2×10 + 7×4 = 48, the proportion of fixed points of Π_b equals 48/160 = 0.3 = μ(b); similarly, the proportions of fixed points of both Π_c and Π_d equal 0.2. □

[Fig. 7. The left-translation mappings L_x(z) = xz for the group in Fig. 6; here d = bc. The arcs with label x together represent the mapping L_x.]

[Fig. 8. Illustration of Theorem 7 for G = {e, b, c, d}, where b² = e = c², bc = d = cb, and μ(e) = 1, μ(b) = 0.3, μ(c) = 0.2 = μ(d). Panel (i): the representation of G for the membership function μ_1(e) = 1 and μ_1(b) = μ_1(c) = μ_1(d) = 0.2, built from 4 copies of G carrying the left-translations and 1 copy carrying the identity. Panel (ii): the representation of G based on the quotient group G/G[0.3] and the membership function μ_2(e) = 1 = μ_2(b), μ_2(c) = 0.2 = μ_2(d), again from 4 copies plus 1 copy of the corresponding mappings.]


References

1. P.S. Das, "Fuzzy groups and level subgroups", J. Math. Anal. Appl., 84, 264-269, (1981).

2. J. Freeman, "The modeling of spatial relations", Computer Graphics and Image Processing, 4, 156-171, (1975).

3. G.J. Klir and T.A. Folger, Fuzzy sets, uncertainty, and information, Prentice Hall, New Jersey, 1988.

4. W. Kolodziejczyk, "Orlovsky's concept of decision making with fuzzy preference relations - further results", Fuzzy Sets and Systems, 19, 11-20, (1986).

5. B. Kosko, Neural networks and fuzzy systems, Prentice Hall, NJ, 1992.

6. R. Krishnapuram, J.M. Keller, and Y. Ma, "Quantitative analysis of properties and spatial relations of fuzzy image regions", IEEE Trans. on Fuzzy Systems, 1, 222-233, (1993).

7. S. Kundu, "Defining the fuzzy spatial relationship Left(A, B)", Proc. Intern. Conf. on Fuzzy Sets and Applications, IFSA-95, Brazil, 1995.

8. S. Kundu, "Min-transitivity of the fuzzy leftness relationship and its application to decision making", Fuzzy Sets and Systems, 86, 357-367, (1997).

9. S. Kundu, "Preference relation on fuzzy utilities based on fuzzy leftness relation on intervals", Fuzzy Sets and Systems, in press.

10. S. Kundu, "Membership functions for a fuzzy group from similarity relations", Proc. of 2nd Annual Joint Conference on Information Sciences, JCIS-95, North Carolina, Sept. 28-Oct. 1, (1995).

11. K. Miyajima and A. Ralescu, "Spatial organization in 2D segmented images: representation and recognition of primitive spatial relations", Fuzzy Sets and Systems, 65, 225-237, (1994).

12. K. Nakamura, "Preference relations on a set of fuzzy utilities as a basis for decision making", Fuzzy Sets and Systems, 20,147-162, (1986).

13. S.A. Orlovsky, "Decision making with a fuzzy preference relation", Fuzzy Sets and Systems, 1, 155-167, (1978).

14. G. Retz-Schmidt, "Various views on spatial relations", AI Magazine, Sum­mer (1988), pp. 95-105.

15. A. Rosenfeld, "Fuzzy groups", J. Math Anal. Appl., 35, 512-517, (1971).

16. H. Thiele, "On isomorphisms between the DeMorgan algebra of fuzzy toler­ance relations and DeMorgan algebra of fuzzy clusterings", Proc. IEEE Intern. Conf. on Fuzzy Systems, IEEE-FUZZ-96, New Orleans, Sept. 8-11, (1996).

17. RR Yager and D.P. Filev, Essentials of fuzzy modeling and control, John-Wiley & Sons, 1994.


New Types of Generalized Operations

Imre J. Rudas¹ and Okyay Kaynak²

¹Bánki Donát Polytechnic, H-1081 Budapest, Népszínház u. 8., Hungary
²Boğaziçi University, Bebek, Istanbul, Turkey

Abstract. New methods for constructing generalized triangular operators, using a minimum and maximum fuzziness approach, are outlined. Based on the entropy of a fuzzy subset, defined by using the equilibrium of the generalized fuzzy complement, the concept of the elementary entropy function and its generalizations are introduced. These functions assign to each element of a fuzzy subset a value that characterizes its degree of fuzziness. It is shown that these functions can be used to construct the entropy of a fuzzy subset. Using this mapping, the generalized intersections and unions are defined as mappings that assign the least and the most fuzzy membership grade to each of the elements of the operators' domain, respectively. Next, further classes of new generalized T-operators are introduced, also defined as minimum and maximum entropy operations. It is shown that they are commutative semigroup operations on [0,1] with identity elements, but they are not monotonic. Simulations have been carried out so as to determine the effects of these new operators on the performance of fuzzy controllers. It is concluded that the performance of the fuzzy controller can be improved by using some sets of generalized T-operations for a class of plants.

Keywords. Fuzzy control, T-operators, fuzzy entropy.

1 Novel Generalized Operations

1.1 Introduction

Since the first application of fuzzy set theory to the control of a dynamic process, reported by Assilian and Mamdani [1], fuzzy controllers have been implemented in many experimental cases and in industrial applications. In these controllers the min and max operations have been widely used for the intersection and union of fuzzy sets. A review of the available literature shows that theoretical and experimental studies indicate that some operations may work better than others in some situations [4]. This fact inspired our investigation to find, in a sense, the most 'appropriate' operations that can be used for designing sophisticated fuzzy controllers.

The original fuzzy set theory was formulated in terms of Zadeh's standard operations of intersection, union and complement. Since 1965, several classes of operators satisfying appropriate axioms have been introduced for each of these operations. In their book, Klir and Folger [6] give a certain type of axiom system for each of the three operations and review their properties.

By accepting some basic conditions, a broad class of operations for union and intersection is formed by the triangular operators. The concepts of the T-norm and the T-conorm were originally developed by Menger [9] in the theory of probabilistic metric spaces. Since their introduction, a great number of T-operators of various types have been developed [4], [6].

The concept of the elementary entropy function, derived from fuzzy entropy, forms the basis of our investigation. In fuzzy set theory, entropy was introduced by De Luca and Termini [2]. They gave the axioms of entropy and an example of the entropy of a fuzzy set in the case of a finite universal set. Kaufmann [5] showed that an entropy can be obtained as the distance between the fuzzy set and its nearest crisp set. Knopfmacher [7] and Loo [8] introduced a larger class of entropies that contains the entropies proposed by De Luca and Termini and by Kaufmann as special cases. Yager [10] defined the entropy of a fuzzy set by the distance between the fuzzy set and its complement.

Throughout this paper the following notation will be used: $X$ is the universal set, $X_F$ is the class of all fuzzy subsets of $X$, $\Re^+$ is the set of nonnegative real numbers, $\bar{A}$ is the fuzzy complement of $A \in X_F$, and $|A|$ is the cardinality of $A$.

1.2 General Discussion on Generalized Operations

1.2.1 The Axiom System of Klir and Folger

The definitions of the generalized operations, using the axiom systems given by Klir and Folger [6], are summarized in the following definition.

Definition 1.2.1.1.

D1. Let $I$ be a mapping
$$I: [0,1] \times [0,1] \to [0,1] \qquad (1.2.1.1)$$
$I$ is a generalized intersection if and only if for all $a, b, c \in [0,1]$ it satisfies the following axioms:
A1.a $I(1,1) = 1$, $I(0,1) = I(1,0) = I(0,0) = 0$; that is, $I$ gives the same results as the classical intersection for crisp sets (boundary conditions),
A1.b $I(a,b) = I(b,a)$; that is, $I$ is commutative,
A1.c $I(a,b) \le I(a,c)$ if $b < c$; that is, $I$ is monotonic,
A1.d $I(I(a,b),c) = I(a,I(b,c))$; that is, $I$ is associative.

D2. Let $U$ be a mapping
$$U: [0,1] \times [0,1] \to [0,1] \qquad (1.2.1.2)$$



$U$ is a generalized union if and only if for all $a, b, c \in [0,1]$ it satisfies the following axioms:
A2.a $U(0,0) = 0$, $U(0,1) = U(1,0) = U(1,1) = 1$; that is, $U$ gives the same results as the classical union for crisp sets (boundary conditions),
A2.b $U(a,b) = U(b,a)$; that is, $U$ is commutative,
A2.c $U(a,b) \le U(a,c)$ if $b < c$; that is, $U$ is monotonic,
A2.d $U(U(a,b),c) = U(a,U(b,c))$; that is, $U$ is associative.

D3. Let $C$ be a mapping
$$C: [0,1] \to [0,1] \qquad (1.2.1.3)$$

$C$ is a complement if and only if for all $a, b \in [0,1]$ it satisfies the following axioms:
A3.a $C(0) = 1$ and $C(1) = 0$; that is, $C$ gives the same results as the classical complement for crisp sets (boundary conditions),
A3.b $C(a) \ge C(b)$ if $a < b$; that is, $C$ is monotonically nonincreasing,
A3.c $C$ is a continuous function,
A3.d $C(C(a)) = a$; that is, $C$ is involutive.

For our further investigation we recall an important property of the fuzzy complement.

Definition 1.2.1.2. The value $e_p \in [0,1]$ is said to be an equilibrium of a fuzzy complement $C$ if

$$C(e_p) = e_p. \qquad (1.2.1.4)$$

Theorem 1.2.1.1. [6]. If $C$ is a complement satisfying D3, then $C$ has a unique equilibrium.

1.2.2 T-Operators

The T-operators $I$, $U$ and $C$ are the generalizations of the conventional fuzzy intersection, fuzzy union and fuzzy complement, and are called T-norm, T-conorm and complement, respectively [3].

Definition 1.2.2.1. The definition of the T-operators is almost the same as that given in Definition 1.2.1.1, except for A1.a and A2.a; hence only the differences are given here.

A1.a* $I(a,1) = a$,

A2.a* $U(a,0) = a$.

From an algebraic point of view, $I$ and $U$ are commutative semigroup operations on $[0,1]$ with the identity elements 1 and 0, respectively.
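To make the algebraic setting concrete, the following Python sketch (names and the numerical spot-check are ours, not from the paper) implements three classical T-norm/T-conorm pairs and verifies commutativity, associativity and the identity elements on a grid:

```python
import itertools

# Three classical T-norm / T-conorm pairs: Zadeh, probabilistic, Lukasiewicz.
T_NORMS = {
    "min":   lambda a, b: min(a, b),
    "prod":  lambda a, b: a * b,
    "lukas": lambda a, b: max(a + b - 1.0, 0.0),
}
T_CONORMS = {
    "min":   lambda a, b: max(a, b),
    "prod":  lambda a, b: a + b - a * b,
    "lukas": lambda a, b: min(a + b, 1.0),
}

grid = [i / 10 for i in range(11)]
for name, T in T_NORMS.items():
    S = T_CONORMS[name]
    for a, b, c in itertools.product(grid, repeat=3):
        assert abs(T(a, b) - T(b, a)) < 1e-12             # A1.b commutativity
        assert abs(T(T(a, b), c) - T(a, T(b, c))) < 1e-9  # A1.d associativity
    assert abs(T(0.7, 1.0) - 0.7) < 1e-12                 # A1.a*: 1 is the identity of T
    assert abs(S(0.7, 0.0) - 0.7) < 1e-12                 # A2.a*: 0 is the identity of S
```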



1.3 Fuzzy Entropy, Elementary Entropy and Certainty Functions

1.3.1 Fuzzy Entropy

Definition 1.3.1.1. Let $X$ be a universal set and let $A$ be a fuzzy subset of $X$ defined as
$$A = \{(x, \mu_A(x)) \mid x \in X,\ \mu_A(x) \in [0,1]\}.$$
The fuzzy entropy is a function
$$e: X_F \to \Re^+$$

which satisfies the following axioms:

AE 1. $e(A) = 0$ if $A$ is a crisp set.

AE 2. If $A \prec B$ then $e(A) \le e(B)$, where $A \prec B$ means that $A$ is sharper than $B$.

AE 3. $e(A)$ assumes the maximum value if and only if $A$ is maximally fuzzy.

AE 4. $e(A) = e(\bar{A})$, $\forall A \in X_F$.

Let $e_p$ be an equilibrium of the complement $C$ and specify AE 2 and AE 3 as follows:

AES 2. $A$ is sharper than $B$ in the following sense:
$$\mu_A(x) \le \mu_B(x) \ \text{ for } \ \mu_B(x) \le e_p \quad \text{and} \quad \mu_A(x) \ge \mu_B(x) \ \text{ for } \ \mu_B(x) > e_p, \ \text{ for all } x \in X.$$

AES 3. $A$ is defined to be maximally fuzzy when $\mu_A(x) = e_p$, $\forall x \in X$.

1.3.2 Elementary Entropy Function

Definition 1.3.2.1. Let $A$ be a fuzzy subset of $X$ and define the following function:
$$\varphi_A: [0,1] \to [0,1]; \quad \varphi_A: x \mapsto \begin{cases} \mu_A(x), & \text{if } \mu_A(x) \le e_p \\ C(\mu_A(x)), & \text{if } \mu_A(x) > e_p \end{cases} \qquad (1.3.2.1)$$

Definition 1.3.2.2. Let $A$ be a fuzzy subset of $X$ and $\varphi_A$ the function defined above. Denote by $\Phi_A$ the fuzzy set generated by $\varphi_A$:
$$\Phi_A = \{(x, \varphi_A(x)) \mid x \in X,\ \varphi_A(x) \in [0,1]\}$$

Theorem 1.3.2.1. $g(|\Phi_A|)$ is an entropy of the fuzzy subset $A$, where $g: \Re \to \Re$ is a monotonically increasing real function with $g(0) = 0$.

Proof. We have to prove that $g(|\Phi_A|)$ satisfies the axioms of the entropy.

(1) If $A$ is a crisp set then $\varphi_A \equiv 0$ and $g(|\Phi_A|) = 0$.

(2) Suppose that $A$ is sharper than $B$.



• If $\mu_A(x) \le \mu_B(x) \le e_p$, $\forall x \in X$, then $\varphi_A(x) \le \varphi_B(x)$, and in consequence of the monotonicity of $|\Phi_A|$ and $g$ we obtain $g(|\Phi_A|) \le g(|\Phi_B|)$.
• If $\mu_A(x) \ge \mu_B(x) > e_p$, $\forall x \in X$, then $\varphi_A(x) = C(\mu_A(x)) \le C(\mu_B(x)) = \varphi_B(x)$, and $g(|\Phi_A|) \le g(|\Phi_B|)$.

(3) If $A$ is maximally fuzzy then $\varphi_A \equiv e_p$ and $g(|\Phi_A|)$ takes its maximum value.

(4)
$$\varphi_{\bar{A}}(x) = \begin{cases} \mu_{\bar{A}}(x) = C(\mu_A(x)), & \text{if } \mu_{\bar{A}}(x) \le e_p,\ \mu_A(x) > e_p \\ C(\mu_{\bar{A}}(x)) = \mu_A(x), & \text{if } \mu_{\bar{A}}(x) > e_p,\ \mu_A(x) \le e_p \end{cases}$$
which is the definition of $\varphi_A(x)$, $\forall A \in X_F$, so $g(|\Phi_{\bar{A}}|) = g(|\Phi_A|)$. ■

Definition 1.3.2.3. Let $A$ be a fuzzy subset of $X$. $f_A$ is said to be an elementary fuzzy entropy function if the cardinality of the fuzzy set
$$\Phi_A = \{(x, f_A(x)) \mid x \in X,\ f_A(x) \in [0,1]\}$$
is an entropy of $A$.

It is obvious that $\varphi_A$ is an elementary entropy function.

Example 1.3.2.1. Let $A$ be a fuzzy subset of $X$ and $\bar{A}$ its conventional fuzzy complement, i.e.,
$$\bar{A} = \{(x, \mu_{\bar{A}}(x)) \mid \mu_{\bar{A}}(x) = 1 - \mu_A(x),\ x \in X\}.$$
In this case $e_p = 0.5$ and the elementary entropy function of $A$ is
$$\varphi_A: x \mapsto \begin{cases} \mu_A(x), & \text{if } \mu_A(x) \le 0.5 \\ 1 - \mu_A(x), & \text{if } \mu_A(x) > 0.5 \end{cases} \qquad (1.3.2.2)$$

Let $g$ be the identity function. Then the cardinality of the fuzzy set
$$\Phi_A = \{(x, \varphi_A(x)) \mid x \in X,\ \varphi_A(x) \in [0,1]\}$$
is an entropy of $A$. It is easy to verify that this entropy is equivalent to the Hamming entropy, which is generated by the Hamming distance of $A$ from the nearest crisp set [6]. The nearest crisp set $C_A$ to $A$ is defined as
$$\mu_{C_A}(x) = 0 \ \text{ if } \ \mu_A(x) \le 0.5 \quad \text{and} \quad \mu_{C_A}(x) = 1 \ \text{ if } \ \mu_A(x) > 0.5.$$
The Hamming entropy is
$$e(A) = \sum_{x \in X} |\mu_A(x) - \mu_{C_A}(x)|.$$
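A small Python sketch of this example (function names are ours): the elementary entropy function for the conventional complement, the entropy as the cardinality of $\Phi_A$ with $g$ the identity, and a check that it equals the Hamming distance to the nearest crisp set:

```python
def phi(mu):
    """Elementary entropy of one grade (conventional complement, e_p = 0.5)."""
    return mu if mu <= 0.5 else 1.0 - mu

def entropy(memberships):
    """Entropy of A as the cardinality of Phi_A (g = identity)."""
    return sum(phi(m) for m in memberships)

def hamming_entropy(memberships):
    """Hamming distance from A to its nearest crisp set C_A."""
    return sum(abs(m - (0.0 if m <= 0.5 else 1.0)) for m in memberships)

A = [0.0, 0.2, 0.5, 0.7, 1.0]
assert abs(entropy(A) - hamming_entropy(A)) < 1e-12
print(entropy(A))   # 0.0 + 0.2 + 0.5 + 0.3 + 0.0 = 1.0
```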

The concept of elementary entropy function can be generalized as the following theorems show.



Theorem 1.3.2.2. The function $l(\varphi_A)$ is an elementary entropy function if $l$ has the following properties:

1. $l: [0, e_p] \to [0,1]$,
2. $l(0) = 0$,
3. $l$ is strictly monotonically increasing in $[0, e_p]$.

Proof. We have to prove that $|\Phi_A|$ satisfies the axioms of the entropy.

(1) If $A$ is a crisp set then $l(\varphi_A) \equiv 0$ and $|\Phi_A| = 0$.

(2) Suppose that $A$ is sharper than $B$.
• If $\mu_A(x) \le \mu_B(x) \le e_p$, $\forall x \in X$, then because of the monotonicity of $l$ and $|\Phi_A|$ we obtain $l(\mu_A(x)) \le l(\mu_B(x))$ and $|\Phi_A| \le |\Phi_B|$.
• If $\mu_A(x) \ge \mu_B(x) > e_p$, $\forall x \in X$, then $\varphi_A(x) = C(\mu_A(x)) \le C(\mu_B(x)) = \varphi_B(x)$, hence $l(\varphi_A(x)) \le l(\varphi_B(x))$ and $|\Phi_A| \le |\Phi_B|$.

(3) If $A$ is maximally fuzzy then $\varphi_A \equiv e_p$, hence $l(\varphi_A)$ and $|\Phi_A|$ take their maximum value.

(4)
$$l(\varphi_{\bar{A}}(x)) = \begin{cases} l(\mu_{\bar{A}}(x)) = l(C(\mu_A(x))), & \text{if } \mu_{\bar{A}}(x) \le e_p,\ \mu_A(x) > e_p \\ l(C(\mu_{\bar{A}}(x))) = l(\mu_A(x)), & \text{if } \mu_{\bar{A}}(x) > e_p,\ \mu_A(x) \le e_p \end{cases}$$
which is the definition of $l(\varphi_A(x))$, $\forall A \in X_F$, so $|\Phi_{\bar{A}}| = |\Phi_A|$. ■

Example 1.3.2.2. Let $l(\varphi_A) = \varphi_A^p$, where $p \in [1, \infty)$. It is very simple to show that $g(|\Phi_A|)$ with $g(y) = y^{1/p}$ gives the entropy generated by the Minkowski type of distance, where
$$\Phi_A = \{(x, \varphi_A^p(x)) \mid x \in X,\ \varphi_A^p(x) \in [0,1]\}.$$
The Minkowski type entropy is [6]
$$l_p(A) = \Big\{\sum_{x \in X} \big(\mu_A(x) - \mu_{C_A}(x)\big)^p\Big\}^{1/p}.$$

Theorem 1.3.2.3. The function $h(\mu_A)$ is also an elementary entropy function if $h$ has the following properties:

1. $h: [0,1] \to [0,1]$,
2. $h(0) = 0$, $h(1) = 0$,
3. $h$ is strictly monotonically increasing in $[0, e_p]$ and monotonically decreasing in $[e_p, 1]$,
4. $h(e_p)$ is a unique maximum of $h$.

Proof. According to Definition 1.3.2.1 we have
$$\mu_A: x \mapsto \begin{cases} \varphi_A(x), & \text{if } \mu_A(x) \le e_p \\ C(\varphi_A(x)), & \text{if } \mu_A(x) > e_p \end{cases}$$
so
$$h(\mu_A(x)) = \begin{cases} h(\varphi_A(x)), & \text{if } \mu_A(x) \le e_p \\ h(C(\varphi_A(x))), & \text{if } \mu_A(x) > e_p. \end{cases}$$

It is clear that $h$ is a function of the elementary entropy function, so it is sufficient to show that it has the properties given in Theorem 1.3.2.2:

• It is evident that $h: [0, e_p] \to [0,1]$.
• If $\mu_A(x) = 1$ or $\mu_A(x) = 0$ then $\varphi_A(x) = 0$, therefore $h(0) = 0$.
• Based on the definition of $h$ it is obvious that the requirement of monotonicity in $[0, e_p]$ holds. ■

1.3.3 Elementary Certainty Function

Definition 1.3.3.2. Let $A$ be a fuzzy subset of $X$ and define the following function:
$$\zeta_A: (0,1] \to [1, \infty); \quad \zeta_A: x \mapsto \frac{1}{f_A(x)} \qquad (1.3.3.1)$$
$\zeta_A$ is said to be the elementary certainty function or the elementary inverse entropy function. The definition is based on the consideration that the certainty of an element should be in inverse ratio to its entropy.

Example 1.3.3.1. Let $A$ be a fuzzy subset of $X$ and $\bar{A}$ its conventional fuzzy complement. In this case $e_p = 0.5$ and the elementary certainty function of $A$ is
$$\zeta_A: x \mapsto \begin{cases} \dfrac{1}{\mu_A(x)}, & \text{if } \mu_A(x) \le 0.5 \\[4pt] \dfrac{1}{1 - \mu_A(x)}, & \text{if } \mu_A(x) > 0.5 \end{cases} \qquad (1.3.3.2)$$



1.4 Entropy-Based Generalized Operations

1.4.1 Novel Generalized Operations

From the point of view of their application in fuzzy control, the question can be asked: are all of these axioms necessary in a Fuzzy Logic Controller or is it possible to provide the same performance if weaker axioms are used?

Of the four axioms, the commutativity and associativity properties are natural requirements, so the axiom of monotonicity and the axioms $I(x,1) = 1$ and $U(x,0) = 0$ have been investigated. Novel generalized operations having the above properties, defined by using the elementary entropy function, are introduced in the following.

Definition 1.4.1.1. Let $A$ and $B$ be two fuzzy subsets of the universe of discourse $X$ and denote by $\varphi_A$ and $\varphi_B$ their elementary entropy functions, respectively. The minimum fuzziness generalized intersection is defined as
$$I_\varphi^* = I_\varphi^*(A, B) = \{(x, \mu_{I_\varphi^*}(x)) \mid x \in X,\ \mu_{I_\varphi^*}(x) \in [0,1]\},$$
where
$$\mu_{I_\varphi^*}: x \mapsto \begin{cases} \mu_A(x), & \text{if } \varphi_A(x) < \varphi_B(x) \\ \mu_B(x), & \text{if } \varphi_B(x) < \varphi_A(x) \\ \min(\mu_A(x), \mu_B(x)), & \text{if } \varphi_A(x) = \varphi_B(x) \end{cases} \qquad (1.4.1.1)$$

The geometrical representation of the minimum fuzziness generalized intersection can be seen in Fig. 1.4.1.1.

Definition 1.4.1.2. Let $A$ and $B$ be two fuzzy subsets of the universe of discourse $X$ and denote by $\varphi_A$ and $\varphi_B$ their elementary entropy functions, respectively. The maximum fuzziness generalized union is defined as
$$U_\varphi^* = U_\varphi^*(A, B) = \{(x, \mu_{U_\varphi^*}(x)) \mid x \in X,\ \mu_{U_\varphi^*}(x) \in [0,1]\},$$
where
$$\mu_{U_\varphi^*}: x \mapsto \begin{cases} \mu_A(x), & \text{if } \varphi_A(x) > \varphi_B(x) \\ \mu_B(x), & \text{if } \varphi_B(x) > \varphi_A(x) \\ \max(\mu_A(x), \mu_B(x)), & \text{if } \varphi_A(x) = \varphi_B(x) \end{cases} \qquad (1.4.1.2)$$

The geometrical representation of the maximum fuzziness operation can be seen in Fig. 1.4.1.2.
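A pointwise Python sketch of the two definitions, under the assumption of the conventional complement (so that $\varphi_A$ is given by Eq. 1.3.2.2); all names are ours:

```python
def phi(mu):
    """Elementary entropy for C(a) = 1 - a, i.e. e_p = 0.5."""
    return min(mu, 1.0 - mu)

def i_star(mu_a, mu_b):
    """Minimum fuzziness generalized intersection (Definition 1.4.1.1)."""
    if phi(mu_a) < phi(mu_b):
        return mu_a                      # the less fuzzy grade wins
    if phi(mu_b) < phi(mu_a):
        return mu_b
    return min(mu_a, mu_b)               # equal fuzziness: fall back to min

def u_star(mu_a, mu_b):
    """Maximum fuzziness generalized union (Definition 1.4.1.2)."""
    if phi(mu_a) > phi(mu_b):
        return mu_a                      # the fuzzier grade wins
    if phi(mu_b) > phi(mu_a):
        return mu_b
    return max(mu_a, mu_b)

print(i_star(0.9, 0.4))   # 0.9: phi(0.9) = 0.1 < phi(0.4) = 0.4 (here x lies in W2)
print(u_star(0.9, 0.4))   # 0.4: the fuzzier of the two grades
```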



Fig. 1.4.1.1. Generalized intersection

Fig. 1.4.1.2. Generalized union

Lemma 1.4.1.1. If $\varphi_A(x) \le \varphi_B(x)$ for all $x \in X$, then the following relations hold:

(1) $\mu_A(x) \le \mu_B(x)$, so that either $\mu_A(x)$ and $\mu_B(x) \le \frac{1}{2}$, or $\mu_A(x) \le \frac{1}{2}$ and $\mu_B(x) > \frac{1}{2}$;

or

(2) $\mu_A(x) \ge \mu_B(x)$, so that either $\mu_A(x)$ and $\mu_B(x) > \frac{1}{2}$, or $\mu_A(x) > \frac{1}{2}$ and $\mu_B(x) \le \frac{1}{2}$.

Proof. Suppose that $\varphi_A(x) \le \varphi_B(x)$. With Eq. 1.3.2.2 we get:

• If $\mu_A(x)$ and $\mu_B(x) \le \frac{1}{2}$ then $\mu_A(x) = \varphi_A(x) \le \varphi_B(x) = \mu_B(x)$.
• If $\mu_A(x) \le \frac{1}{2}$ and $\mu_B(x) > \frac{1}{2}$ then $\mu_A(x) < \mu_B(x)$.
• If $\mu_A(x)$ and $\mu_B(x) > \frac{1}{2}$ then $\varphi_A(x) = 1 - \mu_A(x) \le 1 - \mu_B(x) = \varphi_B(x)$, hence $\mu_A(x) \ge \mu_B(x)$.
• If $\mu_A(x) > \frac{1}{2}$ and $\mu_B(x) \le \frac{1}{2}$ then obviously $\mu_A(x) > \mu_B(x)$. ■

Definition 1.4.1.3. Define the sets $W_1$, $W_2$ and $W_3$ as follows:
$$W_1 = \{x \mid \varphi_A(x) < \varphi_B(x)\ \&\ \mu_A(x) < \mu_B(x), \ \text{or} \ \varphi_A(x) > \varphi_B(x)\ \&\ \mu_A(x) > \mu_B(x)\}$$
$$W_2 = \{x \mid \varphi_A(x) < \varphi_B(x)\ \&\ \mu_A(x) > \mu_B(x), \ \text{or} \ \varphi_A(x) > \varphi_B(x)\ \&\ \mu_A(x) < \mu_B(x)\}$$
$$W_3 = \{x \mid \varphi_A(x) = \varphi_B(x)\}$$

Theorem 1.4.1.1. The membership functions of $I_\varphi^*$ and $U_\varphi^*$ can be expressed in terms of the conventional min and max operations as follows:
$$\mu_{I_\varphi^*} = \begin{cases} \min(\mu_A(x), \mu_B(x)), & \text{if } x \in W_1 \\ \min(\mu_A(x), \mu_B(x)), & \text{if } x \in W_3 \\ \max(\mu_A(x), \mu_B(x)), & \text{if } x \in W_2 \end{cases} \qquad (1.4.1.3)$$
$$\mu_{U_\varphi^*} = \begin{cases} \max(\mu_A(x), \mu_B(x)), & \text{if } x \in W_1 \\ \max(\mu_A(x), \mu_B(x)), & \text{if } x \in W_3 \\ \min(\mu_A(x), \mu_B(x)), & \text{if } x \in W_2 \end{cases} \qquad (1.4.1.4)$$

Proof. It will be proved that Eq. 1.4.1.3 and Eq. 1.4.1.4 give the same results as Definition 1.4.1.1 and Definition 1.4.1.2. Denoting by $\mu_{I'_\varphi}$ and $\mu_{U'_\varphi}$ the operations defined by Eq. 1.4.1.3 and Eq. 1.4.1.4, we have to prove that $\mu_{I'_\varphi} = \mu_{I_\varphi^*}$ and $\mu_{U'_\varphi} = \mu_{U_\varphi^*}$.



(1) If $\varphi_A(x) = \varphi_B(x)$ then the statement is obvious.

(2) Assume that $x \in W_1$, $\varphi_A(x) < \varphi_B(x)$ and $\mu_A(x) < \mu_B(x)$. Then according to Lemma 1.4.1.1 two cases can be distinguished:
(a) $\mu_A(x) < \mu_B(x) \le \frac{1}{2}$, or
(b) $\mu_A(x) \le \frac{1}{2}$, $\mu_B(x) > \frac{1}{2}$ and $\mu_A(x) < 1 - \mu_B(x)$.
In both cases $\mu_{I_\varphi^*} = \mu_A(x) = \mu_{I'_\varphi}$ and $\mu_{U_\varphi^*} = \mu_B(x) = \mu_{U'_\varphi}$, which was to be proved.

(3) Suppose now that $x \in W_1$, $\varphi_A(x) > \varphi_B(x)$ and $\mu_A(x) > \mu_B(x)$. Using a method similar to case (2) it is easy to see that the statements hold.

(4) Assume that $x \in W_2$, $\varphi_A(x) < \varphi_B(x)$ and $\mu_A(x) > \mu_B(x)$. According to Lemma 1.4.1.1 we have
(a) $\mu_A(x) > \mu_B(x) > \frac{1}{2}$, or
(b) $\mu_A(x) > \frac{1}{2}$, $\mu_B(x) < \frac{1}{2}$ and $1 - \mu_A(x) < \mu_B(x)$.
Then in both cases $\mu_{I_\varphi^*} = \mu_A(x) = \mu_{I'_\varphi}$ and $\mu_{U_\varphi^*} = \mu_B(x) = \mu_{U'_\varphi}$.

(5) Suppose now that $x \in W_2$, $\varphi_A(x) > \varphi_B(x)$ and $\mu_A(x) < \mu_B(x)$. By arguments similar to case (4) the statements can be obtained easily. ■

Theorem 1.4.1.2. $I_\varphi^*$ and $U_\varphi^*$ have the following properties.

$I_\varphi^*$:
B.1.a $I_\varphi^*(0,x) = 0$, $I_\varphi^*(1,x) = 1$ if $x \ne 0$,
B.1.b $I_\varphi^*$ is commutative,
B.1.c $I_\varphi^*$ is associative,
B.1.d $I_\varphi^*$ is monotonic on each place of $[0,1] \times [0,1]$.

$U_\varphi^*$:
B.2.a $U_\varphi^*(0,x) = x$, $U_\varphi^*(1,x) = x$ if $x \ne 0$,
B.2.b $U_\varphi^*$ is commutative,
B.2.c $U_\varphi^*$ is associative,
B.2.d $U_\varphi^*$ is monotonic on the non-overlapping sets $W_1$, $W_2$ and $W_3$.

Proof. It is easy to verify that B.1.a and B.2.a hold. As is well known, the min and max operations are commutative and associative, so according to the construction of $I_\varphi^*$ and $U_\varphi^*$ given by Theorem 1.4.1.1 the axioms B.1.b, B.1.c and B.2.b, B.2.c are satisfied.



Regarding the monotonicity of $I_\varphi^*$ and $U_\varphi^*$, consider Figs. 1.4.1.3 and 1.4.1.4, which show the construction of $I_\varphi^*$ and $U_\varphi^*$ on the domain $[0,1] \times [0,1]$.

Fig. 1.4.1.3. The construction of $I_\varphi^*$

Fig. 1.4.1.4. The construction of $U_\varphi^*$



It is obvious that $U_\varphi^*$ is monotonic on the sets $W_1$, $W_2$ and $W_3$, and because of the well-known relation $\min\{a,b\} \le \max\{a,b\}$ the monotonicity of $I_\varphi^*$ on each place of $[0,1] \times [0,1]$ is concluded. ■

As consequences of the theorem we have:

• $I_\varphi^*$ serves as a counter-example to the equivalence of the axiom system of Klir and Folger and the system of T-operators.

• $I_\varphi^*$ is a commutative semigroup operation on $[0,1]$ with no identity element.

• $U_\varphi^*$ is a commutative semigroup operation on $[0,1]$. ■

1.4.2 Modified Generalized Operations

As a modification of these operators, the following non-monotonic operations are defined, which are commutative semigroup operations with identity elements.

Definition 1.4.2.1. Let $A$ and $B$ be two fuzzy subsets of the universe of discourse $X$ and denote by $f_A$ and $f_B$ their elementary entropy functions, respectively. The membership function of the minimum entropy generalized intersection, denoted by $I_f^{\min} = I_f^{\min}(A,B)$, is defined as
$$\mu_{I_f^{\min}}: x \mapsto \begin{cases} \mu_A(x), & \text{if } f_A(x) < f_B(x) \\ \mu_B(x), & \text{if } f_B(x) < f_A(x) \\ \min(\mu_A(x), \mu_B(x)), & \text{if } f_A(x) f_B(x) = 0 \ \text{or} \ f_A(x) = f_B(x) \end{cases} \qquad (1.4.2.1)$$

Example 1.4.2.1. Let $f_A = \varphi_A$ and $f_B = \varphi_B$ be the elementary entropy functions defined by the conventional fuzzy complement (see Example 1.3.2.1). In this case the geometrical representation of the minimum fuzziness T-norm can be seen in Fig. 1.4.2.1.

Fig. 1.4.2.1. The membership function of $I_f^{\min}$

Definition 1.4.2.2. Let $A$ and $B$ be two fuzzy subsets of the universe of discourse $X$ and denote by $f_A$ and $f_B$ their elementary entropy functions, respectively. The membership function of the maximum entropy generalized union, denoted by $U_f^{\max} = U_f^{\max}(A,B)$, is defined as
$$\mu_{U_f^{\max}}: x \mapsto \begin{cases} \mu_A(x), & \text{if } f_A(x) > f_B(x) \\ \mu_B(x), & \text{if } f_B(x) > f_A(x) \\ \max(\mu_A(x), \mu_B(x)), & \text{if } f_A(x) f_B(x) = 0 \ \text{or} \ f_A(x) = f_B(x) \end{cases} \qquad (1.4.2.2)$$

Example 1.4.2.2. Let $f_A = \varphi_A$ and $f_B = \varphi_B$ be the elementary entropy functions defined by the conventional fuzzy complement (see Example 1.3.2.1). In this case the geometrical representation of the maximum fuzziness T-conorm can be seen in Fig. 1.4.2.2.
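A pointwise sketch of the modified operators (again with $f_A = \varphi_A$ under the conventional complement; names ours). The extra condition $f_A(x) f_B(x) = 0$ is what restores the identity-element behaviour at the crisp grades, cf. axioms C.1.a and C.2.a in the theorem below:

```python
def f(mu):
    """Elementary entropy for the conventional complement (e_p = 0.5)."""
    return min(mu, 1.0 - mu)

def i_min(mu_a, mu_b):
    """Minimum entropy generalized intersection (Eq. 1.4.2.1)."""
    if f(mu_a) * f(mu_b) == 0.0 or f(mu_a) == f(mu_b):
        return min(mu_a, mu_b)           # crisp grade involved, or tie: min
    return mu_a if f(mu_a) < f(mu_b) else mu_b

def u_max(mu_a, mu_b):
    """Maximum entropy generalized union (Eq. 1.4.2.2)."""
    if f(mu_a) * f(mu_b) == 0.0 or f(mu_a) == f(mu_b):
        return max(mu_a, mu_b)           # crisp grade involved, or tie: max
    return mu_a if f(mu_a) > f(mu_b) else mu_b

# 1 is an identity element for the intersection, 0 for the union:
assert i_min(1.0, 0.3) == 0.3 and i_min(0.0, 0.3) == 0.0
assert u_max(0.0, 0.3) == 0.3 and u_max(1.0, 0.3) == 1.0
```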

Definition 1.4.2.3. Define the sets $W_1$, $W_2$ and $W_3$ as follows:
$$W_1 = \{x \mid f_A(x) < f_B(x)\ \&\ \mu_A(x) < \mu_B(x), \ \text{or} \ f_A(x) > f_B(x)\ \&\ \mu_A(x) > \mu_B(x)\}$$
$$W_2 = \{x \mid f_A(x) < f_B(x)\ \&\ \mu_A(x) > \mu_B(x), \ \text{or} \ f_A(x) > f_B(x)\ \&\ \mu_A(x) < \mu_B(x)\}$$
$$W_3 = \{x \mid f_A(x) = f_B(x) \ \text{or} \ f_A(x) f_B(x) = 0\}$$

Lemma 1.4.2.1. If $f_A(\varphi_A(x)) \le f_B(\varphi_B(x))$, $x \in X$, then the following hold:

(1) $\mu_A(x) \le \mu_B(x)$, so that $\mu_A(x), \mu_B(x) \le e_p$, or $\mu_A(x) \le e_p$ and $\mu_B(x) > e_p$;

or

(2) $\mu_A(x) \ge \mu_B(x)$, so that $\mu_A(x), \mu_B(x) > e_p$, or $\mu_A(x) > e_p$ and $\mu_B(x) \le e_p$.



Proof. Suppose that $f_A(\varphi_A(x)) \le f_B(\varphi_B(x))$, $x \in X$.

• If $\mu_A(x), \mu_B(x) \le e_p$ then $\mu_A(x) = \varphi_A(x)$ and $\mu_B(x) = \varphi_B(x)$, so because of the monotonicity of $f$ we obtain $\mu_A(x) \le \mu_B(x)$.
• If $\mu_A(x) \le e_p$ and $\mu_B(x) > e_p$ then certainly $\mu_A(x) < \mu_B(x)$.
• If $\mu_A(x)$ and $\mu_B(x) > e_p$ then $\varphi_A(x) = C(\mu_A(x))$ and $C(\mu_B(x)) = \varphi_B(x)$, hence as a consequence of the monotonicity of $f$ we have $\mu_A(x) \ge \mu_B(x)$.
• If $\mu_A(x) > e_p$ and $\mu_B(x) \le e_p$ then obviously $\mu_A(x) > \mu_B(x)$. ■

Fig. 1.4.2.2. The membership function of $U_f^{\max}$

Theorem 1.4.2.1. The membership functions of $I_f^{\min}$ and $U_f^{\max}$ can be expressed in terms of the membership functions of the conventional min and max T-operations as follows:
$$\mu_{I_f^{\min}} = \begin{cases} \min(\mu_A(x), \mu_B(x)), & \text{if } x \in W_1 \\ \min(\mu_A(x), \mu_B(x)), & \text{if } x \in W_3 \\ \max(\mu_A(x), \mu_B(x)), & \text{if } x \in W_2 \end{cases} \qquad (1.4.2.3)$$
$$\mu_{U_f^{\max}} = \begin{cases} \max(\mu_A(x), \mu_B(x)), & \text{if } x \in W_1 \\ \max(\mu_A(x), \mu_B(x)), & \text{if } x \in W_3 \\ \min(\mu_A(x), \mu_B(x)), & \text{if } x \in W_2 \end{cases} \qquad (1.4.2.4)$$



Proof. It will be proved that Eq. 1.4.2.3 and Eq. 1.4.2.4 give the same results as Definition 1.4.2.1 and Definition 1.4.2.2, respectively. Denoting by $I_f'$ and $U_f'$ the operations defined by Eq. 1.4.2.3 and Eq. 1.4.2.4, we have to prove that $I_f' = I_f^{\min}$ and $U_f' = U_f^{\max}$.

(1) If $\varphi_A(x) = 0$ or $\varphi_B(x) = 0$ then the statement is obvious.

(2) Assume that $x \in W_1$, $f_A(\varphi_A(x)) \le f_B(\varphi_B(x))$ and $\mu_A(x) < \mu_B(x)$. Then according to Lemma 1.4.2.1 two cases can be distinguished:
(a) $\mu_A(x) < \mu_B(x) < e_p$, or
(b) $\mu_A(x) \le e_p$, $\mu_B(x) > e_p$ and $\mu_A(x) = \varphi_A(x) \le C(\mu_B(x)) = \varphi_B(x)$.
In both cases $I_f = \mu_A(x) = I_f'$ and $U_f = \mu_B(x) = U_f'$, which was to be proved.

(3) Suppose now that $x \in W_1$, $f_A(\varphi_A(x)) > f_B(\varphi_B(x))$ and $\mu_A(x) > \mu_B(x)$. Using a method similar to case (2) it is easy to see that the statements hold.

(4) Assume that $x \in W_2$, $f_A(\varphi_A(x)) \le f_B(\varphi_B(x))$ and $\mu_A(x) > \mu_B(x)$. According to Lemma 1.4.2.1 we have
(a) $\mu_A(x) > \mu_B(x) > e_p$, or
(b) $\mu_A(x) > e_p$, $\mu_B(x) < e_p$ and $C(\mu_A(x)) = \varphi_A(x) < \mu_B(x) = \varphi_B(x)$.
Then in both cases $I_f = \mu_A(x) = I_f'$ and $U_f = \mu_B(x) = U_f'$, and this proves the statements.

(5) Suppose now that $x \in W_2$, $f_A(\varphi_A(x)) > f_B(\varphi_B(x))$ and $\mu_A(x) < \mu_B(x)$. By similar arguments as in case (4) the statements can be obtained easily. ■

Theorem 1.4.2.2. $I_f^{\min}$ and $U_f^{\max}$ have the following properties.

$I_f^{\min}$:
C.1.a $I_f^{\min}(0,x) = 0$, $I_f^{\min}(1,x) = x$,
C.1.b $I_f^{\min}$ is commutative,
C.1.c $I_f^{\min}$ is associative,
C.1.d $I_f^{\min}$ is monotonic on each place of $(0,1) \times (0,1)$.

$U_f^{\max}$:
C.2.a $U_f^{\max}(0,x) = x$, $U_f^{\max}(1,x) = 1$,
C.2.b $U_f^{\max}$ is commutative,
C.2.c $U_f^{\max}$ is associative,
C.2.d $U_f^{\max}$ is monotonic on the non-overlapping sets $W_1$, $W_2$ and $W_3$.



Proof. The proof can be carried out analogously to the proof of Theorem 1.4.1.2. ■

As consequences of the theorem we have:

• $I_f^{\min}$ is a commutative semigroup operation on $[0,1]$ with the identity element 1.

• $U_f^{\max}$ is a commutative semigroup operation on $[0,1]$.

1.4.3 Construction of Non-Monotone Operators

It was shown that the new generalized operators can be constructed by using the conventional max and min operations. These constructions gave the idea of the definition of the following operators. It will be seen that this class of operators satisfies only weaker axioms of T-operators, as defined below.

Definition 1.4.3.1. The operators satisfying the axioms of T-operators given by Definition 1.2.2.1, with the exception of monotonicity, which is weakened to

A1.c** $I$ is monotonic on $(0,1) \times (0,1)$,
A2.c** $U$ is monotonic on $(0,1) \times (0,1)$,

are said to be W-operators.

Definition 1.4.3.2. Let $A$ and $B$ be two fuzzy subsets of the universe of discourse $X$ and denote by $\mu_A$ and $\mu_B$ their membership functions, respectively. Denote by $T$ and $T^*$ a T-norm and a T-conorm, respectively. The W-norm is defined as
$$I_W = I_W(A,B) = \{(x, \mu_{I_W}(x)) \mid x \in X,\ \mu_{I_W}(x) \in [0,1]\},$$
$$\mu_{I_W}(x) = \begin{cases} T(\mu_A(x), \mu_B(x)), & \text{if } \mu_A(x)\mu_B(x) = 0 \ \text{or} \ \mu_A(x) = 1 \ \text{or} \ \mu_B(x) = 1 \\ T^*(\mu_A(x), \mu_B(x)), & \text{otherwise} \end{cases} \qquad (1.4.3.1)$$

Definition 1.4.3.3. Let $A$ and $B$ be two fuzzy subsets of the universe of discourse $X$ and denote by $\mu_A$ and $\mu_B$ their membership functions, respectively. Denote by $T$ and $T^*$ a T-norm and a T-conorm, respectively. The W-conorm is then defined as
$$U_W = U_W(A,B) = \{(x, \mu_{U_W}(x)) \mid x \in X,\ \mu_{U_W}(x) \in [0,1]\},$$
$$\mu_{U_W}(x) = \begin{cases} T^*(\mu_A(x), \mu_B(x)), & \text{if } \mu_A(x)\mu_B(x) = 0 \ \text{or} \ \mu_A(x) = 1 \ \text{or} \ \mu_B(x) = 1 \\ T(\mu_A(x), \mu_B(x)), & \text{otherwise} \end{cases} \qquad (1.4.3.2)$$



Definition 1.4.3.4. Define the set $W$ as follows:
$$W = \{x \mid \mu_A(x)\mu_B(x) = 0 \ \text{or} \ \mu_A(x) = 1 \ \text{or} \ \mu_B(x) = 1\}$$

Theorem 1.4.3.1. $I_W$ and $U_W$ are W-operators.

Proof. $I_W$ and $U_W$ are T-operations on the sets $W$ and $\bar{W} = [0,1] \times [0,1] - W$, so all of the axioms are satisfied except monotonicity; but both of them are monotonic on $(0,1) \times (0,1)$. ■

W-operators constructed from the conventional min and max T-operators are given in the following.

Definition 1.4.3.5. Let $A$ and $B$ be two fuzzy subsets of the universe of discourse $X$ and denote by $\mu_A$ and $\mu_B$ their membership functions, respectively. The W-norm is defined as
$$I_W = I_W(A,B) = \{(x, \mu_{I_W}(x)) \mid x \in X,\ \mu_{I_W}(x) \in [0,1]\},$$
$$\mu_{I_W}(x) = \begin{cases} \min(\mu_A(x), \mu_B(x)), & \text{if } \mu_A(x)\mu_B(x) = 0 \ \text{or} \ \mu_A(x) = 1 \ \text{or} \ \mu_B(x) = 1 \\ \max(\mu_A(x), \mu_B(x)), & \text{otherwise} \end{cases} \qquad (1.4.3.3)$$

Definition 1.4.3.6. Let $A$ and $B$ be two fuzzy subsets of the universe of discourse $X$ and denote by $\mu_A$ and $\mu_B$ their membership functions, respectively. The W-conorm (generalized union) is then defined as
$$U_W = U_W(A,B) = \{(x, \mu_{U_W}(x)) \mid x \in X,\ \mu_{U_W}(x) \in [0,1]\},$$
$$\mu_{U_W}(x) = \begin{cases} \max(\mu_A(x), \mu_B(x)), & \text{if } \mu_A(x)\mu_B(x) = 0 \ \text{or} \ \mu_A(x) = 1 \ \text{or} \ \mu_B(x) = 1 \\ \min(\mu_A(x), \mu_B(x)), & \text{otherwise} \end{cases} \qquad (1.4.3.4)$$

In a similar manner, new entropy-based generalized intersections and unions are defined.

Definition 1.4.3.7. Let $A$ and $B$ be two fuzzy subsets of the universe of discourse $X$ and denote by $f_A$ and $f_B$ their elementary entropy functions, respectively. The membership function of the maximum entropy generalized intersection, denoted by $I_f^{\max} = I_f^{\max}(A,B)$, is defined as
$$\mu_{I_f^{\max}}: x \mapsto \begin{cases} \min(\mu_A(x), \mu_B(x)), & \text{if } f_A(x) f_B(x) = 0 \ \text{or} \ f_A(x) = f_B(x) \\ \mu_{U_f^{\max}}(\mu_A(x), \mu_B(x)), & \text{otherwise} \end{cases} \qquad (1.4.3.5)$$



Definition 1.4.3.8. Let $A$ and $B$ be two fuzzy subsets of the universe of discourse $X$ and denote by $f_A$ and $f_B$ their elementary entropy functions, respectively. The membership function of the minimum entropy generalized union, denoted by $U_f^{\min} = U_f^{\min}(A,B)$, is defined as
$$\mu_{U_f^{\min}}: x \mapsto \begin{cases} \max(\mu_A(x), \mu_B(x)), & \text{if } f_A(x) f_B(x) = 0 \ \text{or} \ f_A(x) = f_B(x) \\ \mu_{I_f^{\min}}(\mu_A(x), \mu_B(x)), & \text{otherwise} \end{cases} \qquad (1.4.3.6)$$

Theorem 1.4.3.2. $I_f^{\max}$ and $U_f^{\min}$ have the following properties.

$I_f^{\max}$:
D.1.a $I_f^{\max}(0,x) = 0$, $I_f^{\max}(1,x) = x$,
D.1.b $I_f^{\max}$ is commutative,
D.1.c $I_f^{\max}$ is associative,
D.1.d $I_f^{\max}$ is monotonic on the non-overlapping sets $W_1$, $W_2$ and $W_3$.

$U_f^{\min}$:
D.2.a $U_f^{\min}(0,x) = x$, $U_f^{\min}(1,x) = 1$,
D.2.b $U_f^{\min}$ is commutative,
D.2.c $U_f^{\min}$ is associative,
D.2.d $U_f^{\min}$ is monotonic on each place of $(0,1) \times (0,1)$.

Proof. The proof can be carried out analogously to the proof of Theorem 1.4.1.2. ■

As consequences of the theorem we have:

• $I_f^{\max}$ is a commutative semigroup operation on $[0,1]$ with the identity element 1.

• $U_f^{\min}$ is a commutative semigroup operation on $[0,1]$.

Definition 1.4.3.9. Let $A$ and $B$ be two fuzzy subsets of the universe of discourse $X$ and denote by $f_A$ and $f_B$ their elementary entropy functions, respectively. The minimum fuzziness generalized T-norm is defined as
$$T_f^{\min} = T_f^{\min}(A,B) = \{(x, \mu_{T_f^{\min}}(x)) \mid x \in X,\ \mu_{T_f^{\min}}(x) \in [0,1]\},$$
$$\mu_{T_f^{\min}}: x \mapsto \begin{cases} \mu_A(x), & \text{if } f_A(x) < f_B(x) \\ \mu_B(x), & \text{if } f_B(x) < f_A(x) \\ T(\mu_A(x), \mu_B(x)), & \text{if } f_A(x) f_B(x) = 0 \ \text{or} \ f_A(x) = f_B(x) \end{cases} \qquad (1.4.3.7)$$



where T(A,B) is an arbitrary T-norm.

Definition 1.4.3.10. Let $A$ and $B$ be two fuzzy subsets of the universe of discourse $X$ and denote by $f_A$ and $f_B$ their elementary entropy functions, respectively. The maximum fuzziness generalized T-conorm is defined as
$$S_f^{\max} = S_f^{\max}(A,B) = \{(x, \mu_{S_f^{\max}}(x)) \mid x \in X,\ \mu_{S_f^{\max}}(x) \in [0,1]\},$$
$$\mu_{S_f^{\max}}: x \mapsto \begin{cases} \mu_A(x), & \text{if } f_A(x) > f_B(x) \\ \mu_B(x), & \text{if } f_B(x) > f_A(x) \\ S(\mu_A(x), \mu_B(x)), & \text{if } f_A(x) f_B(x) = 0 \ \text{or} \ f_A(x) = f_B(x) \end{cases} \qquad (1.4.3.8)$$

where $S(A,B)$ is the T-conorm associated to $T$.

Definition 1.4.3.11. Let $A$ and $B$ be two fuzzy subsets of the universe of discourse $X$ and denote by $f_A$ and $f_B$ their elementary entropy functions, respectively. The maximum fuzziness generalized T-norm is defined as
$$T_f^{\max} = T_f^{\max}(A,B) = \{(x, \mu_{T_f^{\max}}(x)) \mid x \in X,\ \mu_{T_f^{\max}}(x) \in [0,1]\},$$
$$\mu_{T_f^{\max}}: x \mapsto \begin{cases} \min(\mu_A(x), \mu_B(x)), & \text{if } f_A(x) f_B(x) = 0 \ \text{or} \ f_A(x) = f_B(x) \\ \mu_{S_f^{\max}}(\mu_A(x), \mu_B(x)), & \text{otherwise} \end{cases} \qquad (1.4.3.9)$$

Definition 1.4.3.12. Let $A$ and $B$ be two fuzzy subsets of the universe of discourse $X$ and denote by $f_A$ and $f_B$ their elementary entropy functions, respectively. The minimum fuzziness generalized T-conorm is defined as
$$S_f^{\min} = S_f^{\min}(A,B) = \{(x, \mu_{S_f^{\min}}(x)) \mid x \in X,\ \mu_{S_f^{\min}}(x) \in [0,1]\},$$
$$\mu_{S_f^{\min}}: x \mapsto \begin{cases} \max(\mu_A(x), \mu_B(x)), & \text{if } f_A(x) f_B(x) = 0 \ \text{or} \ f_A(x) = f_B(x) \\ \mu_{T_f^{\min}}(\mu_A(x), \mu_B(x)), & \text{otherwise} \end{cases} \qquad (1.4.3.10)$$

1.4.4 Minimally and Maximally Certain Operators

By using the concept of the elementary certainty function, equivalent definitions of the maximum and minimum entropy generalized operations can be given.

Definition 1.4.4.1. Let $A$ and $B$ be two fuzzy subsets of the universe of discourse $X$ and denote by $\zeta_A$ and $\zeta_B$ their elementary certainty functions, respectively. The membership function of the minimally certain generalized intersection, denoted by $I_\zeta^{\min} = I_\zeta^{\min}(A,B)$, is defined as
$$\mu_{I_\zeta^{\min}}: x \mapsto \begin{cases} \mu_A(x), & \text{if } \zeta_A(x) < \zeta_B(x) \\ \mu_B(x), & \text{if } \zeta_B(x) < \zeta_A(x) \\ \min(\mu_A(x), \mu_B(x)), & \text{if } f_A(x) f_B(x) = 0 \ \text{or} \ \zeta_A(x) = \zeta_B(x) \end{cases} \qquad (1.4.4.1)$$

Definition 1.4.4.2. Let $A$ and $B$ be two fuzzy subsets of the universe of discourse $X$ and denote by $\zeta_A$ and $\zeta_B$ their elementary certainty functions, respectively. The membership function of the maximally certain generalized union, denoted by $U_\zeta^{\max} = U_\zeta^{\max}(A,B)$, is defined as
$$\mu_{U_\zeta^{\max}}: x \mapsto \begin{cases} \mu_A(x), & \text{if } \zeta_A(x) > \zeta_B(x) \\ \mu_B(x), & \text{if } \zeta_B(x) > \zeta_A(x) \\ \max(\mu_A(x), \mu_B(x)), & \text{if } f_A(x) f_B(x) = 0 \ \text{or} \ \zeta_A(x) = \zeta_B(x) \end{cases} \qquad (1.4.4.2)$$

Theorem 1.4.4.1. $I_\zeta^{\min}$ and $U_\zeta^{\max}$ are equivalent to the operations $I_f^{\max}$ and $U_f^{\min}$, respectively.

Proof. Based on the definitions of the operators the statements are obvious. ■

Definition 1.4.4.3. Let $A$ and $B$ be two fuzzy subsets of the universe of discourse $X$ and denote by $\zeta_A$ and $\zeta_B$ their elementary certainty functions, respectively. The membership function of the maximally certain generalized intersection, denoted by $I_\zeta^{\max} = I_\zeta^{\max}(A,B)$, is defined as
$$\mu_{I_\zeta^{\max}}: x \mapsto \begin{cases} \min(\mu_A(x), \mu_B(x)), & \text{if } f_A(x) f_B(x) = 0 \ \text{or} \ \zeta_A(x) = \zeta_B(x) \\ \mu_{U_\zeta^{\max}}(\mu_A(x), \mu_B(x)), & \text{otherwise} \end{cases} \qquad (1.4.4.3)$$

Definition 1.4.4.4. Let $A$ and $B$ be two fuzzy subsets of the universe of discourse $X$ and denote by $\zeta_A$ and $\zeta_B$ their elementary certainty functions, respectively. The membership function of the minimally certain generalized union, denoted by $U_\zeta^{\min} = U_\zeta^{\min}(A,B)$, is defined as
$$\mu_{U_\zeta^{\min}}: x \mapsto \begin{cases} \max(\mu_A(x), \mu_B(x)), & \text{if } f_A(x) f_B(x) = 0 \ \text{or} \ \zeta_A(x) = \zeta_B(x) \\ \mu_{I_\zeta^{\min}}(\mu_A(x), \mu_B(x)), & \text{otherwise} \end{cases} \qquad (1.4.4.4)$$

Theorem 1.4.4.2. $I_\zeta^{\max}$ and $U_\zeta^{\min}$ are equivalent to the operations $I_f^{\min}$ and $U_f^{\max}$, respectively.

Proof. Based on the definitions of the operators the statements are obvious. ■



2 Design of Fuzzy Logic Controllers

Simulations have been carried out so as to determine the effects of the new generalized operations on the performance of fuzzy controllers. Gupta and Qi [4] used some typical T-operators in a simple fuzzy logic controller based on their proposed fuzzy control algorithm, and the effects of these T-operators on the controller's performance were studied. First, in order to make our results comparable, their SISO (single-input single-output) controller was utilized.

2.1 Simulation Environment in the Case of Single-Input Single-Output Systems

The architecture of the simple fuzzy controller used can be seen in Fig. 2.1.1.

Fig. 2.1.1. The architecture of the controller

The actual error and the actual change in error at a sampling instant were calculated as follows:

$$e(n) = SE \cdot [x(n) - y(n)] \qquad (2.1.1)$$

$$\Delta e(n) = SDE \cdot [y(n) - y(n-1)] \qquad (2.1.2)$$

where $x(n) = \text{const.}$ is the reference input, $y(n)$ is the output at the $n$th sampling instant, and $SE$, $SDE$ are scaling factors. The values were chosen as $x(n) = 3$, $SE = 0.5$, $SDE = 0.5$.

2.1.1 The Inference Method

Fuzzy control systems are essentially mappings between inputs and outputs of a fuzzy controller. Such mappings are given by rules of the form



'IF $A_i$ AND $B_i$ THEN $C_i$', in which $A_i$ and $B_i$ are fuzzy subsets defined on the universal set $E$ of inputs (error) and on the universal set $\Delta E$ of inputs (change in error), respectively, while $C_i$ is a fuzzy subset defined on the universe $\Delta U$ of changes of output. These conditional statements are represented by the implication function $\mu_{R_i}(e, \Delta e, \Delta u)$ (2.1.3). Each of these rules is combined with the others by the ELSE connective to yield an overall fuzzy relation $R$.

In Mamdani-type conventional fuzzy logic controllers the AND and ELSE connectives are realized by Zadeh's min and max operations, respectively. The aim of our investigation was to compare the performance of fuzzy controllers using different T-operators. Denoting the generalized intersection and the generalized union by $I(\cdot)$ and $U(\cdot)$, respectively, the following generalized Mamdani implication function was used:

$$\mu_{R_i}(e, \Delta e, \Delta u) = I\big(I(\mu_{A_i}(e),\ \mu_{B_i}(\Delta e)),\ \mu_{C_i}(\Delta u)\big) \qquad (2.1.4)$$

The overall fuzzy relation $R$ is

$$\mu_R(e, \Delta e, \Delta u) = \overset{n}{\underset{i=1}{U}}\ \mu_{R_i}(e, \Delta e, \Delta u) \qquad (2.1.5)$$

Now, if at a particular time instant the actual error, the actual change in error and the actual controller output take on the fuzzy values $E'$, $\Delta E'$ and $\Delta U'$, respectively, then the generalized form of the compositional rule of inference is given by

$$\mu_{\Delta U'}(\Delta u) = \bigvee_{e,\, \Delta e} I\big(I(\mu_{E'}(e),\ \mu_{\Delta E'}(\Delta e)),\ \mu_R(e, \Delta e, \Delta u)\big) \qquad (2.1.6)$$

If $i = j$ then each $I_i$ works together with its own $U_i$, while in the case $i \ne j$ mixed operators are used. Following Gupta and Qi, the purpose of this method is to investigate how each couple affects the performance of the fuzzy controller.
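To make the inference pipeline concrete, here is a compact numerical sketch under simplifying assumptions of our own: discretized universes, triangular-style membership shapes (the exact shapes of Fig. 2.1.2 are not reproduced), the conventional pair $I = \min$, $U = \max$, and singleton observations. All names are ours:

```python
import numpy as np

E  = np.linspace(-1.0, 1.0, 21)   # error universe
DE = np.linspace(-1.0, 1.0, 21)   # change-in-error universe
DU = np.linspace(-1.0, 1.0, 21)   # change-of-output universe

neg  = lambda z: np.clip(-z, 0.0, 1.0)             # assumed membership shapes
pos  = lambda z: np.clip(z, 0.0, 1.0)
zero = lambda z: np.clip(1.0 - np.abs(z), 0.0, 1.0)

# The four rules of Section 2.1.3 as (A_i, B_i, C_i) triples.
rules = [(neg, neg, neg), (neg, pos, zero), (pos, neg, zero), (pos, pos, pos)]

# Overall relation R (Eq. 2.1.5): rules built with I = min (Eq. 2.1.4),
# combined by the ELSE connective realized as max.
R = np.zeros((E.size, DE.size, DU.size))
for A, B, C in rules:
    R = np.maximum(R, np.minimum.outer(np.minimum.outer(A(E), B(DE)), C(DU)))

def infer(e_val, de_val):
    """Compositional rule of inference (Eq. 2.1.6) for singleton observations."""
    i = int(np.abs(E - e_val).argmin())
    j = int(np.abs(DE - de_val).argmin())
    return R[i, j, :]                 # the inferred fuzzy set mu_dU'(du)

mu_du = infer(0.5, -0.2)
```

Swapping any pair from Table 2.2.1 into this sketch only changes the two operator arguments; the rest of the pipeline is unchanged.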

2.1.2 Fuzzification

Seven fuzzy sets were used to represent the control variables: negative error (NE), positive error (PE), negative change in error (NΔE), positive change in error (PΔE), negative change in input (NΔU), zero change in input (UU) and positive change in input (PΔU). Their membership functions are given in Fig. 2.1.2.



Fig. 2.1.2. The membership functions of the fuzzy sets

2.1.3 Rule Base

The rule base was selected as follows:

• If $e$ is negative and $\Delta e$ is negative then $\Delta u$ is negative

• If $e$ is negative and $\Delta e$ is positive then $\Delta u$ is zero

• If $e$ is positive and $\Delta e$ is negative then $\Delta u$ is zero

• If $e$ is positive and $\Delta e$ is positive then $\Delta u$ is positive.

2.1.4 Defuzzification

For defuzzification, the method of 'Centre of Gravity' was chosen, so the process input $\Delta u(n)$ is calculated as

$$\Delta u(n) = \frac{\sum \mu_{\Delta U'}(\Delta u) \cdot \Delta u}{\sum \mu_{\Delta U'}(\Delta u)} \qquad (2.1.7)$$

where $\mu_{\Delta U'}(\Delta u)$ is the membership function of the actual change in the process input.
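A minimal sketch of the centre-of-gravity step of Eq. 2.1.7 (the discretization and the example output set are our own assumptions):

```python
import numpy as np

def defuzzify_cog(mu, grid):
    """Centre of gravity: membership-weighted mean over the output universe."""
    total = mu.sum()
    if total == 0.0:
        return 0.0                              # no rule fired; neutral default (our choice)
    return float((mu * grid).sum() / total)

DU = np.linspace(-1.0, 1.0, 21)                 # discretized output universe
mu = np.clip(1.0 - np.abs(DU - 0.4), 0.0, 1.0)  # an example inferred fuzzy set
print(defuzzify_cog(mu, DU))                    # close to 0.4
```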

2.2 Simulation Results

On the basis of the work of Gupta and Qi, only four typical operations were chosen from the possible set of T-operations, as listed in the first four rows of Table 2.2.1. For the simulation three different plants were used:

$$y' + y = u,$$
$$y'' + y' + y^2 = u,$$
$$y'' + y' + \ln y = u.$$



Table 2.2.1. Generalized operations used in simulation

No   Generalized Intersection I      Generalized Union U              Negation
1    $\min(u, v)$                    $\max(u, v)$                     $1 - u$
2    $uv$                            $u + v - uv$                     $1 - u$
3    $\max(u + v - 1,\ 0)$           $\min(u + v,\ 1)$                $1 - u$
4    $uv / (u + v - uv)$             $(u + v - 2uv) / (1 - uv)$       $1 - u$
5    $I_\varphi^{\min}$              $U_\varphi^{\max}$               $1 - u$
6    $I_\varphi^{\max}$              $U_\varphi^{\min}$               $1 - u$

The results of the simulation can be seen in Figs. 2.2.1-2.2.3. Each curve in a figure represents the response of the plant that uses the indicated generalized intersection and generalized union. The results were compared by using the following two criteria (a short code sketch of both follows the list):

• find the first $I$ for which $|y(i) - x(i)| < \varepsilon$ for all $i > I$ (from this point on referred to as the 'duration of the transient behaviour'),

• find the total sum of the squares of the errors, $H = \sum_i (y(i) - x(i))^2$ (from this point on referred to as the 'integrated quadratic error').
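Both criteria are straightforward to compute; a small sketch with hypothetical response data (all names ours):

```python
def transient_duration(y, x, eps=0.01):
    """Smallest I such that |y(i) - x| < eps for all i > I."""
    I = 0
    for i, yi in enumerate(y):
        if abs(yi - x) >= eps:
            I = i
    return I

def integrated_quadratic_error(y, x):
    """H = sum over i of (y(i) - x)^2."""
    return sum((yi - x) ** 2 for yi in y)

y = [0.0, 1.8, 2.7, 3.1, 3.02, 3.005, 3.001, 3.0]   # hypothetical step response
print(transient_duration(y, x=3.0))                 # 4
print(integrated_quadratic_error(y, x=3.0))
```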

The generalized operations defined by Definition 1.4.1.1 and Definition 1.4.1.2 did not provide stable control, so the simulation results using these operators are omitted.

Fig. 2.2.1. Outputs of the second order nonlinear plant, $y'' + y' + \ln y = u$



Fig. 2.2.2. Outputs of the first order linear plant, $y' + y = u$

Fig. 2.2.3. Outputs of the second order nonlinear plant, $y'' + y' + y^2 = u$

Tables 2.2.2-2.2.4 contain the simulation results. In each of the cells the first number is $I$ and the second one is $H$. During the simulation the following restrictions were used:

a) in each case the same individual membership functions, identical to those of the paper by Gupta and Qi, were used;

b) the same linguistic rules were applied;

c) in each case the same positive limit $\varepsilon = 0.01$ was used for defining the 'duration of the transient phase' $I$;

d) the difference between the examples considered consists exclusively in the difference between the norms applied and in that of the systems' dynamics.



Table 2.2.2. Values of I and H (each cell: I, H) in the case of $y'' + y' + \ln y = u$

        U5           U6           U1           U4           U2           U3
I1    10, 8.16     21, 9.78     10, 8.16     10, 9.3      10, 9.48      9, 2.62
I2    13, 8.04     15, 9.54     13, 8.04     18, 8.88     18, 8.88     18, 8.94
I4    12, 8.1      22, 9.54     12, 8.1      11, 9.12     10, 9.18     10, 9.24
I5    32, 13.86    37, 15.36    41, 14.4     46, 15.54    49, 15.36    49, 15.18
I6    32, 13.86    37, 15.36    41, 14.4     46, 15.54    49, 15.36    49, 15.18
I3   149, 810     149, 810     149, 810     149, 810     149, 810     149, 810

Table 2.2.3. Values of I and H (each cell: I, H) in the case of $y' + y = u$

        U5           U6           U1           U4           U2           U3
I2    29, 18.66    24, 15.12    29, 18.66    48, 30.6     50, 31.74    52, 32.94
I4    51, 32.64    52, 32.82    51, 32.64    87, 57.78    94, 60.0    100, 64.2
I5    49, 31.14    64, 40.74    71, 45.54    80, 51.12    87, 55.74    93, 56.64
I6    49, 31.14    64, 40.74    71, 45.54    80, 51.52    87, 55.74    93, 56.64
I1    65, 40.74    77, 48.84    65, 40.74   110, 70.2    124, 78.6    179, 113.4
I3   149, 564     149, 564     149, 564     149, 564     149, 564     149, 564

Table 2.2.4. Values of I and H (each cell: I, H) in the case of $y'' + y' + y^2 = u$

        U5           U6           U1           U4           U2           U3
I2    43, 18.6     35, 15.18    43, 18.6     73, 30.54    76, 31.68    79, 32.94
I4    78, 32.52    80, 32.94    78, 32.52   136, 55.44   146, 59.22   149, 63
I5    73, 31.14    96, 40.68   107, 43.36   121, 50.7    132, 55.08   142, 58.98
I6    73, 31.14    96, 40.68   107, 43.36   121, 50.7    132, 55.08   142, 58.98
I1   100, 40.8    119, 48.66   100, 40.8    149, 68.4    149, 75.6    149, 103.2
I3   149, 214.8   149, 214.8   149, 214.8   149, 214.8   149, 214.8   149, 214.8



As can well be seen in Tables 2.2.2-2.2.4, the use of the norm I3 was found to be generally 'disadvantageous'. The pairs pertaining to the 'optimal' solutions are summarized in Table 2.2.5.

Table 2.2.5. The optimal solutions found

                                         $y''+y'+\ln y=u$      $y'+y=u$              $y''+y'+y^2=u$
Optimal pair w.r.t. I   Cross coupling   (I1, U5), I = 10      (I2, U6), I = 24      (I2, U6), I = 35
                        Regular coupling (I1, U1), I = 10      (I5, U5), I = 49      (I5, U5), I = 73
Optimal pair w.r.t. H   Cross coupling   (I2, U5), H = 8.04    (I2, U6), H = 15.12   (I2, U6), H = 15.18
                        Regular coupling (I1, U1), H = 8.16    (I5, U5), H = 31.14   (I5, U5), H = 31.14

On the basis of the results of the computations it can be stated that the 'optimal' pairing of the operators cannot be independent of the criteria set for describing the optimum conditions. It also depends on the dynamics of the particular system to be controlled. It is also clear that in several important cases the new entropy-based operators appear among the optimal pairs. Generally it is also reasonable to check the possibility of using them, especially before initiating the mass production of a certain product.



References

[1] Assilian, S., Mamdani, E.: An experiment in linguistic synthesis with a fuzzy logic controller. Int. J. Man-Machine Studies 7, 1-13 (1974)

[2] De Luca, A., Termini, S.: A definition of a nonprobabilistic entropy in the setting of fuzzy sets theory. Information and Control 20, 301-312 (1972)

[3] Gupta, M.M., Qi, J.: Theory of T-norms and fuzzy inference. Fuzzy Sets and Systems 40, 431-450, North-Holland (1991)

[4] Gupta, M.M., Qi, J.: Design of fuzzy logic controllers based on generalized T-operators. Fuzzy Sets and Systems 40, 473-489, North-Holland (1991)

[5] Kaufmann, A.: Introduction to the Theory of Fuzzy Subsets. Academic Press, New York (1975)

[6] Klir, G.J., Folger, T.A.: Fuzzy Sets, Uncertainty, and Information. Prentice-Hall International Editions (1988)

[7] Knopfmacher, J.: On measures of fuzziness. J. Math. Analysis and Applications 49, 529-534 (1975)

[8] Loo, S.G.: Measures of fuzziness. Cybernetica 20, 201-210 (1977)

[9] Menger, K.: Statistical metrics. Proc. Nat. Acad. Sci. 28, 535-537 (1942)

[10] Yager, R.R.: On the measure of fuzziness and negation. Part I: Membership in the unit interval. Int. J. General Systems 8, 169-180 (1982)


Intelligent Fuzzy System Modeling

I. Burhan Türkşen

Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, Ontario M5S 3G8, Canada
[email protected]

Abstract. A systems modeling approach is proposed with a unification of fuzzy methodologies. Three knowledge representation schemas are presented together with the corresponding approximate reasoning methods. Unsupervised learning of fuzzy sets and rules is reviewed together with recent developments in fuzzy cluster analysis techniques. The resultant fuzzy sets are determined with a Euclidean distance-based similarity view of membership functions. Finally, an intelligent fuzzy system model development is proposed with proper learning in order to adapt to an actual system performance output. In this approach, connectives are not chosen a priori but learned with iterative training depending on a given data set.

1. Introduction

In this paper, intelligent system model developments are discussed with a unification of fuzzy knowledge representation and inference. In particular, a fuzzy system is a set of rules that maps inputs to outputs. The set of rules defines a many-to-many map $F: X \to Y$. The set consists of linguistic 'IF ... THEN' rules of the form 'IF X isr A THEN Y isr B', where $X$, $Y$ are input and output base variables, respectively, of a system, and $A$ and $B$ are corresponding fuzzy sets that contain the elements $x \in X$ and $y \in Y$ to some degree (Zadeh 1965), i.e.,

$$A = \{(x, \mu_A(x)) \mid x \in X,\ \mu_A(x) \in [0,1]\} \quad \text{and} \quad B = \{(y, \mu_B(y)) \mid y \in Y,\ \mu_B(y) \in [0,1]\}$$

and 'isr' is a shorthand notation that may stand for 'is contained in', 'is a member of', 'is compatible with', 'belongs to', 'is a possible member of', etc., depending on the context (Zadeh 1996), where $\mu$ is a membership function. Membership functions are interpreted with various views. A brief summary of these views is included in the sequel.

Fuzzy rules represent our approximate knowledge, extracted from a system's input-output analyses and identified by combinations of membership functions in the input-output space $X \times Y$. They define information granules (Zadeh 1996), or subsets of the state space. Less certain, more imprecise rules specify large information granules. More certain and more precise rules define smaller




information granules. In practice, however, we do not know the exact shape of the information granules, i.e., the membership functions, or the structure of the rules that connect the inputs to the outputs of a system. Hence, we ask an expert or implement supervised or unsupervised learning techniques to find and/or modify fuzzy information granules as closely as we can for a given system with the available expertise and/or sample data. The best learning schemes ideally provide a good coverage of the optimal information granules of a given system's hidden, inherent rules.

Learning schemes may be either unsupervised clustering techniques (Bezdek 1981; Kandel 1982; Sugeno and Yasukawa 1993; Nakanishi et al. 1993; Emami et al. 1996) or supervised learning with respect to a target optimality criterion (Emami et al. 1996). Unsupervised learning methods are faster, but they produce a first level of approximation in knowledge representation. Usually a good sample data set is satisfactory for this purpose. Supervised learning methods require more knowledge of a system and/or a large sample data set in order to compute a more accurate representation, and often require orders of magnitude more training cycles.

Fuzzy rule bases have a natural explanatory power embedded within their structure and semantics, i.e., in their meaning representation via fuzzy set membership functions. Fuzzy systems do not admit proofs of concepts in traditional senses.

In this paper, we discuss supervised learning of intelligent fuzzy system models where the parameters of the model are learned with training in order to minimize a performance measure, such as the model response error.

1.1 Fuzzy Model Systems

Here, we review briefly two essential components of a fuzzy system model which are: (i) a knowledge base consisting of a set of linguistic rules and (ii) an inference engine that executes an approximate reasoning algorithm.

A linguistic rule "IF X isr A THEN Y isr B" of a fuzzy knowledge base may be interpreted as an information granule depending on our knowledge of a system's characteristics in at least three different ways as follows:

a) Conjunctive Reasoning Perspective: It defines a Cartesian product $A \times B$. This is the standard conjunctive model (Zadeh 1973, 1975) that is used most often in fuzzy control systems. At times, this is known as Mamdani's (1974) heuristic. b) Disjunctive Reasoning Perspective: It defines an implication between $A$ and $B$, i.e., $A \to B$, depending on whether it is a strong implication $S$ or a residual implication $R$.

The interpretations (a) and (b) are myopic views of fuzzy theory known as Type I fuzzy theory. They are based on the reductionist view that the combination of two Type I fuzzy sets produces a Type I fuzzy set; i.e., $\mu_A: X \to [0,1]$, $\mu_B: Y \to [0,1]$ such that for each interpretation stated above we have $\mu_{A \times B}: X \times Y \to [0,1]$ for case (a), and $\mu_{A \to B}: X \times Y \to [0,1]$ for case (b).



c) Type II Fuzzy Set Perspective: It is based on the realization that the combination of two Type I fuzzy sets produces a Type II fuzzy set (Türkşen 1986, 1995, 1996); i.e., $\mu_A: X \to [0,1]$, $\mu_B: Y \to [0,1]$ such that $\mu_{A \to B}: X \times Y \to P[0,1]$, where $P[0,1]$ is the power set of $[0,1]$.

Now, from the perspective of approximate reasoning, each of the three interpretations stated above provides a different solution when faced with a new system observation $A'(x)$. When a new system state is observed, say $A'(x)$, with these three interpretations of knowledge representation, three different consequences may be obtained when we are interested in providing decision support for a system diagnosis, prediction or control. These are:

$$\text{a)} \quad \mu_{B'}(y) = \bigvee_{x \in X} \mu_{A'}(x)\, T\, \mu_A(x)\, T\, \mu_B(y), \quad y \in Y$$

$$\text{b)} \quad \mu_{B'}(y) = \bigvee_{x \in X} \mu_{A'}(x)\, T\, \big[\mu_{CNF(A \to B)}(x, y)\big], \quad y \in Y$$

$$\text{c)} \quad \mu_{B'}(y) \in \Big[\bigvee_{x \in X} \mu_{A'}(x)\, T\, \big[\mu_{FDNF(A \to B)}(x, y)\big],\ \bigvee_{x \in X} \mu_{A'}(x)\, T\, \big[\mu_{FCNF(A \to B)}(x, y)\big]\Big], \quad y \in Y$$

where $\mu_{B'}(y)$, $\forall y \in Y$, is a consequence of an inference result that may be obtained for each of the three cases of approximate reasoning for a given rule 'IF $x \in X$ isr $A$, THEN $y \in Y$ isr $B$' and a given observation $A'(x)$, where the rule is interpreted in accordance with one of the three perspectives stated above, $\vee$ is the maximum operator, and $T$ is a t-norm to be selected from the infinitely many possibilities of t-norms introduced by Schweizer and Sklar (1983); $\mu_{A'}(x)$, $x \in X$, is the observed membership value, and $\mu_A(x), \mu_B(y) \in [0,1]$, $x \in X$, $y \in Y$, are membership values of the input and output variables, respectively. In addition, $\mu_A(x)\, T\, \mu_B(y)$ is the interpolation-heuristic-based approximate knowledge representation, whereas $\mu_{CNF(A \to B)}(x,y)$ is the membership value of $A \to B$, which is a direct and common use of the Boolean conjunctive normal form, CNF, of implication. Hence, approximate reasoning with cases a) and b) are the result of the myopic view of knowledge representation in fuzzy set and logic theory (Türkşen 1986, 1995).

But approximate reasoning with case c) is based on the non-reductionist interpretation stated above. It is worthwhile to emphasize that $\mu_{FDNF(A \to B)}(x,y)$ and $\mu_{FCNF(A \to B)}(x,y)$ are the boundaries of the Type II 'interval-valued' fuzzy set representation of our knowledge. It should be recalled that the Fuzzy Disjunctive Normal Form, FDNF, of $A \to B$ and the Fuzzy Conjunctive Normal Form, FCNF, of $A \to B$ represent bounds on the representation of membership values of $A \to B$. This is because $FDNF(\cdot) \ne FCNF(\cdot)$ for all the combined concepts in fuzzy theory, whereas $DNF(\cdot) = CNF(\cdot)$ in two-valued theory (Türkşen 1986, 1995).

Since most of the current fuzzy expert systems are constructed either with case a) or b) representation and inference, in Section 3 of this paper, we present a unique knowledge representation and inference approach that unifies a) and b) interpretations of fuzzy knowledge representation and inference. Type II fuzzy



knowledge representation and inference is not discussed in this paper, for it would make the presentation rather lengthy (see Türkşen 1995, 1996).

1.2 Membership Functions

Since Zadeh (1965) introduced fuzzy sets the main difficulties have been with the meaning and measurement of membership functions as well as their extraction, modification and adaptation to dynamically changing conditions.

1.2.1 Meaning of Membership

Particularly, the lack of a consensus on the meaning of membership functions has created some confusion. This confusion is neither bizarre nor unsound. However, this cloud of confusion has already been dispelled with a rigorous semantics and practical elicitation methods for membership functions (Bilgiç and Türkşen 1996).

In general, there are various interpretations as to where fuzziness might arise from. Depending on the interpretation of fuzziness one subscribes to, the meaning attached to the membership function changes. It is the objective of this subsection to review very briefly the various interpretations of membership functions.

We first start with the formal (i.e., mathematical) definition of a membership function. A fuzzy (sub)set, say $F$, has a membership function $\mu_F$, defined as a function from a well-defined universe (the referential set) $X$ into the unit interval: $\mu_F: X \to [0,1]$ (Zadeh 1965).

Thus, the vague predicate 'temperature ($x = 35°C$) is high (H)' for a summer day is represented by a number in the unit interval, $\mu_H(x) \in [0,1]$. There are several possible answers to the question 'What does it mean to say $\mu_H(x) = 0.7$?':

likelihood view: 70% of a given population declares that the temperature value of 35°C is high.
random set view: 70% of a given population describes 'high' to be in an interval containing the given temperature value, 35°C.
similarity view: the given temperature value, 35°C, is away from the prototypical temperature value, say 45°C, which is considered truly 'high', to the degree 0.3 (a normalized distance).
utility view: 0.7 is the utility of asserting that the given temperature value, 35°C, is high.
measurement view: when compared to other temperatures, the given temperature value, 35°C, is higher than some, and this fact can be encoded as 0.7 on some scale.

It is important to realize that each of these interpretations is a hypothesis about where fuzziness arises from, and each interpretation suggests a calculus for manipulating membership functions. The calculus of fuzzy sets as described by Zadeh (1965, 1973) and his followers is sometimes appropriate as the calculus of fuzziness, but sometimes inappropriate, depending on the interpretation.



1.2.2 Grade of Membership

When someone is introduced to fuzzy set theory, the concept of the grade of membership sounds fairly intuitive, since it is just an extension of a well-known concept. That is, one extends the notion of membership in a set to 'a grade of membership' in a set. However, this extension is quite a bit more demanding: 'How can a grade of membership be measured?' This question has been considered in the context of many-valued sets by many people from different disciplines.

Although Aristotle commented on an 'indeterminate membership value' more than two thousand years ago, interest in the formal aspects of many-valued sets started in the early 1900s (McCall and Ajdukiewicz 1967; Rosser and Turquette 1977). But the meaning of multiple membership values has not been explained to satisfaction. For some, this is sufficient to discard many-valued sets altogether (Kneale 1962; French 1984). On the other hand, intellectual curiosity never let go of the subject (Scott 1976).

Part of the confusion arises from the fact that in two-valued theory both the membership assignments $\{0, 1\}$ to a set and the truth value assignments $\{F, T\}$ to a proposition are taken to be the same without loss of generality. But this creates a problem in infinite-valued theory. As explained recently (Türkşen 1996), the graded membership assignments in a fuzzy set must be separated and distinguished from the truthhood assignments given to a fuzzy proposition.

In general, anyone who is to use fuzzy sets must answer the following three questions: (i) What does graded membership mean? (ii) How is it measured? (iii) What operations are meaningful to perform on it?

To answer the first question one has to subscribe to a certain view of fuzziness. There have mainly been two trends in the interpretations of fuzziness: those who think that fuzziness is subjective as opposed to objective, and those who think that fuzziness stems from an individual's use of words in a language as opposed to a group of people's use of words (or individual sensor readings versus a group of sensor readings, etc.).

Both the likelihood and the random set views of the membership function implicitly assume that there is more than one evaluator or that experiments are repeated. Therefore, if one thinks of membership functions as "meaning representation", these views come close to the claim that "meaning is essentially objective" and that fuzziness arises from inconsistency or errors in measurement. On the other hand, during the initial phases of the development of fuzzy sets, it was widely accepted that membership functions are subjective and context dependent (Zadeh 1965, 1975). The similarity and utility views of the membership function differ from the others in espousing a subjective interpretation. The measurement view is applicable to both the subjective and objective views, in the sense that the problem can be defined in both ways depending on the observer(s) making the comparison. The comparisons can be the results of subjective evaluations or the results of "precise" (or idealized) measurements (Bilgic and Türkşen 1996).


2. Fuzzy Clustering

An intuitive approach to objective rule generation is based upon fuzzy clustering of input-output data. One simple and applicable idea, especially for systems with a large number of input variables, was suggested by Sugeno and Yasukawa (1993), further discussed by Nakanishi et al. (1993), and modified by Emami et al. (1996). In these approaches, we first cluster only the output space, which can always be considered a single-dimensional space. The fuzzy partition of the input space is specified in the next step by projecting the output clusters onto each input variable space separately. With this method, the rule generation step is separated from the input selection step.

The idea of fuzzy clustering is to divide the output data into fuzzy clusters which overlap with each other. The degree to which each datum belongs to each cluster is therefore given by a membership grade in [0,1]. In formal terms, clustering the unlabeled data X = {x_1, x_2, ..., x_N} ⊂ R^h, where N is the number of data vectors and h is the dimension of each data vector, is the assignment of c cluster labels to the vectors in X. The c clusters of X are sets with c·N membership values {u_ik} that can be conveniently arranged as a (c×N) matrix U = [u_ik]. The problem of fuzzy clustering is to find the optimal membership matrix U for fuzzy clustering in X. The most widely used objective function is the weighted within-groups sum of squared errors J_m, which is defined via the following constrained optimization problem (Bezdek 1981; Kandel 1982):

\min J_m(U, V) = \sum_{k=1}^{N} \sum_{i=1}^{c} (u_{ik})^m \,\lVert x_k - v_i \rVert_A^2   (1)

where 0 \le u_{ik} \le 1, \forall i, k; \quad \forall k, \exists i such that u_{ik} > 0;

0 < \sum_{k=1}^{N} u_{ik} < N, \forall i; \quad \sum_{i=1}^{c} u_{ik} = 1, \forall k;

V = {v_1, v_2, ..., v_c} is the vector of (unknown) cluster centers, and \lVert x \rVert_A = \sqrt{x^T A x} is any inner product norm. A is an h×h positive definite matrix which specifies the shape of the clusters. The common selection for the matrix A is the identity matrix, leading to the Euclidean distance and consequently to spherical clusters. There are, however, investigations where the matrix A is taken as the covariance matrix, which generates models with elliptic clusters (Kosko 1997). It should be noted that equation (1) fits into the similarity view of membership.
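To make equation (1) concrete, the following sketch evaluates J_m for a given membership matrix and set of prototypes. It is an illustration only, not the authors' code; the function name and the NumPy dependency are our own choices.

```python
import numpy as np

def fcm_objective(X, U, V, m=2.0, A=None):
    """Evaluate J_m of equation (1): the weighted within-groups
    sum of squared errors for memberships U (c x N), data X (N x h),
    and prototypes V (c x h) under the inner-product norm A."""
    N, h = X.shape
    A = np.eye(h) if A is None else A          # identity -> Euclidean distance
    J = 0.0
    for i in range(len(V)):
        d = X - V[i]                           # rows are x_k - v_i
        dist2 = np.einsum('nh,hg,ng->n', d, A, d)   # ||x_k - v_i||_A^2
        J += np.sum((U[i] ** m) * dist2)
    return J
```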

In the current literature, most fuzzy clustering studies are carried out with the Fuzzy C-Means (FCM) algorithm through an iterative optimization of (1) (see Appendix I).

But the FCM clustering algorithm suffers from three major difficulties, which are usually treated through heuristics for the specific problem at hand:

1) The number of clusters (c) should be assigned a priori.
2) No theoretical basis for an optimal choice of the weighting exponent "m" has been established to date.
3) The conditions specified in the FCM algorithm lead to the determination of local extrema of J_m. Different choices of the initial V_0 might lead to different local extrema. Therefore, knowledge of proper initial locations of the cluster centers should be available a priori.

In order to complete the systematic methodology of fuzzy system identification and modeling, it is necessary to find generic solutions to the above problems. For this purpose we propose a theoretical basis for getting a better handle on the first two problems stated above. It is experimentally validated that the proposed approach produces a more efficient strategy than other solutions, as explained next.

2.1 Modifications of FCM Algorithm

In an attempt to derive a generic criterion for the assignment of the number of clusters from a theoretical point of view, we first introduce a generalization of the scatter criteria used to express the compactness and separation of hard clusters (Duda and Hart 1973; Bezdek et al. 1980), in the following manner (Emami et al. 1996):

Fuzzy within-cluster scatter matrix:

S_W = \sum_{i=1}^{c} \sum_{k=1}^{N} (u_{ik})^m (x_k - v_i)(x_k - v_i)^T   (2)

Fuzzy between-cluster scatter matrix:

S_B = \sum_{i=1}^{c} \Big( \sum_{k=1}^{N} (u_{ik})^m \Big) (v_i - \bar{v})(v_i - \bar{v})^T   (3)

where both the fuzzy total mean vector, \bar{v}, and each cluster center, v_i, are weighted with the membership values u_{ik} and are defined as:

v_i = \frac{\sum_{k=1}^{N} (u_{ik})^m x_k}{\sum_{k=1}^{N} (u_{ik})^m}, \qquad \bar{v} = \frac{\sum_{i=1}^{c} \sum_{k=1}^{N} (u_{ik})^m x_k}{\sum_{i=1}^{c} \sum_{k=1}^{N} (u_{ik})^m}   (4)

It should be noted that in the Sugeno and Yasukawa (1993) model, \bar{v} is not a weighted mean of the data.

Secondly, it is known that for "m" in the range (1, ∞), larger "m" generates "fuzzier" fuzzy sets, i.e., there are larger overlaps among the fuzzy sets.

It was found (Emami et al. 1996) that the trace of S_T = S_W + S_B decreases monotonically from a constant value K to zero as "m" varies from one to infinity. K depends only on the data set:

K = \mathrm{tr}\Big( \sum_{k=1}^{N} (x_k - \bar{x})(x_k - \bar{x})^T \Big)   (5)

where \bar{x} is the grand mean of the data.

Therefore, we suggest the following principle for a "good" clustering of a data set: a suitable value of "m" should be one that gives a value of tr(S_T) around K/2.

A suitable value of "m" should be determined experimentally with case data. Thirdly, in order to efficiently obtain a preference for the initial locations of the cluster prototypes, one should implement an Agglomerative Hierarchical Clustering (AHC) algorithm as an introductory procedure to find a suitable guess for the initial locations of the cluster prototypes for the FCM algorithm. For example, Ward's method (1963) (see Appendix II) could be used with the unlabeled data X = {x_1, x_2, ..., x_N}, or any other agglomerative hierarchical clustering algorithm (Kaufman and Rousseeuw 1990). The result of the AHC algorithm is c hard clusters for the data, which is a good start for the fuzzy clustering procedure; an implementation sketch is given below. With this method, we can choose the initial prototypes without any a priori knowledge of the data. This approach is much more efficient than random searches among different initial guesses.
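A minimal sketch of this initialization step, assuming SciPy's standard Ward linkage as the AHC procedure; the helper name is illustrative, not from the paper.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def initial_prototypes(X, c):
    """Cut a Ward dendrogram at c hard clusters (the role of Appendix II)
    and return the cluster means as initial FCM prototypes V0."""
    Z = linkage(X, method='ward')                     # agglomerative merge tree
    labels = fcluster(Z, t=c, criterion='maxclust')   # hard labels 1..c
    return np.array([X[labels == i].mean(axis=0) for i in range(1, c + 1)])
```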

With the values of "m" and "c" determined by the proposed modifications of the FCM, one then applies the FCM algorithm (see Appendix I) in order to find the fuzzy clusters.

2.2 Input Selection

The input selection phase of fuzzy system identification aims to find the most dominant input variables among a finite number of input candidates. A combinatorial approach was proposed by Sugeno and Yasukawa (1993) and implemented by Nakanishi et al. (1993), in which all combinations of input candidates are considered and the combinations of input variables which minimize a specific regularity criterion are selected. The main drawback of this approach is that, for a system with a large number of input candidates, a large number of combinations must be considered.

One can solve this input selection problem in a simpler and more efficient fashion. In the proposed method, the membership functions are constructed for each input candidate through fuzzy clustering in such a way that those points which have output membership grades close or equal to one in each cluster are assigned the same input membership grades. Consequently, convex input membership functions are directly derived for all input candidates. As a result, those input variables which do not have a dominant effect on the output will have membership grades equal to one all across their domain. Since "one" is the neutral element for t-norm operators, the ineffective input candidates with membership grades of "one" over their entire range can be canceled from the fuzzy rules, as sketched below.
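The cancellation rule above can be sketched as follows; the `memberships` structure and the tolerance are hypothetical placeholders for the projected input membership functions.

```python
import numpy as np

def effective_inputs(memberships, tol=0.05):
    """Keep only input candidates whose projected membership function
    actually constrains the rule: a grade of ~1 over the whole domain
    is neutral for any t-norm and can be dropped from the rules."""
    return [j for j, mu in enumerate(memberships) if np.min(mu) < 1.0 - tol]
```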


3. Intelligent Fuzzy Model

Zadeh's proposal (1975) of the linguistic approach as a model of human thinking introduced "fuzziness" into systems theory. Fuzzy system models are the outcome of such a linguistic approach. This idea was further developed by Tong (1979), Pedrycz (1984), Sugeno and Yasukawa (1993), and Yager and Filev (1994).

The current literature on modeling and reasoning with fuzzy systems is essentially concentrated on Multi-Input Single-Output (MISO) systems. For this reason, we will consider in this section a fuzzy rule base that contains MISO rules as follows:

R1: IF X1 isr A11 AND X2 isr A12 AND ... AND Xr isr A1r THEN Y isr B1
R2: IF X1 isr A21 AND X2 isr A22 AND ... AND Xr isr A2r THEN Y isr B2
...
Rn: IF X1 isr An1 AND X2 isr An2 AND ... AND Xr isr Anr THEN Y isr Bn

where X1, ..., Xr are input variables and Y is the output variable; A_ij, i = 1, ..., n, j = 1, ..., r, and B_i, i = 1, ..., n, are fuzzy sets of the universes of discourse X1, X2, ..., Xr and Y, respectively.

Next to the generation of a fuzzy rule base, the second essential component of fuzzy modeling is the reasoning mechanism. In current methods of fuzzy modeling, the connectives of the inference mechanism are selected a priori, before the identification procedure, without any theoretical basis. In order to improve the objectivity of fuzzy modeling, one needs to introduce an intelligent parameterized formulation of the reasoning process in fuzzy systems. For this purpose, four reasoning parameters, p, q, α and β, are introduced whose values cause a continuous range of variation in the reasoning mechanism. Therefore, we are no longer restricted to stay at the extremes in any step of the reasoning process. Consequently, rather than selecting the connectives a priori for fuzzy modeling, the fuzzy model adjusts the reasoning process, and hence the selection of its connectives, by adjusting these parameters according to the input-output data. A fast algorithm for the calculation of the parameterized family of triangular functions is introduced for this purpose.

3.1 Approximate Reasoning

In approximate reasoning we have essentially two methods: 1. First Infer Then Aggregate (FITA), and 2. First Aggregate Then Infer (FATI) (Türkşen and Tian 1993).

In the FITA method, we need to:
a) determine the fuzzy aggregation of the antecedents in each rule (AND connective);
b) determine the implication relation for each individual rule (IF-THEN connective);
c) make an inference with each rule of the set of rules, using the observed input;
d) aggregate the fuzzy outputs obtained in (c) above to get the final fuzzy output (ALSO connective);
e) defuzzify the final fuzzy output;

whereas in the FATI method, steps (c) and (d) are modified as follows:
c') aggregate all the rules of the rule base to determine the overall rule (ALSO connective);
d') make an inference with the overall rule using the observed input.

It can be shown that the two methods of inference with a rule set, i.e., FATI and FITA, always give the same fuzzy output for a singleton input (Emami et al. 1996). Moreover, we propose a reasoning formulation that combines Mamdani's and formal logical approximate reasoning, i.e., two myopic reasoning approaches. This then forms the foundation of an intelligent parameterized fuzzy reasoning formulation. It should be recalled that in the introduction section we identified Mamdani's reasoning as case a) and formal logical reasoning as case b). Both of these are in Type I theory and hence are myopic.

3.2 Crisp Connectives of Fuzzy Theory

In order to cover the various types of set operators used in fuzzy set theory, several parameterized families of t-norms (T) and t-conorms (S) have been suggested, among which the parametric Schweizer and Sklar (1983) operators are chosen for our modeling:

T(a, b) = 1 - \big[ (1-a)^p + (1-b)^p - (1-a)^p (1-b)^p \big]^{1/p}

S(a, b) = \big[ a^p + b^p - a^p b^p \big]^{1/p}, \quad p > 0   (6)

The main features of this parameterized family of t-norms and t-conorms are that they are analytically simple and symmetric, and that as p varies continuously over the positive real numbers, the family covers a full range of t-norms and t-conorms.

Analogously to classical set theory, the De Morgan laws establish a link between union and intersection via complementation. The conjoint t-norms and t-conorms introduced in (6) are duals when considered with the standard negation c(a) = 1 - a (Ruan and Kerre 1993).
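A small sketch of the Schweizer-Sklar pair (6) together with a numerical check of the De Morgan duality just mentioned; written in Python purely for illustration.

```python
import numpy as np

def ss_tnorm(a, b, p):
    """Schweizer-Sklar t-norm of equation (6), p > 0."""
    return 1.0 - ((1 - a)**p + (1 - b)**p - (1 - a)**p * (1 - b)**p) ** (1.0 / p)

def ss_tconorm(a, b, p):
    """Schweizer-Sklar t-conorm of equation (6), p > 0."""
    return (a**p + b**p - a**p * b**p) ** (1.0 / p)

# De Morgan duality under the standard negation c(a) = 1 - a:
a, b, p = 0.3, 0.8, 1.6
assert np.isclose(ss_tnorm(a, b, p), 1.0 - ss_tconorm(1 - a, 1 - b, p))
```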

Although t-norm and t-conorm functions were defined as binary operators on [0,1], their associativity property allows them to be extended to n-ary operations as:

T_n: [0,1]^n → [0,1], \quad S_n: [0,1]^n → [0,1], and

T_n(a_1, a_2, ..., a_n) = T(T_{n-1}(a_1, a_2, ..., a_{n-1}), a_n), and similarly for S_n,


where T(·) and S(·) are the binary operators. It has been proved that the n-ary operators T_n and S_n satisfy properties similar to those of the original binary T and S (Ruan and Kerre 1993; Emami et al. 1996).

The extension of these operators to n arguments can be computed with a fast algorithm (see Appendix III) that has computational complexity of O(n) (Emami et al. 1996).

That is, the original formulation, i.e.,

S_n(a_1, a_2, ..., a_n) = \Big[ \sum_{i} a_i^p - \sum_{i<j} a_i^p a_j^p + \sum_{i<j<k} a_i^p a_j^p a_k^p - \cdots \pm \prod_{i} a_i^p \Big]^{1/p}   (7)

is transformed into a fast computational algorithm as follows:

S_n(a_1, a_2, ..., a_n) = \Big[ a_1^p + (1 - a_1^p)\big[ a_2^p + (1 - a_2^p)\big[ \cdots \big[ a_{n-2}^p + (1 - a_{n-2}^p)\big[ a_{n-1}^p + (1 - a_{n-1}^p)\, a_n^p \big] \big] \cdots \big] \big] \Big]^{1/p}   (8)

where for both formulations T(·) is computed via the De Morgan duality:

T(a_1, a_2, ..., a_n) = 1 - S(1 - a_1, 1 - a_2, ..., 1 - a_n)   (9)

It should be noted that equation (7) has exponential complexity, i.e., O(2^n), whereas the modified equation (8) has complexity O(n).
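The nested form (8) is a Horner-style recursion; a direct transcription (ours, for illustration) follows, mirroring the steps of Appendix III.

```python
def ss_tconorm_n(a, p):
    """O(n) evaluation of the n-ary Schweizer-Sklar t-conorm via the
    nested form (8), avoiding the O(2^n) inclusion-exclusion of (7)."""
    eta = [x**p for x in a]          # Step 1: eta_i = a_i^p
    s = eta[-1]                      # Step 2: start from the innermost term
    for e in reversed(eta[:-1]):     # Step 3: s = eta_i + (1 - eta_i) * s
        s = e + (1.0 - e) * s
    return s ** (1.0 / p)            # Step 4: final 1/p power
```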

3.3 Implication and Aggregation

Within the theory of approximate reasoning, each fuzzy rule is of the form:

R_i: IF X1 isr A_i1 AND ... AND Xr isr A_ir THEN Y isr B_i   (10)

where R_i is a fuzzy relation defined on the Cartesian product universe X1 × X2 × ... × Xr × Y.


In agreement with the discussion presented in Section 3.1, t-norm operators are used to define the conjunctions in the antecedent of a multi-input rule. Furthermore, the modeling of the implication relation is not unique in fuzzy logics. The two extreme paradigms for forming the implication relation are the conjunctive and disjunctive methods, which were identified as cases a) and b) in the introduction section. Under the conjunctive method, the fuzzy relation R_ci is simply the conjunction of the antecedent and consequent spaces. Therefore:

R_ci(x_1, x_2, ..., x_r, y) = T(T'(A_i1(x_1), A_i2(x_2), ..., A_ir(x_r)), B_i(y))   (11)

where R_ci(x_1, x_2, ..., x_r, y), A_ij(x_j), and B_i(y) are the membership grades of R_ci, A_ij, and B_i; T is the t-norm operator (with parameter p) for the rule implication, and T' is the t-norm operator (with parameter q) for the antecedent aggregation. On the other side, the disjunctive method is obtained directly by generalizing the material implication defined in classical set theory as

A → B ≡ Ā ∪ B, where Ā is the standard negation (complement) of A. Therefore, we have:

R_di(x_1, x_2, ..., x_r, y) = S(c(T'(A_i1(x_1), A_i2(x_2), ..., A_ir(x_r))), B_i(y))
= S(S'((1 - A_i1(x_1)), (1 - A_i2(x_2)), ..., (1 - A_ir(x_r))), B_i(y))   (12)

where S and S' are t-conorm operators with parameters p and q, respectively.

The selection of the rule aggregation function depends on the selection of the implication function for the individual rules (Türkşen and Tian 1993). For the conjunctive method (equation 11), the aggregation of the rules should be with a Maximum operator; in other words, the ALSO connective should be the Maximum operator. Letting x = (x_1, x_2, ..., x_r), we have, more compactly:

R_c(x, y) = \bigvee_i R_ci(x, y)   (13)

This method of aggregation was introduced by Mamdani (1974) and originates with Zadeh's conjunctive reasoning (1973). On the other hand, if each basic proposition is regarded as "[X isr Ā_i] ∪ [Y isr B_i]", which is the disjunctive method, then the knowledge "(X → Y) isr R" should be aggregated with the Minimum operator; in other words, the ALSO connective is a Minimum operator. In compact form, we have

R_d(x, y) = \bigwedge_i R_di(x, y)   (14)

We call this method formal logical myopic approximate reasoning.
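To illustrate (11)-(14) concretely, the sketch below builds the rule relations on discretized universes for a single-input rule base, with min/max standing in for the parameterized T and S (an assumption made purely for brevity, not the paper's fitted operators).

```python
import numpy as np

def conjunctive_relation(A_i, B_i):
    """R_ci(x, y) = T(A_i(x), B_i(y)) with T = min (cf. equation 11)."""
    return np.minimum.outer(A_i, B_i)

def disjunctive_relation(A_i, B_i):
    """R_di(x, y) = S(1 - A_i(x), B_i(y)) with S = max (cf. equation 12)."""
    return np.maximum.outer(1.0 - A_i, B_i)

def aggregate(rules_A, rules_B):
    """ALSO connective: max over conjunctive relations (13),
    min over disjunctive relations (14)."""
    Rc = np.max([conjunctive_relation(a, b) for a, b in zip(rules_A, rules_B)], axis=0)
    Rd = np.min([disjunctive_relation(a, b) for a, b in zip(rules_A, rules_B)], axis=0)
    return Rc, Rd
```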


3.4 Inference with a Rule Set

Consider a Single-Input Single-Output (SISO) system. Given the relationship "(X → Y) isr R" and the observation that X belongs to a fuzzy set A', the problem of inference is to find the fuzzy value of Y. According to Zadeh's Compositional Rule of Inference (CRI) (1973), the membership function of the output is defined as:

F(y) = \bigvee_x \big[ T''(A'(x), R(x, y)) \big]   (15)

in which \bigvee_x denotes the maximum over all values of x ∈ X.

It is known that the two different methods of reasoning, i.e., FITA and FATI, with a rule base R lead, in general, to two different reasoning results (Türkşen and Lucas 1991), because t-norm and t-conorm operators are not distributive in general.

This holds whether the inference is executed with Mamdani's approximate reasoning, i.e.:

F_C(y) = \bigvee_x \big[ T''(A'(x), R_c(x, y)) \big]   (16)

or with formal logical myopic approximate reasoning, i.e.:

F_D(y) = \bigvee_x \big[ T''(A'(x), R_d(x, y)) \big]   (17)

For this reason, it is necessary first to combine all the rules to get the relation R of the set of rules and then to compose the combined aggregate rule with the fuzzy input A'(x). This method, called FATI, is quite inefficient in terms of computation time and memory (Türkşen and Tian 1993).

However, in most applications of fuzzy modeling and control the input x* is crisp, and then the input fuzzy set A'(x*) is a fuzzy singleton. In this case, it can be shown that the distributivity property holds for any t-norm and t-conorm family (Emami et al. 1996) (see Appendix IV). Hence, it is possible to fire each single rule first, calculate the individual fuzzy reasoning outputs F_i, and finally aggregate the fuzzy outputs of all the rules to obtain the inferred fuzzy output F(y). This method, called FITA, is computationally more efficient than FATI but equivalent to it.

For multi-input single-output systems, the antecedent of each rule i is a conjunction of r fuzzy sets A_i1, A_i2, ..., A_ir. For a crisp input x* = (x_1*, x_2*, ..., x_r*) and a given rule i:

R_i: IF X1 isr A_i1 AND ... AND Xr isr A_ir THEN Y isr B_i   (18)

we need to compute:

τ_i = T'(A_i1(x_1*), A_i2(x_2*), ..., A_ir(x_r*))   (19)

τ_i is called the Degree Of Firing (DOF) of rule i. Since x = x* is a singleton, i.e., A'(x*) = 1, it can be shown that the fuzzy output membership function may be computed by the two different methods, FITA and FATI, in both reasoning mechanisms as:

F_C(y) = \bigvee_i \bigvee_x R_ic(x*, y) = \bigvee_x \bigvee_i R_ic(x*, y) = \bigvee_i R_ic(x*, y)   (20)

F_D(y) = \bigwedge_i \bigvee_x R_id(x*, y) = \bigvee_x \bigwedge_i R_id(x*, y) = \bigwedge_i R_id(x*, y)   (21)

With these preliminaries, the intelligent reasoning mechanism is introduced as a linear combination of the two extremes, i.e., the myopic Mamdani and formal logical methods:

(22)
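The DOF computation (19) and the singleton-input outputs (20)-(21) can be sketched as follows, with product and probabilistic sum assumed for the t-norm and t-conorm (illustrative choices, not the paper's fitted operators). The combined mechanism (22) is not reproduced here, since its exact parameterized form is the paper's own.

```python
import numpy as np

def fita_singleton(A_vals, B_grids):
    """A_vals[i][j] = A_ij(x_j*): antecedent grades at the crisp input;
    B_grids[i] = B_i(y) sampled on a shared y-grid.
    Returns the Mamdani output F_C and the logical output F_D."""
    taus = [np.prod(a) for a in A_vals]       # degree of firing tau_i (19)
    Fc = np.max([t * B for t, B in zip(taus, B_grids)], axis=0)            # (20)
    Fd = np.min([(1 - t) + t * B for t, B in zip(taus, B_grids)], axis=0)  # (21)
    return Fc, Fd
```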

3.4.1 Defuzzification

Yager and Filev (1994) suggest a general defuzzification method, based on the probabilistic nature of the selection process among the values of a fuzzy set, called the Basic Defuzzification Distribution (BADD) method:

y_B^* = \frac{\int_Y y \, F^{\alpha}(y) \, dy}{\int_Y F^{\alpha}(y) \, dy}   (23)

As is well known, the BADD method is essentially a family of defuzzification methods parameterized by α. By varying α continuously over the positive reals, it is possible to obtain more appropriate mappings from the fuzzy set to the crisp value, depending on the system behavior.
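A sketch of BADD on a sampled output universe (our own helper, with the exponent written as alpha as in (23)):

```python
import numpy as np

def badd_defuzzify(y_grid, F, alpha=1.0):
    """BADD (equation 23): centroid of F(y)**alpha. alpha = 1 gives the
    ordinary centroid; large alpha approaches mean-of-maxima."""
    w = F ** alpha
    return np.trapz(y_grid * w, y_grid) / np.trapz(w, y_grid)
```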

4. Case Study

A recent case study is briefly discussed in this section to demonstrate the effectiveness of the intelligent knowledge representation and inference model presented in the previous sections of this paper.

Real life process data was provided by an industrial company. It consists of 93 input-output data vectors. Each data vector has 46 input variables and one output variable.


First, the modified clustering method discussed in Section 2.1, identified as the ETG method, is applied to the data. After proper analyses of the cluster validity index as a function of c within the reliable domain, it is found that m = 2.5 with c = 7 are suitable values for the industrial data. Next, the most important input variables that affect the output significantly are determined among all of the 46 input variables. It is found that the variables x4, x9, x10, x11, x12, x17, x18, x20, x21, x28, x34, x44, x45 have a significant effect on the output variable, which is the throughput efficiency measure of the process. It is found that there are seven effective rules, rule 0 to rule 6, that control the hidden behaviour of this industrial process.

The intelligent fuzzy inference model of this industrial system is developed next. The combined Mamdani and logical myopic model with parameters p = 1.6190, q = 0.8438, α = 8.2935 and β = 0.001 produces the best fit for this industrial process, with a mean error of 0.0886 on the test data. The comparison of the combined model output and the actual system output is shown in Fig. 1.

[Figure 1 appears here: a plot of the output variable versus the data index, comparing the model output (dashed line) with the measured data (solid line).]

Fig. 1. Comparison of the combined Mamdani and myopic logical fuzzy model (dashed line) and real data (solid line) of the industrial process


5. Conclusions

In this paper, we have proposed and discussed a fuzzy system development schema. For this purpose, we have identified three knowledge representation and approximate reasoning approaches. For the Type I fuzzy theory, we have described the extraction of fuzzy sets and fuzzy rules through the application of an improved fuzzy clustering technique, which is essentially an unsupervised learning of the fuzzy sets and rules from a given input-output data set. Finally, we have introduced an intelligent (fuzzy) approximate reasoning formulation for fuzzy modeling and control. For this purpose, the approximate reasoning parameters p, q, α and β are introduced in the reasoning formulation. In particular, two common but myopic knowledge representation and approximate reasoning methods are integrated in our proposed schema.

With our experimental evidence, we are confident that our proposed fuzzy modeling and reasoning schema is one of the "best" in Type I fuzzy theory. It is believed that the current developments in Type II fuzzy theory and its applications will provide much more powerful results in real-life applications. However, these developments are left for further research.

Appendix I: FCM-Algorithm

Step 1: CHOOSE the number of clusters (c), the weighting exponent (m), the iteration limit (iter), the termination criterion (ε > 0), the norm for J_m, \lVert x_k - v_i \rVert_A, and the norm for the error, \lVert V_t - V_{t-1} \rVert.

Step 2: GUESS the initial positions of the cluster centers: V_0 = {v_{1,0}, v_{2,0}, ..., v_{c,0}} ⊂ R^{ch}.

Step 3: ITERATE FOR t = 1 to iter:

CALCULATE v_{i,t} = \frac{\sum_{k=1}^{N} (u_{ik,t-1})^m x_k}{\sum_{k=1}^{N} (u_{ik,t-1})^m}, \forall i;

CALCULATE u_{ik,t} = \Big[ \sum_{j=1}^{c} \big( \lVert x_k - v_{i,t} \rVert_A / \lVert x_k - v_{j,t} \rVert_A \big)^{2/(m-1)} \Big]^{-1}, \forall i, k;

IF error = \lVert V_t - V_{t-1} \rVert \le ε, THEN stop and put (U_f, V_f) = (U_t, V_t);
NEXT t
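For reference, a compact NumPy transcription of the Appendix I iteration under the Euclidean norm (A = I); the array shapes and the zero-distance guard are our own implementation choices.

```python
import numpy as np

def fcm(X, V0, m=2.0, max_iter=100, eps=1e-5):
    """Fuzzy C-Means iteration of Appendix I with A = I (Euclidean).
    X: (N, h) data; V0: (c, h) initial prototypes, e.g. from Ward AHC."""
    V = V0.copy()
    U = None
    for _ in range(max_iter):
        d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(-1)   # (c, N) squared distances
        d2 = np.fmax(d2, 1e-12)                               # avoid division by zero
        U = d2 ** (-1.0 / (m - 1))                            # membership update
        U /= U.sum(axis=0, keepdims=True)                     # enforce sum_i u_ik = 1
        V_new = (U**m @ X) / (U**m).sum(axis=1, keepdims=True)
        if np.linalg.norm(V_new - V) <= eps:                  # termination criterion
            return U, V_new
        V = V_new
    return U, V
```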


Appendix II: An Agglomerative Hierarchical Clustering Algorithm

Given unlabeled sample data X = {x_1, x_2, ..., x_N}:

Step 1: CHOOSE the number of clusters (c); the matrix of dissimilarities D = [d_ij] is computed with the following Euclidean-based distance:

d_{ij} = d(X_i, X_j) = \frac{2 N_i N_j}{N_i + N_j} \, \lVert \bar{v}_{h_i} - \bar{v}_{h_j} \rVert

where \bar{v}_{h_i} and \bar{v}_{h_j} are the mean vectors of the hard clusters X_i and X_j, respectively.

Step 2: LOOP FOR t = N down to c, with X_{i,N} = {x_i}, i = 1, 2, ..., N:

FIND the pair of distinct clusters with the minimum d_ij, say X_{i,t} and X_{j,t};
MERGE X_{i,t} and X_{j,t};
DELETE X_{j,t};
NEXT t

Appendix III: A Fast Computation Algorithm for the n-Element t-conorm

Step 1: COMPUTE a_1^p, a_2^p, ..., a_n^p as η_1, η_2, ..., η_n, respectively.
Step 2: S = η_n.
Step 3: LOOP i FROM n-1 to 1 STEP -1:
S = η_i + (1 - η_i) × S
NEXT i
Step 4: S = S^{1/p}.

Appendix IV: Equivalence of the FITA and FATI Methods

FITA ≡ FATI: Mamdani (Conjunctive) Method of Reasoning

R_ic(x, y) = T(A_i(x), B_i(y)), where A_i(x) = T'(A_i1(x_1), ..., A_ir(x_r)).

F_ic(y) = \bigvee_x T''[A'(x*), T(A_i(x), B_i(y))]. If A'(x*) = 1, then we have:

F_ic(y) = \bigvee_x T(A_i(x*), B_i(y)) = \bigvee_x R_ic(x*, y) = R_ic(x*, y),

since x = x* with A'(x*) = 1.

F_C(y) = \bigvee_i \bigvee_x R_ic(x*, y) = \bigvee_i R_ic(x*, y)   ← FITA

F_C(y) = \bigvee_x \bigvee_i R_ic(x*, y) = \bigvee_i R_ic(x*, y)   ← FATI

∴ FITA = FATI

FITA ≡ FATI: Logical (Myopic, Disjunctive) Method of Reasoning

R_id(x, y) = S(Ā_i(x), B_i(y)), where A_i(x) = T'(A_i1(x_1), ..., A_ir(x_r)).

F_id(y) = \bigvee_x T''[A'(x*), S(Ā_i(x), B_i(y))]; if A'(x*) = 1,

F_id(y) = \bigvee_x S(Ā_i(x*), B_i(y)) = \bigvee_x R_id(x*, y) = R_id(x*, y),

since x = x* with A'(x*) = 1.

F_D(y) = \bigwedge_i \bigvee_x R_id(x*, y)   ← FITA
= 1 - \bigvee_i \bigvee_x \bar{R}_id(x*, y) = 1 - \bigvee_i \bar{R}_id(x*, y)

F_D(y) = \bigvee_x \bigwedge_i R_id(x*, y)   ← FATI
= 1 - \bigvee_x \bigvee_i \bar{R}_id(x*, y) = 1 - \bigvee_i \bar{R}_id(x*, y)

∴ FITA = FATI

(Here \bar{R}_id = 1 - R_id denotes the complement.)


References

Alefeld, G., and Herzberger, J. (1983), Introduction to Interval Computations, Academic Press, New York.
Bezdek, J.C., Windham, M.P., and Ehrlich, R. (1980), Statistical parameters of cluster validity functionals, International Journal of Computer and Information Sciences, 9, 4, 324-336.
Bezdek, J.C. (1981), Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum Press, New York.
Bilgic, T., and Türkşen, I.B. (1996), Measurement of membership functions: theoretical and empirical work, Handbook of Fuzzy Systems, Vol. 1, Foundations (to appear).
Duda, R.O., Hart, P.E. (1973), Pattern Classification and Scene Analysis, Wiley, New York.
Emami, M.R., Türkşen, I.B., Goldenberg, A.A. (1996), An improved fuzzy modeling algorithm, Part I: Inference mechanism, Proceedings of NAFIPS 96, Berkeley, CA, 289-293.
Emami, M.R., Türkşen, I.B., Goldenberg, A.A. (1996), An improved fuzzy modeling algorithm, Part II: Systems identification, Proceedings of NAFIPS 96, Berkeley, CA, 294-298.
French, S. (1984), Fuzzy decision analysis: some criticisms, in: H.J. Zimmerman, L.A. Zadeh and B. Gaines (eds.), Fuzzy Sets and Decision Analysis, North-Holland, 29-44.
Ishibuchi, H., Morioka, K., and Türkşen, I.B. (1995), Learning by fuzzified neural networks, International Journal of Approximate Reasoning, 13, 4, 327-358.
Kandel, A. (1982), Fuzzy Techniques in Pattern Recognition, Wiley, New York.
Kaufman, L. and Rousseeuw, P.J. (1990), Finding Groups in Data, Wiley, New York.
Kaufmann, A., and Gupta, M.M. (1985), Introduction to Fuzzy Arithmetic, Van Nostrand Reinhold, New York.
Keller, J.M., Gray, M.R., Givens, J.A. (1985), A fuzzy k-nearest neighbor algorithm, IEEE Trans. Systems, Man, and Cybernetics, SMC-15, 4, 580-585.
Kneale, W.C. (1962), The Development of Logic, Clarendon Press, Oxford, England.
Kosko, B. (1997), Fuzzy Engineering, Prentice Hall, Englewood Cliffs, New Jersey.
Mamdani, E.H. (1974), Application of fuzzy algorithms for control of simple dynamic plant, Proc. IEE, 121, 1585-1588.
McCall, S. and Ajdukiewicz, K. (1967), Polish Logic: Papers by Ajdukiewicz et al., 1920-1939, Clarendon Press, Oxford, England.
Nakanishi, H., Türkşen, I.B., and Sugeno, M. (1993), A review of six reasoning methods, Fuzzy Sets and Systems, 57, 3, 257-294.
Pal, N.R., Bezdek, J.C. (1995), On cluster validity for the fuzzy c-means model, submitted to IEEE Trans. Fuzzy Systems.
Pedrycz, W. (1984), Identification in fuzzy systems, IEEE Trans. Systems, Man, and Cybernetics, 14, 361-366.
Rosser, T.B. and Turquette, A.R. (1977), Many-Valued Logics, Greenwood Press, Westport, Connecticut.
Ruan, D., Kerre, E.E. (1993), On the extension of the compositional rule of inference, International Journal of Intelligent Systems, 8, 807-817.
Rumelhart, D.E., McClelland, J.L., and the PDP Research Group (1986), Parallel Distributed Processing (Vol. 1), MIT Press, Cambridge, MA.
Schweizer, B., Sklar, A. (1983), Probabilistic Metric Spaces, North-Holland, Amsterdam.
Scott, D. (1976), Does many-valued logic have any use?, in: S. Körner (ed.), Philosophy of Logic, Camelot Press, Southampton, UK, Chapter 2, 64-95.
Sugeno, M., Yasukawa, T. (1993), A fuzzy-logic-based approach to qualitative modeling, IEEE Trans. Fuzzy Systems, 1, 1, 7-31.
Tong, R.M. (1979), The construction and evaluation of fuzzy models, in: Gupta, M.M., Ragade, R.K., Yager, R.R. (eds.), Advances in Fuzzy Set Theory and Applications, North-Holland, Amsterdam.
Trojan, G.J., Kiszka, J.B., Gupta, M.M., Nikiforuk, P.N. (1987), Solution of multivariable fuzzy equations, Fuzzy Sets and Systems, 22, 271-279.
Türkşen, I.B. (1986), Interval-valued fuzzy sets based on normal forms, Fuzzy Sets and Systems, 20, 2, 191-210.
Türkşen, I.B., Tian, Y. (1993), Combination of rules and their consequences in fuzzy expert systems, Fuzzy Sets and Systems, 58, 3-40.
Türkşen, I.B. and Lucas, C. (1991), A pattern matching inference method and its comparison with known inference methods, Proceedings of IFSA '91, July 7-12, Brussels, 231-234.
Türkşen, I.B. (1995), Type I and interval-valued Type II fuzzy sets and logics, in: P.P. Wang (ed.), Advances in Fuzzy Theory and Technology, Vol. 3, 31-82, Bookright Press, Raleigh, NC.
Türkşen, I.B. (1996), Fuzzy truth tables and normal forms, Proceedings of BOFL '96, December 15-18, 1996, TIT, Nagatsuta, Yokohama, Japan (to appear).
Ward, J.H. (1963), Hierarchical grouping to optimize an objective function, Journal of the American Statistical Association, 58, 236-244.
Yager, R.R., Filev, D.P. (1994), Essentials of Fuzzy Modeling and Control, Wiley, New York.
Zadeh, L.A. (1965), Fuzzy sets, Information and Control, 8, 338-353.
Zadeh, L.A. (1973), Outline of a new approach to the analysis of complex systems and decision processes, IEEE Trans. Systems, Man, and Cybernetics, SMC-3, 28-44.
Zadeh, L.A. (1975), The concept of a linguistic variable and its application to approximate reasoning I, II and III, Information Sciences, 8, 199-249, 301-357 and 9, 43-80.
Zadeh, L.A. (1996), Fuzzy logic = computing with words, IEEE Trans. Fuzzy Systems, 4, 2, 103-111.


Fuzzy Inference Systems: A Critical Review

Vladimir Cherkassky

Department of Electrical Engineering, University of Minnesota, Minneapolis, Minnesota 55455, USA [email protected]

Abstract. Fuzzy inference systems represent an important part of fuzzy logic. In most practical applications (i.e., control), such systems perform a crisp nonlinear mapping, which is specified in the form of fuzzy rules encoding expert or common-sense knowledge about the problem at hand. This paper shows an equivalence between the fuzzy system representation and more traditional (mathematical) forms of function parameterization commonly used in statistics and neural nets. This connection between fuzzy and mathematical representations of a function is crucial for understanding the advantages and limitations of fuzzy inference systems. In particular, the main advantages are interpretation capability and the ease of encoding a priori knowledge, whereas the main limitation is the lack of learning capabilities. Finally, we outline several major approaches for the learning (estimation) of fuzzy rules from training data.

1. Introduction

Fuzzy logic (or fuzzy systems) is a broad field originally motivated (by L. Zadeh in 1965) by the desire to sidestep the rigidity of traditional Boolean logic, in which any statement is either true or false. In contrast, fuzzy logic allows degrees of truthfulness that measure to what extent a given object is included in a fuzzy set. Fuzzy sets correspond to linguistic variables used in a human language. Hence, fuzzy methods are very appealing for encoding a priori (expert) knowledge in various applications.

The field of fuzzy logic is rather controversial, partly because of its terminology (confusing to outsiders) and its sometimes over-rated claims. The term fuzzy logic is actually used in two different senses (Zadeh 1996). In a narrow sense, fuzzy logic can be viewed as an extension of multivalued logic and a formalism for 'approximate' reasoning. In a wider sense, fuzzy logic is used to denote fuzzy set theory, describing sets with vague (unsharp) boundaries. Since all mathematical disciplines are based on the notion of a set, any field (i.e., graph theory, pattern recognition, topology etc.) can, in principle, be 'fuzzified' by replacing the concept of a crisp set by a fuzzy set. The practical usefulness of such fuzzification, however, remains application-dependent.

In this paper we are only concerned with applications of fuzzy methods in pattern recognition and predictive learning from data. In such applications, the goal is to estimate (learn) functions (or mappings) from the available data (or training samples).

Recent interest in fuzzy methods has been triggered by successful applications in control systems (notably in Japan). For example, consider a subway control system for controlling the train's brakes. Control actions in a traditional control system result from complex mathematical models of the system. In contrast, control actions in a fuzzy system can be described using a handful of 'fuzzy rules', such as:

if speed is HIGH and the next station is NEAR, apply brakes at HIGH pressure,

if speed is SLOW and the next station is NEAR, apply brakes at LOW pressure,

if speed is MEDIUM and the next station is FAR, apply brakes at NORMAL pressure,

where fuzzy sets {HIGH, SLOW, MEDIUM} encode the value of (crisp) input variable speed; fuzzy sets {NEAR, FAR} encode the value of a (crisp) input variable distance to next station; fuzzy sets {HIGH, NORMAL, LOW} encode the value of a (crisp) output variable brake pressure. All fuzzy sets are specified by their membership functions, which are provided by the human experts (i.e., subway engineers) and then tuned (optimized) by the fuzzy system designers. Essentially, fuzzy sets provide fuzzy quantization of input and output variables, i.e. each (crisp) input and output variable is represented as a (small) number of overlapping regions with vague boundaries.

A controller built from such rules naturally incorporates common-sense/expert knowledge; it may be easier to build and to maintain than a conventional controller. The term 'fuzzy rule' itself is rather misleading, since most applications use these rules to specify an input-output mapping; the rules in fact represent associations between fuzzy sets in the input and output spaces. Indeed, the fuzzy rules in the above example describe a mapping from the two-dimensional input space with coordinates (speed, distance to next station) to the output space (pressure). However, unlike usual functions (defined for continuous input variables), fuzzy rules specify a mapping by associating fuzzy sets (i.e., overlapping regions or fuzzy clusters) in the input space and output space (Kosko 1992).

A collection of fuzzy rules specifying an input-output mapping is called a fuzzy inference system. There are two advantages of such a (fuzzy) representation of a mapping. First, it allows effective utilization and representation of a priori knowledge (human expertise) about the system. Second, this representation is highly interpretable. On the negative side, human knowledge is often subjective and context-dependent; hence, the specification of fuzzy rules and fuzzy sets (i.e., their membership functions) may be subjective. A big challenge in the design of a fuzzy system is therefore to specify explicitly its limits of applicability. In practice, this is done via extensive experimental verification of a fuzzy system prototype (i.e., an actual or software implementation).

Most engineering applications measure crisp values of input and output variables (such as sensor and actuator signals). Hence, a practical system combining crisp input/output signals with fuzzy a priori knowledge requires two additional steps, known as:
- fuzzification, i.e., obtaining a fuzzy representation of a crisp input value;
- defuzzification, i.e., converting a fuzzy output value into its crisp equivalent.
This leads to the usual structure of a fuzzy inference system shown in Fig. 1, where the specific fuzzification/defuzzification procedures as well as the specification of the input/output fuzzy sets and the fuzzy rules themselves are provided by experts. A collection of fuzzy rules can accurately represent arbitrary input-output mappings (assuming there are enough rules). Common applications (in control) include the approximation of real-valued functions (nonlinear in the input variables); this is known as fuzzy nonlinear control. However, the structure in Fig. 1 effectively implements a crisp input-output mapping (i.e., a function), regardless of the fuzzy terminology used to represent this mapping. Hence, it may be useful to relate this (fuzzy) representation to other types of approximating functions (for regression) used in the fields of neural networks and statistics.

[Figure 1 appears here: block diagram of a fuzzy inference system: inputs (crisp) → fuzzification → fuzzy processing → defuzzification → output (crisp), with a priori (expert) knowledge feeding the fuzzy processing block.]

Fig. 1. Block diagram of a fuzzy inference system

Fuzzy inference systems are sometimes promoted as a totally new approach to modeling input-output mappings. However, as shown by many authors (Brown and Harris 1994), there is a close connection between neural network / statistical methods and fuzzy inference systems. The main purpose of this paper is to make this connection clear, in order to better understand the advantages and limitations of fuzzy systems. As shown later, each fuzzy rule simply represents a local (or kernel) model under a 'non-fuzzy' statistical or neural network parameterization. Moreover, the specification of these fuzzy rules (by experts) is analogous to the specification of a parametric model under the statistical approach. Similarly, the empirical tuning of fuzzy rules and membership functions is analogous to parameter estimation in statistics. The difference between the two approaches is highlighted by the following observations: (1) In fuzzy systems, parametric models are specified in local regions of the input space (a fuzzy rule for each region), whereas statistics usually describes global parametric models.


(2) Tuning a fuzzy system is not well-defined, as it combines intuition, domain expertise and evidence from data in an ad hoc way to produce a final mapping. In contrast, parameter estimation is a clearly defined problem corresponding to minimization of the empirical risk or training error (with respect to model parameters). In a strict sense, fuzzy systems theory does not specify how to fit the data.

In the framework of Predictive Learning (Vapnik 1995; Cherkassky and Mulier 1997), estimating a model from finite data requires the specification of three concepts: a set of approximating functions, an inductive principle and an optimization procedure. For example, in the field of neural networks:
- a set of approximating functions is multilayer perceptron (MLP) networks parameterized by the connection weights;
- an inductive principle is minimization of the (penalized) empirical risk, i.e., the mean-squared-error loss function for the training data. Commonly used forms of penalization (complexity control) include the choice of the number of hidden units, early stopping rules during training, initialization of parameters (weights) etc. (see Cherkassky and Mulier 1997 for details);
- an optimization procedure (learning method) is a constructive procedure for implementing an inductive principle using a given set of approximating functions. For neural networks, backpropagation training is the method of choice, even though any other nonlinear optimization technique can be used as well.

Fuzzy methodology provides only a set of approximating functions (in the form of fuzzy rules), whereas the choice of the inductive principles and the optimization procedure is done in some ad hoc manner.

Meaningful comparisons between fuzzy and 'conventional' approaches for predictive learning / pattern recognition applications typically use traditional non-fuzzy criteria (such as quantization error, mean squared error, probability of misclassification etc.). These criteria originate from non-fuzzy formulations, and they correspond to physical quantities measured in real systems. In fact, there is a philosophical contradiction between the subjective nature of fuzzy descriptions and the need for objective criteria in most engineering systems.

As shown in this paper and elsewhere (Brown and Harris 1994), for pattern recognition applications fuzzy methods represent a possible modeling approach, rather than a totally new paradigm. Fuzzy systems provide a new way to incorporate human expertise into the design of a learning system. They are also useful for model interpretation. However, they are not necessarily useful for predictive learning from data.

This paper is organized as follows. Section 2 describes a taxonomy of approximating functions used for real-valued function estimation (regression) from samples. This taxonomy covers most methods in statistics and neural networks. Section 3 provides a mathematical description of fuzzy inference systems with a real-valued output. This description enables the connection between fuzzy/neurofuzzy systems and the taxonomy presented in Section 2. The term 'neurofuzzy systems' refers to a parameterized set of approximating functions (corresponding to a fuzzy inference system representation). Parameterization is usually specified for the fuzzy membership functions, whereas the fuzzy rules themselves are provided by experts. The parameters are then estimated (learned) from the training data. This setting fits exactly the usual formulation of the learning problem in terms of a (parameterized) set of approximating functions. In fact, it is shown in Section 4 that fuzzy inference systems have an equivalent basis function expansion (dictionary) representation. Finally, Section 5 reviews principled approaches for learning fuzzy rules from data.

2. Taxonomy of Methods for Function Estimation

Regression is the process of estimating a real-valued function based on a finite set of (input, output) samples:

(x_i, y_i), \quad i = 1, 2, ..., n   (1)

where x ∈ R^d denotes a vector of (real-valued) input variables (in this paper, vectors and matrices are denoted by bold symbols). The input samples x_i come from a fixed distribution with unknown p.d.f. p(x). The output is a random variable which takes on real values and can be interpreted as the sum of a deterministic function and a random error with zero mean:

y = g(x) + ε   (2)

where the (unknown) deterministic function g(x), called the regression, is the mean of the output conditional probability:

g(x) = \int y \, p(y|x) \, dy   (3)

We seek to estimate the (unknown) g(x) in a class of approximating functions f(x, w), w ∈ Ω, where Ω is a set of parameters. The set of functions f(x, w), w ∈ Ω, is specified a priori. Examples of approximating functions include linear estimators, polynomial estimators, feedforward neural networks, radial basis function networks etc. For linear estimators the parameters are linear coefficients; for feedforward neural nets the parameters are connection weights.

A set of approximating functions may or may not contain the regression function (3). Therefore, we seek to find a model in the class f(x, w) that is close (in some sense) to the regression. A common loss or discrepancy function for regression is the squared error:

L(y, f(x, w)) = (y - f(x, w))^2   (4)

Learning then becomes the problem of finding the function f(x, w_0) (regressor) which minimizes the risk functional

R(w) = \int (y - f(x, w))^2 \, p(x, y) \, dx \, dy   (5)

using only the information available in the training data. Note that finding the model minimizing future prediction error (5) is an ill-posed problem, since all we have is a finite training sample (1). In fact, accurate estimation of prediction risk is a very difficult problem (Vapnik 1995; Cherkassky and Mulier 1997). It is closely related to the fundamental problem in Predictive Learning known as model selection, i.e. specifying a model of optimal complexity for a given training sample. Here the model of optimal complexity should provide the smallest expected risk (which is unknown). Typical constructive implementations of learning methods can be described as follows:

For a given (fixed) model complexity λ:
(1) find the model f_λ(x, ω*) providing the minimum of the empirical risk (training error);
(2) for the model f_λ(x, ω*), estimate the future (prediction) error (5) using analytic or data-driven model selection criteria.
Change the model complexity λ and repeat steps (1) and (2) above.
Select the final model providing the minimum (estimated) prediction risk.

For example, polynomial regression in statistics assumes a class of approximating functions in the form of polynomials, where the polynomial degree specifies the model complexity and the polynomial coefficients are estimated by least squares from the training data. Similarly, with neural nets, the number of hidden units can be used to specify the model complexity, and parameter (weight) estimation is done by least-squares minimization of the training error (via the backpropagation algorithm).

Most methods in statistics and neural nets use a parameterization of the approximating functions in the form of a linear combination of basis functions:

f_m(x, w, v) = \sum_{i=1}^{m} w_i \, g_i(x, v_i) + w_0   (6)

where g_i(x, v_i) are the basis functions with (adjustable) parameters v_i = [v_{1i}, v_{2i}, ..., v_{pi}], and w = [w_0, w_1, ..., w_m] are the (adjustable) coefficients in the linear combination. For brevity, the bias term w_0 is often omitted in (6). The goal of Predictive Learning is to select a function from the set (6) which provides minimum prediction risk. Equivalently, in the case of regression, the goal is to estimate the parameters v_i = [v_{1i}, v_{2i}, ..., v_{pi}] and w = [w_0, w_1, ..., w_m] from the training data, in order to achieve the smallest mean squared error for future samples.

Representation (6) is quite general, and it leads to a taxonomy known as 'dictionary' methods (Friedman 1994), where a method is specified by a given set of basis functions (called a dictionary). The number of dictionary entries (basis functions) m is often used as a complexity parameter of the method.

Depending on the nature of the basis functions, there are two possibilities:
(a) fixed (predetermined) basis functions g_i(x), which do not depend on the response values y_i (but may depend on the x_i-values):

f_m(x, w) = \sum_{i=1}^{m} w_i \, g_i(x) + w_0   (7)

This parameterization leads to non-adaptive methods, since the basis functions are fixed and are not adapted to the training data. Such methods are also called linear, since parameterization (7) is linear with respect to the parameters w = [w_0, w_1, ..., w_m], which are estimated from data via linear least squares. The number of terms m can be accurately estimated from data via analytic model selection criteria (Vapnik 1995; Cherkassky et al. 1996).
(b) adaptive basis functions use the general representation (6), so that the basis functions themselves are adapted to the data, i.e., depend on the response values y_i. The corresponding methods are called adaptive or flexible (Friedman 1994). Estimating the parameters in (6) now results in a nonlinear optimization, since the basis functions are nonlinear in the parameters. The number of terms m can be estimated, in principle, using analytic model selection for nonlinear models as proposed, for example, in (Moody 1991), or by using resampling techniques. However, in practice model selection for nonlinear models is quite difficult, since it is affected by the nonlinear optimization procedure and the existence of multiple local minima. Usually, an adaptive method uses the same type of basis function g(x, v_i) for all terms in the expansion (6), i.e.

f_m(x, w, v) = \sum_{i=1}^{m} w_i \, g(x, v_i) + w_0   (8)

For example, multilayer perceptrons (MLP) use

g(x, v_i) = s(x \cdot v_i)   (9)

where each basis function is a univariate function s (i.e., a sigmoid activation) of a scalar argument formed as the dot product of the input vector x and a parameter vector v_i.


Radial basis function (RBF) networks use representation (8) with basis functions

g(x, v_i) = g(\lVert x - v_i \rVert)   (10)

where g(\lVert x - v_i \rVert) is a radially symmetric basis function parameterized by a 'center' parameter v_i. Note that g(t), with t = \lVert x - v_i \rVert, is a univariate function. Often, the radial basis functions are chosen as radially symmetric local or kernel functions K, which may also depend on a scale parameter (usually taken the same for all basis functions). Popular local radial basis functions include the Gaussian

g(t) = \exp(-t^2 / b^2) \quad and \quad g(t) = (t^2 + b^2)^{-1}   (11)

MLP and RBF networks are usually presented in a graphical form as a 'network' where parameters are denoted as network weights, input (output) variables as input (or output) units, and basis functions are shown as hidden layer units.
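As an illustration of the fixed-dictionary case (7) with Gaussian RBF basis functions as in (10)-(11), the following sketch fits the linear coefficients by least squares; the centers and the width b are assumed given, and the helper names are our own.

```python
import numpy as np

def rbf_design(X, centers, b):
    """Design matrix of Gaussian basis functions g_i(x) = exp(-||x-v_i||^2 / b^2)
    plus a constant column for the bias term w_0 in (7)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.hstack([np.exp(-d2 / b**2), np.ones((len(X), 1))])

def fit_rbf(X, y, centers, b):
    """Linear least squares for the coefficients w of the expansion (7)."""
    w, *_ = np.linalg.lstsq(rbf_design(X, centers, b), y, rcond=None)
    return w
```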

Unlike the dictionary representation (7), kernel methods use a representation of the form:

f(x) = \sum_{i=1}^{n} K_i(x, x_i) \, y_i   (12)

where the kernel function K(x, x_i) is a symmetric function of its arguments that usually (but not always) satisfies the following properties:

K(x, x_i) ≥ 0   (non-negative)   (13a)
K(x, x_i) = K(\lVert x - x_i \rVert)   (radially symmetric)   (13b)
K(x, x_i) takes on its maximum when x = x_i   (13c)
\lim_{t \to \infty} K(t) = 0, monotonically decreasing with t = \lVert x - x_i \rVert   (13d)

Representation (12) is called the kernel representation, and it is completely specified by the choice and parameterization of the kernel function K(x, x'). Note the duality between dictionary and kernel representations, in the following sense:
- dictionary methods (6) represent a model as a weighted combination of the basis functions;
- kernel methods (12) represent a model as a weighted combination of the response values y_i.
Selection of the kernel functions K_i(x, x_i) using the available (training) data is conceptually similar to the estimation of basis functions in dictionary methods. Similarly to dictionary methods, there are two distinct possibilities for selecting kernel functions:
(a) the kernel functions depend only on the x_i-values of the training data. In this case, kernel representation (12) is linear with respect to the y_i-values, since K_i(x, x_i) do not depend on y_i. Such methods are called non-adaptive kernel methods, and they are equivalent to the fixed (predetermined) basis function expansion (7), which is linear in the parameters. The equivalence is in the sense that for an optimal non-adaptive kernel estimate there is an equivalent optimal model in the fixed basis function representation (7). Similarly, for an optimal model in the fixed basis function representation there is an equivalent (non-adaptive) kernel model of the form (12); however, the equivalent kernels in (12) may not satisfy the usual properties (13). (See Cherkassky and Mulier 1997 for details.)
(b) the selection of kernel functions depends also on the y-values of the training data. In this case, kernel representation (12) is nonlinear with respect to the y_i-values, since K_i(x, x_i) now depend on y_i. Such methods are called adaptive kernel methods, and they are analogous to the adaptive basis function expansion (8), which is nonlinear in the parameters.
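A minimal non-adaptive kernel estimate in the sense of (12)-(13), using a Gaussian kernel with weights normalized to sum to one (our illustrative choice of kernel and normalization):

```python
import numpy as np

def kernel_predict(x, X_train, y_train, b=1.0):
    """f(x) = sum_i K(x, x_i) y_i with normalized weights; the Gaussian K
    satisfies the local-kernel properties (13)."""
    d2 = ((X_train - x) ** 2).sum(-1)
    K = np.exp(-d2 / b**2)
    return float(np.sum(K * y_train) / np.sum(K))
```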

The distinction between kernel and dictionary methods is often obscure in literature, since the term 'kernel function' is also used to denote local basis functions in dictionary methods.

It is also important to note that most adaptive methods for function estimation use the dictionary rather than the kernel representation. This is because model selection with a dictionary representation utilizes all of the training data. In contrast, a kernel function K(x, x_i) with properties (13) specifies a (small) region of the input space near the point x_i where K(x, x_i) is large. Hence, adaptive selection of the kernel functions in (12) would have to be based on the small portion of the training data in this local region. The problem is that conventional approaches for model selection (such as resampling) do not work well with small samples (Cherkassky et al. 1996).

3. Fuzzy Inference Systems

Fuzzy inference systems usually perform crisp input-output mapping, as in control applications where the inputs correspond to (measured) system state variables, and the outputs are control signals. This mapping is represented via a set of fuzzy rules reflecting common sense knowledge about the problem at hand. These fuzzy rules enable compact and interpretable representation of a mapping (nonlinear in input variables). However, fuzzy inference systems provide representation of a (crisp) mapping and hence can be related to traditional methods for representing a function.

This section describes only systems with a multivariate input x = [x_1, x_2, ..., x_d] and a univariate output y, corresponding to real-valued function approximation / regression problems. The presentation is also restricted to a particular (commonly used) choice of fuzzification / defuzzification procedures and fuzzy inference, which uniquely specify a fuzzy inference system. This choice simplifies the analysis of a fuzzy system, in order to demonstrate the connection with the traditional (mathematical) representations given in Section 2.

Even though here we discuss only fuzzy systems for regression problems, possible generalizations (i.e. fuzzy inputs / outputs, various fuzzification / defuzzification methods, categorical outputs etc.) can be easily obtained using the same methodology.

A fuzzy inference system implementing a real-valued function $f(x)$ of $d$ input variables has the following form:

x ---> fuzzifier ---> fuzzy inference ---> defuzzifier ---> y
                            |
        knowledge base (fuzzy rules & membership functions)

Here $x = [x_1, x_2, \ldots, x_d]$ is a (crisp) multivariate input vector and $y$ is a (crisp) univariate output. The knowledge base contains several fuzzy rules encoding a priori knowledge about the function $f(x)$.

A formal description of the fuzzy processing steps in a fuzzy inference system is given next. Since we need to describe fuzzy systems for multivariate inputs, we first need to formalize the notion of a multivariate fuzzy set. The fuzzy membership function of a multivariate variable $x = (x_1, x_2, \ldots, x_d)$ is usually defined as a tensor product of univariate membership functions specified (separately) for each input coordinate. That is, given univariate fuzzy sets $A^{(k)}$ with membership functions $\mu_{A^{(k)}}(x_k)$, one can define a multivariate fuzzy set $A = \{A^{(1)}, A^{(2)}, \ldots, A^{(d)}\}$ with the following fuzzy membership function

$\mu_A(x) = \prod_{k=1}^{d} \mu_{A^{(k)}}(x_k)$   (14)

Expression (14) can be interpreted as a fuzzy membership function of a multivariate fuzzy set A in a d-dimensional space. Multivariate fuzzy set A with a membership function (14) is often represented as a fuzzy intersection of univariate variables (linguistic statements), i.e.

$A = (x_1 \text{ is } A^{(1)}) \text{ AND } (x_2 \text{ is } A^{(2)}) \text{ AND } \ldots \text{ AND } (x_d \text{ is } A^{(d)})$   (15)

Note that with (commonly used) local fuzzy membership functions for $A^{(k)}$, the multivariate fuzzy set $A$ specifies a local neighborhood in $x$-space. In fact, if the univariate fuzzy membership functions in (14) are local (e.g. Gaussians), then the multivariate membership function satisfies the usual properties of a local kernel (13). This observation will be used later in Section 4 to establish the connection between fuzzy systems and the traditional representations described in Section 2.
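As a small illustration of the tensor product (14), the following numpy sketch (with hypothetical Gaussian membership parameters) computes a multivariate membership value as the product of univariate memberships:

```python
import numpy as np

def univariate_membership(x, center, width):
    # Gaussian membership function of a univariate fuzzy set A(k).
    return np.exp(-0.5 * ((x - center) / width) ** 2)

def multivariate_membership(x, centers, widths):
    # Tensor-product membership (14): product over the input coordinates.
    return np.prod([univariate_membership(xk, c, w)
                    for xk, c, w in zip(x, centers, widths)])

# Membership of the point x = (0.4, 0.6) in a 2-d fuzzy set centered at (0.5, 0.5).
print(multivariate_membership([0.4, 0.6], centers=[0.5, 0.5], widths=[0.2, 0.2]))
```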

Using the notion of a multivariate fuzzy set, we can specify an (input, output) mapping as a collection of mappings between local neighborhoods in the input and

Page 196: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


output space. A fuzzy system is a set of $m$ fuzzy rules providing an interpretable representation for a multivariate function $f(x)$:

IF ($x$ is $A_i$) THEN ($y$ is $B_i$),  $i = 1, 2, \ldots, m$   (16)

where $x$ is a d-dimensional (crisp) input and $y$ is a scalar output (to be determined). Multivariate input fuzzy sets $A_i$ have (pre-specified) membership functions; they are usually specified via fuzzy intersection of univariate fuzzy sets as in (14), (15). Univariate output fuzzy sets $B_i$ are also provided as a part of a priori knowledge.

The goal of fuzzy processing is to produce a crisp output for a given (crisp) input. The three steps of fuzzy processing can be formally described as follows:

Fuzzification. For a given input, calculate the rule-$i$ strength as a fuzzy membership function $\mu_{A_i}(x)$ via (14). For notational convenience, denote the rule-$i$ strength (weight) as:

$w_i = \mu_{A_i}(x)$   (17)

Rule output evaluation. The output fuzzy sets of each rule are modified according to either the product rule:

$\mu_i(y) = w_i\, \mu_{B_i}(y)$   (18)

or the min inference rule:

$\mu_i(y) = \min\{w_i, \mu_{B_i}(y)\}$   (19)

The product rule is assumed in this paper, since it allows analytic treatment and is commonly used in practice.

Defuzzification. The preferred approach called additive defuzzification (and assumed in this paper) is to defuzzify each rule output first, then combine the resulting crisp outputs, as detailed next. Perform centroid defuzzification for each output fuzzy set:

$\bar{y}_i = \frac{\int y\, \mu_i(y)\, dy}{\int \mu_i(y)\, dy} = \frac{\int y\, w_i \mu_{B_i}(y)\, dy}{\int w_i \mu_{B_i}(y)\, dy} = \frac{\int y\, \mu_{B_i}(y)\, dy}{\int \mu_{B_i}(y)\, dy}$   (20)

Note that with the product rule (18), the centroid value given by (20) is the same as the centroid of the original output set $B_i$ (this is not true for other rules, e.g. the min inference rule (19)).

Next calculate the system output as:

$\hat{y} = \frac{\sum_i w_i \bar{y}_i}{\sum_k w_k}$   (21)

Page 197: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


Note that according to (21) the individual rule outputs are additively combined; hence this is known as additive defuzzification. In addition to its simplicity and analytic tractability, additive defuzzification enables simple hardware implementations.

Another popular prescription for defuzzification is to form the combined output fuzzy set via a fuzzy union of individual rule outputs, i.e.

$\mu(y) = \max_{1 \le i \le m} \mu_i(y)$   (22)

and then defuzzify it using centroid defuzzification.

There are many other prescriptions for fuzzification, rule evaluation and defuzzification; however, in this paper we assume singleton fuzzification (17), product rule output evaluation (18) and additive centroid defuzzification (20), (21), as they are most widely used and easy to analyze.

Finally, we describe another variant of a fuzzy system with a different specification of the rule outputs. Specification in the form (16) discussed above is known as Zadeh-Mamdani fuzzy rules. An alternative is to use crisp rule outputs. This is known as Takagi-Sugeno fuzzy rules:

IF ($x$ is $A_i$) THEN ($y$ is $g_i(x)$)   (23)

where only the input sets $A_i$ are fuzzy and $g_i(x)$ are crisp functions provided by experts. Usually, the $g_i(x)$ are simple functions, e.g. constant or linear functions. For example, assuming constant (crisp) outputs gives a fuzzy system:

IF ($x$ is $A_i$) THEN ($y$ is $y_i$)   (24)

where the $y_i$ are constant crisp values. In this case, the 'centroid' of each rule output is the constant value specified in the rule itself:

$\bar{y}_i = y_i$   (25)

Hence the final system output following the usual fuzzification, rule evaluation, defuzzification steps is:

$\hat{y} = \frac{\sum_i w_i y_i}{\sum_k w_k}$   (26)

The final output (26) of the Takagi-Sugeno system has the same form as the output of the Zadeh-Mamdani system (16); only the expressions (20) and (25) for calculating individual rule outputs are different. Note, however, that for the Zadeh-Mamdani system the rule output values can be precalculated via (20). Hence, without loss of generality, both types of fuzzy systems are described by equations (24)-(26).
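The complete processing chain (24)-(26) is short enough to write out. The numpy sketch below, with hypothetical Gaussian input memberships and three made-up rules, performs singleton fuzzification (17), product inference (18) and additive centroid defuzzification (26):

```python
import numpy as np

def rule_strengths(x, centers, widths):
    # Fuzzification (17): rule strengths w_i as products of Gaussian
    # univariate memberships, i.e. the tensor-product form (14).
    return np.exp(-0.5 * np.sum(((x - centers) / widths) ** 2, axis=1))

def fuzzy_output(x, centers, widths, y_rules):
    # Additive centroid defuzzification (26): weighted average of rule outputs.
    w = rule_strengths(np.asarray(x, dtype=float), centers, widths)
    return np.dot(w, y_rules) / np.sum(w)

# Three hypothetical Takagi-Sugeno rules (24) with constant outputs y_i
# on a 2-dimensional input space.
centers = np.array([[0.2, 0.2], [0.5, 0.5], [0.8, 0.8]])
widths  = np.array([[0.15, 0.15], [0.15, 0.15], [0.15, 0.15]])
y_rules = np.array([0.0, 1.0, 0.5])
print(fuzzy_output([0.45, 0.55], centers, widths, y_rules))  # dominated by rule 2
```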

Page 198: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


4. Equivalent Basis Function and Kernel Representation

As noted earlier, fuzzy systems encode a priori knowledge by specifying a function in local regions of the input space, so that each fuzzy rule describes an association between local regions in the input and output space. Hence, there is a strong connection between fuzzy inference systems (expressed using fuzzy logic terminology) and a mathematical representation of a function described in Section 2. This section formally establishes this connection for the class of fuzzy systems described in Section 3, under the usual assumptions about singleton fuzzification, product inference and additive centroid defuzzification.

Consider a fuzzy system (24) for simplicity. Assuming that input membership functions are local, the rule strength is a local function (of the input variables):

$w_i = \mu_{A_i}(x) = K_i(x, c_i, \Omega_i)$   (27)

where a local function $K_i$ specifies a local neighborhood in the input space corresponding to the input fuzzy set of rule $i$. Each local function $K_i$ is specified by its center $c_i$ (where it reaches its maximum) and the width matrix $\Omega_i$ specifying the local neighborhood around $c_i$ where the local basis function is large. These parameters are completely specified by the input fuzzy sets (a priori knowledge). For example, for Gaussian fuzzy membership functions, the width matrix is a diagonal matrix.

Using (27), the final system output (26) can be written as:

$\hat{y} = \frac{\sum_{i=1}^{m} K_i(x, c_i, \Omega_i)\, y_i}{\sum_{k=1}^{m} K_k(x, c_k, \Omega_k)}$   (28)

Note that representation (28) specifies an equivalent crisp representation of a fuzzy inference system. Expression (28) can now be related to the dictionary and kernel representations.

Dictionary representation: the fuzzy system output is a linear combination of the (normalized) basis functions (specified a priori) taken with weights $y_i$:

$f(x) = \sum_{i=1}^{m} g_i(x)\, y_i$   (29a)

where $\sum_{i=1}^{m} g_i(x) = 1$   (29b)

Basis functions are calculated as

$g_i(x) = \frac{K_i(x, c_i, \Omega_i)}{\sum_{k=1}^{m} K_k(x, c_k, \Omega_k)}$   (30)

Page 199: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


For example, a fuzzy system specified by 3 fuzzy rules:

IF ($x$ is $H_1$) THEN ($y$ is $y_1$)
IF ($x$ is $H_2$) THEN ($y$ is $y_2$)
IF ($x$ is $H_3$) THEN ($y$ is $y_3$)

has basis functions as shown in Fig. 2. This system effectively implements a combination of piecewise-linear and piecewise-constant interpolation between the $y_i$, depending on the amount of overlap between the input fuzzy sets (see Fig. 2). Similarly, it can be shown that fuzzy systems with multivariate inputs perform piecewise-linear / piecewise-constant interpolation between the rule output values. The fact that fuzzy inference systems using centroid defuzzification implement piecewise-linear/constant interpolation has several interesting implications. For example, it is clear that interpolating between local extrema of a function would yield the most economical and accurate piecewise-linear representation. Hence, fuzzy input sets / fuzzy rules should (ideally) be specified at the local minima and maxima of a function.

Note that normalization conditions (29b), (30) are a direct result of the centroid defuzzification rule. An alternative defuzzification is to combine (additively) the outputs of fuzzy rules without normalization, leading to the following basis function representation:

$f(x) = \sum_{i=1}^{m} K_i(x, c_i, \Omega_i)\, y_i$   (31)

Comparing the normalized (28) versus the unnormalized (31) representation, one can see that the normalized approach results in a smoother interpolation between the $y_i$ values. For example, under the normalized approach the output values of a function lie between the smallest and the largest values of $y_i$, whereas with unnormalized basis functions the output values may be larger than the largest $y_i$ or smaller than the smallest $y_i$. This explains the popularity of centroid defuzzification among fuzzy practitioners concerned mainly with control applications (i.e. using a fuzzy system to model a smooth control surface). Of course, such smooth interpolation may or may not be desirable for other applications.
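A small numeric check of this comparison (numpy, with hypothetical centers, width and rule outputs) evaluates both representations at points between the centers and at a point far outside the covered region:

```python
import numpy as np

def local_basis(x, centers, width=0.15):
    # Local basis functions K_i(x, c_i): Gaussian bumps at the rule centers.
    return np.exp(-0.5 * ((x - centers) / width) ** 2)

centers = np.array([0.2, 0.5, 0.8])
y_rules = np.array([0.0, 1.0, 0.5])

for x in [0.35, 0.5, 3.0]:
    k = local_basis(x, centers)
    normalized   = np.dot(k, y_rules) / np.sum(k)   # representation (28)
    unnormalized = np.dot(k, y_rules)               # representation (31)
    print(x, normalized, unnormalized)
# The normalized output stays between min(y_i) and max(y_i), while the
# unnormalized output decays toward zero far away from all centers.
```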

The dictionary representation can also provide some insight into the extrapolation properties of fuzzy systems, i.e. the behavior of the system output outside the local regions specified by the input fuzzy sets. For example, consider a univariate fuzzy system (24) with Gaussian fuzzy (input) membership functions:

Page 200: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications

[Fig. 2 plots, for the three rules above, the input membership functions $\mu(x)$ of $H_1$, $H_2$, $H_3$, the corresponding normalized basis functions $g(x)$, and the resulting output $y$ interpolating between $y_1$ and $y_2$.]

Fig. 2. Fuzzy systems perform piecewise constant / piecewise linear interpolation

For $x$-values far away from the centers $c_i$ of the local regions, the output of a normalized fuzzy system (using centroid defuzzification) is:

$\lim_{x \to \infty} \hat{y} = \frac{\sum_i y_i}{m}$

However, for the unnormalized system given by (31), $\lim_{x \to \infty} \hat{y} = 0$.

In other words, a fuzzy system with unnormalized local function representation extrapolates to zero output values, whereas the system with normalized basis functions extrapolates to the mean of the response values of the fuzzy rules. Here again, the 'best' extrapolation is application-specific.

Page 201: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


Kernel representation. In a kernel representation (12), a function is represented as a linear combination of the training response values, with weights given by the kernel function. Since the kernel representation is defined in terms of the training data, the connection to fuzzy systems is not immediately clear. The output of a fuzzy system is a linear combination of the rule output values $y_i$, which can be interpreted as 'prototype' output samples. Also, the centers $c_i$ of the input fuzzy sets can be interpreted as 'prototype' input samples. This leads to the kernel interpretation where the fuzzy system output (29a) is a weighted average of the 'response' values, with kernel functions satisfying the normalization condition (29b). However, this is not a very useful interpretation. A better interpretation is based on the concept of local learning described in (Vapnik 1995). According to the local learning formulation (Vapnik 1995), the goal is to estimate a function locally (at a given point) from the training data. The local neighborhood (around the point of estimation) is given by a kernel function. In a fuzzy system, each input fuzzy set can be interpreted as a local neighborhood in the input space. Then a function is specified as several local mappings, where each local mapping (fuzzy rule) is specified by the local neighborhood (i.e. center $c_i$ and width $\Omega_i$ of an input fuzzy set) and by the output value $y_i$.

Finally, we note that in spite of the similarity between fuzzy systems and the mathematical representations discussed above, there is a big difference between the two. Namely, fuzzy systems are not usually concerned with learning from samples, since the output of a fuzzy system is completely specified by a priori knowledge. In contrast, kernel and basis function representations have been introduced in the context of flexible function estimation from training samples, when a priori knowledge is limited or non-existent.

Fuzzy system representation of a function may be useful for:
- encoding a priori knowledge about a function, when such knowledge completely specifies the function;
- interpretation of a model estimated from samples (using statistical or neural network methods).

The connection between fuzzy and mathematical representations also becomes relevant for analysis of neurofuzzy systems described next.

5. Learning Fuzzy Rules from Data

In a fuzzy inference system, fuzzy rules and fuzzy membership functions are typically provided by application domain experts, even though some heuristic tuning of the fuzzy membership functions is common in practice. In contrast, the goal of 'neurofuzzy systems' is to estimate fuzzy rules directly from the training samples using neural network learning. Most of the available research literature on neurofuzzy systems simply combines the two (poorly understood) heuristic methodologies, thus adding to the technical confusion already existing in both fields. It is important to realize that neural networks, neurofuzzy systems and statistical methods have the same goal, that is, estimating a function from a set of

Page 202: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


approximating functions. This is often obscured by the field-specific terminology. Table 1 gives correspondence between mathematical terms used for function estimation from samples and the terminology used in neural networks and neurofuzzy systems. (Naturally, we prefer mathematical terminology.)

Table 1. Correspondence between mathematical terms and the terminology used in neural networks and neurofuzzy systems

Statistics/Math            | Neural Nets                  | Fuzzy/Neurofuzzy
(parameterized) model      | neural network               | (parameterized) fuzzy system
basis function             | hidden unit output           | fuzzy rule
no. of basis functions     | no. of hidden units          | no. of fuzzy rules
parameters                 | connection weights           | parameters of a fuzzy system
parameter estimation       | learning (of weights)        | learning fuzzy memb. functions
prediction risk            | generalization               | generalization
regression/classification  | supervised learning          | neurofuzzy learning
regularization             | weight decay / brain damage  | not available
density approximation      | unsupervised learning        | fuzzy clustering

Representational equivalence between fuzzy inference systems and traditional methods for function estimation enables a systematic treatment of 'neurofuzzy' systems. A neurofuzzy system can be viewed as a (parameterized) set of approximating functions (parameterized fuzzy rules) represented as a neural network. Each fuzzy rule corresponds to a hidden unit of a network. Hence, parameters of fuzzy rules can be learned (estimated) from the training data using statistical and neural network techniques.

This section describes several strategies for estimating fuzzy rules from data. The goal is to estimate a fuzzy system optimal in the usual sense of predictive learning. Depending on the amount of available a priori knowledge (about the function), we outline the following approaches.

Learning Fuzzy Rules Via Basis Function Representation. Let us assume there is no a priori knowledge about the fuzzy rules / input fuzzy sets. In this case, learning fuzzy rules from data is equivalent to local basis function estimation from samples. There is, of course, nothing inherently fuzzy in this approach, except that parameterization of approximating functions uses local basis functions:

$f(x) = \sum_{i=1}^{m} \beta_i K_i(x, c_i, \Omega_i) + \beta_0$   (32)

Usually, estimating the parameters in (32) is done using learning strategies for radial basis function networks. That is, first estimate the centers and widths of the basis functions using unsupervised training, and then find the coefficients $\beta_i$ via supervised training.

Page 203: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


Estimating the parameters of the basis functions $K_i(x, c_i, \Omega_i)$ from data corresponds to learning fuzzy rules, whereas estimation of the coefficients $\beta_i$ can be interpreted as a defuzzification procedure. For example, using normalized radial basis functions in the parameterization (32) is equivalent to the usual additive centroid defuzzification procedure. Following training, local basis functions can be interpreted as fuzzy rules, based on the dictionary representation of a fuzzy system. The number of basis functions (usually a regularization parameter of a learning procedure) corresponds to the number of fuzzy rules and can be estimated via model selection. Of course, the resulting fuzzy system is not necessarily optimal, due to the heuristic nature of nonlinear optimization and model selection strategies (Cherkassky and Mulier 1997).

Support Vector Machines. An optimal procedure for estimating the representation (32) from data can be developed using a universal learning method called the Support Vector Machine (SVM). According to SVM theory (Vapnik 1995), the number of basis functions and their centers are determined uniquely from the training data via a quadratic optimization formulation, which guarantees a unique solution. Moreover, in the optimal solution, the centers $c_j$ correspond to a subset of the training input samples, called support vectors $x_j$. An optimal SVM solution has the form:

$\hat{f}(x) = \sum_{j=1}^{m} \beta_j K(x, x_j, \Omega) + \beta_0$   (33)

where:
- the local basis function $K$ is chosen a priori, e.g. a Gaussian kernel of fixed width $\Omega$;
- the number of support vectors $m$ is found automatically (from data);
- the support vectors $x_j$ and the coefficients $\beta_j$ are determined automatically (from data).

An SVM solution (33) using local kernels $K$ can be interpreted, in principle, as an output of a fuzzy system using the unnormalized basis function representation (32). However, an SVM solution guarantees only that the linear combination of local basis functions accurately approximates the true function; it does not guarantee that the optimal coefficients $\beta_j$ provide close local estimates of the true function at the support vectors $x_j$. In other words, an SVM solution does not represent a collection of local models, which is the major advantage of a fuzzy representation. A possible approach to combine the accurate estimation provided by SVM with the interpretation advantages of a fuzzy representation is outlined next:
- first, estimate a function from the training data using the SVM method;
- second, find local extrema of the model estimated by SVM;
- third, represent this model as a collection of fuzzy rules, where each rule describes the function at a local extremum point (see Fig. 3).

Page 204: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


[Figure: an estimated function $\hat{y}$ plotted over $x \in [0, 1]$, annotated with the fuzzy rules describing its local extrema.]

Fig. 3. Fuzzy rule representation of a function estimated from data via SVM

This approach is universal, as it decouples model estimation from model interpretation. In principle, one can use any reasonable method for model estimation and then interpret the final model via fuzzy rules as outlined above. Note that this approach does not suffer from the curse of dimensionality, i.e. the combinatorial explosion of the number of fuzzy rules needed to describe a multivariate function (Watkins 1996).

Learning Fuzzy Rules Via Local Risk Minimization. Let us assume that the locations of the centers $c_i$ of the input fuzzy sets are known a priori (or found by clustering as a part of preprocessing). Of course, the number of fuzzy rules is also assumed to be known. The problem is to estimate the widths $\Omega_i$ of the fuzzy membership functions and the outputs of the fuzzy rules $y_i$ from the training data. This can be done, in principle, via basis function estimation as described above. However, there is a better approach, based on the kernel representation of fuzzy systems. By interpreting each fuzzy rule as a local neighborhood, one can use the framework of local risk minimization (Vapnik 1995) to determine the optimal neighborhood width and to estimate the optimal rule outputs $y_i$ from the training data. Under this approach, the width $\Omega_i$ and the output $y_i$ are estimated for each rule separately, from the local training data around each center $c_i$. Then the resulting fuzzy system (using centroid defuzzification) is given by expression (28).
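The per-rule estimation step can be sketched as follows (numpy, hypothetical data): given the centers and a width, each rule output $y_i$ is estimated as a kernel-weighted local average of the training responses. Selecting the width itself by local risk minimization is beyond this sketch, so the width here is fixed by assumption:

```python
import numpy as np

def estimate_rule_outputs(centers, width, x, y):
    # Estimate each rule output y_i separately from the local training data
    # around its center c_i, as a kernel-weighted average (a local estimate).
    w = np.exp(-0.5 * ((x[None, :] - centers[:, None]) / width) ** 2)
    return np.sum(w * y[None, :], axis=1) / np.sum(w, axis=1)

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 100)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(100)
centers = np.array([0.25, 0.5, 0.75])
print(estimate_rule_outputs(centers, width=0.1, x=x, y=y))
# The estimated y_i then enter the fuzzy system (28) as rule outputs.
```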

Fuzzy Input Encoding. A popular technique in neurofuzzy applications is to encode a continuous input variable as a set of fuzzy (linguistic) values. This is known as 'fuzzy 1-of-N' encoding, meaning that each continuous input feature $x$ is encoded as N overlapping feature values. Assuming such encoding is provided a priori, it effectively performs a fixed nonlinear mapping from the input ($x$) space to

Page 205: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


a new high-dimensional feature ($z$) space. Then one can use a simple (linear) set of approximating functions (a linear neural network) for estimating the mapping $z \to y$, rather than the nonlinear basis functions (nonlinear neural network) typically used for estimating the $x \to y$ mapping from data. Several empirical studies suggest that using fuzzy 1-of-N encoding with a linear model usually results in better prediction, and is always computationally faster than a solution provided by a nonlinear network. (For detailed comparisons, see Cherkassky and Lari-Najafi 1992.)

Fuzzy input encoding can be interpreted in terms of the basis function representation, where the basis functions $K_i(x, c_i, \Omega_i)$ are determined by the encoding of the input variables, specified a priori. Then learning from data amounts to estimating a linear model:

$f(x) = \sum_{i=1}^{m} a_i z_i + a_0, \quad \text{where } z_i = K_i(x, c_i, \Omega_i)$   (34)

The main difficulty with such an approach is the high dimensionality of the linear model in the encoded ($z$) space. The problem usually becomes unmanageable when the number of input ($x$) variables is moderately large, say greater than 4-5, provided each input is uniformly encoded. This is, of course, a manifestation of the curse of dimensionality, recently rediscovered by fuzzy logicians as 'the trouble with triangles' (Watkins 1996). In practice, one can use a priori knowledge to perform non-uniform fuzzy encoding. Such non-uniform encoding effectively implements local feature selection and thus results in a smaller number of terms in the linear model (34).
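For a single input, the encoding-plus-linear-model scheme (34) can be sketched in a few lines of numpy; the uniform triangular memberships and the toy data are illustrative assumptions:

```python
import numpy as np

def one_of_n_encode(x, n=5):
    # Fuzzy 1-of-N encoding: each scalar input is mapped to N overlapping
    # triangular membership values z_i = K_i(x, c_i) on a uniform grid.
    centers = np.linspace(0, 1, n)
    width = centers[1] - centers[0]
    return np.clip(1.0 - np.abs(x[:, None] - centers[None, :]) / width, 0.0, 1.0)

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 50)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(50)

# Linear model (34) in the encoded z-space: f(x) = sum_i a_i z_i + a_0.
Z = np.hstack([one_of_n_encode(x), np.ones((len(x), 1))])
a, *_ = np.linalg.lstsq(Z, y, rcond=None)
print(a)
```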

Acknowledgement. This work was supported, in part, by the IBM Partnership Award and by a grant from 3M Corporation.

References

Brown, M. and C. Harris, Neurofuzzy Adaptive Modeling and Control, Prentice Hall, Englewood Cliffs, N.J. 1994

Cherkassky, V. and H. Lari-Najafi, Data representation for diagnostic neural networks, IEEE Expert, v. 7, no. 5, 43-53, 1992

Cherkassky, V. and F. Mulier, Learning From Data: Statistical, Neural Network and Fuzzy Modeling, Wiley, 1997 (to appear)

Cherkassky, V., F. Mulier and V. Vapnik, Comparison of VC method with classical methods for model selection, Proc. World Congress on Neural Networks, 957-962, Lawrence Erlbaum, NJ, 1996

Friedman, J.H., An Overview of predictive learning and function approximation, in Cherkassky, Friedman and Wechsler (Eds.), From Statistics to Neural Networks. Theory and Pattern Recognition Applications. Springer, NATO ASI Series, v. 136, 1994

Page 206: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


Kosko, B., Neural Networks and Fuzzy Systems: A Dynamical Systems Approach to Machine Intelligence, Prentice Hall, Englewood Cliffs, N.J. 1992

Moody, J.E., Note on generalization, regularization and feature selection in nonlinear learning systems, First IEEE Workshop on Neural Networks in Signal Processing, 1-10, IEEE Computer Society Press, Los Alamitos, CA, 1991

Vapnik, V.N., The Nature of Statistical Learning Theory, Springer Verlag, 1995

Watkins, F., Fuzzy function representation: the trouble with triangles, Proc. World Congress on Neural Networks, 1123-1126, Lawrence Erlbaum, NJ, 1996

Zadeh, L.A., Fuzzy sets, Information and Control, v. 8, 338-353, 1965

Zadeh, L.A., Fuzzy Logic: a precis, Multivalued Logic, v. 1, 1-38, Gordon and Breach Science Publishers, 1996

Page 207: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications

Fuzzy Decision Support Systems

H.-J. Zimmermann

RWTH Aachen, Templergraben 55, D-52062 Aachen, Germany

Phone: (0241) 80 61 82, Fax: (0241) 88 88-168

zi@buggi.or.rwth-aachen.de

Abstract. Decision analysis and decision support is an area in which applications of fuzzy set theory have been found since the early 1970s. Algorithmic as well as knowledge-based approaches have been suggested. The meaning of the term "decision" has also been defined differently in different areas, as has the meaning of "uncertainty". This paper will first sketch different meanings of "decision" and "uncertainty" and then focus on algorithmic approaches relevant for decision support systems.

1. Decisions, Decision Support Systems, and Expert Systems

1.1 Normative Versus Descriptive Decision Theory Versus Connectionistic Views

The meaning of the term 'decision' varies widely. Very often it is used without defining its interpretation and this frequently leads to misinterpretations of statements about decisions, decision analysis or decision support systems. To avoid misunderstandings we shall select from the large number of possible definitions three, which are of particular relevance to this paper:

a) Decision Logic (formal or normative)

Here a "decision" is defined as an abstract, timeless, contextfree "act of choice", which is best be described as a quintuple D (A, S, E, U, P), were A is the action space, S the state space, E the event space, U the utility space and P the probability space. Often this purely formal model for "rational" decision making is illustrated by examples which suggest a relationship of this model to the reality of deciSion making, obliterating the fact that decision logic is a purely formal, mathematical or logical theory, focusing on rationality in acts of choice. Nevertheless, this model of a decision underlies many methods for or theories about decision making.


Page 208: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


b) Cognitive decision theory

This is an empirical, descriptive, non-statistical, context-related process theory [13] which considers a "decision" as a decision making process, very similar to a problem solving process: a special, time-consuming, context-dependent information processing process. The human decision maker is considered in analogy to a computer system, i.e. data and knowledge have to be fed into the system. This, and the type of information processing performed, determines the outcome [24].

c) The connectionist paradigm

Neural nets also model living (human) information processing, but on a more physical and not so functional level. Information is processed from input via hidden to output layers of artificial neurones. One of the differences between the "cognitive" and the "neural" decision model is that the latter explicitly includes, and even concentrates on, learning and topological features, while the former does not exclude learning but does not consider it as one of the points of major interest.

[Fig. 1 shows the choice model as a diagram: actions and states combine into results, which are mapped to utilities.]

Fig. 1. The structure of the choice model

1.2 Decision Support Systems Versus Expert Systems

Decision Support Systems (DSS), as successors of Management Information Systems (MIS), traditionally follow the decision logic line of thinking and add to MIS algorithmic tools to improve the choice activity of decision makers. This includes optimization methods, mathematical programming, multi-criteria models

Page 209: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


etc. They are "structure related", normally assume that the decision problem can be formulated mathematically, and do not stress information processing and display.

[Figure 2 shows the decision as an information process: information processing over a perceived solution space generates a proposal, which is checked against the goal space and then either revised, the solution space modified or increased, or the proposal implemented.]

Figure 2. The decision as information process

By contrast to DSS, Expert Systems (ES) or knowledge-based systems, as successors of the "General Problem Solver" (Newell and Simon 1972), follow more the process paradigm of cognitive decision theory; they do not necessarily assume that the decision problems can be formulated as mathematical models; they substitute human expertise for missing efficient algorithms; and they are not

Page 210: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


structure- but context-related, with much smaller domains of application than DSS. "Knowledge" is represented in many different ways, such as frames, semantic nets, rules etc. Knowledge is processed in inference machines, which normally perform symbol processing, i.e. they process truth values of antecedents, conclusions etc.

In the more recent past the border between DSS and ES has become pretty fuzzy. Some experts consider ES as part of DSS (Turban 1988), others see DSS and ES as basically different systems, and others combine the two approaches into "Knowledge-based DSS" (Klein and Methlic 1995). Even though all of these systems can support decisions in one way or another, for the sake of argument I shall keep the distinction between algorithmic DSS and knowledge-based ES.

1.3 Scientific Background

As already mentioned above, "decision" is regarded differently in different decision theories; and, furthermore, different sciences are contributing to decision making paradigms. This fact will be important for the conclusions drawn at the end of this paper. Therefore, Fig. 3 sketches some of the relationships.

1.4 Main Deficiencies of DSS

Let us consider DSS in the broad sense as the application-oriented result of decision analysis and consider some of their deficiencies which are relevant for the interface with "Fuzzy Logic". DSS and ES technology share the dichotomy character, which leads, however, to different weaknesses on either side. While on the DSS side models and algorithms sometimes become pretty bad approximations of real problems, on the ES side this leads to symbol processing rather than to knowledge processing. The former might be much harder to detect than the latter. Both DSS and ES suffer from the size of realistic problems: nowadays there is often an abundance of data rather than a lack of it.

Both areas are influenced by the discrepancy between demand and supply: while scientific contributions normally are very specific, i.e. developed in one scientific discipline and focusing on one small (and often imagined) problem, the practitioner on the demand side is looking for tools and solutions to his problems, which are frequently multidisciplinary, and not for approaches that solve a part of his problem while probably even impairing other parts. This is more serious for DSS than for ES, because the latter generally have a much smaller domain of application. This and some other factors often impair user-friendliness and, hence, user acceptance, to such a degree that the tools are never really used.

Page 211: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications

[Fig. 3 relates the contributing fields in a diagram: Artificial Intelligence (symbol processing, heuristics) and Behavioral Decision Theory (decision processes, bounded rationality) lead to Expert Systems (inference engines); Data Structures, Algorithms and Neuro-Informatics, together with Utility Theory and Statistical Decision Theory (choice model, rationality axioms), databanks and algorithms, and optimisation / algorithms / logic, lead to DSS, KB-DSS, and batch-type / interactive DSS.]

Fig. 3. Scientific background

Page 212: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


2. The Meaning of "Uncertainty"

2.1 Semantic Interpretations of Uncertainty

Fuzzy set theory has often been labelled as a theory primarily intended for uncertainty modelling. In order to judge whether this is justified or not, one obviously first has to define what "uncertainty" is. One would expect to find an appropriate definition of this term either in lexica or in scholarly books on "uncertainty" modelling (Goodman and Nguyen 1985). Surprisingly enough, I have not succeeded in finding any definition of it.

The first question one should probably ask is whether "uncertainty" is a phenomenon, a feature of real world systems, a state of mind or a label for a situation in which a human being wants to make statements about phenomena (i.e. reality, models, theories). One can also ask whether "uncertainty" is an objective fact or just a subjective impression which is closely related to individual persons.

Whether "uncertainty" is an objective feature of physical real systems seems to be a philosophical question. In the following we shall not consider these "objective uncertainties" if they exist, but we shall focus on the human-related, subjective interpretation of "uncertainty" which depends on the quantity and quality of information which is available to a human being about a system or its behaviour that the human being wants to describe, predict or prescribe.

In this respect it shall not matter whether the information is inadequate due to the specific individual or due to the present state of knowledge, i.e. whether the information is at present not available to anybody. Fig. 4 depicts the view of "uncertainty" used in this paper.

The most important aspects of this view are:

1. "Causes" of "uncertainty" influence the information flow between the observed system and the "uncertainty" model (paradigm chosen by the observer).

2. A selected "uncertainty" model or theory has to be appropriate to the available quantity and quality of input information.

3. A chosen "uncertainty" theory also determines the type of information processing applied to available data or information.

4. For pragmatic reasons the information offered to the observer (human or other) by the "uncertainty" model should be in an adequate language.

5. Hence, the choice of an appropriate "uncertainty" calculus may depend on

- the causes of "uncertainty"

- quantity and quality of information available

- type of information processing required by the respective "uncertainty" calculus

- language required by the final observer.

Page 213: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


[Fig. 4 shows "uncertainty" as a property of the situation: information and data flow from the observed system, through the causes of "uncertainty", into an "uncertainty" model and its information processing, which delivers information to a (human) observer.]

Fig. 4. Uncertainty as situational property

Even this notion of "uncertainty" is rather vague; it has many different appearances and many different causes. It is, therefore, difficult to define it properly and in sufficient generality. It seems easier to define "certainty" and then describe "uncertainty" as situations which in various ways are distinct from "certainty". Of course, such a definition of "certainty" is in a way arbitrary and subjective. It can be more or less extreme with respect to the situation. Here we choose a very extreme definition, in order to reserve all situations not covered by this definition for the consideration of "uncertainty".

Definition 1:

A proposed definition of "certainty": "Certainty" implies that a person has quantitatively and qualitatively the

appropriate information to describe, prescribe or predict deterministically and numerically a system, its behaviour or other phenomena.

Situations which are not described by the above definition shall be called "uncertain". There can be different reasons or causes for this "uncertainty": the information can be qualitatively and quantitatively inappropriate for "certainty" in the above sense, and various man-made theories have been used to describe these situations. It seems that a lot of confusion has been caused by confusing the "type of uncertainty" with the "cause of uncertainty" or with the theory which is used to model "uncertainty". I shall, therefore, attempt to describe these three aspects of "uncertainty" separately in the following, in order to arrive at a certain taxonomy of "uncertainty", the classes of which may be neither disjunct nor exhaustive.

2.2 Causes of "Uncertainty"

a) Lack of Information

Lack of information is probably the most frequent cause of "uncertainty". In decision logic, for instance, one calls "decisions under uncertainty" the situation in

Page 214: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


which a decision maker does not have any information about which of the possible states of nature will occur. This would obviously be a quantitative lack of information. With "decision making under risk" one normally describes a situation in which the decision maker knows the probabilities for the occurrence of the various states. This could be called a qualitative lack of information. Since information about the occurrence is available, it can also be considered complete in the sense of the availability of a complete probability function, but the kind of information available is not sufficient to describe the situation deterministically. Another situation characterised by a lack of information might be called "approximation". Here one does not have, or does not want to gather, sufficient information to make an exact description, even though this might be possible. In some cases the description of the system is explicitly called an "approximation"; in other situations this is hidden and probably not visible to the normal observer.

b) Abundance of Information (Complexity)

This type of "uncertainty" is due to the limited ability of human beings to perceive and process simultaneously large amounts of data. This situation is exemplified by real world situations in which more data is objectively available to human beings than they can "digest" or by situations in which human beings communicate about phenomena which are defined or described by a large number of features or properties. What people do in these situations is normally, that they transform the available data into perceivable information by using a coarser grid or a rougher "granularity" or by focusing their attention on those features which seem to them most important and neglecting all other information or data. If such a situation occurs in scientific activities, very often some kind of "scaling" is used to the same end. It is obvious that in these situations a transfer to "certainty" cannot be achieved by gathering even more data, but rather by transforming available data to appropriate information.

c) Conflicting Evidence

"Uncertainty" might also be due to conflicting evidence, i.e., there might be considerable information available pointing to a certain behaviour of a system and additionally there might also be information available pointing to another behaviour of the system. If the two classes of available information are conflicting, then an increase of information might not reduce "uncertainty" at all, but rather increase the conflict. The reason for this conflict of evidence can certainly be different. It can be due to the fact that some of the information available to the observer is wrong (but not identifiable as wrong information by the observer), it can also be that information of non-relevant features of the system is being used, it might be that the model which the observer has of the system is wrong etc. In this case a transition to a situation of "certainty" might call for checking the available information again with respect to the correctness rather than gathering more information or putting the information on a rougher grid . In some cases, however,

Page 215: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications

206

deleting some pieces of information might reduce the conflict and move the situation closer in the direction of "certainty".

d) Ambiguity

By ambiguity we mean a situation in which certain linguistic information, for instance, has entirely different meanings, or in which - mathematically speaking - we have a one-to-many mapping. All languages contain certain words which for several reasons have different meanings in different contexts. A human observer can normally interpret a word correctly (semantically) if he knows its context. In this sense this type of "uncertainty" could also be classified under "lack of information", because adding more information about the context of the word may move us from "uncertainty" to "certainty".

e) Measurement

The term "measurement" also has very different interpretations in different areas. In the context of this paper we mean "measurement" in the sense of "engineering measurement", i.e., of measuring devices to measure physical features, such as weight, temperature, length, etc.

The quality of our measuring technology has increased with time and the further this technology improves, the more exactly it can determine properties of physical systems. As long, however, as an "imagined" exact property cannot yet be measured perfectly, we have some "uncertainty" about the real measure and we only know the indicated measure. This is certainly also some type of "uncertainty" which could also be considered as a "lack of information". It is only considered to be a separate class in this paper due to the particular importance of this type of "uncertainty" to engineering.

f) Belief

Finally, we would like to mention as a cause of "uncertainty" situations in which all the information available to the observer is subjective, a kind of belief about a certain situation. This class is probably the most disputable, and it could also be considered as "lack of information" in the objective sense.

A possible interpretation of this situation is, however, also that a human being develops, on the basis of available (objective) data and in a way which is unknown to us, (subjective) beliefs which he afterwards considers as information about a system that he wants to describe or prescribe. The distinction of this class from the classes mentioned above is actually that so far we have always considered "objective" information, and now we are moving to "subjective" information. Whether this distinction can and should be upheld at all is a matter for further discussion.

Page 216: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


2.3 Type of Available Information

So far we have discussed causes of "uncertainty" which in most cases depend on the quality or quantity of available information. As already mentioned, however, we will have to consider the type of available information in a situation which we want to judge with respect to "uncertainty" in more detail: the information which is available for a system under consideration can, roughly speaking, be numerical, linguistic, interval-valued or symbolic.

a) Numerical Information

In our definition of certainty we requested that a system can be described numerically. This normally requires that the information about the system is also available numerically. Since this numerical information can come from quite a variety of sources, it is not sufficient to require just that the information is given in numbers; we also have to define the scale level on which this information is provided (Sneath 1973). This determines the type of information processing (mathematical operation) which we can apply to this information legitimately, without pretending information which is not available. There are quite a number of taxonomies for scale levels, for instance distinguishing between nominal, ordinal, ratio, interval and absolute scale levels. For our purposes, however, the distinction between nominal, ordinal and cardinal scale levels might be sufficient.

Roughly speaking, a nominal scale level indicates that the information provided (even though in numerical form) only has the function of a name (such as the number on the back of a football player or the license plate of a car); numerical information on an ordinal scale level provides information of an ordering type; and information on a cardinal scale level also indicates the differences between the ordered quantities, i.e. contains a metric.

b) Interval Information

In this case information is available, but it is not as precise as a real-valued number as above. If we want to process this information properly, we will have to use interval arithmetic, and the outcome will again be interval-valued information. It should be clear, however, that this information is also "exact" or "dichotomous" in the sense that the boundaries of the intervals, no matter how they have been determined, are "crisp", "dichotomous", or "exact".

c) Linguistic Information

By linguistic information we mean that the information provided is given in a natural language and not in a formal language. The properties of this type of information obviously differ from those of numerical information or of information in a formal language. Natural languages develop over time, they depend on cultural backgrounds, on the educational backgrounds of the persons using the language, and on many other things. One also has to distinguish between a word as a label and the meaning of a word. Very often there

Page 217: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


is neither a one-to-one relationship between these two, nor are the meanings of words defined in a crisp and context-independent way. By contrast to numerical information, there are also hardly any measures of the quality of information for natural languages (e.g. there are no defined scale levels for linguistic information). Linguistic information has developed as a means of communication between human beings, and the "inference engines" are the minds of people, about which still much too little is known.

d) Symbolic Information

Very often information is provided in the form of symbols. This is obvious when numbers, letters or pictures are being used as symbols. It is often not as obvious if words are being used as symbols, because sometimes it seems to be suggested or assumed that words have natural meanings while symbols do not. Hence, if symbolic information is provided, the information is as valuable as the definitions of the symbols, and the type of information processing also has to be symbolic and neither numerical nor linguistic.

2.4 Type of Information Processing

"Uncertainty" information is processed in various ways in various "uncertainty" methods. It can be processed algorithmically, i.e. by mathematical methods requiring normally numerical information on a specific scale level.

To an increasing degree, uncertain information or information about "uncertainties" is also processed in knowledge-based systems (Zimmermann 1988), which can either essentially perform symbol processing (classical expert system technology) or perform meaning-preserving inference. Obviously, different requirements exist for these systems, and different types of information are offered at the end. Finally, information can be processed heuristically, i.e. according to well-defined procedures which, however, do not necessarily have to be mathematical algorithms, but which can also require other types of languages.

2.5 Type of Required Information

To model, i.e. describe, prescribe or predict, a system or the behaviour of a system normally serves a certain purpose. It could serve a human observer, it could be the input to another mechanical or electronic system, it could be used in other mathematical algorithms etc. Hence, the information about the "uncertainty" of the system will have to be provided in a suitable language, i.e. either numerically, in the form of intervals, linguistically or symbolically.

Page 218: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


2.6 Uncertainty Theories

Sections 2.2-2.5 of this paper focused on factors which should determine the "uncertainty" calculus, theory or paradigm to be used to model the "uncertainty" in or of a certain situation. This certainly contradicts views that, for instance, any "uncertainty" can be modelled by probabilities, or by fuzzy sets, or by possibilities, or by any other single method. We do not believe that there exists any single method which is able to model all types of "uncertainty" equally well.

Most of the established theories and methods for "uncertainty" modelling, however, are focused either on specific "types of uncertainty" defined by their causes, or they at least imply certain causes, and they also require specific types or qualities of information, depending on the type of information processing they use. One could consider these "uncertainty" methods and their paradigms as glasses through which we look at uncertain situations; in other words: there is no "probabilistic uncertainty" as distinct from "possibilistic uncertainty". One is rather looking at an uncertain situation with the properties specified in sections 2.2 to 2.5 and trying to model this uncertain situation by means of probability theory or by means of possibility theory. Hence, the theory which is appropriate to model a specific "uncertainty" situation should be determined by the properties of this situation as specified above. At present there exist numerous "uncertainty" theories, such as: various probability theories, evidence theory (Shafer 1976), possibility theory (Dubois and Prade 1988), fuzzy set theory, grey set theory, intuitionistic set theory (Atanassov 1986), rough set theory (Pawlak 1985), interval arithmetic, convex modelling (Ben-Haim 1990), etc. Some of these theories are contained in other theories, which shall not be investigated here.

Table 1 shows a rough picture of what we consider the constituents of what may be meant by "uncertainty". Depending on the type of assumptions made, the type of information processing performed and the type of information provided, each uncertainty theory can now be characterized by a 5-component vector. The components of this vector describe the type of uncertainty, sketched in Table 1, for which the theory is suitable. For fuzzy set theory, for instance, this vector would be

{a or b; a or c; ?; a or c; a or b}.

3. Fuzzy Algorithmic Approaches for Decision Support Systems

3.1 Multi-Criteria Decision Methods

Let us first consider the basic model of (normative) decision theory: Given the set of feasible alternatives, X, the set of relevant states, S, the set of resulting events, E, and a (rational) utility function, u - which orders the space of events with

Page 219: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


respect to their desirability - the optimal decision under certainty is the choice of the alternative that leads to the event with the highest utility.

The set of feasible alternatives can be defined explicitly by enumeration or implicitly by constraints. This model has to be extended to handle multiple objectives. The decision maker is asked to trade off the achievement of one objective against another objective. This requires a subjective judgement of the decision maker. Modeling tools should allow the handling of such additional information, for example interactively in close contact with the decision maker (see Werners 1984).

In real world problems - and in particular in semi-structured situations -additional difficulties can arise, for example:

- The feasibility of an alternative cannot be crisply determined, but may be attained to a certain degree.

- The set of relevant states is either probabilistically or non-probabilistically uncertain.

- The utility function depends on multiple criteria, subjective judgements and risk behaviour. The functional dependence is often not specified.

- The problem situation is not formulated mathematically, but is described by linguistic terms of vague concepts.

Multi-Attribute Decisions

We shall restrict ourselves to one major family of MADM approaches, namely "aggregation approaches".

Let us now concentrate on decision problems with multiple objectives. In order to facilitate the comprehension of fuzzy models in this area we shall first define the classical (crisp) multi-attribute decision model (MADM):

Let $X = \{x_i,\ i = 1, \ldots, n\}$ be a (finite) set of decision alternatives and $G = \{g_j,\ j = 1, \ldots, m\}$ a (finite) set of goals, attributes, or criteria, according to which the desirability of an alternative is to be judged. $g_j(x_i)$ is the consequence of alternative $x_i$ with respect to criterion $g_j$. The aim of MADM is to determine an alternative $x^0$ with the highest possible degree of overall desirability.

Example: Car selection problem

A person wants to buy a new car. After a market study, only five alternatives $X = \{x_1, \ldots, x_5\}$ remain relevant. They differ with respect to the following four criteria, which are of main importance to the decision maker:

$g_1$: The maximum speed should be high.
$g_2$: The gasoline consumption in town should be low.
$g_3$: The price should be low.
$g_4$: Comfort should be high.

Page 220: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications

Table 1. Rough taxonomy of uncertainty properties versus uncertainty models

Uncertainty properties (not exhaustive, not disjunct):
- Causes of (subjective) uncertainty: (a) lack of information; (b) abundance of information (complexity); (c) conflicting evidence; (d) ambiguity; (e) measurement; (f) belief.
- Available information (input): (a) numerical; (b) interval; (c) linguistic; (d) symbolic.
- Scale level of information: (a) nominal; (b) ordinal; (c) cardinal; (d) qualitative.
- Type of information processing: (a) algorithmic; (b) knowledge based; (c) heuristic.
- Required information (output): (a) numerical; (b) interval; (c) linguistic; (d) symbolic.

Types of uncertainty model: probability theory (Kolmogoroff, Koopman, Bayes, ...); evidence theory; possibility theory; fuzzy set theory.

Page 221: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


The technical details of the different cars are summarized:

        $g_1(x_i)$ [km/h]   $g_2(x_i)$ [l/100km]   $g_3(x_i)$ [ECU]   $g_4(x_i)$ [degree]
$x_1$   170                 10.5                   17,000             50%
$x_2$   150                  8                     15,500             10%
$x_3$   140                  9                      8,000             20%
$x_4$   140                  9                      9,500             10%
$x_5$   160                 10.5                   17,000             40%

Definition 2:

Let $g_j$, $j = 1, \ldots, m$, be functions to be maximized. An element $x_i$ (strictly) dominates $x_k$ $\Leftrightarrow$ $g_j(x_i) \ge g_j(x_k)$ for all $j = 1, \ldots, m$ and $g_j(x_i) > g_j(x_k)$ for some $j \in \{1, \ldots, m\}$.

An alternative $x_i \in X$ is an efficient solution of the MADM problem if there does not exist any $x_k \in X$ which dominates $x_i$.

Of special importance in multiple criteria models is the set of efficient solutions, and a compromise alternative should be an element of this set.

Example: Car selection model

g2 and g3 are goals to be minimized; this is equivalent to maximizing −g2 and −g3. It is then easy to see that x1 dominates x5 and x3 dominates x4. The elements x1, x2, and x3 constitute the set of efficient solutions.
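The efficient set can be computed directly by the pairwise dominance test of Definition 2; a minimal Python sketch (variable names are ours) applied to the technical data above:

```python
# Pairwise dominance test (Definition 2) on the car selection data.
# Goals g2 (consumption) and g3 (price) are to be minimized, so their
# values are negated to treat all four criteria as "maximize".
cars = {
    "x1": (170, -10.5, -17000, 50),
    "x2": (150, -8.0,  -15500, 10),
    "x3": (140, -9.0,   -8000, 20),
    "x4": (140, -9.0,   -9500, 10),
    "x5": (160, -10.5, -17000, 40),
}

def dominates(a, b):
    """a strictly dominates b: >= in every criterion, > in at least one."""
    return all(u >= v for u, v in zip(a, b)) and any(u > v for u, v in zip(a, b))

efficient = [name for name, vals in cars.items()
             if not any(dominates(other, vals)
                        for o, other in cars.items() if o != name)]
print(efficient)  # ['x1', 'x2', 'x3']
```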

Decisions in a Fuzzy Environment

Bellman and Zadeh (1970) have departed from a classical model of a decision and suggested a model for decision making in a fuzzy environment, which has served as a point of departure for most of the authors in fuzzy decision theory. They consider a situation of decision making under certainty, in which the objective function as well as the constraint(s) are fuzzy, and argue as follows:

The fuzzy objective function is characterized by the membership function of a fuzzy set and so are the constraints. Since we want to satisfy (optimize) the objective function as well as the constraints, a decision in a fuzzy environment is defined by analogy to non-fuzzy environments as the selection of alternatives, which simultaneously satisfy objective function and constraints.

Let us call this the maximizing decision x_max with

    μ_D(x_max) = max_x min{μ_G(x), μ_C(x)}

Page 222: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


More generally we can define a decision in a fuzzy environment as follows:

Definition 3:

Let μ_Ci, i = 1,...,m, be the membership functions of the constraints on X, defining the decision space, and μ_Gj, j = 1,...,n, the membership functions of the objective (utility) functions or goals on X. A decision D is then defined by its membership function

    μ_D(x) = μ_C1(x) * ... * μ_Cm(x) ⊗ μ_G1(x) ⊗ ... ⊗ μ_Gn(x),

where *, ⊗ denote appropriate, possibly context-dependent, aggregators (connectives). Let M be the set of points x ∈ X for which μ_D(x) attains its maximum, if it exists. Then M is called the maximizing decision. If μ_D has a maximum at x_M, then the maximizing decision is a crisp decision, which can be interpreted as the action that belongs to all fuzzy sets representing either constraints or goals with the highest possible degree of membership.

Aggregation Approaches

Aggregation approaches generally consist of two steps:

Step 1: The aggregation of the judgements with respect to all goals, per decision alternative.

Step 2: The rank ordering of the decision alternatives according to the aggregated judgements.

In crisp MADM models it is usually assumed that the final judgements of the alternatives are expressed as real numbers. In this case the second stage does not pose any particular problems, and suggested algorithms concentrate on the first stage. Fuzzy models are sometimes justified by the argument that the goals g_j themselves, or their attainment by the alternatives x_i, cannot be defined or judged crisply but only as fuzzy sets. In this case the final judgements are also represented by fuzzy sets, which have to be ordered to determine the optimal alternative. Then the second stage is, of course, by far not trivial. The aggregation procedure can be direct or hierarchical, establishing "consistent" weights for the different criteria.

Let r_ij be the (preferability) ratings of alternative i with respect to criterion j and w_j subjective weights which express the relative importance of the criteria to the decision maker. In crisp MADM models a frequently used and unsophisticated way to arrive at overall ratings of the alternatives, R_i, is

    R_i = Σ_{j=1}^m w_j r_ij

Page 223: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


Generally the R_i are real numbers according to which the alternatives can easily be ranked.

Example: Car selection problem

With respect to the technical data the decision maker gives (preferability) ratings r_ij for each alternative i with respect to each goal g_j. Additionally he determines his subjective weights for the goals. The result is a table of ratings r_ij for the five alternatives, together with the weights w_j (among them 1/3, 1/7 and 1/9).

Then for each alternative x_i a rating R_i as defined above can be computed; x1 results as the most preferred solution.
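A minimal sketch of this weighted-sum ranking; the ratings and weights below are illustrative placeholders rather than the values of the example:

```python
# Weighted-sum aggregation R_i = sum_j w_j * r_ij (crisp MADM).
# Ratings r_ij and weights w_j are illustrative placeholders.
ratings = {                 # r_ij for goals g1..g4
    "x1": [5, 2, 1, 4],
    "x2": [1, 5, 2, 1],
    "x3": [4, 3, 5, 2],
}
weights = [1/3, 1/7, 1/9, 1/9]   # subjective weights w_j

R = {alt: sum(w * r for w, r in zip(weights, rij))
     for alt, rij in ratings.items()}
best = max(R, key=R.get)         # rank by aggregated rating
print(R, best)
```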

The r_ij as well as the w_j, however, will in many cases more appropriately be modelled by fuzzy numbers. This has the following consequences:

In step 1: The aggregation procedure for the single-criterion ratings will have to be modified.

In step 2: The R_i will no longer be real numbers but fuzzy sets, which have to be ranked.

In the following, some approaches to handling fuzziness in MADM aggregation models are described by way of example:

Hierarchical Aggregation Using Crisp Weights (Yager 1978)

Essentially Yager assumes a finite set of alternative actions X = {x_i} and a finite set of goals (attributes) G = {g_j}, j = 1,...,m. The goals g_j = {(x_i, μ_gj(x_i))} are fuzzy sets whose degrees of membership represent the normalized degree of attainment of goal j by alternative x_i. The fuzzy set decision, D, is then the intersection of all fuzzy goals, i.e.

    μ_D(x_i) = min_{j=1,...,m} μ_gj(x_i),   i = 1,...,n,

and the maximizing decision is defined to be the x for which

Page 224: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications

    μ_D(x) = max_i min_j μ_gj(x_i).


Yager now allows for different importances of the goals and expresses this by exponential weighting of the membership functions of the goals. If w_j are the weights of the goals, the weighted membership functions μ'_gj are

    μ'_gj(x_i) = (μ_gj(x_i))^(w_j).

For the determination of the w_j Yager suggests the use of Saaty's method, i.e. the determination of the reciprocal matrix by pairwise comparison of the goals with respect to their relative importance (Saaty 1980). The components of the eigenvector of this m × m matrix, scaled so that their total is m, are then used as weights.

The rationale behind using the weights as exponents to express the importance of a goal can be found in the definition of the modifier "very" (Zimmermann 1996). There the modifier "very" was defined as the squaring operation. Thus the higher the importance of a goal, the larger should be the exponent of its representing fuzzy set, at least for normalized fuzzy sets and when using the min operator for the intersection of the fuzzy goals.

The measure for ranking the decision alternatives is obviously μ_D(x_i). For the ranking of fuzzy sets in the unit interval Yager suggests another criterion, which is based on properties of the supports of the fuzzy sets rather than on the degrees of membership. This is particularly applicable when ranking different (fuzzy) degrees of truth and similar linguistic variables.

Example: Car selection problem

Let X = {x_i, i = 1,...,5} again be the set of alternative cars and {g_j, j = 1,...,4} the fuzzy goals, for example described by the piecewise linear membership functions

g1: maximum speed [km/h]

    μ_g1(x) = 0                  if x < 100
            = (x − 100)/100      if 100 ≤ x ≤ 200
            = 1                  if 200 < x

g2: consumption in town [l/100 km]

    μ_g2(x) = 0                  if 12 < x
            = (12 − x)/5         if 7 ≤ x ≤ 12
            = 1                  if x < 7

Page 225: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


g3: price [ECU]

    μ_g3(x) = 0                        if 20,000 < x
            = (20,000 − x)/15,000      if 5,000 ≤ x ≤ 20,000
            = 1                        if x < 5,000

g4: comfort [%]

The fuzzy goals with respect to the alternatives {x_i} are then:

g1 = {(x1,.7),(x2,.5),(x3,.4),(x4,.4),(x5,.6)}
g2 = {(x1,.3),(x2,.8),(x3,.6),(x4,.6),(x5,.3)}
g3 = {(x1,.2),(x2,.3),(x3,.8),(x4,.7),(x5,.2)}
g4 = {(x1,.5),(x2,.1),(x3,.2),(x4,.1),(x5,.4)}

By pairwise comparison the following reciprocal matrix has been determined, expressing the relative importance of the goals with respect to each other:

           g1     g2     g3     g4
    g1  [   1      3      7      9  ]
W = g2  [  1/3     1      6      7  ]
    g3  [  1/7    1/6     1      3  ]
    g4  [  1/9    1/7    1/3     1  ]

The eigenvector w = (w_j), j = 1,...,4, for which Σ_{j=1}^4 w_j = 4, is

    w = (2.32, 1.2, .32, .16)

Weighting the g_j appropriately yields

g1' = {(x1,.44),(x2,.2),(x3,.12),(x4,.12),(x5,.31)}
g2' = {(x1,.24),(x2,.76),(x3,.54),(x4,.54),(x5,.24)}
g3' = {(x1,.6),(x2,.68),(x3,.93),(x4,.89),(x5,.6)}
g4' = {(x1,.9),(x2,.69),(x3,.77),(x4,.69),(x5,.86)}

Hence,

Page 226: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


    D = {(x_i, min_j μ_gj'(x_i))} = {(x1,.24),(x2,.2),(x3,.12),(x4,.12),(x5,.24)}

and D_max = {x1}.
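The degrees of membership of D are easy to retrace numerically; a minimal sketch (names ours) that reproduces them from the goal sets and weights above:

```python
# Yager's hierarchical aggregation: weights as exponents, min as intersection.
goals = {   # mu_gj(xi) for j = 1..4 and the alternatives x1..x5
    "g1": [.7, .5, .4, .4, .6],
    "g2": [.3, .8, .6, .6, .3],
    "g3": [.2, .3, .8, .7, .2],
    "g4": [.5, .1, .2, .1, .4],
}
w = {"g1": 2.32, "g2": 1.2, "g3": .32, "g4": .16}  # eigenvector weights

weighted = {g: [m ** w[g] for m in mus] for g, mus in goals.items()}
D = [round(min(weighted[g][i] for g in goals), 2) for i in range(5)]
print(D)   # [0.24, 0.2, 0.12, 0.12, 0.24]
```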

Multi-Objective Decision Making

By contrast to the MADM model, the decision space in MODM models is continuous and defined implicitly by functions (constraints). The type of model to be considered in the linear case is therefore:

    Maximize   z = Cx            (objective functions)
    such that  Ax {≤, =, ≥} b    (constraints)
               x ≥ 0

We shall now consider a standard LP in which the decision maker has established an aspiration level z as an approximate level of the objective function which he wants to achieve, and in which the m constraints are flexible, i.e. they can be violated to certain degrees. This problem can be modelled as

    Find x
    such that  c^T x ≳ z,
               Ax ≲ b,
               x ≥ 0.    (1)

Here ≲ denotes the fuzzified version of ≤ and has the linguistic interpretation "essentially smaller than or equal"; ≳ is the corresponding fuzzified version of ≥.

We see that (1) is symmetric with respect to objective function and constraints, and we now make this even more obvious by substituting

    B = ( −c^T )        d = ( −z )
        (  A   )  and       (  b ).

Then (1) becomes

    Find x
    such that  Bx ≲ d,
               x ≥ 0.    (2)

Page 227: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


Each of the m+1 rows of (2) will now be represented by a fuzzy set, the membership functions of which are μ_i(x). These can be interpreted as the degree to which x fulfils (satisfies) the fuzzy inequalities (Bx)_i ≲ d_i, i = 1,...,m+1, where (Bx)_i denotes the i-th row of (2).

If we accept the possibilistic attitude to model the intersection of fuzzy sets by the minimum operator, the membership function of the fuzzy set "decision" is

    μ_D(x) = min_{i=1,...,m+1} μ_i(x)    (3)

Assuming that the decision maker is interested not in a fuzzy set but in a crisp "optimal" solution x_0, we could suggest to him the "maximizing solution" to (3), which is the solution to the possibly non-linear programming problem

    max_{x≥0} min_{i=1,...,m+1} μ_i(x)    (4)

We now have to specify the membership functions μ_i(x). They should be 0 if the constraints (including the objective function) are strongly violated, 1 if they are very well satisfied (i.e. satisfied in the crisp sense); and they should increase monotonically from 0 to 1, i.e.

             = 1          if (Bx)_i ≤ d_i
    μ_i(x)   ∈ [0,1]      if d_i < (Bx)_i ≤ d_i + p_i,   i = 1,...,m+1    (5)
             = 0          if (Bx)_i > d_i + p_i

Using the simplest type of membership function, we assume it to be linear over the tolerance interval [d_i, d_i + p_i]:

             = 1                           if (Bx)_i ≤ d_i
    μ_i(x)   = 1 − ((Bx)_i − d_i)/p_i      if d_i < (Bx)_i ≤ d_i + p_i,   i = 1,...,m+1    (6)
             = 0                           if (Bx)_i > d_i + p_i

The p_i are subjectively chosen constants of admissible violations of the constraints and the objective function. Substituting (6) into (3) yields, after some rearrangements,

Page 228: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


    max_{x≥0} min_{i=1,...,m+1} ( 1 − ((Bx)_i − d_i)/p_i )    (7)

Introducing one new variable, λ, which corresponds essentially to μ_D(x) in (3), we arrive at

    Maximize   λ
    such that  λ p_i + (Bx)_i ≤ d_i + p_i,   i = 1,...,m+1,
               0 ≤ λ ≤ 1,
               x ≥ 0.    (8)

If the optimal solution to (8) is the vector (λ_0, x_0), then x_0 is the maximizing solution (4) of model (1), assuming membership functions as specified in (6).
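As a small numerical illustration, the crisp equivalent (8) is an ordinary linear program and can be handed to any LP solver. A minimal sketch using scipy.optimize.linprog; B, d and p are hypothetical data (one fuzzified objective row and two flexible constraints in two variables), not taken from the text:

```python
# Crisp equivalent model (8) of the symmetric fuzzy LP.
# Row 0 of B is the negated objective -c^T with aspiration level z = 12
# (so d_0 = -z); rows 1-2 are flexible constraints. All data hypothetical.
import numpy as np
from scipy.optimize import linprog

B = np.array([[-1.0, -2.0],
              [ 1.0,  1.0],
              [ 2.0,  1.0]])
d = np.array([-12.0, 5.0, 9.0])
p = np.array([  2.0, 1.0, 2.0])      # admissible violations p_i

# decision vector (lambda, x1, x2); maximize lambda <=> minimize -lambda
c = np.array([-1.0, 0.0, 0.0])
A_ub = np.column_stack([p, B])       # lambda*p_i + (Bx)_i <= d_i + p_i
b_ub = d + p
res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              bounds=[(0, 1), (0, None), (0, None)])
print(res.x)                         # about [0.5, 0.0, 5.5]
```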

Slightly modified versions of models (6) and (7) result if the membership functions are defined as follows:

A variable t_i, i = 1,...,m+1, 0 ≤ t_i ≤ p_i, is defined which measures the degree of violation of the i-th constraint. The membership function of the i-th row is then

    μ_i(x) = 1 − t_i/p_i.    (9)

The crisp equivalent model is then

    Maximize   λ
    such that  λ p_i + t_i ≤ p_i,   i = 1,...,m+1,
               (Bx)_i − t_i ≤ d_i,
               t_i ≤ p_i,
               λ, x, t ≥ 0.    (10)

This model is larger than model (8), even though the set of constraints t_i ≤ p_i is actually redundant. Model (10) has some advantages, however, in particular when performing sensitivity analysis, which will not be discussed here.

The main advantage of (1) is that the decision maker is not forced into a precise formulation for mathematical reasons, even though he might only be able or willing to describe his problem in fuzzy terms. Linear membership functions are obviously only a very rough approximation. Membership functions which monotonically increase or decrease in the interval [d_i, d_i + p_i] can also be handled quite easily, as will be shown later.

So far the objective function as well as all constraints were considered fuzzy. If some of the constraints are crisp, Dx ≤ b′, then those constraints can easily be added to formulation (8) or (10). Thus (8) would, for instance, become

Page 229: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


    Maximize   λ
    such that  λ p_i + (Bx)_i ≤ d_i + p_i,   i = 1,...,m+1,
               Dx ≤ b′,
               λ ≤ 1,
               x, λ ≥ 0.    (11)

Even though we now have fuzzy and crisp constraints, the model remains symmetric in the sense that objective function(s) and constraints have exactly the same representation. The reason is, of course, that crisp constraints can be regarded as special cases of fuzzy constraints for which the membership functions degenerate to characteristic functions.

This changes if no aspiration level can be established for the objective function. In this case the notion of a "maximizing set" can be used in order to re-establish the symmetry of the model. The equivalent crisp model then becomes (Werners 1988; Zimmermann 1987):

    Maximize   λ
    such that  λ(f_0 − f_1) − c^T x ≤ −f_1,
               λp + Ax ≤ b + p,
               Dx ≤ b′,
               λ ≤ 1,
               λ, x ≥ 0,    (12)

where f_0 and f_1 are the optimal values, respectively, of two linear programs, one using the support of the solution space as feasible region and the other the α-cut for α = 1.

So far we have used the minimum operator to model the intersection and assumed linear membership functions for the fuzzy sets representing objective functions or constraints.

It is quite obvious that linear membership functions will not always be adequate, and it has been shown empirically that the min operator is often not an appropriate model for the "and" used in decision models. We shall first consider the problem of non-linear membership functions, keeping the min operator as aggregator, and then investigate what happens if other aggregation procedures are used.

The linear membership functions used so far could all be defined by fixing two points: the upper and lower aspiration levels, or the two bounds of the tolerance interval.

The most obvious way to handle non-linear membership functions is probably to approximate them piecewise by linear functions. Some authors have used this approach and shown that the resulting equivalent crisp problem is still a standard linear programming problem. This problem, however, can be considerably larger than model (10), because, in general, one constraint will have to be added for each linear piece of the approximation. Quite often S-shaped membership

Page 230: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


functions have been suggested, particularly if the membership function is interpreted as a kind of utility function (representing the degree of satisfaction, acceptance etc.).

Some authors (Leberling 1981) have shown that non-linear membership functions can also be accommodated by equivalent linear crisp models. It has even been proven (Werners 1984) that the following holds:

Theorem 1:

Let {f_k}, k = 1,...,K, be a finite family of functions f_k: ℝⁿ → ℝ, let x_0 ∈ X ⊆ ℝⁿ, let g: ℝ → ℝ be strictly monotonically increasing, and λ, λ′ ∈ ℝ. Consider the two mathematical programming problems

    Maximize   λ
    such that  λ ≤ f_k(x),   k = 1,...,K,    (13)
               x ∈ X

and

    Maximize   λ′
    such that  λ′ ≤ g(f_k(x)),   k = 1,...,K,    (14)
               x ∈ X.

If there exists a λ_0 ∈ ℝ such that (λ_0, x_0) is the optimal solution of (13), then there exists a λ′_0 ∈ ℝ such that (λ′_0, x_0) is the optimal solution of (14).

This theorem suggests that quite a number of nonlinear membership functions can be accommodated easily. Unfortunately, the same optimism is not justified concerning the other aggregation operators.

The computational efficiency of the approaches mentioned so far rests to a large extent on the use of the min operator as a model for the logical "and" or for the intersection of fuzzy sets.

As already mentioned, operators other than the min operator have been suggested and shown to model human behaviour better in the context of decision making.

The disadvantage of these operators is, however, that the resulting crisp equivalent models are no longer linear, which reduces the computational efficiency of these approaches considerably, or even renders the equivalent models unsolvable within acceptable time limits.

There are some exceptions to this rule, two of which we want to present in some more detail.

One of the objections against the min operator (see, for instance, Zimmermann and Zysno 1980) is the fact that neither the logical "and" nor the min operator is compensatory, in the sense that increases in the degree of membership in the fuzzy sets being "intersected" do not influence at all the membership in the resulting fuzzy set

Page 231: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


(aggregated fuzzy set or intersection). There are two quite natural ways to cure this weakness:

a. Combine the (limitational) min operator as a model for the logical "and" with the fully compensatory max operator as a model for the inclusive "or". For the former the product operator might alternatively be used, and for the latter the algebraic sum. This approach departs from the strict distinction between "and" and "or" aggregation; the resulting connective lies somewhere between the "and" and the "or" (it is therefore often called a "compensatory and").

b. Stick with the distinction between "and" and "or" aggregators and introduce a certain degree of compensation into these connectives.

Compensatory "and"

For some applications it seems to be important for the aggregator used to map above the min operator and below the max operator. The γ-operator would be such a connective. For purposes of mathematical programming it has, however, the above-mentioned disadvantage of low computational efficiency. An acceptable compromise between empirical fit and computational efficiency seems to be the convex combination of the min operator and the max operator:

    μ_C(x) = γ min_{i=1,...,m} μ_i(x) + (1 − γ) max_{i=1,...,m} μ_i(x),   γ ∈ [0,1]    (15)

For linear membership functions of the goals and the constraints, and denoting the coefficient vectors of the A-matrix and of the objective function by d_i, the crisp equivalent model is

    max_{x∈X} ( γ min_{i=1,...,m} {μ_i(d_i^T x)} + (1 − γ) max_{i=1,...,m} {μ_i(d_i^T x)} ).

This is equivalent to

    Maximize   γλ_1 + (1 − γ)λ_2
    such that  λ_1 ≤ μ_i(d_i^T x),   i = 1,...,m,
               λ_2 ≤ μ_i(d_i^T x)   for at least one i ∈ {1,...,m},
               x ∈ X,    (16)

or

    Maximize   γλ_1 + (1 − γ)λ_2
    such that  λ_1 ≤ μ_i(d_i^T x),   i = 1,...,m,
               λ_2 ≤ μ_i(d_i^T x) + M_i y_i,   i = 1,...,m,

Page 232: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications

               Σ_{i=1}^m y_i ≤ m − 1,
               y_i ∈ {0,1},   i = 1,...,m,
               x ∈ X,    (17)

where the M_i are very large real numbers. (17) is a mixed integer linear program which can be solved by available codes (APEX, MPSX, etc.).

3.2 An Interactive System for Decision Support

3.2.1 Basic Considerations

The basic motivations for the DSS described in the following were to provide an efficient, computer-based and interactive DSS for mathematical programming type problems which

1. can cope with multiple (fuzzy) objective functions and crisp and fuzzy constraints;

2. uses realistic and empirically tested membership functions,

3. provides adequate connectives which can be chosen by the user context-dependently.

In a DSS which is to be of practical use, not only scientific considerations, for instance of best empirical fit of membership functions and operators, play a part, but also requirements of efficiency, i.e. computing time, storage space and user orientation. In the following we describe some compromises between empirical fit and efficiency which had to be made in order to keep the overall system practically useful.

Membership Functions

Essentially four types of membership functions were initially considered for the fuzzy sets used in the DSS: linear and hyperbolic functions, the logistic function, and spline functions. Theorem 1 showed how hyperbolic functions can be transformed into equivalent linear membership functions. It can also be shown that hyperbolic and logistic membership functions are isomorphic mathematical models: by choosing appropriate parameters the resulting membership functions are equal. Determining a cubic spline function, however, requires the determination of quite a number of parameters, which might not be feasible at all for a decision maker. The DSS was therefore restricted to accommodate the first three types of membership functions.
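The claimed equivalence of hyperbolic and logistic membership functions can be checked numerically. A sketch assuming the common parameterisations μ(x) = (1 + tanh(a(x − b)))/2 and μ(x) = (1 + e^(−c(x−b)))^(−1), which coincide for c = 2a:

```python
# Hyperbolic and logistic membership functions coincide for c = 2a,
# since 1/(1 + exp(-2z)) = (1 + tanh(z))/2.
import numpy as np

a, b = 0.8, 5.0                      # slope and inflection point (assumed)
x = np.linspace(0, 10, 11)
mu_hyp = 0.5 * (1 + np.tanh(a * (x - b)))
mu_log = 1 / (1 + np.exp(-2 * a * (x - b)))
print(np.allclose(mu_hyp, mu_log))   # True
```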

Page 233: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


Operators

The computational efficiency of the DSS depends primarily on the type of the resulting "equivalent model", such as (10), (11), etc. This, however, does not only depend on the membership function chosen or the operator used for aggregation, but on the combination of both. We shall show this for two operators and different membership functions.

Let us first consider a linear combination of min and max and linear membership functions of the type μ_i(d_i^T x) as used before. The resulting models are shown in (16) and (17), respectively.

This is a mixed integer linear program (MILP) which can be solved with available efficient software such as APEX, MPSX, etc. This is not true if we use the logistic function as a membership function:

    μ_i(d_i^T x) = (1 + e^(−a_i(d_i^T x − b_i)))^(−1).

Then the model becomes

    max  [ γ(1 + e^(−λ_1))^(−1) + (1 − γ)(1 + e^(−λ_2))^(−1) ]
    such that  λ_1 ≤ a_i(d_i^T x − b_i),   i = 1,...,m,
               λ_2 ≤ a_i(d_i^T x − b_i)   for at least one i ∈ {1,...,m},
               x ∈ X.    (18)

Similar results can be obtained for the "fuzzy and" operator. The following table summarizes some operator-membership function combinations and their resulting equivalent models.

Table 2. Resulting equivalent models

Membership Function    Operator              Model
Linear                 Min                   LP
Logistic               Min                   LP
Hyperbolic             Min                   LP
Linear                 γ min + (1−γ) max     MILP
Linear/Nonlinear       Product               nonconvex NLP
Linear/Nonlinear       γ-operator            nonconvex NLP
Linear                 ãnd                   LP
Linear                 õr                    MILP
Logistic               ãnd                   NLP with lin. constraints

with: ãnd = γ·min(x,y) + (1−γ)·(x+y)/2,  õr = γ·max(x,y) + (1−γ)·(x+y)/2

Page 234: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


For the DSS only combinations were included that lead to linear equivalent models. They are mentioned again in the following table:

Table 3. MF-Operator combinations included in the DSS

               Operator:  Min    Min/Max    Alg. Mean    ãnd
M.-Function
Linear                    LP     MILP       LP           LP
Logistic                  LP

3.2.2 Working with the DSS

Using the decision support system, the decision maker first has to state his goals and constraints for a fuzzy programming model. Goals and constraints are not treated equally, as they are in the fuzzy decision model of Zadeh mentioned earlier. Instead, we consider as the main difference between a fuzzy goal and a fuzzy constraint that the decision maker is able to give more information about a constraint than about a goal. Similar to crisp programming models, where he distinguishes only between degrees of membership 0 and 1 for satisfying a constraint, the decision maker gives a membership function for each constraint a priori. The membership function of a fuzzy maximization goal cannot be given in advance but depends on what is possible when satisfying the constraints. So additional information has to be obtained about the dependencies within the model. This can be done by the system. Here extremal solutions are determined by optimizing one goal over each of two crisp feasible regions: one which contains all solutions with a degree of membership equal to 1, and another which contains all solutions with nonzero degrees of membership. The results are used to determine membership functions of the goals.

To solve crisp vector-maximum problems it was suggested (Zimmermann 1987) to derive a fuzzy set representing the goals from the pessimistic and the ideal solution of the vector-maximum problem. The concept used in the DSS is a generalization, which is necessary to handle fuzzy goals under crisp and fuzzy constraints. Aggregating all membership functions, i.e. of goals and constraints, a compromise solution is determined. Interactively, the decision maker can now change the proposed membership functions until he is satisfied with the compromise solution.

The interactive fuzzy programming system supports a decision maker in two different ways:

- first, it determines extremal solutions and proposes membership functions describing the goals;

- second, it evaluates efficient compromise solutions with additional local information.

Page 235: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


After each presented compromise the decision maker gets more and more insight into the model and can articulate further preference information:

- locally, by modifying membership functions,

- globally, by modifying the model.

The following rough flow chart sketches how the DSS works:

The system "DSS" is a menu-oriented dialogue system for solving multi-criteria problems with crisp and fuzzy restrictions. A static description of the structure shall be omitted, because the structure of the system is highly dependent on the wishes of the decision maker.

Page 236: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


Fig. 5. Rough flowchart of DSS operations: the decision maker states objectives, constraints and linguistic variables; the system determines individual optima (LP) and proposes membership functions for the objectives; goals and constraints are aggregated with the chosen operators into an equivalent model, which is solved (MILP); the compromise solution and additional information are presented, and the decision maker responds by modifying membership functions, aspirations, operators, goals, constraints or weights.

Page 237: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


References

Atanassov, K.T. [1986]. Intuitionistic fuzzy sets. Fuzzy Sets and Systems, 20, 87-96.

Bellman, R., and Zadeh, L.A. [1970]. Decision-making in a fuzzy environment. Management Science, 17, B-141-164.

Ben-Haim, Y., and Elishakoff, I. [1990]. Convex Models of Uncertainty in Applied Mechanics. Elsevier Science Publishers, Amsterdam.

Dubois, D., and Prade, H. [1988]. Possibility Theory. New York, London.

Dubois, D., and Prade, H. [1989]. Fuzzy sets, probability and measurement. EJOR, 40, 135-154.

Goodman, I.R., and Nguyen, H.T. [1985]. Uncertainty Models for Knowledge-Based Systems. North-Holland.

Kandel, A., Langholz, G. (eds.) [1992]. Hybrid Architectures for Intelligent Systems. CRC Press, Boca Raton.

Klein, R.L.; Methlie, L.B. [1995]. Knowledge-Based Decision Support Systems. 2nd Ed. Wiley, Chichester.

Klir, G.J.; Folger, T.A. [1988]. Fuzzy Sets, Uncertainty and Information. Prentice-Hall, Englewood Cliffs.

Klir, G.J. [1987]. Where do we stand on measures of uncertainty, ambiguity, fuzziness, and the like? Fuzzy Sets and Systems, 24, 141-160.

Leberling, H. [1981]. On finding compromise solutions in multicriteria problems using the fuzzy min-operator. Fuzzy Sets and Systems, 6, 105-118.

Newell, A.; Simon, H.A. [1972]. Human Problem Solving. Prentice-Hall, Englewood Cliffs.

Pawlak, Z. [1985]. Rough sets. Fuzzy Sets and Systems, 17, 99-102.

Saaty, T.L. [1978]. Exploring the interface between hierarchies, multiple objectives and fuzzy sets. Fuzzy Sets and Systems, 1, 57-68.

Shafer, G.A. [1976]. A Mathematical Theory of Evidence. Princeton.

Sneath, P.H.A., and Sokal, R. [1973]. Numerical Taxonomy. San Francisco.

Turban, E. [1988]. Decision Support and Expert Systems. 2nd Ed. Macmillan, New York.

Werners, B. [1984]. Interaktive Entscheidungsunterstützung durch ein flexibles mathematisches Programmierungssystem. Munich.

Werners, B. [1988]. Aggregation models in mathematical programming. In: Mitra (ed.): Mathematical Models for Decision Support. Springer-Verlag, Berlin, pp. 295-319.

Yager, R.R. [1978]. Fuzzy decision making including unequal objectives. Fuzzy Sets and Systems, 1, 87-95.

Yager, R.R., Zadeh, L.A. (eds.) [1992]. An Introduction to Fuzzy Logic Applications in Intelligent Systems. Kluwer, Boston.

Zimmermann, H.-J. [1978]. Fuzzy programming and linear programming with several objective functions. Fuzzy Sets and Systems, 1, 45-55.

Zimmermann, H.-J., and Zysno, P. [1980]. Latent connectives in human decision making. Fuzzy Sets and Systems, 4, 37-51.

Page 238: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


Zimmermann, H.-J., and Zysno, P. [1983]. Decisions and evaluations by hierarchical aggregation of information. Fuzzy Sets and Systems, 10, 243-266.

Zimmermann, H.-J., Zadeh, L.A., Gaines, B.R. (eds.) [1984]. Fuzzy Sets and Decision Analysis. North-Holland, Amsterdam.

Zimmermann, H.-J. [1985]. Multi Criteria Decision Making in Crisp and Fuzzy Environments. In: Jones et al. (eds.): Fuzzy Sets and Applications. Reidel, Dordrecht, pp. 233-256.

Zimmermann, H.-J. [1987]. Fuzzy Sets, Decision Making, and Expert Systems. Kluwer, Boston, Dordrecht, Lancaster.

Zimmermann, H.-J. [1988]. Uncertainties in Expert Models. In: G. Mitra (ed.): Mathematical Models for Decision Support. Springer-Verlag, Berlin, pp. 613-630.

Zimmermann, H.-J. [1991]. Cognitive Sciences, Decision Technology and Fuzzy Sets. Information Sciences, 57/58, 287-295.

Zimmermann, H.-J. [1991]. Fuzzy Sets, Decision Making and Expert Systems. Kluwer, Boston.

Zimmermann, H.-J. [1996]. Fuzzy Set Theory and its Applications. 3rd Ed. Kluwer, Boston.

Page 239: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications

Neuro-Fuzzy Systems

Rudolf Kruse and Detlef Nauck

Faculty of Computer Science, Otto-von-Guericke-University of Magdeburg, Universitätsplatz 2, D-39106 Magdeburg, Germany. Phone: +49.391.67.18706, Fax: +49.391.67.12018. [email protected], http://fuzzy.cs.uni-magdeburg.de

Abstract. This paper is about so-called neuro-fuzzy systems, which combine methods from neural network theory with fuzzy systems. Such combinations have been considered for several years already. However, the term neuro-fuzzy still lacks a proper definition, and still has the flavour of a buzzword to it. In this paper we try to give it a meaning in the context of three applications of fuzzy systems: fuzzy control, fuzzy classification, and fuzzy function approximation.

Surprisingly few neuro-fuzzy approaches actually employ neural networks, even though they are very often depicted in the form of some kind of neural network structure. However, all approaches display some kind of learning capability, as it is known from neural networks. This means they use algorithms which enable them to determine their parameters from training data in an iterative process. From our point of view, neuro-fuzzy means using heuristic learning strategies derived from the domain of neural network theory to support the development of a fuzzy system.

Keywords: Classification, control, function approximation, fuzzy system, learning, neural network, neuro-fuzzy system

1. Introduction

This paper deals with neuro-fuzzy systems. This term refers to combinations of neural networks and fuzzy systems. However, it is not completely clear what kind of models this term applies to. If we take a look into the relevant literature, we find several different approaches which are called neuro-fuzzy, neural fuzzy, or fuzzy-neuro systems. This probably means they have something in common, and in this paper we discuss the general idea of such approaches and give a meaning to neuro-fuzzy, currently the most popular of the three aforementioned terms.

Before we present our interpretation of neuro-fuzzy systems we must discuss three other terms: neural network, learning, and fuzzy system.

O. Kaynak et al. (eds.), Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications© Springer-Verlag Berlin Heidelberg 1998

Page 240: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


Neural networks - also called connectionist systems - are designed to (very roughly) model certain aspects of the human brain. They consist of simple processing elements (neurons) that exchange signals along connections (network structure). The signals are changed when they travel along the connections: they are combined (usually multiplied) with the (connection) weights. A neuron gathers the input from all connections leading to it, and computes an activation value by using a (usually non-linear) activation function. The units of a neural network are usually organised in layers which are called input layer, hidden layer(s) or output layer, depending on their functionality. A neural network with n units in its input layer and m units in its output layer implements a mapping f: Iⁿ → Oᵐ. The sets I ⊆ ℝ and O ⊆ ℝ are the input and output domains. The hidden layer(s) add to the complexity of a neural network, and are important if arbitrary mappings have to be represented.

The most interesting feature of neural networks is their ability to learn (see also below). There are so-called learning algorithms which can be used to determine the connection weights by processing a set of examples (the learning problem) that describes the desired input/output behaviour of the network. The weights are learned in such a way that two similar input vectors (patterns) result in outputs which are similar to each other. The neural network must not memorise the examples, but is supposed to generalise.

The goal of a learning algorithm is to minimise an error function that is defined over the difference between desired and actual output. However, the algorithm does not perform some kind of global optimisation; it computes only local weight modifications. The weights are modified according to local information, by distributing the global error inside the network. We can view a weight (or a connection) as an active entity that changes itself due to the local conditions in its environment, independent of other weights. The sum of all modifications in the network is supposed to lead to a global error reduction.
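A minimal sketch of such a purely local, error-driven update (the delta rule for a single linear unit; the data, targets and learning rate are illustrative):

```python
# Local, error-driven weight update (delta rule for one linear unit):
# each weight changes only according to its own input and the unit's error.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))            # training inputs
t = X @ np.array([1.5, -0.7])            # targets from a known linear map
w = np.zeros(2)
eta = 0.1                                # learning rate (illustrative)

for epoch in range(50):
    for x, target in zip(X, t):
        error = target - w @ x           # difference desired vs. actual output
        w += eta * error * x             # local modification per weight
print(w)                                 # approaches [1.5, -0.7]
```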

Certain types of neural networks, like multilayer perceptrons or radial basis function networks, are universal approximators, i.e. they can approximate any continuous function on a compact domain to any given degree of accuracy [14, 35]. However, the desired solution cannot be found analytically. It has to be obtained by iteratively processing the training data with the learning algorithm. Success is not guaranteed: the algorithm can get stuck in local minima, for example. Even if the learning problem is solved, it is not obvious whether the network has generalised. The reaction of the network to new, unknown inputs must be tested on a set of test patterns. It is usually not possible to prove analytically that the network has learned the desired mapping. A neural network is a black box: the network learns, but the user does not learn anything from the network. It is usually not possible to express the knowledge hidden in the network by some kind of rules, or to interpret the network in some other sense. This can be a severe drawback, if a neural

Page 241: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


network is to be used in an application where questions of reliability and safety play a role. For an introduction to neural networks see, for example, [1, 13, 43].

So neural networks are able to learn from data. What does learning mean in this context? It is rather simple: learning means to find good values for parameters (weights) by an iterative procedure which processes sample data. The procedure is guided by an error measure that rates the current degree of performance. This is a very simple form of learning, and depending on the scientific background (e.g. philosophy, psychology, cognitive science, artificial intelligence, etc.) one may be reluctant to call this kind of iterative parameter determination a learning method. Learning as it is described here has only marginal connections to learning as it occurs in human beings, or as it is examined in areas like machine learning. To be more precise we should speak of statistical learning or learning from data. It is also known as supervised learning, because there is a virtual supervisor or trainer who can determine an output error.

There are other terms for this kind of parameter estimation. It may be more suitable to say that a neural network is trained, and therefore a neural network learning algorithm is also called a training algorithm. From the viewpoint of a statistician, a multilayer perceptron is a nonlinear regression or discriminant model. In this case, learning (or training) means to iteratively update estimates.

In the following we use the term learning or training in the sense described above: an iterative learning (training) algorithm tries to adjust parameters of a model by processing sample data (a learning problem). We discuss two forms of supervised learning:

- (Plain) supervised learning tries to reduce the difference between actual and desired output, and needs a fixed learning problem, i.e. a data set, where for each input pattern an output pattern is given.

- Reinforcement learning tries to produce outputs that have a certain observable effect on an environment. It needs a free learning problem, where there is no known output for a given input pattern, and an external reinforcement signal indicates whether the desired effect occurred or not.

Finally, we have to say what we understand by the term fuzzy system in this paper. We assume the reader to be familiar with terms like fuzzy set, fuzzy logic, fuzzy rule, t-norm and t-conorm. For an introduction see [20]. The kind of fuzzy system we discuss has nothing to do with fuzzy logic in the narrow sense, i.e. we do not consider systems of generalised logical rules. In this paper a fuzzy system is used for function approximation or classification. Certain types of fuzzy systems are, like neural networks, universal approximators [7, 18].

A fuzzy system consists of a set of fuzzy rules R_j like

    R_j: if x_1 is μ_j^1 and ... and x_n is μ_j^n then y is ν_j

Page 242: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


where x_1,...,x_n ∈ ℝ are input variables, and y ∈ ℝ is an output variable. μ_j^i: ℝ → [0,1] and ν_j: ℝ → [0,1] are fuzzy sets which are labelled with linguistic terms like small or approximately zero. The fuzzy sets are usually represented by parameterised membership functions like triangular, trapezoidal or bell-shaped functions. The and connective in the antecedent of the rule is evaluated by a t-norm, usually the minimum function. A complete set of rules is evaluated by max-min inference, resulting in an output fuzzy set ν:

V(y) = "kax {min{ILJ1(xd, ... ,ILjJXn),Vj(y)}}. J

This evaluation procedure is well known from Mamdani controllers [23]. The output fuzzy set ν is then transformed into a crisp value by a defuzzification procedure, e.g. center of gravity or mean of maximum.
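A minimal sketch of this max-min inference with a subsequent center-of-gravity defuzzification; the rules, firing degrees and fuzzy sets are illustrative:

```python
# Max-min (Mamdani) inference with two rules and center-of-gravity
# defuzzification over a discretised output domain.
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function with peak at b."""
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

y = np.linspace(0.0, 10.0, 501)          # discretised output domain
rules = [
    # (degrees of the antecedent terms, output fuzzy set nu_j)
    ((0.7, 0.4), tri(y, 0.0, 2.5, 5.0)),
    ((0.2, 0.9), tri(y, 5.0, 7.5, 10.0)),
]
# min for 'and' in the antecedent, min again for the implication,
# max over the rules for the aggregated output fuzzy set:
nu = np.max([np.minimum(min(alpha), out) for alpha, out in rules], axis=0)
cog = np.sum(y * nu) / np.sum(nu)        # center-of-gravity defuzzification
print(round(cog, 2))                     # roughly 4.3
```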

Another well-known type of fuzzy controller is the Sugeno controller [38]. It is a fuzzy system that uses rules like

    R_j: if x_1 is μ_j^1 and ... and x_n is μ_j^n then y = Σ_{i=1}^n a_i x_i + a_0.

Here the conclusion is not a fuzzy set but a linear combination of the input variables, and the and connective is evaluated by a multiplication. There are various other ways to evaluate a set of fuzzy rules; see e.g. [21, 22] for an overview.
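A minimal sketch of evaluating such Sugeno rules, using the common weighted-average combination of the rule conclusions (the membership functions and coefficients are illustrative):

```python
# First-order Sugeno evaluation: product for 'and', rule outputs are
# linear functions of the inputs, combined as a weighted average.
import numpy as np

def bell(x, c, s):
    """Bell-shaped membership function centred at c (illustrative form)."""
    return 1.0 / (1.0 + ((x - c) / s) ** 2)

x = np.array([1.0, 2.0])
rules = [
    # (centres of the antecedent fuzzy sets, conclusion coefficients a1, a2, a0)
    ((0.0, 0.0), (1.0, 1.0, 0.0)),
    ((2.0, 3.0), (0.5, -1.0, 2.0)),
]
alphas, outputs = [], []
for centres, (a1, a2, a0) in rules:
    alpha = np.prod([bell(xi, c, 1.0) for xi, c in zip(x, centres)])
    alphas.append(alpha)                       # rule firing degree
    outputs.append(a1 * x[0] + a2 * x[1] + a0) # linear rule conclusion
y = np.dot(alphas, outputs) / np.sum(alphas)   # weighted average
print(y)
```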

A fuzzy system like this approximates an unknown function f: ℝⁿ → ℝ. Mappings to ℝᵐ can be obtained straightforwardly by just adding variables to the conclusions of the rules. Each rule can be interpreted as a fuzzy sample, and the inference method results in an interpolation in a fuzzy environment [17]. This means fuzzy systems can be used for the same tasks as neural networks. The difference is that fuzzy systems are not created by a learning algorithm. They are built from explicit knowledge which is expressed in the form of linguistic (fuzzy) rules. However, it is sometimes difficult to specify all parameters of a fuzzy system (rules and membership functions). If the performance of the fuzzy system is not satisfactory, the parameters must be tuned manually. This tuning process is error-prone and time consuming.

So the idea of applying some kind of learning algorithm to a fuzzy system is not surprising. Of the number of possible ways to accomplish this, the combination with neural network methods was, and still is, very popular, and a lot of approaches are discussed in the literature. The popularity of these so-called neuro-fuzzy systems is probably due to the fact that neural networks and fuzzy systems - especially fuzzy controllers - became popular roughly at the same time, at the end of the eighties. Users of fuzzy controllers who had trouble tuning them perhaps admired the ostensible ease with which neural networks learned their parameters. On the other hand, neural network users may have admired the transparency and interpretability of a rule-based fuzzy system, while a neural network is only a black box.

Page 243: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


In the following section we describe our notion of a neuro-fuzzy system. Then we examine how to represent a fuzzy system in a neural-network-like architecture and discuss three application areas for neuro-fuzzy models: function approximation, control, and classification.

2. What Is a Neuro-Fuzzy System?

How can we combine the learning capabilities of a neural network and a fuzzy system? Before we consider this question, we should think about another one: Why do we want to do this anyway? A neural network is capable of learning a mapping from the input to the output domain. A fuzzy system also implements such a mapping. So if we have problems specifying the parameters of a fuzzy system, why not forget about it and simply use a neural network?

To answer this question let us consider typical situations in which a fuzzy system or a neural network would be used:

Fuzzy system: We have (at least some) knowledge about the relation be­tween input and output, i.e. for some input situations we can (vaguely) specify the outputs. We can describe this knowledge by fuzzy rules.

Neural network: We have a lot of training data that describes the input/output relation of our problem, and we know little or nothing about this relation.

If we decide to use a fuzzy system, and find out that we cannot derive all parameters from our knowledge about the problem, then we may think of learning them. However, if we plan to use neural network techniques to do this, we must have a lot of training data. This means we are in a classical situation to apply a neural network. Consider the following Table 1.

Table 1. Comparison between fuzzy systems and neural networks

fuzzy system                           neural network
interpretable                          black box
making use of linguistic knowledge     learning from scratch

Using a fuzzy system obviously has some benefits over using a neural network. We can interpret a fuzzy system as a system of linguistic rules. This means we can, at least to some extent, check our solution for plausibility. A neural network is a black box to the user. If we have at least some prior knowledge, we can use it to initialise a fuzzy system. A neural network always learns from scratch. So it makes sense to use a fuzzy system and, instead of tuning it manually, to use some kind of learning algorithm to optimise its parameters.

Page 244: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


A common way to apply a learning algorithm to a fuzzy system is to represent it in a special neural-network-like architecture, which is quite easy, as we show in the next section. Then a learning algorithm, like for example backpropagation, is used to train the system. There are some problems, however. Neural network learning algorithms are usually gradient descent methods. They cannot be applied directly to a fuzzy system, because the functions used to realize the inference process are usually not differentiable. There are two solutions to this problem:

a) replace the functions used in the fuzzy system (like min and max) by differentiable functions, or

b) do not use a standard neural learning algorithm but a better suited pro­cedure.

Modern neuro-fuzzy systems are usually represented as a multilayer feedforward neural network [4, 8, 9, 12, 15, 25, 28, 29, 39]. The well-known ANFIS model by Jang [15], for example, implements a Sugeno-like fuzzy system in a network structure, and applies a mixture of backpropagation and least mean square procedure to train the system. A Sugeno-like fuzzy system uses only differentiable functions, so ANFIS goes for solution a). The GARIC model [4] also chooses solution a), by using a special "soft minimum" function which is differentiable. The problem with solution a) is that the models are sometimes not as easy to interpret as e.g. Mamdani-type fuzzy systems. Other models, which we discuss in the following sections, try solution b): they are Mamdani-type fuzzy systems and use special learning algorithms.
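A differentiable replacement for min of the kind used in GARIC can be sketched as follows; the exact functional form and the sharpness parameter k are our assumptions:

```python
# A differentiable 'soft minimum': for large k it approaches min(x),
# but it has usable gradients everywhere.
import numpy as np

def softmin(x, k=10.0):
    x = np.asarray(x, dtype=float)
    w = np.exp(-k * x)                   # large weight on small values
    return np.sum(x * w) / np.sum(w)

mu = [0.3, 0.8, 0.6]
print(softmin(mu, k=1.0), softmin(mu, k=50.0), min(mu))  # -> 0.3 as k grows
```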

In addition to multilayer feedforward networks there are also combinations of fuzzy techniques with other neural network architectures, for example self-organising feature maps [6, 41], or fuzzy associative memories [19]. However, in this paper we concentrate on multilayer systems, because this type is used most frequently. Even of this type there are a lot of different approaches [8, 29], which have a lot in common but differ in implementational aspects. To stress the common feature of all these approaches, and to give the term neuro-fuzzy system a suitable meaning, we want to restrict it to systems which possess the following properties:

1. A neuro-fuzzy system is a fuzzy system that is trained by a learning algorithm (usually) derived from neural network theory. The (heuristical) learning procedure operates on local information, and causes only local modifications in the underlying fuzzy system. The learning process is not knowledge based, but data driven.

2. A neuro-fuzzy system can be viewed as a special 3-layer feedforward neural network. The units in this network use t-norms or t-conorms instead of the activation functions usually used in neural networks. The first layer represents input variables, the middle (hidden) layer represents fuzzy rules, and the third layer represents output variables. Fuzzy sets are encoded as (fuzzy) connection weights.

Page 245: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


Some neuro-fuzzy models use more than 3 layers and encode fuzzy sets as activation functions. In this case it is usually possible to transform them into a 3-layer architecture. This view of a fuzzy system illustrates the data flow within the system and its parallel nature. However, this neural network view is not a prerequisite for applying a learning procedure; it is merely a convenience.

3. A neuro-fuzzy system can always (i.e. before, during and after learning) be interpreted as a system of fuzzy rules. It is possible both to create the system out of training data from scratch and to initialise it with prior knowledge in the form of fuzzy rules.

4. The learning procedure of a neuro-fuzzy system takes the semantic properties of the underlying fuzzy system into account. This results in constraints on the possible modifications of the system's parameters.

5. A neuro-fuzzy system approximates an n-dimensional (unknown) function that is partially given by the training data. The fuzzy rules encoded within the system represent vague samples, and can be viewed as vague prototypes of the training data. A neuro-fuzzy system should not be seen as a kind of (fuzzy) expert system, and it has nothing to do with fuzzy logic in the narrow sense [20].

In this paper neuro-fuzzy has to be understood as stated by the five points above. We therefore consider neuro-fuzzy as a technique to derive a fuzzy system from data, or to enhance it by learning from examples. The exact implementation of the neuro-fuzzy model does not matter. It is possible to use a neural network to learn certain parameters of a fuzzy system, like using a self-organising feature map to find fuzzy rules [34] (cooperative models), or to view a fuzzy system as a special neural network and apply a learning algorithm directly [28] (hybrid models).

Approaches where neural networks are used to provide inputs for a fuzzy system, or to change the output of a fuzzy system, we prefer to call neural (network)/fuzzy (system) combinations or concurrent neural/fuzzy models, to stress that in these approaches the parameters of the fuzzy system are not changed by a learning process. If the creation of a neural network is the main target, it is possible to apply fuzzy techniques to speed up the learning process, or to fuzzify a neural network by the extension principle so that it is able to process fuzzy inputs. These approaches could be called fuzzy neural networks, to stress that fuzzy techniques are used to create or enhance neural networks.

3. A Fuzzy System in a Neural Network Structure

A lot of neuro-fuzzy approaches represent their models as a neural network. This is of course not necessary to be able to apply a learning algorithm to a fuzzy system. However, it can be convenient because it visualises the data

Page 246: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


flow through the system, both for the input data and for the error signals that are used to update the system parameters. An additional benefit is that different models can easily be compared, and structural differences are clearly visible.

There can also be some practical advantages. If a fuzzy system is represented in the form of a network, and a neural network development tool is available that is flexible enough to let us define special activation and propagation functions, then it may be possible to use it. Fuzzy system development tools usually implement only very restrictive learning capabilities.

It is very easy to represent a fuzzy system as a neural network. In Fig. 1 we can see a multilayer perceptron that solves the XOR problem. The network uses sigmoid activation functions in the hidden and output units. The weights and bias values were found by backpropagation. The other network is in fact a fuzzy system with two rules. It is also an acceptable solution to the XOR problem. The activation functions of the units are min and max, respectively, and the connection weights are fuzzy sets. To the right of the network the fuzzy sets b (big) and s (small), and the two fuzzy rules represented in the network, are shown.

[Figure: on the left, the multilayer perceptron for XOR; on the right, the fuzzy system with the fuzzy sets s (small) and b (big) and the two rules "if x1 is b and x2 is s then y is b" and "if x1 is s and x2 is b then y is b".]

Fig. 1. A multilayer perceptron with sigmoid activation functions that solves the XOR problem, and a fuzzy system for the same problem represented as a feedforward multilayer neural network with special activation and propagation functions

To describe the network structure of a neuro-fuzzy model in general, it is useful to have a generic model. In [28] we have presented a generic 3-layer fuzzy perceptron. The name refers to the structure of the model, which

Page 247: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


is similar to that of the perceptrons known from the domain of neural networks. The term fuzzy (multilayer) perceptron has also been used by other authors for their approaches [16, 24, 33]. We use our interpretation of this notion here to describe the structure of our generic model. Other definitions of the term "fuzzy perceptron" are also possible, of course.

By using a generic fuzzy perceptron to derive neuro-fuzzy systems for special domains, it is possible to evaluate these different neuro-fuzzy approaches by means of the same underlying model. The fuzzy perceptron was used to derive some of the models which we discuss in the following three sections. A generic fuzzy perceptron has the architecture of a usual multilayer perceptron, but the weights are modelled as fuzzy sets and the activation, output, and propagation functions are changed accordingly, to implement a common fuzzy inference path. The intention of this model is to provide a framework for learning algorithms, to be interpretable as a system of linguistic rules, and to be able to use prior rule-based knowledge, so that the learning need not start from scratch.

Definition 1 (3-layer fuzzy perceptron). A generic 3-layer fuzzy perceptron is a 3-layer feedforward neural network (U, W, NET, A, O, ex) with the following specifications:

1. U = ⋃_{i∈M} U_i is a non-empty set of units (neurons) and M = {1, 2, 3} is the index set of U. For all i, j ∈ M, U_i ≠ ∅, and U_i ∩ U_j = ∅ with i ≠ j holds. U_1 is called the input layer, U_2 the rule layer (hidden layer), and U_3 the output layer.

2. The structure of the network (connections) is defined as W: U × U → F(ℝ), such that there are only connections W(u,v) with u ∈ U_i, v ∈ U_{i+1} (i ∈ {1,2}). F(ℝ) is the set of all fuzzy subsets of ℝ.

3. By A an activation function A_u is given for each u ∈ U to calculate the activation a_u:
(a) for input and rule units u ∈ U_1 ∪ U_2: A_u: ℝ → ℝ, a_u = A_u(net_u) = net_u;
(b) for output units u ∈ U_3: A_u: F(ℝ) → F(ℝ), a_u = A_u(net_u) = net_u.

4. O defines for each u ∈ U an output function O_u to calculate the output o_u:
(a) for input and rule units u ∈ U_1 ∪ U_2: O_u: ℝ → ℝ, o_u = O_u(a_u) = a_u;

Page 248: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


(b) for output units u ∈ U_3: O_u: F(ℝ) → ℝ, o_u = O_u(a_u) = DEFUZZ_u(a_u), where DEFUZZ_u is a suitable defuzzification function.

5. NET defines for each unit u ∈ U a propagation function NET_u to calculate the net input net_u:
(a) for input units u ∈ U_1: NET_u: ℝ → ℝ, net_u = ex_u;
(b) for rule units u ∈ U_2: NET_u: (ℝ × F(ℝ))^(U_1) → [0,1], net_u = T_{u′∈U_1} { W(u′, u)(o_{u′}) }, where T is a t-norm;
(c) for output units u ∈ U_3: NET_u: ([0,1] × F(ℝ))^(U_2) → F(ℝ), net_u: ℝ → [0,1], net_u(x) = ⊥_{u′∈U_2} { T(o_{u′}, W(u′, u)(x)) }, where ⊥ is a t-conorm.

6. ex: U_1 → ℝ defines for each input unit u ∈ U_1 its external input ex(u) = ex_u. For all other units ex is not defined.
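A minimal sketch of the forward pass of such a fuzzy perceptron, with triangular fuzzy weights, min as t-norm, max as t-conorm and a mean-of-maximum defuzzification (all of these concrete choices are ours):

```python
# Forward pass of a 3-layer fuzzy perceptron (cf. Definition 1):
# fuzzy weights on the connections, t-norm (min) at the rule units,
# t-conorm (max) at the output unit, then defuzzification.
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function with peak at b."""
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

# antecedent weights W(x_i, R_j) as (a, b, c) triangles ("small", "big")
antecedents = [
    [(-5.0, 0.0, 5.0), (5.0, 10.0, 15.0)],   # input x1 -> rules R1, R2
    [(-5.0, 0.0, 5.0), (5.0, 10.0, 15.0)],   # input x2 -> rules R1, R2
]
conclusions = [(0.0, 2.0, 4.0), (6.0, 8.0, 10.0)]   # weights W(R_j, y)

def forward(x, y_grid):
    # rule activations: t-norm (min) over the antecedent memberships
    act = [min(tri(xi, *antecedents[i][j]) for i, xi in enumerate(x))
           for j in range(len(conclusions))]
    # output net input: t-conorm (max) of min(activation, conclusion)
    net = np.max([np.minimum(a, tri(y_grid, *w))
                  for a, w in zip(act, conclusions)], axis=0)
    maxima = y_grid[np.isclose(net, net.max())]
    return maxima.mean()                     # mean-of-maximum defuzzification

y_grid = np.linspace(0.0, 10.0, 101)
print(forward((2.0, 3.0), y_grid))           # 2.0
```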

A fuzzy perceptron can be viewed as a usual 3-layer perceptron that is fuzzified to a certain extent. Only the weights, the net inputs, and the activations of the output units are modelled as fuzzy sets. A fuzzy perceptron is, like a usual perceptron, used for function approximation. The advantage is the interpretation of its structure in the form of linguistic rules, because the fuzzy weights can be associated with linguistic terms. The network can also be created partly, or in the whole, out of linguistic (fuzzy if-then) rules.

In the following three sections we consider neuro-fuzzy models in the domain of function approximation. This means we want to describe a function by means of a fuzzy rule base. First we consider general function approximation, where a function is given by (noisy) data samples. In this case, we can use plain supervised learning. Then we discuss two special cases of function approximation by learning: neuro-fuzzy control and neuro-fuzzy classification.

In neuro-fuzzy control, learning is usually done indirectly by reinforcement, because the outputs (control actions) for given input values (system states) are unknown if no other controller exists for the considered problem. The

Page 249: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


sought-after (control) function is not given by data samples in this case, but by the behaviour of the process to be controlled and by an external performance or reinforcement signal that guides the learning process.

In neuro-fuzzy classification, plain supervised learning is used again, because we use labelled training data. In this case we are not looking for a continuous function like in the other two cases, but for a discrete one by which we can classify input vectors.

In each of the following sections we describe some neuro-fuzzy approaches from the literature and present our own approaches, which are derived from the generic fuzzy perceptron above. Our models are available as free software tools, and they allow one to study the behaviour of the respective neuro-fuzzy approaches.

4. Neuro-Fuzzy Function Approximation

In this section we consider the problem of approximating an unknown continuous function by a fuzzy system, where the function is partly specified by a set of data samples. This is a supervised learning problem, because the error of the approximation is defined by the difference between the actual output of the fuzzy system and the target output given in the training data.

One of the first neuro-fuzzy systems for function approximation is the ANFIS model [15]. It represents a Sugeno-type fuzzy system in a special feedforward network architecture (see Fig. 2). The fuzzy sets are modelled by bell-shaped membership functions. Because ANFIS uses only differentiable functions, it is easy to apply standard learning procedures from neural network theory. For ANFIS a mixture of backpropagation (to learn the antecedent parameters, i.e. the membership functions) and least mean square estimation (to determine the coefficients of the linear combinations in the rules' conclusions) is used. A step in the learning procedure has two parts: In the first part the input patterns are propagated, and the optimal conclusion parameters are estimated by an iterative least mean square procedure, while the antecedent parameters are assumed to be fixed for the current cycle through the training set. In the second part the patterns are propagated again, and in this epoch, backpropagation is used to modify the antecedent parameters, while the conclusion parameters remain fixed. This procedure is then iterated.
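The following sketch conveys the flavour of such a two-part learning step for a first-order Sugeno system with one input and two rules; the toy target function and the use of a numerical gradient for the antecedent parameters are our own simplifications, not the original ANFIS implementation:

```python
import numpy as np

def bell(x, a, b, c):
    """Generalised bell membership function, as used by ANFIS."""
    return 1.0 / (1.0 + np.abs((x - c) / a) ** (2 * b))

def forward(x, prem, conc):
    """First-order Sugeno system with two rules over one input variable."""
    w = np.array([bell(x, *p) for p in prem])        # firing strengths
    wn = w / w.sum()                                 # normalised strengths
    y = np.array([p * x + r for (p, r) in conc])     # linear rule conclusions
    return float(wn @ y), wn

def hybrid_step(X, T, prem, conc, lr=0.01):
    # Part 1: with fixed antecedents, the system output is linear in the
    # conclusion parameters, so these can be estimated by least squares.
    rows, targets = [], []
    for x, t in zip(X, T):
        _, wn = forward(x, prem, conc)
        rows.append([wn[0] * x, wn[0], wn[1] * x, wn[1]])
        targets.append(t)
    theta, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    conc = [(theta[0], theta[1]), (theta[2], theta[3])]
    # Part 2: with fixed conclusions, adapt the antecedent parameters by
    # gradient descent on the squared error (numerical gradient here).
    def error(pr):
        return sum((forward(x, pr, conc)[0] - t) ** 2 for x, t in zip(X, T))
    eps = 1e-5
    for i in range(len(prem)):
        for j in range(3):
            p_plus = [list(p) for p in prem]
            p_plus[i][j] += eps
            g = (error(p_plus) - error(prem)) / eps
            prem[i][j] -= lr * g
    return prem, conc

# Toy usage: approximate y = 2x + 1 on [0, 1].
X = np.linspace(0, 1, 21); T = 2 * X + 1
prem = [[0.5, 2.0, 0.0], [0.5, 2.0, 1.0]]   # (a, b, c) per rule
conc = [(0.0, 0.0), (0.0, 0.0)]
for _ in range(20):
    prem, conc = hybrid_step(X, T, prem, conc)
```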

It is also possible to directly apply gradient descent learning to a Sugeno fuzzy system (even with triangular membership functions) without represent­ing it in a network structure. This has been shown by Nomura et al. [31], and in an improved version by Bersini et al. [5].

For ANFIS no algorithm for structure (rule) learning has been specified, so it is only capable of modifying an existing rule base. ANFIS implements a Sugeno model, which is not as intuitive to interpret as a Mamdani model.


Fig. 2. The structure of the ANFIS model

So although ANFIS learns and performs well in many cases, it can be useful to specify a neuro-fuzzy approach based on a Mamdani-type fuzzy system.

We can easily do this by deriving the NEFPROX model (neuro-fuzzy function approximation) from the generic fuzzy perceptron [30].

A NEFPROX system (see Fig. 3) is a special 3-layer fuzzy perceptron with the following specifications:

1. The input units are denoted as $x_1, \ldots, x_n$, the hidden rule units are denoted as $R_1, \ldots, R_k$, and the output units are denoted as $y_1, \ldots, y_m$.

2. Each connection is weighted with a fuzzy set, and is labelled with a linguistic term.

3. Connections that come from the same input unit and have identical labels bear the same fuzzy weight at all times. These connections are called linked connections, and their weight is called a shared weight. An analogous condition holds for the connections that lead to the same output unit.

4. Let $L_{x,R}$ denote the label of the connection between an input unit $x$ and a rule unit $R$. For all rule units $R, R'$, $(\forall x:\ L_{x,R} = L_{x,R'}) \Longrightarrow R = R'$ holds.

This definition makes it possible to interpret a NEFPROX system in terms of a fuzzy system; each hidden unit represents a fuzzy if-then rule. Condition 3 specifies that there have to be shared or linked weights. If this feature were missing, it would be possible for fuzzy weights representing identical linguistic terms to evolve differently during the learning process. If this is allowed to happen, the architecture of the NEFPROX system cannot be understood as a fuzzy rule base. Shared weights make sure that for each linguistic value (e.g. "$x_1$ is positive big") there is only one representation as a fuzzy set, i.e. the linguistic value has only one interpretation for all rule units (e.g. $R_1$ and $R_2$ in Fig. 3). It cannot happen that two fuzzy sets that are identical at the beginning of the learning process develop differently, and so the semantics of the rule base encoded in the structure of the network is not affected [25]. Connections that share a weight always come from the same input unit or lead to the same output unit. Condition 4 determines that there are no rules with identical antecedents.

Fig. 3. The structure of the NEFPROX model: Some of the connections are linked - they always have the same (fuzzy) weight

In a function approximation problem we can use plain supervised learning, because we know for each given input vector the correct output vector (fixed learning problem). If we use a system of fuzzy rules to approximate the function, we can use prior knowledge. This means, if we already know suitable rules for certain areas, we can initialize the neuro-fuzzy system with them. The remaining rules have to be found by learning. If there is no prior knowledge we start with a NEFPROX system without hidden units and incrementally learn all rules.

The fuzzy set learning algorithm for NEFPROX is a simple, computationally inexpensive heuristic procedure, and not a gradient descent method, which would not be applicable because the functions (min and max) used for evaluating the fuzzy rules are not differentiable. Based on the error measure at the output layer the fuzzy sets of the conclusions are shifted to higher or lower values, and the width of their support is modified. Then the error is propagated back to the rule nodes. Each rule node computes its individual error value and uses it to correct the spread and position of the antecedent membership functions. It is easy to define constraints for the learning procedure, e.g. that fuzzy sets must not pass each other, or that they must intersect at 0.5, etc. As a stopping criterion usually the error on an additional validation set is observed. Training is continued until the error on the validation set does not decrease further. This technique is well known from neural network learning, and is used to avoid over-fitting to the training data.

To start the learning process, we must specify initial fuzzy partitions for each input variable. This is not necessary for the output variables. For them, fuzzy sets can be created during learning, by creating a fuzzy set of a given shape at the current output value if there is no suitable fuzzy set so far.

The structure (rule) learning algorithm selects fuzzy rules based on a predefined grid over the input space (see also Fig. 10). This grid is given by the initial fuzzy partitions. If the algorithm creates too many rules, it is possible to evaluate them by determining individual rule errors and to keep only the best rules.

In this case, however, the approximation performance will suffer. Each rule represents a number of crisp samples of the (unknown) function by a fuzzy sample. If rules are deleted, some samples are not considered anymore. If parameter learning cannot compensate for this, the approximation performance must decrease.

As an example of the learning capabilities of the ANFIS and NEFPROX algorithms, we consider a chaotic time series given by the Mackey-Glass differential equation:

\[ \dot{x}(t) = \frac{0.2\, x(t-\tau)}{1 + x^{10}(t-\tau)} - 0.1\, x(t). \]

We use the values $x(t-18)$, $x(t-12)$, $x(t-6)$ and $x(t)$ to predict $x(t+6)$. The training data was created using a Runge-Kutta procedure with step width 0.1. As initial conditions for the time series we used $x(0) = 1.2$ and $\tau = 17$. We created 1000 values between $t = 118$ and $1117$, where the first 500 samples were used as training data, and the second half was used as a validation set.
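A sketch of how such training data can be generated (our own code; treating the delayed value as constant within one integration step and taking the history before $t = 0$ as zero are simplifying assumptions):

```python
import numpy as np

def mackey_glass(n_steps, h=0.1, tau=17.0, x0=1.2):
    """Integrate dx/dt = 0.2 x(t-tau)/(1 + x(t-tau)^10) - 0.1 x(t)
    with a fourth-order Runge-Kutta scheme and step width h."""
    lag = int(round(tau / h))              # delay expressed in steps
    x = np.zeros(n_steps + 1)
    x[0] = x0                              # history before t = 0 taken as 0
    def f(xt, xlag):
        return 0.2 * xlag / (1.0 + xlag ** 10) - 0.1 * xt
    for k in range(n_steps):
        xlag = x[k - lag] if k >= lag else 0.0
        k1 = f(x[k], xlag)                 # delayed value treated as
        k2 = f(x[k] + 0.5 * h * k1, xlag)  # constant within one step
        k3 = f(x[k] + 0.5 * h * k2, xlag)
        k4 = f(x[k] + h * k3, xlag)
        x[k + 1] = x[k] + h / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return x

# x(t) for t = 0, 0.1, ...; integer t corresponds to every 10th step.
x = mackey_glass(n_steps=12000)
ts = np.arange(118, 1118)                  # 1000 sample points
X = np.stack([x[(ts - 18) * 10], x[(ts - 12) * 10],
              x[(ts - 6) * 10], x[ts * 10]], axis=1)
y = x[(ts + 6) * 10]                       # prediction target x(t+6)
X_train, y_train = X[:500], y[:500]        # first half for training
X_val, y_val = X[500:], y[500:]            # second half for validation
```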

Table 2 compares the performance of ANFIS and NEFPROX on this problem. ANFIS gives a better approximation of the function, but due to the complex learning algorithm it takes a long time to obtain this result. NEFPROX is very fast, but has a higher approximation error. The number of free parameters is almost identical in both systems. Interpretation of the learning result is difficult in both cases, because ANFIS represents a Sugeno-type fuzzy system, and NEFPROX, which represents a Mamdani-type fuzzy system, uses a lot of rules. To enhance interpretability, pruning strategies known from neural networks could be applied to NEFPROX.

Table 2. Performance of ANFIS and NEFPROX on the Mackey-Glass equation

                          ANFIS     NEFPROX
RMSE on training set      0.0016    0.0315
RMSE on test set          0.0015    0.0332
cycles                    500       216
runtime (SUN Ultra)       1030 s    75 s
rule base                 given     learned
no. of rules              16        129
fuzzy sets per variable   2         7
no. of free parameters    104       105

This comparison illustrates a trade-off that we often encounter in neuro-fuzzy approaches. To obtain a high performance, we need complex training algorithms based on gradient descent, which demand Sugeno-type fuzzy systems. If we use a Mamdani-type fuzzy system, which is easier to interpret, we can use fast heuristics for training, but we usually achieve lower performance.

Another problem in neuro-fuzzy systems is rule learning. Either no rule learning procedure is defined for a given neuro-fuzzy model (as for ANFIS), or simple heuristics are used (as in NEFPROX). However, these simple strategies are not always powerful enough to yield good (i.e. small and interpretable) rule bases. In this case, it can be useful to consider pruning techniques from neural networks to reduce the number of rules and variables in a neuro-fuzzy system. It is also possible to use e.g. fuzzy clustering methods to find fuzzy rules, and to initialize a neuro-fuzzy system with them.

Both neuro-fuzzy approaches discussed in this section can be obtained from the Internet. ANFIS is made available by J.-S.R. Jang at ftp.cs.cmu.edu in user/ai/areas/fuzzy/systems/anfis. Information on NEFPROX can be found on our homepage at http://fuzzy.cs.uni-magdeburg.de.

5. Neuro-Fuzzy Control

In this section we consider approaches to neuro-fuzzy control, which is a special case of function approximation. We restrict ourselves to static aspects, where we want to approximate a control function. We assume that we want to find a fuzzy controller for some process by learning. We further assume that we do not have any previously recorded training data. There are no means yet to cope with the control problem at hand, i.e. no other controller and no human operator. Therefore we cannot use plain supervised learning. A solution is to use reinforcement learning, if either a model of the considered process is available, or training can be done online using the real process. Of course, in the latter case we have to make sure that hazardous situations cannot occur.

In the following we describe two approaches to reinforcement learning in a neuro-fuzzy model and do not consider neuro-fuzzy systems which are trained by plain supervised learning, like e.g. the ANFIS model. These models were discussed in the previous section, where the general problem of function approximation was examined.

A well known model for neuro-fuzzy control is the GARIC model by Berenji and Khedkar [4], and its predecessor ARIC [3]. The GARIC model (Generalised Approximate Reasoning based Intelligent Control) is a hybrid neuro-fuzzy model that uses two specialised neural networks. The architecture of GARIC uses concepts of adaptive critics, i.e. special neural controllers learning by reinforcement [42], and it generalises the neural model of Barto et al. [2] to the domain of fuzzy control. GARIC consists of two neural modules, the ASN (Action Selection Network) and the AEN (Action state Evaluation Network).

The ASN consists of a feedforward network structure with five layers of units, where the membership functions of the antecedents and the conclusions are stored in the units of the second and fourth layers (see Fig. 4). The rule base is encoded by the connections, and there are no adaptive weights. Learning in the ASN consists only in adapting the parameters of the triangular membership functions.


Fig. 4. The action selection network (ASN) of GARIC

The ASN learns by a kind of gradient descent based on the internal reinforcement signal computed by the AEN. To do this, a differentiable function to evaluate the antecedent of a rule is needed, i.e. the minimum function cannot be used here. GARIC uses a so-called soft minimum function instead, which is not a t-norm. The learning algorithm also needs a crisp output value from each rule, i.e. it is not possible to use a defuzzification procedure on an aggregated fuzzy set determined e.g. by the maximum function. In GARIC a so-called local mean of maximum procedure (LMOM) is used to obtain a crisp value from each rule, which yields a result different from the usual MOM only if the membership functions are not symmetrical.

The learning algorithm uses gradient descent to optimise the internal reinforcement signal. But because the dependence of this signal on the control output calculated by GARIC is not explicitly known, the learning procedure has to make some assumptions. Additional problems that have to be solved heuristically are due to the three non-differentiable points of each membership function.

Learning depends on the changes in the internal reinforcement signal. If it is constant, learning stops. This situation occurs when the process is controlled optimally, but it may also occur when the process is kept in a constant but non-optimal state. Therefore GARIC learns to avoid failure, not to reach an optimal state. This may lead to an undesirable control strategy, because states close to control failure are admissible. This kind of problem is addressed in [32].

The AEN is a 3-layer feedforward neural network with sigmoid units in the hidden layer and short-cut connections from the input layer to the single output unit. The network has the same inputs as the ASN. The AEN is used as an adaptive critic element which learns to predict the state of the process. Based on an external reinforcement signal that tells the AEN whether control was successful or not, the network calculates an internal reinforcement signal. This value is used to adapt the parameters in the whole GARIC system, by a procedure which is similar to backpropagation in multilayer perceptrons. If there is a high internal reinforcement (i.e. a good process state) the weights are changed such that their contribution to the output value is increased (reward). If the process control has failed, the weights are changed such that their contribution is decreased (punishment).

To explore the state space, Gaussian noise is added to the output of the ASN. If the internal reinforcement is small, this stochastic action modification is large, allowing the system to randomly produce better output values. This approach was also used by Barto et al. [2].

GARIC can only learn membership functions. The rule base of the controller has to be defined by other means. The model also needs an initial definition of the fuzzy sets, and their number cannot change. This restriction holds for most neuro-fuzzy models. The advantage of these approaches is that no control values need to be known for given states. The models learn by trial and error. This implies, of course, that a simulation of the process is available, or that learning can be done online at the process.

The special network structure of the ASN of GARIC ensures that it can be interpreted as a fuzzy controller. However, it is also possible to use a more common 3-layer architecture, and to refrain from using a special soft minimum to enable gradient descent learning.

NEFCON [26] is a model for neural fuzzy controllers developed by our group, and it is based on the architecture of the generic fuzzy perceptron described above. The learning algorithm for NEFCON is also based on the idea of reinforcement learning. In contrast to GARIC, the NEFCON learning algorithm uses a rule-based fuzzy error measure as reinforcement signal.


Thus it is possible to define a reinforcement-type learning algorithm without using an adaptive critic element. The algorithm enables NEFCON to learn fuzzy sets as well as fuzzy rules. Learning a rule base is done by deleting or inserting rules. Hence the learning process can work online and does not need previously recorded sample data.

The structure of NEFCON is identical to that of NEFPROX, with the exception that NEFCON has only one output variable, which is typical for control problems. A NEFCON system is used to control a dynamical system with one control variable $y$ and $n$ variables $x_1, \ldots, x_n$ describing its state. The performance of NEFCON is measured by a fuzzy error $e \in [0,1]$, which is defined by a number of fuzzy rules like

if $x_1$ is approx. zero and $x_2$ is approx. zero then the error is small,

where $x_1$ and $x_2$ are two state variables of the dynamical system and input variables of the NEFCON system. Because the error is defined by fuzzy rules, its value can be determined in the same way as the control output $y$, i.e. it is possible to use a second NEFCON system for this task. The defuzzified error value obtained from the fuzzy rules is used for the learning algorithm. Additionally, the sign, i.e. the direction of the optimal control action, must be known. The exact value is unknown, of course.
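For illustration, such a rule-based fuzzy error could be computed as follows (a minimal sketch; the membership functions, the second rule and the weighted-mean defuzzification are our own assumptions):

```python
# Fuzzy error for a two-variable system state (illustrative sketch).
def approx_zero(x):      # "approximately zero" on [-1, 1]
    return max(0.0, 1.0 - abs(x))

def large(x):            # "large magnitude" on [-1, 1]
    return min(1.0, abs(x))

# Rules mapping the state to an error value in [0, 1]:
#   if x1 is approx. zero and x2 is approx. zero then the error is small (0)
#   if x1 is large       or  x2 is large        then the error is large (1)
def fuzzy_error(x1, x2):
    tau_small = min(approx_zero(x1), approx_zero(x2))   # min as t-norm
    tau_large = max(large(x1), large(x2))               # max as t-conorm
    den = tau_small + tau_large
    # Defuzzified error: weighted mean of the rule conclusions 0 and 1.
    return (tau_small * 0.0 + tau_large * 1.0) / den if den > 0 else 0.0

print(fuzzy_error(0.0, 0.0))   # 0.0: optimal state, no error
print(fuzzy_error(0.9, 0.5))   # 0.9: state far from the optimum
```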

The learning procedure for NEFCON is a simple heuristic that optimises the fuzzy sets by shifting them and by making their supports larger or smaller. The idea is to check whether a larger or smaller output from a given rule is necessary to improve the performance, and then to modify the fuzzy sets accordingly. The algorithm tries to modify as few parameters as possible to keep the learning procedure stable. It is also easy to constrain the modifications of the fuzzy sets, to ensure e.g. that a triangular form is kept, or that there is a certain amount of overlap between the fuzzy sets, etc.

With this learning algorithm NEFCON realizes a standard Mamdani-type controller with center-of-area defuzzification. If the fuzzy error is smaller than a certain value for a certain number of cycles, this may be used as a criterion to terminate the learning process. But it is also possible to continue learning so that the controller can adapt to changes in a dynamical system. If the controlled system has reached a good state, the error value will be around zero, and the changes in the fuzzy sets will also be close to zero or will compensate each other in the long run.

The learning algorithm can be extended to learn the fuzzy rules, too. One possibility is to start with a NEFCON system that contains all fuzzy rules that can be defined due to the partitioning of the variables. Thus the system begins with an inconsistent initial rule base, which must be made consistent by learning. During the training phase those rule units are deleted that accumulate the highest error values. If there are no known fuzzy rules, the system begins with $N = q \cdot \prod_{i=1}^{n} p_i$ rule nodes, if there are $p_i$ initial fuzzy sets for each input variable $x_i$ ($i = 1, \ldots, n$) and $q$ fuzzy sets for the output variable $y$. With three input variables partitioned into five fuzzy sets each and five output fuzzy sets, for example, this already amounts to $5 \cdot 5^3 = 625$ initial rule nodes.


The idea of the rule learning algorithm is to try out the existing rules and to evaluate them. Rule units that do not pass this test are eliminated from the network. In a first phase all rule units producing an output with a sign different from the otherwise unknown optimal output value are deleted. In a second phase the algorithm has to choose one rule from each set of rules with identical antecedents, and delete all other rules of these sets. By this we go from an inconsistent to a consistent NEFCON system. In the third phase the fuzzy sets are adapted.

[Screenshot: the learned fuzzy rule base, with rules of the form "if Winkel is ... and WinkelV is ... then Kraft is ..." (angle, angular velocity, force) for the inverted pendulum]

Fig. 5. An implementation of NEFCON under Windows

This rule learning algorithm becomes very expensive if many fuzzy sets are defined for many variables. For this reason one should always try to use partial knowledge to avoid creating all possible rules. If there are no known rules for certain input states, then all possible rules have to be created only for these particular states. This way the number of initial rule units can be reduced.

Another rule learning algorithm goes the opposite way, and creates a rule base from scratch by adding rule after rule. It does this by first classifying an input vector, i.e. finding for each variable the membership function that yields the highest membership value for the respective input value. By this a rule antecedent is formed. Then the algorithm tries to guess the output value by deriving it from the current fuzzy error. In a second phase the rule base is optimised by changing the conclusion to an adjacent membership function if necessary.

This idea provides a rule learning algorithm that is less expensive than the one previously described. It is not necessary to handle all possible rules at once, something that soon becomes impossible if there are many variables or many membership functions. After the rule base has been learned, the learning procedure for the fuzzy sets can be invoked to tune the membership functions.
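A sketch of the antecedent-creation step of this bottom-up algorithm (fuzzy partitions, names and the placeholder conclusion are our own illustrative assumptions):

```python
# Bottom-up rule creation: for each variable, pick the fuzzy set with the
# highest membership degree for the current input value (illustrative sketch).
def tri(a, b, c):
    return lambda x: max(0.0, min((x - a) / (b - a), (c - x) / (c - b)))

partitions = {                                  # one fuzzy partition per input
    "x1": {"nb": tri(-2, -1, 0), "z": tri(-1, 0, 1), "pb": tri(0, 1, 2)},
    "x2": {"nb": tri(-2, -1, 0), "z": tri(-1, 0, 1), "pb": tri(0, 1, 2)},
}

def make_antecedent(state):
    """Return e.g. {'x1': 'pb', 'x2': 'z'} for state {'x1': 0.8, 'x2': 0.1}."""
    return {var: max(sets, key=lambda name: sets[name](state[var]))
            for var, sets in partitions.items()}

rule_base = {}
state = {"x1": 0.8, "x2": 0.1}
antecedent = tuple(sorted(make_antecedent(state).items()))
if antecedent not in rule_base:
    # The conclusion is initially guessed from the current fuzzy error
    # (its sign and magnitude) and optimised later; placeholder here.
    rule_base[antecedent] = "guessed output fuzzy set"
```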

NEFCON has been implemented in several software tools under Unix, Windows and MATLAB/SIMULINK. This software can be obtained from our WWW server (http://fuzzy.cs.uni-magdeburg.de), to try out the learning algorithms with simple tutorial applications like an inverted pendulum. Fig. 5 shows the interface of the tool under Windows, displaying the solution (fuzzy rules and fuzzy sets) the tool has found to control an inverted pendulum.

6. Neuro-Fuzzy Classification

In this section we discuss classification as another special case of function approximation. An input vector $x = (x_1, \ldots, x_n) \in \mathbb{R}^n$ is mapped to a class indicator $C$ which represents a crisp subset of $\mathbb{R}^n$. We assume the intersection of two different classes to be empty. The training data consists of a set of labelled data, i.e. for each training sample its correct class is known. A classification problem can be represented by a function

\[ \varphi: \mathbb{R}^n \to \{0,1\}^m, \]

where $\varphi(x) = c = (c_1, \ldots, c_m)$ such that $c_i = 1$ and $c_j = 0$ ($j \in \{1, \ldots, m\}$, $j \neq i$), i.e. $x$ belongs to class $C_i$. This way, the class information is given by a 1-of-n code. If a fuzzy system is used to perform this task, the fuzzy rules look like this:

\[ R: \text{ if } x_1 \text{ is } \mu_1 \text{ and } x_2 \text{ is } \mu_2 \text{ and } \ldots \text{ and } x_n \text{ is } \mu_n \text{ then pattern } (x_1, x_2, \ldots, x_n) \text{ belongs to class } C, \]

where $\mu_1, \ldots, \mu_n$ are fuzzy sets which describe the pattern's features. Because of the mathematics involved in the rule evaluation process, the rule base actually does not approximate the above-mentioned function $\varphi$ but the function $\varphi': \mathbb{R}^n \to [0,1]^m$. We obtain $\varphi(x)$ by $\varphi(x) = \psi(\varphi'(x))$, where $\psi$ reflects the interpretation of the classification result obtained from the fuzzy system. Usually, we will map the highest component of each vector $c$ to 1 and its other components to 0 (winner takes all).
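A minimal sketch of this interpretation function $\psi$ (in Python; the names are ours):

```python
def psi(c):
    """Map a fuzzy classification result c in [0,1]^m to a 1-of-m code."""
    winner = max(range(len(c)), key=lambda i: c[i])
    return [1 if i == winner else 0 for i in range(len(c))]

print(psi([0.2, 0.7, 0.4]))  # [0, 1, 0]: the pattern is assigned to class 2
```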

The neuro-fuzzy model FuNe-I [12] can be used for both classification and general function approximation purposes. FuNe-I has a 5-layer feedforward network structure (see Fig. 6) and restricts itself to rules with one (simple rules) or two antecedents. If there are two antecedents they can form a conjunction or a disjunction, where conjunction is modeled by a soft minimum function and disjunction by a soft maximum. Fuzzy sets are represented by superimposed sigmoid functions, such that shouldered fuzzy sets (like small and large) and bell-shaped fuzzy sets (like medium) can be formed.

[Figure layers, from top to bottom: output layer; 4th layer: conjunctive, disjunctive and simple rules; 3rd layer: combinations of sigmoid functions to build the fuzzy set medium; 2nd layer: representation of fuzzy sets by sigmoid functions; input layer]

Fig. 6. A FuNe-I network that uses three fuzzy sets small, medium and large for its 10 fuzzy rules (4 conjunctive, 4 disjunctive, and 2 simple rules)

FuNe-I can learn by supervised learning. Because all functions used within the system are differentiable, gradient descent (backpropagation) can be used. Fuzzy rules are learned by a special training network, that helps to identify suitable combinations of one or two variables as antecedents. These rules are used to create the FuNe-I network, which is then trained to find suitable fuzzy sets for the rules. The rules are weighted, and weights are allowed to be negative. Rules with negative weights are interpreted as if-not rules [12].

An advantage of FuNe-I is that it concentrates on rules with only one or two variables, which are easy to interpret. More complex rules can be formed by combining rules. The possibility of having conjunctive and disjunctive rules is also an interesting feature that is usually not found in neuro-fuzzy approaches. Interpretability is endangered, however, by the use of rule weights.

Another neuro-fuzzy approach which is suitable for classification is Fuzzy RuleNet [39]. The architecture is a 3-layer feedforward network, where fuzzy rules are represented by the hidden nodes. Each hidden node is connected to exactly one of the output nodes, which represent classes. The learning algorithm places overlapping hyperboxes in the input space to encompass patterns. Over each hyperbox a multidimensional membership function is defined, such that each hyperbox represents a multidimensional fuzzy set, and can be interpreted as a fuzzy rule. The degree of overlap of the hyperboxes can be controlled by the user.

A very fast learning algorithm creates, resizes and divides the hyperboxes until each pattern of the training set is correctly classified. This is usually achieved in about four epochs [40]. To interpret the fuzzy rules, they are projected onto the single dimensions, such that triangular or trapezoidal membership functions are obtained (see Fig. 7). Thus the learning algorithm creates a rule base and membership functions at the same time. Other neuro-fuzzy approaches need different learning algorithms for this task. A Fuzzy RuleNet can be initialized with prior knowledge, and the learning result yields an interpretable fuzzy rule base. However, problems in interpreting the rule base can occur, because the fuzzy sets for single variables are created by projection. In this case, it is very difficult to ensure that fuzzy sets are created which can easily be identified with linguistic terms.
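One plausible form of such a hyperbox membership function is sketched below: full membership inside the box and a linear decay with the distance to the box outside it (the concrete decay is our own assumption; Fuzzy RuleNet and the fuzzy min-max networks use related but not identical definitions):

```python
# Hyperbox membership: 1 inside [lo_i, hi_i] in every dimension, decreasing
# linearly with the distance to the box outside (illustrative assumption).
def hyperbox_membership(x, lo, hi, gamma=2.0):
    degree = 1.0
    for xi, l, h in zip(x, lo, hi):
        if xi < l:
            d = 1.0 - gamma * (l - xi)      # left of the box in this dim.
        elif xi > h:
            d = 1.0 - gamma * (xi - h)      # right of the box in this dim.
        else:
            d = 1.0                         # inside the box in this dim.
        degree = min(degree, max(0.0, d))   # minimum over all dimensions
    return degree

# A box covering [0.2, 0.4] x [0.1, 0.3]; each box represents one fuzzy rule.
print(hyperbox_membership([0.3, 0.2], [0.2, 0.1], [0.4, 0.3]))  # 1.0
print(hyperbox_membership([0.5, 0.2], [0.2, 0.1], [0.4, 0.3]))  # 0.8
```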

[Figure: two panels with axes x and y; the legend distinguishes regions with μ(x, y) = 1, 0 < μ(x, y) < 1, and μ(x, y) = 0]

Fig. 7. Fuzzy RuleNet uses multidimensional fuzzy sets to classify patterns

NEFCLASS (neuro-fuzzy classification) [27] is a neuro-fuzzy model that is derived from the generic fuzzy perceptron presented above. An example of a NEFCLASS system is presented in Fig. 8. It shows a NEFCLASS system that classifies input patterns with two features into two distinct classes by using five linguistic rules. There are again shared weights on the connections from the input to the hidden layer. Compared to NEFCON and NEFPROX, the connections to the output layer are quite different, however. Each hidden (rule) unit is connected to exactly one output (class) unit. The weights on the connections from rule units to output units are fixed at 1 for semantical reasons (to avoid weighted rules).

The output value for an output unit (class) can either be computed as the mean of the activation values of all rule units it is connected to, or as the maximum of those values. The activation of a rule is the minimum of the membership values from its antecedent.
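The following minimal sketch illustrates this computation for a small, made-up rule base (the fuzzy sets, rules and names are our own illustrative assumptions, not the system of Fig. 8):

```python
# NEFCLASS-style forward pass (illustrative sketch).
def tri(a, b, c):
    return lambda x: max(0.0, min((x - a) / (b - a), (c - x) / (c - b)))

small, large = tri(-0.5, 0.0, 0.5), tri(0.0, 0.5, 1.0)

# Each rule: (membership functions of the antecedent, index of its class).
rules = [
    ([small, small], 0),    # if x1 is small and x2 is small then class 1
    ([large, small], 1),    # if x1 is large and x2 is small then class 2
    ([large, large], 1),    # if x1 is large and x2 is large then class 2
]

def classify(x, n_classes=2, aggregate=max):
    # Rule activation: minimum of the antecedent membership degrees.
    acts = [min(mf(xi) for mf, xi in zip(mfs, x)) for mfs, _ in rules]
    # Class output: maximum (or mean) of the activations of the rules
    # connected to that class; the connection weights are fixed at 1.
    c = []
    for k in range(n_classes):
        vals = [a for a, (_, cls) in zip(acts, rules) if cls == k]
        c.append(aggregate(vals) if vals else 0.0)
    return c

print(classify([0.1, 0.05]))   # high degree for class 1, low for class 2
```

The resulting vector c can then be turned into a crisp class assignment by the winner-takes-all mapping sketched in the previous section.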

Fig. 8. A NEFCLASS system with two inputs, five rules and two output classes

A NEFCLASS system can be built from partial knowledge about the patterns, and can then be refined by learning, or it can be created from scratch by learning. A user has to define a number of initial fuzzy sets partitioning the domains of the input features, and must specify the maximum number of rule nodes that may be created in the hidden layer.

The idea of the learning algorithm is to create a rule base first, and then to refine it by modifying the initially given membership functions (usually fuzzy partitions where the membership degrees of each value add up to 1.0). The rule base is created by finding for each pattern in the training set a rule that best classifies it. If a rule with an identical antecedent is not already in the rule base, it is added. After all patterns have been processed once, the rule base is complete. It is then possible to evaluate the performance of each rule, and to delete some of them (if there are too many) to keep only the best rules.

The learning algorithm for the membership functions uses the output error, which tells whether the degree of fulfillment of a rule has to be higher or lower. This information is used to change the input fuzzy sets by shifting the membership functions and making their supports larger or smaller (see Fig. 9). By changing only the fuzzy set that delivered the smallest membership degree for the current pattern, the changes are kept as small as possible. It is easy to define constraints for the learning procedure, e.g. that fuzzy sets must not pass each other, or that they must intersect at 0.5, etc. Constraints like these help to obtain an interpretable rule base, but may cause a loss of performance in classification.


Fig. 9. The adaptation of a fuzzy set is carried out by simply changing the parameters of its membership function in such a way that the membership degree for the current feature value is increased or decreased, respectively (middle: initial situation, left: increase situation, right: decrease situation)
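A sketch of one such adaptation step for a triangular fuzzy set $(a, b, c)$; the learning rate and the exact update form are our own assumptions, chosen to reproduce the behaviour sketched in Fig. 9:

```python
# One heuristic adaptation step for a triangular fuzzy set (a, b, c).
# If the rule needs a higher degree of fulfillment, the fuzzy set is shifted
# towards the current feature value and its support is enlarged; otherwise
# it is shifted away and its support is reduced (illustrative sketch).
def adapt(a, b, c, x, increase, eta=0.1):
    shift = eta * (x - b) if increase else -eta * (x - b)
    a, b, c = a + shift, b + shift, c + shift
    width = eta * (c - a)
    if increase:
        a, c = a - width, c + width     # enlarge the support
    else:
        a, c = a + width, c - width     # reduce the support
    return a, b, c

# The degree for x = 0.8 under tri(0, 0.5, 1) is 0.4; after one "increase"
# step the membership degree of x is higher than before (about 0.55).
print(adapt(0.0, 0.5, 1.0, x=0.8, increase=True))
```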

The learning process is visualized in Fig. 10. The left part shows the situation after the rule learning algorithm has terminated. The predefined fuzzy partitioning of both input variables defines a grid in the input space. This grid is created by overlapping hyper-boxes, where each hyper-box is formed by the Cartesian product of the supports of n fuzzy sets. Each hyper-box represents the support of an n-dimensional fuzzy set, i.e. the antecedent of a fuzzy rule. During rule learning, hyper-boxes are selected according to the distribution of the patterns. Each hyper-box is mapped to the class of the pattern which caused its selection (i.e. the rule conclusion is determined). After all patterns are processed, the mapping of hyper-boxes to classes is re-evaluated and changed where necessary. After this only the "best" hyper-boxes (fuzzy rules) are kept.

Fig. 10. Visualization of a possible NEFCLASS learning process after rule creation (left), and after fuzzy set tuning (right)

After rule learning there are usually some patterns which are not classified, because their hyper-box (rule) was not included in the set of the $k_{max}$ best rules. There are usually also some misclassifications. It is the task of the fuzzy set learning algorithm to improve this situation. By modifying the membership functions, the predefined grid is distorted. This results in the situation shown in the right part of Fig. 10. Because the learning algorithm for the fuzzy sets is constrained (e.g. a fuzzy set must not pass a neighbor), it is possible that some changes to the form of the hyper-boxes are not applicable. This is one reason why some classification errors can remain. Another reason can be too small a number of fuzzy rules. This can also lead to undesired forms of membership functions (e.g. too much overlap). Considering the resulting situation in the right part of Fig. 10, it is probably better to accept four instead of three rules to avoid the extremely wide support of the leftmost fuzzy set over feature x.

From the viewpoint of the NEFCLASS architecture and the flow of data, the fuzzy sets are trained by a backpropagation-like algorithm: the error is propagated from the output units towards the input units and is used to change the membership function parameters, but there is no gradient information involved. The adaptivity of a NEFCLASS system is restricted by the initially given input fuzzy partitions, which define the form and maximal number of clusters, and by the constraints that do not admit certain changes in the fuzzy sets.


Table 3. Performance of three neuro-fuzzy classification models on the Iris data

                   FuNe-I   Fuzzy RuleNet   NEFCLASS
no. of errors         5        5     4       5    5
no. of rules         10        7    14       7    3
no. of variables      4        3     4       4    2

Table 3 shows how the three neuro-fuzzy classification models discussed above perform on a very simple, but benchmark-like, classification problem: the Iris data set [11]. This data set contains 150 patterns belonging to three different classes (Iris Setosa, Iris Versicolour, and Iris Virginica) with 50 patterns each. The patterns have four input features (length and width of the sepal and petal of iris flowers). The first class is linearly separable from the other two classes, whereas the second and third classes are not linearly separable from each other. The results in Table 3 show that all three approaches perform comparably on this simple problem.

Other neuro-fuzzy models that are suitable for classification purposes are Fuzzy ART [10] and fuzzy min-max neural networks [36, 37]. The latter is similar to Fuzzy RuleNet. However, there are differences in the learning strategy and the interpretation of the learning result. The software that was used to obtain the results for NEFCLASS can be found on our WWW server at http://fuzzy.cs.uni-magdeburg.de. FuNe-I can be found at the ftp server obelix.microelectronic.e-technik.th-darmstadt.de in pub/neurofuzzy.

7. Conclusions

The neuro-fuzzy models presented in this paper do not cover all aspects of neuro-fuzzy combinations. We concentrated on approaches that explicitly use a feedforward network architecture. We presented three of our own models which are derived from the same generic model and follow the same principle: keep the learning algorithms simple and do not touch the semantics of the underlying fuzzy systems. Other researchers may prefer other models, for example with more sophisticated learning algorithms. However, more powerful learning strategies can be more difficult to handle. From our point of view neuro-fuzzy techniques should be used as tools, and not as automatic solution generators. Therefore we think it is important that the models and learning algorithms are easy to handle and that a user can easily interpret them.

From an application-oriented point of view, one could say: why bother with interpretability and semantics? It is important that the system does its job. It is of course possible to omit all constraints from the learning procedures of a neuro-fuzzy system, to consider it only as a convenient tool that can be initialized by prior knowledge and trained with sample data, and never to analyse the final system, as long as it performs to the satisfaction of the user.


However, interpretability and clear semantics provide us with advantages like simple ways to check the system for plausibility and to maintain it during its life cycle.

It is also possible to consider even simpler learning mechanisms like weighted fuzzy rules, as they can be found in several commercial fuzzy shells. However, they give rise to some semantical problems, which we addressed in [28]. Allowing the weights to be selected from [0,1] could be interpreted as something like a degree of support for a rule, i.e. a value less than 1 would then indicate an ill-defined rule that supports a class only to some extent. Some approaches allow the weights to assume any value in R, but this leaves the semantics of fuzzy rules behind. It is not clear how rules weighted by absolute values greater than 1 or by negative values should be interpreted (for rules with negative weights an interpretation as if-not rules is sometimes suggested [12]).

Rule weights can always be replaced by changes in the fuzzy sets of a rule. However, these changes can lead to non-normal fuzzy sets and to situations in which identical linguistic values are represented differently in different rules. Rule weights can destroy the interpretation of a fuzzy system completely. Therefore we always refrain from learning weights in our approaches.

Our view of neuro-fuzzy approaches as heuristics to determine the parameters of fuzzy systems by processing training data with a learning algorithm is expressed by the list of five points at the end of Section 2. We think that neuro-fuzzy systems should be seen as development tools that can help to construct a fuzzy system. They are not automatic "fuzzy system generators". The user should always supervise the learning process and try to interpret its results. We also have to keep in mind that - like in neural networks - the success of the learning process is not guaranteed. The same guidelines for selecting and preprocessing training data that are known from neural networks apply to neuro-fuzzy systems. But if the application of neuro-fuzzy methods is well-considered, they can be a powerful tool in the development process of fuzzy systems.

References

1. Igor Aleksander and Helena Morton. An Introduction to Neural Computing. Chapman & Hall, London, 1990.

2. Andrew G. Barto, Richard S. Sutton, and Charles W. Anderson. Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Trans. Systems, Man & Cybernetics, 13:834-846, 1983.

3. Hamid R. Berenji. A reinforcement learning-based architecture for fuzzy logic control. Int. J. Approximate Reasoning, 6:267-292, February 1992.

4. Hamid R. Berenji and Pratap Khedkar. Learning and tuning fuzzy logic con­trollers through reinforcements. IEEE 'Irans. Neural Networks, 3:724-740, September 1992.


5. Hugues Bersini, Jean-Pierre Nordvik, and Andrea Bonarini. A simple direct adaptive fuzzy controller derived from its neural equivalent. In Proc. IEEE Int. Conf. on Fuzzy Systems 1993, pages 345-350, San Francisco, March 1993.

6. James C. Bezdek, Eric Chen-Kuo Tsao, and Nikhil R. Pal. Fuzzy Kohonen clustering networks. In Proc. IEEE Int. Conf. on Fuzzy Systems 1992, pages 1035-1043, San Diego, CA, 1992.

7. J. J. Buckley. Sugeno type controllers are universal controllers. Fuzzy Sets and Systems, 53:299-303, 1993.

8. James J. Buckley and Yoichi Hayashi. Fuzzy neural networks: A survey. Fuzzy Sets and Systems, 66:1-13, 1994.

9. James J. Buckley and Yoichi Hayashi. Neural networks for fuzzy systems. Fuzzy Sets and Systems, 71:265-276, 1995.

10. Gail A. Carpenter, Stephen Grossberg, Natalya Markuzon, John H. Reynolds, and David B. Rosen. Fuzzy ARTMAP: A neural network architecture for incremental supervised learning of analog multidimensional maps. IEEE Trans. Neural Networks, 3(5):698-712, September 1992.

11. R.A. Fisher. The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(Part II):179-188, 1936.

12. Saman K. Halgamuge and Manfred Glesner. Neural networks in designing fuzzy systems for real world applications. Fuzzy Sets and Systems, 65:1-12, 1994.

13. Simon Haykin. Neural Networks. A Comprehensive Foundation. Macmillan College Publishing Company, New York, 1994.

14. K. Hornik, M. Stinchcombe, and H. White. Multilayer feedforward networks are universal approximators. Neural Networks, 2:359-366, 1989.

15. J. S. Roger Jang. ANFIS: Adaptive-network-based fuzzy inference systems. IEEE Trans. Systems, Man & Cybernetics, 23:665-685, 1993.

16. James M. Keller and Hossein Tahani. Backpropagation neural networks for fuzzy logic. Information Sciences, 62:205-221, 1992.

17. F. Klawonn and R. Kruse. Fuzzy control on the basis of equality relations with an example from idle speed control. IEEE Trans. Fuzzy Systems, pages 336-350, 1995.

18. Bart Kosko. Fuzzy systems as universal approximators. In Proc. IEEE Int. Conf. on Fuzzy Systems 1992, pages 1153-1162, San Diego, CA, March 1992.

19. Bart Kosko. Neural Networks and Fuzzy Systems. A Dynamical Systems Ap­proach to Machine Intelligence. Prentice-Hall, Englewood Cliffs, NJ, 1992.

20. Rudolf Kruse, Jörg Gebhardt, and Frank Klawonn. Foundations of Fuzzy Systems. Wiley, Chichester, 1994.

21. Chuen Chien Lee. Fuzzy logic in control systems: Fuzzy logic controller, part I. IEEE Trans. Systems, Man & Cybernetics, 20:404-418, 1990.

22. Chuen Chien Lee. Fuzzy logic in control systems: Fuzzy logic controller, part II. IEEE Trans. Systems, Man & Cybernetics, 20:419-435, 1990.

23. E. H. Mamdani and S. Assilian. An experiment in linguistic synthesis with a fuzzy logic controller. Int. J. Man Machine Studies, 7:1-13, 1975.

24. Sushmita Mitra and Ludmilla Kuncheva. Improving classification performance using fuzzy MLP and two-level selective partitioning of the feature space. Fuzzy Sets and Systems, 70:1-13, 1995.

25. Detlef Nauck, Frank Klawonn, and Rudolf Kruse. Foundations of Neuro-Fuzzy Systems. Wiley, Chichester, 1997.


26. Detlef Nauck and Rudolf Kruse. NEFCON-I: An X-Window based simulator for neural fuzzy controllers. In Proc. IEEE Int. Conf. Neural Networks 1994 at IEEE WCCI'94, pages 1638-1643, Orlando, FL, June 1994.

27. Detlef Nauck and Rudolf Kruse. NEFCLASS - a neuro-fuzzy approach for the classification of data. In K. M. George, Janice H. Carrol, Ed Deaton, Dave Oppenheim, and Jim Hightower, editors, Applied Computing 1995. Proc. 1995 ACM Symposium on Applied Computing, Nashville, Feb. 26-28, pages 461-465. ACM Press, New York, February 1995.

28. Detlef Nauck and Rudolf Kruse. Designing neuro-fuzzy systems through back­propagation. In Witold Pedrycz, editor, Fuzzy Modelling: Paradigms and Prac­tice, pages 203-228. Kluwer, Boston, 1996.

29. Detlef Nauck and Rudolf Kruse. Neuro-fuzzy systems research and applications outside of Japan (in Japanese). In M. Umano, I. Hayashi, and T. Furuhashi, editors, Fuzzy-Neural Networks (in Japanese), Soft Computing Series, pages 108-134. Asakura Publ., Tokyo, 1996.

30. Detlef Nauck and Rudolf Kruse. Neuro-fuzzy systems for function approximation. In Adolf Grauel, Wilhelm Becker, and Fevzi Belli, editors, Fuzzy-Neuro-Systeme'97 - Computational Intelligence. Proc. 4th Int. Workshop Fuzzy-Neuro-Systeme '97 (FNS'97) in Soest, Germany, Proceedings in Artificial Intelligence, pages 316-323, Sankt Augustin, 1997. infix.

31. Hiroyoshi Nomura, Isao Hayashi, and Noboru Wakami. A learning method of fuzzy inference rules by descent method. In Proc. IEEE Int. Conf. on Fuzzy Systems 1992, pages 203-210, San Diego, CA, 1992.

32. Ann Nowe and Ranjan Vepa. A reinforcement learning algorithm based on 'safety'. In Erich Peter Klement and Wolfgang Slany, editors, Fuzzy Logic in Artificial Intelligence (FLAI93), pages 47-58, Berlin, 1993. Springer-Verlag.

33. S. K. Pal and S. Mitra. Multi-layer perceptron, fuzzy sets and classification. IEEE Trans. Neural Networks, 3:683-697, 1992.

34. Witold Pedrycz and H. C. Card. Linguistic interpretation of self-organizing maps. In Proc. IEEE Int. Conf. on Fuzzy Systems 1992, pages 371-378, San Diego, CA, 1992.

35. T. Poggio and F. Girosi. A theory of networks for approximation and learning. A.I. Memo 1140, MIT, 1989.

36. P. K. Simpson. Fuzzy min-max neural networks - part 1: Classification. IEEE Trans. Neural Networks, 3:776-786, 1992.

37. P. K. Simpson. Fuzzy min-max neural networks - part 2: Clustering. IEEE Trans. Fuzzy Systems, 1:32-45, February 1992.

38. M. Sugeno. An introductory survey of fuzzy control. Information Sciences, 36:59-83, 1985.

39. Nadine Tschichold-Gürman. Generation and improvement of fuzzy classifiers with incremental learning using fuzzy rulenet. In K. M. George, Janice H. Carrol, Ed Deaton, Dave Oppenheim, and Jim Hightower, editors, Applied Computing 1995. Proc. 1995 ACM Symposium on Applied Computing, Nashville, Feb. 26-28, pages 466-470. ACM Press, New York, February 1995.

40. Nadine Tschichold-Gürman. RuleNet - A New Knowledge-Based Artificial Neural Network Model with Application Examples in Robotics. PhD thesis, ETH Zurich, 1996.


41. Petri Vuorimaa. Fuzzy self-organizing map. Fuzzy Sets and Systems, 66:223-231, 1994.

42. David A. White and Donald A. Sofge, editors. Handbook of Intelligent Control. Neural, Fuzzy, and Adaptive Approaches. Van Nostrand Reinhold, New York, 1992.

43. Jacek M. Zurada. Introduction to Artificial Neural Systems. West Publishing Company, St. Paul, MN, 1992.


Fuzzified Petri-Nets and Their Application to Organising Supervisory Controller

This contribution is dedicated to Professor Lotfi A. Zadeh

"To Lotfi Zadeh,
Who had the courage and the gift
To begin the grand paradigm shift,
And to many others,
Whose hard work and healthy thinking
Have contributed to the shifting"

(Professor George J. Klir, 1995)

Georgi M. Dimirovski

ASE Institute at Faculty of Electrical Engineering, St. Cyril and St. Methodius University, Karpos 2 B.B., P.O.B. 574, MK-91000 Skopje, Republic of Macedonia

Abstract. A model and the decision reasoning processes of a two-layer organising supervisory controller for complex systems have been developed. It is based on fuzzy Petri-net algorithms and a fuzzy-rule production system for decision and command control. This model follows the fundamental idea of the original intelligent controller of G.N. Saridis, but adds a new generic property. This fuzzy-Petri-net organising controller employs the advantages of both the qualitative modelling potential of L.A. Zadeh's fuzzy logic and the discrete-event genesis of C.A. Petri's networks. Thus it accomplishes full compatibility of the mathematical formalisms of the organising and co-ordinating levels of G.N. Saridis' architecture, and it greatly reduces the number of rules needed, due to possibility distribution evaluation and Petri-net-supported reasoning, in comparison with the Stellakis-Valavanis fuzzy solution for the organiser.

Keywords. Intelligent controller, fuzzy-rule production systems, approximate reasoning, fuzzy-Petri-net reasoning, complex systems, supervisory control.



1. Introduction and Background Research

1.1 Introduction

An overview of the published works and new technological implementations during the last ten years readily demonstrates that the science and technology of systems control has taken a new fundamental standing towards the idea of hybrid qualitative-quantitative approaches. Nowadays, once again and more deeply than in the flourishing era of cybernetics [1, 27, 52, 58, 73, 76], a novel comprehension of the ancient thought "Learning without thinking is useless, but thinking without learning is perilous. What is most needed for learning is a humble mind" (Confucius, Chinese philosopher) has prevailed. Had the knowledge of early cybernetics been updated along with computing theory and technology, and had Russell's comprehension of mathematical logic and theory of knowledge been widely accepted [26, 33, 60], the new paradigm of systems control could have been gained long ago. The genius of L.A. Zadeh's lucid creativity (1965-1996) is believed to have been the outstanding one having the comprehension and the courage to grasp beyond Confucius [84-97]. For his theory of approximate reasoning, which derives a fuzzy, yet precisely consistent, proposition from a set of other propositions while following human derivation of mental models and reasoning, has penetrated the very essence of Confucius' saying. For he has laid down the grounds of modern machine intelligence, soft computing and symbiotic decision and control, and gone beyond the discoveries made by Ashby [2], Conant [11] and Kolmogorov [40].

It took another decade for these fundamentals to be firmly justified in industrial applications too, by distinguished results of Mamdani [48-50], Sugeno [70, 71], Astrom [3-5], Meystel [53, 54], Saridis [61-65] and their co-workers. In Harris [34-36], an up-to-date status overview via selected contributions may be found. Nevertheless, the comprehension of the real world as being created by a unique symbiosis of energy, matter and information became widely accepted much later [32, 47], though an argument was available earlier [12]. These results have shed new light on the search for alternative solutions of autonomous intelligent systems based on qualitative-quantitative approaches, and have led to the recent well-known theoretical and technological accomplishments. Among them, Saridis' theory and technology of intelligent machines [61-65] has a place of importance for the work reported here.

On the grounds of early results of Ashby's cybernetic approach on the laws of information governing (general) systems, which have an essential impact on system architectural issues, Saridis has undertaken and accomplished one of the successful, well-rounded research programmes on intelligent control of autonomous machines. The essential feature of his contributions is the analytically measured (in the sense of Kolmogorov and Zames), entropy-based flow of knowledge between the organising, co-ordinating and executing controls in the system architecture according to the IPDI principle (of Increasing Precision with Decreasing Intelligence), which he also proved [63]. An alternative for intelligent systems via an analytic approach, following the early idea of analytically learning automata of K.S. Fu and Y.Z. Tzypkin, has been derived and elaborated by Meystel in terms of a nested-automata hierarchical architecture [53, 54].

1.2 On Background Research

The attempts at technical implementations of intelligent supervision and control systems, however, have disclosed a number of essential, both practical and theoretical, systemic issues. To the extent of the literature available, a representative set of these issues, which are of importance in our research endeavours, can be found in Astrom [4, 5], Cao and Anderson [8, 9], Lee [44, 45], Harris [35, 36], Krijgsman et al. [41], Pascal et al. [56], Turksen [74], Valette et al. [78], Vukobratovic and Dimirovski [79], and Yager et al. [82]. On the other hand, on the grounds of Zadeh's approximate decision reasoning and knowledge representation, the most important results appeared along this line of research towards compatible symbiotic computing structures of rule knowledge bases, fuzzy logic, and fuzzy Petri-nets. According to the literature available, these are mainly due to Gaines [30], Baldwin [6], Yager [80, 81], Dubois and Prade [24, 25], Looney [46], Chen et al. [10], Garg et al. [31], Cao and Anderson [8], Hirota and Pedrycz [37], Bugarin and Barro [7], Klawonn [38], Yu [83] and Scarpelli et al. [66]. In particular, our work has been inspired by Zadeh's contributions [89, 91, 94] and by results of Bugarin and Barro [7], Cao and Anderson [9], Garg et al. [31], Pedrycz and Gomide [57], Scarpelli et al. [66], Yager [81], and Yu [83].

The original theory of Saridis, which is grounded on the fundamental laws of Ashby and Conant [2, 11] on the information governing the functioning of complex systems, represents a fully completed solution for autonomous intelligent machines which was successfully implemented in several robotic systems. It is believed that his work has inspired and stimulated a number of recent research endeavours, not all of which can be mentioned in such a short paper. In particular, for the present purpose, the work of Saridis' former student Valavanis and his co-workers should be noted [69, 77]. For Valavanis and his co-author have developed the model which formulates Saridis' organising controller in terms of fuzzy logic [69].

The accomplishments cited above have also inspired and stimulated our own research [13-23]. They first stimulated a closer investigation of the strategic control level in robotic systems and FMS [14, 16, 17]. In turn, this has involved us deeply in hierarchical control architectures, knowledge-based and rule production systems for decision, control and supervision [19-23, 28, 29], and the ones using fuzzy logic and combined forward and backward chaining [13, 15, 18]. The ongoing research as well as our most recent results are oriented towards knowledge-based, production-rule systems with the use of both Zadeh's fuzzy logic and Petri's nets, and approximate reasoning algorithms operating on a fuzzy-Petri-net knowledge base.


Our work explores this line of research further and contributes a control-based, fuzzy-Petri-net soft computing mechanism which implements a two-layer supervisory controller with organising-co-ordinating functions and employs a fuzzy-rule production system. By making use of the presently existing knowledge on Petri-net theory and applications (e.g. [55, 59, 99]) and on fuzzy logic and applications (e.g. [39, 42, 98]), we have developed a fuzzy-Petri-net schema for generating co-ordinated command controls for the execution control level, and also some experimental simulation software for supporting our further research [28, 29]. Also, there have been two incidental chances for some fairly modest, short field tests on auxiliary industrial objects (a multi-zone industrial furnace, and a multi-machine low-power hydroelectric plant on a main irrigation channel), which were done outside normal operation conditions [20, 21]. This research is still in progress, and both simulation and test results suggest that this attempt to formulate the organising supervisory controller in terms of a fuzzy-Petri-net two-layer supervisory controller is rather promising. For specifications of technical operation and supervisory functions in linguistic terms, by a family of primitive events and procedures, are always available. Moreover, they can be modelled in terms of a fuzzy-rule knowledge base, and then the onto-projection in terms of a Petri-net is easily derived. At present, the issues of functioning and malfunctioning or disfunctioning of this supervisory controller and its computing schemata, as well as details on its limitations, are of particular concern.

The next section is devoted to the fuzzy-Petri-net model of a fuzzy-rule production system (FPS) which can generate command controls according to process operation specifications in linguistically defined primitives. In the subsequent section, a discussion of the system architectural issues is presented. The third section gives more detail on the reasoning in the fuzzy-Petri-net (FPN) organising supervisory controller. A brief presentation of the data-driven execution in the FPN organising controller is given in the section thereafter, followed by a short discussion of a comparison example. Conclusions and a reference bibliography are presented at the end.

2. Command Generation by Fuzzy-Petri-Nets in Two-Level Intelligent Control Architecture

2.1 Fuzzy-Petri-Net Model of FPS for Command Control Generation

It is known that the control functions of the upper, supervisory level, regardless of preciseness/impreciseness and certainty/uncertainty, are to map the outside imposed goals and preconditions into or onto appropriate co-ordinated commands to distributed local controls. The dynamics of functioning and malfunctioning or disfunctioning of the supervisory control level inevitably involves both time-driven and event-driven evolution, and therefore rule knowledge bases, as well as organising and co-ordinating control functions. Thus the language of supervisory control functions is naturally a hybrid one, non-analytical in the first place, and composed of relevant sub-languages, and so are the respective processing models and algorithms [14, 16, 19].

The application of Petri nets to hybrid systems is also known to bring a number of advantages because of their power in representing and modelling parallel and concurrent processes having inherent discrete-event dynamics. The discrete-event dynamics of multivariable processes is readily simulated by means of a function of markings which represents the sequential enabling and firing of transitions. The formalism of Petri nets, however, can also be employed to model fuzzy-logic rule-based systems by associating some of the elements (e.g. places and transitions) and some of the embedded system properties (e.g. the function of markings) of Petri-net models with the basic elements of the fuzzified knowledge bases, e.g. propositions. Therefore the conceptual model of a fuzzy Petri network is an appealing modelling tool for representing command control of complex processes in the two-layer supervisory level of the two-level control architecture for intelligent automation (e.g., see the aforementioned widely recognised accomplishments of intelligent controls).

The next step towards the construction of an effective mechanism of hierarchically structured decision and control, following this idea of fuzzy-Petri-net hybrid systems, can be achieved by associating the propositions in the knowledge base with the places in the Petri net by means of a bijective function, as well as by associating the transitions with the degrees of truth. In this way the Petri net itself becomes separated from the dynamic process via the concept of data-driven execution of the chaining mechanism of inference. Thus, last but not least, an improved resolution technique for multi-propositional rules can be implemented. It will be demonstrated in the sequel how the knowledge base can be modelled and represented by means of the Petri-net formalism in a way which enables the compatible use of fuzzy logic within a symbiotic structure. Moreover, it will be shown how the chaining is realised, and how the approximate reasoning is implemented and executed.

For these purposes, one first needs to identify an association of the places within the Petri network (PN) with the propositions within the fuzzy-rule knowledge base (FKB) by means of the following bijective function of projection:

α : P → PR,  p_k ↦ α(p_k) = pr_k,  k = 1, …, K (1)

Here, PR = {pr_k} is the set of propositions in the FKB, and K is the number of propositions in the FKB. In this way, in fact, a projection of the FKB onto the FPN model is performed. In the case when one proposition appears in different rules within the FKB, a different place in the FPN will be assigned for each effective appearance. This is needed because the rules are characterised by the linguistic value of the variable of truth, or truth variable. The modelling representation becomes simpler when the same proposition appears in the consequent part of several rules and the linguistic values of the rules are equal, for then one place in the FPN alone may be assigned to such propositions.

The representation of transitions is much more involved because of the chaining of rules. We use in our representation T = T^R ∪ T^C = { t_1, …, t_R, t_{R+1}, …, t_{R+C} }. The subset T^R encompasses the transitions corresponding to each individual rule within the FKB, and T^C encompasses the transitions representing the chaining links between the propositions. Therefore the Petri-net input and output functions are defined on T

I : T → Φ(P) (2)

O : T → Φ(P) (3)

and in this way they assign to each transition a subset of input and output places in the FPN. These functions may have different interpretations depending on the subset of T to which a transition belongs by definition. Namely, this may be seen better from the relational expressions below:

If t_j ∈ T^R, ∀ p_i ∈ P:  p_i ∈ I(t_j)  ⇔  α(p_i) ∈ antecedent part of R_j (4)

If t_j ∈ T^R, ∀ p_i ∈ P:  p_i ∈ O(t_j)  ⇔  α(p_i) ∈ consequent part of R_j (5)

Consequently, there will exist in the FPN a single transition for each of the intermediate variables X_j within the knowledge-base rules. In addition, the graphical representation of the FPN is defined in terms of directed arc graphs, within which traditionally circles represent the places while bars represent the transitions of the FPN model. There are permitted in the FPN, however, only directed arcs belonging to the set A as defined below:

A = ⋃_{t ∈ T} [ ({t} × O(t)) ∪ (I(t) × {t}) ] (7)

In addition, in the next step one needs to define a function of truth f_th which assigns to each transition t_j ∈ T^R of the FPN model the linguistic value associated with the corresponding rule R_j:

f_th : T^R → V,  t_j ↦ f_th(t_j) = τ^j (8)

where V represents the set of linguistic values of the linguistic variable of truth.
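For concreteness, the FPN objects introduced so far (the places P, the projection α of eq. (1), the transition subsets T^R and T^C, the input/output functions I and O of eqs. (2)-(3), the arcs of eq. (7), and the truth function f_th of eq. (8)) can be collected in a small data structure. The following Python sketch is only an illustration under assumed names, not the authors' experimental software:

```python
# A minimal sketch of the FPN structure defined above; all names are
# illustrative assumptions, not the authors' package.
from dataclasses import dataclass, field

@dataclass
class FuzzyPetriNet:
    places: set                                   # P = {p_k}
    alpha: dict                                   # bijection p_k -> pr_k, eq. (1)
    t_rules: set                                  # T^R: one transition per rule
    t_chains: set                                 # T^C: chaining transitions
    inputs: dict = field(default_factory=dict)    # I(t): input places, eq. (2)
    outputs: dict = field(default_factory=dict)   # O(t): output places, eq. (3)
    truth: dict = field(default_factory=dict)     # f_th(t_j) = tau^j, eq. (8)

    def arcs(self):
        # The permitted directed arcs A of eq. (7):
        # input place -> transition and transition -> output place.
        T = self.t_rules | self.t_chains
        return ({(p, t) for t in T for p in self.inputs.get(t, ())} |
                {(t, p) for t in T for p in self.outputs.get(t, ())})
```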

Now, one can easily define the concepts of reachability and immediate reachability of places, the latter being of particular importance here. It is said by definition that the place p_l is immediately reachable from another place p_k if the following relational correspondence

∃ t_j ∈ T :  p_k ∈ I(t_j) and p_l ∈ O(t_j) (9)

is valid. Of course, the subset of immediately reachable places for any place p_k is of importance, and it is distinguished as the set IRS(p_k). Now it is said that the place p_l is reachable from a place p_k if p_l is immediately reachable from p_k or, alternatively, from any place p_i ∈ IRS(p_k). In a similar way, there are defined arbitrary reachable sets of places p_k, RS(p_k). In addition, two places p_k, p_l in P are causally adjacent if:

∃ t_j ∈ T :  p_k, p_l ∈ I(t_j) (10)

Similarly, the concept of causal adjacency can be defined for transitions. The causally adjacent set of transitions for any place Pi in the FPN model is the set defined by relational correspondence

(11)

Thus the causally adjacent transitions which correspond to the executed rule chaining will yield multiple links within the FPN model equivalent to the FKB projected onto that particular FPN. Of course, each rule chaining implies its own set of transitions and adjacent transitions.
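Continuing the sketch above (again with assumed helper names), the reachability notions of (9)-(10) translate directly:

```python
# Illustrative helpers for the reachability notions above (assumed names).
def immediately_reachable(net, p):
    """IRS(p): places p_l with p in I(t) and p_l in O(t) for some t, eq. (9)."""
    T = net.t_rules | net.t_chains
    return {pl for t in T if p in net.inputs.get(t, ())
               for pl in net.outputs.get(t, ())}

def reachable(net, p):
    """RS(p): transitive closure of immediate reachability."""
    seen, frontier = set(), {p}
    while frontier:
        new = set()
        for q in frontier:
            new |= immediately_reachable(net, q)
        frontier = new - seen
        seen |= new
    return seen

def causally_adjacent(net, pk, pl):
    """Eq. (10): pk and pl are both input places of some common transition."""
    T = net.t_rules | net.t_chains
    return any(pk in net.inputs.get(t, ()) and pl in net.inputs.get(t, ())
               for t in T)
```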

2.2 Intelligent Two-Level Control System Architecture

In industrial applications, it is well known, the operational technical specifications typically involve a purposive description of the aims and procedures of supervision functions and recommended set-ups of regulatory and other control functions, along with certain pre-conditions, limitations, time-scales, and interaction and inter-relation terms. These specifications (and, of course, operator's empirical knowledge too, if existent) provide the basic data and knowledge to elaborate event primitives in terms of linguistic and/or hybrid variables. Therefore, by means of appropriate system analysis, the goals and the margins of controlled processes may be mapped onto formal models of inputs to the supervisory (upper) control level in terms of a family of primitives and sets of rules, along with some additional data for refinement or tuning. The supervisory control is aimed at generating co-ordinated command controls to the lower level (local control loops and paths), and therefore it has to employ a kind of feedforward-feedback architecture, sometimes even an off-real-time computed feedforward, and also hybrid modelling tools.

In our previous work [14, 19, 22], we have studied more closely the system architecture which employs fuzzy-logic control algorithms at its upper, 'qualitative' or 'non-analytical', level of control, and conventional and affine non-linear controls at its lower levels, the latter implying 'quantitative' or 'analytical' algorithms. Also, to a certain extent, we have investigated the case which makes use of a kind of Petri-net steady-state co-ordination or static optimisation of commands, and local fuzzy-logic or non-linear controls. It appeared that there exist several alternative sub-classes of control architectures, consistently employing different formalisms at different levels, worth investigating with respect to their control potential and structural properties. This previous research has been rather instrumental, and has directed us towards a system architecture of an 'intelligent knowledge-base controller' in terms of an implementable 'controller-based soft-computing rather than computer-based controlling structure' in order to handle the organising integration and co-ordination as well as the impreciseness and uncertainty issues.

The idea of control-based computing has been known for quite some time and has appeared to be a naturally appealing one. Equally naturally, it gives rise to the two-level control system architecture with distributed intelligence, and, to the best of our knowledge, this has been the case not only in cognition- and logic-based developments but in all other developments of intelligent control systems. Namely, if either the need for higher autonomy or the complexity of the controlled processes so requires, the first layer may implement the task-organising control in terms of a fuzzy-rule knowledge base while the second layer implements the co-ordinating command control in terms of static optimisation via conventional or fuzzy techniques.

However, when the event-driven evolution of controlled processes is strongly present within the operational specifications, the co-ordinating command layer has to capture the discrete-event feature. The obvious alternative is, as in Saridis' solution, to implement it in terms of a Petri-net co-ordination controller. On the other hand, when the supervisory control level can be constructed on the grounds of a rule knowledge base, as is most often the case in real-world industrial systems, then the concept of a fuzzy-Petri-net based supervisory controller provides advantages: firstly, task-organising and co-ordinating command controls become two natural layers in a single-level supervisory controller; secondly, both event-driven and time-driven evolution are captured in terms of a hybrid but consistent composition of mappings (linguistic-to-fuzzy-to-event-to-possibilistic-to-defuzzified); thirdly, the link of organising and co-ordinating controls becomes a generic one. Thus the information propagation becomes fully consistent and compatible with the flow of knowledge, and a higher integrity is achieved at the same time. This is our case: the intelligent controller architecture includes a flexible possibility for two layers at the upper level, and the basic features are easily inferred from an overall diagram in terms of a scheme employing both an appropriate feedforward and a set of feedback paths (e.g. as for an object process which possesses sub-process zones). Moreover, this conceptual model also evolves around the idea of using fuzzy-set theory for describing the overall performance of the control architecture in terms of a possibilistic index as a kind of 'input-output relation' indicator of the operating possibilistic measures of sub-systems within the entire control system architecture.

From a general viewpoint, it may be said that the conceptual setting of our approach is posed solely in the time domain, within the framework of the algebra of operators on local spaces (of classes) of functions and/or sequences. For, roughly speaking, a general dynamic process to be controlled may be viewed as a vector field having tangent vector spaces locally and implying translation and rotation equivalence relations as generalised actions on its state-space. On the other hand, with respect to system architecture, the mathematical formalism of directed graphs and the respective algebra enable representations which consistently encompass different formalisms according to the modelling needs, both in more global and in more detailed terms. Consequently, the use of various formalisms at different control levels, as most appropriate and relevant with regard to the actual needs, becomes feasible; our conceptual system architecture employs fuzzy-set and Petri-net formalisms and, in addition, analytic non-linear equations at the local controls. With respect to the synthesis of the membership functions and universes of discourse which are most appropriate to the envisaged operational specifications and the input and output spaces of classes of signals, one may approach the problem from a study of the nature of the complex object to be controlled. This is feasible due to the very essence of fuzzy systems theory, which enables a simultaneous comparative study of tentative spaces of admissible controls (antecedent) and of sustainable outputs (consequent). Finally, there are two facts worth noting. Firstly, regarding the expected computing times, the class of objects-to-be-controlled of concern here consists of relatively slow-dynamics processes; this was a prerequisite, crucial for developing the proposed intelligent controller system architecture. Secondly, the employed triangle-induced fuzzy subsets are characterised by a peculiar polygonal geometric representation which allows the derivation of an accurate, yet fast defuzzification algorithm via directly computing the union of subsets in a geometric fashion.

Last but not least, the need for our approach to be essentially computer-simulation oriented is further emphasised by the actual circumstances usually existing in peripheral, small, and developing countries. Typically, expensive software and experimental real-time rigs of two-level computing networks, implementing a hierarchical controller architecture and functionally integrating all controls and models of the respective controlled (sub)processes, are not available. In our work, we have taken an orientation towards a PC 486 DX2/66 based platform of a couple of PC units and segmented development of experimental program packages enabling verification of some of the theoretical derivations via simulation computing.

Page 278: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications

269

3. Reasoning Process in Fuzzy-Petri-Net Organising Supervisory Controller

The organising supervisory controller, in fact, is mapped into a fuzzy-Petri-net rule production computing mechanism, and its essence lies in the reasoning process in the fuzzy-Petri-net architecture of the knowledge base constructed by means of fuzzy-logic rules. The discrete-event evolution of the markings in the PN plays the crucial role when a function of the degree of fulfilment is associated [23], [28].

As a matter of fact, the dynamics of the execution of the FKB is represented by means of the evolution of markings in the FPN. A marking denotes that the degree of fulfilment (DOF) of the corresponding proposition is known and that the proposition may be applied in the execution process; consequently, at least one executive step in decision making will be activated. Of course, care must be taken to ensure that all DOFs of the respective propositions are known at the appropriate times within the execution process. Therefore a function of fulfilment

g : P → [0,1] (12)

is defined accordingly, which assigns to each place in the FPN model a real value

g(p) = DOF(α(p)) (13)

which is called the degree of fulfilment and which permits the evaluation of the corresponding relevant fuzzy rules. Note that this is not a fuzzy marker (as sometimes termed in some works), but the DOF of each proposition affected, that is, of the corresponding relevant place in the FPN. In the FPN model, markers are placed (displaced) from places to other places, thus firing the respective transitions, according to the following basic rule of action: a transition t_j is fired if every p_i ∈ I(t_j) contains a marker. When a particular transition is fired, a marker is displaced from each of its input places; however, the DOFs of the propositions are saved through the respective functions of fulfilment. Firing of a transition t_j ∈ T^R within this formalism is equivalent to the applied evaluation of the corresponding rule in the evaluation process of the FKB. Firing of a transition t_j ∈ T^C is equivalent, with respect to the knowledge, to previously obtained conclusions in the inference, i.e. to the DOFs of the propositions α(p_i), ∀ p_i ∈ I(t_j).

In the case of this fuzzy-Petri-net computational structure, the DOFs of propositions α(p_k), ∀ p_k ∈ O(t_j), are not obtained by applying the rules in the FKB directly, but by means of the method which computes the DOFs of the respective propositions viewed as input possibility distributions. It will be seen in the sequel that many of the operations are executed in an a priori manner, which contributes to the simplification and the speed of the whole execution process. When the DOFs of all input places (propositions) of a given transition become known, this particular transition fires and a new marking function in the FPN model is generated. Thus a Fuzzy-rule Production System (FPS) can be created. For this purpose, the initial marking function has to be defined in the PN projection of the FKB for the corresponding FPS, within the representation of the FKB in terms of a PN model, as follows:

M : P → {0,1},  p_i ↦ M(p_i) = { 0, if g(p_i) is unknown;  1, otherwise } (14)

Then, from the given marking function, the firing of a transition will generate a new marking function M*, according to the pre-defined transition function trf of the FPN model, which describes the evolution of the marking functions of an FPN, as follows:

trf : M × T → M,  (M, t_j) ↦ M* (15)

where

M*(p_i) = { 0, if p_i ∈ I(t_j);  1, if p_i ∈ O(t_j);  M(p_i), otherwise } (16)

Here, M represents the set of all possible marking functions of the FPN model of the FKB.
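As a hedged illustration of (14)-(16), and continuing the FuzzyPetriNet sketch above, the marking evolution can be written as:

```python
# Sketch (assumed names) of the marking functions (14) and their
# evolution under the transition function trf of (15)-(16).
def initial_marking(net, known_dofs):
    # Eq. (14): M(p) = 1 iff the DOF g(p) is known
    return {p: int(p in known_dofs) for p in net.places}

def fire(net, M, t):
    # Eqs. (15)-(16): new marking M* after firing transition t:
    # input places lose their marker, output places gain one.
    M_star = dict(M)
    for p in net.inputs.get(t, ()):
        M_star[p] = 0
    for p in net.outputs.get(t, ()):
        M_star[p] = 1
    return M_star
```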

In data-driven execution, which is the case of reference here for the organising controller, the initial possibility distributions of the input variables of the FKB are known via preliminary observations and the task problem formulation. These distributions enable the determination of the subsequent possibility distributions of the other intermediate and/or output variables. When this evaluation process is repeatedly executed as many times as needed in the particular application, as a normal outcome, the possibility distributions of all output variables are determined, respectively. In this regard, the execution process of the FKB may therefore be viewed as a propagation process of possibility distributions through the FKB via operations of implication (within one and the same rule) and via effective links (for several rules chained in a particular firing sequence). The actual evaluation does not have a strict ordering, which demonstrates that every rule may be applied at arbitrary instants and exactly when needed. This soft computing process terminates with the stage of aggregation of the possibility distributions of the output variables, yielding the final, aggregate possibility distribution (as was seen in the set of illustrative examples during the lecture).

Without loss of generality, we first confine ourselves to the analysis of the simple case of an FKB constructed of two chained rules only. With respect to the source references in fuzzy sets and fuzzy logic systems, and following the representation modelling outlined above, these rules are described as

R^S: IF X^S_1 IS A^S_1 AND … AND X^S_{M_S} IS A^S_{M_S} THEN X^S_{M_S+1} IS B^S_1 AND … AND X^S_{M_S+N_S} IS B^S_{N_S} (τ^S)

R^T: IF X^T_1 IS A^T_1 AND … AND X^T_{M_T} IS A^T_{M_T} THEN X^T_{M_T+1} IS B^T_1 AND … AND X^T_{M_T+N_T} IS B^T_{N_T} (τ^T) (17)

which are also linked by means of

X^T_1 = X^S_{M_S+1} (18)

For the purpose of representing this pair of rules in the formalism presented above, one must define the bijective function (1) given in the first section, which relates places and propositions.

Let the set of places be defined as follows

P = { p^r_{m_r} | m_r = 1, …, M_r + N_r ;  r = S, T } (19)

and, accordingly, the set of propositions

PR = { pr^r_{m_r} | m_r = 1, …, M_r + N_r ;  r = S, T } (20)

Therefore, for the bijective function (1) we may use the simplest one,

α(p^r_{m_r}) = pr^r_{m_r},  m_r = 1, …, M_r + N_r ;  r = S, T (21)

The set of transitions may be given as follows

T^R = { t^S, t^T } (22)

T^C = { t^{ST} } (23)

where t^{ST} denotes the single chaining transition induced by the link (18).

In the sequel, we present an outline of the way the DOF of proposition α(p^T_1) is obtained through the DOF of proposition α(p^S_{M_S+1}), that is, how g(p^T_1) is obtained through g(p^S_{M_S+1}). In fact, it may be shown that the following equation

(24)

holds, where B^S_1 = { b^S_{1,i} } is the possibility distribution for the linguistic value B^S_1 in the proposition α(p^S_{M_S+1}). Therefore, one may obtain


g(p^T_1) = ⋁_{i=1,…,I} [ τ^S( g(p^S_{M_S+1}) ∧ b^S_{1,i} ) ∧ a^T_{1,i} ] (25)

There is no particular obstacle to assuming that the linguistic truth value τ^S may be represented by an increasing monotone function, and consequently to obtaining:

g(p^T_1) = τ^S( g(p^S_{M_S+1}) ) ∧ ⋁_{i=1,…,I} [ τ^S(b^S_{1,i}) ∧ a^T_{1,i} ] (26)

g(p^T_1) = τ^S( g(p^S_{M_S+1}) ) ∧ μ_{p^S_{M_S+1}, p^T_1} (27)

In these equations, the value of the real number

μ_{p^S_{M_S+1}, p^T_1} = ⋁_{i=1,…,I} [ τ^S(b^S_{1,i}) ∧ a^T_{1,i} ] ∈ [0,1] (28)

represents the DOF existing between the possibility distributions τ^S(B^S_1) and A^T_1, and it is this number which summarises the chaining relationship existing between propositions α(p^S_{M_S+1}) and α(p^T_1). The calculation of this number is normally performed at the stage of the definition of the FKB. For the sake of completeness, in the case of a decreasing monotone function τ^S one may similarly obtain:

g(p^T_1) = ⋁_{i=1,…,I} [ ( τ^S( g(p^S_{M_S+1}) ) ∨ τ^S(b^S_{1,i}) ) ∧ a^T_{1,i} ] (29)

g(p^T_1) = τ^S( g(p^S_{M_S+1}) ) ∨ μ_{p^S_{M_S+1}, p^T_1} (30)
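As a small numeric illustration of (27)-(28), with invented values, the identity taken as an increasing monotone τ^S, min for ∧ and max for ∨:

```python
# Numeric illustration of eqs. (27)-(28); all values are invented.
tau_S = lambda v: v                      # increasing monotone truth function
b_S1  = [0.2, 0.7, 1.0, 0.4]             # possibility distribution of B^S_1
a_T1  = [0.5, 0.9, 0.6, 0.3]             # possibility distribution of A^T_1

# Eq. (28): chaining degree mu, computable once when the FKB is defined
mu = max(min(tau_S(b), a) for b, a in zip(b_S1, a_T1))   # -> 0.7

# Eq. (27): DOF of alpha(p^T_1) from the DOF of alpha(p^S_{M_S+1})
g_chained = 0.8                          # known DOF of the chained proposition
g_p1T = min(tau_S(g_chained), mu)        # -> 0.7
```

As the example shows, μ can indeed be pre-computed at FKB-definition time, so that the run-time chaining step reduces to a single ∧ (or ∨) operation.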

Now we can give an outline of the case when there exist several fuzzy rules R^1, …, R^S in the FKB performing inference over one variable, and the same variable is present in the antecedent part of at least one subsequent rule R^T. As an example case, let us consider the following set of rules:

R^1: IF X^1_1 IS A^1_1 AND … THEN X^1_{M_1+1} IS B^1_1 AND … (τ^1)

R^2: IF X^2_1 IS A^2_1 AND … THEN X^2_{M_2+1} IS B^2_1 AND … (τ^2)

…

R^S: IF X^S_1 IS A^S_1 AND … THEN X^S_{M_S+1} IS B^S_1 AND … (τ^S)


which are linked by means of

X^T_1 = X^1_{M_1+1} = … = X^S_{M_S+1} (31)

Following the procedure explained above, one may obtain:

g(p^T_1) = ⋁_{i=1,…,I} [ ⋁_{s=1,…,S} [ τ^s( g(p^s_{M_s+1}) ∧ b^s_{1,i} ) ] ∧ a^T_{1,i} ] (32)

If one assumes the case of monotone functions τ^s, then it holds true that:

g(p^T_1) = ⋁_{s=1,…,S} [ τ^s( g(p^s_{M_s+1}) ) ⊙^s μ_{p^s_{M_s+1}, p^T_1} ] (33)

In this expression, the operator ⊙^s stands for:

⊙^s = ∧, if τ^s is an increasing monotone function, (34)

⊙^s = ∨, if τ^s is a decreasing monotone function. (35)

We can now readily present the part of this contribution devoted to the data-driven execution, when the fuzzy-Petri-net supervisory controller performs its function of generating reference command tasks for the lower-level, local controllers in the hierarchical architecture of our proposed intelligent control system.

4. Data-Driven Execution in Fuzzy-Petri-Net Organising Controller

In the sequel, a shortened version of the main algorithm is presented. This algorithm, as in the case of the other fuzzy-Petri-net reasoning algorithms mentioned in the introduction, consists of two parts: one part defining the marking functions, and the other producing the DOFs and firing the transitions. These steps are repeated as long as there are active transitions present in the FPN model of the FKB; then the phase of inference is completed and the phase of aggregation is executed. Let IP and OP denote the sets of the respective input and output places in the PN model [23, 28, 29].


Step 1: It is assumed for the moment that the DOFs of the propositions which correspond to input variables (places in the FPN) are known. Thus, the initial marking function is given as:

M(p_i) = 1, if p_i ∈ IP;  M(p_i) = 0, otherwise (36)

Step 2: The enabled transitions, i.e. the transitions for which

M(p_i) = 1,  ∀ p_i ∈ I(t_j) (37)

is valid, are fired; the transition function trf is defined by relationships (15)-(16) above. The respective DOFs are computed according to:

if t_j ∈ T^R:  g(p_i) = ⋀_{p_k ∈ I(t_j)} g(p_k),  ∀ p_i ∈ O(t_j) (38)

and

if t_j ∈ T^C:  g(p_i) = ⋁_{p_k ∈ I(t_j)} [ τ^{r_k}( g(p_k) ) ⊙^{r_k} μ_{p_k, p_i} ],  ∀ p_i ∈ O(t_j) (39)

while the operator ⊙^{r_k} is defined according to (34)-(35).

Step 3: Return to Step 2 as long as there exist further active transitions, that is, as long as

∃ t_j ∈ T :  M(p_i) = 1, ∀ p_i ∈ I(t_j) (41)

Step 4: Compute for each output variable X its associated possibility distribution H = { h_i }, i = 1, …, I, by means of the relationship

h_i = ⋁_{p^r_n ∈ P_X} [ τ^r( g(p^r_n) ) ⊙^r τ^r( b^r_{n,i} ) ] (42)

where P_X, which is defined as

P_X = { p^r_n ∈ P | the proposition α(p^r_n) contains X } (43)


represents the set of places which are associated with the propositions that contain X over which inferences are carried out.
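Pulling Steps 1-4 together, the data-driven execution loop might be sketched as follows; it reuses the marking helpers sketched in Section 3, and every name (execute, op, mu, tau) is an assumption rather than the authors' package:

```python
# Sketch of the data-driven execution of the FPN model (Steps 1-3).
def execute(net, g, mu, tau, op):
    """g: known DOFs of input places; mu[(pk, pi)]: chaining degrees (28);
    tau[t]: truth function of the rule fired by t; op[t]: min or max,
    according to (34)-(35)."""
    M = initial_marking(net, g)                   # Step 1, eq. (36)
    fired = set()                                 # fire each transition once
    while True:
        T = (net.t_rules | net.t_chains) - fired
        enabled = [t for t in T                   # Step 2, eq. (37)
                   if net.inputs.get(t) and all(M[p] for p in net.inputs[t])]
        if not enabled:                           # Step 3: nothing active left
            break
        for t in enabled:
            for p in net.outputs.get(t, ()):
                if t in net.t_rules:              # eq. (38): AND over antecedents
                    g[p] = min(g[pk] for pk in net.inputs[t])
                else:                             # eq. (39): chained propagation
                    g[p] = max(op[t](tau[t](g[pk]), mu[(pk, p)])
                               for pk in net.inputs[t])
            M = fire(net, M, t)                   # eqs. (15)-(16)
            fired.add(t)
    return g   # the DOFs then feed the aggregation of eq. (42), Step 4
```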

4.1 A Comparison Example

Stellakis and Valavanis [69] have developed a fuzzy-logic alternative to Saridis' organising controller for robotic manipulators, which enables their operation in an imprecise environment and under minimum, occasional linguistic interaction with a human supervisor. This fuzzy-logic based system has been studied extensively in Gacovski [29], including the development of experimental simulation software, for the purpose of comparison and validation of our fuzzy-Petri-net supervisory organising controller in anthropomorphic robotic tasks.

As outlined before, our supervisory controller is based on the possibility for a set of definitions and operational procedures, similar to those in the case of teleoperators, to be derived in terms of linguistic variables, fuzzy sets and fuzzy-rule production mechanisms, which define the basics of the logic and discrete-event fuzzy environment in the operational domain of the system. Therefore the initial possibility distributions of the input variables of the FKB are known via preliminary observations and the task problem formulation (this is the most essential stage, and usually is based on a priori knowledge and/or experience), and these distributions enable the determination of the subsequent possibility distributions of the other intermediate and/or output variables (according to the operational domain).

The process of decision and command generation is performed by repeated evaluation within the flexible framework provided by the FKB and its onto-mapped FPN, whereby in turn the possibility distributions of intermediate and output variables become known, and therefore a reduced number of rules is needed. Then the aggregation of the possibility distributions of the output variables, yielding the final, aggregate possibility distribution, and its defuzzification give a particular control command. Following the linguistic input instruction to the system by the user, for each action a 'fuzzified' Petri net is generated, capturing the particular discrete-event dimension, and then executed as a programmed command.

Figure 1 depicts the graph of the generated FPN structure for the comparison example [69] of a manipulator taking a glass, filling it with water, carrying it to a place, observing and emptying it at the place co-ordinates; p_1 is the fuzzified input instruction; t_11 and t_12 represent transitions corresponding to the possibility distributions of the linkage fuzzy relations R_e1(e_1) and R_e1(e_2); p_21 and p_22 represent the possibility distributions of events e_1 ('fetch') and e_2 ('turn on water pipe'); t_3 is the transition corresponding to the intersection of the possibility distributions of events, generating the respective command in terms of a crisp plan of activities, e.g. represented by p_31. Figure 2 depicts graphical presentations of the possibility distributions for the input instruction (a, 'take a lot') and for the generated fuzzy relations of linkages (b) and the activities 'take' (c) and 'carry' (d); sometimes two alternative command programmes may appear equivalent in terms of the intersection of possibility distributions, and this may be taken care of by embedding any simple decision rule.

Fig. 1. Graph of the generated FPN structure for the comparison example of the Stellakis-Valavanis robotic system with fuzzy organising control

[Figure 2 panels: input instruction 'bring a lot'; linkages; activity 'carry'; axes scaled 0 to 10.]

Fig. 2. Graphical display of computed possibility distribution for input instruction variable and for generated linkages and activities


5. Conclusions

It has been shown that, on the grounds of an appropriate fuzzy-knowledge base (FKB) with technical operational specifications, and using the theory available, a model of a decision mechanism for command control purposes in terms of a fuzzy-rule production system (FPS) and inference algorithms for approximate reasoning, along with a compatible projection of the FKB onto its equivalent fuzzy Petri net (FPN), can be developed. This development makes combined, compatible use of fuzzy logic and Petri nets, gives a refined structure, and enables the implementation of algorithmic procedures performing the data-driven execution process of the FPS. The model constructed this way implements the supervisory organising controller level of a two-level intelligent controller architecture, which is capable of generating a class of commands via inference in approximate reasoning and control decisions over a refined base of event primitives pertinent to different situations in the processes-to-be-controlled. Thus a new generic property is implanted in the upper level of the intelligent controller, and a two-layer, single-level supervisory organising-co-ordinating controller is implemented. Simulation results for the Stellakis-Valavanis example have demonstrated that the same performance can be obtained with four times fewer rules executed.

A qualified evaluation of the rules by means of a linguistic truth-value for the truth variable permits measuring and validating the importance and impact of some of the rules with respect to the other rules in the FKB, as well as the resulting decision conclusions from the application of these rules. On the other hand, the implemented decision and control mechanism is a kind of a priori chained form of FPS which, in turn, puts a great deal of the computing burden into the design phase. In this way the complexity of the execution algorithms becomes independent of the discretisation of the universes of discourse over which the linguistic variables of the FPS are defined, and the speed of information processing of the organising controller may be brought close to real-time for typical industrial applications.

We believe it has become rather apparent that fuzzy-Petri-net supervisory controllers have a significant role in applications to intelligent control of hybrid processes and complex systems. In such applications, they are here to stay for good.

Acknowledgements: I would like to express my gratitude to Professor L.A. Zadeh as well as to Professors K.J. Astrom, A.T. Dinibutun, P.M. Frank, C.J. Harris, R. Hanus, O. Kaynak, M. Mansour, R.J. Patton, and M. Thoma for the most useful exchanges of ideas and stimulating talks I have had with them on various occasions. I would also like to acknowledge the contributions of my graduate students Z.M. Gacovski, I.I. Ivanoska, O.L. Iliev, T.D. Kolemisevska, M.I. Stankovski and J.D. Stefanovski in simulation modelling and software development as well as in experimentation for this exciting research work in intelligent decision and control.


References

1. Ashby, W.R.: Design for a Brain. New York: J. Wiley 1952

2. Ashby, W.R.: Information flow within co-ordinated systems. In: J. Rose (ed.) Progress in Cybernetics. London: Gordon & Breach 1970, pp. 57-64

3. Astrom, K.J., J.J. Anton & K.E. Arzen: Expert control. Automatica 22, 227-236 (1986)

4. Astrom, K.J.: Towards intelligent control. IEEE Contr. Syst. Mag. 9, 60-64 (1989)

5. Astrom, K.J.: Intelligent control. In: I.D. Landau et al. (eds.) 1st European Control Conference. Proceedings, Grenoble 1991, 3, 2328-2339. Paris: Hermes 1991

6. Baldwin, J.F.: A new approach to approximate reasoning using a fuzzy logic. Fuzzy Sets & Syst. 2, 309-328 (1979)

7. Bugarin, A.J. & S. Barro: Fuzzy reasoning supported by Petri nets. IEEE Trans. Fuzzy Systems 2, 135-150 (1994)

8. Cao, T. & A.C. Anderson: A fuzzy Petri net approach to reasoning about uncertainty in robotic systems. In: IEEE Int. Conf. Robotics and Automation. Proceedings, Atlanta GA 1993, 317-322. New York: IEEE 1993

9. Cao, T. & A.C. Anderson: Task sequence planning using fuzzy Petri nets. IEEE Trans. Syst. Man Cybern. 25, 755-768 (1995)

10. Chen, S.M., J.S. Ke & J.F. Chang: Knowledge representation using fuzzy Petri nets. IEEE Trans. Know. Data Eng. 2, 311-319 (1990)

11. Conant, R.C.: Laws of information that govern systems. IEEE Trans. Syst. Man Cybern. 6, 240-255 (1976)

12. Dimirovski, G.M., N.E. Gough & S. Barnett: Categories in systems and control theory. Int. J. Syst. Sci. 8, 1081-1090 (1977)

13. Dimirovski, G.M., B.L. Crvenkovski & D.M. Joskovski: Expert system for recognition and typical identification of dynamic process models. In: R. Husson (ed.) Advanced Information Processing in Automatic Control. Proceedings of Selected Papers, Nancy 1989, 257-262. Oxford: The IFAC & Pergamon Press 1990

14. Dimirovski, G.M.: Towards intelligent control and expert computer-aided systems engineering (Invited Plenary Paper). In: Proc. 36th Yugoslav Conf. ETAN. Proceedings of Selected Papers, Ohrid 1991, I, 27-42. Belgrade (Sr-SFRY): The ETAN Association 1991

15. Dimirovski, G.M. et al.: Knowledge-based closed-loop process identifier via pattern recognition. In: P. Albertos & P. Kopacek (eds.) Low Cost Automation Techniques and Applications. Proceedings of Selected Papers, Vienna 1992, 125-128. Oxford: The IFAC & Pergamon Press 1992

16. Dimirovski, G.M. et al.: Modelling and scheduling of FMS based on stochastic Petri-nets. In: R. Evans (ed.) 12th IFAC World Congress. Proceedings, Sydney 1993, 11, 117-120. Barton Act (AUS): The IFAC & The Instn. Engrs. Australia 1993

17. Dimirovski, G.M. et al.: Fuzzy-logic control algorithms in navigation of a factory floor vehicle-robot. In: IEE Publication No. 389. Proceedings, Coventry 1994, 1, 282-287. London: The Instn. Electr. Engrs. 1994

18. Dimirovski, G.M. et al.: A generic fuzzy controller for multivariable processes. In: G. Buja (Gen. Chairman) IEEE Conference IECON 94. Proceedings, Bologna 1994, 2, 1359-1364. New York: IEEE 1994

19. Dimirovski, G.M. et al.: Contributions to two-level intelligent non-linear control systems. In: O. Kaynak et al. (eds.) Recent Advances in Mechatronics. Proceedings, Istanbul 1995, II, 874-881. Istanbul: UNESCO Chair Mechatronics Bogazici University 1995

20. Dimirovski, G.M. et al.: Contributions to intelligent supervision and control for saving energy. In: P.D. Roberts & J.E. Ellis (eds.) IFAC Int. Symp. on Large Scale Systems. Proceedings of Selected Papers, London 1995, 1, 59-60. Oxford: The IFAC & Elsevier Science 1995

21. Dimirovski, G.M. et al.: Optimum supervisory control of low-power multi-machine hydroelectric plants. In: P.D. Roberts & J.E. Ellis (eds.) IFAC Int. Symp. on Large Scale Systems. Proceedings of Selected Papers, London 1995, 2, 911-916. Oxford: The IFAC & Elsevier Science 1995

22. Dimirovski, G.M., R. Hanus & R.M. Henry: Complex systems control in energy, industry and transport technologies: Contributions to intelligent automation. Journal E.C.&T. Engineering 1, 1-22 (1996)

23. Dimirovski, G.M., R. Hanus & R.M. Henry: A two-level system for intelligent automation using fuzzified Petri-nets and non-linear systems. In: M. Jamshidi (Gen. Chairman & ed.) Proceedings WAC96, TSI Press Series Robotics and Manufacturing. Proceedings of Selected Papers, Montpellier 1996, Paper WdA13-5. Albuquerque NM: The TSI Press 1996

24. Dubois, D. & H. Prade: Possibility Theory. New York: Plenum Press 1988

25. Dubois, D. & H. Prade: Fuzzy sets in approximate reasoning: Inference with possibility distributions, Pt. 1. Fuzzy Sets & Syst. 40, 143-202 (1991)

26. Einstein, A.: Remarks on Bertrand Russell's theory of knowledge. In: P.A. Schlipp (ed.) The Philosophy of Bertrand Russell. Evanston IL: Northwestern University 1944, pp. 211-232

27. Fu, K.S.: Learning control systems - Review and outlook. IEEE Trans. Automat. Contr. 14, 210-220 (1970)

28. Gacovski, Z.M. et al.: A contribution to fuzzy-Petri-net of the intelligent organizing controller. In: M. Smiljanic et al. (eds.) XL Conference ETRAN. Proceedings of Selected Papers, Budva 1996, IV, 259-262. Belgrade (Yu): Society for Electron., Telecomm., Comput., Automat. and Nuclear Engng. 1996

29. Gacovski, Z.M.: Fuzzy-Petri-Net Based Organising Coordination Level in the Intelligent Control of Robotic Systems. St. Cyril & St. Methodius University, Faculty of EE, Skopje, ASE Techn. Rep. FPN-IC/96-3, December 1996

30. Gaines, B.R.: Foundations of fuzzy reasoning. Int. J. Man-Machine Stud. 8, 227-256 (1976)

31. Garg, M.L., S.I. Ashon & P.V. Gupta: A fuzzy Petri net for knowledge representation and reasoning. Inf. Process. Lett. 39, 165-171 (1991)

32. Gitt, W.: Information: the third fundamental quantity. Siemens Review 56, 36-41 (1987)

33. Godel, K.: Russell's mathematical logic. In: P. Benacerraf & H. Putnam (eds.) Philosophy of Mathematics. Englewood Cliffs NJ: Prentice-Hall 1964, pp. 211-232

34. Harris, C.J., C.G. Moore & M. Brown: Intelligent Control: Aspects of Fuzzy Logic and Neural Nets. Singapore: World Scientific Press 1993

35. Harris, C.J.: Advances in Intelligent Control. London: Taylor & Francis 1994

36. Harris, C.J. & T.E. Schilhabe: Advances and critical research issues in intelligent modelling, control and estimation. In: M. Thoma & R. Patton (eds.) First Joint Workshop on COSY Programme of the ESF. Proceedings, Roma 1995, 46-51. Rome: European Science Foundation and Universita La Sapienza 1995

37. Hirota, K. & W. Pedrycz: Referential modes of reasoning. In: Proc. 2nd IEEE Int. Conf. on Fuzzy Systems. Proceedings, San Francisco 1993, 558-563. New York: The IEEE 1993

38. Klawon, F.: Fuzzy sets and vague environments. Fuzzy Sets & Syst. 66, 207-221 (1994)

39. Klir, G.J. & B. Yuan: Fuzzy Sets and Fuzzy Logic: Theory and Applications. Upper Saddle River NJ: Prentice-Hall 1995

40. Kolmogorov, A.N.: Logical basis for information theory and probability theory. IEEE Trans. Inform. Theory 16, 662-664 (1968)

41. Krijgsman, A.J., R. Jager & H.B. Verbruggen: Real-time autonomous control. In: R. Evans (ed.) 12th IFAC World Congress. Proceedings, Sydney 1993, I, 119-122. Barton Act (AUS): The IFAC & The Instn. Engrs. Australia 1993

42. Kruse, R., J. Gebhardt & F. Klawon: Foundations of Fuzzy Systems. Chichester: J. Wiley 1994

43. Kuipers, B. & K.J. Astrom: The composition and validation of heterogeneous control laws. Automatica 30, 233-249 (1994)

44. Lee, C.C.: Fuzzy logic in control systems: Fuzzy logic controller, Pts. I & II. IEEE Trans. Syst. Man Cybern. 20, 404-418 and 419-432 (1990)

45. Lee, C.C.: A self-learning rule-based controller employing approximate reasoning and neural net concepts. Int. J. Intell. Syst. 6, 71-93 (1991)

46. Looney, C.G.: Fuzzy Petri-nets for rule based decision making. IEEE Trans. Syst. Man Cybern. 18, 178-183 (1988)

47. MacFarlane, A.G.J.: Information, knowledge and control. In: H.L. Trentelmann & J.C. Willems (eds.) Essays on Control Theory: Perspectives in the Theory and Its Applications. Boston: Birkhauser 1993, pp. 1-28

48. Mamdani, E.H. & S. Assilian: An experiment in linguistic synthesis with a fuzzy logic controller. Int. J. Man-Mach. Stud. 7, 1-13 (1974)

49. Mamdani, E.H.: Advances in the linguistic synthesis of fuzzy controllers. Int. J. Man-Mach. Stud. 8, 669-678 (1976)

50. Mamdani, E.H.: Application of fuzzy logic to approximate reasoning using linguistic synthesis. IEEE Trans. Computer 6, 1182-1191 (1977)

51. Mamdani, E.H.: Twenty years of fuzzy control: Experiences gained and lessons learnt (Invited Paper). In: R.J. Marks II (ed.) Fuzzy Logic Technology and Applications. New York: The IEEE 1994, pp. 19-24

52. Mesarovic, M.D., D. Macko & Y. Takahara: Theory of Multilevel Systems. New York: Academic 1970

53. Meystel, A.: Intelligent control in robotics. J. Robotic Syst. 5, 269-308 (1988)

54. Meystel, A.: Intelligent control: A sketch of the theory. J. Intell. Robot. Syst. Theory & Appl. 2, 97-107 (1989)

55. Murata, T.: Petri nets: Properties, analysis, and applications. IEEE Proceedings 77, 541-580 (1989)

56. Pascal, J.-C., R. Valette & D. Andreu: Fuzzy sequential control based on Petri nets. In: IEEE Conf. Emerging Technologies and Factory Automation. Proceedings, 140-145. New York: The IEEE 1992


57. Pedrycz, W. & F. Gomide: A generalized fuzzy Petri net model. IEEE Trans. Fuzzy Systems 2, 295-301 (1994)

58. Petri, C.A.: Kommunikation mit Automaten. Schriften des Rheinisch-Westfalischen Instituts fur Instrumentelle Mathematik, Heft 2. Bonn: Universitat Bonn 1962

59. Peterson, J.L.: Petri Net Theory and the Modeling of Systems. Englewood Cliffs NJ: Prentice-Hall 1981

60. Russell, B.: Vagueness. Australas. J. Psychol. Philosophy 1, 84-92 (1923)

61. Saridis, G.N.: Towards realisation of intelligent controls. Proceedings IEEE 67, 1115-1133 (1979)

62. Saridis, G.N.: Entropy formulation of optimal and adaptive control. IEEE Trans. Autom. Contr. 33, 713-721 (1988)

63. Saridis, G.N.: Analytical formulation of the principle of increasing precision with decreasing intelligence. Automatica 25, 461-467 (1989)

64. Saridis, G.N.: Theory of intelligent machines (Special Lecture). In: O. Kaynak (ed.) IEEE Workshop Intelligent Motion Control. Proceedings, Istanbul 1990, I, SL. (19-30). New York: The IEEE 1990

65. Saridis, G.N. & K.P. Valavanis: Analytical design of intelligent machines. Automatica 24, 123-133 (1988)

66. Scarpelli, H., F. Gomide & R.R. Yager: A reasoning algorithm for high-level fuzzy Petri nets. IEEE Trans. Fuzzy Systems 4, 282-294 (1996)

67. Stankovski, M.I. et al.: Modelling, simulation and control system design for a pipe reheating furnace. In: M. Hadjiiski (ed.) Automatics and Informatics 95. Proceedings of Scientific Papers, Sofia 1995, 76-83. Sofia: National Union for Automatics & Informatics of Bulgaria and Technical University of Sofia 1995

68. Stankovski, M.I. et al.: Simulation modelling of complex systems for exploring intelligent controls. In: M.H. Hamza (ed.) Modeling, Identification and Control. Proceedings of Selected Papers, Innsbruck 1996, 212-216. Calgary - Zurich: The IASTED & Acta Press 1996

69. Stellakis, H.M. & K.P. Valavanis: Fuzzy-logic based formulation of the organizer for intelligent robotic systems. J. Intell. Robotic Syst. Theor. & Appl. 4, 1-24 (1991)

70. Sugeno, M. & M. Nishida: Fuzzy control of a model car. Fuzzy Sets & Syst. 16, 103-113 (1985)

71. Sugeno, M. (ed.): Industrial Applications of Fuzzy Control. Amsterdam: Elsevier Science BV 1985

72. Tabak, D.: Petri-net representation of decision models. IEEE Trans. Syst. Man Cybern. 15, 812-818 (1985)

73. Tsien, H.-S.: Engineering Cybernetics. New York: McGraw-Hill 1950

74. Turksen, B.I.: Approximate reasoning for production planning. Fuzzy Sets & Syst. 26, 23-37 (1988)

75. Turksen, B.I.: Measurement of membership functions and their acquisition. Fuzzy Sets & Syst. 40, 5-38 (1991)

76. Tzypkin, Y.Z.: Foundation of Self-Learning Systems (in Russian). Moskva: Nauka 1970

77. Valavanis, K.P. & G.N. Saridis: Information-theoretic modeling of intelligent robotic systems. IEEE Trans. Syst. Man Cybern. 18, 852-872 (1988)


78. Valette, R., J. Cardoso & D. Dubois: Monitoring manufacturing systems by means of Petri nets with imprecise markings. In: IEEE Int. Symp. Intelligent Control. Proceedings, Albany NY 1989, 233-237. New York: The IEEE 1989

79. Vukobratovic, M.K. & G.M. Dimirovski: Modelling, simulation and control of robots and robotized FMS (Invited Plenary Lecture IPL-3). In: A. Kuzucu, I. Eksin & A.T. Dinibutun (eds.) IFAC Int. Workshop ACQP'92. Proceedings, Istanbul 1992, Late Paper IPL-3. (1-33). Istanbul: The IFAC & Istanbul Technical University 1992

80. Yager, R.R.: On a general class of fuzzy connectives. Fuzzy Sets & Syst. 4, 235-242 (1980)

81. Yager, R.R.: Using approximate reasoning to represent default knowledge. Artificial Intell. 31, 99-112 (1987)

82. Yager, R.R., D.P. Filev & T. Saghedi: Analysis of flexible structured fuzzy logic controllers. IEEE Trans. Syst. Man Cybern. 24, 1035-1043 (1994)

83. Yu, S.-K.: Knowledge representation and reasoning using fuzzy Petri net-systems. Fuzzy Sets & Syst. 75, 33-45 (1995)

84. Zadeh, L.A.: Fuzzy sets. Informat. Contr. 8, 338-353 (1965)

85. Zadeh, L.A.: Quantitative fuzzy semantics. Informat. Sci. 3, 159-176 (1971)

86. Zadeh, L.A.: Outline of a new approach to the analysis of complex systems and decision processes. IEEE Trans. Syst. Man Cybern. 3, 28-44 (1973)

87. Zadeh, L.A.: A rationale for fuzzy control. Trans. ASME J. Dyn. Syst. Measur. Contr. 94 G, 3-4 (1974)

88. Zadeh, L.A.: The concept of linguistic variable and its application to approximate reasoning, Pts. I, II, & III. Inform. Sci. 8, 199-249; 8, 301-375; and 9, 47-80 (1975)

89. Zadeh, L.A.: Fuzzy sets as a basis for a theory of possibilities. Fuzzy Sets & Syst. 1, 3-28 (1978)

90. Zadeh, L.A.: A theory of approximate reasoning. In: J.E. Hayes, D. Michie & L.I. Mikulich (eds.) Machine Intelligence 9. New York: J. Wiley 1979, pp. 149-194

91. Zadeh, L.A.: Inference in fuzzy logic. IEEE Proceedings 68, 124-131 (1980)

92. Zadeh, L.A.: Commonsense knowledge representation. IEEE Computer 16, 61-65 (1983)

93. Zadeh, L.A.: Fuzzy logic. IEEE Computer 21, 83-93 (1988)

94. Zadeh, L.A.: The calculus of if-then rules. AI Expert 7, 22-27 (1992)

95. Zadeh, L.A.: Fuzzy logic, neural networks and soft computing. Comm. ACM 37, 77-84 (1994)

96. Zadeh, L.A.: Fuzzy logic = Computing with words. IEEE Trans. Fuzzy Syst. 4, 103-111 (1996)

97. Zadeh, L.A.: The role of fuzzy logic and soft computing in intelligent control and systems analysis (Invited Plenary Lecture WAC96-1). In: M. Jamshidi (Gen. Chairman & ed.) 2nd World Automation Congress. Proceedings, Montpellier 1996

98. Zimmermann, H.-J.: Fuzzy Sets, Decision Making and Expert Systems. Boston: Kluwer 1987

99. Zurawski, R. & M.C. Zhou: Petri nets and industrial applications: A tutorial. IEEE Trans. Industr. Electronics 41, 567-583 (1994)


A Review of Neural Networks with Direct Learning Based on Linear or Non-linear Threshold Logics

Daniel M. Dubois

Universite de Liege, Institut de Mathematique, Grande Traverse 12, B-4000 Liege 1, Belgium

[email protected]

Abstract. This paper deals with a review of the non-linear threshold logic developed in collaboration by D. Dubois, G. Resconi and A. Raymondi. This is a significant extension of the neural threshold logic pioneered by McCulloch and Pitts. The output of their formal neuron is given by the Heaviside function with an argument depending on a linear weighted sum of the inputs and a threshold parameter. Not all Boolean tables can be represented by such a formal neuron: for example, the exclusive OR and the parity problem need hidden neurons to be resolved. A few years ago, Dubois proposed a non-linear fractal neuron to resolve the exclusive OR problem with only one single neuron. Dubois and Resconi then introduced the non-linear threshold logic, that is to say a Heaviside function with a non-linear sum of the inputs, which can represent any Boolean table with only one neuron, where Dubois' non-linear neuron model is a Heaviside fixed function. In this framework the supervised learning is direct, that is to say without recursive algorithms for computing the weights and threshold, related to the new foundation of the threshold logic by Resconi and Raymondi. This paper reviews the main aspects of the linear and non-linear threshold logic with direct learning and applications in pattern recognition with the software TurboBrain. This constitutes a new tool in the framework of Soft Computing.

Keywords. Neural networks, threshold logic, non-linear threshold logic, non-linear neuron, Heaviside fixed function, parity problem, direct learning, TurboBrain software.

1. Introduction

This paper deals with a review of the non-linear threshold logic developed by D. Dubois, G. Resconi and A. Raymondi [3-8]. This is a significant extension of the McCulloch and Pitts formal neuron [10, 12, 14] in the framework of the threshold logic [1, 2]. The models of Dubois' fractal neural network [3-5] and of the neocognitron of Fukushima et al. [9] provide the original framework of the new non-linear threshold logic.

This non-linear threshold logic avoids some important criticisms of threshold logic collected in the book by Minsky and Papert [11]. A difficult problem is described in Rumelhart et al. [15]: "One of the problems given a good deal of discussion by Minsky and Papert is the parity problem, in which the output required is 1 if the input pattern contains an odd number of 1s and 0 otherwise. This is a very difficult problem because the most similar patterns (those which differ by a single bit) require different answers. The XOR problem is a parity problem with input patterns of size two."

A few years ago, Dubois [3-5] proposed the following quadratic activation function to resolve the XOR problem with only one single neuron, without hidden neurons:

(1)

where x_1 ∈ {0,1}, x_2 ∈ {0,1}, s_1 ∈ [-1,+1], s_2 ∈ [-1,+1], μ ∈ [0,1], α ∈ [0,1], β ∈ [-1,+1].

The 16 Boolean rules with two inputs and the 4 Boolean rules with one input are obtained from this equation (1) with the values of the parameters given in Dubois [5]. Dubois and Resconi [6] completely validated this proposition by demonstrating that Dubois' quadratic activation function is a Heaviside fixed function solution. For example, a single non-linear neuron with two inputs resolves the Boolean table for the exclusive OR, XOR, through the following XOR Heaviside fixed activation function:

y = h(x_1, x_2) = x_1 + x_2 - 2 x_1 x_2 (2)

Thus the classical XOR problem, considered until now as impossible to realise with only one single neuron, is completely resolved with a non-linear threshold logic.

Moreover, the non-linear threshold logic resolves all the Boolean functions with one single neuron for any number of inputs. It is shown that the very difficult parity problem also has a very elegant solution with one single non-linear neuron. Non-linear threshold logic thus permits a great reduction of the number of hidden neurons in neural networks.

2. Non-linear Threshold Logic

Let us start with the following non-linear threshold function for one single neuron with n inputs, introduced by Dubois and Resconi [6]:

y = Γ( a_00 + Σ_{i=1,…,n} a_i0 x_i + Σ_i Σ_j a_ij x_i x_j + f(x_1, x_2, …, x_n) ) (3)

where a_00, a_i0, a_ij are real numbers and f is a non-linear function, beyond degree two, of the variables x_1, x_2, …, x_n, with x_i ∈ {0,1}, i = 1, 2, …, n. The parameter a_00 is similar to the threshold in the McCulloch and Pitts model, the linear term Σ_i a_i0 x_i is similar to the weighted inputs in the McCulloch and Pitts model, and Γ(g(x)) is the Heaviside function. Recall that Γ(g(x)) = 1 if and only if g(x) > 0, otherwise Γ(g(x)) = 0. In the case of a single neuron with two inputs (without loss of generality for the method), equation (3) is written as

y = Γ( g(x_1, x_2) ) (4)

where x_i are the 0 or 1 inputs of the single neuron, w_i the weights on the corresponding inputs, a_ij are the new coefficients of the quadratic extension of the linear argument function of the McCulloch-Pitts model, θ is a threshold parameter, and

g(x_1, x_2) = w_1 x_1 + w_2 x_2 + a_11 x_1^2 + a_22 x_2^2 + 2 a_12 x_1 x_2 - θ (5)

is the argument of the Heaviside function Γ(g(x)). The general mathematical formalism of this new non-linear threshold logic is described in Dubois and Resconi [6]: it is demonstrated that all the Boolean functions can be realised with one single neuron.
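A minimal sketch of such a single neuron, assuming the quadratic argument reconstructed in (5) above (all names are illustrative):

```python
# Single non-linear neuron of eqs. (4)-(5); a sketch, not a library API.
def heaviside(v):
    return 1 if v > 0 else 0                     # Gamma(g) = 1 iff g > 0

def g_quadratic(x1, x2, w1, w2, a11, a22, a12, theta):
    # Eq. (5): quadratic extension of the McCulloch-Pitts linear argument
    return w1*x1 + w2*x2 + a11*x1**2 + a22*x2**2 + 2*a12*x1*x2 - theta

def neuron(x1, x2, params):
    # Eq. (4): y = Gamma(g(x1, x2))
    return heaviside(g_quadratic(x1, x2, *params))
```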

The argument function g(x), given by equation (5), can be a "Heaviside fixed function", denoted h(x) by Dubois and Resconi [6], when g(x) satisfies the following identity:

Γ(h(x)) = h(x) = y (6)

Let us now explicitly demonstrate that such a general "Heaviside fixed function" h(x) exists for the Boolean logic of one neuron with two inputs. The function h(x) will be shown to be a non-linear continuous function, and it is then possible to replace the non-derivable Heaviside function by this non-linear Heaviside fixed function, whose derivative exists. So, in that case, an output y is equal to the argument h(x) of the Heaviside function, y = h(x), where h(x) is a derivable continuous function for which, from the output value y, a finite number of values x can be obtained.


The 16 Boolean tables, given in Table 1, with 2 inputs x_1 and x_2, can be defined by

x_1  x_2  y
 0    0   y_1
 1    0   y_2
 0    1   y_3
 1    1   y_4

where the output y can take the values y_i, which are 0 or 1. From relation (6) and the equations (4) and (5), we deduce the general conditions (see Dubois and Resconi [6])

θ = -y_1
w_1 = y_2 - a_11 + θ
w_2 = y_3 - a_22 + θ
a_12 = (-y_2 - y_3 + y_4 - θ)/2 (7)

for the parameters of the following quadratic activation function (equation (5) where g is now a Heaviside fixed function h):

h(x_1, x_2) = w_1 x_1 + w_2 x_2 + a_11 x_1^2 + a_22 x_2^2 + 2 a_12 x_1 x_2 - θ (8)

The 4 eqs. (7) define 4 parameters; thus the values of two parameters remain to be chosen, a_11 and a_22 for example. These parameters are similar to the linear weights w because in Boolean logic x^2 = x. Tables 2 and 3 give the parameters for eqs. (1) and (8).

The proposition to consider a non-linear threshold function which is a Heaviside fixed function has some similarities with the model of the following activation function of Fukushima et al. [9],

φ(g(x)) = max( g(x), 0 ) (9)

where φ is equal to the greatest value between g(x) and 0, so that the function φ is equal to its positive argument g, which is a non-linear function.

Page 296: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications

Table 1: Classification of the 16 Boolean rules for 2 inputs and 4 rules for 1 input

Two-input rules (outputs listed for (x1, x2) = (0,0), (0,1), (1,0), (1,1)):

Rule  y(0,0)  y(0,1)  y(1,0)  y(1,1)
R1    0       1       1       1
R2    0       0       0       1
R3    0       1       1       0
R4    1       0       0       1
R5    1       1       1       0
R6    1       0       0       0
R7    1       1       1       1
R8    0       0       0       0
R9    0       0       1       0
R10   1       1       0       1
R11   1       0       1       1
R12   0       1       0       0
R13   0       0       1       1
R14   1       1       0       0
R15   1       0       1       0
R16   0       1       0       1

One-input rules (outputs listed for x = 0, 1):

Rule  y(0)  y(1)
R17   1     0
R18   0     1
R19   1     1
R20   0     0

Page 297: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications

Table 2: Parameters of eq. (1) for the Boolean rules given in Table 1 [5]

Rule  s1  s2  μ    α   β
R1    +1  +1  3/8  0   +1/3
R3    +1  +1  1/2  0   +1/2
R5    +1  +1  1/8  +1  +1
R7    +1  0   1/4  +1  +1
R2    −1  −1  1/8  0   −1
R4    −1  −1  1/2  +1  −1/2
R6    −1  −1  3/8  +1  −1/3
R8    −1  0   1/4  0   −1
R9    +1  −1  1/8  0   −1
R11   +1  −1  1/8  +1  +1
R13   +1  0   1/4  0   0
R15   0   −1  1/4  +1  0
R10   −1  +1  1/8  +1  +1
R12   −1  +1  1/8  0   −1
R14   −1  0   1/4  +1  0
R16   0   +1  1/4  0   0
R17   −1  0   1/2  +1  −1/2
R18   +1  0   1/2  0   +1/2
R19   +1  0   1/2  +1  +1
R20   −1  0   1/2  0   −1

Page 298: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications

Table 3: Parameters of eq. (8) for the Boolean rules in Table 1 [6]

Rule  θ    w1    w2    a11   a22   a12
R1    0    +3/2  +3/2  −1/2  −1/2  −1/2
R3    0    +2    +2    −1    −1    −1
R5    −1   +1/2  +1/2  −1/2  −1/2  −1/2
R7    −1   +1    0     −1    0     0
R2    0    −1/2  −1/2  +1/2  +1/2  +1/2
R4    −1   −2    −2    +1    +1    +1
R6    −1   −3/2  −3/2  +1/2  +1/2  +1/2
R8    0    −1    0     +1    0     0
R9    0    +1/2  −1/2  +1/2  +1/2  −1/2
R11   −1   −1/2  −3/2  +1/2  +1/2  +1/2
R13   0    +1    0     0     0     0
R15   −1   0     −1    0     0     0
R10   −1   −3/2  −1/2  +1/2  +1/2  +1/2
R12   0    −1/2  +1/2  +1/2  +1/2  −1/2
R14   −1   −1    0     0     0     0
R16   0    0     +1    0     0     0
R17   −1   −2    0     +1    0     0
R18   0    +2    0     −1    0     0
R19   −1   +2    0     −2    0     0
R20   0    −2    0     +2    0     0

Page 299: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


2.1 Exclusive OR and Parity Problems

Let us demonstrate that our non-linear threshold logic provides a general Heaviside fixed function which resolves the parity problem for any number of inputs. Let us start with the parity problem with two inputs, i.e. the exclusive OR (XOR), with only one single neuron. For XOR, the outputs are given by y = (0, 1, 1, 0), so from the first equation of (7) we obtain the numerical value of the threshold θ = 0, and, choosing the numerical values of the free parameters as a11 = 0 and a22 = 0, the three other parameters are then given by w1 = 1, w2 = 1, a12 = −1.

Putting these values in the general Heaviside fixed function (8), we obtain a Heaviside fixed activation function for the XOR, which is the parity problem for two inputs,

y = x1 + x2 − 2 x1 x2    (10)

which can be written as

y = (1 − (2x1 − 1)(2x2 − 1)) / 2    (11)

and we can verify that it gives the correct numerical values of the output y as a function of the different inputs x1 and x2 of the XOR table.

Equations (10) and (11) can be generalised to any number of inputs n for resolving the parity problem by the following equation (12)

y = Σi xi − 2 Σ_{i1<i2} xi1 xi2 + (−2)² Σ_{i1<i2<i3} xi1 xi2 xi3 + ... + (−2)^(n−1) x1 x2 ... xn    (12)

which can be written in the following compact form

y = (1 − Π_{i=1}^{n} (1 − 2 xi)) / 2    (13)

Indeed, the general Heaviside fixed function in our non-linear threshold logic, for any number n of input variables, is capable of resolving the parity problem.
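The following minimal sketch (ours, not from the paper) checks the compact product form (13) against the parity of every 0/1 input pattern, for several input sizes n, using a single polynomial neuron and no hidden layer.

    # Sketch verifying the single-neuron parity solution of eqs. (12)-(13).
    from itertools import product

    def parity_neuron(xs):
        """Heaviside fixed function of eq. (13): y = (1 - prod(1 - 2x_i)) / 2."""
        p = 1
        for x in xs:
            p *= 1 - 2 * x
        return (1 - p) // 2

    for n in range(1, 8):
        for xs in product((0, 1), repeat=n):
            assert parity_neuron(xs) == sum(xs) % 2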

The threshold logic using a linear argument of the Heaviside function is not capable of realising all Boolean functions with only one neuron. As pointed out by Rumelhart et al. [16], "it requires at least N hidden units to solve parity with patterns of length N" ... "there is always a recoding (i.e. an internal representation) of the input patterns in the hidden units in which the similarity of the patterns among the hidden units can support every required mapping from the input to the output units".

Page 300: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


Our non-linear threshold logic needs only one single non-linear neuron with N inputs with the equations (12-13). The non-linear part gives an internal representation inside the single neuron similar to the internal representation by the hidden neurons.

2.2 Unifying Linear and Non-linear Logics [7, 8]

In this section we want to realise Boolean functions by the non-linear threshold logic and to obtain formal solutions for both the linear and non-linear weights [7, 8]. In fact, by a particular operation, the input function that we want to realise with a non-linear neuron is expanded so that it also holds the non-linear terms: realising the expanded function with a linear neuron, we find both the linear and non-linear coefficients of the input function. The main result of this section is that a non-linear realisation is equivalent to a linear realisation.

General Theorem [7, 8]: Any Boolean function F of dimension n can be realised by a single non-linear neuron with integer parameters w1, ..., wn, a11, ..., ann, ..., a12...n, θ such that the following equation (14)

g(x1, ..., xn) = w1 x1 + w2 x2 + ... + wn xn + a11 x1² + a12 x1 x2 + ... + a1n x1 xn + ... + a12...n x1 x2 ... xn − θ    (14)

gives g(x1, ..., xn) = 1 if F(x1, ..., xn) = 1, and g(x1, ..., xn) = −1 if F(x1, ..., xn) = 0.

The proof is given in Dubois and Resconi ([7], pp. 53-54).

Remark 1 [7, 8]: The parameters wi, aij...n, θ are related to the parameters w'i, a'ij...n, θ' of the Heaviside fixed function by: wi = 2w'i, aij...n = 2a'ij...n, θ = 2θ' + 1. Remark 2 [7, 8]: Since the weights realising a Boolean function F by the Heaviside fixed function are unique, the preceding remark shows that the weights obtained by the theorem are also unique.

A simple algorithm for direct learning, computing the linear and non-linear weights and the threshold of any Boolean function, can be obtained from this theorem. It is only necessary to test each bit of the function once and to add, at each step, the necessary linear or non-linear term. Indeed, from a Boolean function F(x1, ..., xn), ordered as follows, F(0,0,...,0) = y0, F(1,0,...,0) = y1, ..., F(1,1,...,1) = y_{2^n−1}, a function g(x1, ..., xn), initially equal to zero, is iteratively built in the following order:

1. If y0 = 0 then g(x1, ..., xn) = −1, so that θ = 1.
   If y0 = 1 then g(x1, ..., xn) = +1, so that θ = −1.

2. If y1 = 0 and g(x1, ..., xn) = 1 then g(x1, ..., xn) = 1 − 2x1, so that w1 = −2.
   If y1 = 1 and g(x1, ..., xn) = −1 then g(x1, ..., xn) = −1 + 2x1, so that w1 = +2.
   Otherwise g(x1, ..., xn) remains unchanged.

Page 301: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications

In practice, the general rule consists in checking each output bit yi = F(x1, ..., xn), with 0 ≤ i ≤ 2^n − 1, computing the following error function at the i-th input pattern,

ei = (2yi − 1) − g(x1, ..., xn)    (15)

and then correcting this error by adding to g(x1, ..., xn) a term given by

ei Π_{j: xj=1} xj    (16)

that is to say, adding a term which is the product of the error by all the input variables equal to 1. This procedure builds a function which obeys the preceding theorem.
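A compact sketch of this direct-learning rule follows (ours, under the reconstruction of eqs. (15)-(16) above); g is represented as a dictionary mapping each product term (a set of active input indices) to its integer coefficient, the empty set playing the role of −θ.

    # Sketch of the direct-learning rule of eqs. (15)-(16).
    from itertools import product

    def direct_learning(F, n):
        g = {}  # frozenset of active indices -> integer coefficient
        for xs in product((0, 1), repeat=n):          # subsets before supersets
            ones = frozenset(j for j, x in enumerate(xs) if x == 1)
            gx = sum(c for s, c in g.items() if s <= ones)   # evaluate g at xs
            e = (2 * F(xs) - 1) - gx                  # error of eq. (15)
            if e:
                g[ones] = g.get(ones, 0) + e          # correction of eq. (16)
        return g

    parity = lambda xs: sum(xs) % 2
    g = direct_learning(parity, 3)
    # Check the theorem: g is +1 on odd patterns and -1 on even patterns.
    for xs in product((0, 1), repeat=3):
        ones = frozenset(j for j, x in enumerate(xs) if x == 1)
        assert sum(c for s, c in g.items() if s <= ones) == 2 * parity(xs) - 1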

For supervised learning and pattern recognition, only Partial Boolean Tables dealing with samples to be learnt are to be considered.

Definition of partial functions [7, 8]: F is a partial Boolean function of dimension n if

F(xi1, xi2, ..., xin) = yi,  i ∈ S

where the xij are some of all the possible values 1, 0 of the independent variables x1, ..., xn, and the yi are the values 0, 1 of the Boolean function F associated to the inputs xij. This definition means that the Boolean function F isn't defined on all the input space but only on a subset of it (indicated with S).

Now, when we realise a partial Boolean function F in the form (17), we have

Γ(g(xi1, xi2, ..., xin)) = yi,  i ∈ S    (18)

where the xij are the input values and the yi are the output values of the neuron, only if i ∈ S.

Definition of new partial functions [7, 8]: Given a Boolean function F of dimension n, partial or not, a new function F* of dimension n + 1 is defined as

F*(x1, ..., xn, xn+1) = F(x1, ..., xn) if xn+1 = xi xj, and nil otherwise,

with i, j ≤ n, where nil means that the function is undefined for this input.

Page 302: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications

It is easy to see that this is always a partial function that gives the same weights w1, ..., wn as F. The weight wn+1 is equivalent by definition to the non-linear term ai,j. Indeed, the output of the neuron is

y = Γ[w1 x1 + ... + wn xn + wn+1 xn+1 − θ] = Γ[w1 x1 + ... + wn xn + wn+1 xi xj − θ]    (21)

The nil entries correspond to cases which are not considered for learning, but are used for recognition after the learning procedure is performed.

Algorithms for direct learning and pattern recognition were developed from partial Boolean Tables and implemented in the software TurboBrain, see [7, 8] for details.

2.3 Polynomial Extension of the Non-linear Threshold Logic [7, 8]

In the original Dubois' fractal activation function [5]

f(x1, x2, ..., xn) = g(Σ_{i=1}^{N} si xi)    (22)

g(y) is a polynomial form. When si = 2^{i−1} and xi ∈ {0,1}, the function Σ_{i=1}^{N} si xi = M is the natural number associated to the vector of bits xi. With these conditions, by the interpolation theorem, for any set of discrete values of f there exists one and only one set of coefficients ai that gives the polynomial model of the assigned function.

For the XOR Boolean function, f(0,0) = −1, f(1,0) = 1, f(0,1) = 1, f(1,1) = −1, we obtain

g(0) = −1,  g(1) = 1,  g(2) = 1,  g(3) = −1    (23)

and M(0,0) = 0, M(1,0) = 1, M(0,1) = 2, M(1,1) = 3. With the interpolation method, we can write the polynomial form for XOR

f = −1 + 3M − M²    (24)

Page 303: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications

For the function f(x1, x2, ..., xn) = g(M) = a0 + a1 M + a2 M² + ... + an M^n, the Newton representation of the interpolation polynomial with the data (Mk, fk), k = 1, 2, ..., n, is

f(x1, x2, ..., xn) = b0 + b1 (M − M1) + b2 (M − M1)(M − M2) + ... + b_{n−1} (M − M1)(M − M2)...(M − M_{n−1})    (26)

where

b0 = f1
b1 = (f2 − f1) / (M2 − M1)
b2 = (f3 − 2f2 + f1) / ((M3 − M1)(M2 − M1))
...
b_{n−1} = Δ^{n−1} f / Π_{k=1}^{n−1} (M_{k+1} − M1)    (27)

With the Newton representation, the coefficients of the polynomial form are given directly by the outputs of the function f(x1, x2, ..., xn).

For the XOR, the data are

x1  x2  Fk  Mk
0   0   0   0
1   0   1   1
0   1   1   2
1   1   0   3

and we can compute

b0 = 0,  b1 = 1,  b2 = −1/2,  b3 = 0    (28)

Page 304: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications

The Newton representation of XOR is

f = M − M(M − 1)/2    (29)

and with M = 2^0 x1 + 2^1 x2 we obtain the polynomial form

y = x1 + x2 − 2 x1 x2    (30)

which is the Heaviside fixed function for XOR.
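A short sketch (ours) of this interpolation route follows: standard divided differences over the integer codes M = x1 + 2·x2 reproduce the Newton coefficients of eq. (28) and the XOR values of eq. (30).

    # Sketch of the Newton-interpolation route of eqs. (26)-(30).

    def divided_differences(M, f):
        """Newton coefficients b0..b_{n-1} for nodes M and values f."""
        b = list(f)
        for j in range(1, len(M)):
            for k in range(len(M) - 1, j - 1, -1):
                b[k] = (b[k] - b[k - 1]) / (M[k] - M[k - j])
        return b

    def newton_eval(b, M, m):
        """Evaluate b0 + b1(m-M0) + b2(m-M0)(m-M1) + ..."""
        y, prod = 0.0, 1.0
        for k, bk in enumerate(b):
            y += bk * prod
            prod *= m - M[k]
        return y

    M, f = [0, 1, 2, 3], [0, 1, 1, 0]      # XOR data of the table above
    b = divided_differences(M, f)           # -> [0, 1, -0.5, 0], eq. (28)
    for x1, x2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
        assert newton_eval(b, M, x1 + 2 * x2) == (x1 ^ x2)   # eq. (30)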

A minimal polynomial form can be computed with the Newton representation. In fact, a polynomial form Pmin(x1, x2, ..., xn) exists with a minimum degree for which

Γ(f(x1, x2, ..., xn)) = Γ(Pmin(x1, x2, ..., xn))    (31)

This theory of the minimal polynomial form is given in Dubois and Resconi [7].

Now, let us show some applications in soft computing with the software TurboBrain.

3. Soft Computing with the Software TurboBrain

Let us consider a neural network with n inputs x1, x2, x3, ..., xn, where xi ∈ {0,1}, i = 1, 2, ..., n, are Boolean variables, and m outputs y1, y2, y3, ..., ym, where yj ∈ {0,1}, j = 1, 2, ..., m, are Boolean variables. The supervised learning considers samples of inputs and the associated outputs presented to the network. So the problem to be resolved is presented as the following partial truth table with multiple outputs:

              x1   x2  ...  xn   *   y1   y2  ...  ym
1st sample:    0    0        0   *    0    1        0
2nd sample:    1    0        0   *    1    0        1
3rd sample:    1    0        1   *    1    0        0
other cases:  nil  nil      nil  *   nil  nil      nil

The direct learning of the input patterns in different positions or scales in the vision field is supervised, i.e. the desired outputs are coded by the end-user according to his criteria for the future recognition of similar patterns. In non-linear direct learning, the number of neurons is exactly equal to the number of outputs and does not depend on the number of inputs. All the non-linear neurons learn in parallel. TurboBrain is able to recognise both the learnt patterns and similar patterns; it is capable of identification and classification of patterns. TurboBrain can finally memorise and rebuild any presented pattern after a direct learning.

Page 305: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications

With TurboBrain, the end-user can choose between 3 types of logic: the linear threshold logic (see Resconi and Raymondi [13] for details), the non-linear threshold logic, and the fixed non-linear threshold logic; this third logic doesn't use the Heaviside function. All the parameters, that is to say the thresholds and the weights, of both the linear and non-linear neurons are given with integer values.

Let us give a few examples of the power of TurboBrain in solving problems.

3.1 Solution of the Parity Problem

In the parity problem, the output neuron y is 1 when the number of input neurons equal to 1 is odd, and 0 otherwise. Consider a direct supervised learning with 3 inputs x1, x2, x3 and 1 output y. TurboBrain gives the 3 following solutions.

For the linear threshold logic, TurboBrain computes 4 hidden neurons y1, y2, y3, y4 and 1 output neuron y, which realises an AND operation:

y1 = Γ(2x1 − 2x2 − 2x3 + 3)
y2 = Γ(2x1 + 2x2 + 2x3 − 1)
y3 = Γ(−2x1 − 2x2 + 2x3 + 3)
y4 = Γ(−2x1 + 2x2 − 2x3 + 3)
y = Γ(2y1 + 2y2 + 2y3 + 2y4 − 7)

(Each hidden neuron excludes exactly one even-parity input pattern, so their AND is the parity function.)

For the non-linear threshold logic, TurboBrain computes one single neuron

y = Γ(2x1 + 2x2 + 2x3 − 4x1x2 − 4x1x3 − 4x2x3 + 8x1x2x3 − 1)    (32)

For the Heaviside fixed function, TurboBrain computes one non-linear neuron

y = x1 + x2 + x3 − 2x1x2 − 2x1x3 − 2x2x3 + 4x1x2x3    (33)
  = Γ(x1 + x2 + x3 − 2x1x2 − 2x1x3 − 2x2x3 + 4x1x2x3)    (34)

3.2 Learning and Recognition of Characters


In the learning phase, 4 patterns of the digits 0, 1, 2 and 3 are presented as input to TurboBrain by 4 matrices of 3 by 5 pixels, and their corresponding outputs are given by the binary representation of the digits, i.e. 00, 10, 01 and 11. Figure 1 shows the 4 learnt patterns and the correct recognition of corrupted patterns.

Page 306: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications

Fig. 1. Learning of 4 input patterns and recognition of 4 corrupted patterns by TurboBrain

The direct supervised learning of TurboBrain computes the two following output neurons

y1 = Γ(−2x10 + 1)    (35)
y2 = Γ(−2x10 − 2x12 + 2x14 + 1)

where xi represents the value (1 = black and 0 = white) of the pixel at position i (from 1 to 15, beginning at the upper left). The pixel x10 is 1 for the digits 0 and 2, so y1 = 0. Three pixels are necessary to compute the second output y2: for example, for the digit 0, x10 = 1, x12 = 1 and x14 = 1, so y2 = 0. Evidently, a pattern given by only the pixel x10 = 1 will be recognised as the digit 0, because y1 = 0, y2 = 0. This is due to the limited number of learnt patterns. To avoid this case, it is evidently necessary to increase the number of samples or the number of patterns, as shown in Fig. 2.

Fig. 2. The patterns of digits 4 to 9 have been added to the patterns of digits 0 to 3 in Fig. 1. The corresponding corrupted patterns have been correctly recognised by TurboBrain

Page 307: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications

Indeed, adding the learning of the digits 4 to 9, as shown in Fig. 2, TurboBrain computes 4 neurons corresponding to the coding of the outputs 0 to 9, i.e. 0000 to 1001:

y1 = Γ[−2x5 − 4x10 − 2x12 + 2x14 + 3]    (36)
y2 = Γ[x15 + 2x2x15 + (3/2)x14x15 − 2x2x14x15 − 2x4x14x15 − x6x14x15 + x5x14x15 + x10x14x15 − 2]
y3 = Γ[2x2 − 4x6 − 2x14 + 3]
y4 = Γ[2x4 + 2x6 + 2x5 − 5]

This is a very elementary example, given to explain the direct learning process; it permits the meaning of the neural polynomial equations to be understood. Evidently, more sophisticated patterns are to be considered in practice, and many examples of different types of digits would enhance the recognition power of TurboBrain, as classically done in neural network software. But with TurboBrain, even from only one sample of each pattern, it is possible to classify corrupted patterns. Let us notice that if the end-user does not agree with the recognition of patterns, he has the possibility to update the learning process by adding these cases with the wanted outputs. Several input pattern samples can be learnt with the same output: for example, the same pattern in different positions, orientations or scales. TurboBrain is also able to memorise and rebuild patterns, as shown in the next section.

3.3 Memorisation and Reconstruction of Patterns by Direct Learning

Let us show explicitly the reconstruction of a pattern learnt by TurboBrain. The direct learning corresponds in this case to a direct memory, i.e. only one step of computation is needed to obtain all the weights and thresholds of the neurons. These examples will show the power of the non-linear threshold logic.

Fig. 3. These three patterns "line", "square" and "fractal" are learnt, memorised and rebuilt with TurboBrain

Page 308: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications

In this first case, the first pattern, "line", in Fig. 3 is given as an 8 x 8 output matrix (white is 0 and black is 1), while the input matrix is given by the binary coding of the numbers, from 0 to 63, of the pixels of the output matrix, line by line: the successive pixels are numbered as x6 x5 x4 x3 x2 x1; the first pixel number, 0 in decimal, is then 000000 in binary, and the last pixel number, 63, is 111111 in binary. Let us notice that this pattern corresponds to the parity problem: each black (white) pixel 1 (0) corresponds to an odd (even) decimal numbering, i.e. x1 = 1 (0) in the binary coding. TurboBrain correctly computes the output neuron with only one binary digit x1:

y = x1    (37)

The second pattern, "square", given in Fig. 3, is an 8 x 8 output matrix which looks as simple as the preceding one. The same technique is used for the coding of the inputs.

With the linear threshold logic, TurboBrain finds 8 hidden neurons, depending on 4 binary digits x1, x4, x5 and x6, and an output neuron y:

y1 = Γ(2x1 − 2x4 − 2x5 − 2x6 + 5)
y2 = Γ(2x1 − 2x4 + 2x5 − 2x6 + 3)
y3 = Γ(2x1 − 2x4 − 2x5 + 2x6 + 3)
y4 = Γ(2x1 − 2x4 + 2x5 + 2x6 + 1)
y5 = Γ(−2x1 + 2x4 − 2x5 − 2x6 + 5)
y6 = Γ(−2x1 + 2x4 + 2x5 − 2x6 + 3)
y7 = Γ(−2x1 + 2x4 − 2x5 + 2x6 + 3)
y8 = Γ(−2x1 + 2x4 + 2x5 + 2x6 + 1)
y = Γ(2y1 + 2y2 + 2y3 + 2y4 + 2y5 + 2y6 + 2y7 + 2y8 − 15)    (38)

The number of neurons and parameters is great in comparison with the preceding case.

With the non-linear threshold logic, TurboBrain finds only one single neuron, depending only on 2 binary digits, x1 and x4:

y = Γ(−2x1 − 2x4 + 4x1x4 + 1)    (39)

Only 4 parameters are necessary in the activation function. The non-linear threshold logic realises a compression of data in comparison with the linear one. Moreover, it gives some information about the symmetry of the pattern: in this case there are 2 periods in the pattern. For each line we have a succession of black and white pixels, characterised by the binary digit x1, and for the 8 lines of 8 pixels we also have an alternation of 2 complementary patterns, characterised by the binary digit x4.

Page 309: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications

Remark: The Heaviside fixed function is given by the algebraic representation of a Not Exclusive OR of x1 and x4,

y = 1 − x1 − x4 + 2x1x4    (39a)

In the non-linear threshold logic and in the Heaviside fixed function, the non-linear terms are always given by products of inputs, which can be represented by AND neurons. So, the smallest neural network for this pattern can be built with only two formal linear neurons:

y1 = Γ(x1 + x4 − 1)    (39b)
y = Γ(−2x1 − 2x4 + 4y1 + 1)    (39c)

Why does TurboBrain give so many hidden neurons (see network (38))? Because it is designed to compute an AND output neuron on the hidden neurons.

This third and last case considers the memorisation of the Sierpinski fractal pattern given as output in Fig. 3, with the same binary coding of the inputs as in the two preceding cases. It is well known that many natural images have a fractal structure.

With the linear threshold logic, TurboBrain finds the following 4 hidden neurons, depending on all the binary digits, and 1 output neuron y:

y1 = Γ(−2x1 − 4x2 − 2x4 − 6x5 + 6x6 + 9)
y2 = Γ(−2x1 − 2x4 + 2x5 + 2x6 + 3)
y3 = Γ(−2x1 − 4x2 − 4x3 − 2x4 − 10x5 − 10x6 + 23)
y4 = Γ(−2x1 − 4x3 − 2x4 + 6x5 − 6x6 + 9)
y = Γ(2y1 + 2y2 + 2y3 + 2y4 − 7)    (40)

With the non-linear threshold logic, TurboBrain finds only one single non-linear neuron, depending also on all the binary digits, but the number of parameters is dramatically lower than in the linear case:

y = Γ(−2x1 − 2x4 − 4x2x5 + 4x3x6 + 3)    (41)

The fractal structure of the Sierpinski map is well reflected in the non-linear argument of the Heaviside function.

Remark: Similarly to the preceding pattern, it is possible to design the smallest neural network from the Heaviside fixed function (41a), which gives an output neuron y as a function of two AND hidden neurons y1, y2:

Page 310: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications

y1 = Γ(x2 + x5 − 1)    (41b)
y2 = Γ(x3 + x6 − 1)    (41c)
y = Γ(−2x1 − 2x4 − 4y1 + 4y2 + 3)    (41d)

instead of an AND output neuron as in network (40).

Remark: Another way to code a pattern is to divide it into a set of sub-patterns. For example, the 8 by 8 pixel fractal pattern given in Fig. 3 can be divided into 16 sub-patterns of 2 by 2 pixels.

TurboBrain computes the 4 following neurons for the fractal pattern

y1 = Γ[−2x1 − 2x3 − 4x2x4 + 3]
y2 = Γ[−2x1 − 2x3 − 4x2x4 + 3]
y3 = Γ[−2x1 − 2x3 − 4x2x4 + 3]
y4 = Γ[−2]    (42)

where the sub-patterns are coded from 0 to 15 with the binary digits x4x3x2x1, from 0000 to 1111. The first 3 neurons are identical, and the last one is always 0, corresponding well to the iterative construction of the fractal pattern.

This procedure permits the memorisation of big patterns with a limited number of inputs, but the number of neurons is greater: here there are 4 neurons with 4 inputs instead of 1 neuron with 6 inputs, as in the preceding case.

3.4 Parallel Computation

One of the most important operations in computation is the arithmetic sum of two bit strings.

As Konrad Zuse ([17], p. 193) writes in his book: "Anyone who constructs a calculating machine starts in general with the adder unit. The difficulty is carrying digits."

Classically, a certain number of steps are necessary before outputting the correct answer: for example, for a 2-bit sum S between X and Y, we start adding the digits of X and Y from right to left, using the carry in the next step to obtain the digits of S.

With the direct algebraic learning in TurboBrain, it is possible to design neural mathematical operations without carry. As an example [7, 8], an adder for numbers

Page 311: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications

given by two binary digits is given by the three non-linear Heaviside fixed functions:

y1 = Γ(x1 + x2 − 2x1x2)
y2 = Γ(x3 + x4 + x1x2 − 2x3x4 − 2x1x2x3 − 2x1x2x4 + 4x1x2x3x4)
yc = Γ(x3x4 + x1x2x3 + x1x2x4 − 2x1x2x3x4)    (43)

Let us consider the addition of 3+3, i.e. 11 and 11 in binary digits: x1 = x2 = x3 = x4 = 1. From the above activation functions we obtain y1 = 0, y2 = 1, yc = 1, i.e. the binary number 110, which is 6 in decimal notation.
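A minimal sketch (ours) checks the carry-free adder (43) over all two-bit operand pairs, assuming x1, x2 are the low-order bits of the two operands and x3, x4 their high-order bits (the pairing consistent with the 3+3 example above).

    # Sketch checking the carry-free two-bit adder of eq. (43).
    from itertools import product

    step = lambda g: 1 if g > 0 else 0   # Heaviside function

    for x1, x3, x2, x4 in product((0, 1), repeat=4):
        y1 = step(x1 + x2 - 2*x1*x2)
        y2 = step(x3 + x4 + x1*x2 - 2*x3*x4 - 2*x1*x2*x3 - 2*x1*x2*x4
                  + 4*x1*x2*x3*x4)
        yc = step(x3*x4 + x1*x2*x3 + x1*x2*x4 - 2*x1*x2*x3*x4)
        a, b = x1 + 2*x3, x2 + 2*x4      # the two operands
        assert y1 + 2*y2 + 4*yc == a + b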

4. Conclusion

The parallelism of the neural brain is considered an important property to explain the high speed of response of the brain, for example in pattern recognition, although the neural dynamics is far slower than electronic processors. The McCulloch and Pitts model [10], with a Heaviside function with a linear argument related to the weights and thresholds of the neurons, has no solution for such parallel processing. This threshold logic, using a linear argument of the Heaviside function, is not capable of realising all Boolean functions with only one neuron. In the technology of neural networks, most methods are based on hidden neurons to realise the Boolean functions, and a large number of iterations is necessary for learning. There are difficulties in obtaining the global minimum as the number of neurons increases, and the combinatorial explosion causes many problems for practical applications.

The non-linear threshold logic presented in this paper shows the possibility of designing a totally parallel neural architecture. Each neuron is connected to all the inputs, and the number of neurons is equal to the number of outputs. The weights and the thresholds are computed from a direct supervised learning. The example of the parity problem with the exclusive OR shows the power of the direct learning with only one single non-linear neuron. The direct supervised learning of patterns generates neural polynomial arguments of the Heaviside function, which give information about these patterns, as in the example of the fractal map.

The theory and the applications presented in this paper have the purpose of showing that the non-linear threshold logic with direct supervised learning is a new tool in the framework of soft computing.

Page 312: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


References

1. J.A. Anderson, E. Rosenfeld (eds.): Neurocomputing: Foundations of Research, The MIT Press Cambridge, Massachusetts, London, 1988

2. M.L. Dertouzos: Threshold Logic: A Synthesis Approach. Res. Monogr. no. 32, The MIT Press, Massachusetts 1965

3. D.M. Dubois: Self-organisation of Fractal Objects in XOR Rule-based Multilayer Networks. In: EC2 (ed.): Neural Networks & their Applications, Neuro-Nîmes, Proceedings of the 3rd International Workshop, 1990, pp. 555-557

4. D.M. Dubois: Le Labyrinthe de l'Intelligence. InterEditions, Paris / Academia, Louvain-la-Neuve, 2nd edition, 1990

5. D.M. Dubois: Mathematical Fundamentals of the Fractal Theory of Artificial Intelligence. Communication & Cognition - Artificial Intelligence, 8, 1, 5-48 (1991)

6. D.M. Dubois, G. Resconi: Mathematical Foundation of a Non-linear Threshold Logic: a New Paradigm for the Technology of Neural Machines. Académie Royale de Belgique, Bulletin de la Classe des Sciences, 6ème série, Tome IV, 1-6, 91-122 (1993)

7. D.M. Dubois, G. Resconi: Advanced Research in Non-linear Threshold Logic Applied to Pattern Recognition. COMETT European Lecture Notes in Threshold Logic. Edited by AILg, Association des Ingénieurs sortis de l'Université de Liège, D/1995/3603/02, 1995, 182 p.

8. D.M. Dubois, G. Resconi, A. Raymondi: TurboBrain: A Neural Network with Direct Learning Based on Linear or Non-linear Threshold Logics. In: T. Ören, G.J. Klir (eds.): Computer Aided Systems Theory - CAST'94, Lecture Notes in Computer Science, vol. 1105, Springer, 1996, pp. 278-294

9. K. Fukushima, S. Miyake, T. Ito: IEEE Transactions on Systems, Man and Cybernetics SMC-13:826-834 (1983)

10. W.S. McCulloch, W. Pitts: Bulletin of Mathematical Biophysics 5:115-133 (1943)

11. M.L. Minsky, S. Papert: Perceptrons. MIT Press, Cambridge, MA, 1969

12. W. Pitts, W.S. McCulloch: Bulletin of Mathematical Biophysics 9:127-147 (1947)

13. G. Resconi, A. Raymondi: A New Foundation for the Threshold Logic. Quaderno no. 3/93 del Seminario Matematico di Brescia, 1993

14. F. Rosenblatt: Principles of Neurodynamics. Spartan, Washington, DC, 1961

15. D.E. Rumelhart, G.E. Hinton, R.J. Williams: Nature 323:533-536 (1986)

16. D.E. Rumelhart, G.E. Hinton, R.J. Williams: Learning internal representations by error propagation. In: D.E. Rumelhart, J.L. McClelland (eds.): Parallel Distributed Processing: Explorations in the Microstructures of Cognition, Vol. 1, MIT Press, Cambridge, MA, 1986, pp. 318-362

17. K. Zuse: The Computer - My Life. Springer-Verlag, Berlin, 1993

Page 313: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications

The Morphogenetic Neuron

Germano Resconi

Catholic University, via Trieste 17, Brescia; Italy

[email protected]

Abstract: A simple conventional neural model has difficulty in solving a lot of problems that are easy for neurobiology. In fact, for example, in the conventional neural model we cannot generate a model by the superposition of its sub-models. Neurobiology suggests a new type of computation where analogue variables are encoded in time or in frequency (rate of spikes). This neurobiological computation, recently described by Hopfield [3], is a particular case of a new computing principle [5]. The first step of this principle was given by Gabor (1954) with his hologram physical process [7] and also with his original intelligent machine. Hoffman [11, 12] gives a Lie transformation group as a model of the neuropsychology of visual perception. We can consider this model as a particular case of the new computing. In the new computing principle, a computation in an intelligent machine is capable of perceiving order in a situation previously considered disorder [5], and can learn how to choose, among functions, the function that best approximates the supervisor's response [4]. The instrument to perceive order is the morphogenetic neuron, whose elements are the morphogenetic field (MF), the morphogenetic reference space (MRS), the morphogenetic sources (MS) and the morphogenetic elementary field (MEF). To learn and perceive order, the new principle creates sources or MS to generate elementary fields MEF. The linear superposition of the MEF gives us the desired field or MF. The ordinary neuron [3] is a special case of MF for crisp weights. For fuzzy weights we can improve the ordinary neuron: the morphogenetic neuron becomes a complex membership function for a fuzzy set. Experiments on the primary cortical representation of sound suggest that coordinated tuning forms an organised topographic map across the cortical surface (stimulus specificity) or MF. Other experiments suggest that primates, birds and insects use local detectors to correlate signals sampled at one location with those sampled after a delay at other locations. The animals reshape the MF to obtain desired results.


Page 314: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


1. A New Computing Principle and Morphogenetic Neuron

1.1 Definition of the Morphogenetic Neuron and Computing Principle

New computing principle [5]:

The desired morphogenetic field MF is generated by morphogenetic sources MS in the morphogenetic reference space MRS.

We remark that, for the new principle, the computation is holistic, as in the process of writing and reading a hologram. The new principle extends the first idea of the hologram to a novel type of computation, of which the holographic memory or the logic optical device are only particular cases.

The new computational principle realised by the morphogenetic neuron (MN) is based on four main parts:

1. The morphogenetic reference space (MRS)

2. The morphogenetic sources (MS)

3. The morphogenetic elementary field (MEF)

4. The morphogenetic field (MF).

The name morphogenetic neuron (MN) comes from the idea that the novel neuron generates structure as the solution of problems.

Elements in the MN are related in this way: any source or MS generates one MEF in the MRS; the MF is the linear superposition of all the MEF generated by the sources.

Remark: The MS will be considered as the input and the MF as the output of the MN.

In this paper we show that the morphogenetic neuron, with its four parts, can implement a lot of functions and build the new computing principle defined in the Fatmi-Resconi papers [5].

1.2 Historical Background of the Morphogenetic Neuron

We argue that the first idea of a morphogenetic structure can be found originally in the work of the physicist Huygens (1629-1695). For Huygens, any wave in a medium is the superposition of elementary fields generated by virtual sources. A huge number of independent sources generate fields whose superposition gives us the field or wave that we observe. At that time, Huygens used his idea of sources to compute the field of waves without solving the differential equation.

Page 315: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications

In 1963, Maurice Jessel [10] enlarged on the idea of Huygens and computed sources (secondary sources) not only to calculate the propagation of the wave, but also to generate wanted fields. The wanted fields generated by secondary sources will be denoted morphogenetic fields or MF in this paper.

In 1954 and later, the late professor Dennis Gabor [9] outlined a mathematical principle on which the intelligent machine of the future could be designed. The Gabor model includes a novel class of computers based on the non-linear operator

OP[f(t)] = Σ_{i=1}^{n} wi fi(t) + Σ_{i=1}^{n} Σ_{j=1}^{n} wi,j fi(t) fj(t) + ... + Σ_{i=1}^{n} Σ_{j=1}^{n} ... Σ_{s=1}^{n} wi,j,...,s fi(t) fj(t) ... fs(t)    (1)

where (f1(t), f2(t), ..., fn(t)) are the signal inputs, and wi, wi,j, ..., wi,j,...,s are the response coefficients which minimise the square difference between OP[f] and the target function g(t).

Gabor used equation (1) to filter and to predict data.

For

Pk(t) = fi(t) fj(t) ... fs(t),  k = 1, 2, ..., N    (2)

where N is the number of the terms in the polynomial (1), we obtain

M(t) = OP[f(t)] = Σ_{k=1}^{N} wk Pk(t)    (3)

We denote M(t) as the MF, Pk(t) as the MEF and wk as the MS.

In different papers, Hoffman and others [11, 12] create the LTG/NP, or Lie transformation group model of neuropsychology, to obtain a model of visual perception. The infinitesimal (local) generator of a smooth transformation is a differential operator, or Lie operator, of the form

Df = Σ_{i=1}^{N} wi(z1, ..., zN) ∂f/∂zi    (4)

The Lie operator D is a local vector operator and also forms a vector field in the N-dimensional space. The partial differential operators ∂/∂zi take components of a vector. The expression Df measures the total rate of change of some function f, expressing this in terms of N components in orthogonal directions. The simple and complex cortical receptive field units discovered by Hubel and Wiesel [19] have vector-like properties, for each unit has associated with it a position, a direction, and a (probability) magnitude. All of this vast amount of new knowledge about cortex tells us little about how the local processes embodied in individual neuronal activity are integrated into coherent operations which generate the macroscopic properties of vision. The fundamental postulate in LTG/NP is that the local vector

Page 316: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications

field (4) is interpreted in terms of Hubel and Wiesel neural processing, and the integrative process is best understood through the global property of the vector field given in (4). The points for which D[f] = 0 are the points that give us the invariants in visual perception.

In this paper, the vector field Df is the MF, the partial differential operators ∂/∂zi (or components of Df) are the MEF, and the functions wi(z1, z2, ..., zN) are the MS.

Example:

Df = z2 ∂f/∂z1 − z1 ∂f/∂z2    (5)

Df is the morphogenetic field or MF, the MEF are the two derivatives (∂f/∂z1, ∂f/∂z2), and the MS are (z2, −z1).

In equation (5), when f = z1² + z2² (circle), Df = 0: the morphogenetic field Df is equal to zero. When

f = a z1² + b z2²  (ellipse)    (6)

the morphogenetic field is different from zero and its value is

Df = 2(a − b) z1 z2    (7)

Df gives the difference between the visual perception of the ellipse and the visual perception of the circle, taken as the reference.
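A small numerical sketch (ours) of the rotation operator (5) follows; it checks the invariance Df = 0 for the circle and the value 2(a − b)z1z2 of eq. (7) for the ellipse, using central finite differences.

    # Sketch of the Lie operator of eqs. (5)-(7): Df = z2*df/dz1 - z1*df/dz2.

    def D(f, z1, z2, eps=1e-6):
        dfdz1 = (f(z1 + eps, z2) - f(z1 - eps, z2)) / (2 * eps)
        dfdz2 = (f(z1, z2 + eps) - f(z1, z2 - eps)) / (2 * eps)
        return z2 * dfdz1 - z1 * dfdz2

    circle = lambda z1, z2: z1**2 + z2**2
    a, b = 3.0, 1.0
    ellipse = lambda z1, z2: a * z1**2 + b * z2**2

    z1, z2 = 0.7, -1.2
    assert abs(D(circle, z1, z2)) < 1e-6                     # invariant: Df = 0
    assert abs(D(ellipse, z1, z2) - 2*(a - b)*z1*z2) < 1e-5  # eq. (7)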

In the new computing principle, the parts (sources) contribute in a synergetic way to obtain the global MF which has the wanted property. For example, in the study of germ cell formation [16], the MF is the concentration of special proteins at any point of the cell. The sources MS of this protein, inside the cell, have positions and intensities calculated in such a way as to obtain the wanted MF, which carries the information to build the embryo by the diffusion process of the MEF. For example, when the MF consists of the protein NOS [16], high local concentrations of the protein in the posterior of the Drosophila embryo are necessary to inhibit translation of the transcription factor Hunchback in this region, and thus permit expression of the genes required for abdomen formation. The fundamental problem is to obtain the sources that can generate the wanted morphogenetic field.

1.3 Integral Form of the Morphogenetic Neuron, continuous case

For different applications it is useful to give an integral form of the morphogenetic field in this way:

Page 317: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications

MF(x1, ..., xn | p1, ..., ps) = ∫_{−∞}^{+∞} MS(y1, y2, ..., ym) K(x1, ..., xn, y1, ..., ym | p1, ..., ps) dy1 ... dym    (8)

Equation (8) can be written in this simpler way:

MF(x, p) = ∫_{−∞}^{+∞} MS(y) K(x, y | p) dy    (8')

where MS(y) are the morphogenetic sources located at the point y.

The kernel K(x, y | p) is the morphogenetic elementary field MEF generated from a source at the point y. The morphogenetic elementary field is controlled by the parameters p; x is a point in the morphogenetic reference space or MRS.

In (8), for any point y we have one MEF whose structure is similar at any other point where a source is located. With the parameters p we change the structure of the elementary field. With the linear superposition of all the MEF we obtain the wanted field or MF.

Examples

1) A first example of (8) is the wavelet transform of a time series s(t) [14]:

Tψ(t0, a) = (1/a) ∫_{−∞}^{+∞} s(t) ψ((t − t0)/a) dt    (9)

In (9), the elementary field K(x, y, p) is ψ((t − t0)/a), MS(y) is s(t)/a, and MF(x, p) is Tψ(t0, a).

2) A second example of (8) is the model of the effect of the auditory nerve and of the onset-C cells [13]. In [13], the rectified filter output is convolved with a difference-of-Gaussian-averages convolving function. The convolving function g(t, k, r) is

g(t, k, r) = f(t, k) − f(t, k/r)    (10)

where k > 1 and f(x, y) = √(y/π) exp(−y x²). The convolution is

Cj(t, k, r) = ∫_{−∞}^{+∞} g(τ, k, r) sj(t − τ) dτ    (11)

where sj(t) is the rectified output of the j-th filter. MS(y) is the function g(τ, k, r), K(x, y, p) is sj(t − τ), and MF(x) is Cj(t, k, r).

3) A third example of (8) is the formal solution of the sound wave equation (d'Alembert equation) with sources S(x, t), by the Green function G(x, t, y, τ) [20]:

F(x, t) = ∫_{R²} S(y, τ) G(x, t, y, τ) dτ dy    (12)

F(x, t) is the sound field at the point x and at the time t, and the function G(x, t, y, τ) is the Green function. In three dimensions we have

G(x, t, y, τ) = δ(t − τ − r/c) / 4πr    (13)

where r is the distance between the two points inside the three-dimensional space. The functions S(y, τ) are the sources of the sound field. Comparing (12) with (8): MF is F(x, t), MS(y) is S(y, τ), and the elementary field K(x, y | p) is G(x, t, y, τ).

4) The fourth example is the Laplace transform [18] of the function f(t):

F(α + jβ) = ∫_0^∞ f(t) e^{−(α+jβ)t} dt = L f    (14)

Remark: When, in formula (8),

K(x, y, p) = K((x − y), p)    (15)

the Laplace transform of the MF in (8') is the product of the Laplace transform of MS(x) and that of K(x, p):

L MF(x, p) = L MS(x) · L K(x, p)    (16)

Page 319: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications


1.4 Additive Form of the Morphogenetic Neuron

When the space is discrete, equation (8) can be written in this way:

MF(x, p) = Σ_j MS(yj) K(x, yj | p)    (17)

Equation (17) is similar to equation (8), where we substitute the integral operator with the sum operator. We now give some examples and applications of the morphogenetic neuron (17).

1) A first example of (17) can be found in tyre modelling for vehicle dynamics studies with a Radial Basis Function neural network [8], whose form is

f(x, y, z) = Σ_{j=1}^{N} pj e^{−[(x − cjx)²/σ²x + (y − cjy)²/σ²y + (z − cjz)²/σ²z]}    (18)

In (18), the elementary field K(x, y | p) is the Gaussian-like function e^{−[(x − cjx)²/σ²x + (y − cjy)²/σ²y + (z − cjz)²/σ²z]}, the MS(y) are the parameters pj, and the MF(x) is f(x, y, z).

We explain the meaning of equation (18) and its properties. Given samples j = 1, 2, ..., N of the function f(x, y, z), where f is the lateral force, the self-aligning moment or another variable of the tyre, we calculate the sources pj by (18) in such a way as to minimise the error. Afterwards, by (18) again, we can find the values of f(x, y, z) at all the other points (interpolation). With (18) we can approximate any non-linear multidimensional continuous function as accurately as is wished, simply by increasing the number of samples. Equation (18) is useful when we cannot find a model of the tyre, when the problem is multidimensional, and when the amount of samples is redundant compared to the information complexity (a large amount of experimental samples).
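The sketch below (ours; reduced to one dimension for brevity) illustrates this morphogenetic use of a Gaussian kernel: the sources are obtained by solving the linear system of eq. (20) in Section 1.5(a) from samples of the field, after which eq. (17) interpolates the field anywhere in the reference space.

    # Sketch of source computation and interpolation, eqs. (17)-(18), (20).
    import numpy as np

    centers = np.linspace(0.0, 1.0, 9)            # source locations y_j
    width = 0.15                                  # kernel parameter p

    def kernel(x, y):
        """Gaussian morphogenetic elementary field K(x, y | p)."""
        return np.exp(-((x - y) / width) ** 2)

    target = lambda x: np.sin(2 * np.pi * x)      # field sampled at the centers

    # Eq. (20): MF(x_i) = sum_j MS(y_j) K(x_i, y_j) -> solve for the MS.
    G = kernel(centers[:, None], centers[None, :])
    sources = np.linalg.solve(G, target(centers))

    # Eq. (17): superpose the elementary fields to interpolate the MF.
    x = np.linspace(0.0, 1.0, 101)
    mf = kernel(x[:, None], centers[None, :]) @ sources
    print(np.max(np.abs(mf - target(x))))         # small interpolation error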

2) A second example of (17) is the fuzzy control or B-spline interpolator [6, 7]. Such a fuzzy controller can learn to approximate any known data sequence and to minimise a certain cost function. The form of the B-spline is

y(x1, x2, ..., xq) = Σ_{i1=0}^{m1} Σ_{i2=0}^{m2} ... Σ_{iq=0}^{mq} y_{i1,i2,...,iq} Π_{j=1}^{q} N_{ij,nj}(xj)    (19)

where the N_{ij,nj}(xj) are different fuzzy set distributions with maximum value at the point xj, of type nj. If the input space is partitioned finely enough and at the correct positions, the interpolation with the B-spline hypersurface can reach a given precision. The output of the fuzzy controller can be flexibly adapted to anticipated values.

Formula (19) can be compared with formula (17) in this way: the elementary field K(x, y | p) is of the form Π_{j=1}^{q} N_{ij,nj}(xj), where the parameter p is the set of parameters (n1, n2, ..., nq); the variable x is the vector x = (x1, x2, ..., xq); the centre average parameters y_{i1,i2,...,iq} are the MS(y) functions; and the result of (19), y(x1, x2, ..., xq), is the MF(x) function.

Remark: In this example, the construction of the elementary field K(x, y | p) from the one-dimensional functions N_{ij,nj}(xj) is considered as a logic product of the fuzzy set elements. The superposition of the weighted elementary field functions y_{i1,i2,...,iq} Π_j N_{ij,nj}(xj) is the defuzzification process. Equation (19) can be written as an inferential rule in this way:

Rule (i1, i2, ..., iq): IF (x1 is N_{i1,n1}) and (x2 is N_{i2,n2}) and ... and (xq is N_{iq,nq}) THEN y is y_{i1,i2,...,iq}

under the following conditions:
a) "product" as fuzzy conjunction, and
b) "centre average" as defuzzification method.

We can give a logical form to the morphogenetic neuron.

1.5 Properties and Application of the Morphogenetic Neuron

a) Sampling and Interpolation
Given the elementary field K(x, y | p) and the function MF(x) in a finite number of points, with (17) it is possible to know the function MF(x) at an infinite number of points. With (17) we can build the linear system

MF(xj1, xj2, ..., xjq) = Σ_{i1=1}^{m1} Σ_{i2=1}^{m2} ... Σ_{iq=1}^{mq} MS(yi1, yi2, ..., yiq) K(xj1, xj2, ..., xjq, yi1, yi2, ..., yiq | p1, ..., ps)    (20)

where the unknown variables are the sources MS(y). With these sources, by (17), we can generate the continuous function MF(x), which is the interpolation function of the samples of the field MF(x). With MF(x) we can also predict new values when we enlarge the reference space or MRS.

Remark: Athanasios Papoulis [18] suggested a relation between the rate of change of an arbitrary function f(t) and the behaviour of the spectrum F(jω) in the Fourier transform. Under condition (15) and formula (16), when the Laplace transform of the MEF is a low-pass filter, the discontinuities or high rates of change in MS are eliminated and MF is a slowly varying function.

b) Derivation and Integration of MF when Samples of MF Are Known in Some Points Given samples of MF in some points of the reference space, we can find the sources MS by the system (20). With the formula

a MF(X1,x2' .... ,Xn )

a Xk

for continuous cases or the formula

a MF(xl'X2' ... 'Xn)

a Xk

ml m2 mq a K(xl ,x2, ... ,x ,Yo ,Yo , ..•..• y. IPI, .. ,p ) ~ '" '" n II 12 I S ~~ ......... ~MS(Yi 'Yi ,·····'yi ) a q (22) i =li =1 i =1 I 2 q Xk I 2 q

for discrete cases we can calculate the derivative of the MF.

Page 322: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications

b) Derivation and Integration of MF when Samples of MF Are Known in Some Points
Given samples of MF in some points of the reference space, we can find the sources MS by the system (20). With the formula

∂MF(x1, x2, ..., xn)/∂xk = ∫ MS(y) ∂K(x1, ..., xn, y | p)/∂xk dy    (21)

for the continuous case, or the formula

∂MF(x1, x2, ..., xn)/∂xk = Σ_{i1=1}^{m1} Σ_{i2=1}^{m2} ... Σ_{iq=1}^{mq} MS(yi1, yi2, ..., yiq) ∂K(x1, x2, ..., xn, yi1, yi2, ..., yiq | p1, ..., ps)/∂xk    (22)

for the discrete case, we can calculate the derivative of the MF.

By equations (21) and (22), the derivative of the MF is the linear superposition of the MEF derivatives

∂K(x1, x2, ..., xn, yi1, yi2, ..., yiq | p1, ..., ps)/∂xk    (23)

The derivation with the morphogenetic neuron is simpler than the classical derivative with analytical methods. In fact, we use only algebraic operations among the MEF in (23) and the calculated sources MS obtained by (20) to realise the derivative of the MF.

Conversely, when we know the derivative of the field MF, with (20) or (21) we can calculate the sources MS and, with equations (8) or (17), the integral of the derivative of MF. Also in this case, the integral is obtained by algebraic calculation with the morphogenetic neuron.

Remark: When

K(x, y | p) = K((x − y) | p)    (24)

we have that

∂K(xi, yi | p)/∂xi = −∂K(xi, yi | p)/∂yi    (25)

so, when the MF is equal to zero at the boundary of the reference space, we have

∂MF(x1, x2, ..., xn)/∂xi = ∫ (∂MS(y)/∂yi) K(x, y | p) dy    (26)

and in diagram form we obtain

MS      --MN-->   MF
 |                 |
∂/∂yi            ∂/∂xi
 v                 v
∂MS/∂yi --MN--> ∂MF/∂xi    (27)

The morphogenetic neuron MN, for which MF = MN MS, is the MN in (8). We conclude that, by (24) and (27), the derivative of the MF in the space X and the derivative of the MS in the space Y are equivalent.

Example: Given the symmetric kernel or elementary field (monopole, Fig. 1):

K(x, y | p) = h e^{−(x−y)² p},  with p ≥ 0    (28)

Fig. 1. Monopole morphogenetic elementary field

The derivative of the elementary field (bipole, Fig. 2) is another elementary field:

∂K(x, y | p)/∂x = −2(x − y) p h e^{−(x−y)² p}    (29)

Fig. 2. Bipole morphogenetic elementary field

Page 324: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications

c) Solution of Differential Equations by MN
Given the differential equation

∂Zk(t1, t2, ..., tq)/∂tj = Fj,k(Z1, Z2, ..., Zq)    (30)

with (21) we can expand the left-hand side on the morphogenetic neuron, so that for (30) we can write

Σ_{i1=1}^{m1} ... Σ_{iq=1}^{mq} MS(yi1, ..., yiq) ∂K(t1, ..., tn, yi1, ..., yiq | p1, ..., ps)/∂tk = Fj,k(Z1, ..., Zq)    (32)

With (32) we find the sources MS. With the sources MS, by the formula

Zk(t1, ..., tn) = Σ_{i1=1}^{m1} ... Σ_{iq=1}^{mq} MS(yi1, ..., yiq) K(t1, ..., tn, yi1, ..., yiq | p1, ..., ps)    (33)

we can find an interpolated form of the solution of the differential equation (30).

d) Computation of the Inverse Function

First Method
Given the function

y = F(x1, x2, ..., xq)    (34)

we can build the q morphogenetic neurons

xk = φk(y) = Σ_{i=1}^{N} MSk(yi) K[y, F(x1i, x2i, ..., xqi) | p]    (35)

where there are q morphogenetic neurons MN, one for each component of the vector x.

When the MEF is the monopole

K(x, y | p) = h e^{−(x−y)² p}    (36)

with the samples

(x1i, x2i, ..., xqi, yi),  i = 1, 2, ..., N    (37)

we calculate the sources MS by (35) and, again by (35), we can obtain the inverse function of (34).

Second Method
We write (34) as a function of a parameter t:

x1 = x1(t), x2 = x2(t), ..., xq = xq(t), y = y(t)    (38)

We build q + 1 morphogenetic neurons, whose forms are

x1(t) = Σ_{i=1}^{n} w1,i K(t, τ1,i, p1)
x2(t) = Σ_{i=1}^{n} w2,i K(t, τ2,i, p2)
...    (39)
y(t) = Σ_{i=1}^{n} wq+1,i K(t, τq+1,i, pq+1)    (40)

In this way we have q + 1 morphogenetic fields MF and q + 1 sets of morphogenetic sources MS.

With the samples

(x1(tk), x2(tk), ..., xq(tk), y(tk)),  k = 1, 2, ..., p    (41)

we can calculate the sources wr,i, where r = 1, 2, ..., q + 1 and i = 1, 2, ..., n.

In the morphogenetic neuron

t = φ(y) = Σ_{i=1}^{n} MSi K(y, y(ti), p)    (42)

we calculate the sources MSi, i = 1, 2, ..., n. In conclusion, when we know the value of y, by the neuron (42) we can calculate the parameter t and afterwards, by the neurons (39), the whole set of variables x. In this way we can compute the inverse function of (38). With the parameters p we can control the process by which we calculate the inverse function.

e) MF for Pattern Identification Method
(The first and second steps of the method were developed by G. Massussi in his diploma thesis.)

Given a set of patterns or reference patterns

P1, P2, ..., Pn    (43)

in the M-dimensional space, and given the pattern P, we want to decompose P into the weighted sum of the reference patterns:

P = β1 P1 + β2 P2 + ... + βn Pn    (44)

To obtain the coefficients βi without ordinary mathematical methods, but only with the morphogenetic neuron, we must create n morphogenetic neurons, one for each reference pattern, in this way:

First Step:
For any reference pattern Pk, k = 1, 2, ..., n, we create one M-dimensional morphogenetic neuron

MF(W) = fi(W) = Σ_{j=1}^{2n} αi,j e^{−p Σ_{k=1}^{M} (Wk − Xj,k)²}    (45)

In the M-dimensional space there are 2n points. Any pattern Pk, with k = 1, ..., n, is divided into M parts, and any part has a numerical value. To a point X in the M-dimensional space we associate the M parts of the pattern. Inside the space we put the n points X for all the patterns and the other n symmetric points −X.

Second Step:
For any neuron, the morphogenetic field satisfies

fi(Xi) = 1,  fi(−Xi) = −1  and  fi(Xh) = 0 for h ≠ i    (46)

By the symmetric conditions in (46) we have

fi(0) = 0    (47)

Third Step:
Given the MF by (45), we can calculate the coefficients αi,j.

Fourth Step:
With equation (45) we can separate the patterns Pk inside the pattern P, using the following theorem.

Theorem: There exists a value of the parameter p for which

fi(Σ_{h=1}^{n} βh Xh) = βi

Proof:
For p → 0, equation (45) can be approximated so that

fi(W) = Σ_{j=1}^{2n} αi,j e^{−p Σ_{k=1}^{M} (Wk − Xj,k)²} ≈ Σ_{j=1}^{2n} αi,j [1 − p Σ_{k=1}^{M} (Wk − Xj,k)²]

Because

Σ_{j=1}^{2n} αi,j (1 − p Σ_{k=1}^{M} (Xj,k)²) ≈ Σ_{j=1}^{2n} αi,j e^{−p Σ_{k=1}^{M} (Xj,k)²} = fi(0) = 0

we obtain, with some computations,

fi(W) ≈ −p Σ_{j=1}^{2n} [αi,j Σ_{k=1}^{M} (Wk)² − 2 αi,j Σ_{k=1}^{M} Xj,k Wk]

Because Σ_{j=1}^{2n} αi,j = 0, we obtain

fi(W) ≈ Σ_{k=1}^{M} Σ_{j=1}^{2n} (2p αi,j) Xj,k Wk

By the conditions (46) we have

Σ_{k=1}^{M} Σ_{j=1}^{2n} (2p αi,j) Xj,k Xs,k = δi,s,  where δi,s = 1 if i = s and δi,s = 0 if i ≠ s.

In conclusion, when

Wk = Σ_{h=1}^{n} βh Xh,k

we obtain, by linearity,

fi(Σ_{h=1}^{n} βh Xh) = Σ_{h=1}^{n} βh δi,h = βi

• With the new computing principle or the morphogenetic neuron, it is possible to separate patterns superposed in one pattern with different weights βk, without using the ordinary mathematical methods. Given an MF as a pattern, it is possible to calculate the sources MS without solving the system (20): the source MSi is obtained by simple substitution of the field MF in the i-th neuron.

For the flexibility of the neuron method, we can fit experimental data without the method of least squares, without using the derivatives of the reference patterns or of the model, and without solving a big non-linear system every time.
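A numerical sketch of these pattern-identification neurons follows (ours, under the reconstruction of eqs. (45)-(46) above); the agreement with the separation theorem is approximate and improves as the kernel parameter p tends to zero, so the result is printed rather than asserted.

    # Sketch of the pattern-identification neurons of eqs. (45)-(46).
    import numpy as np

    rng = np.random.default_rng(0)
    M, n, p = 5, 3, 1e-3                  # pattern size, #patterns, kernel parameter
    X = rng.normal(size=(n, M))           # reference patterns
    Z = np.vstack([X, -X])                # the 2n interpolation points of eq. (45)

    G = np.exp(-p * ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1))
    targets = np.vstack([np.eye(n), -np.eye(n)])   # conditions (46)
    alpha = np.linalg.solve(G, targets)   # coefficients alpha_{i,j}, one column per neuron

    def f(W):
        """The n morphogenetic fields f_i(W) of eq. (45)."""
        k = np.exp(-p * ((W - Z) ** 2).sum(-1))
        return k @ alpha

    beta = np.array([0.3, -0.2, 0.5])
    print(f(beta @ X), "=?", beta)        # separation theorem: f_i -> beta_i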

2. The Conventional Neural Model as a Particular Morphogenetic Neuron

2.1 Conventional Neuron

Given the variables

x = (x1, x2, ..., xn)

the conventional neuron [3] is given by the formula

vin = vbias + Σ_{i=1}^{n} wi xi    (48)

with

vout = 1/(1 + e^{−vin})  or  vout = h[vin],  where h[x] = 1 if x > 0, else h[x] = 0.

We can show that the conventional neuron (48) is a particular case of the morphogenetic neuron. In fact, (48) can be written in this way:

vin(X) = vbias + Σ_{p=1}^{M} Σ_{i=1}^{n} f(Yi,p) K(Xi, Yi,p)    (49)

Page 330: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications

where, for the points Yi,p, p = 1, ..., M, the MEF in (49) is given by the formula

K(Xi, Yi,p) = Xi δ(Xi − Yi,p)    (49')

where

δ(x) = 1 if x = 0,  δ(x) = 0 if x ≠ 0    (50)

The function f in (49) is

f(Yi,p) = 2Yi,p − 1

Remark: Any p-th element of (49) is

vin(X) = vbias + Σ_{i=1}^{n} (2Yi − 1) Xi

and for 2Yi − 1 = wi we obtain

vin(X) = vbias + Σ_{i=1}^{n} wi Xi

This is the ordinary neuron. The simple function

δ(Xi − Yi,p)    (51)

is equal to one only when Xi = Yi,p and equal to zero in all the other cases.

• In conclusion, formula (49) is the superposition of M simple functions (51).

Remark: By the previous remark, the MN in (49) can be written in this way:

vin(X) = vbias + Σ_{i=1}^{n} [Σ_{p=1}^{M} (2Yi,p − 1) δ(Xi − Yi,p)] Xi

when the weights are

wi = Σ_{p=1}^{M} (2Yi,p − 1) δ(Xi − Yi,p)

Equation (49) then has the same form as that of the ordinary neuron in (48).

• The last point in this computation is to calculate the value of the variable vbias. The value of vbias can be calculated by the formula

vbias = −(1/2) [ MIN_{X∈A} (Σ_{j=1}^{M} f(Yj) K(X, Yj)) + MAX_{X∈A^c} (Σ_{j=1}^{M} f(Yj) K(X, Yj)) ]    (52)

where the set A is the set of the vectors X for which vout(X) = 1, with the condition that

MIN_{X∈A} (Σ_{j=1}^{M} f(Yj) K(X, Yj)) > MAX_{X∈A^c} (Σ_{j=1}^{M} f(Yj) K(X, Yj))    (53)

Example: Consider the codes given in Table 1, which shows how the second term of equation (49) is obtained in the entries of the table.

Table 1. The evaluation of the second term of equation (49)

code  (X1, X2, X3)  f(X1, X2, X3)  K(X, Y1)  K(X, Y2)  K(X, Y3)  Σ f(Yj) K(X, Yj)
0     (0,0,0)       0              0         0         0         0
1     (1,0,0)       1              (1,0,0)   0         0         X1 − X2 − X3
2     (0,1,0)       1              0         (0,1,0)   0         −X1 + X2 − X3
3     (1,1,0)       1              0         0         (1,1,0)   X1 + X2 − X3
4     (0,0,1)       0              0         0         0         0
5     (1,0,1)       0              0         0         0         0
6     (0,1,1)       0              0         0         0         0
7     (1,1,1)       0              0         0         0         0

For the codes indicated, we have

vin(X) = vbias + X1 + X2 − 3X3    (54)

The second term of equation (54) is evaluated in Table 2.

Table 2. Set A and the second term of equation (54) for the codes considered

code  (X1, X2, X3)  vout  set A   X1 + X2 − 3X3
0     (0,0,0)       0     ∉ A     0
1     (1,0,0)       1     ∈ A     1
2     (0,1,0)       1     ∈ A     1
3     (1,1,0)       1     ∈ A     2
4     (0,0,1)       0     ∉ A     −3
5     (1,0,1)       0     ∉ A     −2
6     (0,1,1)       0     ∉ A     −2
7     (1,1,1)       0     ∉ A     −1

When vout = 1,

MIN_{X∈A} (Σ_{j=1}^{M} f(Yj) K(X, Yj)) = MIN(1, 1, 2) = 1

and when vout = 0,

MAX_{X∈A^c} (Σ_{j=1}^{M} f(Yj) K(X, Yj)) = MAX(0, −3, −2, −2, −1) = 0

The condition (53) is therefore true, and vbias = −1/2. In conclusion, by the morphogenetic neuron (49) we have the ordinary neuron

vout = h[X1 + X2 − 3X3 − 1/2]    (55)
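A short sketch (ours) reproduces this example end to end: the linear second term X1 + X2 − 3X3 arises from summing 2Y − 1 over the learnt points, and the bias of eq. (52) then separates set A from its complement.

    # Sketch of the bias computation of eqs. (52)-(55) for the example above.
    from itertools import product

    Ys = [(1, 0, 0), (0, 1, 0), (1, 1, 0)]
    w = [sum(2 * y[i] - 1 for y in Ys) for i in range(3)]    # -> [1, 1, -3]

    second = lambda X: sum(wi * xi for wi, xi in zip(w, X))
    A = set(Ys)                                              # patterns with v_out = 1
    others = [X for X in product((0, 1), repeat=3) if X not in A]

    m, M = min(map(second, A)), max(map(second, others))
    assert m > M                                             # condition (53)
    v_bias = -(m + M) / 2                                    # eq. (52) -> -1/2

    for X in product((0, 1), repeat=3):
        assert (second(X) + v_bias > 0) == (X in A)          # eq. (55)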

2.2 Parity Problem and Ordinary Neuron

For the parity function indicated in Table 3, we have

Page 333: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications

MIN_{X∈A} (Σ_{j=1}^{M} f(Yj) K(X, Yj)) = MIN(0, 0, 0, 0) = 0

MAX_{X∈A^c} (Σ_{j=1}^{M} f(Yj) K(X, Yj)) = MAX(0, 0, 0, 0) = 0

We see that the condition (53) is false. The parity function, as we know, cannot be realised by the ordinary neuron.

Table 3. Parity function

code  (X1, X2, X3)  f(X1, X2, X3)
0     (0,0,0)       0
1     (1,0,0)       1
2     (0,1,0)       1
3     (1,1,0)       0
4     (0,0,1)       1
5     (1,0,1)       0
6     (0,1,1)       0
7     (1,1,1)       1

2.3 Hopfield Remarks on Simple Conventional Neural Model

Recently, Hopfield [3] observed that a simple conventional neural model has difficulty in solving the problem of establishing whether one unknown stimulus Xu is one of the implicitly known stimuli a, b, c, ... (pattern recognition), and of recognising the quality of the stimulus Xu and its intensity. To obtain pattern recognition with the conventional neural model, it is necessary to convert the input variable X into a vector of fixed length. This structure of network, even with normalisation, has two additional undesirable features. First, sensitivity to minor components is lost: if the input pattern has intensities 5:5:1, by the linearity of the neuron the weights are 5:5:1, and since the weight of the minor component has intensity equal to 1, the sensitivity is lost.

Second, the variable X is divided by the scalar factor $\sqrt{\sum_i X_i^2}$. When we attempt to split the recognition problem into sub-parts, the non-linearity of this scalar factor gives

$$\frac{X+Y}{\sqrt{\sum_i (X_i+Y_i)^2}} \neq \frac{X}{\sqrt{\sum_i X_i^2}} + \frac{Y}{\sqrt{\sum_i Y_i^2}} \qquad (56)$$


so we cannot simply add the confidence gains of the two patterns: from the recognition of sub-parts we cannot recognise the whole pattern by a simple addition or superposition operation.

For the morphogenetic neuron, in contrast, we can split the problem into sub-parts, solve the problem in these parts, and afterwards build the global solution by adding the sub-part solutions.

2.4 Fuzzy form of the ordinary neuron

To overcome the problems of the ordinary neuron, different methods have been invented. For example, we can substitute the linear form in the neuron by a non-linear polynomial [17], or we can introduce layers of neurons as in back propagation [21]. We think that with the morphogenetic neuron we can solve problems impossible for the ordinary neuron.

In fact, when we replace the crisp function δ(x) in formula (51) with the membership function

$$\mu(X_i - Y_{i,p}) = e^{-a\sum_{i=1}^{n}(X_i - Y_{i,p})^2} \qquad (57)$$

The MEF in (49') becomes

$$K(X, Y_p) = X_i\, e^{-a\sum_{i=1}^{n}(X_i - Y_{i,p})^2} \qquad (58)$$

Equation (49) becomes

$$V_{in}(X_i) = V_{bias} + \sum_{i=1}^{n}\left[\sum_{p=1}^{M}(2Y_{i,p}-1)\,\mu(X_i - Y_{i,p})\right]X_i = V_{bias} + F \qquad (59)$$

If we introduce the fuzzy weights

$$W_i(X) = \sum_{p=1}^{M}(2Y_{i,p}-1)\,\mu(X_i - Y_{i,p}) \qquad (60)$$

we come back to the formula (48).

Equation (59) for M = 4 and Y_1 = (1, 0, 0), Y_2 = (0, 1, 0), Y_3 = (0, 0, 1), Y_4 = (1, 1, 1) (the odd-parity codes of Table 3) becomes equation (61).


For (61) we have Table 4.

Table 4. The value of F in equation (61) for the codes considered

code  (X1, X2, X3)  F
0     (0,0,0)       0
1     (1,0,0)       0.865
2     (0,1,0)       0.865
3     (1,1,0)       0.636
4     (0,0,1)       0.865
5     (1,0,1)       0.636
6     (0,1,1)       0.636
7     (1,1,1)       2.594

$$\min_{X \in A}\Big(\sum_{j=1}^{M} f(Y_j)\,K(X,Y_j)\Big) = \min(0.865, 0.865, 0.865, 2.594) = 0.865$$

$$\max_{X \in A^{c}}\Big(\sum_{j=1}^{M} f(Y_j)\,K(X,Y_j)\Big) = \max(0, 0.636, 0.636, 0.636) = 0.636$$

The condition (53) is true and

V_bias = −(0.865 + 0.636)/2 ≈ −0.75

This fuzzy extension of the morphogenetic neuron can therefore solve the parity problem by a superposition of membership functions of fuzzy sets.
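The computation above can be checked numerically. The following sketch (not part of the original paper; it assumes the Gaussian width parameter a = 1 in (57), which reproduces the values of Table 4) evaluates F over all 3-bit codes and verifies condition (53) and the resulting threshold:

```python
import itertools
import math

# Odd-parity codes of Table 3 serve as the prototypes Y_p (f(Y_p) = 1).
prototypes = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
a = 1.0  # assumed Gaussian width parameter of eq. (57)

def F(x):
    """Second term of eq. (59): signed, Gaussian-weighted superposition."""
    total = 0.0
    for y in prototypes:
        d2 = sum((xi - yi) ** 2 for xi, yi in zip(x, y))        # ||X - Y_p||^2
        signed = sum((2 * yi - 1) * xi for xi, yi in zip(x, y))  # sum_i (2Y_ip - 1) X_i
        total += signed * math.exp(-a * d2)
    return total

codes = list(itertools.product((0, 1), repeat=3))
parity = lambda x: sum(x) % 2                       # target V_out of Table 3
A = [x for x in codes if parity(x) == 1]            # set A: V_out = 1
Ac = [x for x in codes if parity(x) == 0]

lo = min(F(x) for x in A)     # 0.865
hi = max(F(x) for x in Ac)    # 0.636, so condition (53) holds
v_bias = -(lo + hi) / 2       # eq. (52): about -0.75
assert all((v_bias + F(x) > 0) == bool(parity(x)) for x in codes)
```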


3. Morphogenetic fields in neurobiology

3.1 Place Field of Hippocampal Neurons as a Morphogenetic Field

In the paper "Geometric Determinants of the Place Fields of Hippocampal Neurons" [2], O'Keefe and Burgess studied the part of the brain called the hippocampus, which is implicated in spatial memory. They report the identification of the environmental features controlling the location and shape of the place fields (MF) of the place cells. O'Keefe and Burgess put rats in a rectangular testing box. To understand the hippocampal spatially localised firing, they created a model in which the place field is formed by the summation of Gaussian tuning curves (MEF), each oriented perpendicular to a wall of the box and peaking at a fixed distance from it.

The Gaussian tuning curves or MEF are:

$$G_1(y, d_n) = \frac{\exp(-((h-y)-d_n)^2 / 2\sigma^2_{d_n})}{\sqrt{2\pi\sigma^2_{d_n}}}, \quad \text{max of } G_1 \text{ at } y = h - d_n$$

$$G_2(y, d_s) = \frac{\exp(-(y-d_s)^2 / 2\sigma^2_{d_s})}{\sqrt{2\pi\sigma^2_{d_s}}}, \quad \text{max of } G_2 \text{ at } y = d_s$$

$$G_3(x, d_e) = \frac{\exp(-((w-x)-d_e)^2 / 2\sigma^2_{d_e})}{\sqrt{2\pi\sigma^2_{d_e}}}, \quad \text{max of } G_3 \text{ at } x = w - d_e$$

$$G_4(x, d_w) = \frac{\exp(-(x-d_w)^2 / 2\sigma^2_{d_w})}{\sqrt{2\pi\sigma^2_{d_w}}}, \quad \text{max of } G_4 \text{ at } x = d_w$$

where (x, y) is the position of the rat and (0,0), (w,0), (0,h), (w,h) are the positions of the corners of the box in which we put the rat. The walls of the box are oriented to north, south, west and east. d_n is the distance from the north wall at which the tuning curve (the Gaussian curve) has its maximum value; d_s, d_w, d_e are defined similarly. The width σ of the Gaussian curve can be a function of the position x or can be constant.

The morphogenetic field MF is the superposition of the four Gaussian curves:

$$MF = G_1 + G_2 + G_3 + G_4 = \frac{\exp(-((h-y)-d_n)^2/2\sigma^2_{d_n})}{\sqrt{2\pi\sigma^2_{d_n}}} + \frac{\exp(-(y-d_s)^2/2\sigma^2_{d_s})}{\sqrt{2\pi\sigma^2_{d_s}}} + \frac{\exp(-((w-x)-d_e)^2/2\sigma^2_{d_e})}{\sqrt{2\pi\sigma^2_{d_e}}} + \frac{\exp(-(x-d_w)^2/2\sigma^2_{d_w})}{\sqrt{2\pi\sigma^2_{d_w}}}$$


Fig. 3. Graph of the Gaussian function G_1

Fig. 4. Graph of MF when σ = 1 and h = 61, w = 61, d_n = 10, d_s = 10, d_e = 10, d_w = 10.
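As an illustration (not from the paper), the following sketch evaluates the MF superposition on a grid, using the parameter values quoted in the caption of Fig. 4 with constant σ; the grid resolution is an arbitrary choice:

```python
import numpy as np

# Parameters taken from the caption of Fig. 4 (sigma constant).
h = w = 61.0
d_n = d_s = d_e = d_w = 10.0
sigma = 1.0

def gauss(u, d):
    """One MEF: Gaussian tuning curve peaking at distance d from a wall."""
    return np.exp(-(u - d) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

x, y = np.meshgrid(np.linspace(0, w, 200), np.linspace(0, h, 200))

# MF: superposition of the four tuning curves G1..G4.
MF = (gauss(h - y, d_n) +   # north wall
      gauss(y,     d_s) +   # south wall
      gauss(w - x, d_e) +   # east wall
      gauss(x,     d_w))    # west wall

iy, ix = np.unravel_index(MF.argmax(), MF.shape)
# One of the four symmetric peaks, about 10 units from two walls.
print(x[iy, ix], y[iy, ix])
```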


O'Keefe and Burgess found that the anatomical experiments indicate a dramatic difference in the location of the peak firing rate according to the rat's direction. When the rat is moving north, the firing rate of the southern peak is much greater than when it is moving south, and vice versa for the northern peak. The position field or MF memorises the rat's first direction. The brain introduces meaning inside the MF, so that different structures of MF have different meanings. A model of an MF that memorises the rat's direction is obtained when the width σ is a parabolic function of x: when x increases the width increases and the maximum value of the Gaussian function decreases. In this way we create an asymmetric position field, oriented from the maximum at zero to a minimum value when x is at its maximum.

Remark: When we change the geometry of the box, the position field changes. The geometric form of the room is memorised in the MF or position field.

Fig. 5. Firing rate map for σ(x)


3.2 Correlation of signals to create morphogenetic elementary field (tuning process)

In the paper "Insect Motion Detectors Matched to Visual Ecology" [1], O'Carroll et al. study how primates, birds and insects can detect low velocities. They found that animals use local detectors that correlate the signal sampled at one location in the image with the signal sampled, after a programmed delay, at an adjacent location. The neuron that correlates the signals sampled at the two locations forms a morphogenetic elementary field (MEF) whose firing peaks at the velocity for which the time necessary to move from one location to the other is equal to the programmed delay.

The mathematical model of the MEF is

$$MEF(V) = \frac{\exp\!\big(-\big((S_2 - S_1)/V - D\big)^2 / 2\sigma^2\big)}{\sqrt{2\pi\sigma^2}}$$

where V is the image velocity, S_2 − S_1 is the distance between one location and the adjacent location, D is the value of the delay of the sample at the adjacent location, and σ is the width of the MEF.

The signals and the correlator neuron are represented as follows:

S1 ──► delay D ──┐
                  ├──► CORRELATOR ──► MEF
S2 ──────────────┘

where S1 is the first location and S2 is the second location, reached at velocity V; on the other side of the diagram we have the delay D. When the two sides are correlated (synchronised) in the correlator, we have the maximum spike: the CORRELATOR is the neuron that has the maximum spike when the two signals are correlated, i.e. arrive at the same time.

For many correlators that can communicate with one another we have the morphogenetic field.


Different correlator neurons can create by superposition a morphogenetic field in the space of velocities. When to each MEF we associate a degree of truth of belonging to a fuzzy set, the aggregation or superposition of MEFs or MFs gives us the degree of truth of belonging to one velocity or to another.
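For illustration, a toy sketch of the velocity tuning is given below; the sampling locations, delay and width are hypothetical values, and the normalisation of the MEF formula reconstructed above is an assumption:

```python
import math

S1, S2 = 0.0, 2.0   # hypothetical adjacent sampling locations
D = 0.5             # hypothetical programmed delay
sigma = 0.1         # assumed width of the MEF

def mef(v):
    """Correlator response: maximal when travel time (S2-S1)/v equals D."""
    t_travel = (S2 - S1) / v
    return (math.exp(-(t_travel - D) ** 2 / (2 * sigma ** 2))
            / math.sqrt(2 * math.pi * sigma ** 2))

# The response peaks at v = (S2 - S1) / D = 4.0
for v in (2.0, 3.0, 4.0, 5.0, 6.0):
    print(v, round(mef(v), 4))
```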

4. Conclusion

The intrinsic difficulties of the simple conventional neural model, and the possibilities that neurobiology offers to overcome them, guided the creation of the morphogenetic neuron, which we consider the physical realisation of a new computing principle [5]. We have shown that the morphogenetic neuron was implicitly present in the history of science and that the ordinary neuron is a special case of the morphogenetic neuron. This paper defines the morphogenetic neuron, shows its possible applications, and argues for the absolute necessity of fuzzy set theory. We also think that fuzzy sets are an essential part of the morphogenetic neuron, not merely an external addition. We can remark that the morphogenetic neuron is the result of the fusion of the genetic field, the fuzzy set and the ordinary neuron. With this novel neuron, several problems impossible to solve with the ordinary neuron are solved. We think this paper is only the first step in the redefinition of the neuron as a fundamental biophysical entity. We hope that the connection between the physiological neuron and the artificial neuron described in the morphogenetic neuron can solve other problems and give us new, interesting applications of the physiological results.

References

1. D.C. O'Carroll, N.J. Bidwell, S.B. Laughlin & E.J. Warrant, Insect motion detectors matched to visual ecology, Nature, vol. 382, 4 July 1996

2. J. O'Keefe & N. Burgess, Geometric determinants of the place fields of hippocampal neurons, Nature, vol. 381, 30 May 1996

3. J.J. Hopfield, Pattern recognition computation using action potential timing for stimulus representation, Nature, vol. 376, 6 July 1995

4. V.N. Vapnik (1979), Estimation of Dependencies Based on Empirical Data [in Russian], Nauka, Moscow (English translation: 1982, Springer-Verlag, New York)

5. H.A. Fatmi, G. Resconi, A New Computing Principle, Il Nuovo Cimento, vol. 101 B, N. 2, February 1988, pp. 239-242

6. J. Zhang, A. Knoll, Constructing fuzzy controllers with B-spline models: principles and applications, private communication

7. W.J. Gordon and R.F. Riesenfeld, B-spline curves and surfaces. In R.E. Barnhill and R.F. Riesenfeld (eds), Computer Aided Geometric Design, Academic Press, 1974

8. P. Maffezzoni, P. Gubian, Approximate radial basis function neural networks to learn smooth relations from noisy data, Proceedings of the 37th Midwest Symposium on CAS, Lafayette LA, 3-5 Aug. 1994

9. D. Gabor, Inaugural lectures, Imperial College (London 1959), p. 63

10. M.J.M. Jessel, Secondary sources and their energy transfer, Acoustic Letters, vol. 4, No. 9, 1981

11. W.C. Hoffman, The neuron as a Lie group germ and Lie product, Quarterly Journal of Applied Maths, 1968, pp. 423-440

12. P.C. Dodwell, The Lie transformation group model of visual perception, Perception & Psychophysics, 1983, 34 (1), 1-16

13. L.S. Smith, Onset-based sound segmentation, Advances in Neural Information Processing Systems 8, D.S. Touretzky, M.C. Mozer, M.E. Hasselmo (eds), MIT Press, 1996

14. P.C. Ivanov, M.G. Rosenblum, C.K. Peng, J. Mietus, S. Havlin, H.E. Stanley & A.L. Goldberger, Scaling behaviour of heartbeat intervals obtained by wavelet-based time-series analysis, Nature, vol. 383, 26 September 1996

15. Leslie S. Smith, A neurally plausible technique for voicing detection and F0 estimation for speech, submitted to ICANN96, March 1996

16. A. Ephrussi & R. Lehmann, Induction of germ cell formation by oskar, Nature, vol. 358, 30 July 1992

17. A.D. McAulay, Optical Computer Architectures, A Wiley-Interscience Publication, 1991

18. A. Papoulis, Circuits and Systems: A Modern Approach, Holt-Saunders International Editions, 1981

19. D.H. Hubel and T.N. Wiesel, Functional architecture of macaque monkey striate cortex, Proceedings of the Royal Society, London (Series B), 1977, 198, 1-59

20. M. Jessel, Acoustique théorique, Masson et Cie, 1973

21. D.E. Rumelhart, J.L. McClelland and the PDP Research Group, Parallel Distributed Processing, MIT Press, 1986


Boolean Soft Computing by Non-linear Neural Networks With Hyperincursive Stack Memory

Daniel M. Dubois

Universite de Liege, Institut de Mathematique, Grande Traverse 12, B-4000 Liege 1, Belgium Fax: +/32/4/3669489 [email protected]

Abstract. This paper is a review of a new theoretical basis for modelling neural Boolean networks by non-linear digital equations. With real numbers, soft Boolean tables can be generated. With integer numbers, these digital equations are Heaviside fixed functions in the framework of threshold logic. These can represent non-linear neurons which can be split very easily into a set of McCulloch and Pitts formal neurons with hidden neurons. It is demonstrated that any Boolean table can be very easily represented by such neural networks, where the weights are always either an activation weight +1 or an inhibition weight −1, with integer thresholds. The parity problem is fully solved by a fractal neural network based on XOR. From a feedback of the hidden neurons to the inputs in XOR non-linear equations, it is shown that the neurons compete with each other. Moreover, the feedback of the output to the inputs of a XOR non-linear neuron gives rise to fractal chaos. A model of a stack memory can be designed from such a chaos map. Binary digits are memorised by folding into a real variable through an anti-chaotic hyperincursive process. The retrieval of these data is computed by an incursive chaotic map from the last value of the variable. Incursion is an extension of recursion for which each iterate is computed as a function of variables defined not only in the past and the present time but also in the future. Hyperincursion is an incursion generating multiple iterates at each step. The basic map is the Pearl-Verhulst one in the zone of fractal chaos. The hyperincursive memory realises a coding of the input binary message in a form similar to the Gray code. This is based on a soft exclusive OR equation mixing binary digits with real numbers.

Keywords. Neural networks, fractal, chaos, stack memory, flip-flop, boolean tables, soft boolean tables, hyperincursion.



1. Introduction

Threshold logic was initiated by the pioneering work of McCulloch and Pitts in 1943 [15] for modelling formal neurons at a logic level. But a single McCulloch & Pitts neuron is not able to learn the parity of binary patterns. For example, a few neurons are necessary to learn the truth table of the exclusive OR, i.e. the parity of two inputs.

An extension of this threshold logic with a non-linear argument in the Heaviside function of the formal neuron was published by D. M. Dubois and G. Resconi in the Academy of Sciences of Belgium in 1993 [10].

This gives an elegant solution to the parity problem: one single neuron can learn the parity of any binary patterns. Moreover, any truth table can be modelled by non-linear neurons represented by Heaviside fixed functions.

This paper gives a new way of building McCulloch and Pitts formal neurons from non-linear digital equations [5]. For the difficult parity problem, the architecture of the neural network is fractal, by self-similarity of XOR elements.

These digital equations can also be considered with real numbers. When the hidden neurons of the XOR are fed back to their inputs, it is shown that, surprisingly, we obtain the finite difference equations of competition between the neurons, similar to the equations of Volterra [18] for competing species. This non-linear XOR neuron gives rise to fractal chaos when the output is fed back to the inputs.

A new model of a neural memory will be represented by a stack of binary input data embedded in a floating point variable through a hyperincursive process based on the Pearl-Verhulst chaotic map [8].

Computation deals with recursive processes. A recursion is a loop where each successive state of a vector x(t) is a function of its preceding states:

x(t+1) = f(..., x(t−1), x(t))

Fractal chaos can be generated with such recursive processes [14, 17].

An extension of the concept of recursion, which I called incursion [3] (for inclusive or implicit recursion), deals with iterates as functions of future values:

x(t+1) = f(..., x(t−1), x(t), x(t+1), ...)

With incursion, a recursive chaos can be controlled [6]. When each iterate generates multiple iterates, the process is hyperrecursive. So hyperincursion is a hyperrecursion for which each iterate can be a function of its future iterates [6, 9]. This could be related to hypersets (see for example [1]). Such hyperincursive systems give rise to complexification processes, with the formation of complex components, the emergence of new properties and the emergence of a hierarchy of components of increasing complexity during the evolution. Many practical examples of hyperincursive systems are given in [9]; for example, the complexification of a graph, which has many similarities with the biological construction of proteins. Inheritance properties were shown in some hyperincursive processes.

This paper will review successively: first, the modelling of Boolean tables by non-linear digital equations; second, the generation of neural networks from these digital equations; third, the design of a neural flip-flop memory with the universal NAND operator; and fourth, the symbolic dynamics of a hyperincursive stack memory with a chaotic neural network from soft exclusive OR operators.

2. Modelling Boolean Tables by Non-linear Digital Equations [5]

Let us consider the general Boolean table with two inputs x1, x2 and one output y. The values of the output y are given by the set y = (y1, y2, y3, y4).

Examples are given for exclusive OR, AND and OR.

General Boolean Table          AND Boolean Table

x1  x2  y                      x1  x2  y
0   0   y1                     0   0   0
0   1   y2                     0   1   0
1   0   y3                     1   0   0
1   1   y4                     1   1   1

XOR Boolean Table              OR Boolean Table

x1  x2  y                      x1  x2  y
0   0   0                      0   0   0
1   0   1                      0   1   1
0   1   1                      1   0   1
1   1   0                      1   1   1

The following general digital equation

y = y1·(1−x1)·(1−x2) + y2·(1−x1)·x2 + y3·x1·(1−x2) + y4·x1·x2   (1)

is a non-linear logic equation for the 16 Boolean tables, that is to say for all the outputs. The number of terms is equal to the number of lines in the Boolean table (2^n, where n is the number of inputs). Each term is the product of the output value y_i of line i by all the input variables x_j or their complements (1−x_j), depending on whether their value is 1 or 0 in that line of the table.


A practical rule can be stated: consider only the lines in the Boolean table for which the output value is 1. If the number of outputs equal to 1 is greater than the number of outputs equal to 0, take the complementary output 1−y.

Examples for XOR, AND and OR:

y = (1−x1)·x2 + (1−x2)·x1 = x1 + x2 − 2·x1·x2   (1a)

y = x1·x2   (1b)

y = (1−x1)·x2 + x1·(1−x2) + x1·x2 = x1 + x2 − x1·x2   (1c)

For OR, with the complementary output, there is only one output value equal to 1 (1−y1 = 1), which gives

1 − y = (1−x1)·(1−x2)   (1d)

These non-linear digital equations give the Boolean tables with integer inputs, and the following probabilistic (soft) tables with real inputs x1 ∈ {0, 0.5, 1} and x2 ∈ {0, 0.5, 1}.

XOR Soft Table

x2\x1  0    0.5  1
0      0    0.5  1
0.5    0.5  0.5  0.5
1      1    0.5  0

AND Soft Table                 OR Soft Table

x2\x1  0  0.5   1              x2\x1  0    0.5   1
0      0  0     0              0      0    0.5   1
0.5    0  0.25  0.5            0.5    0.5  0.75  1
1      0  0.5   1              1      1    1     1
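As an illustration (not part of the original paper), the following sketch evaluates the digital equations (1a)-(1c) over the levels {0, 0.5, 1} and prints the three soft tables above:

```python
# Non-linear digital equations (1a)-(1c): with inputs in {0, 1} they give
# the Boolean tables; with real inputs they give the soft tables.
xor_ = lambda x1, x2: x1 + x2 - 2 * x1 * x2   # eq. (1a)
and_ = lambda x1, x2: x1 * x2                 # eq. (1b)
or_  = lambda x1, x2: x1 + x2 - x1 * x2       # eq. (1c)

levels = (0, 0.5, 1)
for name, f in (("XOR", xor_), ("AND", and_), ("OR", or_)):
    print(name)
    for x2 in levels:
        print([f(x1, x2) for x1 in levels])
```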

An interesting case is the parity problem, where the output value is 1 when the number of input values equal to 1 is odd.

With 4 inputs, 8 sets of values of x = (x1, x2, x3, x4) have an odd number of 1s:

(1000, 0100, 0010, 0001, 1110, 1101, 1011, 0111)

so


y = x1·(1−x2)·(1−x3)·(1−x4) + (1−x1)·x2·(1−x3)·(1−x4) + (1−x1)·(1−x2)·x3·(1−x4)
  + (1−x1)·(1−x2)·(1−x3)·x4 + x1·x2·x3·(1−x4) + x1·x2·(1−x3)·x4
  + x1·(1−x2)·x3·x4 + (1−x1)·x2·x3·x4   (2a)

y = [x1·(1−x2) + (1−x1)·x2]·[1 − [x3·(1−x4) + (1−x3)·x4]]
  + [x3·(1−x4) + (1−x3)·x4]·[1 − [x1·(1−x2) + (1−x1)·x2]]   (2b)

which represents a fractal process [[x1 XOR x2] XOR [x3 XOR x4]]. By recurrence, such a fractal process can be continued for the parity problem with any number of inputs.
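A short sketch of this recurrence (illustrative only; the pairwise splitting strategy is our own choice) shows that the nested XOR digital equation reproduces the parity function:

```python
import itertools

# Eq. (2b) as a self-similar composition of XOR digital equations (1a).
xor_ = lambda a, b: a + b - 2 * a * b

def parity(bits):
    """Nested XOR tree: [[x1 XOR x2] XOR [x3 XOR x4]] ... for any width."""
    if len(bits) == 1:
        return bits[0]
    mid = len(bits) // 2
    return xor_(parity(bits[:mid]), parity(bits[mid:]))

assert all(parity(x) == sum(x) % 2
           for x in itertools.product((0, 1), repeat=4))
```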

The method for generating non-linear digital equations is general for any Boolean table with any number of inputs. These digital equations are Heaviside fixed functions in the framework of the threshold logic with McCulloch and Pitts formal neurons.

3. Generation Of Neural Networks From Digital Equations [5]

McCulloch and Pitts formal neurons are defined as follows:

y = Γ(w1·x1 + w2·x2 + ... + wn·xn − θ)   (3)

where the w_i are the synaptic weights, θ the threshold, and Γ is the Heaviside function defined by Γ(x) = 0 if x ≤ 0 and Γ(x) = 1 if x > 0. The digital equations given in the preceding section are Heaviside fixed functions (Dubois and Resconi, 1993 [10]) in the framework of this threshold logic. McCulloch and Pitts formal neurons can be built from these digital non-linear equations. Indeed, the terms given by products of the inputs or their complementary inputs can be represented by AND hidden neurons, and the output by an OR neuron (the sum of all the AND hidden neurons). First of all, let us give the AND and OR formal neurons.

The AND neuron corresponding to eq. 1b is given by

y = Γ(x1 + x2 − 1)   (3a)

In a general way, an AND neuron for any equation given by a product y = z1·z2·...·zn is

y = Γ(z1 + z2 + z3 + ... + zn − (n−1))   (3b)

where all the weights are equal to 1 and the threshold is equal to the number of inputs n minus 1. So each product of inputs in the digital equations can be represented by such AND neurons. These AND neurons will be hidden neurons, whose outputs are the inputs of the output neuron y. The output neuron will be an OR neuron. Let us remark that the OR logical function is the AND logical function when one considers the negation of the output together with the negation of the inputs. In fact, any Boolean table can be described by only the NOT and AND operators (or NOT and OR). The AND is a sequential operator (a single input equal to 0 gives an output 0) and the OR a parallel operator (a single input equal to 1 gives an output 1). These are dual operators.

For generating the OR formal neuron from the digital eq. 1d, the following theorem is used.

Theorem [5]: for integer values of the weights and threshold, the negation of the Heaviside function with the negation of its argument is equal to the Heaviside function of the argument:

Γ(x) = 1 − Γ(1 − x) for any integer x   (4)

Proof: Γ(x) = 0 if x ≤ 0, i.e. x < 1, and Γ(x) = 1 if x > 0, i.e. x ≥ 1, because x is an integer; and so 1 − Γ(1−x) = 1 − 1 = 0 if x < 1, and 1 − Γ(1−x) = 1 − 0 = 1 if x ≥ 1.

From eq. 1d and eq. 3b, we can write

1 − y = (1−x1)·(1−x2) = Γ(1 − (x1 + x2))   (5a)

where the complemented output 1−y is the AND of the complemented inputs, which is an AND formal neuron as shown previously. Eq. 5a can be written as

y = 1 − Γ(1 − (x1 + x2))   (5b)

and from eq. 4 we obtain the formal OR neuron

y = 1 − Γ(1 − (x1 + x2)) = Γ(x1 + x2)   (5c)

because the weights and the threshold are integers. So the OR is given by a Heaviside function of a linear sum of its inputs, with weights equal to 1 and a null threshold. In a general way, the OR neuron y for m inputs y1, y2, ..., ym is

y = Γ(y1 + y2 + ... + ym)   (5d)

Let us apply these relations to digital eq. 1 with two inputs and one output. This eq. 1 is a Heaviside fixed function, and thus we can define 4 AND hidden neurons Y1, Y2, Y3, Y4, corresponding to the 4 product terms in eq. 1, and then an output OR neuron:

Y1 = (1−x1)·(1−x2)·y1 = Γ(−x1 − x2 + y1)   (6a)
Y2 = (1−x1)·x2·y2 = Γ(−x1 + x2 + y2 − 1)   (6b)
Y3 = x1·(1−x2)·y3 = Γ(x1 − x2 + y3 − 1)   (6c)
Y4 = x1·x2·y4 = Γ(x1 + x2 + y4 − 2)   (6d)
y = Y1 + Y2 + Y3 + Y4 = Γ(Y1 + Y2 + Y3 + Y4)   (6e)


Fig. 1a. XOR neural network given by eqs. 7a-c, with synaptic weights +1 and −1 and null thresholds (input neurons x1, x2; hidden neurons Y2, Y3; output neuron y)

We remark that when y_i = 0, the argument of the Heaviside function is always null or negative, so the corresponding hidden neuron can be cancelled. The weights are −1 or +1 when the corresponding value of the input is 0 or 1, and the threshold is equal to the sum of all the input values minus the output value for each line of the Boolean table. The outputs of the hidden neurons are mutually exclusive: Y_i · Y_j = 0 for i ≠ j. The weights of the output neuron are 1 and the threshold is 0, so the output neuron is the sum of the outputs of the hidden neurons. For example, the XOR Boolean table can be represented by the two hidden neurons Y2 and Y3, for which y2 = y3 = 1, so the XOR neural network is

Y2 = Γ(−x1 + x2)   (7a)
Y3 = Γ(x1 − x2)   (7b)
y = Γ(Y2 + Y3)   (7c)

The neural network is given in Figure 1a. Remark: different neural networks can be built from the digital equations.

For example, the XOR neural network can be built with only one hidden neuron from eq. 1a:

Y1 = x1·x2 = Γ(x1 + x2 − 1)   (8a)
y = x1 + x2 − 2·Y1 = Γ(x1 + x2 − 2·Y1)   (8b)

where the output neuron depends on 3 inputs: the hidden neuron, with an integer weight −2, and the 2 inputs. The neural network is given in Figure 1b.
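Both XOR networks can be checked directly; the following sketch (illustrative only, using the Heaviside function Γ of eq. 3) implements eqs. 7a-c and 8a-b and verifies them against the XOR truth table:

```python
def gamma(x):
    """Heaviside function of eq. (3): 1 if x > 0, else 0."""
    return 1 if x > 0 else 0

def xor_two_hidden(x1, x2):          # eqs. 7a-c
    y2 = gamma(-x1 + x2)
    y3 = gamma(x1 - x2)
    return gamma(y2 + y3)

def xor_one_hidden(x1, x2):          # eqs. 8a-b
    y1 = gamma(x1 + x2 - 1)          # AND hidden neuron
    return gamma(x1 + x2 - 2 * y1)   # output with weight -2 on the hidden neuron

for x1 in (0, 1):
    for x2 in (0, 1):
        assert xor_two_hidden(x1, x2) == xor_one_hidden(x1, x2) == x1 ^ x2
```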


Fig. 1b. XOR neural network given by eqs. 8a-b, with synaptic weights +1 and −2 and null thresholds (input neurons x1, x2; one hidden neuron; output neuron y)

For the parity problem with 4 inputs (eq. 2b), a fractal neural network can be built with 3 layers of hidden neurons in the following way

Y1 = Γ(x1 − x2)   (9a)
Y2 = Γ(−x1 + x2)   (9b)
Y3 = Γ(x3 − x4)   (9c)
Y4 = Γ(−x3 + x4)   (9d)
Y5 = Γ(Y1 + Y2)   (9e)
Y6 = Γ(Y3 + Y4)   (9f)
Y7 = Γ(Y5 − Y6)   (9g)
Y8 = Γ(−Y5 + Y6)   (9h)
y = Γ(Y7 + Y8)   (9i)

Figure 2 gives the fractal neural network for the parity problem with 32 inputs, built in a recursive way from eqs. 9 for the basic 4-input parity problem.
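The self-similar construction generalises to larger widths; the following sketch (an illustration, assuming the number of inputs is a power of 2) builds the layered network of Fig. 2 from the elementary XOR block:

```python
import itertools

def gamma(x):
    return 1 if x > 0 else 0

def xor_block(a, b):
    """Elementary XOR block of Fig. 1a (eqs. 9a-9i), weights +1 / -1."""
    return gamma(gamma(a - b) + gamma(-a + b))

def fractal_parity(xs):
    """Self-similar network of Fig. 2: pair the inputs, XOR, and recurse."""
    while len(xs) > 1:
        xs = [xor_block(xs[i], xs[i + 1]) for i in range(0, len(xs), 2)]
    return xs[0]

assert all(fractal_parity(list(x)) == sum(x) % 2
           for x in itertools.product((0, 1), repeat=8))
```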

The fractal architecture of the parity problem is based on the basic XOR neural network. In fact, the XOR neural network can be represented by the non-linear equation (Dubois, 1990) [2]:

y = 2·(x1 + x2)·(1 − (x1 + x2)/2) = 2·(x1 + x2) − (x1 + x2)²   (10a)


which is similar to the XOR digital equation 1a. Indeed, in Boolean logic x_i = x_i², i = 1, 2, ..., and eq. 10a can be written as eq. 1a.

Fig. 2. Fractal neural network of the parity problem with 32 inputs x1 to x32 and one output y, constructed in a self-similar architecture from the elementary XOR neural network given in Figure 1a. All the weights are +1 (dark lines) and −1 (grey lines) and the thresholds are null. There are 9 layers of hidden neurons (2·log2 n − 1, where n is the number of inputs) and 92 hidden neurons (the number of neurons is < 4n). Each neuron has 2 dendrites.

The non-linear neuron 10a is a Heaviside fixed function [10], i.e. a polynomial equation in the weighted sum of the inputs.

In considering two hidden neurons

y1 = 2·x1·(1 − (x1 + x2)/2)   (11a)
y2 = 2·x2·(1 − (x1 + x2)/2)   (11b)
y = y1 + y2   (11c)

a feedback can be created by defining y1 = x1(t+1) and y2 = x2(t+1):

x1(t+1) = 2·x1(t)·(1 − (x1(t) + x2(t))/2)   (12a)
x2(t+1) = 2·x2(t)·(1 − (x1(t) + x2(t))/2)   (12b)


This system is equivalent to the discrete Volterra equations [18] for two competing species. So the hidden neurons compete with each other and give rise to a neural network with competing neurons. Moreover, eq. 10a is also equivalent to the Pearl-Verhulst map. Indeed, in considering y = y(t+1) and x1 = x2 = y(t), a feedback between the output y and the inputs is given by the chaos equation

y(t+1) = 4·y(t)·(1 − y(t))   (13)

Indeed, the map y(t+1) = 4μ·y(t)·(1 − y(t)) shows fractal chaos for μ = 1 (see for example [14]). A stack memory will be built below from such a chaos map [8].

Remark: from any Heaviside fixed function, a formal neuron with real numbers can be considered with the following activation function:

σ(x) = x for x ∈ [0, 1], σ(x) = 0 when x < 0 and σ(x) = 1 when x > 1   (14)

In fact, any Heaviside fixed function given in this paper is a fixed function of σ(x) for any real values. So neural networks computing with real numbers can be designed. In particular, the chaotic map 13 can be designed by the non-linear formal neuron

y(t+1) = σ[4·y(t)·(1 − y(t))]   (15)

4. Flip-Flop Neural Memory With the Universal NAND Operator

All Boolean tables of any kind can be represented by only one Boolean operator, the NAND (NOT AND). We have already shown that the OR operator can be generated by the NOT and AND operators. It is known, but rarely pointed out, that the NOT operator can be generated from the NAND. In fact, it is easy to see that NOT x = x NAND x, so the AND can be generated from the NAND: x AND y = (x NAND y) NAND (x NAND y).

The formal neuron of y = x1 NAND x2 is given by

y = Γ(−x1 − x2 + 2)   (16)

A flip-flop memory can be designed with two NAND operators as follows (see for example [11]). The two inputs x1 and x2 are the set and reset. The two outputs are y1 and y2. The first NAND has the two inputs x1 and y2 (the second output feeds back on the input), and the second NAND has the two inputs x2 and y1 (the first output feeds back on the input).


y1 = x1 NAND y2   (17a)
y2 = x2 NAND y1   (17b)

This means that it is necessary to know the future outputs to compute them: this is a hyperincursive process as we will show.

The Boolean table of this memory is given by

Xl X2 Yl Y2

0 0 1 1

0 1 1 0 1 0 0 1

1 1 0 1

1 1 1 0

This is an unusual Boolean table because the number of lines is 5 instead of 4, as in classical Boolean tables with two inputs. There is an ambiguity in reading the outputs as a function of the inputs, because for the same inputs x1 = x2 = 1 there are two different outputs. A complete control of this system is possible with the rule that only the set or the reset is changed at each time step, excluding the case where both inputs are null [11].

The successive operations are: x1 = 0 and x2 = 1, which give y1 = 1 and y2 = 0, then x1 = x2 = 1, which give the same outputs; or x1 = 1 and x2 = 0, which give y1 = 0 and y2 = 1, then x1 = x2 = 1, which give the same outputs. In the state x1 = x2 = 1, the outputs give the memory of the preceding inputs.

The state of such historical systems depends strongly on the path of their evolution during time. This is a fundamental characteristic of neural networks, which is at the basis of brain memory.

The formal neurons of this flip-flop memory are given by:

y1 = Γ(−x1 − y2 + 2)   (17c)
y2 = Γ(−x2 − y1 + 2)   (17d)

Figure 3 gives this neural flip-flop memory.


Fig. 3. Flip-flop neural memory given by formal neurons (eqs. 17c-d), with input neurons x1, x2 and output neurons y1, y2

Let us show that it is possible to build a one bit neural memory with only one neuron with three inputs.

By the method described above, we can write the following algebraic equations:

y1 = 1 − x1·y2   (18a)
y2 = 1 − x2·y1   (18b)

If we put the second equation into the first one,

y1 = 1 − x1·(1 − x2·y1)   (18c)

y1 must be computed as a function of the two inputs x1 and x2 and of itself, y1.

When x2 = 0, then y1 = 1 − x1: y1 = 0 with x1 = 1, and y1 = 1 with x1 = 0.

When x1 = 0, then y1 = 1 for x2 = 0 and x2 = 1.

When x1 = x2 = 1, then y1 = y1.

This equation can be represented by only one neuron with three inputs and one bit of memory:

y1 = Γ(−2·x1 + x2 + y1 + 1)   (18d)

Figure 4 gives the neural one-bit memory with only one neuron.


This neuron can be described by the hyperincursive Boolean table:

x1  x2  y1
0   0   1
0   1   1
1   0   0
1   1   y1

From the method described above, we can obtain the algebraic equation

y1 = 1 − x1·(1 − x2) − x1·x2·(1 − y1) = 1 − x1·(1 − x2·y1)   (19)

which represents a NAND embedded in a NAND.
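The hyperincursive table can be verified by iterating eq. 18d; the sketch below (illustrative only) checks all four input lines for both memory states:

```python
def gamma(x):
    return 1 if x > 0 else 0

def memory_step(x1, x2, y1):
    """One-neuron memory, eq. (18d): y1 = Gamma(-2*x1 + x2 + y1 + 1)."""
    return gamma(-2 * x1 + x2 + y1 + 1)

for y1 in (0, 1):
    assert memory_step(0, 0, y1) == 1      # line (0,0): output 1
    assert memory_step(0, 1, y1) == 1      # line (0,1): output 1
    assert memory_step(1, 0, y1) == 0      # line (1,0): output 0
    assert memory_step(1, 1, y1) == y1     # line (1,1): holds its own state
```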

Fig. 4. One-bit memory with one formal neuron (eq. 18d), with input neurons x1, x2 and output neuron y1

It is also possible to build a neural memory with weights equal to +1 and −1 and a null threshold, as follows:

x1  x2  y1
0   0   y1
0   1   0
1   0   1
1   1   y1


y1 = Γ(x1 − x2 + y1)   (20)

Let us show now that it is possible to build an infinite stack memory with only two neurons.

5. Hyperincursive Stack Memory With Chaos

In this section, a new model of a neural memory is represented by a stack of binary input data embedded in a floating point variable through a hyperincursive process based on the Pearl-Verhulst chaotic map x(t+1) = 4μ·x(t)·(1 − x(t)) [8].

Theoretical and experimental works enhance the validity of such an approach. Von Neumann [19] suggested that brain dynamics is based on hybrid digital-analogical neurons. I proposed a fractal model of neural systems based on the Pearl-Verhulst map [3, 4]. A non-linear threshold logic was developed from this chaos fractal neuron [7, 10] in relation to the McCulloch and Pitts formal neuron [15]. Experimental analyses of nervous systems show fractal chaos [12, 16]. Neural systems can be modelled as automata [20]. My model of a stack memory could be applied in the framework of symbolic dynamics and coding [13].

The Pearl-Verhulst map in the chaotic zone (μ = 1) can be transformed to the quasi-linear map X(t+1) = 1 − abs(1 − 2X(t)), where abs means the absolute value. This simple model was proposed for simulating neural chaos [4].

Let us consider the incursive map

X(t) = 1 − abs(1 − 2X(t+1))   (21)

where the iterate X(t) at time t is a function of its iterate at the future time t+1, and t is an internal computational time of the system. Such a relation can be computed in the backward direction T, T−1, T−2, ..., 2, 1, 0, starting with a "final condition" X(T) defined at the future time T, which can be related to the Aristotelian final cause. This map can be transformed to the hyperrecursive map

1 − 2X(t+1) = ±(1 − X(t)), so X(t+1) = [1 ± (X(t) − 1)]/2   (22)

Defining an initial condition X(0), each successive iterate X(t+1), t = 0, 1, 2, ..., T, gives rise to two iterates due to the double sign ±. So at each step the number of values increases as 1, 2, 4, 8, ... In view of obtaining a single trajectory, it is necessary to make a choice of the sign at each step. For that, let us define a control function u(T−t) given by a sequence of binary digits 0, 1, so that the variable

sg = 2u(t) − 1 for t = 1, 2, ..., T   (23)


is −1 for u = 0 and +1 for u = 1. Replacing eq. 23 in eq. 22, we obtain

X(t+1) = [1 + (1 − 2u(t+1))·(X(t) − 1)]/2 = X(t)/2 + u(t+1) − X(t)·u(t+1)   (24)

which is a hyperincursive process, because the computation of X(t+1) at time t+1 depends on X(t) at time t and on u(t+1) at the future time t+1. Equation 24 is a soft algebraic map generalising the exclusive OR (XOR) defined in Boolean algebra, y = x1 + x2 − 2·x1·x2, where x1 and x2 are the Boolean inputs and y the Boolean output. Indeed, in the hybrid system 24, X(t) is a floating point variable and u(t) a digital variable.

Starting with the initial condition X(0) = 1/2, this system can memorise any given sequence u of any length. The following table gives the successive values of X for all the possible sequences with 3 bits.

The table gives the successive values of X for each sequence u as rational and floating point numbers. The number of decimal digits increases in a linear way (one bit of the sequence corresponds to one decimal digit of X). The last digit 5 corresponds to the initial condition X(0) = 0.5, and the two last digits, 25 or 75, give the parity check of the sequence. The time step t is directly related to the number of digits: with t = 0, 1, 2, 3 there are 4 digits. Looking at the successive increasing values of the floating point values of X, we see that the corresponding sequences u represent the Gray code. Contrary to the binary code, the Gray code changes only one bit under unitary addition. The numerator of each ratio is two times the floating point representation of the Gray code of the sequence u, plus one. With the Gray code, we can construct the Hilbert curve, which fills the two-dimensional space: the fractal dimension is D_H = 2. This is not possible with the Cantor set, which gives discontinuities in two directions of the space [17].

u    X(1)        X(2)        X(3)
000  1/4  0.25   1/8  0.125   1/16  0.0625
100  3/4  0.75   3/8  0.375   3/16  0.1875
110  3/4  0.75   5/8  0.625   5/16  0.3125
010  1/4  0.25   7/8  0.875   7/16  0.4375
011  1/4  0.25   7/8  0.875   9/16  0.5625
111  3/4  0.75   5/8  0.625   11/16 0.6875
101  3/4  0.75   3/8  0.375   13/16 0.8125
001  1/4  0.25   1/8  0.125   15/16 0.9375

The neuron is an analogical device which shows digital spikes: the analogical part of the neuron is given by the floating point values X and the values of the spikes are given by the digital sequence u.

The analogical coding X of digital information of the spikes u is then a learning process which creates a fractal memory.


Now, let us show how it is possible to recover the digital information u(t), t = 1, 2, 3, ..., T, from the analogical final value X(T) of the neuron (T = 3 in our example). With our method, the coding of an image leads to an inverse image, so that the image is reconstructed without inversion. The inversion of the sequence has some analogy with the inversion of the image received by the eyes. The decoding of a sequence u from the final value X(T) can be made by relation 21 for t = T−1, T−2, ...:

X(t) = 1 − abs(1 − 2X(t+1))   (25)

Let us take an example. Starting with the final value X(T=3) = 0.5625, we compute successively X(2) = 1 − abs(1 − 2×0.5625) = 1 − 0.125 = 0.875, X(1) = 1 − abs(1 − 2×0.875) = 1 − 0.75 = 0.25, X(0) = 1 − abs(1 − 2×0.25) = 0.5.

The sequence is then given by

u(t+1) = (2X(t+1)) div 1   (26)

where div is the integer division: u(3) = (2×0.5625) div 1 = 1, u(2) = 1, u(1) = 0. The neuron will continue to show spikes 1, 0, 1, 0, 1, 0, ...

It is well known that neurons are oscillators which always present pulsations; the coding of information is a phase modulation of these pulsations [4].

Taking the formal neuron of McCulloch and Pitts [15], eq. 26 can be replaced by

u(t+1) = Γ(X(t+1) − 0.5)   (27)

for which u = 1 if X ≥ 0.5 and u = 0 otherwise.

As we can compute u(t) from X(t), it is possible to compute eq. 25 in the following way

X(t) = 2X(t+1) + 2u(t+1) − 4X(t+1)·u(t+1)   (28)

which is also a soft computation of XOR.
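The coding and decoding can be put together in a few lines; the following sketch (an illustration of eqs. 24-28, with function names of our own choosing) stores and retrieves the 3-bit sequence 011 used in the example above:

```python
def encode(bits):
    """Hyperincursive coding, eq. (24): fold bits u(1..T) into X(T)."""
    X = 0.5                                  # initial condition X(0) = 1/2
    for u in bits:
        X = X / 2 + u - X * u                # soft XOR of X and u
    return X

def decode(X, T):
    """Incursive retrieval, eqs. (25)-(28), run backward from X(T)."""
    bits = []
    for _ in range(T):
        u = 1 if X >= 0.5 else 0             # eq. (27): Heaviside at 0.5
        bits.append(u)
        X = 2 * X + 2 * u - 4 * X * u        # eq. (28): previous value X(t)
    return list(reversed(bits))

msg = [0, 1, 1]
X = encode(msg)            # 0.5625, as in the table (row u = 011)
assert decode(X, len(msg)) == msg
```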

So, to retrieve the message embedded in the stack memory by the soft XOR relation 24, a similar soft XOR relation 28 is used. The following Figure 5a-b gives a possible neural network for the stack memory.


Fig. 5a, b. (a) The neuron NM represents the soft XOR eq. 24 for the coding of the sequence u(T−t), giving X(T); (b) the neuron NU is a McCulloch and Pitts neuron given by eq. 27 computing the ordered sequence u(t), and the neuron NX represents the soft XOR eq. 28, giving X(t) starting from the final state X(T) coming from the neuron NM

It is a property of XOR that addition and subtraction are the same operator. Here the soft XOR, given by a non-linear algebraic relation, has the same property in a generalised way.

This neural stack memory can be designed with formal neurons σ(x) (given by definition 14) from the non-linear eqs. 24 and 28.

It must also be pointed out that formal neurons with non-linear arguments can be split into a set of formal neurons with linear arguments, as was shown at the beginning of this paper.

The purpose of this paper was to show that the design of neural networks is much easier from non-linear equations. The two main components of a computing machine are computation and memory. This paper is a first attempt at the design of a Soft Computing Machine beyond the Turing Machine.

6. Conclusion

This short paper showed that non-linear digital equations can be easily built from Boolean tables. These equations are Heaviside Fixed Functions which can be used to generate directly neural networks with McCulloch and Pitts formal neurons. The parity problem has an elegant solution given by a fractal neural network with a self-similar architecture. The hidden neurons are shown to be in competition. The non-linear equation of exclusive OR is a non-linear neuron which gives rise to fractal chaos.

From this fractal chaos map, a stack memory can be built for automata such as neural systems. A hyperincursive control of a fractal chaos map is used for embedding input information in the state variable of the memory. The input sequence is given by a digital variable and the memory is represented by an analogical variable. The analogical variable is represented in floating point. With a classical computer, the number of decimal digits is limited, so that we must code decimal expansions of great length by strings. The actual neuron could be an analogical device working only with strings.

In this way, such a neural system with hyperincursive stack memory could help in the design of a Hyper Turing Machine performing soft computing.

References

1. Delahaye J.-P. [1995], Logique, informatique et paradoxes. Belin.

2. Dubois D. [1990], Le labyrinthe de l'intelligence: de l'intelligence naturelle à l'intelligence fractale, InterEditions/Paris, Academia/Louvain-la-Neuve.

3. Dubois D. [1992], The Fractal Machine, Presses Universitaires de Liège.

4. Dubois D.M. [1992], "The Hyperincursive Fractal Machine as a Quantum Holographic Brain", CC AI, Communication and Cognition - Artificial Intelligence, vol. 9, number 4, pp. 335-372.

5. Dubois D.M. [1995], "Modelling of Fractal Neural Networks". In Proceedings of the 14th International Congress on Cybernetics, Namur (Belgium), 21st-25th August 1995, published by the International Association for Cybernetics, pp. 405-410.

6. Dubois D.M. [1996], "Introduction of the Aristotle's Final Causation in CAST: Concept and Method of Incursion and Hyperincursion". In F. Pichler, R. Moreno Díaz, R. Albrecht (Eds.): Computer Aided Systems Theory - EUROCAST'95. Lecture Notes in Computer Science, 1030, Springer-Verlag, Berlin, pp. 477-493.

7. Dubois D.M. [1996], "A Semantic Logic for CAST related to Zuse, Deutsch and McCulloch and Pitts Computing Principles". In F. Pichler, R. Moreno Díaz, R. Albrecht (Eds.): Computer Aided Systems Theory - EUROCAST'95. Lecture Notes in Computer Science, 1030, Springer-Verlag, Berlin, pp. 494-510.

8. Dubois D.M. [1996], "Hyperincursive Stack Memory in Chaotic Automata". In A.C. Ehresmann, G.L. Farre, J.-P. Vanbremeersch (Eds.): Actes du Symposium ECHO, Amiens, 21-23 Août 1996, Université de Picardie Jules Verne, pp. 77-82.

9. Dubois D., Resconi G. [1992], Hyperincursivity: A New Mathematical Theory, Presses Universitaires de Liège.

10. Dubois D.M., Resconi G. [1993], Mathematical Foundation of a Non-linear Threshold Logic: a new Paradigm for the Technology of Neural Machines, Académie Royale de Belgique, Bulletin de la Classe des Sciences, Tome IV, 1-6, pp. 91-122.

11. Hirschfelder R., Hirschfelder J. [1991], Introduction to Discrete Mathematics. Brooks/Cole Publishing Company, Pacific Grove, California.

12. King C.C. [1991], "Fractal and Chaotic Dynamics in Nervous Systems", Progress in Neurobiology, vol. 36, pp. 279-308.

13. Lind D., Marcus B. [1995], Symbolic Dynamics and Coding, Cambridge University Press.

14. Mandelbrot B. [1983], The Fractal Geometry of Nature. Freeman, San Francisco.

15. McCulloch W.S., Pitts W. [1943], "A logical calculus of the ideas immanent in nervous activity", Bulletin of Mathematical Biophysics, vol. 5, pp. 115-133.

16. Schiff S.J. et al. [1994], "Controlling chaos in the brain", Nature, vol. 370, pp. 615-620.

17. Schroeder M. [1991], Fractals, Chaos, Power Laws, W.H. Freeman and Company, New York.

18. Volterra V. [1931], Leçons sur la théorie mathématique de la lutte pour la vie. Gauthier-Villars.

19. Von Neumann J. [1996], L'ordinateur et le cerveau, Champs, Flammarion.

20. Weisbuch G. [1989], Dynamique des systèmes complexes: une introduction aux réseaux d'automates, InterEditions/Editions du CNRS.


Using Competitive Learning Models for Multiple Prototype Classifier Design

James C. Bezdek, Sok Gek Lim and Thomas Reichherzer

Department of Computer Science, University of West Florida, Pensacola, FL 32514, USA [email protected]

Abstract. First, three competitive learning models are reviewed: learning vector quantization, fuzzy learning vector quantization, and a deterministic scheme called the dog-rabbit (DR) model. These models can be used with labeled data to generate multiple prototypes for classifier design. These three models are then compared to three methods that are not based on competitive learning: a clumping method due to C.L. Chang; a new modification of C.L. Chang's method; and a derivative of the batch fuzzy c-means algorithm due to Yen and C.W. Chang. The six multiple prototype methods are then compared to the sample-mean based nearest prototype classifier using the Iris data. All six multiple prototype methods yield lower error rates than the labeled subsample means classifier (which yields 11 errors with 3 prototypes). The modified Chang's method is, for the Iris data and processing protocols used in this study, the best of the six schemes in one sense: it finds 11 prototypes that yield a resubstitution error rate of 0. In a different sense, the DR method is best, yielding a classifier that commits only 3 errors with 5 prototypes.

Keywords. Competitive learning, editing data, Iris data, modified fuzzy c-means, multiple prototypes, supervised learning.

1. Introduction

Perhaps the most basic idea in pattern recognition is the class label. There are four types of labels: crisp, fuzzy, probabilistic and possibilistic. Let the integer c denote the number of classes, 1 < c < n, and define three sets of label vectors in ℜ^c as follows:

N_pc = {y ∈ ℜ^c : y_i ∈ [0, 1] ∀ i; y_i > 0 for some i}   (1a)

N_fc = {y ∈ N_pc : Σ_{i=1}^{c} y_i = 1}   (1b)

N_hc = {y ∈ N_fc : y_i ∈ {0, 1} ∀ i} = {e_1, e_2, ..., e_c}   (1c)



Figure 1.1 depicts these sets for c = 3. N_hc is the canonical (unit vector) basis of Euclidean c-space. The i-th vertex of N_hc, e_i = (0, 0, ..., 1, ..., 0)^T, where the 1 occurs in the i-th position, is the crisp label for class i, 1 ≤ i ≤ c. N_fc, a piece of a hyperplane, is the convex hull of N_hc. The vector y = (0.1, 0.6, 0.3)^T is a fuzzy or probabilistic label; its entries lie between 0 and 1, and sum to 1. The interpretation of y depends on its origin. If y is a label vector for some x ∈ ℜ^p generated by, say, the fuzzy c-means clustering method, we call y a fuzzy label for x. If y came from a method such as maximum likelihood estimation in mixture decomposition, y would be a probabilistic label. N_pc, the unit hypercube in ℜ^c excluding the origin, contains possibilistic labels such as z = (0.7, 0.2, 0.7)^T. Note that N_hc ⊂ N_fc ⊂ N_pc.

Fig. 1.1. Label vectors for c = 3 classes (the possibilistic label z = (0.7, 0.2, 0.7)^T lies in N_pc)

Object data are represented as X = {x_1, x_2, ..., x_n} in feature space ℜ^p. The k-th object (a ship, patient, stock market report, pixel, etc.) has x_k as its numerical representation; x_jk is the j-th characteristic (or feature) associated with object k.

Examples of (batch) alternating optimization (AO) algorithms that generate each of the four kinds of labels, as well as a set V = {v_1, v_2, ..., v_c} ⊂ ℜ^p of prototypes (or centers) for clusters in X from unlabeled object data, are:

Label   Model / AO Algorithm        Reference
Crisp   Crisp c-means / HCM         Duda and Hart [1]
Fuzzy   Fuzzy c-means / FCM         Bezdek [2]
Prob.   Statistical mixture / EM    Titterington et al. [3]
Poss.   Poss. c-means / PCM         Krishnapuram and Keller [4]

A classifier is any function D: ℜ^p → N_pc; it specifies c decision regions in ℜ^p. Training a classifier means identifying the parameters of D if it is explicit, or representing the boundaries of D algorithmically if it is implicit. The value y = D(z) is the label vector for z in ℜ^p. D is a crisp classifier if D[ℜ^p] = N_hc. New, unlabeled object data that enter feature space after crisp decision regions are defined simply acquire the label of the region they land in. If the classifier is fuzzy, probabilistic or possibilistic, the labels y = D(z) assigned to object vectors z during the operational (i.e., classification) phase are almost always converted to crisp ones through hardening of y with the function

H(y) = e_i ⟺ δ_E(y, e_i) ≤ δ_E(y, e_j) ∀ j   (2)

In (2), δ_E(y, e) = ||y − e|| = √((y − e)^T (y − e)) is Euclidean distance, and if ties occur, they are resolved arbitrarily. If the design data are labeled (that is, if we have training data that possess class label vectors in N_pc), finding D is called supervised learning. In supervised classifier design, X is usually crisply partitioned into a design (or training) set X_tr with label matrix L_tr, and a test set X_te = (X − X_tr) with label matrix L_te. Columns of L_tr and L_te are label vectors in N_pc.
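For concreteness, a minimal sketch of the hardening function (2) is given below (illustrative only; the function name harden is our own):

```python
import numpy as np

def harden(y):
    """Hardening H of eq. (2): snap a soft label to the nearest crisp vertex e_i."""
    c = len(y)
    E = np.eye(c)                                    # crisp labels e_1..e_c
    distances = [np.linalg.norm(y - E[i]) for i in range(c)]
    return E[int(np.argmin(distances))]              # ties resolved by first index

y = np.array([0.1, 0.6, 0.3])    # the fuzzy label used in Section 1
print(harden(y))                  # [0. 1. 0.] -> class 2
```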

Testing a classifier D designed with X_tr means finding its error rate (or estimated probability of misclassification). The standard method for doing this is to submit X_te to D and count mistakes (L_te must have crisp labels for data in X_te in order to do this). This yields the apparent error rate E_D(X_te | X_tr); our notation indicates that D was trained with X_tr and tested with X_te. E_D is often the performance index by which D is judged, because it measures the extent to which D generalizes to the test data.

The error rate E_D(X | X) is called the resubstitution error rate. Resubstitution uses the same data for training and testing, so it usually produces an optimistic error rate. That is, E_D(X | X) is not as reliable as E_D(X_te | X_tr) for assessing generalization, but this is not an impediment to using E_D(X | X) as a basis for comparison of different designs. Moreover, unless n is very large compared to p and c (an often used rule of thumb is n ≥ 100pc), the credibility of either error rate is questionable. The data used in our examples does not justify worrying about the difference between E_D(X | X) and E_D(X_te | X_tr).

Classifier performance is largely dependent on the quality of X_tr. If X_tr is large enough and its substructure is well delineated, we expect classifiers trained with it to yield small error rates. On the other hand, when the training data are large in dimension p and/or number of samples n, classifiers such as the k-nearest neighbor rule D_k-nn [5] can require too much storage and CPU time for efficient deployment. To circumvent time and storage problems caused by very large data sets, many authors have studied ways to edit the training data (cf. [5], ch. 6).

The two most common editing schemes are selection, which means: throw away as many points in X_tr as possible without significantly increasing E_Dk-nn(X_te | X_tr). And replacement, which means: derive from the given data a new, smaller set of labeled data, usually labeled prototypes V, that can be used as a substitute for X_tr (e.g., in the nearest neighbor rule) without appreciable degradation in E_Dk-nn(X_te | X_tr). In this case D_k-nn becomes a nearest prototype design with error rate E_D(X_te | V).

Fig. 1.2. Editing by selection of labeled data in X_tr

Figure 1.2 depicts the idea of selection. The density of labeled data over each cluster in the left side of Figure 1.2 is high. A selected subset (or skeleton) of the original data is shown on the right. This approach has many variants, and is well summarized in Devijver and Kittler [6]. The aim is to condense X_tr whilst approximately preserving the shape of the decision boundaries set up by training D with it.

Figure 1.3 illustrates the other approach to editing the same data. In this scheme X_tr is replaced by V, a set of labeled prototypes for class 1 (○) and class 2 (□). Kohonen's [7] self-organizing feature map (SOFM) is one very good way to accomplish replacement as depicted in Fig. 1.3.


Fig. 1.3. Replacing X_tr with multiple prototypes made from it

We will give a numerical example using the Iris data with six multiple prototype generation schemes: (i) Kohonen's learning vector quantization (LVQ), which is a special case of the SOFM [7]; (ii) a new family of fuzzy LVQ models due to Karayiannis et al. [8] called GLVQ-F (a modification of the generalized learning vector quantization (GLVQ) model of Pal et al. [9]); (iii) the deterministic dog-rabbit (DR) model of Lim et al. [10, 11]¹; (iv) a deterministic hierarchical clumping model due to Chang [12]; (v) our modification of (iv); and (vi) Yen and Chang's modification of the batch FCM model that enables it to seek multiple prototypes.

2. Nearest Prototype Classifiers

Synonyms for the word prototype include: vector quantizer (VQ), signature, template, codevector, paradigm, centroid, exemplar. There are many approaches to prototype generation. A non-exhaustive list includes sequential competitive learning models such as crisp (adaptive) c-means [1], LVQ [7], GLVQ-F [8], GLVQ [9], the DR model [10, 11], and probabilistic schemes such as the soft competition scheme of Yair et al. [13]. Batch prototype generator models include crisp and fuzzy c-means [2]; possibilistic c-means [4]; statistical models such as mixture decomposition [3]; and VQ approaches such as the generalized Lloyd algorithm [14].

The common denominator in most prototype generation schemes is a mathematical definition of how well prototype v_i represents a crisp subset X_i of X. Any measure of similarity on ℜ^p can be used. The usual choice is distance (dissimilarity); the most convenient is squared Euclidean distance. Local methods attempt to optimize some function of the c squared distances {||x_k − v_i||² : 1 ≤ i ≤ c} at each x_k in X_i. Global methods seek extrema of some function of all c·n distances {||x_k − v_i||² : 1 ≤ i ≤ c and 1 ≤ k ≤ n}.²

¹ In [10, 11], the input vectors are called "rabbits", and the prototypes trying to catch them are the "dogs"; hence DR for dog-rabbit.

Once the prototypes V are found (and possibly relabeled if the data have physical labels), they can be used to define a crisp nearest prototype (1-np) classifier, say D_{V,1}:

The Nearest Prototype (1-np) Classifier. Given any c prototypes V = {v_i : 1 ≤ i ≤ c}, so that there is one v_i per class, and any dissimilarity measure δ on ℜ^p: for any z ∈ ℜ^p,

Decide z ∈ class i ⟺ D_{V,1}(z) = e_i ⟺ δ(z, v_i) ≤ δ(z, v_j) ∀ j   (3)

Ties in (3) are arbitrarily resolved. The crisp 1-np design can be implemented using prototypes from any algorithm that produces them. Equation (3) defines a crisp classifier, even when V comes from a fuzzy, probabilistic or possibilistic algorithm. It would be careless to call $D_{V,\delta}$ a fuzzy classifier, for example, just because fuzzy c-means produced V.

The geometry of $D_{V,\delta}$ is shown in Fig. 2.1, using $\delta_E$ for $\delta$ in (3). This 1-np design erects a linear boundary between the i-th and j-th prototypes, viz., the hyperplane HP through the midpoint of, and perpendicular to, $(v_i - v_j)$. Figure 2.1 illustrates the labeling decision represented in equation (3); z is assigned to class i because it is closest to the i-th prototype. Be careful not to confuse nearest prototypes, which are new vectors made from the data, with nearest neighbors, which are labeled points in the data.

All 1-np designs that use inner product norms erect (piecewise) linear decision boundaries. Thus, the geometry of 1-np classifier boundaries is fixed by the way distances are measured in the feature space, and not by geometric properties of the model that produces the cluster prototypes. The location in $\Re^p$ of the prototypes determines the location and orientation of the c(c-1)/2 hyperplanes that separate each pair of prototypes. The geometry of the prototypes does depend on both the clustering model and the data used to produce them. Hence, 1-np designs based on different prototype generating schemes can certainly be expected to yield different performance as 1-np classifiers, even though they all share the same type of decision surface structure.

When one or more classes have multiple prototypes as shown in Fig. 1.3, there are two ways to extend the 1-np design. We can simply use equation (3), recognizing that V contains more than one prototype for at least one of the c classes. Or we can extend the 1-np design to a k-np rule, wherein the k nearest prototypes are used to conduct a vote about the label that should be assigned to input z. This amounts to operating the k-nn rule using prototypes (points built from the data) instead of neighbors (points in the data). We opt here for the simpler choice, which is formalized as the (1-nmp) design.

² Don't confuse our use of the terms local and global methods with the local and global extrema found by a particular method.

[Diagram: prototypes $v_i$ and $v_j$ with the separating hyperplane HP through the midpoint of, and perpendicular to, $(v_i - v_j)$; the input z lies on the class i side.]

Fig. 2.1. Geometry of the 1-np classifier for the Euclidean norm

The Nearest Multiple Prototype (1-nmp) Classifier. Given any c prototypes $V = \{v_{ij} : 1 \le i \le \bar{c};\ 1 \le j \le np_i\}$, where $np_i$ is the number of prototypes for class i and $c = \sum_{j=1}^{\bar{c}} np_j$, and any dissimilarity measure $\delta$ on $\Re^p$: for any $z \in \Re^p$:

Decide $z \in$ class $i \Leftrightarrow D_{V,\delta}(z) = e_i \Leftrightarrow \exists\, s \in \{1, \ldots, np_i\}$ such that $\delta(z, v_{is}) \le \delta(z, v_{jt})$ for all $j \ne i$ and $1 \le t \le np_j$.   (3')

As in (3), ties in (3') are resolved arbitrarily. We use the same notation for the 1-np and 1-nmp classifiers, relying on context to identify which one is being discussed. Now we turn to methods for finding multiple prototypes.
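To make the decision rules concrete, here is a minimal sketch of (3) and (3') in Python; the function name is ours, and the Euclidean norm is used for $\delta$ as in our examples:

```python
import numpy as np

def nearest_prototype_label(z, prototypes, labels):
    # 1-np / 1-nmp rule of (3) and (3'): assign z the label of its closest
    # prototype. With one prototype per class this is the 1-np rule; with
    # several prototypes per class it is the 1-nmp rule. np.argmin resolves
    # ties arbitrarily (first minimum wins), as the text allows.
    d = np.linalg.norm(prototypes - z, axis=1)  # delta(z, v) for every prototype
    return labels[np.argmin(d)]

# Example: two prototypes carry label 1 (a multiple-prototype class), one label 2
V = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
L = np.array([1, 1, 2])
print(nearest_prototype_label(np.array([0.6, 0.1]), V, L))  # prints 1
```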

3. Sequential Competitive Learning (CL) Models

The primary goal for CL models in this article is to portray the input data by a much smaller number of prototypes that are good representatives of structure in the data for classifier design³. Identification of clusters is implicit, but not active, in pursuit of this goal.

The salient features of a general CL model are contained in Fig. 3.1. The input or fanout layer is connected directly to the output layer. The circles in Fig. 3.1 are sometimes called nodes, and the prototypes are then called node weights. In this context the p components $\{v_{ij}\}$ of $v_i$ are often regarded as weights or connection strengths of the edges that connect the p inputs to node i. The prototypes $V = (v_1, v_2, \ldots, v_c)$, $v_i \in \Re^p$ for $1 \le i \le c$, are the (unknown) vector quantizers we seek. The norm used in competitive layer nodes is most typically Euclidean, but there is no overpowering reason to restrict the measure of distance this way.

[Diagram: an input (fanout) layer fully connected to the competitive output layer.]

Fig. 3.1. The general CL network

³ Prototypes that are good for classifier design are not necessarily the same (even in form) as those that are used for other purposes. For example, prototypes good for compression, transmission and reconstitution of images may be quite poor as representatives of classes for the purpose of pixel labeling in the same image.


Sequential CL models update estimates of the $\{v_i\}$ at each of the n input events during pass t (one iteration is one pass through X). Upon presentation of an $x_k$ from X, the general form of the update equation is:

$v_{i,t} = v_{i,t-1} + \alpha_{ik,t}(x_k - v_{i,t-1}), \quad 1 \le i \le c.$   (4)

In (4) $\{\alpha_{ik,t}\}$ is the learning rate distribution over the c prototypes for input $x_k$ during iterate t. When $x_k$ is submitted to this network, distances are computed between it and each $v_j$. The output nodes "compete", a (minimum distance) winner node, say $v_i$, is found; and finally, it and possibly other prototypes are then updated using one of many update rules that are most often of the form (4). There are three cases:

(i) Only $v_i$ is updated (winner take all; e.g., LVQ)

(ii) Some $v_i$'s are updated (elite updates; e.g., SOFMs)

(iii) Every $v_i$ is updated (all share updates; e.g., GLVQ-F, DR)

The prototypes that get updated (the update neighborhood) depend on the model chosen, and the update neighborhood can be embedded in the definition of the learning rates for a particular model. A template that can be used for many CL models is given in Table 3.1.
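As an illustration of the template, a minimal sketch of the training phase follows; `learning_rates` stands for the [Special Choices] box and is supplied by the particular model (LVQ, GLVQ-F or DR). All names are ours:

```python
import numpy as np

def cl_training(X, V0, learning_rates, T=100, eps=1e-3, rng=None):
    # General sequential CL loop of Table 3.1. learning_rates(x, V, t)
    # returns the distribution {alpha_ik,t} over the c prototypes for
    # input x; the prototype update is equation (4).
    rng = rng if rng is not None else np.random.default_rng()
    V = V0.copy()
    for t in range(1, T + 1):
        V_prev = V.copy()
        for x in X[rng.permutation(len(X))]:   # one pass through X = one iteration
            alpha = learning_rates(x, V, t)    # shape (c,)
            V += alpha[:, None] * (x - V)      # v_i += alpha_i (x_k - v_i)
        if np.abs(V - V_prev).sum() <= eps:    # 1-norm termination measure E_t
            break
    return V
```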

Several points need to be made about Table 3.1. First, notice that the data are unlabeled. It may be that you have labeled data, and want to use them to design a prototype classifier (as we do here). There are many ways to use the labels towards this end. In Section 4 the labels for the points in X will be used after the training phase is completed to assign a physical label to each prototype. This necessitates having a scheme for prototype labeling; ours is discussed in Section 4. Different ways to use the labels in the context of CL models such as LVQ1-LVQ3 are discussed in [17].

Other methods often use the labels during training, an approach called supervised learning. In this scheme the labels are used to help guide the algorithm towards good prototypes that are physically labeled during training. An example of this is the scheme employed by the Chang and modified Chang clumping methods discussed in Section 6.

Second, the [Special Choices] box in Table 3.1 refers to additional specifications that are needed in order to make the general template into a particular algorithm. We will discuss these choices for the three models in Section 4, and will recreate Table 3.1 as Table 4.2 to exhibit the protocols used in the examples to follow.

Finally, the optional labeling phase at the bottom of Table 3.1 produces n crisp label vectors in the set $N_{hc}$. If we array these as column vectors they form a $c \times n$ array, say $U = [u_1 \cdots u_k \cdots u_n] = [u_{ik}]$, where $u_k$ denotes the k-th column of U.


These columns are the crisp labels generated for the data in X by equation (3). They are usually⁴ a crisp c-partition of X.

Table 3.1. The general CL algorithm for unlabeled data

Training phase

Store   Unlabeled object data $X = \{x_1, x_2, \ldots, x_n\} \subset \Re^p$

Pick    number of nodes: $1 < c < n$ (rule of thumb: $c \le \sqrt{n}$)
        max. # of iterations: T
        termination measure: $E_t = \|V_t - V_{t-1}\|$
        termination criterion: $\varepsilon$

Get     initial cluster centers $V_0 \in \Re^{cp}$

$t \leftarrow 0$
DO UNTIL ($t = T$ or $E_t \le \varepsilon$):
    For k = 1 to n:
        $x_k \leftarrow x \in X$;  $X \leftarrow X - \{x_k\}$
        Get learning rates $\{\alpha_{ik,t}\}$ with [Special Choices]
        $v_{i,t} = v_{i,t-1} + \alpha_{ik,t}(x_k - v_{i,t-1})$
    Next k
    Increment t
END UNTIL
$V \leftarrow V_t$

Optional labeling phase with 1-np rule at (3):
    $u_{ik} = 1$ if $\|x_k - v_i\| < \|x_k - v_j\|$ for $1 \le j \le c$, $j \ne i$; otherwise $u_{ik} = 0$; $\forall\, i, k$. Resolve ties arbitrarily.

CL models are not explicitly designed to find good clusters in the sense that partitions of the data are never examined during the training phase. Consequently clusters built "after the fact" by approaches such as (3) may or may not be satisfactory in the sense of partitioning X for substructure.

⁴ Usually, because there is no guarantee that each of the c classes defined by the prototypes has at least one point in it.


4. The LVQ, GLVQ-F and DR Models

The learning rate distribution for LVQ is well known:

$\alpha_{ik,t}^{LVQ} = \begin{cases} \alpha_t, & i = \arg\min_r \{\|x_k - v_{r,t-1}\|\} \\ 0, & \text{otherwise};\ r = 1, 2, \ldots, c,\ r \ne i \end{cases}$   (5)

In (5) $\alpha_t$ is usually initialized at some value in (0, 1), and decreased nonlinearly with t.

The model underlying GLVQ-F contains LVQ as a subcase and is discussed extensively in [8]. GLVQ-F is based on minimizing a sum of squared errors associated with replacing the unlabeled data set $X_{tr}$ by the c prototypes V. The function to be minimized is

$L(x_k; V) = \sum_{r=1}^{c} u_r \|x_k - v_r\|^2 = \sum_{r=1}^{c} \left[ \sum_{s=1}^{c} \left( \frac{\|x_k - v_r\|}{\|x_k - v_s\|} \right)^{2/(m-1)} \right]^{-1} \|x_k - v_r\|^2, \quad m > 1.$   (6)

In (6) the vector $u = (u_1, u_2, \ldots, u_c)^T \in N_{fc}$ is a fuzzy label vector; its entries are the memberships of $x_k$ in each of the c classes represented by the prototypes V. The real number m > 1 in (6) is a parameter which affects the quality of representation by, and the speed of termination of, the GLVQ-F algorithm, which is just steepest descent applied to the function in (6). The GLVQ-F update rule for the prototypes V at iterate t in the special (and simple) case m = 2 uses the following learning rate distribution in equation (4):

$\alpha_{ik,t}^{GLVQ\text{-}F} = \alpha_t\, u_{ik,t-1}^2.$   (7)

As in (5), $\alpha_t$ in (7), now one factor of the learning rates $\{\alpha_{ik,t}\}$, is usually proportional to 1/t, and the constant (2c) is absorbed in it without loss. Limiting properties of GLVQ-F are [8]: (i) as m approaches infinity, all c prototypes receive equal updates and the $v_i$'s all converge to the grand mean of the data; whereas (ii) as m approaches 1 from above, only the winner is updated, and GLVQ-F reverts to LVQ. Finally, we mention that the winning prototype in GLVQ-F for m = 2 receives the largest fraction of $\alpha_{ik,t}$ at iterate t; that other prototypes receive a share that is inversely proportional to their distance from the input; and that the GLVQ-F learning rates satisfy the additional constraint that $\sum_{i=1}^{c} \alpha_{ik,t} \le 1$.
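For m = 2 the memberships and rates of (6)-(7) can be computed directly; a small sketch (the guard against division by zero is our addition):

```python
import numpy as np

def glvqf_rates(x, V, t, alpha0=0.6, T=1000, eps=1e-12):
    # Learning-rate distribution (7) for GLVQ-F with m = 2: each prototype
    # receives alpha_t * u_i^2, where the u_i are FCM-style memberships of
    # x in the c classes (inversely related to squared distance, sum to 1).
    d2 = np.sum((V - x) ** 2, axis=1) + eps
    u = (1.0 / d2) / np.sum(1.0 / d2)
    return alpha0 * (1.0 - t / T) * u ** 2     # winner gets the largest share
```

Since $\sum_i u_i = 1$, the returned rates automatically satisfy $\sum_i \alpha_{ik,t} \le \alpha_t \le 1$, matching the constraint noted above.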

The last algorithm we chose to compare to Chang's method is the deterministic DR algorithm [10, 11]. The basic idea for our implementation can be found in [10]; an alternate implementation is discussed in [11]. Like GLVQ-F, the DR algorithm may update all c prototypes for each input. Unlike GLVQ-F, the DR algorithm is not based on an optimization problem. Rather, its authors use intuitive arguments to establish the learning rate distribution for update equation (4) that is used by the DR model:

(8)

In (8) A > 0 is a user-specified constant that inhibits movement of the non-winning prototypes towards $x_k$; and $\{f_{ik,t} \ge 1\}$ is a user-specified distribution of fatigue factors for the DR algorithm. In our implementation the fatigue factors are not necessarily updated at the same time across i, and are not functions of iterate number t. Rather, control of these exponents depends on circumstances at individual prototypes. Figure 4.1 illustrates how the learning rates are controlled.

[Diagram: input $x_k$ with fence radius $R_f$; the winning prototype $v_{i,t}$ moves toward $x_k$ while a non-winner $v_{r,t}$ is inhibited.]

Fig. 4.1. Control of learning rates in the DR algorithm


The DR user must specify an initial distribution for the $\{f_{ik,t} \ge 1\}$, and four constants: a rate of change of fatigue M > 0; a maximum fatigue $f_M$; a fence radius $R_f > 0$; and an inhibition constant A > 0. Now suppose $v_{i,t-1}$ to be the winning prototype with $\|x_k - v_{i,t-1}\| > R_f$ as shown in Fig. 4.1. All c prototypes are updated using (8) in (4). Following this, the distance $\|x_k - v_{i,t}\|$ is compared to $R_f$. If $\|x_k - v_{i,t}\| < R_f$, the closest dog is now inside the fence around $x_k$, and is slowed down by increasing its fatigue, $f_{ik,t} \leftarrow f_{ik,t-1} + M$. This inhibits future motion of this prototype a little (relative to the other prototypes), and it also encourages non-winners such as $v_{r,t}$ to look for other data to chase.

When the winning prototype gets very close to a (group of) inputs, we want it to stop moving altogether, so we also check the current value of $f_{ik,t}$ against $f_M$. Movement of (i.e., updating of) the i-th prototype ceases when $f_{ik,t} > f_M$. Thus, termination of updating is done prototype by prototype, and DR stops when all of the prototypes are "close enough", as measured by their fatigues exceeding the maximum, to the set of data for which they are the winner.
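Since the functional form of (8) appears only as a figure in the source, the sketch below reproduces only the fence-and-fatigue control just described; the rate expression inside it (winner about 1/f, non-winners damped by exp(-A·distance)/f) is purely a stand-in with the qualitative properties the text ascribes to (8):

```python
import numpy as np

def dr_fatigue_step(x, V, f, active, A=5.0, M=0.1, f_max=5.0, R_f=0.2):
    # One DR input presentation. NOTE: the alpha expressions below are a
    # hypothetical stand-in for (8), not the published formula; only the
    # fence/fatigue bookkeeping follows the text.
    d = np.linalg.norm(V - x, axis=1)
    i = np.argmin(d)                          # winning prototype ("dog")
    alpha = np.exp(-A * d) / f                # non-winners inhibited by A
    alpha[i] = 1.0 / f[i]                     # winner is not inhibited
    alpha[~active] = 0.0                      # fatigued dogs no longer move
    V = V + alpha[:, None] * (x - V)          # update (4)
    if np.linalg.norm(V[i] - x) < R_f:        # winner inside the fence:
        f[i] += M                             #   increase its fatigue
    active &= (f <= f_max)                    # cease updates past f_M
    return V, f, active
```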

Analysis of the dependency of $\{\alpha_{ik,t}^{DR}\}$ on the parameters $\{\{f_{ik,t}\}, M, f_M, R_f, A\}$ is complicated by the functional form of (8). However, we can say that these rates ensure that the winning prototype receives the largest fraction of $\alpha_{ik,t}$ at iterate t until the winning prototype closes in on its rabbits. At this point, other prototypes may start receiving a larger fraction of the update even though they are non-winners. The DR learning rates do not satisfy any additional constraints.

Unlike Chang's method, none of the CL methods just described uses the labels of points in $X_{tr}$ during training to guide iterates towards a good V. Consequently, at the end of the learning phase the c prototypes have algorithmic labels that may or may not correspond to the physical labels of $X_{tr}$. The relabeling algorithm discussed next uses the labels in $L_{tr}$ to attach the most likely (as measured by a simple percentage of the labeled neighbors) physical label to each $v_i$.

Recall that $\bar{c}$ is the number of classes in $X_{tr}$, labeled by the crisp vectors $\{e_1, e_2, \ldots, e_{\bar{c}}\} = N_{h\bar{c}}$. Now define $p_{ij}$, $i = 1, 2, \ldots, \bar{c}$, $j = 1, 2, \ldots, c$, to be the percentage (as a decimal) of training data from class i closest to $v_j$ via the 1-np rule $D_{V,\delta_E}$. Define the matrix $P = [p_{ij}]$; P has $\bar{c}$ rows in $N_{fc}$ and c columns $p_j$ in $N_{p\bar{c}}$. We assign label $e_i$ to $v_j$ when $H(p_j) = e_i$, where H hardens $p_j$ to the crisp label vector of its maximum entry:

label $v_j \in$ class $i \Leftrightarrow H(p_j) = e_i$;  $i = 1, 2, \ldots, \bar{c}$;  $j = 1, 2, \ldots, c$.   (9)


We illustrate the labeling algorithm at (9). Suppose $X_{tr}$ has $\bar{c} = 3$ classes, labeled with the crisp vectors $\{e_1, e_2, e_3\} = N_{h3}$. Let $V = (v_1, v_2, v_3, v_4)$ be four prototypes found by some algorithm. Let P be the 3 × 4 percentage matrix shown in Table 4.1. Labeling algorithm (9) assigns $v_1$ to class 1, $v_2$ and $v_3$ to class 3, and $v_4$ to class 2.

Table 4.1. Example of the multiple prototype labeling algorithm

         p_1          p_2          p_3          p_4
e_1      0.57         0.10         0.13         0.20
e_2      0.15         0.10         0.15         0.60
e_3       ·            ·            ·            ·
         H(p_1)=e_1   H(p_2)=e_3   H(p_3)=e_3   H(p_4)=e_2
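A sketch of the relabeling rule (9) in code (function and variable names are ours; ties in the 1-np assignments and in the hardening H are resolved arbitrarily by argmin/argmax):

```python
import numpy as np

def relabel_prototypes(Xtr, ytr, V, classes):
    # P[i, j] = fraction of class-i training data whose 1-np winner is
    # prototype j; prototype j then gets the class with the largest entry
    # in column p_j, i.e., the hardening H of rule (9).
    win = np.argmin(np.linalg.norm(Xtr[:, None, :] - V[None, :, :], axis=2), axis=1)
    P = np.zeros((len(classes), len(V)))
    for i, c in enumerate(classes):
        mask = (ytr == c)
        for j in range(len(V)):
            P[i, j] = np.mean(win[mask] == j)
    return classes[np.argmax(P, axis=0)]       # one physical label per prototype
```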

Whenever $c > \bar{c}$, we will have more than one prototype for at least one of the labeled classes, and will use the 1-nmp rule at (3') instead of the 1-np rule at (3). How do we use 1-np and 1-nmp classifiers to compare unsupervised learning algorithms?

The method employed here is to first derive the prototypes V from the data $X_{tr}$ without using the labels (that is, we pretend there are no labels) during the training phase. Then (9) is used to get class labels for the prototypes. Finally, $X_{te}$ is submitted to the classifier and its error rate is computed.

Error rates are conveniently tabulated using the $\bar{c} \times \bar{c}$ confusion matrix $C = [c_{ij}]$ = [# labeled class j, but really class i] that can be constructed during this process. The error rate (in percent) is:

$E = 100 \left( \sum_{i \ne j} c_{ij} \right) / |X_{te}|.$   (10)
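In code, the confusion matrix and (10) are immediate (a small illustrative helper of ours):

```python
import numpy as np

def confusion_and_error(y_true, y_pred, classes):
    # C[i, j] = number of points really in class i but labeled class j;
    # the error rate (10) is the off-diagonal mass as a percentage.
    idx = {c: k for k, c in enumerate(classes)}
    C = np.zeros((len(classes), len(classes)), dtype=int)
    for truth, pred in zip(y_true, y_pred):
        C[idx[truth], idx[pred]] += 1
    return C, 100.0 * (len(y_true) - np.trace(C)) / len(y_true)
```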

A brief specification of LVQ, GLVQ-F for m = 2 and the DR algorithms with the special choices as used in our examples is given in Table 4.2. Initialization of all three algorithms is done the same way, and is referenced to equation (13), which will be discussed in Section 5.


Table 4.2. The LVQ, GLVQ-F and DR algorithms

Store   Labeled $X = X_{tr} \subset \Re^p$ and label matrix $L_{tr}$ of $X_{tr}$
        Euclidean norm, similarity of data to prototypes:
        $\delta_E(x, v) = \|x - v\| = \sqrt{(x - v)^T (x - v)}$

Pick    number of prototypes: $3 \le c \le 30$
        maximum number of iterations: T = 1000
        termination criterion: $\varepsilon$ = 0.1, 0.01 and 0.001
        $E_t = \|V_t - V_{t-1}\|_1 = \sum_{j=1}^{p} \sum_{r=1}^{c} |v_{rj,t} - v_{rj,t-1}|$
        initial learning rate: $\alpha_0$ = 0.6 (LVQ and GLVQ-F)

GLVQ-F only:   weighting exponent: m = 2

DR only:       fatigue distribution: $\{f_{ik,0} = 1 : 1 \le i \le c\}$
               ROC of $\{f_{ik,t}\}$: M = 0.1
               maximum fatigue: $f_M$ = 5
               fence radius: $R_f$ = 0.2
               inhibition factor: A = 5

Get     $V_0 = (v_{1,0}, v_{2,0}, \ldots, v_{c,0}) \in \Re^{cp}$ with (13)

For t = 1 to T:
    For k = 1 to n:
        $x_k \leftarrow x \in X$;  $X \leftarrow X - \{x\}$
        Find $\alpha_{ik,t}^{LVQ}$ with (5) for i = 1, ..., c
        (or) Find $\alpha_{ik,t}^{GLVQ\text{-}F}$ with (7) for i = 1, ..., c
        (or) Find $\alpha_{ik,t}^{DR}$ with (8) for i = 1, ..., c
        $v_{i,t} = v_{i,t-1} + \alpha_{ik,t}(x_k - v_{i,t-1})$ for i = 1, ..., c
        $\|x_k - v_{i,t}\| < R_f \Rightarrow f_{ik,t} \leftarrow f_{ik,t} + M$ (DR only)
        $f_{ik,t} > f_M \Rightarrow$ cease updating $v_i$ (DR only)
    Next k
    If $E_t = \|V_t - V_{t-1}\|_1 < \varepsilon$: stop and put $V \leftarrow V_t$;
    else adjust learning rate $\alpha_t \leftarrow \alpha_0 (1 - t/T)$
Next t
If t = T: put $V \leftarrow V_T$


5. Numerical Results Using the Three CL Algorithms

5.1 Iris Data

Following Chang [12], we use Anderson's Iris data [15] for the experiments. Iris contains 50 (physically labeled) vectors in $\Re^4$ for each of $\bar{c}$ = 3 classes of Iris subspecies. Figure 5.1 is a scatterplot of the third and fourth features of Iris that shows the subsample mean (listed in Table 5.1) for each of the three classes in these two dimensions.

[Scatterplot of Iris in the plane of $X_3$ = Petal Length (horizontal) and $X_4$ = Petal Width (vertical): 1 = Setosa, 2 = Versicolor, 3 = Virginica, with dashed class boundaries and the mean of each class marked.]

Fig. 5.1. The Iris data: feature 3 vs. feature 4

Class 1 is well separated from classes 2 and 3 in these two dimensions; classes 2 and 3 show some overlap in the central area of the figure, and this region contains the vectors that are usually mislabeled by nearest prototype designs. The dashed boundaries indicate the physically labeled cluster boundaries in these two dimensions.


Typical (resubstitution) error rates for classifiers, nearest prototype or otherwise, that use the labels during training on Iris are 0-5 mistakes. For unsupervised nearest prototype designs most studies report about 16 mistakes when resubstitution errors are counted. The resubstitution error rate for the supervised 1-np design that uses the class means (listed in Table 5.1 and shown on Figure 5.1) as single prototypes is 11 errors in 150 submissions using the Euclidean norm, i.e., $E_{D_{\bar{V},\delta_E}}(\text{Iris}|\text{Iris}) = 7.33\%$.

Table 5.1. Labeled sample (mean) prototypes $\bar{V}$ in $\Re^4$ for Iris

Name          x_1     x_2     x_3     x_4
$\bar{v}_1$   5.01    3.43    1.46    0.25
$\bar{v}_2$   5.94    2.77    4.26    1.33
$\bar{v}_3$   6.59    2.97    5.55    2.03

5.2 Initialization of the CL Schemes

The following method was used to generate an initial set of prototypes $V_0$:

Minimum of feature j: $m_j = \min_k \{x_{jk}\}$, j = 1, 2, ..., p   (11)

Maximum of feature j: $M_j = \max_k \{x_{jk}\}$, j = 1, 2, ..., p   (12)

The set $hb(m, M) = [m_1, M_1] \times \cdots \times [m_i, M_i] \times \cdots \times [m_p, M_p]$ is a hyperbox in $\Re^p$. The main diagonal of hb(m, M) connects m and M with the line segment $\{m + \alpha(M - m) : 0 \le \alpha \le 1\}$. Initial prototypes for all three CL algorithms were:

$v_{i,0} = m + \frac{i - 1}{c - 1}(M - m), \quad i = 1, 2, \ldots, c.$   (13)

Thus, $v_{1,0} = m = (m_1, m_2, \ldots, m_p)^T$; $v_{c,0} = M = (M_1, M_2, \ldots, M_p)^T$; and the remaining (c - 2) initial prototypes are uniformly distributed along the diagonal of hb(m, M). Table 5.2 shows the initial prototypes produced by (13) with the Iris data at c = 6. Runs discussed for other values of c were initialized the same way. The control parameters of each algorithm that we used are listed in Table 4.2. Some experimentation with them is discussed at the end of this section.
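Initialization (11)-(13) in code; a small sketch (applied to Iris with c = 6 it reproduces the prototypes of Table 5.2):

```python
import numpy as np

def hyperbox_diagonal_init(X, c):
    # (11)-(13): c prototypes spaced uniformly along the main diagonal of
    # the hyperbox hb(m, M) spanned by the feature minima m and maxima M.
    m, M = X.min(axis=0), X.max(axis=0)
    return np.array([m + (i / (c - 1)) * (M - m) for i in range(c)])
```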


Table 5.2. Initial prototypes for Iris at c = 6 computed with (13)

$v_{1,0}$ = (4.30, 2.00, 1.00, 0.10) = m
$v_{2,0}$ = (5.02, 2.48, 2.18, 0.58)
$v_{3,0}$ = (5.74, 2.96, 3.36, 1.06)
$v_{4,0}$ = (6.46, 3.44, 4.54, 1.54)
$v_{5,0}$ = (7.18, 3.92, 5.72, 2.02)
$v_{6,0}$ = (7.90, 4.40, 6.90, 2.50) = M

5.3 Termination of the CL Methods

The primary termination criterion is to compare successive estimates of the prototypes with the 1-norm: $\|V_t - V_{t-1}\|_1 = \sum_{r=1}^{c} \|v_{r,t} - v_{r,t-1}\|_1 = \sum_{j=1}^{p} \sum_{r=1}^{c} |v_{rj,t} - v_{rj,t-1}|$ is compared to the cutoff threshold $\varepsilon$. If this fails, secondary termination occurs at the iterate limit T specified in Table 4.2. We tested three thresholds: $\varepsilon$ = 0.1, 0.01 and 0.001. The DR algorithm has a third termination criterion (the prototype-by-prototype cutoff) that can (and often does) occur inside the main iteration, as shown in Table 4.2.

5.4 Iteration

We drew samples randomly from X without replacement. One iteration corresponds to one pass through X. Each algorithm was run 5 times for each case discussed to see how different input sequences affected the terminal prototypes. For the less stringent termination criteria ($\varepsilon$ = 0.1 and 0.01), we sometimes obtained different terminal prototypes for different runs. For $\varepsilon$ = 0.001, this effect was nearly (but not always) eliminated. Most of the runs using $\varepsilon$ = 0.001 were completed in less than 300 iterations through X. DR, via its prototype-by-prototype criterion, often terminated in less than 50 iterations.

5.5 Results

The results shown in Tables 5.3-5.5 are typical cases; those in Table 5.6 are the best case we saw in each instance. The limited number of trials conducted for each case is not a serious detriment to our main objective, which is to compare the methods rather than obtain an optimal design for Iris. Indeed, it may be that with enough experimentation, any of the CL models will yield our best case results.

Table 5.3 exhibits the terminal prototypes found by each algorithm at c = 6, as well as the resultant 1-nmp error rates they produce when used in (3') on all of Iris. The DR parameters $\{\{f_{ik,t}\}, M, f_M, R_f, A\}$ used in all runs are shown in Table 4.2. GLVQ-F and LVQ used $\alpha_0 = 0.4$.

Each of the three physical clusters is represented by two prototypes for both LVQ and GLVQ-F, and the overall error rate produced by these two classifiers is 9.33%. The DR model performs much better, finding 6 prototypes that produce only 4 errors when used with (3'). Note especially that DR uses only 1 prototype for class 1, and that it uses 3 for class 2 and 2 for class 3.

Table 5.3. Typical prototypes, confusion matrices, and error rates for c = 6 prototypes

LVQ
Label   prototype
1   (4.69, 3.12, 1.39, 0.20)
1   (5.23, 3.65, 1.50, 0.28)
2   (5.52, 2.61, 3.90, 1.20)
2   (6.21, 2.84, 4.75, 1.57)
3   (6.53, 3.06, 5.49, 2.18)
3   (7.47, 3.12, 6.31, 2.02)
C = [50 0 0; 0 50 0; 0 14 36]     Error rate = 9.33%

GLVQ-F (m = 2)
Label   prototype
1   (4.75, 3.15, 1.43, 0.20)
1   (5.24, 3.69, 1.50, 0.27)
2   (5.60, 2.65, 4.04, 1.24)
2   (6.18, 2.87, 4.73, 1.56)
3   (6.54, 3.05, 5.47, 2.11)
3   (7.44, 3.07, 6.27, 2.05)
C = [50 0 0; 0 50 0; 0 14 36]     Error rate = 9.33%

DR
Label   prototype
1   (5.08, 3.45, 1.44, 0.22)
2   (5.78, 2.63, 3.99, 1.20)
2   (6.61, 3.00, 4.45, 1.39)
2   (6.09, 2.98, 4.60, 1.40)
3   (6.06, 2.83, 4.95, 1.78)
3   (6.74, 3.12, 5.60, 2.24)
C = [50 0 0; 0 47 3; 0 1 49]      Error rate = 2.66%

The prototypes in Table 5.3 are plotted in Fig. 5.2 against a background created by roughly estimating the convex hull of each physical class in these two dimensions by eye. Some of the prototypes are hard to see because their coordinates are very close in these two dimensions. We draw attention to the LVQ and GLVQ-F prototypes that seem to lie on the boundary between classes 2 and 3 by enclosing these points with a jagged star. These prototypes are the ones that incur most of the misclassifications that are committed by the LVQ and GLVQ-F 1-nmp classifiers. Notice that there is no DR prototype in this region at c = 6. Instead, DR opts for only one class 1 prototype, thereby enabling it to better represent the critical boundary region by developing 2 prototypes near to but now below the star. This is a real difference between, and a decided advantage for, the DR model compared to the two LVQ designs.

[Scatterplot of the c = 6 terminal prototypes in the $X_3$-$X_4$ plane: circles = LVQ, squares = GLVQ-F, filled markers = DR.]

Fig. 5.2. Terminal prototypes at c = 6

Table 5.4 lists the same information as Table 5.3 for c = 7. There is a sharp drop in the error rate for the LVQ and GLVQ-F 1-nmp designs. Be careful to note that the seventh prototype is not "added" to the previous six; rather, new prototypes are found by each algorithm. The error rates in Table 5.4 are very low for designs that do not use the labels during training. Note that LVQ and GLVQ-F continue to use 2 prototypes for each of classes 1 and 2, and add a third representative for class 3 at c = 7. Contrast this to DR, which still has only 1 class 1 prototype, but now uses 4 for class 2, and 2 for class 3. Adding a seventh prototype does not improve the DR 1-nmp design because two of the seven prototypes are almost identical to one that was used at c = 6. Thus, the DR method continues to provide a more efficient representation of the data than either LVQ or GLVQ-F in the sense that only one prototype is needed, so only one is used, to represent class 1 points.

Table 5.4. Typical prototypes, confusion matrices, and error rates for c = 7 prototypes

LVQ
Label   prototype
1   (4.68, 3.11, 1.39, 0.20)
1   (5.23, 3.65, 1.50, 0.28)
2   (5.53, 2.62, 3.93, 1.21)
2   (6.42, 2.89, 4.59, 1.43)
3   (6.57, 3.09, 5.52, 2.18)
3   (7.47, 3.12, 6.31, 2.02)
3   (5.99, 2.75, 5.02, 1.79)
C = [50 0 0; 0 47 3; 0 1 49]      Error rate = 2.66%

GLVQ-F (m = 2)
Label   prototype
1   (4.74, 3.15, 1.43, 0.20)
1   (5.24, 3.69, 1.50, 0.27)
2   (5.57, 2.61, 3.96, 1.21)
2   (6.26, 2.92, 4.54, 1.43)
3   (6.62, 3.09, 5.56, 2.16)
3   (7.50, 3.05, 6.35, 2.06)
3   (6.04, 2.79, 4.95, 1.76)
C = [50 0 0; 0 46 4; 0 1 49]      Error rate = 3.33%

DR
Label   prototype
1   (5.06, 3.42, 1.45, 0.21)
2   (5.58, 2.49, 3.89, 1.10)
2   (5.69, 2.88, 4.18, 1.29)
2   (6.09, 2.95, 4.63, 1.40)
2   (6.64, 3.00, 4.56, 1.41)
3   (6.11, 2.82, 4.91, 1.78)
3   (6.72, 3.10, 5.57, 2.22)
C = [50 0 0; 0 47 3; 0 1 49]      Error rate = 2.66%

Figure 5.3 shows that the crucial "boundary" prototypes from LVQ and GLVQ-F in the c = 6 case have roughly "divided" into two sets of new prototypes, shown again by the jagged star. LVQ and GLVQ-F essentially "catch up" with DR in the region of overlap by now representing class 3 with 3 prototypes instead of 2. These two prototypes seem to have moved away from the apparent boundary of the convex hull of class 2 towards that of class 3.

When the three CL algorithms are instructed to seek c = 8 prototypes, the error rate for all three 1-nmp designs typically remains at 2.66%, as shown in Table 5.5. At c = 9 the results are quite similar to those shown for c = 8. But only DR continues to represent class 1 by a single prototype.


[Scatterplot of the c = 7 terminal prototypes in the $X_3$-$X_4$ plane: circles = LVQ, squares = GLVQ-F, filled markers = DR.]

Fig. 5.3. Terminal prototypes for c = 7

Tables 5.3-5.5 suggest that the replacement of Iris with 8 or 9 prototypes found by any of the three CL algorithms results in a 1-nmp design that is quite superior to the labeled 1-np design based on the $\bar{c}$ = 3 subsample means $\bar{V}$. Moreover, the DR model yielded consistently better results than either LVQ or GLVQ-F in almost every case we tested.

Table 5.5. Typical confusion matrices and class representatives for c = 8 terminal prototypes

                      LVQ                     GLVQ-F                  DR
C         [50 0 0; 0 47 3; 0 1 49]  [50 0 0; 0 47 3; 0 1 49]  [50 0 0; 0 47 3; 0 1 49]
E         2.66%                     2.66%                     2.66%
Class 1   2                         2                         1
Class 2   3                         3                         4
Class 3   3                         3                         3


The experiments discussed so far led us to wonder how few prototypes were needed by the 1-nmp rule to achieve good results. And conversely, going in the other direction, at what point does prototype representation become counterproductive? Table 5.6 reports the best case results (as number of resubstitution errors) we saw using each algorithm for various values of c.

Table 5.6. Number of resubstitution errors of the 1-nmp designs: best case results

c         3    4    5    6    7    8    9    15   30
LVQ      17   24   14   14    3    4    4    4    4
GLVQ-F   16   20   19   14    5    3    4    4    4
DR       10   13    3    3    3    4    3    6    3

First, we can observe that on passing from c = 3 to c = 4, even the best case error rate for all three models increased, followed by a decrease on passing from c = 4 to c = 5. Second, the DR entry at c = 5 in Table 5.6 points to one run of DR for which the 5 prototypes shown in Table 5.7 produced only 3 resubstitution errors when used in (3'). This shows that the Iris data can be well represented by five labeled prototypes.

Table 5.7. Five DR prototypes that yield 3 resubstitution errors with the 1-nmp rule on Iris

Class 1   (5.03, 3.38, 1.50, 0.31)
Class 2   (5.59, 2.70, 4.02, 1.28)   (6.45, 2.88, 4.60, 1.46)
Class 3   (6.12, 2.88, 5.06, 1.82)   (6.95, 3.05, 5.83, 2.12)

At the other extreme, increasing c past c = 7 has little effect on the best case results. Taken together, these observations suggest that Iris (and more generally, any labeled data set) has some upper and lower bounds in terms of high quality representation by multiple prototypes for classifier design. There seems to be little hope, however, of discovering this on a better than case by case basis.

It is also clear from Table 5.6 that the DR model provides the best results for every value of c. We conjecture that the reason for this is that the control structure for this model is fundamentally very different from both LVQ and GLVQ-F. Thus, it seems as if DR rapidly closes on a single prototype for class 1 (which for Iris, is really all that is needed), terminates updates for this prototype, and by its increased fatigue encourages the remaining prototypes to seek other data to represent, which they do. It would be a mistake to generalize this belief to other data without more computational evidence. However, we believe that when a small number of errors can be tolerated in exchange for a small number of multiple prototypes, the DR algorithm will prove to be superior to both LVQ and GLVQ-F.

5.6 Robustness

Finally, we comment on the sensitivity of each CL model to changes in its control parameters. We did not experiment with changes in m for GLVQ-F. Certainly this parameter affects terminal prototypes. However, we doubt that small changes in m will cause radical changes in the results given above. We varied $\alpha_0$ from 0.4 to 0.6 in both LVQ and GLVQ-F without noticeable changes in typical results.

The DR algorithm has more parameters to vary, and we spent a little time experimenting with them before settling on the values listed in Table 4.2. For example, we made runs of DR with A ranging from 1 to 9, and found little difference in the average case outputs at every value of c shown in Table 5.6. In particular, the average minimum number of DR errors occurred at c = 6 prototypes for A = 1, 5, and 9. All three CL algorithms are sensitive, but not alarmingly so, to changes in their control parameters. Usually, but not always, when the number of errors was the same for competing designs, the vectors that were misclassified were also identical.

6. Other Multi-prototype Designs

This section discusses C. L. Chang's algorithm [12]; an improved version of it; and a batch method due to Yen and C. W. Chang [16]. C. L. Chang discussed one of the earliest multiple prototype classifier schemes, which was presented in the context of labeled data replacement (Fig. 1.3). We give a short verbal description of it here, and record our (modified) implementation of it in Appendix 2.

The method begins by assuming every point in a labeled data set X is its own prototype, so let $V_n = X$. Consequently, the error rate of the 1-np rule at (3), or the 1-nmp rule at (3'), is zero: $E_{D_{V_n,\delta_E}}(X|V_n) = 0$.

Now find $(i, j) = \arg\min \{\|x_s - x_t\| : s \ne t;\ x_s, x_t \in V_n\}$. Tentatively merge these two points using the weighted mean $v_{ij} = (M x_i + N x_j)/(M + N)$, where M and N are the numbers of merger parents of $x_i$ and $x_j$ respectively¹. Next, update the prototypes by setting $V_{n-1} \leftarrow X - \{x_i, x_j\} + v_{ij}$ and calculate $E_{D_{V_{n-1},\delta_E}}(X|V_{n-1})$ using the 1-nmp rule at (3').

¹ Initially, M and N have the values 1. When two data points $x_i$ and $x_j$ are merged, $v_{ij} = (x_i + x_j)/2$, and $v_{ij}$ has 2 merger parents. Subsequently, if $v_{ij}$ and $x_k$ are merged, then M = 2, N = 1, and $v_{ijk} = (2v_{ij} + x_k)/3$, etc.


If the error rate is still zero and if $x_i$ and $x_j$ have the same label, accept the merger and continue. If either (i) the error rate increases, or (ii) $x_i$ and $x_j$ have different labels, do not merge $x_i$ and $x_j$. In this case Chang regards $x_i$ and $x_j$ as non-mergeable prototypes and continues.

When there is a merger, the number of merger parents of the child produced is increased by 1, the child inherits the class label of its parents, and it replaces them in the current prototype set. Especially important is that the test data are fixed (all of X). Continue this procedure until further merging produces an error, and at this point stop, having found c prototypes $V_c$ that replace the n labeled data X and that preserve a resubstitution error rate of zero, i.e., $E_{D_{V_c,\delta_E}}(X|V_c) = 0$. An implementation of this scheme based on minimal spanning trees is given by Chang, who reports in [12] that his method finds c = 14 prototypes that replace Iris and preserve a zero resubstitution error rate. The prototypes were not listed in [12].

Chang's approach was then modified in three ways. First, instead of using the weighted mean $v_{ij} = (M x_i + N x_j)/(M + N)$ to merge prototypes, we used the simple arithmetic mean. Second, we altered the search for candidates to merge in two ways. First, we partition the distance matrix into $\bar{c}$ submatrices blocked by common labels, and look for the minimum in each block. This eliminates candidate pairs with different labels. Then, we attempt to merge the minimum of label-matched pairs. If this fails (because the prototype produced by the merger yields an error), we look at the next candidates. And we continue looking in ascending order of distance until either (i) a merger can be done, or (ii) no merger is possible. The algorithm terminates when (ii) occurs. This is effective because merging the closest points of the same label is not necessary to preserve the zero error rate. Table 6.1 lists the modified Chang algorithm; a brief code sketch follows the table.

These simple modifications led to the c = 11 prototypes shown in Table 6.2 that yield zero resubstitution error, $E_{D_{V_{11},\delta_E}}(\text{Iris}|V_{11}) = 0$. There is little doubt that an even more efficient modification of Chang's algorithm can be found. An interesting question concerns how to modify this method so that it produces the minimum number of prototypes that will guarantee a prespecified error rate. This point will be discussed a little more in Section 7.

The last method we mention is due to Yen and Chang [16], who modified the (batch) fuzzy c-means algorithm to produce multiple prototypes for each class. The theory of their MFCM-n method is well discussed elsewhere, so we are content here to show their results on Iris. Specifically, Yen and Chang compare four outputs: (FCM, c = 3, 16 errors); (MFCM-1, c = 3, 16 errors); (MFCM-2, c = 5 with (1, 2, 2) labeled prototypes for classes (1, 2, 3), 14 errors); and their best result (MFCM-3, c = 7, with (1, 3, 3) labeled prototypes for classes (1, 2, 3), 8 errors).


Table 6.1. The modified Chang algorithm used in the examples

Store   Labeled $X = X_{tr} \subset \Re^p$ and label matrix $L_{tr}$ of $X_{tr}$.
        $X = X_1 \cup \cdots \cup X_{\bar{c}}$ ordered by class, with $|X_i| = n_i$, $i = 1, \ldots, \bar{c}$.
        Euclidean norm, similarity of data to prototypes:
        $\delta_E(x, v) = \|x - v\| = \sqrt{(x - v)^T (x - v)}$

Set     $V_n = X$.

Do while $E_{V_n,\delta_E}(X|V_n) = 0$:

    (1) Compute the upper triangular distance matrix $D(V_n)$, partitioned into blocks by common class labels (one block per class).

    (2) Find $(s^*, t^*) = \arg\min \{\|v_s - v_t\| : s \ne t;\ v_s, v_t$ in the same block$\}$. Compute $v^* = (v_{s^*} + v_{t^*})/2$ and set $V^*_{n-1} \leftarrow V_n - \{v_{s^*}, v_{t^*}\} + v^*$.

    (3) If $E_{V^*_{n-1},\delta_E}(X|V^*_{n-1}) = 0$: accept, $V_{n-1} \leftarrow V^*_{n-1}$; compute $D(V_{n-1})$ and continue. Else return to $D(V_n)$, find the next closest same-label pair, and attempt to merge it; repeat (2) until no merger is possible.

    (4) Terminate with $V_c$ such that $E_{V_c,\delta_E}(X|V_c) = 0$.
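The sketch below is our simplified implementation of this condensation: it re-sorts all same-label pairs after each accepted merger rather than maintaining the blocked distance matrix of Table 6.1, which changes efficiency but not the result:

```python
import numpy as np

def modified_chang(X, y):
    # Repeatedly try to merge the closest same-label pair of prototypes by
    # their arithmetic mean, keeping a merge only if the 1-nmp rule still
    # classifies every point of X correctly; stop when no merge survives.
    V, L = X.copy(), y.copy()

    def errors(V, L):
        win = np.argmin(np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2), axis=1)
        return np.sum(L[win] != y)

    while True:
        pairs = [(np.linalg.norm(V[s] - V[t]), s, t)
                 for s in range(len(V)) for t in range(s + 1, len(V))
                 if L[s] == L[t]]                 # same-label candidates only
        merged = False
        for _, s, t in sorted(pairs):             # ascending order of distance
            keep = np.ones(len(V), dtype=bool)
            keep[[s, t]] = False
            V_new = np.vstack([V[keep], (V[s] + V[t]) / 2.0])
            L_new = np.append(L[keep], L[s])
            if errors(V_new, L_new) == 0:         # merge preserves zero error
                V, L, merged = V_new, L_new, True
                break
        if not merged:
            return V, L
```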


Table 6.2. c = 11 modified Chang's algorithm prototypes that yield zero resubstitution errors with the 1-nmp rule on Iris

Class 1   (4.94, 3.36, 1.38, 0.28)
Class 2   (5.69, 2.33, 3.88, 1.16)   (6.70, 2.98, 4.82, 1.56)   (5.86, 3.08, 4.55, 1.54)   (6.15, 2.60, 5.00, 1.55)
Class 3   (4.90, 2.50, 4.50, 1.70)   (6.29, 2.71, 5.01, 1.68)   (5.83, 2.86, 4.98, 1.88)   (6.05, 2.40, 5.30, 1.45)   (6.19, 3.00, 5.33, 2.31)   (7.02, 3.09, 6.14, 2.18)

7. Discussion

This article compared the efficacy of three CL methods for multiple prototype generation to three other methods that use labeled data in a very different way. Table 7.1 summarizes the best results achieved by the seven algorithms (including the sample mean based nearest prototype design) used in our study for the Iris data. What does Table 7.1 entitle us to conclude? First, our results are of course specialized to just one data set, and generalizations to other data warrant great caution.

Table 7.1. Summary of the best error rates achieved by the 6 methods

Algorithm                 Min. errors in 150 tries   Value of c at min. # errors
Labeled 1-np ($\bar{V}$)  11                         3

1-nmp designs:
LVQ                       3                          7
GLVQ-F (m = 2)            3                          8
DR                        3                          5
Chang                     0                          14
Modified Chang            0                          11
MFCM-3                    8                          7

Second, all six 1-nmp designs use the labeled data more effectively than the 1-np design based on the labeled sample means. This indicates that better classifier performance is certainly possible using multiple prototypes. Our tests at low and high values for c suggest that there is probably an optimal range for the number of prototypes that should be used to replace labeled training data. Since a theoretical derivation of this number seems optimistic, it is probably the case that the best number is data dependent, and must be discovered by the trial and error process illustrated by Table 5.6. The minimum error rate (zero) is not realized by the minimum number of prototypes (five). If the determining criterion for choosing multiple prototypes is minimum error rate, then our modification of Chang's method might be the method of choice. On the other hand, we can imagine applications (image compression comes to mind) where it is very important to find a minimum number of prototypes. If this is important enough, developers may be willing to sacrifice a little accuracy to achieve this objective. In this case, the DR algorithm seems ideally suited to finding multiple prototypes that yield a few errors with fewer prototypes than the modified Chang's method.

We did not study the question of how to find an optimal path for continuing our modification of Chang's method to carry it beyond zero errors, but the DR results (3 errors with 5 prototypes) suggest that paths exist that might bring us close to this. Specifically, some sort of partial minimal spanning tree technique might be used to extend our modification so that the best path beyond no errors could be found for 1 error, 2 errors, etc. A solution to this problem would provide users a way to find some (not necessarily minimal) number of prototypes that guaranteed any prespecified resubstitution error rate that the data would support.

Finally, we comment on the results reported by Yen and Chang [16]. Their best 1-nmp (batch-designed) classifier was inferior to the best results achieved by all of the CL models. We suspect that sequential updating encourages "localized" prototypes which are able, when there is more than one per class, to position themselves better with respect to subclusters that may be present within the same class. This leads us to conjecture that batch algorithms are at their best when used to erect 1-np designs; and that sequential models are more effective for 1-nmp classifiers. There is little doubt that the DR model is the best of the three CL methods tried for this data. We speculate that this is due to its prototype-by-prototype control structure, which provides very "localized" behavior, a property that seems ideally suited to finding multiple prototypes. The fact that DR never placed two prototypes in Class 1 of Iris supports this, but we reserve judgment until more experimental evidence exists. This will be the subject of our next investigation.

References

1. Duda, R. and Hart, P. (1973). Pattern Classification and Scene Analysis, Wiley Interscience, NY.

2. Bezdek, J.C. (1981). Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum, NY.

3. Titterington, D., Smith, A. and Makov, U. (1985). Statistical Analysis of Finite Mixture Distributions, Wiley, NY.

4. Krishnapuram, R. and Keller, J. (1993). A Possibilistic Approach to Clustering, IEEE Trans. Fuzzy Systems, 1(2), 98-110.

5. Dasarathy, B.V. (1990). Nearest Neighbor (NN) Norms: NN Pattern Classification Techniques. IEEE Computer Society Press, Los Alamitos, CA.

Page 389: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications

380

6. Devijver, P. and Kittler, J. (1982). Pattern Recognition: A Statistical Approach, Prentice-Hall, Englewood Cliffs, NJ.

7. Kohonen, T. (1989). Self-organization and Associative Memory, Springer-Verlag, Berlin, Germany, 3rd ed.

8. Karayiannis, N., Bezdek, J.C., Pal, N.R., Hathaway, R.J. and Pai, P.-I. (1996). Repairs to GLVQ: A new family of competitive learning schemes, IEEE Trans. Neural Networks, 7(5), 1062-1071.

9. Pal, N.R., Bezdek, J.C. and Tsao, E.C. (1993). Generalized Clustering Networks and Kohonen's Self-Organizing Scheme, IEEE Trans. Neural Networks, 4(4), 549-558.

10. Lim, G.S., Alder, M. and Hadingham, P. (1992). Adaptive quadratic neural nets, Patt. Recog. Letters, 13, 325-329.

11. McKenzie, P. and Alder, M. (1994). Initializing the EM algorithm for use in Gaussian mixture modeling, in Pattern Recognition in Practice IV: Multiple Paradigms, Comparative Studies and Hybrid Systems, ed. E.S. Gelsema and L.N. Kanal, Elsevier, NY, 91-105.

12. Chang, C.L. (1974). Finding prototypes for nearest neighbor classification, IEEE Trans. Comp., 23(11).

13. Yair, E., Zeger, K. and Gersho, A. (1992). Competitive learning and soft competition for vector quantizer design, IEEE Trans. SP, 40(2), 294-309.

14. Gersho, A. and Gray, R. (1992). Vector Quantization and Signal Compression, Kluwer, Boston.

15. Anderson, E. (1935). The IRISes of the Gaspe peninsula, Bull. Amer. IRIS Soc., 59, 2-5.

16. Yen, J. and Chang, C.W. (1994). A multi-prototype fuzzy c-means algorithm, Proc. 2nd EUFIT, Aachen, 539-543.

17. Kohonen, T. (1990). Improved versions of learning vector quantization, Proc. IJCNN, 1, IEEE Press, Piscataway, 545-550.


Fuzzy Data Analysis

H.-J. Zimmermann

RWTH Aachen, Templergraben 55, D-52062 Aachen, Germany

Phone: +49-241-8061 82, Fax: +49-241-88 88-168

zi@buggLor.rwth-aachen.de

Abstract. Data analysis has been described as "the search for structure in data", or as a means of reducing complexity. Most of the traditional methods for data analysis or data mining are dichotomous, i.e. they assume that patterns to be detected are two-valued. Whenever this is not the case, the relationship between data or elements on one hand, and the classes to which these data shall be assigned on the other hand, becomes gradual. In these cases fuzzy classification methods become appropriate. After a short introduction into data analysis, algorithmic as well as knowledge based approaches of fuzzy data analysis are described. Finally, tools and industrial applications will be presented.

1. Basic Principles of Data Analysis

In general, data analysis can be considered as a process in which, starting from some given data sets, information about the respective application is generated. In this sense data analysis can be defined as the search for structure in data [4]. In order to clarify the terminology about data analysis used throughout this paper, a brief description of its general process is given below.

In data analysis objects are considered which are described by some attributes. Objects can be, for example, persons, things (machines, products, ...), time series, sensor signals, process states, and so on. The specific values of the attributes are the data to be analyzed. The overall goal is to find structure (information) in these data. This can be achieved by classifying the huge amount of data into relatively few classes of similar objects. This leads to a complexity reduction in the considered application which allows for improved decisions based on the gained information. Figure 1 shows the process of data analysis described so far, which can be separated into feature analysis, classifier design, and classification.



[Diagram with three panels: 1. Feature Analysis (objects $x_i$ with attributes mapped to features); 2. Classifier Design (objects with features used to construct classes); 3. Classification (new objects with features assigned to classes).]

Fig. 1. Contents of Data Analysis


Here three steps of complexity reduction can be found:

• An object is characterized in the first step by all its attributes.
• From these attributes, the ones which are most relevant for the specific data analysis task are extracted and called features (feature extraction).
• According to these features the given objects are assigned to classes (classifier design).

Information is gained from the data in the sense that relationships between objects are detected by assigning objects to classes. Based on the derived insights, improved decisions can be made. Here one could think of decision support for diagnosis problems (medical or technical), evaluation tasks (e.g. creditworthiness [21]), forecast (sales, stock prices), and quality control as well as direct process optimization (alarm management, maintenance management, connection to process control systems, and development of improved sensor systems). Of course, this list of applications is by no means exhaustive; for more applications see e.g. [4].

The process of data analysis described so far is not necessarily connected with fuzzy concepts. If, however, either features or classes are fuzzy, the use of fuzzy approaches is desirable. In figure 1, for example, objects, features, and classes are considered. Both features and classes can be represented in crisp or fuzzy terms. An object is said to be fuzzy if at least one of its features is fuzzy. This leads to the following four cases [22]:

• Crisp objects and crisp classes
• Crisp objects and fuzzy classes
• Fuzzy objects and crisp classes
• Fuzzy objects and fuzzy classes

2. Methods for Fuzzy Data Analysis

Figure 2 indicates that some boxes - particularly those of feature analysis and classifier design - contain quite a number of classical dichotomous methods, such as clustering, regression analysis, etc., which for fuzzy data analysis have been fuzzified, i.e., modified to suit problem structures with fuzzy elements. The box "classification", in contrast, lists some approaches that originate in fuzzy set theory and that did not exist before.

In modern fuzzy data analysis, three types of approaches can be distinguished. The first class is algorithmic approaches, which in general are fuzzified versions of classical methods, such as fuzzy clustering, fuzzy regression, etc. The second class is knowledge-based approaches, which are similar to fuzzy control or fuzzy expert systems. The third class, (fuzzy) neural net approaches, is growing rapidly in number and power. Increasingly combined with these approaches, but not discussed in this article, are evolutionary algorithms and genetic algorithms [24]. The three major classes mentioned above will be discussed in the following.


[Diagram: Process Description (determination of membership function, feature nomination, scale levels); Feature Analysis and Classifier Design (clustering, factor analysis, structured modelling, discriminant analysis, neural nets, regression analysis, knowledge based approaches); Classification (pattern recognition, diagnosis, linguistic approximation, fuzzification, defuzzification, ranking, neural nets).]

Fig. 2. Scope of data analysis

2.1 Algorithmic and Knowledge-Based Approaches

Here we shall focus on the most frequently used family of algorithms, namely cluster methods. Essentially graph-theoretic, hierarchical and objective-functional methods can be distinguished.

Hierarchical clustering methods generate a hierarchy of partitions by means of a successive merging (agglomerative) or splitting (divisive) of clusters. Such a hierarchy can easily be represented by a dendrogram, which might be used to estimate an appropriate number of clusters, c, for other clustering methods. On each level of agglomeration or splitting, a locally optimal strategy can be used without taking into consideration the policies used on preceding levels. These methods are not iterative; they cannot change the assignment of objects to clusters made on preceding levels. Figure 3 shows a dendrogram that could be the result of a hierarchical clustering algorithm. The main advantage of these methods is their conceptual and computational simplicity.

In fuzzy set theory, this type of clustering method would correspond to the determination of "similarity trees".


Fig. 3. Dendrogram for hierarchical clusters

Graph-theoretic methods are normally based on some kind of connectivity of the nodes of a graph representing the data set. The clustering strategy is often breaking edges in a minimal spanning tree to form subgraphs. We shall not elaborate on these methods in more detail here.

Objective-function methods allow the most precise formulation of the clustering criterion. The "desirability" of clustering candidates is measured for each c, the number of clusters, by an objective function. Typically, local extrema of the objective function are defined as optimal clusterings. Many different objective functions have been suggested for clustering (crisp clustering as well as fuzzy clustering). The interested reader is referred in particular to the excellent book by Bezdek [2] for more details and many references. We shall limit our considerations to one frequently used type of (fuzzy) clustering method, the so-called c-means algorithm.


Classical (crisp) clustering algorithms generate partitions such that each object is assigned to exactly one cluster. Often, however, objects cannot adequately be assigned to strictly one cluster (because they are located "between" clusters). In these cases, fuzzy clustering methods provide a much more adequate tool for representing real-data structures.

To illustrate the difference between the results of crisp and fuzzy clustering methods, let us look at one example used very extensively in the clustering literature: the butterfly.

[Fifteen points $x_1, \ldots, x_{15}$ arranged in two symmetric wings joined at the central point $x_8$.]

Fig. 4. The butterfly

Example 1: The data set X consists of 15 points in the plane, as depicted in Fig. 4.

Clustering these points by a crisp objective-function algorithm might yield the picture shown in Fig. 5, in which "1" indicates membership of the point in the left-hand cluster and "0" membership in the right-hand cluster. The x's indicate the centers of the clusters. Figures 6 and 7, respectively, show the degrees of membership the points might have to the two clusters when using a fuzzy clustering algorithm.

We observe that, even though the butterfly is symmetric, the clusters in Fig. 5 are not symmetric because point $x_8$, the point "between" the clusters, has to be (fully) assigned to either cluster 1 or cluster 2. In figures 6 and 7, this point has the degree of membership .5 in both clusters, which seems to be more appropriate. Details of the methods used to arrive at Figs. 5-7 can be found in Bezdek [2].



Fig. 5. Crisp clusters of the butterfly


Fig. 6. Cluster 1 of the butterfly



Fig. 7. Cluster 2 of the butterfly

One of the best-known fuzzy clustering algorithms is the fuzzy c-means algorithm [2].
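The chapter relies on the FCM as defined earlier; as an illustration, the following minimal Python sketch shows the standard alternating optimization of memberships and cluster centers. The Euclidean distance, the fuzzifier m = 2, the random initialization, and all parameter names are assumptions of this sketch, not the authors' implementation.

```python
# Minimal sketch of the fuzzy c-means (FCM) iteration described by Bezdek [2].
# Illustrative only: distances, fuzzifier, and stopping rule are assumptions.
import numpy as np

def fuzzy_c_means(X, c, m=2.0, eps=1e-5, max_iter=100, seed=0):
    """X: (n, d) data matrix, c: number of clusters, m > 1: fuzzifier."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((c, n))
    U /= U.sum(axis=0)                    # columns form a fuzzy partition
    for _ in range(max_iter):
        Um = U ** m
        # Centers: membership-weighted means of the data points.
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)
        # Squared Euclidean distance of every point to every center.
        D2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
        D2 = np.fmax(D2, 1e-12)           # guard against division by zero
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1)).
        U_new = D2 ** (-1.0 / (m - 1.0))
        U_new /= U_new.sum(axis=0)
        if np.abs(U_new - U).max() < eps:
            return U_new, V
        U = U_new
    return U, V
```

Applied to a symmetric data set like the butterfly, a point lying exactly between the two cluster centers receives the membership .5 in both clusters, as in Figs. 6 and 7.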

Even though the fuzzy c-means algorithm (FCM) performs better in practice than crisp clustering methods, problems may still have features that cannot be accommodated by the FCM. By way of example, two of them are looked at briefly here.

Most crisp and fuzzy clustering algorithms seek one or the other type of cluster shape (prototype) in a set of data. The type of prototype used determines the distance measure used in the objective function. Windham [19] presented a general procedure that unifies and allows the construction of different algorithms using points, lines, planes, etc. as prototypes. These algorithms, however, normally fail if the pattern looked for is not, in a sense, compact. Dave [6] suggested an algorithm that can find rings or, in general, spherical shells in higher dimensions. His fuzzy shell clustering (FSC) algorithm modifies the variance criterion by introducing the radius of the "ring" searched for, arriving at

$$\min z_S(U, v, r) = \sum_{i=1}^{c} \sum_{k=1}^{n} (\mu_{ik})^m (D_{ik})^2,$$

where

$$D_{ik} = \bigl|\, \|x_k - v_i\| - r_i \,\bigr|,$$

$r_i$ is the radius of the cluster prototype shell, and all other symbols are as defined for the FCM algorithm. The algorithm itself has to be adjusted accordingly by including $r_i$.


Details are given in Dave [6]. This algorithm also finds circles if the data are incomplete. Figure 8 shows examples of it from Dave [6].


Fig. 8. Clusters found by the FSC: (a) data set; (b) circles found by FSC; (c) data set; (d) circles found by FSC
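To make the shell criterion concrete, here is a small sketch of the modified distance and the objective function $z_S$ given above; the update equations for the centers $v_i$ and radii $r_i$ belong to the full FSC algorithm in Dave [6] and are omitted here.

```python
# Sketch of the FSC shell distance and objective from the formulae above.
import numpy as np

def shell_distance(X, v, r):
    """D_ik = | ||x_k - v_i|| - r_i | for one shell prototype (v, r)."""
    return np.abs(np.linalg.norm(X - v, axis=1) - r)

def fsc_objective(X, U, V, R, m=2.0):
    """z_S(U, v, r) = sum_i sum_k (u_ik)^m (D_ik)^2."""
    return sum(((U[i] ** m) * shell_distance(X, V[i], R[i]) ** 2).sum()
               for i in range(len(R)))
```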

The FCM as well as the FSC satisfy the constraint

$$\sum_{i=1}^{c} \mu_{ik} = 1, \qquad 1 \le k \le n.$$

Considering the data sets shown in Fig. 9, this constraint would enforce that, for instance, the two cluster points A and B get the same degree of membership, $\mu = .5$, in clusters 1 and 2.

The $\mu_{ik}$ would then express a kind of "relative membership" to the clusters, i.e., the membership of point B in cluster 1 compared to the membership of point B in cluster 2 (see also Fig. 9). From an observer's point of view it might, however, be inappropriate to assign the same degrees of membership to points A and B, because he interprets those as (absolute) degrees of membership, e.g., degrees to which points A or B belong to clusters 1 or 2, respectively. Krishnapuram and Keller [10] therefore suggest their possibilistic c-means algorithm, which modifies the definition of a fuzzy c-partition and, as a consequence, the objective function of the cluster algorithm.

Fig. 9. Data sets [Krishnapuram and Keller 1993]
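For illustration, the membership update of the possibilistic c-means algorithm of Krishnapuram and Keller [10] can be sketched as follows. The cluster-specific reference distances $\eta_i$ are assumed to be given here; estimating them is part of the full algorithm, which is not reproduced in this chapter.

```python
# Sketch of the possibilistic membership of Krishnapuram and Keller [10].
# Unlike FCM memberships, these values are not forced to sum to 1 over the
# clusters, so they behave like "absolute" degrees of membership: a point
# far away from every prototype gets a low membership in all clusters.
def possibilistic_membership(d2, eta, m=2.0):
    """d2: squared distance of a point to cluster i, eta: reference distance."""
    return 1.0 / (1.0 + (d2 / eta) ** (1.0 / (m - 1.0)))
```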

Another interesting approach is the semi-supervised one by Bezdek et al. [5], which combines labeled and unlabeled data in clustering and thereby increases the stability of existing clusters.

Knowledge-based approaches closely resemble the classical and well-known fuzzy control systems. The real numbers obtained after defuzzification are here interpreted as degrees of membership of the elements to the classes to which they have to be assigned.

2.2 Other Methods for Fuzzy Data Analysis

2.2.1 New Developments of Methods for Data Analysis

Recently, many research efforts have been directed towards the combination of different intelligent techniques. Here the elaboration of neuro-fuzzy systems is one cornerstone for the future development of intelligent machines; see e.g. [8]. One of these methods is a fuzzy version of Kohonen's network [3].

It is expected that in the near future the areas of fuzzy technology, neural networks, and genetic algorithms will be combined to a higher degree. Especially for data analysis the combination of these methods could give promising results.

Often it is not realized that the existence of, e.g., a fuzzy clustering algorithm is not sufficient to solve real classification problems: the data to be clustered are generally not in a suitable form to be fed into the algorithm, but have to be pre-processed first.

Fig. 10. Neural and neuro-fuzzy methods for data analysis:

                 supervised learning        unsupervised learning
  neuro          Back-propagation           Kohonen network
  neuro-fuzzy    Fuzzy Back-propagation     Fuzzy Kohonen network

2.2.2 Data Pre-processing

If, for example, in quality control some acoustic signals have to be investigated, it becomes necessary to filter these data in order to overcome the problems of noisy input. In addition to these filter methods, some transformations of the measured data, as for example the fast Fourier transform (FFT), could improve the respective results. Both the filter methods and the FFT belong to the class of signal processing techniques. Data pre-processing includes signal processing and also conventional statistical methods.

Statistical approaches could be used to detect relationships within a data set describing a special kind of application. Here correlation analysis, regression analysis, and discriminant analysis can be applied adequately. These methods could be used, for example, to facilitate the process of feature extraction. If, for example, two features from the set of available features are highly correlated, it could be sufficient for a classification to consider just one of these two.
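A minimal sketch of such correlation-based feature screening is given below; the threshold of 0.95 and the greedy keep-first strategy are illustrative assumptions.

```python
# Sketch: drop one of every pair of highly correlated features.
import numpy as np

def drop_correlated_features(X, threshold=0.95):
    """X: (n_samples, n_features) array. Returns the indices to keep."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        # Keep feature j only if it is not highly correlated with
        # a feature that was already kept.
        if all(corr[j, k] < threshold for k in keep):
            keep.append(j)
    return keep
```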

3. Tools for Data Analysis

3.1 Functions of Tools

In many application areas it is known or can be predicted which methods can best be applied to determine the desired solution. In data analysis this is, however, generally not the case. The method to be applied depends in most cases very much on the type of information available and on the specific kind of patterns that are being searched for. General recommendations can often be given, e.g. that clustering methods should be used if one knows the number and shapes of the structures beforehand, knowledge-based approaches if expert knowledge is available, and neural nets when enough learning data are available to train the respective neural nets.

The specific method or pre-processing technique is, however, generally unknown in advance. Since, in addition, data analysis can only be performed with computer support, it is important that a variety of methods are at hand in the form of user-friendly software.

Software that contains the appropriate approaches in the form of CASE tools is readily available for the area of fuzzy control. For fuzzy data analysis it is still very rare. In the following, one of the few (maybe, so far, the only) existing CASE tools is briefly described.

3.2 DataEngine - A Software-Tool for Data Analysis

Fig. 11. Structure of DataEngine ("DataEngine: the software tool for intelligent data analysis"). Data input: file, serial ports, data acquisition boards, data editor, data generator. Processing: pre-processing; algorithmic, knowledge-based, and neural data analysis; a C++ precompiler for algorithmic classifiers, rule-based systems, and neural nets. Output: structure output and data output. Hardware platforms: IBM-compatible (MS Windows), Sun SPARC II (MOTIF), other platforms. User interface: graphical programming, interactive and automatic modes.

DataEngine is a software tool that contains the methods for data analysis described above. Especially the combination of signal processing, statistical analysis, and intelligent systems for classifier design and classification leads to a powerful software tool which can be used in a very broad range of applications.

DataEngine is written in C++ following an object-oriented concept and runs on all usual hardware platforms. Interactive and automatic operation, supported by an efficient and comfortable graphical user interface, facilitates the application of data analysis methods. In general, applications of that kind are performed in the following three steps:

3.2.1 Modelling of a specific application with DataEngine

Each sub-task in an overall data analysis application is represented by a so-called function block in DataEngine. Such function blocks represent software modules which are specified by their input interfaces, output interfaces, and their function. Examples are a certain filter method or a specific clustering algorithm. Function blocks could also be hardware modules like neural network accelerator boards, which leads to very high performance in time-critical applications.

Fig. 12. Screenshot of DataEngine

3.2.2 Classifier Design (Off-Line Data Analysis)

After having modeled the application in DataEngine, off-line analysis has to be performed with given data sets to design the classifier. This task is done without process integration.


3.2.3 Classification

Once the classifier design is finished, the classification of new objects can be executed. Depending on the specific requirements this step can be performed in an on-line or off-line mode. If data analysis is used for decision support (e.g. in diagnosis or evaluation tasks) objects are classified off-line. Data analysis could also be applied to process monitoring and other problems where on-line classification is crucial. In such cases, direct process integration is possible by configuration of function blocks for hardware interfaces.

4. Industrial Applications

Here two applications of advanced methods for data analysis are shown in order to emphasize the wide range of related problems and the high potential for industrial use. In both cases the above-described tool DataEngine was used to solve the respective problems of data analysis.

4.1 Maintenance Management in Petrochemical Plants

4.1.1 Problem Formulation

Over 97 % of the worldwide annual commercial production of ethylene is based on thermal cracking of petroleum hydrocarbons with steam [15]. This process is commonly called pyrolysis or steam cracking. Naphtha, which is obtained by distillation of crude oil, is the principal raw material for ethylene. Boiling ranges, densities, and compositions of naphtha depend on crude oil quality.

Naphtha is heated in cracking furnaces up to 820-840 °C, where the chemical reaction starts. The residence time of the gas stream in the furnace is determined by the severity of the cracking process: the residence time for low severity is about 1 s, for high severity about 0.5 s. The severity of the cracking process specifies the product distribution. With high-severity cracking the amount of ethylene in the product stream is increased, while the amount of propylene is decreased significantly.

After the cracking reaction the gas stream has to be cooled very quickly to avoid further chemical reactions. This process is called quenching. After that the product stream is fractionated several times and the ethylene is purified. Commercial thermal cracking plants produce about 360,000 t of ethylene a year.

During the cracking process, acetylenic, diolefinic, and aromatic compounds are also produced, which are known to deposit coke on the inside surfaces of the furnace tubes. This coke layer inhibits heat transfer from the tube to the process gas, so that at some time the furnace must be shut down to remove the coke. To guarantee a continuous run of the whole plant, several furnaces are integrated in parallel into the production process. The crude on-line measured process data are not suitable for determining the degree of coking. About 20 different measurements of different indicators, such as temperatures, pressures, or flows, are taken every minute. By regarding only these data it is not possible for the operator to decide whether the furnace is coked or not. The operator's experience and the running time of the furnace in question are the basis for this decision.

Some work has been done to give computational support concerning the coking problem [14]. There, an expert system was used to determine times of decoking processes.

In the next section a method is described which is suitable for determining the degree of coking based on on-line measured process data.

4.1.2 Solution by Data Analysis

Clustering methods compress the information of data sets by finding classes, which can be used for a classification [2]. Similar objects are assigned to the same class. In our case, objects are different states of a cracking furnace during a production period. Objects are described by different features; features are the on-line measured quantities like temperatures etc.

" Current Process

Process

Features: M~ M2~ Process M1, M2 , .•. Analysis

M1 M1

~ \ '

Feature Classifier Expert Selection Design Classification

Fig. 13. Analyzing process data

The problem is to find the right features for the problem in question. There are some mathematical methods, like principal component analysis, to reduce the number of features down to three or two, so that graphical methods can be used to see and recognize the dependencies.


Normally, however, the loss of information is too big when using these techniques. Figure 13 sketches the principle of analyzing process data by clustering methods.

Modern process control systems collect the data and archive them. Based on this archived data set, the classifier is designed with the help of clustering. For this task the support of experts of the plants is also required. Each feature leads to one dimension of the feature space. Clustering algorithms find accumulations of objects in that space; these accumulations are the different classes. A new object can then be classified. It is therefore important that the classes found in this way can be interpreted by the practitioner, who may recognize that one class contains good process states, the other class bad ones.

So the information which is hidden in a big data set can be compressed by finding classes and designing a classifier. With fuzzy classification, processes can be studied which move continuously from one state to another. One of these processes is the above-described coking of cracking furnaces. After the production period the furnace is shut down and the coke is burned out with a mixture of steam and air. After that the furnace is reintegrated into the production process until it has to be decoked again. The state of the furnace is described by several features. Figure 14 shows a brief sketch of the furnace and the measured features.

For the determination of the coking of a furnace it is not necessary to find classes by clustering: two different classes describing the coked and decoked states are already known. The centers of these classes in the multidimensional feature space are also known, so that the classifier can be built from the history of the process data. After a decoking process, the values of the features for the decoked state can be acquired; a short time before a decoking process, the values of the coked state are obtained analogously. This classifier can be used to classify the current furnace state and to support the operator's decision whether the furnace is coked or not.
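A minimal sketch of such a classifier built from two known class centers is shown below; the FCM-style membership formula, the two features, and the numerical values are illustrative assumptions, not plant data.

```python
# Sketch: fuzzy classification of a furnace state with known class centers.
import numpy as np

def membership_to_classes(x, centers, m=2.0):
    """x: feature vector of the current state, centers: (2, d) class centers."""
    d2 = np.fmax(((centers - x) ** 2).sum(axis=1), 1e-12)
    u = d2 ** (-1.0 / (m - 1.0))
    return u / u.sum()        # memberships to (coked, decoked), summing to 1

# Hypothetical centers acquired shortly before and after a decoking run.
v_coked = np.array([840.0, 1.2])
v_decoked = np.array([810.0, 0.6])
u = membership_to_classes(np.array([835.0, 1.1]),
                          np.array([v_coked, v_decoked]))
```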

4.1.3 Results and Discussion

Clustering methods mentioned in Section 2.1 were used to determine the coking of 10 cracking furnaces of a thermal cracker [14]. The data of one year have been analyzed. The process of coking lasts about 60 days; therefore only daily mean values of the measured quantities were considered. Each object (furnace) is described by the features sketched in Fig. 14. For different furnaces the centers of the coked and decoked classes were found by searching for coked and decoked states in the data set.

Fig. 15 shows the temperature profile of a furnace during the whole year. Characteristic peaks, where the temperature decreases significantly, result from decoking processes. K1 and K2 describe decoked and coked states of the furnace.


Fig. 14. Cracking furnace

Fig. 15. Furnace temperature (temperature in Celsius, plotted over days 0-360)

The temperature profile shows no characteristic shape which results from coking. Furnace temperature is only one of the features sketched in Fig. 14. There are dependencies between features, so that a determination of coking is not possible considering only the feature "temperature". The whole feature set shown in Fig. 14 is suitable to find coked and decoked classes and to build a classifier which can be used to classify current furnace states.

Fig. 16. Fuzzy classification of a continuous process (furnace states in a two-feature plane, in arbitrary units, with the class centers K1 and K2)

Page 408: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications

399

Figure 16 shows the membership values of a furnace state during a production period using the classifier. The values describe the membership of the current furnace state to the coked class. The membership values increase continuously and reach nearly 1 at the end of the production period.

Fig. 17. Transition of process states (membership μ1(i) between the decoked and the coked state, plotted over days 40-100)

By using the classifier, the information which is hidden in the data set is compressed. The membership value sketched in Fig. 17 shows the degree of coking and hence is the feature which describes the coking of cracking furnaces.

The classifier works on-line and classifies the current furnace state with respect to the coking problem. The operator can use this information to check how long the furnace in question will be able to run until it has to be decoked. This makes it easier to make arrangements concerning logistical questions, like ordering the right amounts of raw material or not being understaffed at certain times.

4.2 Acoustic Quality Control

In acoustic quality control many efforts have been undertaken to automate the respective control tasks, which are usually performed by humans. Even though there are many computerized systems for automatic quality control by analysis of acoustic signals, some of the problems could not be solved adequately yet. Here an example of the acoustic control of ceramic goods is presented to show the potential of fuzzy data analysis in this respect.

4.2.1 Problem Formulation

In cooperation with a producer of tiles, a prototype has been built which shows the potential of automatic quality control. So far an employee of this company has to check the quality of the final product by hitting it with a hammer and deciding about the quality of the tile based on the resulting sound. Since cracks in the tile cause an unusual sound, an experienced worker can distinguish between good and bad tiles.

4.2.2 Solution Process

In this application, algorithmic methods for classifier design and classification were used to detect cracks in tiles. In the experiments the tiles are hit automatically and the resulting sound is recorded via a microphone and an A/D converter.

Fig. 18. Automated quality control of tiles (microphone, A/D converter, data acquisition board, DataEngine)

Then signal processing methods like filtering and the fast Fourier transform (FFT) convert these sound data into a spectrum which can be analyzed: the time signal is transformed by an FFT into the frequency spectrum. From this frequency spectrum several characteristic features are extracted which can be used to distinguish between good and bad tiles. The feature values are the sums of the amplitude values in some specified frequency intervals. In the experiments a 6-dimensional feature vector showed the best results.
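A minimal sketch of this kind of feature extraction follows; the sampling rate and the six frequency bands are illustrative assumptions, not the values used in the tile experiments.

```python
# Sketch: spectral band-energy features from a recorded sound.
import numpy as np

def band_features(signal, sample_rate, band_edges):
    """Sum of spectral amplitudes in the intervals given by band_edges (Hz)."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return np.array([spectrum[(freqs >= lo) & (freqs < hi)].sum()
                     for lo, hi in band_edges])

# Hypothetical 6-dimensional feature vector from six frequency bands.
edges = [(0, 500), (500, 1000), (1000, 2000),
         (2000, 4000), (4000, 8000), (8000, 16000)]
```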

After this feature extraction the fuzzy c-means algorithm found fuzzy classes which could be interpreted as good and bad tiles. Since a strict distinction between these two classes is not always possible, fuzzy clustering techniques have the advantage that they not only distinguish bad from good tiles but that intermediate qualities can also be defined.

Based on this prototype, an automatic system for acoustic quality control can be installed at the production lines. In the future this prototype will be extended to also support optical quality control by methods of computer vision [9]. Especially if the overall quality of tiles has to be evaluated, fuzzy technology offers methods to aggregate different evaluations, as for example acoustical and optical ones. A lot of research has been done in this respect in the past [18].

Fig. 19. Application of DataEngine for acoustic quality control (A/D converter, data acquisition board, filter, time signal, frequency spectrum, feature extraction, feature vector, feature space, clustering)

5. Conclusion

Data analysis has large potential for industrial applications. It can lead to the automation of tasks which are too complex or too ill-defined to be solved satisfactorily with conventional techniques. This can result in the reduction of cost, time, and energy, which also improves environmental criteria.

In contrast to fuzzy controllers, where the behaviour of the controlled system can be observed and therefore the performance of the controller can be assessed immediately, many applications of methods for data analysis have in common that it will take some time to quantify their influence exactly.

References

1. H. Bandemer, W. Näther, Fuzzy Data Analysis (Kluwer, Dordrecht, 1992).
2. J.C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms (Plenum Press, New York, 1981).
3. J.C. Bezdek, E.C.-K. Tsao, N.R. Pal, Fuzzy Kohonen Clustering Networks, in: IEEE International Conference on Fuzzy Systems (San Diego, 1992) 1035-1043.
4. J.C. Bezdek, S.K. Pal (Eds.), Fuzzy Models for Pattern Recognition (IEEE Press, New York, 1992).
5. J.C. Bezdek, A.M. Bensaid, L.P. Clarke, L.O. Hall, Partially Supervised Clustering for Image Segmentation, Pattern Recognition 29(5) (1996) 859-871.
6. R.N. Dave, Fuzzy shell-clustering and applications to circle detection in digital images, Int. J. Gen. Syst. 16 (1990) 343-355.
7. A. Kandel, Fuzzy Techniques in Pattern Recognition (John Wiley & Sons, New York, 1982).
8. B. Kosko, Neural Networks and Fuzzy Systems (Prentice-Hall, Englewood Cliffs, N.J., 1992).
9. R. Krishnapuram, J. Lee, Fuzzy-Set-Based Hierarchical Networks for Information Fusion in Computer Vision, Neural Networks 5 (1992) 335-350.
10. R. Krishnapuram, J.M. Keller, A possibilistic approach to clustering, IEEE Trans. Fuzzy Syst. 1 (1993) 98-110.
11. P.A. Paardekooper, C. van Leeuwen, H. Koppelaar, A.G. Montfoort, Simulatie van een ethyleenfabriek bespaart tijd en moeite, PT Polytechnische tijdschrift, Simulatie (1990) 30-34 (in Dutch).
12. Y.-H. Pao, Adaptive Pattern Recognition and Neural Networks (Addison-Wesley, Reading, Mass., 1989).
13. R. Schalkoff, Pattern Recognition: Statistical, Structural and Neural Approaches (John Wiley & Sons, New York, 1992).
14. B. Trompeta, W. Meier, Erfahrungen bei der Prozeßidentifikation von verfahrenstechnischen Großanlagen, 2. Anwendersymposium zu Fuzzy Technologien, 23-24 March 1993, Aachen (in German).
15. Ullmanns Encyclopedia of Technical Chemistry, Vol. 8, 4th Edition (New York, 1982).
16. J. Watada, Methods for Fuzzy Classification, Japanese Journal of Fuzzy Theory and Systems 4 (1992) 149-163.
17. R. Weber, Fuzzy-ID3: A Class of Methods for Automatic Knowledge Acquisition, Proceedings of the 2nd International Conference on Fuzzy Logic & Neural Networks (Iizuka, Japan, July 1992) 265-268.
18. S.M. Weiss, C.A. Kulikowski, Computer Systems that Learn (Morgan Kaufmann, San Mateo, Calif., 1991).
19. M.P. Windham, Geometrical fuzzy clustering algorithms, Fuzzy Sets and Systems 10 (1983) 271-279.
20. H.-J. Zimmermann, Fuzzy Sets in Pattern Recognition, in: P.A. Devijver, J. Kittler (Eds.), Pattern Recognition Theory and Applications (Springer-Verlag, Berlin, 1987) 383-391.
21. H.-J. Zimmermann, Fuzzy Sets, Decision Making, and Expert Systems (Kluwer, Boston, Mass., 1987).
22. H.-J. Zimmermann, Fuzzy Set Theory - and Its Applications, 3rd rev. ed. (Kluwer, Boston, Mass., 1996).
23. H.-J. Zimmermann, P. Zysno, Latent Connectives in Human Decision Making, Fuzzy Sets and Systems 4 (1980) 37-51.
24. H.-J. Zimmermann, Hybrid approaches for fuzzy data analysis and configuration using genetic algorithms and evolutionary methods, in: Zurada, Marks II, Robinson (Eds.), Computational Intelligence: Imitating Life (New York, 1994) 364-370.


Probabilistic and Possibilistic Networks and How To Learn Them from Data

Christian Borgelt and Rudolf Kruse

Dept. of Information and Communication Systems, Otto-von-Guericke-University of Magdeburg, 39106 Magdeburg, Germany [email protected] de

Abstract: In this paper we explain in a tutorial manner the technique of reasoning in probabilistic and possibilistic network structures, which is based on the idea to decompose a multi-dimensional probability or possibility distribution and to draw inferences using only the parts of the decomposition. Since constructing probabilistic and possibilistic networks by hand can be tedious and time-consuming, we also discuss how to learn probabilistic and possibilistic networks from data, i.e. how to determine from a database of sample cases an appropriate decomposition of the underlying probability or possibility distribution.

Keywords: Decomposition, uncertain reasoning, probabilistic networks, possibilistic networks, learning from data.

1. Introduction

Since reasoning in multi-dimensional domains tends to be infeasible in the domains as a whole (and the more so if uncertainty and/or imprecision are involved), decomposition techniques that reduce the reasoning process to computations in lower-dimensional subspaces have become very popular. For example, decomposition based on dependence and independence relations between variables has been studied extensively in the field of graphical modeling [17]. Some of the best-known approaches are Bayesian networks [23], Markov networks [20], and the more general valuation-based networks [27]. They all led to the development of efficient implementations, for example HUGIN [1], PULCINELLA [26], PATHFINDER [12] and POSSINFER [8].

A large part of recent research has been devoted to learning probabilistic and possibilistic networks from data [4, 13, 9], i.e. to determining from a database of sample cases an appropriate decomposition of the probability or possibility distribution on the domain under consideration. Such automatic learning is important, since constructing a network by hand can be tedious and time-consuming. If a database of sample cases is available, as it often is, learning algorithms can take over at least part of the construction task.



In this tutorial paper we survey the basic idea of probabilistic and possibilistic networks and the basic method for learning them from data. In Section 2 we introduce the idea of decomposing multi-dimensional distributions and demonstrate how a decomposition can be used to reason in the underlying multi-dimensional domain. We do so by inspecting decompositions of relations first and then, in Section 3, proceed to decompositions of probability distributions. Section 4 considers the graphical representation of decompositions. Section 5 discusses the general scheme for inducing decompositions from data, which is applied to probability distributions in Section 6.

With Section 7 we start to transfer the ideas of the preceding sections, where they were presented in the relational and the probabilistic setting, to the possibilistic setting. To do so, we first clarify what we understand by a degree of possibility in Section 7. In Section 8 we look at the decomposition of and reasoning with possibility distributions, emphasizing the differences from the probabilistic case. Finally, Section 9 discusses how to induce possibilistic networks from data.

2. Decomposition and Reasoning

The basic idea underlying probabilistic as well as possibilistic networks is that a probability or possibility distribution D on a multi-dimensional domain can under certain conditions be decomposed into a set {D_1, ..., D_n} of (overlapping) distributions on lower-dimensional subspaces. By multi-dimensional domain we mean that a state of the universe of discourse can be described by stating the values of a set of attributes. Each attribute (or, more precisely, the set of its possible values) forms a dimension of the domain. Of course, to form a dimension the possible values have to be exhaustive and mutually exclusive. Thus each state corresponds to a single point of the multi-dimensional domain. A distribution D assigns to each point of the domain a number in the interval [0, 1], which represents the (prior) probability or the (prior) degree of possibility of the corresponding state. By decomposition we mean that the distribution D on the domain as a whole can be reconstructed (at least approximately) from the distributions {D_1, ..., D_n} on the subspaces.

Such a decomposition has several advantages, the most important being that a decomposition can usually be stored much more efficiently and with less redundancy than the whole distribution. These advantages are the main motive for studying decompositions of relations (which can be seen as special possibility distributions) in database theory [5, 30]. Not surprisingly, database theory is closely connected to our subject. The only difference is that we focus on reasoning, while database theory focuses on storing, maintaining, and retrieving data.

But just being able to store a distribution more efficiently would not be of much use for reasoning tasks, were it not for the possibility to draw inferences in the underlying multi-dimensional domain using only the distributions {D_1, ..., D_n} on the subspaces, without having to reconstruct the whole distribution D. How this works is perhaps best explained by a simple example, which we present in the relational setting first [6, 16, 18]. We consider only whether a combination of attribute values is possible or not, thus neglecting its probability or degree of possibility. In other words, we restrict ourselves to a distribution that assigns to each point of the underlying domain either a 1 (if the corresponding state is possible) or a 0 (if the corresponding state is impossible). With this restriction the ideas underlying decomposition and reasoning in decompositions can be demonstrated to the novice reader much more clearly than in the probabilistic setting, where the probabilities can disguise the very simple structure. Later on we will study the probabilistic and finally the possibilistic case.

Table 1. The relation R_ABC stating prior knowledge about the possible combinations of attribute values

  A:  a1  a1  a2  a2  a2  a2  a3  a4  a4  a4
  B:  b1  b1  b1  b1  b3  b3  b2  b2  b3  b3
  C:  c1  c2  c1  c2  c2  c3  c2  c2  c2  c3

Consider three attributes, A, B, and C, with corresponding domains dom(A) = {a1, a2, a3, a4}, dom(B) = {b1, b2, b3}, and dom(C) = {c1, c2, c3}. Thus the underlying domain of our example is the Cartesian product dom(A) × dom(B) × dom(C) or, as we will write as an abbreviation, the three-dimensional space {A, B, C}.

Table 1 states prior knowledge about the possible combinations of attribute values in the form of a relation R_ABC: only the value combinations contained in R_ABC are possible. (This relation is to be interpreted under the closed world assumption, i.e. all value combinations not contained in R_ABC are impossible.) A graphical representation of R_ABC is shown in the top left of Fig. 1: each cube indicates a possible value combination.

The relation R_ABC can be decomposed into two two-dimensional relations, namely the two projections to the subspaces {A, B} and {B, C}, both shown in the right half of Fig. 1. These projections, as well as the projection to the subspace {A, C} (shown in the bottom left of Fig. 1), are the shadows thrown by the cubes in the top left of Fig. 1 on the surrounding planes, if light sources are imagined in front, to the right, and above the relation.

Mathematically, a projection of a relation can be defined in the following way. Let $X = \{A_1, \ldots, A_m\}$ be a set of attributes. A tuple t over X is a mapping that assigns to each attribute $A_i$ a value $a_i^{(t)} \in \mathrm{dom}(A_i)$. Assuming an implicit order of the attributes, a tuple t over X can be written $(a_1^{(t)}, \ldots, a_m^{(t)})$, where each vector element states the value the corresponding attribute is mapped to.

Fig. 1. Graphical representation of the relation R_ABC and of all three possible projections to two-dimensional subspaces. Since in this relation the equation $R_{ABC} = \Pi_{\{A,B,C\}}^{\{A,B\}}(R_{ABC}) \bowtie \Pi_{\{A,B,C\}}^{\{B,C\}}(R_{ABC})$ holds, it can be decomposed into two relations on the subspaces {A, B} and {B, C}. This is demonstrated in Fig. 2.

Fig. 2. Cylindrical extensions of two projections of the relation R_ABC shown in Fig. 1. On the left is the cylindrical extension of the projection to {A, B}, on the right the cylindrical extension of the projection to {B, C}. Their intersection yields the original relation R_ABC.

Fig. 3. Propagation of the evidence that attribute A has value a4 in the three-dimensional relation shown in Fig. 1 using the relations on the subspaces {A, B} and {B, C}

To indicate that X is the domain of definition of t, i.e. that t is a tuple over X, we write dom(t) = X. If t is a tuple over X and Y ⊆ X, then $t|_Y$ denotes the restriction or projection of the tuple t to Y, i.e. the mapping $t|_Y$ assigns values only to the attributes in Y. Hence $t|_Y$ is a tuple over Y, i.e. $\mathrm{dom}(t|_Y) = Y$.

A relation R over an attribute set X is a set of tuples over X. If R is a relation over X and Y ⊆ X, then the projection $\Pi_X^Y(R)$ of R from X to Y is defined as

$$\Pi_X^Y(R) \stackrel{\mathrm{def}}{=} \{\, s \mid \mathrm{dom}(s) = Y \wedge \exists t \in R : s = t|_Y \,\}.$$

The two relations $R_{AB} = \Pi_{\{A,B,C\}}^{\{A,B\}}(R_{ABC})$ and $R_{BC} = \Pi_{\{A,B,C\}}^{\{B,C\}}(R_{ABC})$ are a decomposition of the relation $R_{ABC}$, because it can be reconstructed by forming the natural join $R_{AB} \bowtie R_{BC}$. In database theory $R_{ABC}$ would be called join-decomposable.
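For illustration, here is a small Python sketch of projection and natural join, applied to the relation R_ABC of Table 1; representing tuples as dictionaries is merely a convenience of this sketch.

```python
# Sketch: projection and natural join for the relation R_ABC of Table 1.
R_ABC = [dict(zip("ABC", t)) for t in [
    ("a1", "b1", "c1"), ("a1", "b1", "c2"), ("a2", "b1", "c1"),
    ("a2", "b1", "c2"), ("a2", "b3", "c2"), ("a2", "b3", "c3"),
    ("a3", "b2", "c2"), ("a4", "b2", "c2"), ("a4", "b3", "c2"),
    ("a4", "b3", "c3")]]

def project(R, Y):
    """Restrict every tuple to the attributes in Y and drop duplicates."""
    return [dict(s) for s in {tuple((a, t[a]) for a in sorted(Y)) for t in R}]

def natural_join(R, S):
    """Merge all pairs of tuples that agree on the common attributes."""
    common = set(R[0]) & set(S[0])
    return [dict(r, **s) for r in R for s in S
            if all(r[a] == s[a] for a in common)]

R_AB = project(R_ABC, {"A", "B"})
R_BC = project(R_ABC, {"B", "C"})
# R_ABC is join-decomposable: the join reproduces exactly the ten tuples.
assert {tuple(sorted(t.items())) for t in natural_join(R_AB, R_BC)} == \
       {tuple(sorted(t.items())) for t in R_ABC}
```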

Forming the natural join of two relations is the same as intersecting their cylindrical extensions to the union of their attribute sets. The cylindrical extensions of $R_{AB}$ and $R_{BC}$ to {A, B, C} are shown in Fig. 2. They result from $R_{AB}$ and $R_{BC}$ by simply adding all possible values of the missing dimension. Thus the name "cylindrical extension" is very expressive: since in sketches a set is usually depicted as a circle, adding all values of a perpendicular dimension yields a cylinder. Mathematically, a cylindrical extension can be defined in the following way. Let R be a relation over an attribute set X and Y ⊇ X. Then the cylindrical extension $\hat\Pi_X^Y(R)$ of R from X to Y is defined as

$$\hat\Pi_X^Y(R) \stackrel{\mathrm{def}}{=} \{\, s \mid \mathrm{dom}(s) = Y \wedge \exists t \in R : t = s|_X \,\}.$$


It is easy to see that intersecting the cylindrical extensions of $R_{AB}$ and $R_{BC}$ shown in Fig. 2 yields the original relation $R_{ABC}$. Intuitively this is possible, because fixing the value of attribute B, on which the two relations $R_{AB}$ and $R_{BC}$ overlap, renders the possible values of the remaining attributes A and C freely combinable (see Fig. 1). Hence we can say that given B, the attributes A and C are independent of each other.

To illustrate the reasoning process we assume that from an observation we know that in the current state of the universe of discourse attribute A has value a4. From this information we can draw inferences about the possible values of the other two attributes B and C. This can be done easily if we are given the whole distribution as shown in the top left of Fig. 1: we then simply cut out the "slice" corresponding to A = a4 and project the set of possible value combinations in this "slice" to the domains of the attributes B and C. Obviously, we find that according to our prior knowledge neither B = b1 nor C = c1 is possible for the current state of the universe of discourse.

But the same result can also be derived using only the relations $R_{AB}$ and $R_{BC}$. This is demonstrated in Fig. 3. Starting from the evidence that attribute A has value a4, we first form the cylindrical extension of the relation {(a4)} to {A, B} (medium grey) and intersect it with $R_{AB}$ (light grey). This intersection ($R_{AB}^{\mathrm{post}}$, dark grey) is then projected to {B}, yielding b2 and b3 as possible values for B. In the same way, the relation {(b2), (b3)} is then extended cylindrically to {B, C} (medium grey) and intersected with $R_{BC}$ (light grey). The result ($R_{BC}^{\mathrm{post}}$, dark grey) is projected to {C}, yielding c2 and c3 as possible values for C.

Of course, the reasoning process can also take two observations, for example A = a4 and C = c3, as input. To obtain the possible values for B we only have to intersect the projections of $R_{AB}^{\mathrm{post}} = R_{AB} \cap \hat\Pi_{\{A\}}^{\{A,B\}}(\{(a_4)\})$ and $R_{BC}^{\mathrm{post}} = R_{BC} \cap \hat\Pi_{\{C\}}^{\{B,C\}}(\{(c_3)\})$ to {B}. The result is {(b3)}.

It is easy to show that the result of such a reasoning process is always the same as the result obtained directly from the original relation, if intersecting the cylindrical extensions of the projections yields the original relation.
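Continuing the join sketch above, the propagation of Fig. 3 (and the variant with two observations) can be written as follows:

```python
# Sketch: relational evidence propagation using only R_AB and R_BC.
# Restricting a relation to the observed evidence corresponds to intersecting
# it with the cylindrical extension of the evidence; projecting the result
# yields the possible values of the neighbouring attribute.
def possible_values(R, known_attr, known_values, wanted_attr):
    return {t[wanted_attr] for t in R if t[known_attr] in known_values}

B_post = possible_values(R_AB, "A", {"a4"}, "B")    # -> {'b2', 'b3'}
C_post = possible_values(R_BC, "B", B_post, "C")    # -> {'c2', 'c3'}

# Two observations, A = a4 and C = c3: intersect the two projections to {B}.
B_both = possible_values(R_AB, "A", {"a4"}, "B") & \
         possible_values(R_BC, "C", {"c3"}, "B")    # -> {'b3'}
```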

3. Decomposition of Probability Distributions

The method of decomposing a relation can easily be transferred to probability distributions; only the definitions of projection, cylindrical extension, and intersection have to be modified. Projection now consists in calculating the marginal distribution on the subspace. Extension and intersection are combined and consist in multiplying the prior distribution by the quotient of the posterior and prior marginal probability. Again the idea is best explained by a simple example.

Fig. 4 shows a probability distribution on the joint domain of the three attributes A, B, and C together with its marginal distributions (sums over lines/columns).

Fig. 4. A three-dimensional probability distribution with its marginal distributions (sums over lines/columns; all numbers in parts per 1000). Since in this distribution the equations $\forall i,j,k: P(a_i, b_j, c_k) = \frac{P(a_i, b_j)\, P(b_j, c_k)}{P(b_j)}$ hold, it can be decomposed into the marginal distributions on the subspaces {A, B} and {B, C}.

Fig. 5. Propagation of the evidence that attribute A has value a4 in the three-dimensional probability distribution shown in Fig. 4 using the marginal probability distributions on the subspaces {A, B} and {B, C}

It is closely related to the example of the preceding section, since in this distribution those value combinations that were contained in the relation $R_{ABC}$ (were possible) have a high probability, while those that were missing (were impossible) have a low probability.

Just as the relation $R_{ABC}$ can be decomposed into $R_{AB}$ and $R_{BC}$, the probability distribution in Fig. 4 can be decomposed into the two marginal distributions on the subspaces {A, B} and {B, C}. This is possible because the three-dimensional distribution can be reconstructed using the formulae

$$\forall i,j,k:\quad P(a_i, b_j, c_k) = \frac{P(a_i, b_j)\, P(b_j, c_k)}{P(b_j)},$$

where $P(a_i, b_j, c_k)$ is short for $P(A = a_i, B = b_j, C = c_k)$ etc. These formulae can be derived from the (generally true) formulae

$$\forall i,j,k:\quad P(a_i, b_j, c_k) = P(a_i \mid b_j, c_k)\, P(b_j, c_k)$$

by noting that in this probability distribution A is conditionally independent of C given B, usually written $A \perp\!\!\!\perp C \mid B$. That is,

$$\forall i,j,k:\quad P(a_i \mid b_j, c_k) = P(a_i \mid b_j) = \frac{P(a_i, b_j)}{P(b_j)},$$

i.e. if the value of B is known, the value of A does not depend on the value of C. Note that conditional independence is symmetric, i.e. if $A \perp\!\!\!\perp C \mid B$, then

$$\forall i,j,k:\quad P(c_k \mid b_j, a_i) = P(c_k \mid b_j) = \frac{P(b_j, c_k)}{P(b_j)}$$

also holds. In other words, $A \perp\!\!\!\perp C \mid B$ entails $C \perp\!\!\!\perp A \mid B$.

To illustrate the reasoning process, let us assume again that of the current state of the universe of discourse we know that A = a4. Obviously the corresponding probability distributions of B and C can be determined from the three-dimensional distribution by restricting it to the "slice" that corresponds to A = a4 and computing the marginal distributions of that "slice." But the distributions on the two-dimensional subspaces are also sufficient to draw this inference, as is demonstrated in Fig. 5. The information that A = a4 is extended to the subspace {A, B} by multiplying the joint probabilities by the quotient of the posterior and prior probability of A = a_i, i = 1, 2, 3, 4. Then the marginal distribution on {B} is determined by summing over the lines, which correspond to the different values of B. In the same way the information of the new probability distribution on B is propagated to C: the joint distribution on {B, C} is multiplied by the quotient of the posterior and prior probability of B = b_j, j = 1, 2, 3, and then the marginal distribution on C is computed by summing over the columns, which correspond to the different values of C. This scheme can be derived directly from the decomposition formulae. It is easy to check that the results obtained are the same as those that follow from the computations on the three-dimensional domain.

Of course, this scheme is a simplification that does not lend itself to direct implementation, as can be seen when assuming that of the current state A = a4 and C = c3 are known. In this case additional computations are necessary to join the information from A and C arriving at B. We omit these computations for reasons of simplicity, since our aim is only to illustrate the basic idea.¹

¹ In short: the marginal distributions for B obtained from the two two-dimensional subspaces have to be multiplied with each other, divided by the prior distribution of B, and normalized to 1. This can easily be derived from the decomposition formulae.
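For illustration, the single-observation propagation of Fig. 5 can be sketched as follows. The arrays hold the marginal distributions of Fig. 4 (their row and column order is an assumption of this sketch), and, as explained above, joining several observations would require the additional normalization step of the footnote.

```python
# Sketch: probabilistic propagation of the evidence A = a4 (cf. Fig. 5).
import numpy as np

# Marginal distributions of Fig. 4 (rows b3, b2, b1; values in parts per 1000).
P_AB = np.array([[ 40, 180,  20, 160],      # columns a1 .. a4
                 [ 12,   6, 120, 102],
                 [168, 144,  30,  18]]) / 1000.0
P_BC = np.array([[ 20, 180, 200],           # columns c1 .. c3
                 [ 40, 160,  40],
                 [180, 120,  60]]) / 1000.0

P_A = P_AB.sum(axis=0)                      # prior marginal of A
P_B = P_AB.sum(axis=1)                      # prior marginal of B

post_A = np.array([0.0, 0.0, 0.0, 1.0])     # evidence: A = a4
# Multiply by the quotient of posterior and prior probability of A,
# then sum over the columns (the values of A).
post_B = (P_AB * (post_A / P_A)).sum(axis=1)
# -> approx. [0.571, 0.364, 0.064] for b3, b2, b1 (Fig. 5: 572, 364, 64)
post_C = (P_BC * (post_B / P_B)[:, None]).sum(axis=0)
# -> approx. [0.121, 0.521, 0.357] (Fig. 5: 122, 520, 358, up to rounding)
```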

4. Graphical Representation

The reasoning scheme suggests the idea to use a graphical structure, i.e. an acyclic hypergraph, to represent the decomposition: the attributes are represented as nodes, the distributions D_i of the decomposition as hyperedges connecting the attributes of their underlying domains. For the examples of the two preceding sections the hypergraph is simply

A - B - C

and hence a normal graph. Of course, in real-world applications the resulting hypergraphs can be much more complex; especially, the edges can connect more than two nodes, thus forming real hyperedges.

This representation uses undirected edges, since in our example the decomposition consists of joint marginal distributions. If undirected graphs are used in the probabilistic setting, the network is usually called a Markov network [23]. But it is also possible to use conditional distributions and directed edges (often accompanied by a refined hypergraph structure), thus arriving at so-called Bayesian networks [23]. This can be justified by the possibility to write the decomposition formulae in terms of conditional probabilities. E.g. for the above example we can write

$$\forall i,j,k:\quad P(a_i, b_j, c_k) = P(c_k \mid b_j)\, P(b_j \mid a_i)\, P(a_i).$$

These formulae can also be derived from the so-called chain rule of probability,

$$\forall i,j,k:\quad P(a_i, b_j, c_k) = P(c_k \mid a_i, b_j)\, P(b_j \mid a_i)\, P(a_i),$$

with the help of the conditional independence $A \perp\!\!\!\perp C \mid B$.


When using conditional probabilities it seems natural to direct the edges according to their inherent direction, i.e. from the conditioning attributes to the conditioned ones. In the above example the decomposition would be represented by the directed hypergraph

A → B → C.

Usually only conditional dependences of one attribute (called the child) given a set of other attributes (called the parents) are used, although in principle it is possible to have joint conditional probabilities of two or more attributes given a set of other attributes.

The name Bayesian network stems from the fact that, if we want to propagate evidence against the direction of a hyperedge, we now have to use Bayes' formula

$$P(y \mid x) = \frac{P(x \mid y)\, P(y)}{P(x)}$$

to reverse the conditional probability associated with the hyperedge. (Of course, reasoning in the direction of a hyperedge is simpler now, since we no longer need to form the quotient of the posterior and prior probability, but can multiply directly with the posterior probability.)

With respect to the class of probability distributions they can represent, the expressive power of Markov networks and Bayesian networks is equivalent, since it is always possible to go from joint probabilities to conditional probabilities and vice versa (provided the necessary marginal probabilities are available). Nevertheless, in some applications Bayesian networks may be preferable, because the additional degree of freedom consisting in the direction of the hyperedges can be used, e.g., to express assumed causal or functional dependences. Indeed, when constructing a Bayesian network by hand, one often starts from a supposed causal model and encodes the causal dependences as probabilistic conditions. This is the reason why Bayesian networks are sometimes also called probabilistic causal networks.

However, in this paper we mostly use undirected graphs and joint distributions, and we do so for two reasons. In the first place, the direction of an edge cannot be justified from the probabilistic model alone. E.g. in the above example it is possible to write the decomposition formulae in several different ways, i.e.

$$\begin{aligned}
\forall i,j,k:\ P(a_i, b_j, c_k) &= \frac{P(a_i, b_j)\, P(b_j, c_k)}{P(b_j)} \\
&= P(c_k \mid b_j)\, P(b_j \mid a_i)\, P(a_i) && A \to B \to C \\
&= P(a_i \mid b_j)\, P(b_j \mid c_k)\, P(c_k) && A \leftarrow B \leftarrow C \\
&= P(a_i \mid b_j)\, P(c_k \mid b_j)\, P(b_j) && A \leftarrow B \to C \\
&= P(b_j \mid a_i)\, P(b_j \mid c_k)\, \frac{P(a_i)\, P(c_k)}{P(b_j)} && A \to B \leftarrow C
\end{aligned}$$


and represent the decomposition by the corresponding graphs. The direction of an edge always comes from an external source, e.g. from assumptions about causal or functional dependences, and even then this source may justify directing only some of the edges. Secondly, restricting ourselves to joint distributions facilitates the transfer to the possibilistic setting, since it frees us from the need to define what a conditional possibility distribution is.

Note that the interpretation given here to the structure A → B ← C differs from the interpretation that is usually adopted, which is the generally true $\forall i,j,k: P(a_i, b_j, c_k) = P(b_j \mid a_i, c_k)\, P(a_i, c_k)$. That is, the conditional independence $A \perp\!\!\!\perp C \mid B$ need not hold in this structure. In our interpretation a situation in which A and C are dependent given B would be represented by one directed hyperedge connecting A and C to B, while two separate directed edges indicate conditional independence.

The problem is that the usual interpretation of edge directions owes a lot to causal modeling. As we mentioned above, a Bayesian network is often constructed from a causal model of the universe of discourse. The two separate edges then indicate only that both causes have an influence on the effect: even if one of the causes (e.g. A) is fixed, a change in the other cause (here C) can still change the common effect (here B). That is, the causes independently influence their common effect.

Which interpretation to choose may be a matter of taste, but one should be aware of the fact that the interpretation based on independence of causal influence often contains an implicit assumption. This assumption, which is contained in the stability assumption [22], states that, given the value of their common effect, the causes must be dependent. We will not discuss here whether this assumption is reasonable, but only mention that it is easy to come up with a (mathematical) counterexample², and that this assumption is a basic presupposition of the theory of inferred causation [22].

² Let A, B, and C be variables with dom(A) = dom(B) = dom(C) = {0, 1, 2, 3}. If C = (A mod 2) + 2(B mod 2), then A and B independently influence C and are independent given C.

5. Learning Networks from Data

To understand the problem of learning networks from data, i.e. of finding an appropriate decomposition of a multi-dimensional distribution, consider again the relational example of Section 2. We demonstrated that the relation $R_{ABC}$ shown in Fig. 1 can be decomposed into the relations $R_{AB}$ and $R_{BC}$. It goes without saying that we could not have chosen any pair of two-dimensional subspaces as a decomposition: intersecting the projections $R_{AB}$ and $R_{AC}$ leads to two, intersecting $R_{AC}$ and $R_{BC}$ to six additional tuples (compared to $R_{ABC}$). Hence only the pair $R_{AB}$ and $R_{BC}$ forms an exact decomposition.

It is also obvious that there need not be an exact decomposition. Imagine, for example, that the tuple (a4, b3, c2) is not possible. Removing the corresponding cube from Fig. 1 does not change any of the projections; therefore this cube is present in all possible intersections of cylindrical extensions of projections to two-dimensional subspaces. Hence the new relation cannot be reconstructed from any projections, and thus there is no exact decomposition. In such a situation one either has to work with the relation as a whole or be content with an approximation that contains some additional tuples. Since the former is often impossible in real-world applications because of the high number of dimensions of the underlying domain, a certain loss of information is accepted to make reasoning feasible. Often an approximation has to be accepted even if there is an exact decomposition, because it contains one or more very large hyperedges (connecting a lot of attributes) that cannot be dealt with.

Thus the problem of decomposing a relation can be stated in the following way: given a relation and a maximal size for hyperedges, find an exact decomposition or, if there is none, the best approximate decomposition of the relation. Unfortunately, no direct way to construct such a decomposition has been found yet. Therefore one has to search the space of all possible candidates.

It follows that an algorithm for inducing a decomposition always consists of two parts: an evaluation measure and a search method. The evaluation measure estimates the quality of a given candidate decomposition (a given hypergraph) and the search method determines which candidates (which hypergraphs) are inspected. Often the search is guided by the value of the evaluation measure, since the goal is usually to maximize (or to minimize) its value.

A desirable property of an evaluation measure is a certain locality or decomposability, i.e. the possibility to evaluate subgraphs (at best single hyperedges) separately. This is desirable not only because it facilitates computation, but also because some search methods can make use of such locality. In this paper we only consider local evaluation measures, but global evaluation measures are also available. For example, a simple global evaluation measure for the relational decomposition problem would be the number of additional tuples in the intersection of the cylindrical extensions of the projections [6].

We illustrate the general learning scheme by applying one of the oldest algorithms for decomposing a multi-dimensional probability distribution, which was suggested by Chow and Liu in 1968 [3], to our three-dimensional example. This algorithm can learn only decompositions representable by normal graphs (in which edges connect exactly two nodes) and not hypergraphs (in which edges can connect more than two nodes). But since our example domain is so small, this restriction does not matter.

The idea of Chow and Liu's algorithm is to compute the value of an evaluation measure on all possible edges (two-dimensional subspaces) and use the Kruskal algorithm to determine a maximum or minimum weight spanning tree. For our relational example we could use as an evaluation measure the number of possible value combinations in a subspace relative to the size of this subspace (see Table 2). Since the overall quality of a decomposition depends on the number of additional tuples in the intersection of the cylindrical extensions of its projections, it is plausible to keep the number of possible value combinations in the cylindrical extensions as small as possible. Obviously, this number depends directly on the number of possible value combinations in the projections. Therefore it seems to be a good heuristic method to select projections in which the ratio of the number of possible value combinations to the size of the subspace is small.

Table 2. The number of possible combinations relative to the size of the subspace and the gain in Hartley information for three subspaces

  subspace   relative number of possible value combinations   gain in Hartley information
  {A, B}     6/(3·4) = 1/2 = 50%                               log2 3 + log2 4 - log2 6 = 1
  {A, C}     8/(3·4) = 2/3 ≈ 67%                               log2 3 + log2 4 - log2 8 ≈ 0.58
  {B, C}     5/(3·3) = 5/9 ≈ 56%                               log2 3 + log2 3 - log2 5 ≈ 0.85

This measure is closely connected to the gain in Hartley information [11] (see table 2), which we will need again in the possibilistic setting. Intuitively, Hartley information measures the average number of questions necessary to determine an element within a given set. It does not take into account the probabilities of the elements and therefore is defined as the binary logarithm of the number of elements in the set. Now consider the task of determining a tuple within a two-dimensional relation. Obviously there are two ways to do this: we can determine the values in the two dimensions (the coordinates) separately, or we can determine the tuple directly. For example, to determine a tuple in the relation RAB (shown in the top right of Fig. 1), we can first determine the value of A (log24 bits) and then the value of B (log23 bits), or we can determine directly the tuple (log2 6 bits, since there are only six possible tuples). When doing the latter instead of the first, we gain

$$\log_2 4 + \log_2 3 - \log_2 6 = \log_2 \frac{4 \cdot 3}{6} = \log_2 2 = 1 \text{ bit}.$$

The above calculation shows that the gain in Hartley information is the binary logarithm of the reciprocal of the relative number of possible combinations.
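As a small illustration, the following Python sketch computes both quantities for the subspace {A,B} of Table 2 (|dom(A)| = 4, |dom(B)| = 3, six possible tuples); since Fig. 1 is not reproduced in this transcript, only the counts, not the tuples themselves, are used:

```python
from math import log2

def hartley_gain(n_tuples, dom_sizes):
    """Gain in Hartley information of a two-dimensional relation:
    log2 |dom(A)| + log2 |dom(B)| - log2 (number of possible tuples)."""
    size_a, size_b = dom_sizes
    return log2(size_a) + log2(size_b) - log2(n_tuples)

# Subspace {A,B} of Table 2: |dom(A)| = 4, |dom(B)| = 3, 6 possible tuples.
rel = 6 / (4 * 3)               # relative number of combinations: 0.5
gain = hartley_gain(6, (4, 3))  # log2 4 + log2 3 - log2 6 = 1.0
print(rel, gain)                # note gain = log2(1 / rel)
```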

If we interpret the values of Table 2 as edge weights, we can apply the Kruskal algorithm (to determine a minimum weight spanning tree for the relative number of possible combinations, or a maximum weight spanning tree for the gain in Hartley information) and thus obtain the graph

A-B-C

Hence for our example this algorithm finds the exact decomposition.
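A minimal sketch of this spanning-tree step (the union-find helper and the edge-list format are my own choices; the weights are the gains in Hartley information from Table 2):

```python
def kruskal_max_spanning_tree(nodes, weighted_edges):
    """Kruskal's algorithm for a maximum weight spanning tree: sort the
    edges by descending weight and add an edge unless it closes a cycle."""
    parent = {v: v for v in nodes}

    def find(v):                       # union-find with path compression
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    for w, u, v in sorted(weighted_edges, reverse=True):
        ru, rv = find(u), find(v)
        if ru != rv:                   # edge connects two components
            parent[ru] = rv
            tree.append((u, v))
    return tree

# Gains in Hartley information from Table 2 as edge weights.
edges = [(1.0, 'A', 'B'), (0.58, 'A', 'C'), (0.85, 'B', 'C')]
print(kruskal_max_spanning_tree('ABC', edges))  # [('A', 'B'), ('B', 'C')]
```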


6. Learning Probabilistic Networks

To apply the ideas of the preceding section to probability distributions we only have to change the evaluation measure. Chow and Liu [3] originally used mutual information or cross entropy [19] as edge weight. For two variables A and B with domains $\mathrm{dom}(A) = \{a_1, \dots, a_{r_A}\}$ and $\mathrm{dom}(B) = \{b_1, \dots, b_{r_B}\}$ it is defined as

$$I_{\mathrm{mut}}(A, B) = \sum_{i=1}^{r_A} \sum_{j=1}^{r_B} P(a_i, b_j) \log_2 \frac{P(a_i, b_j)}{P(a_i)\,P(b_j)}$$

and can be interpreted in several different ways. One of the simplest interpretations is to see mutual information as a measure of the difference between the joint probability distribution P(A, B) and the distribution $\tilde{P}(A, B)$ that can be computed from the marginal distributions P(A) and P(B) under the assumption that A and B are independent, i.e. $\forall i,j: \tilde{P}(a_i, b_j) = P(a_i)P(b_j)$.³ Obviously, the higher the mutual information of two variables, i.e. the more their joint distribution deviates from an independent distribution, the more likely it is that we need their joint distribution to appropriately describe the distribution on the whole domain.
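A minimal Python sketch of this measure (the joint distribution below is invented; in an application it would be estimated from the database via relative frequencies, as described later in this section):

```python
import numpy as np

def mutual_information(joint):
    """I_mut(A,B) = sum_ij P(a_i,b_j) log2( P(a_i,b_j) / (P(a_i) P(b_j)) ).
    `joint` is a 2D array of joint probabilities summing to 1."""
    p_a = joint.sum(axis=1, keepdims=True)   # marginal P(A)
    p_b = joint.sum(axis=0, keepdims=True)   # marginal P(B)
    mask = joint > 0                         # 0 * log 0 is taken to be 0
    return float((joint[mask] *
                  np.log2(joint[mask] / (p_a * p_b)[mask])).sum())

# Hypothetical counts from a database of sample cases.
counts = np.array([[20.0, 5.0], [5.0, 20.0]])
print(mutual_information(counts / counts.sum()))
```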

A different interpretation of this measure is connected with the name information gain under which it was used in decision tree induction [25]. It is then written differently,

$$I_{\mathrm{gain}}(A, B) = -\sum_{j=1}^{r_B} P(b_j) \log_2 P(b_j) + \sum_{i=1}^{r_A} P(a_i) \sum_{j=1}^{r_B} P(b_j \mid a_i) \log_2 P(b_j \mid a_i) = H_B - H_{B \mid A} = H_A + H_B - H_{AB},$$

where H is the Shannon entropy, and thus denotes the expected reduction in entropy or, equivalently, the expected gain in information about the value of B, if the value of A becomes known. Since mutual information is symmetric, this is also the expected gain in information about the value of A, if the value of B becomes known. Because of its apparent similarity to the gain in Hartley information, it is plausible that the higher the information gain, the more important it is to have the corresponding edge in the network.

Although we do not need this for our example, it should be noted that information gain can easily be extended to more than two attributes:

$$I_{\mathrm{gain}}(A_1, \dots, A_m) = \sum_{i=1}^{m} H_{A_i} - H_{A_1 \dots A_m}.$$

³ It can be shown that $I_{\mathrm{mut}}$ is always greater than or equal to zero, and equal to zero if and only if $P(a_i, b_j) = P(a_i)P(b_j)$.


[Fig. 6: numeric tables omitted. The resulting values are $I_{\mathrm{mut}}(A, B) = 0.43$, $I_{\mathrm{mut}}(A, C) = 0.05$, and $I_{\mathrm{mut}}(B, C) = 0.21$.]

Fig. 6. The mutual information of the three attribute pairs of the probability distribution shown in Fig. 4. On the left is the marginal distribution as calculated from the whole distribution, on the right the independent distribution, i.e. the distribution calculated by multiplying the marginal distributions on the single attribute domains. Mutual information measures the difference of the two.

Note also that mutual information or information gain is defined in terms of probabilities. This is no problem if we are given the probability distribution on the domain of interest; but in practice this distribution is not directly accessible. We are given only a database of sample cases, of which we assume that it was derived from the underlying distribution by a random experiment. In practice we therefore estimate the true probabilities by the empirical probabilities (relative frequencies) found in the database. That is, if n is the total number of tuples in the database and $n_i$ the number of tuples in which attribute A has value $a_i$, then it is assumed that $P(a_i) = n_i / n$, and the evaluation measure is calculated using this value.

For the example presented in Section 3 we are given the joint probability distribution in Fig. 4. In Fig. 6 mutual information is used to compute the difference of the three possible two-dimensional marginal distributions of this example to the independent distributions calculated from the marginal distributions on the single attribute domains. If we interpret these differences as edge weights, we can again apply the Kruskal algorithm to determine the maximum weight spanning tree. This leads to

A-B-C

i.e. the graph already used above to represent the possible decomposition of the three-dimensional probability distribution.


A more sophisticated, Bayesian method, the K2 algorithm, was suggested by Cooper and Herskovits in 1992 [4]. It is an algorithm for learning directed graphs and does so by selecting the parents of an attribute.

As an evaluation measure Cooper and Herskovits use the g-function, which is defined as

$$g(A, \mathrm{par}(A)) = c \cdot \prod_{j=1}^{r_{\mathrm{par}(A)}} \frac{(r_A - 1)!}{(n_j + r_A - 1)!} \prod_{i=1}^{r_A} n_{ij}!,$$

where A is an attribute and par(A) the set of its parents (this is a measure for directed hyperedges). $r_{\mathrm{par}(A)}$ is the number of distinct instantiations (value vectors) of the parent attributes that occur in the database to learn from, and $r_A$ the number of values of attribute A. $n_{ij}$ is the number of cases (tuples) in the database in which attribute A has the $i$th value and the parent attributes are instantiated with the $j$th value vector, and $n_j$ the number of cases in which the parent attributes are instantiated with the $j$th value vector, that is $n_j = \sum_{i=1}^{r_A} n_{ij}$. c is a constant prior probability. If it is assumed that all sets of parents have the same prior probability, it can be neglected, since then only the relation between the values of the evaluation measure for different sets of parent attributes matters.

The g-function estimates (for a certain value of c) the probability of finding the joint distribution of the variable and its parents that is present in the database. That is, assuming that all network structures are equally likely, and that, given a certain structure, all conditional probability distributions compatible with this structure are equally likely, K2 uses Bayesian reasoning to compute the probability of the network structure given the database from the probability of the database given the network structure.

The search method of the K2 algorithm is the following: To narrow the search space and to avoid loops in the resulting hypergraph a topological order of the attributes is defined. A topological order is a concept from graph theory. It describes an order of the nodes of a directed graph such that if there is a (hyper)edge from an attribute A (and maybe some others) to attribute B, then A precedes B in the order. Fixing a topological order restricts the permissible graph structures, since the parents of an attribute can only be selected from the attributes preceding it in the order. A topological order can either be stated by a domain expert or derived automatically [29].

The parent attributes are selected using a greedy search. At first the evaluation measure is calculated for the child attribute alone, or, more precisely, for the hyperedge consisting only of the child attribute. Then in turn each of the parent candidates (the attributes preceding the child in the topological order) is temporarily added to the hyperedge and the evaluation measure is computed. The parent candidate yielding the highest value of the evaluation measure is selected as a first parent and permanently added to the hyperedge. In the third step all remaining candidates are added temporarily as a second parent and again the evaluation measure is computed for each of the


resulting hyperedges. As before, the parent candidate yielding the highest value is permanently added to the hyperedge. The process stops if either no more parent candidates are available, a given maximal number of parents is reached, or none of the parent candidates, if added to the hyperedge, yields a value of the evaluation measure exceeding the best value of the preceding step. The resulting hypergraph contains for each attribute a (directed) hyperedge connecting it to its parents (provided parents were added).
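To make the search concrete, here is a compact Python sketch of this greedy parent selection (the data representation as a list of dictionaries, the helper names, and the use of log g to avoid large factorials are my own choices, not taken from [4]):

```python
from math import lgamma

def log_g(data, child, parents):
    """Logarithm of the g-function (the constant c is dropped): each parent
    instantiation j contributes (r_A - 1)! / (n_j + r_A - 1)! * prod_i n_ij!,
    computed via lgamma(n + 1) = log(n!)."""
    r_a = len({row[child] for row in data})
    counts = {}                  # parent value vector -> {child value: n_ij}
    for row in data:
        cell = counts.setdefault(tuple(row[p] for p in parents), {})
        cell[row[child]] = cell.get(row[child], 0) + 1
    total = 0.0
    for nij in counts.values():
        n_j = sum(nij.values())
        total += lgamma(r_a) - lgamma(n_j + r_a)           # (r_A-1)!/(n_j+r_A-1)!
        total += sum(lgamma(n + 1) for n in nij.values())  # prod_i n_ij!
    return total

def k2_parents(data, child, candidates, max_parents=2):
    """Greedy parent selection: in each round tentatively add every remaining
    candidate, keep the one with the highest score, and stop when no
    candidate improves on the best value of the preceding step."""
    parents, best = [], log_g(data, child, [])
    while candidates and len(parents) < max_parents:
        score, cand = max((log_g(data, child, parents + [c]), c)
                          for c in candidates)
        if score <= best:
            break
        best, parents = score, parents + [cand]
        candidates = [c for c in candidates if c != cand]
    return parents

# Usage on a tiny hypothetical database (topological order: A, B, C).
data = [{'A': 0, 'B': 0, 'C': 0}, {'A': 0, 'B': 0, 'C': 1},
        {'A': 1, 'B': 1, 'C': 1}, {'A': 1, 'B': 1, 'C': 0}]
print(k2_parents(data, 'B', ['A']))  # -> ['A']
```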

Of course, the two algorithms examined are only examples. There are several other search methods (in principle any general heuristic search method is applicable, like hill climbing, simulated annealing, genetic algorithms etc.) and even more evaluation measures (the χ² measure, information gain ratio, measures based on the minimum description length principle etc.; see [2] for a survey), which we can not consider here.

7. Degrees of Possibility

We now start to transfer the ideas, that up to now were presented in the relational and the probabilistic setting, to the possibilistic setting. Our discussion rests on a specific interpretation of a degree of possibility that is based on the context model [7, 18]. In this model possibility distributions are interpreted as information-compressed representations of (not necessarily nested) random sets, and a degree of possibility as the one-point coverage of a random set [21].

More intuitively, a degree of possibility is the least upper bound on the probability of the possibility of a value. We explain this interpretation in three steps. In the first place, the possibility of a value is just what we understand by this term in daily life: whether a value is possible or not. At this point we do not assume intermediate degrees, i.e. if a value is possible, we can not say more than that. We can not give a probability for that value. All we know is that if a value is not possible, its probability must be zero.

Secondly, imagine that we can distinguish between certain disjoint contexts or scenarios, to each of which we can assign a probability and for each of which we can state whether in it the value under consideration is possible or not. Then we can assign to the value as a degree of possibility the sum of the probabilities of the contexts in which it is possible. Thus we arrive at a degree of possibility as the probability of the possibility of a value.

Thirdly, we drop the requirement that the contexts or scenarios must be disjoint. They can overlap, but we assume that we do not know how. This seems to be a sensible assumption, since we should be able to split contexts if we knew how they overlap. If we now assign to a value as the degree of possibility the sum of the probabilities of the contexts in which it is possible, this value may exceed the actual probability, because of the possible overlap. But since we do not know which contexts overlap and how they overlap, this is the least upper bound consistent with the available information.
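A tiny numeric sketch of this construction (the contexts, their probabilities, and the sets of possible values are invented for illustration):

```python
# Each context has a probability and the set of values it considers possible.
contexts = [
    (0.5, {'a1', 'a2'}),   # context c1, probability 0.5
    (0.3, {'a2'}),         # context c2, probability 0.3
    (0.2, {'a2', 'a3'}),   # context c3, probability 0.2
]

def degree_of_possibility(value):
    """Sum of the probabilities of all contexts in which `value` is possible
    (the one-point coverage of the random set)."""
    return sum(p for p, possible in contexts if value in possible)

print(degree_of_possibility('a2'))  # 1.0: possible in every context
print(degree_of_possibility('a1'))  # 0.5
print(degree_of_possibility('a3'))  # 0.2
```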


Note that in this interpretation probability distributions are just special possibility distributions. If we have disjoint contexts and if in all contexts in which a value is possible it has the probability 1, the degree of possibility is identical to the probability. Note also that in this interpretation the degree of possibility can not be less than the probability.

8. Decomposition of Possibility Distributions

The method of decomposing a relation can be transferred to possibility distributions as easily as it could be transferred to probability distributions in Section 3. Again only the definitions of projection, cylindrical extension and intersection have to be modified. Projection now consists in computing the maximal degrees of possibility over the dimensions removed by it. Extension and intersection are combined and consist in calculating the minimum of the prior joint and the posterior marginal possibility degrees.

Taking the maximum or the minimum of a number of possibility degrees is the usual way of reasoning with possibility distributions, but it should be noted that projecting multi-dimensional possibility distributions by determining the maximum over the dimensions removed changes the interpretation of the resulting marginal distribution. Unlike marginal probabilities, which refer only to value vectors over the attributes of the subspace, maximum-projected possibilities still refer to value vectors over all attributes of the universe of discourse. The values of attributes removed by the projection are implicitly fixed but left unknown.

For example, a marginal probability distribution may state: "The probability that attribute A has value a is p." This probability is aggregated over all values of all other attributes and thus refers to single-element vectors. A maximum projection states instead: "The degree of possibility of a value vector with the highest degree of possibility of those value vectors in which attribute A has value a is p." That is, it always refers to a specific value vector over all attributes of the universe of discourse (a specific point), although only the value of the attribute A is known for this vector. The reason is that computing the maximum focuses on a specific vector and on the contexts in which it is possible. But these contexts need not be all in which a value vector with value a for attribute A is possible. Hence the maximum projection of a possibility distribution will in general be less than the actual marginal possibility distribution.

Why all these complications? Basically they are due to the course of history. There are good reasons for using maximum and minimum when working with one-dimensional possibility distributions, especially if the underlying random sets are nested. So it seemed plausible to extend this scheme to multi-dimensional possibility distributions. In addition possibility distributions need not be normalized like probability distributions are. The sum over the degrees of possibility of all elements of the universe of discourse can


Fig. 7. A three-dimensional possibility distribution with maximum projections (maxima over lines/columns). Since in this distribution the equations $\forall i,j,k: \pi(a_i, b_j, c_k) = \min(\max_k \pi(a_i, b_j, c_k), \max_i \pi(a_i, b_j, c_k))$ hold, it can be decomposed into the two projections to the subspaces {A, B} and {B, C}.

[Fig. 7: numeric tables omitted; all numbers in parts per 1000.]

Fig. 8. Propagation of the evidence that attribute A has value a4 in the three-dimensional possibility distribution shown in Fig. 7 using the projections to the subspaces {A, B} and {B, C}

[Fig. 8: numeric propagation tables omitted.]


exceed one. Therefore, at first sight, sum projection seems to be inapplicable. But a closer look reveals that it can be used as well (though with some precaution). Nevertheless we will not discuss this possibility here. Rather we proceed by illustrating the decomposition of possibility distributions with a simple example.

Fig. 7 shows a three-dimensional possibility distribution on the joint domain of the attributes A, B, and C and its maximum projections. Since the equations $\forall i,j,k: \pi(a_i, b_j, c_k) = \min(\max_k \pi(a_i, b_j, c_k), \max_i \pi(a_i, b_j, c_k))$ hold in this distribution, it can be decomposed into projections to the subspaces {A, B} and {B, C}. Therefore it is possible to propagate the observation that attribute A has value $a_4$ using the scheme shown in Fig. 8. Again the results obtained are the same as those that can be computed directly from the three-dimensional distribution.
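The following sketch illustrates this propagation scheme in Python; since the numeric tables of Figs. 7 and 8 are not reproduced in this transcript, the two projections below are hypothetical stand-ins with the same domain sizes:

```python
import numpy as np

# Hypothetical maximum projections to the subspaces {A,B} and {B,C}
# (rows: b1..b3; columns: a1..a4 and c1..c3, respectively).
pi_ab = np.array([[0.8, 0.9, 0.2, 0.1],
                  [0.3, 0.1, 0.7, 0.6],
                  [0.4, 0.8, 0.1, 0.7]])
pi_bc = np.array([[0.2, 0.8, 0.7],
                  [0.4, 0.7, 0.2],
                  [0.9, 0.6, 0.3]])

# Evidence: attribute A has value a4 (possibility 1 for a4, 0 otherwise).
ev_a = np.array([0.0, 0.0, 0.0, 1.0])

# Extension and intersection: minimum of the prior joint and the posterior
# marginal possibility degrees; projection: maximum over the removed dimension.
new_b = np.minimum(pi_ab, ev_a[np.newaxis, :]).max(axis=1)   # marginal on B
new_c = np.minimum(pi_bc, new_b[:, np.newaxis]).max(axis=0)  # marginal on C
print(new_b)   # [0.1 0.6 0.7]
print(new_c)   # [0.7 0.6 0.3]
```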

9. Learning Possibilistic Networks

Learning possibilistic networks follows the same scheme as learning probabilistic ones: we need an evaluation measure and a search method. Since the search method is fairly independent of the underlying uncertainty calculus, we can use the same methods as for learning probabilistic networks. Hence we only have to look for appropriate evaluation measures.

One evaluation measure can be derived from the U-uncertainty measure of nonspecificity of a possibility distribution [15], which is defined as

$$\mathrm{nsp}(\pi) = \int_0^{\sup(\pi)} \log_2 \left| [\pi]_\alpha \right| \, \mathrm{d}\alpha$$

and can be justified as a generalization of Hartley information [11] to the possibilistic setting [14]. nsp(π) reflects the expected amount of information (measured in bits) that has to be added in order to identify the actual value within the set $[\pi]_\alpha$ of alternatives, assuming a uniform distribution on the set $[0, \sup(\pi)]$ of possibilistic confidence levels α [10].

The role nonspecificity plays in possibility theory is similar to that of Shannon entropy in probability theory. Thus the idea suggests itself to construct an evaluation measure from nonspecificity in the same way as mutual information or information gain is constructed from Shannon entropy. By analogy we define specificity gain for two variables A and B as

$$S_{\mathrm{gain}} = \mathrm{nsp}\left(\max_B(\pi_{AB})\right) + \mathrm{nsp}\left(\max_A(\pi_{AB})\right) - \mathrm{nsp}(\pi_{AB}),$$

or for more than two variables

$$S_{\mathrm{gain}} = \sum_{k=1}^{m} \mathrm{nsp}\left(\max_X(\pi_{A_1 \dots A_m})\right) - \mathrm{nsp}(\pi_{A_1 \dots A_m}),$$


[Fig. 9: graphic omitted. The gains in Hartley information on the individual α-levels are:
log2 1 + log2 1 − log2 1 = 0,
log2 2 + log2 2 − log2 3 ≈ 0.42,
log2 3 + log2 2 − log2 5 ≈ 0.26,
log2 4 + log2 3 − log2 8 ≈ 0.58,
log2 4 + log2 3 − log2 12 = 0.]

Fig. 9. Illustration of the idea of specificity gain. A two-dimensional possibility distribution is seen as a set of relational cases, one for each α-level. In each relational case, determining the allowed coordinates is compared to determining directly the allowed value pairs. Specificity gain aggregates the gain in Hartley information that can be achieved on each α-level by computing the integral over all α-levels.

[Fig. 10: numeric tables omitted. The resulting values are $S_{\mathrm{gain}}(A, B) = 0.055$, $S_{\mathrm{gain}}(A, C) = 0.026$, and $S_{\mathrm{gain}}(B, C) = 0.048$.]

Fig. 10. The specificity gain of the three attribute pairs of the possibility distribution shown in Fig. 7. On the left is the maximum projection as calculated from the whole distribution, on the right the independent distribution, i.e. the distribution calculated as the minimum of the maximum projections to the single variable domains. Specificity gain measures the difference of the two.


where $X = \{A_i \mid 1 \le i \le m,\, i \ne k\}$. This measure is equivalent to the one suggested in [10].

The idea of specificity gain is illustrated in Fig. 9. A joint possibility distribution is seen as a set of relational cases, one for each α-level. Specificity gain aggregates the gain in Hartley information for these relational cases by computing the integral over all α-levels.

In analogy to information gain it is also possible to interpret specificity gain as a difference measure. For example, for two attributes A and B the possibility distribution $\pi_{AB}$ is compared to $\tilde{\pi}_{AB}$, which is defined as $\forall i,j: \tilde{\pi}_{AB}(a_i, b_j) = \min(\max_j \pi_{AB}(a_i, b_j), \max_i \pi_{AB}(a_i, b_j))$. Since it is easy to show that $\mathrm{nsp}(\tilde{\pi}_{AB}) = \mathrm{nsp}(\max_B(\pi_{AB})) + \mathrm{nsp}(\max_A(\pi_{AB}))$, it follows that $S_{\mathrm{gain}} = \mathrm{nsp}(\tilde{\pi}_{AB}) - \mathrm{nsp}(\pi_{AB})$.
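A small Python sketch of these quantities (for a distribution with finitely many distinct degrees of possibility the integral over α reduces to a finite sum; the example distribution is invented):

```python
import numpy as np

def nsp(pi):
    """Nonspecificity: integral over alpha of log2 |[pi]_alpha|,
    computed as a finite sum over the distinct degrees of possibility."""
    pi = np.asarray(pi, dtype=float)
    levels = np.concatenate(([0.0], np.unique(pi[pi > 0])))
    total = 0.0
    for lo, hi in zip(levels[:-1], levels[1:]):
        # on (lo, hi] the alpha-cut contains all elements with pi >= hi
        total += (hi - lo) * np.log2(np.count_nonzero(pi >= hi))
    return total

def specificity_gain(pi_ab):
    """S_gain = nsp(max proj to A) + nsp(max proj to B) - nsp(joint)."""
    return nsp(pi_ab.max(axis=1)) + nsp(pi_ab.max(axis=0)) - nsp(pi_ab)

# Hypothetical two-dimensional possibility distribution.
pi_ab = np.array([[1.0, 0.6, 0.0],
                  [0.6, 1.0, 0.3],
                  [0.0, 0.3, 1.0]])
print(specificity_gain(pi_ab))
```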

In the same way as information gain is used in Fig. 6, specificity gain is used in Fig. 10 to compute the difference of the three possible two-dimensional maximum projections of the example shown in Fig. 7 to the distribution calculated from the maximum projections to the single variable domains. If we interpret these differences as edge weights we can apply the Kruskal algorithm to determine the maximum weight spanning tree. This leads to

A-B-C

Hence the exact decomposition is found.

Just as for learning decompositions of probability distributions this measure is only an example. There are also some other measures available (specificity gain ratio, a variant of the χ²-measure etc., see [2]), which we can not discuss in detail here.

10. Summary

As we hope to have shown in this paper, the ideas underlying decomposition of probability as well as possibility distributions are very simple. Since decomposition reduces the amount of storage needed, but does not restrict reasoning (at least if the distribution can be reconstructed exactly from the decomposition), it is a valuable technique for expert system development. The methods available for inducing decompositions are also fairly simple, as was demonstrated by the examples of Sections 6 and 9. All of them consist of an evaluation measure and a search method, for both of which there are several alternatives. Although most decomposition methods are heuristic in nature, they lead to good results in practice. Nevertheless there is a large potential for future refinements.


References

1. S.K. Andersen, K.G. Olesen, F.V. Jensen, and F. Jensen. HUGIN - A shell for building Bayesian belief universes for expert systems. Proc. 11th Int. Joint Conf. on Artificial Intelligence, 1080-1085, 1989

2. C. Borgelt and R. Kruse. Evaluation Measures for Learning Probabilistic and Possibilistic Networks. Proc. 6th IEEE Int. Conf. on Fuzzy Systems (FUZZ-IEEE'97), Vol. 2, 669-676, Barcelona, Spain, 1997

3. C.K. Chow and C.N. Liu. Approximating Discrete Probability Distributions with Dependence Trees. IEEE Trans. on Information Theory 14(3):462-467, IEEE 1968

4. G.F. Cooper and E. Herskovits. A Bayesian Method for the Induction of Probabilistic Networks from Data. Machine Learning 9:309-347, Kluwer 1992

5. C.J. Date. An Introduction to Database Systems, Vol. 1. Addison Wesley, Reading, MA, 1986

6. R. Dechter. Decomposing a Relation into a Tree of Binary Relations. Journal of Computer and System Sciences 41:2-24, 1990

7. J. Gebhardt and R. Kruse. A Possibilistic Interpretation of Fuzzy Sets in the Context Model. Proc. IEEE Int. Conf. on Fuzzy Systems, 1089-1096, San Diego, CA, 1992

8. J. Gebhardt and R. Kruse. POSSINFER - A Software Tool for Possibilistic Inference. In: D. Dubois, H. Prade, and R. Yager, eds. Fuzzy Set Methods in Information Engineering: A Guided Tour of Applications, Wiley 1995

9. J. Gebhardt and R. Kruse. Learning Possibilistic Networks from Data. Proc. 5th Int. Workshop on Artificial Intelligence and Statistics, 233-244, Fort Lauderdale, FL, 1995

10. J. Gebhardt and R. Kruse. Tightest Hypertree Decompositions of Multivariate Possibility Distributions. Proc. Int. Conf. on Information Processing and Management of Uncertainty in Knowledge-based Systems, 1996

11. R.V.L. Hartley. Transmission of Information. The Bell Systems Technical Journal 7:535-563, 1928

12. D. Heckerman. Probabilistic Similarity Networks. MIT Press, Cambridge, MA, 1991

13. D. Heckerman, D. Geiger, and D.M. Chickering. Learning Bayesian Networks: The Combination of Knowledge and Statistical Data. Machine Learning 20:197-243, Kluwer 1995

14. M. Higashi and G.J. Klir. Measures of Uncertainty and Information based on Possibility Distributions. Int. Journal of General Systems 9:43-58, 1982

15. G.J. Klir and M. Mariano. On the Uniqueness of a Possibility Measure of Uncertainty and Information. Fuzzy Sets and Systems 24:141-160, 1987

16. R. Kruse and E. Schwecke. Fuzzy Reasoning in a Multidimensional Space of Hypotheses. Int. Journal of Approximate Reasoning 4:47-68, 1990


17. R. Kruse, E. Schwecke, and J. Heinsohn. Uncertainty and Vagueness in Knowledge-based Systems: Numerical Methods. Series: Artificial Intelligence, Springer, Berlin 1991

18. R. Kruse, J. Gebhardt, and F. Klawonn. Foundations of Fuzzy Systems, John Wiley & Sons, Chichester, England 1994

19. S. Kullback and R.A. Leibler. On Information and Sufficiency. Ann. Math. Statistics 22:79-86, 1951

20. S.L. Lauritzen and D.J. Spiegelhalter. Local Computations with Probabilities on Graphical Structures and Their Application to Expert Systems. Journal of the Royal Statistical Society, Series B, 2(50):157-224, 1988

21. H.T. Nguyen. Using Random Sets. Information Science 34:265-274, 1984

22. J. Pearl and T.S. Verma. A Theory of Inferred Causation. In: J.A. Allen, R. Fikes, and E. Sandewall, eds. Proc. 2nd Int. Conf. on Principles of Knowledge Representation and Reasoning, Morgan Kaufmann, San Mateo, CA, 1991

23. J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference (2nd edition). Morgan Kaufmann, San Mateo, CA, 1992

24. J.R. Quinlan. Induction of Decision Trees. Machine Learning 1:81-106, 1986

25. J.R. Quinlan. C4.5: Programs for Machine Learning, Morgan Kaufmann, San Mateo, CA, 1993

26. A. Saffiotti and E. Umkehrer. PULCINELLA: A General Tool for Propagating Uncertainty in Valuation Networks. Proc. 7th Conf. on Uncertainty in AI, 323-331, San Mateo, CA, 1991

27. G. Shafer and P.P. Shenoy. Local Computations in Hypertrees. Working Paper 201, School of Business, University of Kansas, Lawrence, KS, 1988

28. C.E. Shannon. The Mathematical Theory of Communication. The Bell Systems Technical Journal 27:379-423, 1948

29. M. Singh and M. Valtorta. An Algorithm for the Construction of Bayesian Network Structures from Data. Proc. 9th Conf. on Uncertainty in AI, 259-265, Morgan Kaufmann, San Mateo, CA, 1993

30. J.D. Ullman. Principles of Database and Knowledge-Base Systems, Vol. 1 and 2. Computer Sciences Press, Rockville, MD, 1988/1989


Image Pattern Recognition Based on Fuzzy Technology

Kaoru Hirota 1, Yoshinori Arai 2 and Yukiko Nakagawa 1

1 Dept. of Computational Intelligence & Systems Science, Tokyo Institute of Technology, 4259 Nagatsuta-cho, Midori-ku, Yokohama 226, Japan

2Research & Education Center of Software Engineering, Tokyo Institute of Polytechnics, 1583 Iiyama, Atsugi 243-02, Japan

Abstract. Various image pattern recognition techniques based on fuzzy technology developed by the authors' group are surveyed. First, fuzzy clustering is applied to remote sensing images; it is a modified version of the well-known FCM. Then a shape recognition algorithm is presented for a robotics assembly line; it is a fuzzy discriminant tree method for real-time use. Finally, a fuzzy dynamic image understanding system is presented. It can understand dynamic images of general roads in Japan, where a fuzzy frame based knowledge representation and a special kind of fuzzy inference engine are introduced.

1. FCM-AD

A lot of fuzzy clustering algorithms have been proposed, among which FCM (fuzzy c-means), by Bezdek [1], is widely preferred for its stability. But FCM is not effective for the purpose of separating isolated clusters (outliers) which consist of a few data vectors. In the case of application of fuzzy clustering to remote sensing data (imagery data), good results are not obtained from FCM, because it is especially important to separate outliers in that application.

We proposed a modified version of FCM that can separate several outliers. The notion of standard pattern (cluster) vectors $s_i$, called additional data (AD), is introduced. Each standard pattern vector is a representative vector of a corresponding outlier. The effectiveness of this algorithm is confirmed through a computer simulation experiment on remote-sensing data.

Let $X = \{x_1, x_2, \dots, x_n\}$ denote a finite unlabeled data set, where $x_j$ is an $s$-dimensional vector representing the $j$th observation ($x_j \in \mathbb{R}^s$, $j = 1, 2, \dots, n$). Each algorithm of fuzzy clustering generates a partition of X into several fuzzy clusters, which may be described in terms of a partition matrix $U = [u_{ij}]$, $i = 1, 2, \dots, c$, $j = 1, 2, \dots, n$, with entries lying in the interval [0,1]. The $(i,j)$th element $u_{ij}$ of U


indicates the degree of belongingness of the $j$th vector $x_j$ to the $i$th cluster $C_i$.

Moreover, it is assumed that U satisfies two natural conditions:

1. Every cluster is nonempty:
$$\sum_{j=1}^{n} u_{ij} > 0, \qquad 1 \le i \le c; \tag{1}$$

2. the sum of the memberships of each vector over all clusters is equal to 1, viz., the grade of each vector in the whole cluster set is equal to 1:
$$\sum_{i=1}^{c} u_{ij} = 1, \qquad 1 \le j \le n. \tag{2}$$

FCM [1], which is a well-known class of fuzzy clustering methods, has an objective function
$$J_p(U, v_1, \dots, v_c) = \sum_{j=1}^{n} \sum_{i=1}^{c} (u_{ij})^p D_{ij}, \qquad (1 \le p < \infty) \tag{3}$$
with
$$D_{ij} = \| x_j - v_i \|^2, \tag{4}$$
where $v_i$ denotes the center of the $i$th cluster. A final partition matrix U is given by an iteration method of finding a local minimum of $J_p$,
$$\min_{U, v_1, \dots, v_c} J_p. \tag{5}$$

The iteration algorithm is called FCM, and the program is called F-ISODATA [1]. In the case of application of fuzzy clustering to remote sensing data (imagery data), good results are not always obtained from FCM. This is mainly because outliers (a few isolated but important data vectors) are absorbed in big clusters. In order to avoid this disadvantage, Equation (3) is augmented with standard pattern vectors $s_i$:


$$J_g(U, v_1, \dots, v_c) = \sum_{j=1}^{n} \sum_{i=1}^{c} (u_{ij})^p \left( (1 - g_i)\, \| x_j - v_i \|^2 + g_i\, \| x_j - s_i \|^2 \right), \qquad 1 \le p < \infty, \tag{6}$$
with
$$g_i \in [0, 1], \tag{7}$$
$$g_i \in (0, 1] \text{ when } s_i \text{ is given}, \tag{8}$$
$$g_i = 0 \text{ otherwise}. \tag{9}$$

Each $s_i$, which is a representative vector of the $i$th cluster, should be given beforehand. If all $s_i$'s are given, then this method can be considered as a kind of pattern-matching method. Each parameter $g_i$ represents the ratio of a term in FCM to one in the pattern-matching method. In the case of $g_i = 0$ for all i, this clustering method is the same as FCM, whereas when $g_i = 1$ for all i, it is a clustering method based on the distances from the fixed $s_i$ to the $x_j$'s.

An iterative algorithm which calculates a local minimum of Equation (6) is given below:

Algorithm FCM-AD (fuzzy c-means with additional data)

1. Fix c ($2 \le c \le n$) and p ($1 \le p < \infty$); choose standard pattern vectors $s_i$ and parameters $g_i$. Initialize $U^{(0)}$. Put the number of repetitions L = 0.

2. For i = 1, ..., c with $g_i \ne 1$, compute the cluster center $v_i^{(L)}$ using
$$v_i^{(L)} = \frac{\sum_{j=1}^{n} (u_{ij})^p x_j}{\sum_{j=1}^{n} (u_{ij})^p}. \tag{10}$$

3. Update $U^{(L)}$: For t = 1, ..., n and i = 1, ..., c, calculate the distances
$$D_{it} = (1 - g_i)\, \| x_t - v_i^{(L)} \|^2 + g_i\, \| x_t - s_i \|^2 \tag{11}$$
and put
$$I_t = \{\, i \mid D_{it} = 0 \,\}. \tag{12}$$

4. If $I_t \ne \emptyset$ then
$$u_{it}^{(L+1)} = \begin{cases} 1 / \# I_t, & i \in I_t, \\ 0, & i \notin I_t, \end{cases} \tag{13}$$
otherwise
$$u_{it}^{(L+1)} = \frac{1}{\sum_{k=1}^{c} \left( D_{it} / D_{kt} \right)^{1/(p-1)}}. \tag{14}$$
If
$$\max_{i,t} \left| u_{it}^{(L+1)} - u_{it}^{(L)} \right| \le \varepsilon \quad (\text{error bound}) \tag{15}$$
then stop, otherwise return to step 2, putting L = L + 1.

A proof of realization of the local minimum in this algorithm is given in [2]. The results for area division of images, which is an application of FCM-AD to the remote sensing imagery data, are also presented in [2].
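As a rough illustration, the following Python sketch implements the steps above under the assumptions made in this reconstruction (in particular the form of the distance in Equation (11)); the function and parameter names are mine, not from [2]:

```python
import numpy as np

def fcm_ad(x, c, s, g, p=2.0, eps=1e-5, max_iter=100):
    """Minimal sketch of FCM-AD. x: (n, dim) data; s: dict {i: vector} of
    standard pattern vectors; g: (c,) mixing parameters (g_i = 0 -> pure
    FCM term for cluster i, g_i = 1 -> pure matching against s_i)."""
    rng = np.random.default_rng(0)
    u = rng.random((c, len(x)))
    u /= u.sum(axis=0)                            # initialize U(0)
    for _ in range(max_iter):
        # step 2: cluster centers, Eq. (10) (computed for all clusters)
        w = u ** p
        v = (w @ x) / w.sum(axis=1, keepdims=True)
        # step 3: distances, Eq. (11) (form assumed, see text)
        d = np.empty_like(u)
        for i in range(c):
            d_fcm = ((x - v[i]) ** 2).sum(axis=1)
            d_pat = ((x - s[i]) ** 2).sum(axis=1) if i in s else 0.0
            d[i] = (1 - g[i]) * d_fcm + g[i] * d_pat
        # step 4: membership update, Eqs. (13)/(14)
        u_new = np.empty_like(u)
        for t in range(len(x)):
            zero = d[:, t] == 0
            if zero.any():                        # Eq. (13)
                u_new[:, t] = zero / zero.sum()
            else:                                 # Eq. (14)
                ratio = (d[:, t][:, None] / d[None, :, t]) ** (1 / (p - 1))
                u_new[:, t] = 1 / ratio.sum(axis=1)
        if np.abs(u_new - u).max() <= eps:        # Eq. (15)
            return u_new, v
        u = u_new
    return u, v
```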

2. Robotics Vision in Assembling Line

A robot-arm system which is able to recognize a moving pattern and to manipulate a moving object on a belt-conveyor at various speeds has been built [3]. This system consists of two parts. The first part is related to recognizing patterns. In this part, a method of constructing a discriminant tree is proposed. The robot-arm system is able to recognize the shape and the size of moving patterns on a belt-conveyor based on the discriminant tree. The second part is concerned with replacing a moving object (i.e. grasping a moving object and putting it on an indicated moving mark) based on fuzzy-inference rules with the aid of image-processing techniques. The first part is described in somewhat more detail in the following.

In dynamic pattern recognition, one of the most important things is the discriminant tree. The discriminant tree distinguishes patterns using features. Each of the end nodes (leaves) corresponds to one pattern category. The method using a discriminant tree has been a fundamental approach in real-time pattern recognition. In general, however, how to construct the discriminant tree depends on the experience of researchers. Although various features of patterns have been studied, construction methods of discriminant trees have not been studied enough.


In [3], a new approach concerning how to construct an efficient discriminant tree is proposed first. The information necessary for choosing the minimum feature set used to construct an efficient discriminant tree is the frequency of appearance of each pattern category, the features themselves, and the computing time of extracting each feature. Shape recognition using the discriminant tree and size recognition based on fuzzy logic are applied to distinguish moving patterns on a belt conveyor. In the proposed system an object or a mark moves on a belt conveyor, and the robot grasps an indicated object and puts it on an indicated mark.

It should be noted that only one 16-bit personal computer controls the whole system (i.e. real-time image processing of CCD camera data, fuzzy inference, and robot control of 2 robot arms are done by this 16-bit personal computer alone). Though a lot of processing time and memory are required in general to realize a robot control system with a visual sensor (e.g. CCD camera), only lower level devices were successfully utilized in this report to construct a real-time robot system equipped with a visual function. It is realized by applying the fuzzy inference method with vagueness to rather insufficient visual information from a CCD camera. Such a system has not been realized by other methods.

3. Dynamic Image Understanding Using Fuzzy Frame Knowledge Base

A fuzzy frame knowledge base with spool base is applied to dynamic image understanding for general roads in Japan [4]. The system consists of an image processing part using conventional processing techniques, a fuzzy inference part using a fuzzy frame knowledge base with spool base, and a human-friendly interface.

The extraction of objects under various lighting conditions, even in bad weather and under daylight changes, is done by rule-based fuzzy inference and the difference image method in the image preprocessing part. The image processing part gives shape, color, position in the image, and area from an image to the fuzzy inference part.

The fuzzy inference part consists of a fuzzy inference engine and a fuzzy frame knowledge base with spool base. The spool frame in the fuzzy frame knowledge base is proposed to provide a tool for expressing a gradually changing cognitive environment. The fuzzy inference engine makes it possible to recognize the extracted objects and the objects' movements. The spool frame is divided into a floating spool frame and an anchored spool frame. The fuzzy frame is applied to the knowledge or categories used to recognize objects. The spool frame is applied to the information of a recognized object. The anchored spool frame includes the information about a uniquely determined object. The floating spool frame includes the information of the ambiguously recognized results about an observed object; it treats such information of the ambiguous object. If the ambiguous object becomes clearly determined as it comes nearer, then the floating spool frame changes to an anchored frame.


[Fig. 1: five successive video frames (a)-(e); images omitted.]

Fig. 1. Two cars passing by each other

At the time that (e) is presented, the following dialogue takes place:

User: kuruma ha imasuka? (Are there any cars?)
Comp: kuruma ha 2 dai imasu. (There are two cars.)
User: sure-chigau kuruma ha imasu ka? (Do cars pass by each other?)
Comp: hai imasu. (Yes, they do.)

Fig. 2. Q&A for dynamic image


The human interface can provide useful answers that the user question does not ask for directly. The system can answer both yes/no and what/which type questions.

The fuzzy frame knowledge base with spool frame for dynamic image understanding on general roads in Japan is shown in the following experimental results. On general roads, the observed image includes ambiguous information, such as a car coming closer to the observation point from a far place.

The system has been realized on Dr.Image (Kawasaki Steel Co. Ltd.) and SPARCStation10 (SUN Microsystems) using C language.

Figure 1 shows the objects and the environment of a general road. In the image processing part, the dynamic image, taken by a video camera, is observed at 0.1 second intervals. Figure 1 shows every 0.5 seconds to save space. Figure 2 shows a query and answer using the human interface.

The system answered the queries from the user at image (e). The experimental result shows that the system understands the scene of general roads. The results show not only the counting of cars but also their action (passing by). The system has understood their action by fuzzy inference using the fuzzy frame knowledge.

The system can treat yes/no questions and what/where (direction) questions.

References

1. J.C. Bezdek: Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum Press, New York, 1981

2. K. Hirota, K. Iwama: Application of Modified FCM with Additional Data to Area Division of Images, Information Sciences, Vol. 45, pp. 213-230, 1988

3. K. Hirota, Y. Arai, S. Hachisu: Moving Mark Recognition and Moving Object Manipulation in Fuzzy Controlled Robot, J. of Control-Theory and Advanced Technology, Vol. 2, No. 3, pp. 399-418, 1986

4. Y. Nakagawa, K. Hirota, W. Pedrycz: Dynamic Image Understanding on General Roads in Japan Using Fuzzy Frame Knowledge Base with Spool Base, Int. J. of Scientia Iranica (accepted)


Fuzzy Sets and the Management of Uncertainty in Computer Vision

James M. Keller 1

Department of Computer Engineering and Computer Science, University of Missouri-Columbia, Columbia, Missouri 65211, USA [email protected]

1. Introduction

Visual perception is a difficult task to automate. This process, known as computer vision, has received a considerable amount of attention for the last three or four decades. Even with all of the research and development efforts, relatively few real computer vision systems have been put into routine use - these being primarily in controlled environments. Yet, the potential of general purpose vision systems which can effectively operate in varying scenarios is so great that much research continues to be devoted to the components of a computer vision system. These components include tasks such as noise removal, smoothing, and sharpening of contrast (low-level vision); segmentation of images to isolate objects and regions and description and recognition of the segmented regions (intermediate-level vision); and finally interpretation of the scene (high-level vision). Uncertainty exists in every phase of computer vision. Some of the sources of this uncertainty include: additive and non-additive noise of various sorts and distributions in low-level vision, imprecisions in computations and vagueness in class definitions in intermediate-level vision, and ambiguities in interpretations and ill-posed questions in high-level vision. The use of multiple image sources can aid in making better judgements about scene content, but the use of more than one source of information poses new questions of how the complementary and supplementary information should be combined, how redundant information should be treated and how conflicts should be resolved.

In his seminal book on computer vision, David Marr stated two principles for the design of vision algorithms:

Principle of Least Commitment

Don't do something that may later have to be undone.

In fact, Marr states "If the Principle of Least Commitment has to be disobeyed, one is either doing something wrong or something very difficult".

¹ The author was partially supported by Office of Naval Research grant N00014-96-1-0439.

Obviously, the Principle of Least Commitment is consistent with the notion of utilizing degrees of membership, or more general linguistic models, for the objects and features in vision processes, and of carrying that information along until a crisp decision is required.

Principle of Graceful Degradation

Degrading the data will not prevent the delivery of at least some of the answer.

He elaborates that this principle indicates that algorithms should be robust, and hence implies a condition of continuity of the processes involved in computer vision. Once again, fuzzy set theoretic methods lend themselves to this type of activity.

[Fig. 1: block diagram omitted; it shows the stages feature extraction, segmentation, object recognition, and scene interpretation.]

Fig. 1. Traditional view of the computer vision process

It is extremely important to note that just the use of membership functions alone does not guarantee that algorithms will satisfy the above two principles. My contention is that fuzzy set theory contains natural modeling mechanisms, calculi for computation involving uncertain information, and intuitively pleasing interpretations. But simply mapping gray levels to the interval [0,1], and using Max and/or Min does not automatically qualify an approach to satisfy Marr's Principles any more than misusing Bayes Theorem does. Fuzzy set theory offers one of the best overall frameworks within which to pose, model and solve problems in computer vision in the presence of uncertainty. Hybrid systems involving probability theory, fuzzy set theory, belief theory, expert systems, etc.


will undoubtedly be required to attack general vision domains. It is crucial to use the most appropriate tool for each subproblem rather than to attempt to force the problem into one's favorite methodology when that is not reasonable.

In this paper, I will briefly state the various activities involved in computer vision and will attempt to indicate how fuzzy set theory can be, and has been, applied to solutions to problems within this domain, along with a bibliography of some of the relevant references. Fig. 1 depicts the traditional structure of a computer vision system. While there are many variations to the task breakdown (and, in fact, considerable interaction between these modules in most real systems), I will loosely use the taxonomy of Fig. 1 to discuss the role and potential of fuzzy set theory in computer vision. I refer the reader to the first few references by Keller and Krishnapuram for more in-depth discussions. The bibliography is basically organized along the same lines as the following sections.

2. Sensing

Sensing involves the conversion of analog data into a digital format. Most imaging sensors produce a two or three dimensional array, the values of which are related to the phenomenon under consideration: light intensity, color intensities, x-ray density, distance, etc. Hence, the values of the image pixels can be scalars or vectors. Before applying fuzzy set operations to an image, the pixel features must be converted to appropriate membership functions. Numerous approaches ranging from heuristic S and π functions to fuzzy and possibilistic clustering to neural networks and more have been used in this regard. The important factor is that the resultant membership functions faithfully represent the meanings of the terms they describe, for example, "light objects", "highly textured", "moderately red", etc. This conversion is highly problem dependent, but better general models to convert numeric scalars and vectors arising from analog sensing into fuzzy sets will greatly benefit the field, making the subsequent tasks easier to accomplish. In fact, using fuzzy models for object recognition requires the generation of membership functions from extracted image features. Thus, this endeavor is central to the success of fuzzy-based computer vision.

3. Low-Level Vision

The basic membership functions which are produced during the sensing phase can be used for both contrast enhancement and smoothing through the application of appropriate fuzzy set operators. Also, using union and intersection connectives to replace sum and product in "neighborhood" computations can give rise to thinning, edge detection and smoothing from a more continuous standpoint.

Recently, researchers have made attempts to design fuzzy filters for image processing with promising results. In the fuzzy-rule based approach to image processing, we can incorporate human intuition (heuristic rules) which is highly non-linear by nature and is hard to represent by traditional mathematical modeling.


Moreover, we can also combine heuristic rules with traditional methods. This leads to a more flexible and adaptive filter design.

As an example, Russo proposed fuzzy rule-based operators for smoothing, sharpening, and edge detection. He used heuristic knowledge to build rules for each of the operations. Fuzzy rules for smoothing took the form:

IF a pixel is darker than its neighboring pixels THEN make it brighter;
IF a pixel is brighter than its neighboring pixels THEN make it darker;
ELSE leave it unchanged.

In this approach, the gray level differences between a given pixel and its neighbors are used as input and output variables. The fuzzy sets medium positive and medium negative were used for the input variables, and small positive, small negative, and zero were used for the output variables. The inferred output value was added to the original gray level of the pixel.
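As a rough sketch of this kind of operator (the triangular membership functions, their parameters, and the correction step below are illustrative choices of mine, not Russo's actual definitions):

```python
import numpy as np

def tri(x, center, width):
    """Triangular membership function on gray-level differences."""
    return np.maximum(0.0, 1.0 - np.abs(x - center) / width)

def fuzzy_smooth(img, pos_center=40, width=40, correction=16):
    """Rule sketch: IF a pixel is darker than its neighbours THEN brighten
    it; IF brighter THEN darken it; ELSE leave it unchanged."""
    img = img.astype(float)
    out = img.copy()
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == dx == 0:
                continue
            diff = np.roll(img, (dy, dx), axis=(0, 1)) - img
            darker = tri(diff, pos_center, width)    # pixel darker than neighbour
            brighter = tri(diff, -pos_center, width) # pixel brighter than neighbour
            out += correction * (darker - brighter) / 8.0
    return np.clip(out, 0, 255)
```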

It is sometimes hard to represent a simple fuzzy relation between inputs and outputs in an image processing problem as is common in control and pattern classification applications. The consequent clause need not be a simple fuzzy set, but another set of fuzzy rules or actions. In other words, the if-then relation is a condition-action relation rather than an input-output relation. Suppose that we construct rules as condition-action relations and that we have a library of possible actions or consequents, then we only need to select the appropriate consequents based on the expected result in the application domain. This is a more general and flexible design scheme for tasks in image processing because the meaning of linguistic labels, such as noisy, depends on the application domain.

Image enhancement can be viewed as replacing the gray-level value of every pixel in the image with a new value depending on the local information. If the local region is relatively smooth, then the new value of the pixel may be a type of average of the local values. On the other hand, if the local region contains an edge or noise points, a different type of filtering should be used. This gives rise to a conditional and adaptive smoothing technique. In other words, we could create a bank of filters, and one of them could be selected at each pixel depending on the local information. Fuzzy rule-based systems have recently been developed for contrast enhancement, smoothing, edge detection with excellent results. The advantage is that the fuzzy rules can adapt the image processing actions locally and nonlinearly to image neighborhood conditions.

Fuzzy integrals also model a wide variety of image filters including linear filters, order statistic filters, linear combinations of order statistics, and many others. Hence, these mechanisms offer great potential to be incorporated into low level vision systems.


4. Segmentation and Region Representation

Image segmentation is one of the most critical components of the computer vision process. Errors made in this stage will impact all higher level activities. Therefore, methods which incorporate the uncertainty of object and region definition and the faithfulness of the features to represent various objects (regions) are desirable. The first connection of fuzzy set theory to computer vision was made by Prewitt, who suggested that the results of image segmentation should be fuzzy subsets rather than crisp subsets of the image plane. In the two-class (object and background) case, this problem reduces to computing the membership function μ for the object, since the membership function for the background can be simply taken to be 1 − μ to satisfy the conditions for a fuzzy partition.

Probably the most popular method of assigning multi-class membership values to pixels, for either segmentation or other processing, is to use the fuzzy C-means (FCM) algorithm. The FCM algorithm attempts to cluster feature vectors by searching for local minima of a least squares type objective function. In terms of generating membership functions for later processing, the fuzzy C-means has several advantages. It is unsupervised, that is, it requires no initial set of training data; it can be used with any number of features and any number of classes; and it distributes the membership values in a normalized fashion across the various classes based on "natural" groupings in feature space. However, being unsupervised, it is not possible to predict ahead of time what type of clusters will emerge from the fuzzy C-means from a perceptual standpoint. The resulting fuzzy subsets may be disconnected. Also, the number of classes must be specified for the algorithm to run. Finally, iteratively clustering features for a 512x512 resolution image can be quite time consuming. Approximations and simplifications have been introduced to ease this computational burden.

Several enhancements of the fuzzy C-means have been proposed. These range from modifying the distance metric to automatically determining the number of clusters present. The problem of having the memberships sum to one across classes (necessary for the iterative optimization of the least squares criterion function) has been successfully relaxed with the introduction of possibilistic clustering. This reformulation has been shown to produce very robust results even in the presence of a considerable amount of noise. In fact, connections between fuzzy and possibilistic clustering and robust statistics have recently been established.

Before and after segmentation, objects and their boundaries are often not very smooth, and sometimes contain several tiny spurious areas. These effects can be remedied by using clean up procedures such as shrink-expand. Rosenfeld et al. have developed geometrical operations on fuzzy image subsets, including shrinking and expanding. These gave rise to many new formulations of image filters which are generally known as fuzzy morphological operations, which appear to be a fertile area of research.
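One common formulation of such geometrical operations replaces crisp erosion and dilation by local minima and maxima of the membership image; the following is a minimal sketch under that assumption (the 3x3 neighbourhood and the clean-up composition are my own choices):

```python
import numpy as np

def fuzzy_expand(mu):
    """Fuzzy expanding (dilation-like): local maximum of the membership
    image over a 3x3 neighbourhood."""
    shifts = [np.roll(mu, (dy, dx), axis=(0, 1))
              for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    return np.maximum.reduce(shifts)

def fuzzy_shrink(mu):
    """Fuzzy shrinking (erosion-like): local minimum over a 3x3 neighbourhood."""
    shifts = [np.roll(mu, (dy, dx), axis=(0, 1))
              for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    return np.minimum.reduce(shifts)

def clean_up(mu):
    """Shrink-expand to remove tiny spurious regions, then expand-shrink
    to close small holes in the fuzzy region."""
    return fuzzy_shrink(fuzzy_expand(fuzzy_expand(fuzzy_shrink(mu))))
```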

Skeletons of the object regions are often used as a compact representation of the shape of the boundary. When all points in an object belong to the skeleton to varying degrees, it may be called a fuzzy skeleton. Several fuzzy skeletonization algorithms have been developed based on this idea.


Good features are of critical importance to the recognition of objects in images. There has been some work on extraction of features from fuzzy subsets of an image, but the utility of these features has not as yet been demonstrated in gray level image processing. Some fuzzy features can be shown to be invariant to convolution blurring of binary objects, but one must be very careful to only use fuzzy features which are directly related to the membership function of the region. For example, if "blueness" is used to generate a membership function in an image that contains a lake, then is it appropriate to measure the area of the lake using that membership function? Experimental results suggest that the answer is no. This is an area for continued research.

5. Boundary Detection and Representation

Boundary detection is another approach to segmentation. In this approach, an edge operator is first used on the image to detect edge elements. The edge elements so detected are considered to be part of the boundaries between various objects or regions in the image. Simple algorithms have been described which can classify and approximate boundary segments in terms of lines and arcs. The boundaries are sometimes described in terms of analytical curves such as straight lines, circles, and other higher degree curves.

Variations of the FCM algorithm and the possibilistic C-means algorithm can be used to detect (or fit) straight lines or parametrized families of second degree curves to edge elements. These techniques have been shown to be considerably more effective than crisp approaches for boundary detection and recognition.

6. Object Recognition and Region Labeling

The area of computer vision concerned with assigning meaningful labels to regions in an image can be thought of as a subset of pattern recognition. There is a large amount of research in the use of fuzzy set theory in pattern recognition. The book edited by Bezdek and Pal contains a nice set of introductory papers in this area.

Two techniques which we have found to be of great value in computer vision are the fuzzy integral and neural network architectures where the individual nodes implement fuzzy set connectives. In both cases, the parameters are learned through training. These approaches view the labeling problem as an aggregation of evidence problem. The evidence can be derived from several sensors (for example, color), several distinct pattern recognition algorithms, different features, or the combination of image data with non-image information (intelligence). The support for a labeling decision may depend on supports for (or degrees of satisfaction of) several different criteria, and the degree of satisfaction of each criterion may in turn depend on degrees of satisfaction of other sub-criteria, and so on. Thus, the decision process can be viewed as a hierarchical network, where each node in the network "aggregates" the degree of satisfaction of a particular criterion from the observed evidence. The inputs to each node are the degrees of satisfaction of each of the sub-criteria, and the output is the aggregated degree of satisfaction of the


criterion. The fuzzy integral and the fuzzy aggregation networks have been used for both segmentation and object recognition.
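As an illustration of such an aggregation node, the following sketch computes a discrete Choquet integral of partial supports with respect to a fuzzy measure (the sources, the measure values, and the supports are invented for illustration):

```python
def choquet(h, g):
    """Discrete Choquet integral of evidence h over fuzzy measure g.
    h: dict source -> support in [0,1]; g: dict frozenset -> measure,
    with g(empty set) = 0 and g(all sources) = 1."""
    order = sorted(h, key=h.get, reverse=True)      # descending evidence
    values = [h[s] for s in order] + [0.0]
    total, coalition = 0.0, frozenset()
    for i, src in enumerate(order):
        coalition = coalition | {src}
        total += (values[i] - values[i + 1]) * g[coalition]
    return total

# Hypothetical fuzzy measure over three evidence sources for one label;
# the measure of each coalition encodes how much that subset is worth.
g = {frozenset(): 0.0,
     frozenset({'color'}): 0.3, frozenset({'shape'}): 0.4,
     frozenset({'context'}): 0.2,
     frozenset({'color', 'shape'}): 0.8,
     frozenset({'color', 'context'}): 0.5,
     frozenset({'shape', 'context'}): 0.6,
     frozenset({'color', 'shape', 'context'}): 1.0}

h = {'color': 0.7, 'shape': 0.9, 'context': 0.4}    # partial supports
print(choquet(h, g))                                # 0.72
```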

In the methods discussed above, one needs to compute membership values in different classes from observed feature data. Several methods can be used for this purpose. One approach is to run the FCM algorithm (or any clustering approach, such as variants of Learning Vector Quantization) on the training data to estimate the prototypes which can then be used to compute membership values. Normalized histograms of the feature values generated from training data have also been used to estimate the particular membership functions. This has the advantages that it does not force any particular shape onto the resultant distributions, can be extended to deal with multiple features instead of gray level alone, and can easily accommodate the addition of new classes. Clearly, neural networks are excellent function approximators if appropriate training data is available. As mentioned earlier, the success of any fuzzy approach lies first in the ability to generate meaningful membership functions. Recent work on estimating Gaussian mixture densities using fuzzy techniques shows promise here.

The above techniques, as well as many other fuzzy pattern recognition algorithms, are numeric feature-based procedures. On the other hand, fuzzy logic, and in general possibility theory, is inherently set-based, and so offers the potential to manipulate higher order concepts. Several authors have used linguistic weighted averaging of possibility distributions and other approximate reasoning algorithms to generate object confidence from a combination of feature-level results and harder-to-quantify values relating to range and motion. We considered two applications of fuzzy rule-based systems to the realm of mid-level vision: the problem of locating street numbers on handwritten postal address blocks, and that of identifying chromosomes from their images in metaphase spreads. These are typical of the use of fuzzy rules for tasks within intermediate-level computer vision. Essentially, they represent rule-based approaches to pattern (object) recognition. Hence, the potential here is large.

7. High-Level Vision

I believe that the area of high level computer vision offers the most potential for the application of fuzzy logic, since scene interpretation involves perception and understanding. As noted earlier, digital images are loaded with uncertainty at many levels for a variety of reasons. High level vision is where those uncertainties finally need to be resolved. Unfortunately (or fortunately, for those who may wish to get involved), the least amount of effort has been directed to the inclusion of fuzzy set theory in scene interpretation. Some research has been performed, for example, in perceptual grouping of edges, but much more work is needed.

Traditional rule-based systems have gained popularity in computer vision applications, particularly in high level vision activities. However, complex mechanisms need to be incorporated to handle the uncertainty present in computer vision. In guiding the choice of parameters for low-level algorithms, a vision knowledge base may have a rule such as

IF the range is LONG THEN the object detection window size is SMALL,

or, in a relaxation-style scene interpretation module, a rule like

IF object A is SMALL and object A is SURROUNDED by Region B and Confidence of Region B being WATER is HIGH

THEN Label BOAT is Highly Compatible with Object A

If the terms in italics in the above rules are modeled by possibility distributions over appropriate domains of discourse, then fuzzy logic offers numerous approaches to translate such rules and to make inferences from the rules and facts modeled similarly. High-level vision is the place where fuzzy logic can make the most dramatic impact in the future.

Spatial relationships between regions in an image, as in the second rule above, play an important role in scene understanding. Humans are able to quickly ascertain the relationship between two objects, for example "B is to the right of A", but this has turned out to be a somewhat elusive task for automation. The determination of spatial relationships is critical for higher level vision processes (based on artificial intelligence) involved in, for example, autonomous navigation, medical diagnosis, or more generally, scene interpretation. When the objects in a scene are represented by crisp sets, the all-or-nothing definition of the subsets actually adds to the problem of generating such relational descriptions. Definitions of spatial relationships based on fuzzy set theory should yield realistic results. This topic has seen considerable research, but suffers from the problem that it is difficult to rank definitions, since spatial relationship determination is a very perceptual concept. More effort is needed here, particularly on the cognitive side.
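
To make the idea concrete, here is a minimal Python sketch of one simple fuzzy definition of "B is to the right of A", based on the angle between region centroids. It is only one of the many definitions compared in the literature, and the linear falloff to zero at plus or minus 90 degrees is an assumption.

import numpy as np

def right_of(region_a, region_b):
    # region_a, region_b: boolean masks of shape (rows, cols).
    # Angle 0 means due right in image coordinates; membership falls
    # off linearly to 0 at +/- 90 degrees.
    ca = np.argwhere(region_a).mean(axis=0)   # (row, col) centroid of A
    cb = np.argwhere(region_b).mean(axis=0)   # (row, col) centroid of B
    dy, dx = ca[0] - cb[0], cb[1] - ca[1]     # convert to x right, y up
    theta = np.arctan2(dy, dx)                # direction of B as seen from A
    return max(0.0, 1.0 - abs(theta) / (np.pi / 2))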

8. Conclusions

The use of fuzzy set theory is growing in computer vision as it is in all intelligent processing. The representation capability is flexible and intuitively pleasing, the combination schemes are mathematically justifiable and can be tailored to the particular problem at hand from low level aggregation to high level inferencing, and the results of the algorithms are excellent, producing not only crisp decisions when necessary, but also corresponding degrees of support.

There is much work left to be done at all levels of computer vision. Membership and hypothesis confidence generation will always be an issue. Fusion of information has been a success area for fuzzy set research; its application to computer vision should continue to grow. One area of particular need is the calculation and subsequent use of (fuzzy) features from the output of fuzzy segmentation algorithms. More research is also necessary in high level vision processes. Fuzzy set theory offers excellent potential for describing and manipulating object and region relationships, thereby assisting with scene interpretation. Finally, possibility distributions should be investigated as the model for the interface between the human and the vision system and between a high level vision subsystem and mid or low level vision processes. Of course, the real payoff will be in robust real vision systems which model, and effectively manage, the immense uncertainty in visual processes.

Selected Bibliography

Organization Roughly Follows the Order of Topics

D. Marr, Vision, W. H. Freeman and Company, San Francisco, California, 1982.

J. Keller and R. Krishnapuram, "Fuzzy Set Methods in Computer Vision", in An Introduction to Fuzzy Logic Applications in Intelligent Systems, R. Yager and L. Zadeh (eds), 1992, pp. 121-146.

R. Krishnapuram and J. Keller, "Fuzzy Set Theoretic Approach to Computer Vision: An Overview", Invited Paper, Proceedings, IEEE International Conference on Fuzzy Systems, San Diego, CA, 1992, pp. 135-142.

J. Keller, "Fuzzy Logic and Neural Networks in Computer Vision", Video Tutorial, IEEE Press, 1992.

R. Krishnapuram and J. Keller, "Fuzzy and Possibilistic Clustering Methods for Computer Vision", in Neural and Fuzzy Systems, S. Mitra, M. Gupta and W. Kraske (eds), SPIE Press, 1994, pp. 133-159.

J. Keller and R. Krishnapuram, "Fuzzy Decision Models in Computer Vision", in Fuzzy Sets, Neural Networks, and Soft Computing, R. Yager and L. Zadeh (eds), Van Nostrand, 1994, pp. 213-232.

J. Keller, R. Krishnapuram, P. Gader, and Y.-S. Choi, "Fuzzy Rule-Based Models in Computer Vision," in Fuzzy Modelling: Paradigms and Practice, W. Pedrycz, Ed.: Kluwer Academic Publishers, 1996, pp. 353-371.

S. K. Pal, "Fuzzy Sets in Image Processing and Recognition," Proceedings, First IEEE International Conference on Fuzzy Systems, San Diego, pp. 119-126, March 8-12, 1992.

S.K. Pal and R.A. King, "Image enhancement using smoothing with fuzzy sets," IEEE Transactions on Systems, Man, and Cybernetics, Vol. SMC-11, 1981, pp. 494-501.

S.K. Pal, "A Note on the Quantitative measure of image enhancement through fuzziness", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-4, no. 2, 1982, pp. 204-208.

T.K. De and B.N. Chatterji, "An Approach to a generalized technique for image contrast enhancement using the concept of fuzzy set", Fuzzy Sets and Systems, vol. 25, 1988, pp. 145-158.

S.K. Pal and R.A. King, "Histogram equalization with S and π functions in detecting X-ray edges", Electronics Letters, Vol. 17, 1981, pp. 302-304.

O. AlShaykh, S. Ramaswamy, and H. Hung, "Fuzzy Techniques for Image Enhancement and Reconstruction", Proceedings, Second IEEE International Conference on Fuzzy Systems, San Francisco, CA, 1993, pp. 582-587.

F. Russo, "A New Class of Fuzzy Operators for Image Processing: Design and Implementation", Proceedings, Second IEEE International Conference on Fuzzy Systems, San Francisco, CA, 1993, pp. 815-820.

Page 452: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications

443

F. Russo and G. Ramponi, "Combined FIRE Filters for Image Enhancement," Proceedings, Third IEEE International Conference on Fuzzy Systems, Orlando, FL, pp. 260-264, June, 1994.

F. Russo and G. Ramponi, "Edge Extraction by FIRE Operators," Proceedings, Third IEEE International Conference on Fuzzy Systems, Orlando, pp. 249-253, June, 1994.

M. Mancuso, R. Poluzzi, and G. Rizzotto, "A Fuzzy Filter for Dynamic Range Reduction and Contrast Enhancement," Proceedings, Third IEEE International Conference on Fuzzy Systems, Orlando, FL, pp. 264-267, June, 1994.

S. Peng and L. Lucke, "Fuzzy Filtering for Mixed Noise Removal During Image Processing," Proceedings, Third IEEE International Conference on Fuzzy Systems, Orlando, FL, pp. 89-93, June, 1994.

B. Chen, Y. Chen, and W. Hsu, "Image Processing and Understanding Based on the Fuzzy Inference Approach," Proceedings, Third IEEE International Conference on Fuzzy Systems, Orlando, FL, pp. 254-259, June, 1994.

F. Russo and G. Ramponi, "Fuzzy Operator for Sharpening of Noisy Images," lEE Electronics Letters, vol. 28, no. 18, pp. 1715-1717, 1992.

C. Tyan and P. Wang, "Image Processing - Enhancement, Filtering and Edge Detection Using the Fuzzy Logic Approach," Proceedings, Second IEEE International Conference on Fuzzy Systems, San Francisco, pp. 600-605, March, 1993.

M. Grabisch, "Fuzzy Integrals as a General Class of Order Filters", Proceedings of the European Symposium on Satellite Remote Sensing, Rome, Italy, 1994.

M. Grabisch and M. Schmitt, "Mathematical Morphology, Order Filters, and Fuzzy Logic", Proceedings, Fourth IEEE International Conference on Fuzzy Systems, Yokohama, Japan, pp. 2103-2108, March, 1995.

H. Shi, P. Gader, and J. Keller, "An O(K)-Time Implementation of Fuzzy Integral Filters on an Enhanced Mesh Processor Array", Proceedings, Fifth IEEE International Conference on Fuzzy Systems, New Orleans, LA, September, 1996, pp. 1086-1091.

J. Keller, H. Qiu, and H. Tahani, "The fuzzy integral in image segmentation", Proceedings of NAFIPS Workshop, New Orleans, June 1986, pp. 324-338.

R. Sankar, "Improvements in image enhancement using fuzzy sets", Proceedings of NAFIPS Workshop, New Orleans, June 2-4, 1986, pp. 502-515.

A Rosenfeld, "Fuzzy digital topology", Information and Control, 40, 1979, pp. 76-87. A Rosenfeld, "On connectivity properties of gray scale pictures", Pattern Recognition, 16,

1983, pp. 47-50. A Rosenfeld, "The fuzzy geometry of image subsets", Pattern Recognition Letters, 2,

1984, pp. 311-317. D. Dubois and M.e. Jaulent, "A general approach to parameter evaluations in fuzzy digital

pictures", Pattern Recognition Letters, Vol. 6, 1987, pp. 251-259. S.K. Pal, R.AKing and AA Hishim, "Automatic grey level thresholding through index of

fuzziness and entropy", Pattern Recognition Letters, vol. 1, 1983, pp. 141-146.

S.K. Pal, "A measure of edge ambiguity using fuzzy sets", Pattern Recog Letters, vol. 4, 1986, pp. 51-56.

S.K. Pal and A Ghosh, "Index of area coverage of fuzzy image subsets and object extraction", Pattern Recognition Letters, vol. 11, 1990, pp. 831-841.

Y. Nakagawa and A. Rosenfeld, "A note on the use of local min and max operators in digital picture processing," IEEE Transactions on Systems, Man, and Cybernetics, Vol. SMC-8, 1978, pp. 632-635.

S.K. Pal and R.A. King, "On edge detection of X-ray images using fuzzy sets," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-5, 1983, pp. 69-77.

M.M. Gupta, G.K. Knopf, and P.N. Nikiforuk, "Edge Perception Using Fuzzy Logic", in Fuzzy Computing: Theory, Hardware and Applications, North-Holland, 1988.

T. Huntsberger and M. Descalzi, "Color edge detection", Pattern Recognition Letters, 3, 1985, p. 205.

R. Krishnapuram and L. Chen, "Implementation of Parallel Thinning Algorithms Using Iterative Neural Networks", to appear in the IEEE Transactions on Neural Networks, 1992.

D. Sinha and E. Dougherty, "An Intrinsically Fuzzy Approach to Mathematical Morphology", SPIE Image Algebra and Morphological Image Processing III, San Diego, CA, July 1992.

P. Gader, "Fuzzy Morphological Networks", Proc. First Midwest Electro-Technology Conference, Ames I, April 1992, pp. 70-74.

1.M. Prewitt, "Object enhancement and extraction", in Picture Processing and Psychopictorics, B.S. Lipkin and A. Rosenfeld (Eds.), Academic Press, New York, 1970, pp. 75-149.

C.A. Murthy and S.K. Pal, "Fuzzy thresholding: mathematical framework, bound functions and weighted moving average technique", Pattern Recognition Letters, vol. 11, 1990, pp. 197-206.

S.K. Pal and A. Rosenfeld, "Image enhancement and thresholding by optimization of fuzzy compactness", Pattern Recognition Letters, vol. 7, 1988, pp. 77-86.

J.T. Kent and K.V. Mardia, "Spatial classification using fuzzy membership models", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 10, no. 5, 1988, pp. 659-671.

J.C. Dunn, "A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters", Journal of Cybernetics, 3(3), 1974, pp. 32-57.

R. R. Yager and L. A. Zadeh (Eds.), An Introduction to fuzzy logic applications in intelligent systems, Kluwer Academic, Norwell, MA, 1992.

J. C. Bezdek and S. K. Pal (Eds.), Fuzzy Models for Pattern Recognition, IEEE Press, New York, 1992.

J. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum Press, New York, 1981.

T. Huntsberger, C. Jacobs, and R. Cannon, "Iterative fuzzy image segmentation", Pattern Recognition, vol. 18, 1985, pp. 131-138.

R. Cannon, J. Dave and J. Bezdek, "Efficient implementation of the fuzzy c-means clustering algorithm," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 8, No. 2, 1986, pp. 248-255.

T. Huntsberger, "Representation of uncertainty in low level vision", IEEE Transactions on Computers, Vol. C-35, No. 2, 1986, p. 145.

R. Cannon, J. Dave, J.C. Bezdek, and M. Trivedi, "Segmentation of a thematic mapper image using the fuzzy c-means clustering algorithm," IEEE Transactions on Geoscience and Remote Sensing, Vol. 24, No. 3, 1986, pp. 400-408.

S. Araki, H. Nomura, and N. Wakami, "Segmentation of Thermal Images Using the Fuzzy C-Means Algorithm", Proceedings, Second IEEE International Conference on Fuzzy Systems, San Francisco, CA, 1993, pp. 719-724.

R. Krishnapuram and J. Keller, "A Possibilistic Approach to Clustering", IEEE Transactions on Fuzzy Systems, Vol. 1, No. 2, 1993, pp. 98-110.

J. Keller and C. Carpenter, "Image Segmentation in the Presence of Uncertainty," International Journal of Intelligent Systems, vol. 5, 1990, pp. 193-208.

M.M. Trivedi and J. Bezdek, "Low-level segmentation of aerial images with fuzzy clustering," IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-16, No. 4, 1986, pp. 589-598.

I. Gath and A.B. Geva, "Unsupervised Optimal Fuzzy Clustering", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-11, no. 7, July 1989, pp. 773-781.

R. Krishnapuram and A. Munshi, "Cluster-Based Segmentation of Range Images Using Differential-Geometric Features", Optical Engineering, Vol. 30, No. 10, October 1991, pp. 1468-1478.

J. Jantzen, P. Ring, and P. Christiansen, "Image Segmentation Based on Scaled Fuzzy Membership Functions", Proceedings, Second IEEE International Conference on Fuzzy Systems, San Francisco, CA, 1993, pp. 714-724.

O. Strauss and M. Alden, "Segmentation of Cross Sectional Images Using Fuzzy Logic", Proceedings, Second IEEE International Conference on Fuzzy Systems, San Francisco, CA, 1993, pp. 731-738.

J. Keller and Y. Seo, "Local fractal geometric features for image segmentation", International Journal of Imaging Systems and Technology, Vol. 2, 1990, pp. 267-284.

M.P. Windham, "Cluster validity for the fuzzy c-means clustering algorithm", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-4, no. 4, 1982, pp. 357-363.

E. Backer and A.K. Jain, "A clustering performance measure based on fuzzy set decomposition", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-3, no. 1, 1981, pp. 66-75.

S. Peleg and A. Rosenfeld, "A mini-max medial axis transformation," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-3, 1981, pp. 208-210.

C.R. Dyer and A. Rosenfeld, "Thinning operations on grayscale pictures," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-1, 1979, pp. 88-89.

S.K. Pal "Fuzzy sketetonization of an image", Pattern Recognition Letters, vol. 10, 1989, pp.17-23.

S. K. Pal and L. Wang, "Fuzzy Medial Axis Transformation (FMAT): Redundancy, Approximation and Computational Aspects", Proceedings of the International Fuzzy Systems Association Congress, Brussels, 1991, volume on Engineering, pp. 167-170.

S.K. Pal, R.A King, and AA Hashim, "Image description and primitive extraction using fuzzy sets", IEEE Transaction on Systems, Man, and Cybernetics, vol. SMC-13, No.1, 1983, pp. 94-100.

J. Bezdek and I.M. Anderson, "An Application of the c-varieties clustering algorithms to polygonal curve fitting", IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-15, No.5, 1985, pp. 637-641.

R. Dave, "Use of the adaptive fuzzy clustering algorithm to detect lines in digital images", Proceedings of the Intelligent Robots and Computer Vision VIII, vol. 1192, no. 2, 1989,pp.600-611.

1. Bezdek, C. Cordy, R. Gunderson and J. Watson, "Detection and characterization of cluster substructure", SIAM Journal Applied Mathematics, Vol. 40,1981, pp. 339-372.

M. Windham, "Geometrical fuzzy clustering algorithms", Fuzzy Sets and Systems, vol. 10, 1983, pp. 271-279.

R. Krishnapuram and C.-P. Freg, "Algorithms to detect linear and planar clusters and their applications", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Hawaii, June 1991, pp. 426-431.

R. Krishnapuram and C.-P. Freg, "Fitting an unknown number of lines and planes to image data through compatible cluster merging", Pattern Recognition, Vol. 25, No.4, April 1992, pp. 385-400.

R. Dave, "Fuzzy Shell-Clustering and applications to circle detection in digital images", International Journal of General Systems, vol. 16, No.4, 1990, pp. 343-355.

R. Krishnapuram, H. Frigui and O. Nasraoui, "New Fuzzy Shell Clustering Algorithms for Boundary Detection and Pattern Recognition", Proc. of the SPIE Con! on Intelligent Robots and Computer Vision, Vol. 1607, Boston, Nov. 1991, pp. 458-465.

R. Krishnapuram, O. Nasraoui, and H. Frigui, "Fuzzy C Spherical Shells Algorithm: A New Approach", IEEE Transactions on Neural Networks, Vol. 3, No.5, September 1992, pp. 663-671.

R. Krishnapuram, O. Nasraoui, and H. Frigui, "The Surface Density Criterion and Its Application to Linear/Circular Boundary Detection and Planar/Spherical Surface Approximation", Proceedings, Second IEEE International Conference on Fuzzy Systems, San Francisco, CA,1993, pp. 725-730.

J. Keller and R. Krishnapuram, "Possibilistic Clustering for Shape Description", Plenary Paper, Proceedings, Third International Workshop on Neural Networks and Fuzzy Logic, NASNJSC, Houston, TX, 1992, pp. 227-236.

R. Krishnapuram and S. Gupta, "Morphological Methods for Detection and Classification of Edges in Range Images", to appear in the Journal of Mathematical Imaging and Vision, 1993.

R. N. Dave, "Adaptive C-shells clustering", Proceedings of the North American Fuzzy Information Processing Society Workshop, Columbia, Missouri, 1991, pp. 195-199.

J. Han, L. Koczy, and T. Poston, "Fuzzy Hough Transform", Proceedings, Second IEEE International Conference on Fuzzy Systems, San Francisco, CA, 1993, pp. 803-808.

C. Tao, W. Thompson, and J. Taur, "A Fuzzy If-Then Approach to Edge Detection", Proceedings, Second IEEE International Conference on Fuzzy Systems, San Francisco, CA, 1993, pp. 1356-1360.

R. Krishnapuram and J. Lee, "Fuzzy-Connective-Based Hierarchical Aggregation Networks for Decision Making", Fuzzy Sets and Systems, 46, 1992, pp. 1-17.

H.J. Zimmermann and P. Zysno, "Decisions and evaluations by hierarchical aggregation of information", Fuzzy Sets and Systems, vol. 10, no. 3, 1983, pp. 243-260.

R. Krishnapuram and J. Lee, "Fuzzy-Compensative-Connective-Based Hierarchical Networks and Their Application to Computer Vision", Neural Networks, 5, 1992, pp. 335-350.

J. Keller, R. Krishnapuram, Z. Chen, and O. Nasraoui, "Fuzzy Additive Hybrid Operators for Network-Based Decision Making", International Journal of Intelligent Systems, 9, 1994, pp. 1001-1023.

J. Keller, Y. Hayashi, and Z. Chen, "Interpretation of Nodes in Neural Networks for Fuzzy Logic", Proceedings, Second IEEE International Conference on Fuzzy Systems, San Francisco, CA, 1993, pp. 334-338.

J. Keller and Z. Chen, "Learning in Fuzzy Neural Networks Utilizing Additive Hybrid Operators", Proceedings, Second International Conference on Fuzzy Logic and Neural Networks, Iizuka, Japan, 1992, pp. 85-87.

J. Keller and R. Krishnapuram, Second Quarter Report: Fuzzy Set Methods For Object Recognition in Space Applications, NASA/JSC through subcontract No. 088 under Cooperative Agreement No. NCC9-16 (Project No. SE. 42), University of Houston-Clear Lake, 1991.

H. Qiu and J. Keller, "Multispectral segmentation using fuzzy techniques," Proceedings of the NAFIPS Workshop, Purdue University, May 1987, pp. 374-387.

H. Tahani and J. Keller, "Information fusion in computer vision using the fuzzy integral", IEEE Transactions on Systems, Man, and Cybernetics, vol. 20, no. 3, 1990, pp. 733-741. Reprinted in G. Klir and Z. Wang, Fuzzy Measure Theory, Plenum Press, 1992.

J. Keller and B. Yan, "Possibility Expectation and Its Decision Making Algorithm", Proceedings of the First IEEE Conference on Fuzzy Systems, San Diego, CA, March, 1992, pp. 661-668.

J. Keller and H. Tahani, "The Fusion of Information Via Fuzzy Integration", Proceedings, NAFIPS'92, Puerto Vallarta, Mexico, December, 1992, pp. 468-477.

M. Sugeno, "Fuzzy measures and fuzzy integrals: A survey", in Fuzzy Automata and Decision Processes, North-Holland, Amsterdam, 1977, pp. 89-102.

J. Keller and J. Osborn, "Training the Fuzzy Integral", International Journal of Approximate Reasoning, Vol. 15, No. 1, 1996, pp. 1-24.

J. Keller, G. Hobson, J. Wootton, A. Nafarieh, and K. Luetkemeyer, "Fuzzy confidence measures in midlevel vision," IEEE Transactions on Systems, Man, and Cybernetics, Vol. SMC-17, No. 4, 1987, pp. 676-683.

J. Wootton, J. Keller, C. Carpenter, and G. Hobson, "A Multiple Hypothesis Rule Based Automatic Target Recognizer", in Pattern Recognition, Lecture Notes in Computer Science, Vol. 301, J. Kittler (ed.), Springer-Verlag, 1988, pp. 315-324.

J. Keller and D. Jeffreys, "Linguistic computations in computer vision", Proceedings of the NAFIPS Workshop, Vol. 2, Toronto, 1990, pp. 432-435.

R. Krishnapuram and F. Rhee, "Compact Fuzzy Rule Base Generation Methods for Computer Vision", Proceedings, Second IEEE International Conference on Fuzzy Systems, San Francisco, CA, 1993, pp. 809-814.

W. Dong, H. Shaw and F. Wang, "Fuzzy computations in risk and decision analysis", Civil Engineering Systems, vol. 2, 1985, pp. 201-208.

J. Keller and D. Hunt, "Incorporating Fuzzy Membership Functions into the Perceptron Algorithm," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-7, No. 6, November, 1985, pp. 693-699. (Reprinted in Fuzzy Models for Pattern Recognition, J. C. Bezdek and S. K. Pal (eds), IEEE Press, Piscataway, N.J., 1992).

J. Keller, M. Gray, and J. Givens, "A fuzzy k-nearest neighbor algorithm," IEEE Transactions on Systems, Man, and Cybernetics, vol. 15, 1985, pp. 580-585.

J. Keller, D. Subhanghasen, K. Unklesbay, and N. Unklesbay, "An approximate reasoning technique for recognition in color images of beef steaks", International Journal of General Systems, Vol. 16, 1990, pp. 331-342.

P. Gader and J. Keller, "Fuzzy Logic in Handwritten Word Recognition", Proceedings, Third IEEE International Conference on Fuzzy Systems, Orlando, FL, June, 1994 (invited paper), pp. 910-917.

P. D. Gader, M. Mohamed, and J.-H. Chiang, "Fuzzy and Crisp Handwritten Alphabetic Character Recognition Using Neural Networks", Proceedings of Artificial Neural Networks in Engineering, St. Louis, MO, November 1992.

P. Gader, J. Keller, and J. Cai, "A Fuzzy Logic System for the Detection and Recognition of Street Number Fields on Handwritten Postal Addresses," IEEE Transactions on Fuzzy Systems, vol. 3, no. 1, 1995, pp. 83-95.

J. Keller, P. Gader, O. Sjahputera, C. W. Caldwell, and H.-M. Huang, "A Fuzzy Logic Rule-Based System for Chromosome Recognition," Proceedings, Eighth IEEE Symposium on Computer-Based Medical Systems, Lubbock, TX, pp. 125-132 (invited paper), June 9-11, 1995.

M. Man and C. Poon, "A Fuzzy-Attributed Graph Approach to Handwritten Character Recognition", Proceedings, Second IEEE International Conference on Fuzzy Systems, San Francisco, CA, 1993, pp. 570-575.

P. Gader, M. Mohamed, and J. Keller, "Dynamic-Programming-Based Handwritten Word Recognition Using the Choquet Integral as the Match Function", Journal of Electronic Imaging, Special Issue on Document Image Analysis, Vol. 5, No. 1, 1996, pp. 15-24.

H. Ushida, T. Takagi, and T. Yamaguchi, "Recognition of Facial Expressions Using Conceptual Fuzzy Sets", Proceedings, Second IEEE International Conference on Fuzzy Systems, San Francisco, CA, 1993, pp. 594-599.

H. Takagi, "Fusion Technology of Fuzzy Theory and Neural Networks - Survey and Future Directions", Proc. International Conf. on Fuzzy Logic and Neural Networks, Iizuka Japan, July 1990, pp. 13-26.

A. Nafarieh and 1. Keller, "A Fuzzy Logic Rule-Based Automatic Target Recognizer", International Journal of Intelligent Systems, Vol. 6, 1990, pp. 295-3 I 2.

C. Perneel, M. deMathelin, and M. Acheroy, "Automatic Target Recognition Fuzzy System for Thermal Infrared Images", Proceedings, Second IEEE International Conference on Fuzzy Systems, San Francisco, CA, 1993, pp. 576-581.

A. Mogre, R. McLaren, J. Keller, and R. Krishnapuram, "Uncertainty Management in Rule Based Systems With Applications to Image Analysis", IEEE Transactions on Systems, Man, and Cybernetics, Vol. 24, No. 3, 1994, pp. 470-481.

L. Zadeh, "The concept of a linguistic variable and its application to approximate reasoning", Information Sciences, Part 1, Vol. 8, pp. 199-249; Part 2, Vol. 8, pp. 301-357; Part 3, Vol. 9, pp. 43-80, 1975.

I. B. Turksen and Z. Zhong, "An approximate analogical reasoning approach based on similarity measures", IEEE Transactions on Systems, Man and Cybernetics, vol. 18, 1988, pp. 1044-1056.

A. Nafarieh and J. Keller, "A new approach to inference in approximate reasoning", Fuzzy Sets and Systems, vol. 41, 1991, pp. 17-37.

J. Keller and H. Tahani, "Backpropagation neural networks for fuzzy logic", Information Sciences, Vol. 62, No. 3, 1992, pp. 205-221.

J. Keller and R. Yager, "Fuzzy logic inference neural networks", Proceedings of the SPIE Symposium on Intelligent Robots and Computer Vision VIII, 1989, pp. 582-591.

J. Keller and H. Tahani, "Implementation of conjunctive and disjunctive fuzzy logic rules with neural networks", International Journal of Approximate Reasoning, Vol. 6, No. 2, 1992, pp. 221-240. Reprinted in Fuzzy Models for Pattern Recognition, J. C. Bezdek and S. K. Pal (Eds.), IEEE Press, New York, 1992.

W. Zhuang and M. Sugeno, "A Fuzzy Approach to Scene Understanding", Proceedings, Second IEEE International Conference on Fuzzy Systems, San Francisco, CA, 1993, pp. 564-569.

E. Walker, "Fuzzy Relations for Feature-Model Correspondence in 3D Object Recognition", Proceedings, NAFIPS'96, Berkeley, CA, 1996, pp. 28-32.

H-B. Kang and E. Walker, "Characterizing and Controlling Approximation in Hierarchical Perceptual Grouping", Fuzzy Sets and Systems, Vol. 65, 1994, pp. 187-223.

J. Keller, "Computational Intelligence in High Level Computer Vision: Determining Spatial Relationships", in Computational Intelligence: Imitating Life, J. Zurada, R. Marks II, C. Robinson (eds.), IEEE Press, 1994, pp. 81-91. Presented at the IEEE World Congress on Computational Intelligence, Orlando, FL, June, 1994 (invited paper).

J. Keller and L. Sztandera, "Spatial relations among fuzzy subsets of an image", Proceedings of the First International Symposium on Uncertainty Modeling and Analysis, College Park, MD, 1990, pp. 207-211.

R. Krishnapuram, J. Keller, and Y. Ma, "Quantitative analysis of properties and spatial relations of fuzzy image regions", Proceedings, NAFIPS'92, Puerto Vallarta, Mexico, December, 1992, pp. 468-477.

R. Krishnapuram, J. Keller, and Y. Ma, "Quantitative Analysis of Properties and Spatial Relations of Fuzzy Image Regions", IEEE Transactions on Fuzzy Systems, Vol. 1, No. 3, 1993, pp. 222-233.

K. Miyajima and A. Ralescu, "Spatial Organization in 2D Segmented Images," Proceedings, Third IEEE International Conference on Fuzzy Systems, Orlando, FL, pp. 100-105, 1994.

K. Miyajima and A. Ralescu, "Spatial Organization in 2D Segmented Images: Representation and Recognition of Primitive Spatial Relations," Fuzzy Sets and Systems, vol. 65, pp. 225-236, 1994.

J. Keller and X. Wang, "Comparison of Spatial Relation Definitions in Computer Vision", Proceedings, ISUMA/NAFIPS'95, College Park, MD, September, 1995, pp. 679-684.

S. Dutta, "Approximate spatial reasoning: Integrating qualitative and quantitative constraints", International Journal of Approximate Reasoning, Vol. 5, 1991, pp. 307-331.

Intelligent Robotic Systems Based on Soft Computing - Adaptation, Learning and Evolution

Toshio Fukuda 1) and Koji Shimojima 2)

1) Dept. of Micro System Engineering, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-01, Japan

2) National Industrial Research Institute of Nagoya, AIST, MITI, 1-1 Hirate-cho, Kita-ku, Nagoya 462, Japan

Abstract. This paper deals with some intelligent control schemes for robotic systems, such as hierarchical control based on fuzzy logic, neural networks and genetic algorithms, reinforcement learning control, and a group behavior control scheme. We also introduce the network robotic system, which is a new trend in robotic systems. The hierarchical control scheme has three levels: a learning level, a skill level and an adaptation level. The learning level manipulates symbols to reason logically about control strategies. The skill level produces control references according to the control strategies and sensory information on the environment. The adaptation level controls robots and machines while adapting to their environments, which include uncertainties. For these levels, and to connect them, artificial intelligence, neural networks, fuzzy logic, and genetic algorithms are applied to the hierarchical control system while being integrated and synthesized. To be intelligent, the hierarchical control system learns from various experiences both in a top-down manner and a bottom-up manner. Reinforcement learning is very important for acquiring control signals without any prior information about the system or environment. The group behavior control scheme, which belongs to the artificial life research area, and the network robot control scheme are also very important for multiple robotic systems. Thus, these control schemes are effective for intelligent robotics.

Keywords. Neuro-fuzzy system, genetic algorithm, hierarchical control, group robots, network robots

1. Introduction

Intelligent robots are required in many fields. Intelligent robots have to carry out tasks in various environments by themselves, like human beings. They have to determine their own actions in uncertain environments based on sensory information. In advance, human operators can give the robots their knowledge and skill to some extent in a top-down manner. However, when the robots perform tasks in an uncertain environment, that knowledge may not be useful. In this case, the robots have to adapt to their environments and acquire new knowledge by themselves through learning. This process proceeds in a bottom-up manner.

This paper introduces a control scheme for intelligent robots, which this paper refers to as the hierarchical intelligent control scheme. The hierarchical intelligent control consists of three levels: an adaptation level, a skill level and a learning level. This scheme has two characteristics with respect to the learning process: a top-down approach and a bottom-up approach. To link the three levels and obtain such characteristics for knowledge acquisition, the scheme uses artificial intelligence (AI), fuzzy logic, neural networks (NN) and genetic algorithms (GA) [1-3]. Each technique has advantages and disadvantages. In order to overcome the disadvantages, this paper introduces techniques for integrating and synthesizing them. These are key techniques for intelligent control of systems in robotics.

This paper describes the advantages and disadvantages of each technique in the second section. The third section explains techniques for integrating and synthesizing them to overcome the disadvantages (see Fig. 1, Table 1). This paper also shows the skill acquisition scheme based on reinforcement learning, and the group behavior control scheme, which is one of the research areas of artificial life. These are key technologies for intelligent control of systems in robotics.

This paper also introduces a new trend in robotic systems, the network robotic system. These technologies will expand the ability of robots, and they are now being developed in many places.

Fig. 1. Synthesis of fuzzy logic, neural networks, artificial intelligence, and genetic algorithms for an intelligent system

2. Fuzzy Logic, Neural Networks and Genetic Algorithms

2.1 Fuzzy Sets

Fuzzy logic is characterized as an extension of binary crisp logic. Each fuzzy rule has an antecedent, or "if", part containing several preconditions, and a consequent, or "then", part which prescribes the value. A fuzzy set is a class in which the transition from membership to non-membership is gradual rather than abrupt, as shown in Fig. 2. Crisp sets allow only full membership or no membership at all, whereas fuzzy sets allow partial membership. In other words, an element may partially belong to a set. One identifies the main parameters and determines a term set which gives the right level of granularity for describing the values of each linguistic variable. For example, a term set including linguistic values such as {Small (S), Medium Small (MS), Medium Big (MB), Big (B)} may be used.

Fig. 2. Classification by fuzzy logic with membership functions: (a) membership functions; (b) gradually partitioned area in a lattice

Because of the partial matching attribute of fuzzy rules and the fact that the preconditions of rules do overlap, more than one fuzzy rule can fire at a time. The methodology used in determining which value should be taken as the result of the firing of several rules is referred to as conflict resolution. Traditionally, fuzzy logic uses a minimum operator. In the simplified fuzzy logic, however, multipliers are used instead of the minimum operator. The defuzzification procedure is also simple. However, since fuzzy sets do not have a learning capability, it is difficult for a human operator to tune the rules from a data set.
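
The simplified fuzzy inference just described (product firing strengths instead of the minimum, followed by a simple weighted-average defuzzification) can be sketched in Python as follows; the triangular membership functions and the three toy rules are illustrative assumptions.

def tri(x, a, b, c):
    # Triangular membership function peaking at b.
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

SMALL  = lambda x: tri(x, -0.5, 0.0, 0.5)
MEDIUM = lambda x: tri(x,  0.0, 0.5, 1.0)
BIG    = lambda x: tri(x,  0.5, 1.0, 1.5)

rules = [
    (SMALL, SMALL, 0.0),    # IF x1 is SMALL and x2 is SMALL THEN y = 0.0
    (MEDIUM, MEDIUM, 0.5),
    (BIG, BIG, 1.0),
]

def infer(x1, x2):
    # Product inference with weighted-average defuzzification.
    num = den = 0.0
    for m1, m2, y in rules:
        w = m1(x1) * m2(x2)     # firing strength: product instead of minimum
        num += w * y
        den += w
    return num / den if den > 0 else 0.0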

2.2 Neural Network

A neural network, a model of the brain, artificially connects many nonlinear neuron models and processes information in a parallel distributed manner. The neural network has many characteristics such as nonlinear mapping, parallel processing, learning, and self-organization. It is applied to pattern recognition, control, and so on. A neural network which consists of three layers (input/output layers and one hidden layer) is able to express any function when using enough hidden units. The neural network produces transformative rules from empirical training sets through learning. To train the neural network, back-propagation is used. However, the mapping rules in the network are not visible and are difficult to understand, as shown in Fig. 3. Moreover, the convergence of learning is very slow and not guaranteed. To overcome those problems, some structured neural networks have been proposed. The next section describes these networks as synthesis techniques.
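
As a concrete sketch of the three-layer network and back-propagation training described above, the following Python fragment learns the XOR mapping; the hidden-layer width, learning step, and epoch count are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)   # training inputs
T = np.array([[0], [1], [1], [0]], float)               # XOR targets

W1 = rng.normal(0, 1, (2, 8)); b1 = np.zeros(8)          # input -> hidden
W2 = rng.normal(0, 1, (8, 1)); b2 = np.zeros(1)          # hidden -> output
eta = 0.5                                                # learning step

for epoch in range(20000):
    H = sigmoid(X @ W1 + b1)                             # forward pass
    Y = sigmoid(H @ W2 + b2)
    dY = (Y - T) * Y * (1 - Y)                           # output delta (squared error)
    dH = (dY @ W2.T) * H * (1 - H)                       # hidden delta, propagated back
    W2 -= eta * H.T @ dY; b2 -= eta * dY.sum(axis=0)
    W1 -= eta * X.T @ dH; b1 -= eta * dH.sum(axis=0)

print(np.round(Y, 2))   # approaches [[0], [1], [1], [0]]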

2.3 Genetic Algorithm

The GA is a search algorithm based on the mechanics of natural selection and natural genetics. It is not a gradient search technique. It combines survival of the fittest among string structures with a structured yet randomized information exchange to form a search algorithm with some of the innovative flair of human problem solving. An occasional new part is tried for good measure. While randomized, GAs are no simple random walk; they efficiently exploit historical information to speculate on new search points with expected improved performance.

GAs traditionally have three operations to abstract and rigorously explain the adaptive process of natural systems: (1) the selection operation, (2) the crossover operation, and (3) the mutation operation. Figure 4 shows a flow chart of the GA. The selection process is an operation to select the survivors from a set of candidate strings. In this process, a fitness value is calculated for each candidate string by using the fitness function, which depends on the goal of the search problem. According to the fitness value, a selection rate is determined for the present candidate strings, and survivors are selected according to that rate. The crossover process is a recombination operation on the surviving candidates. In natural systems, a set of creatures creates a new set of the next generation by crossing among the creatures. In the same way, the crossover process is performed by exchanging pieces of strings using information from old strings. The pieces are crossed in couples of strings selected randomly. The mutation process is used to escape local minima in the search space in the artificial genetic approach. The calculation stops when the generation limit is reached.
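
A minimal Python sketch of this selection/crossover/mutation loop is given below; the bit-string encoding, roulette-wheel selection, and operator probabilities are illustrative assumptions.

import random

def ga(fitness, n_bits=20, pop_size=30, generations=100, p_cross=0.8, p_mut=0.01):
    # Minimal GA over bit strings: selection, one-point crossover, mutation.
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        weights = [fitness(ind) + 1e-9 for ind in pop]        # roulette-wheel selection
        parents = random.choices(pop, weights=weights, k=pop_size)
        nxt = []
        for a, b in zip(parents[::2], parents[1::2]):
            if random.random() < p_cross:                     # one-point crossover
                cut = random.randrange(1, n_bits)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            nxt += [a[:], b[:]]
        pop = [[1 - g if random.random() < p_mut else g for g in ind]   # mutation
               for ind in nxt]
    return max(pop, key=fitness)

best = ga(fitness=sum)          # toy fitness: number of 1 bits in the string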

In order to improve the performance of the GA, some methods have been proposed, such as the steady state genetic algorithm (SSGA) [38], the parallel genetic algorithm [39], and so on [40-43]. The virus evolutionary genetic algorithm [13, 36, 37, 46-48] is one of these improved genetic algorithms and has been applied to path planning, trajectory planning, tuning of fuzzy controllers, and so on.

Fig. 3. Classification by neural network with nonlinear functions: (a) signal transformation at a neuron using a nonlinear function (i.e., sigmoid function); (b) classification using plural neurons; (c) signal transformation from a set of numerals to a set of symbols

Fig. 4. Flow chart of the genetic algorithm

3. Integration and Synthesis of Neural Network, Fuzzy Logic, and Genetic Algorithms

As described in the previous section, AI, fuzzy logic and neural networks have similar performance with respect to signal transformation, though their methods are different. Each method has merits and demerits; Table 1 compares them. To overcome their demerits, some techniques for integrating and synthesizing them with the GA have been proposed. This section explains these techniques, which are indispensable for constructing the hierarchical intelligent control architecture.

Fuzzy logic and neural networks can be used as preprocessors for the AI. They transform numerical data sets into symbolic data sets. To give the rules for this transformation, human operators can easily determine the rules of the fuzzy logic. However, when the number of input parameters increases, determination of the rules becomes laborious for the human operators. In this case, neural networks are useful. When shown input/output data sets, the neural network learns them and works as a transformation function. Drawbacks of the neural network are that the human operator can neither give their knowledge beforehand nor understand the acquired rules. Moreover, the convergence of the learning is very slow and the neural network cannot learn new patterns incrementally. To solve those problems, several kinds of structured neural networks have been investigated.

The fuzzy neural network combines a neural network with fuzzy logic. Figure 5 shows an example of a fuzzy neural network. Human operators are able to give their knowledge to the fuzzy neural network by means of membership functions. The membership functions are modified through the learning process as fine tuning. After the learning, the human operators can understand the acquired rules in the network. With respect to the convergence of learning, the fuzzy neural network is faster than the conventional neural network. For multiple input parameters, the hierarchical fuzzy neural network is available [4, 11]. However, it is difficult to optimize the structure of the hierarchical fuzzy neural network.
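
The following Python sketch illustrates the fine-tuning idea of Fig. 5(b): Gaussian membership functions seeded from operator knowledge, with the rule consequents and the membership centers refined by gradient descent. The one-input restriction, the term-set granularity, and the learning step are assumptions for illustration.

import numpy as np

class FuzzyNeuralNet:
    # One-input sketch: Gaussian membership functions seeded from operator
    # knowledge; consequents and centers fine-tuned by gradient descent.
    def __init__(self, centers, sigma=0.2):
        self.c = np.array(centers, float)    # e.g. [0.0, 0.33, 0.66, 1.0] for S/MS/MB/B
        self.s = np.full(len(self.c), sigma)
        self.y = np.zeros(len(self.c))       # consequent value of each rule

    def forward(self, x):
        self.mu = np.exp(-((x - self.c) ** 2) / (2.0 * self.s ** 2))  # firing strengths
        self.w = self.mu / self.mu.sum()                              # normalized weights
        return float(self.w @ self.y)

    def train_step(self, x, target, eta=0.05):
        out = self.forward(x)
        e = out - target
        self.y -= eta * e * self.w                       # tune rule consequents
        dmu = e * (self.y - out) / self.mu.sum()         # gradient through normalization
        self.c -= eta * dmu * self.mu * (x - self.c) / self.s ** 2   # fine-tune centers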

On the other hand, fuzzy logic can be used as a critic for improving the convergence of the learning of the neural network [7]. In this case, the fuzzy logic determines the learning step depending on the state of convergence.

The neural network with radial basis functions is also a structured one [5, 12]. It has the potential to learn more quickly and easily than the neural network with sigmoid functions (Fig. 6) [5]. For incremental learning, the adaptive resonance theory (ART) model has been proposed, as shown in Fig. 7. It has a two-layered structure. It learns patterns one by one incrementally; that is, it can correct errors by learning new patterns without retraining on old patterns. However, the ART model has poor classification ability. For example, the ART model cannot classify the two patterns shown in Fig. 8, though the RBF neural network can. The neural network based on distance between patterns (NDP), shown in Fig. 9, has the abilities of both incremental learning and classification [5, 18, 19]. The NDP learns categories of patterns one by one. It increases the neurons of the output layer using the incremental learning algorithm. It uses the radial basis function at the output layer; therefore, it can classify the patterns shown in Fig. 8. Depending on their aims, human operators should give the neural network an efficient structure if they have the experience. Otherwise, a heuristic approach to structure optimization is necessary.
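
A minimal sketch of this kind of distance-based incremental learning follows; it is not the NDP itself, only an illustration of recruiting a new radial-basis output neuron whenever a training pattern is not covered by any existing prototype of its own category. The radius and the 0.5 coverage threshold are assumptions.

import numpy as np

class IncrementalRBF:
    # Each output neuron stores a prototype with a radial basis activation;
    # a new neuron is added when a pattern is not covered by its category.
    def __init__(self, radius=1.0):
        self.protos, self.labels, self.r = [], [], radius

    def learn(self, x, label):
        acts = [np.exp(-np.sum((x - p) ** 2) / self.r ** 2)
                for p, l in zip(self.protos, self.labels) if l == label]
        if not acts or max(acts) < 0.5:        # pattern not covered: add a neuron
            self.protos.append(np.asarray(x, float))
            self.labels.append(label)

    def classify(self, x):
        acts = [np.exp(-np.sum((x - p) ** 2) / self.r ** 2) for p in self.protos]
        return self.labels[int(np.argmax(acts))]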

The GA is a powerful tool for structure optimization of fuzzy logic and neural networks (Figs. 10-12) [9, 13, 14, 20, 43, 44]. In particular, the GA is powerful for optimizing the hierarchical fuzzy neural network [20, 43]. The GA can optimize not only the hierarchical structure of fuzzy, neural network, or fuzzy-neuro systems, but also the membership functions and the weights of the neural networks.

On the other hand, fuzzy logic and neural networks can serve as an evaluation function for the GA [10]. It is difficult to define evaluation functions for complex optimization problems. However, by using fuzzy logic or a neural network, human operators can transfer their criteria. These are complicated reinforcement learning techniques, because they do not use a teaching signal but obtain desirable states while manipulating many parameters at the same time. Genetic Programming, an application of the GA which manipulates symbols, can produce new rules or knowledge for the AI [15].

Fig. 5. Fuzzy neural network: (a) configuration of the fuzzy neural network; (b) Gaussian basis functions for membership functions in the fuzzy neural network, modified through learning; (c) signal transformation from sets of numerals to sets of symbols or numerals

Fig. 6. Classification by a neural network with radial basis functions as a structured neural network: (a) signal transformation at a neuron using a nonlinear function (i.e., radial basis function); (b) classification using plural neurons; (c) signal transformation from a set of numerals to a set of symbols

Fig. 7. Structure of the adaptive resonance theory model

Fig. 8. Two kinds of patterns (Pattern A and Pattern B) which distribute radially

Fig. 9. Structure of the neural network based on distance between patterns

Fig. 10. Structure optimization of fuzzy or neural network by genetic algorithm

Fig. 11. Structure optimization and learning of fuzzy logic by genetic algorithm

Fig. 12. Structure optimization and learning of neural network by genetic algorithm

Table 1. Comparison of neural network, fuzzy logic, AI, and genetic algorithm (GA): control theory, neural networks, fuzzy logic, AI, and the GA are rated against criteria such as mathematical model, learning from data, operator knowledge, real time, knowledge representation, nonlinearity, and optimization, on a scale from O (good or suitable) through fair and "needs some other knowledge or techniques" to X (unsuitable or does not require)

4. Behavior Acquisition by Reinforcement Learning

In this section, we describe the behavior acquisition scheme based on reinforcement learning [21, 22, 45]. Reinforcement learning [23, 24] is an unsupervised learning method to learn control policies, or skills, only from a scalar performance index, without any explicit teacher which shows how to control a system at each moment. When humans and animals perform complex motions such as walking, the motions can be divided into fundamental units of action. For instance, walking can be divided into motions such as "an action to sustain the body", "an action to stretch out a free leg", "an action to swing arms", "an action to control pitching, rolling and yawing", "an action to find obstacles", and so on. In this study, a 'behavior' means such a fundamental unit of an action function. After these required behaviors are learned well, a complex motion such as "walking" can be realized with a sequential and/or parallel combination of them. Figure 13 shows the essential structure of a motion learning agent, which consists of functions at three levels: a planner at the highest level, behavior agents in the middle level, and a sensor data integrator and an action combinator in the lowest level. Each function can be described as follows:

[Planner]: The planner is a high level controller that plans which behaviors to activate, sequentially or simultaneously, at which states and with which parameters, considering the coordination of their actions. It receives the state information and sends activation signals and behavioral parameters to the behavior agents and combination orders to the combinator.

[Behavior agents]: Each behavior agent receives some behavioral parameters and some state variables as its inputs, and it outputs some control variables. The behavioral parameters tell how it should behave. For example, when we walk, we can change strides. In this case, the length of a stride can be a behavioral parameter. Each behavior agent may have different sets of inputs and outputs. Generally, inputs are abstract state variables, which are integrated from raw sensor data, and actions are abstract control variables, which will be combined and translated into direct actuator inputs.

[Integrator]: The integrator provides state variables in useful forms by integrating raw information from sensors.

[Combinator]: The combinator combines actions from behavior agents according to the planner's order and yields control inputs for actuators.

The behavior agent which we propose has the structure shown in Fig. 14. Here params are behavioral parameters given by a planner and states are integrated state variables provided by an integrator. Utility networks represent utility functions (Q-values in Q-learning) by connectionist networks, and the policy determines the best actions according to the utility values. The reinforcement function is assumed to be given by a human or a planner.

The goal of learning is to acquire the behavior to reactively output optimal actions according to the parameters and the states. All of the parameters, states and actions are real values, and this scheme can be applied to dynamic motion learning of real robots [21, 22].
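
As a concrete illustration of the utility-plus-policy structure of Fig. 14, here is a minimal tabular Q-learning sketch in Python. The scheme above represents the utilities with connectionist networks over real-valued parameters and states, so the discrete states and actions used here are a simplifying assumption.

import random
from collections import defaultdict

class BehaviorAgent:
    # Utilities Q(s, a) are learned only from a scalar reinforcement signal,
    # and the policy picks the best action for the current state.
    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.Q = defaultdict(float)
        self.actions, self.a, self.g, self.eps = actions, alpha, gamma, epsilon

    def act(self, state):
        if random.random() < self.eps:                      # occasional exploration
            return random.choice(self.actions)
        return max(self.actions, key=lambda u: self.Q[(state, u)])

    def update(self, s, u, reward, s_next):
        best_next = max(self.Q[(s_next, v)] for v in self.actions)
        target = reward + self.g * best_next                # one-step Q-learning target
        self.Q[(s, u)] += self.a * (target - self.Q[(s, u)])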

Fig. 13. Structure of a motion learning agent

Fig. 14. Structure of each behavior agent (inputs: params p and states x; outputs: actions u(p, x, t))

5. Hierarchical Intelligent Control

The hierarchical intelligent control scheme comprises three levels: a learning level, a skill level, and an adaptation level, as shown in Fig. 15 [4, 8]. Therefore, there are three feed-back loops. The learning level is based on an expert system for its reasoning mechanism and has a hierarchical structure: recognition and planning to develop control strategies. The recognition level uses neural networks, and fuzzy logic combined with neural networks, as nodes of a decision tree. In the case of the neural network, inputs are numeric quantities sensed by some sensors, while outputs are symbolic qualities which indicate process states. The structured neural network for incremental learning is effective for memorizing new patterns [5]. In the case of the fuzzy neural network, inputs and outputs are numeric quantities, and the fuzzy neural network clusters input signals by using membership functions. That is, the fuzzy neural network transforms numerical quantities into symbolic qualities by using membership functions. Both the neural network and the fuzzy neural network are trained with training data sets of a priori knowledge obtained from human experts. As a result, the neural network and the fuzzy neural network can transform various sensed data from numerical quantities to symbolic qualities, and perform sensor fusion and production of meta-knowledge at the learning level. Important information is sensed actively using the knowledge base. Vision, weight, force, touch, acoustic, and other sensors can be used as nodes of the decision tree for recognition of the environment.

Fig. 15. Hierarchical intelligent control system

Then, the planning level reasons symbolically about strategic plans or schedules of robotic motion, such as task, path, trajectory, force, and other planning, in conjunction with the knowledge base. The system can also include common sense about robotic motion. The GA optimizes control strategies for robotic motion heuristically [6, 15]. The GA also optimizes the structures of the neural networks and fuzzy logic connecting the levels. Thus, the learning level reasons about unknown facts from a-priori knowledge and sensory information. Then, the learning level produces control strategies for the skill level and adaptation level in a feed-forward manner. Following the control strategy, the learning level selects the initial data set for a servo controller at the adaptation level from a data base which maintains gains and initial values of the interconnection weights of the neural network in the servo controller. Moreover, recently sensed information from the skill level and the adaptation level updates the learning level through a long-term learning process with human instruction. Therefore, knowledge at the learning level is given by the human operator in a top-down manner and acquired from the heuristics of the skill level and the adaptation level in a bottom-up manner.

For the same task in different environments, it is necessary to change the control references for the servo controller at the adaptation level depending on the environment. At the skill level, the fuzzy neural network is used for specific tasks, following the control strategy produced at the learning level, in order to generate appropriate control references. Input signals to the fuzzy neural network are numerical values sensed by some specific sensors and some symbols which indicate the control strategy produced at the learning level. The output of the fuzzy neural network is the control reference for the servo controller at the adaptation level. This output is based on the skill extracted from human experts through learning training sets obtained from them. At the same moment, the fuzzy neural network clusters the input signals in the shape of membership functions. These membership functions are used as symbolic information for the learning level.

At the adaptation level, a neural network in the servo controller adjusts the control law to the current status of the dynamic process [7, 16]. In particular, compensation for the non-linearity of the system and the uncertainties included in the environment must be dealt with by the neural network. Thus, the neural network in the adaptation process works more rapidly than that in the learning process. It has been shown that the neural network-based controller, the Neural Servo Controller, is effective for nonlinear dynamic control with uncertainties, such as force control of a robotic manipulator. Eventually, the neural networks and the fuzzy neural networks connect neuromorphic control with symbolic control for hierarchical intelligent control while incorporating human skills.

The hierarchical intelligent control is applied not only to a single robot, but also to multi-agent robot systems. If there is no interaction between robots, each robot has to work optimally for its own purpose so that the total task is achieved optimally; that is, each robot should work selfishly. Otherwise, conflicts among the robots might occur when they use a public resource. The competition may cause collisions and deadlock states among the robots in a local area. In order to avoid competition, it is necessary for the robots to communicate and to coordinate among themselves. Coordination among the robots is as important as selfishness. The GAs are applied hierarchically to balance selfishness with coordination for efficient motion planning [6]. When multiple robots work independently as a decentralized system, the learning capability of the robots is indispensable for the evolution of the system [17].

As a result, the integration and synthesis of AI, fuzzy logic, neural networks, and GAs are important for intelligent systems, depending on their characteristics. Hierarchical intelligent control using these techniques is effective for controlling intelligent systems in robotics.

6. Hierarchical Trajectory Planning with Virus-Evolutionary Genetic Algorithm

Trajectory planning is one of the most important and difficult tasks required of robot manipulators. To achieve a given task, the robot manipulator, in general, performs the following steps: 1) find obstacles (perception), 2) generate a collision-free trajectory (decision making), and 3) actually trace the trajectory (action). In this section, we focus on the decision making and discuss how to generate an optimal trajectory of a manipulator.


6.1 Hierarchical Trajectory Planning

We have proposed a hierarchical trajectory planning method [47, 48], which is composed of a position generator as a local search and a trajectory generator as a global search (Fig. 16). The position generator generates intermediate positions of the manipulator between the given initial and final positions. A position is expressed by a set of joint angles of the manipulator. All intermediate positions are generated simultaneously, based on the intermediate positions before and after them. An intermediate position that satisfies the aspiration level is sent to the trajectory generator as local information. An appropriate intermediate position is generated based on the distance between the manipulator and obstacles. To measure this distance, we apply the concept of pseudo-potential [49].

The trajectory generator generates a collision-free trajectory by combining intermediate positions generated in the position generator. Here the intermediate positions of the best candidate solution are constraints for generating further intermediate positions; that is, the position generator generates other intermediate positions based on the intermediate positions of the best candidate solution. Therefore, the hierarchical trajectory planning results in a co-optimization problem of the trajectory generation and the intermediate position generation.

Fig. 16. Hierarchical trajectory planning (the trajectory generator passes constraints to the position generator, which returns local information)


6.2 Virus-Evolutionary Genetic Algorithm for Hierarchical Trajectory Planning

A virus-evolutionary genetic algorithm (VEGA) simulates evolution with both horizontal propagation and vertical inheritance of genetic information (Fig. 17). The VEGA is composed of two populations, a host population and a virus population. Here the host and virus populations are defined as a set of candidate solutions and a substring set of the host population, respectively. Genetic operators are performed on the host population. In the VEGA, virus infection operators are introduced into the GA. The VEGA has two virus infection operators, as follows:

• Reverse transcription operator: a virus overwrites its substring onto the string of a host individual to evolve the host population.

• Transduction operator: a virus takes out a substring from a host individual to evolve the virus population.

In this way, the VEGA performs both genetic operators and virus infection operators. The procedure of the VEGA is as follows:

Initialization
repeat
    Selection
    Crossover
    Mutation
    Virus_infection
    Replacement
until Termination_condition = True
end.
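The procedure above can be made concrete with a minimal Python sketch. Everything below is an illustrative assumption (binary strings, a toy fitness, fixed-length virus substrings); it only shows how reverse transcription and transduction interleave with the usual GA operators.

import random

L, HOSTS, VIRUSES, GENS = 20, 30, 5, 50

def fitness(s):                      # stand-in objective: number of ones
    return sum(s)

hosts = [[random.randint(0, 1) for _ in range(L)] for _ in range(HOSTS)]
viruses = [(random.randrange(L - 5), [random.randint(0, 1) for _ in range(5)])
           for _ in range(VIRUSES)]  # each virus: (start position, substring)

for gen in range(GENS):
    # selection + crossover + mutation on the host population
    hosts.sort(key=fitness, reverse=True)
    parents = hosts[:HOSTS // 2]
    children = []
    while len(children) < HOSTS - len(parents):
        a, b = random.sample(parents, 2)
        cut = random.randrange(1, L)
        child = a[:cut] + b[cut:]
        if random.random() < 0.05:           # mutation
            i = random.randrange(L)
            child[i] ^= 1
        children.append(child)
    hosts = parents + children
    # virus infection operators
    for vi, (pos, sub) in enumerate(viruses):
        h = random.choice(hosts)
        infected = h[:pos] + sub + h[pos + len(sub):]
        if fitness(infected) >= fitness(h):  # reverse transcription kept if it helps
            h[:] = infected
        else:                                # transduction: copy a new host substring
            donor = max(hosts, key=fitness)
            viruses[vi] = (pos, donor[pos:pos + len(sub)])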

Fig. 17. Virus-evolutionary genetic algorithm (host and virus individuals evolving from time t to time t+1)


Fig. 18. Collision-free trajectory: (a) front view, (b) side view, (c) top view

The objective of trajectory planning is to generate a trajectory realizing the minimum distance from the initial point to the final point while staying farther from the obstacles. To achieve this objective, we use the following fitness function:

fitness = w_1 Σ d² + w_2 Σ Δθ² + w_3 f_r² + w_4 max_j pot_j + w_5 Σ_j pot_j    (1)

where w_1, ..., w_5 are weight coefficients. The first and second terms denote the sums of squares of the distances and of the joint angles, respectively. The third term keeps each joint within its available range. The fourth term denotes the maximum of the pseudo-potential values at the sampling points. The last term denotes the sum of the pseudo-potential values over all sampling points.
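A hedged sketch of eq. (1) as reconstructed above; the argument names (inter-position distances, joint-angle changes, a joint-range penalty, pseudo-potential samples) and the weight layout are our assumptions.

import numpy as np

def trajectory_fitness(dists, dthetas, range_penalty, pot, w):
    w1, w2, w3, w4, w5 = w
    return (w1 * np.sum(dists ** 2)      # sum of squared distances
            + w2 * np.sum(dthetas ** 2)  # sum of squared joint angles
            + w3 * range_penalty ** 2    # joint-range term
            + w4 * np.max(pot)           # worst pseudo-potential sample
            + w5 * np.sum(pot))          # total pseudo-potential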

Next, we show an example of simulation results. Fig. 18 shows a collision-free trajectory of a 7-DOF manipulator acquired by the hierarchical trajectory planning with the VEGA. In this figure, the side view shows that the manipulator avoids colliding with the obstacles. The manipulator achieves the final position without colliding with obstacles, and the obtained trajectories stay farther away from the obstacles.

7. Group Robotic System by Self-Recognition

In recent years, research on decentralized autonomous robotic systems has become active because of the expansion of the robotic application fields and the development of low-cost computer devices. The decentralized autonomous systems are regarded as an approach to organizing complex systems with a number of robots. Additionally, the decentralized robotic systems have flexibility, redundancy, extendibility, reconfigurability, and so on. Therefore, in laboratories, industrial facilities, and so on, much research on decentralized robotic systems has been carried out [25-28]. On the other hand, we proposed the Cellular Robotic System (CEBOT) as one of the decentralized robotic systems [29-32]. The CEBOT is an autonomous distributed robotic system (DARS) composed of a number of functional robotic units called "cells." The CEBOT has three characteristics, as follows. The first characteristic is that it is regarded as an autonomous distributed robotic system that can have an optimal/suboptimal configuration composed of many cells. The optimal/suboptimal configuration is organized dynamically by docking and detaching of cells with cooperation. The second is that it is considered a self-organizing robotic system that can reconfigure dynamically to carry out given tasks in an orderly way, depending on its environment and its given tasks. The third refers to a "group robotic system" composed of many cells. The group robotic system generates and evolves its behavior with coordination and cooperation among cells.

Fig. 19. Concept of self-recognition (each robot combines self-recognition, recognition of the environment, and decision making, and is coupled with the other robots)


Fig. 20. Self-organization of group behavior with multiple robots: (a) initial condition, (b) group behavior without self-recognition, (c) improved group behavior with self-recognition


The group behavior is carried out by cooperation between autonomous robots. For the problem of cooperation between autonomous robots, we consider that each robot needs to recognize that it carries out tasks in the group as a part of the group. Therefore, we proposed the concept of "self-recognition" for the decision making of the group behavior as a basic strategy of group robotic systems [33]. Figure 19 shows the concept of self-recognition.

Figure 20 represents results of group behavior organized by 120 autonomous robots. The task of each robot is to go out of the room through two exits. Figure 20(a) shows the initial condition of these robots, where circles represent the robots. Figure 20(b) shows a group behavior where the robots behave selfishly, that is, without "self-recognition." Figure 20(c) illustrates an improved group behavior, where optimal or feasible parameters are provided to the robots. The optimal or feasible parameters have been obtained by learning and adaptation with "self-recognition" of the individual robots to optimize the group behavior. The self-recognition mentioned above can be considered as mutual understanding. From the simulation results, we can see that the learning and adaptation with self-recognition have an effect on the group behavior.

8. Group Robotic System Based on Immune Network

This section deals with a control architecture to organize the group behavior of the DARS based on the self-organizing mechanism of natural systems, especially on the control architecture of population organization. Population organization works well for the execution of tasks under a dynamically changing environment [50, 51].

8.1 Recognition of the Global State of the System Based on Interaction

In general, information on the global state of the system is required to control the system. Controlling the DARS also needs this information. A conventional centralized control system obtains the global state information easily because it is given by the supervisor. On the other hand, a DARS needs to obtain the global state information via interaction between robots. In this case, the following two steps are required: (1) implementation of the Segment Information of Global State of the System (SIGS) into the interaction between robots, and (2) recognition of the global state of the system based on the SIGS.

8.2 Immunological Interaction

When each robot has N different types of strategies, the global strategy can be represented by the population balance of the strategies. Each robot must select its strategy to organize the optimal global strategy. The problem is that each robot has to decide its strategy locally. Our proposed architecture can organize an optimal global strategy while each robot decides its strategy locally, by means of the immune network based architecture.

According to Jerne's idiotype network theory [52-54], the immune system is a huge network system organized by the interaction of B-cells (B-lymphocytes). Each B-cell recognizes others as antigens; thus the immune system recognizes its own antibodies as if they were foreign and makes antibodies against them. A B-cell is stimulated when it recognizes other types, and suppressed when it is recognized. A stimulated B-cell produces its clones, and a suppressed one dies. This interaction affects a huge number of B-cells and builds a network, which is called the idiotype network. In the idiotype network, the immune system operates at a steady state in the absence of antigen; an antigen simply causes a perturbation away from this steady state. Upon termination of the response to the antigen, the system returns to a steady state, or to a new steady state. When a B1-cell is stimulated by a foreign antigen, it produces its clones. As a result, the next B2-cell is also stimulated and produces its clones. This reaction influences all network agents, and a new network structure is organized. The immune network can produce the necessary cells at the necessary time and in the necessary population by this network structure. In the immune network based DARS, the agents are coupled by two-way interaction, stimulation and suppression. An individual agent gets global information through the network, and a disturbance of the network works as a trigger to organize the population balance of the agents. Each robot has its own internal network, and the robots are coupled by an external network. Each robot acquires the others' internal states and the environment's information through the external network into its internal network, and the internal network then decides its behavior strategy after generating global information of the field. This global information is generated by local interaction between robots, and it works as a restriction on the strategy decision. After that, the robot feeds its internal state back to the external network through broadcast communication.
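The stimulation/suppression coupling can be sketched as a toy dynamical system; the update rule below is our illustrative assumption (inspired by idiotype-network models such as [54]), not the authors' equations.

import numpy as np

N = 4
m = np.random.rand(N, N) * 0.2          # m[i, j]: how strongly agent j stimulates agent i
x = np.ones(N)                          # agent (B-cell) concentrations
antigen = np.array([1.0, 0.0, 0.0, 0.0])  # a disturbance perturbing the steady state

for step in range(200):
    stim = m @ x                        # stimulation received from recognizing others
    supp = m.T @ x                      # suppression from being recognized
    dx = x * (stim - supp + antigen - 0.5)   # 0.5: natural death rate (assumed)
    x = np.clip(x + 0.01 * dx, 0.0, 10.0)    # population balance shifts to a new state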

8.3 Population Organization Algorithm

8.3.1 Structure of Network

Figure 21 shows the network structure of the swarm robotic system. In the analogy between the immune system and the swarm robotic system, a B-cell corresponds to a robot, an antigen to work, and a clone cell to a message antigen. There are different kinds of work A, B, ..., and these works correspond one-to-one to the behavior strategies of the robots.

In Fig. 21, if a robot can interact with work A or work B, it gets antigen information (the state of the work at that time) from the network, and the antigen information is brought to another robot via the network. Focusing on each robot: it acquires the antigen information via the network, and then (1) recognizes the global state of the system and (2) decides its own behavior strategy so as to organize the swarm behavior. As a member of the network, it also sends the antigen information to the network.


Fig. 21. Network structure of the swarm robotic system (robots interact with work A and work B, exchange antigen information through the network, and each robot recognizes the global state of the system and decides its behavior strategy for group behavior)

8.3.2 Decision of Behavior Strategy based on Interaction

Figure 22 shows the behavior strategy decision algorithm of each robot and the interaction between robots i-1, i, i+1 and their environment. The strategy decision algorithm is composed of (1) Antigen Obtainer, (2) Antigen Hangar, (3) Message-Antigen Generator, (4) Message-Antigen Sender, (5) Global State Recognitioner, and (6) Behavior Strategy Decisioner. Components (1) to (4) are related to the interaction with other robots, and we call them the "Interaction Part"; (5) and (6) are related to the decision of the behavior strategy, and we call them the "Strategy Decision Part".

Fig. 22. Immune type interaction algorithm (within robot i, the Interaction Part exchanges antigens with the environment (work) and message antigens with robot i+1, while the Strategy Decision Part, composed of the Global State Recognitioner and the Behavior Strategy Decisioner, decides the behavior strategy)


Fig. 23. Flowchart of the interaction algorithm for robot i (acquisition of antigen information, broadcast of antigen information, recognition of the global state of the system, and decision of the behavior strategy)

As shown in Fig. 22, in the interaction between a robot and an antigen, an "antigen" (described later) is exchanged, and in the interaction between robots, a "message antigen" (described later) is exchanged. Figure 23 shows the flowchart for deciding the behavior strategy of each robot. Step T is the strategy-changing step. The following describes this algorithm step by step.

(1) Antigen Obtainer
In the "antigen obtainer", each robot acquires antigens by robot-antigen interaction and message antigens by robot-robot interaction. An antigen corresponds to the work. A robot acquires the "antigen information" when it succeeds in executing work. It is composed of the work information (antigen) and the achievement ratio of the work (antigen density). A "message antigen" is the information exchanged between robots, and it is composed of the sender ID, the update number of the antigen information, and the antigen list (the list of kinds and densities of antigens). Each robot acquires the message antigens sent by other robots by implicit communication.

Work i is characterized by its execution time tt_i (sec) and its work amount W_i. The work amount per unit time D_i is defined by eq. (2).

D_i = W_i / tt_i    (2)


The antigen density G_i is the standardized value of the achievement ratio of work i, given by eq. (3).

G_i(t) = (1 + a) / (1 + exp(-u_i(t) / T))    (3)

G_i = 0 represents an ARW (achievement rate of work) of 100%, G_i < 0 represents overwork, and G_i > 0 represents a lack of work. Here u_i is defined by eq. (4).

u_i(t) = r_i(t) - 1.0    (4)

Here r_i(t) represents the work achievement rate (the achievement rate of work i at time t), and p(t) is the work amount achieved from time 0 to t.

r_i(t) = p(t) / (D_i(t) · t)    (5)

(2) Antigen Hangar

The antigen hangar houses the antigen information acquired from the antigen obtainer. It stores the antigen information in an antigen information table (composed of antigen kind, acquisition count, and antigen density). If the antigen information list has an antigen whose kind agrees with the kind of the input antigen from the antigen obtainer, its update number C_i and average antigen density Ḡ_i are updated by eqs. (6) and (7).

Ḡ_i ← (Ḡ_i C_i + G_i) / (C_i + 1)    (6)

C_i ← C_i + 1    (7)

Here G_i represents the antigen density of antigen i acquired at the antigen obtainer, C_i represents the acquisition count of antigen i, and Ḡ_i represents the average density per antigen i.
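Eqs. (6) and (7) amount to maintaining a running average per antigen kind; the following Python sketch is a direct transcription under illustrative names.

def update_hangar(hangar, kind, density):
    g_bar, c = hangar.get(kind, (0.0, 0))
    g_bar = (g_bar * c + density) / (c + 1)   # eq. (6): running average density
    hangar[kind] = (g_bar, c + 1)             # eq. (7): increment the count

hangar = {}
update_hangar(hangar, "A", 0.3)
update_hangar(hangar, "A", 0.5)   # hangar["A"] is now (0.4, 2)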

(3) Global State Recognitioner
The global state recognitioner recognizes the global state of the system from the differences in the work achievement rates, represented as differences in the average antigen densities. The antigen information table of the antigen hangar is sorted from high density to low density.


(4) Behavior Strategy Decisioner
In the behavior strategy decisioner, the behavior strategy of the robot is decided by the following sequence: (a) determine the antigen whose density is highest, called G_winner; (b) set the strategy corresponding to G_winner, called S_winner; (c) refer to the antigen in the antigen list corresponding to S_winner; (d) determine the probability of strategy changing, called Q; (e) select the strategy: S_winner with probability Q and the self-strategy with probability 1-Q.

Suppose the winner antigen is antigen i with strategy i, and the self-strategy is i+2 with corresponding antigen i+2. Then the robot selects between S_winner and the self-strategy i+2 by the strategy changing probability Q, defined by eq. (8). Here d represents the absolute value of the difference between the antigen density of the winner antigen and that of the antigen corresponding to the self-strategy, and is defined by eq. (9).

Q = d² × 100 (%)    (8)

d = |G_winner - G_self|    (9)
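A minimal sketch of this selection rule, under the assumption that Q = d² is used directly as a probability in [0, 1]; the names are ours.

import random

def choose_strategy(g_winner, g_self, s_winner, s_self):
    d = abs(g_winner - g_self)      # eq. (9): density gap
    q = min(d ** 2, 1.0)            # eq. (8), taken as a probability
    return s_winner if random.random() < q else s_self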

9. Network Robotic System


Recently, communication technology has improved very quickly, and a computer system can now communicate with others via a network, exchanging large amounts of data at very high speed.

Multimedia-based communication systems have also been developed as one of the communication technologies. Multimedia technology can integrate several different media, such as high-quality images, real-time color images, sound, text data, control signals, and so on. With high-speed communication technology, we can realize interactive communication between different places.

Applying these technologies, we can connect to a robotic system that is located far away from the operator or the intelligent control units. Therefore we can distribute the functional parts of the intelligent robotic system, such as the intelligent man-machine interface units, sensory systems, knowledge database, information processing units, and actuators. Thus the working robot system does not need highly intelligent processing units in its body.

The tele-manipulation control scheme [35] was developed to connect the man-machine interface part and the sensor-actuator system (i.e., the robot system) with good manipulability.

As an example of a network robotic system, we show the multimedia-based tele-surgery system. Figure 24 shows the concept of multimedia tele-surgery. The operator can get various information through the high-speed optical fiber network. The number of specialized doctors tends to decrease, so by using the multimedia network we can exchange information about the patients. The virtual environment will support collaborative work between several doctors in different places. The idea of a distributed virtual environment for concurrent engineering has been proposed before [34]. The distributed virtual environment can be applied to medical matters. This idea will lead to a world-wide education system, and proper decision making will be possible based on global communication. Figure 25 shows the concept of the multimedia-based medical network to realize tele-pathology and tele-surgery (tele-medicine).

Fig. 24. Concept of Multimedia Tele-surgery (the operator exchanges visual information and force feedback with a blood-vessel catheter)

Fig. 25. Concept of Multimedia Based Medical Network (workstations in different places exchange data/text, sound, vision, skill, and knowledge for tele-surgery)


10. Conclusions

This paper described a hierarchical intelligent control scheme, a reinforcement learning scheme, and various intelligent robotic systems. Integration and synthesis techniques of AI, fuzzy logic, neural networks, and GAs make the robot system intelligent. The hierarchical control system has both top-down and bottom-up learning abilities while integrating and synthesizing those techniques. They give the robotic system flexibility with respect to its tasks and environments. The group behavior control scheme and the network robotic system were also described. These schemes are very important for controlling multiple robotic systems autonomously and efficiently.

References

[1] T. Fukuda and T. Shibata, Theory and Applications of Neural Networks for Industrial Control Systems, IEEE Trans. on Industrial Electronics, Vol. 39, No. 6, pp. 472-489 (1992)

[2] L. A. Zadeh, Fuzzy Sets, Information and Control, Vol. 8, pp. 338-353 (1965)

[3] D. E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley (1989)

[4] T. Shibata and T. Fukuda, Skill Based Control by using Fuzzy Neural Network for Hierarchical Intelligent Control, Proc. of IJCNN'92-Baltimore, Vol. 2, pp. 81-86 (1992)

[5] T. Fukuda, S. Shiotani, F. Arai, A New Neuron Model for Additional Learning, Proc. of IJCNN'92-Baltimore, Vol. 1, pp. 938-943 (1992)

[6] T. Shibata, T. Fukuda, K. Kosuge, F. Arai, Selfish and Coordinative Planning for Multiple Mobile Robots by Genetic Algorithm, Proc. of the 31st IEEE Conf. on Decision and Control, Tucson, Vol. 3, pp. 2686-2691 (1992)

[7] T. Fukuda, T. Shibata, M. Tokita, T. Mitsuoka, Neuromorphic Control - Adaptation and Learning, IEEE Trans. on Industrial Electronics, Vol. 39, No. 6, pp. 497-503 (1992)

[8] T. Shibata and T. Fukuda, Hierarchical Intelligent Control of Robotic Motion, IEEE Trans. on Neural Networks (1992)

[9] K. Shimojima, T. Fukuda, Y. Hasegawa, RBF-Fuzzy System with GA based Unsupervised/Supervised Learning Method, Proc. of FUZZ-IEEE/IFES'95, Vol. 1, pp. 253-258 (1995)

[10] T. Shibata and T. Fukuda, Fuzzy Critic for Robotic Motion Planning by Genetic Algorithm in Hierarchical Intelligent Control, Proc. of IJCNN'93-Nagoya (1993)

[11] H. Ichihashi, Learning in Hierarchical Fuzzy Models by Conjugate Gradient Method using Backpropagation Errors, Proc. of Intelligent System Symp., pp. 235-240 (1991)

[12] T. Parisini, R. Zoppoli, Radial basis function and multilayered feedforward neural networks for optimal control of nonlinear stochastic systems, Proc. of Int'l Conf. on Neural Networks, pp. 1853-1858 (1993)

[13] K. Shimojima, N. Kubota, T. Fukuda, RBF Fuzzy Controller with Virus-Evolutionary Genetic Algorithm, Proc. of the Int'l Conf. on Neural Networks, Vol. 2, pp. 1040-1043 (1996)

[14] C. L. Karr, E. J. Gentry, Fuzzy Control of pH Using Genetic Algorithm, IEEE Trans. on Fuzzy Systems, Vol. 1, No. 1, pp. 46-53 (1993)

[15] J. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection, MIT Press (1992)

[16] D. A. White, D. A. Sofge (Eds.), Handbook of Intelligent Control - Neural, Fuzzy, and Adaptive Approaches, Van Nostrand Reinhold (1992)

[17] T. Shibata and T. Fukuda, Coordinative Behavior by Genetic Algorithm and Fuzzy in Evolutionary Multi-Agent System, Proc. of IEEE Int'l Conf. on Robotics and Automation, Vol. 1, pp. 760-765 (1993)

[18] S. Shiotani, T. Fukuda, T. Shibata, Recognition System by Neural Network for Incremental Learning, Proc. of the IEEE/RSJ Int'l Conf. on Intelligent Robots and Systems, pp. 1729-1735 (1993)

[19] S. Shiotani, T. Fukuda, T. Shibata, A Neural Network Architecture for Incremental Learning, Neurocomputing, Vol. 9, No. 2, Elsevier, pp. 111-130 (1995)

[20] T. Fukuda, Y. Hasegawa, K. Shimojima, Structure Organization of Hierarchical Fuzzy Model using Genetic Algorithm, Proc. of FUZZ-IEEE/IFES'95, Vol. 1, pp. 295-299 (1995)

[21] F. Saito, T. Fukuda, Learning Architecture for Real Robotic Systems - Extension of Connectionist Q-Learning for Continuous Robot Control Domain, Proc. of Int'l Conf. on Robotics and Automation, Vol. 1, pp. 27-32 (1994)

[22] F. Saito, T. Fukuda, Two-Link-Robot Brachiation with Connectionist Q-Learning, Proc. of 3rd Int'l Conf. on Simulation of Adaptive Behavior (From Animals to Animats 3), pp. 309-314 (1994)

[23] J. H. Connell, S. Mahadevan, Robot Learning, Kluwer Academic Publishers (1993)

[24] A. G. Barto, R. S. Sutton, C. W. Anderson, Neuronlike adaptive elements that can solve difficult learning control problems, IEEE Trans. on Systems, Man, and Cybernetics, SMC-13(5), pp. 834-846 (1983)

[25] F. R. Noreils, An Architecture for Cooperative and Autonomous Mobile Robots, Int'l Conf. on Robotics and Automation, pp. 2703-2710 (1992)

[26] M. K. Habib, H. Asama, Y. Ishida, A. Matsumoto and I. Endo, Simulation Environment for An Autonomous and Decentralized Multi-Agent Robot System, Int'l Conf. on Intelligent Robots and Systems, pp. 1550-1557 (1992)

[27] S. Yuta and S. Premvuti, Coordinating Autonomous and Centralized Decision Making to Achieve Cooperative Behaviors Between Multiple Mobile Robots, Int'l Conf. on Intelligent Robots and Systems, pp. 1566-1574 (1992)

[28] R. A. Brooks, A Robust Layered Control System for a Mobile Robot, IEEE J. of Robotics and Automation, Vol. RA-2, pp. 14-23 (1986)

[29] T. Fukuda, S. Nakagawa, Y. Kawauchi, M. Buss, Structure Decision Method for Self Organizing Robots based on Cell Structure - CEBOT, Proc. 1989 IEEE Int'l Conf. on Robotics and Automation, Vol. 2, pp. 695-700 (1989)

[30] Y. Kawauchi, M. Inaba, T. Fukuda, A Principle of Distributed Decision Making of Cellular Robotic System (CEBOT), Proc. 1993 IEEE Int'l Conf. on Robotics and Automation, Vol. 3, pp. 833-838 (1993)

[31] T. Ueyama, T. Fukuda, Self-Organization of Cellular Robots using Random Walk with Simple Rules, Proc. 1993 IEEE Int'l Conf. on Robotics and Automation, Vol. 3, pp. 595-600 (1993)

[32] T. Fukuda, T. Ueyama, Cellular Robotics and Micro Robotic Systems, World Scientific Series in Robotics and Automated Systems - Vol. 10, World Scientific (1994)

[33] T. Fukuda, G. Iritani, T. Ueyama, F. Arai, Optimization of Group Behavior on Cellular Robotic System in Dynamic Environment, 1994 IEEE Int'l Conf. on Robotics and Automation, Vol. 2, pp. 1027-1032 (1994)

[34] A. Bejczy, G. Bekey, R. Taylor, and S. Rovetta, A Research Methodology for Tele-surgery with Time Delays, The First Int'l Symp. on Medical Robotics and Computer Assisted Surgery (1994)

[35] K. Kosuge, T. Itoh, T. Fukuda, Telemanipulation System Based on Task-Oriented Virtual Tool, IEEE Int'l Conf. on Robotics and Automation, pp. 351-356 (1995)

[36] N. Kubota, T. Fukuda, K. Shimojima, Virus-Evolutionary Genetic Algorithm for Self-Organizing Manufacturing System, Computers & Industrial Engineering J., Vol. 30, No. 2 (1996)

[37] N. Kubota, K. Shimojima, T. Fukuda, The Role of Virus Infection in Virus-Evolutionary Genetic Algorithm, Proc. of 1996 IEEE Int'l Conf. on Evolutionary Computation, pp. 182-187 (1996)

[38] G. Syswerda, A Study of Reproduction in Generational and Steady-State Genetic Algorithms, Foundations of Genetic Algorithms, Morgan Kaufmann, pp. 94-101 (1991)

[39] H. Muhlenbein, M. Schomisch, J. Born, The Parallel Genetic Algorithm as Function Optimizer, Proc. of the Fourth Int'l Conf. on Genetic Algorithms, pp. 271-278 (1991)

[40] D. Whitley, The GENITOR Algorithm and Selection Pressure: Why Rank-Based Allocation of Reproduction is Best, The Third Int'l Conf. on Genetic Algorithms, pp. 110-115 (1989)

[41] N. Kubota, T. Fukuda, F. Arai, K. Shimojima, Genetic Algorithm with Age Structure and Its Application to Self-Organizing Manufacturing System, Proc. of 1994 IEEE Symp. on Emerging Technologies and Factory Automation (ETFA'94), pp. 472-477 (1994)

[42] M. Bramlette, Initialization, Mutation and Selection Methods in Genetic Algorithms for Function Optimization, The Fourth Int'l Conf. on Genetic Algorithms, pp. 100-107 (1991)

[43] T. Furuhashi, K. Nakaoka, Y. Uchikawa, An Efficient Finding of Fuzzy Rules Using a New Approach to Genetic Based Machine Learning, Proc. of FUZZ-IEEE/IFES'95, pp. 715-722 (1995)

[44] K. Shimojima, Y. Hasegawa, T. Fukuda, Unsupervised/Supervised Learning for RBF-Fuzzy System - Adaptive Rules, Membership Functions and Hierarchical Structure by Genetic Algorithm, Lecture Notes in Artificial Intelligence 1011, Subseries of Lecture Notes in Computer Science, Advances in Fuzzy Logic, Neural Networks and Genetic Algorithms, Springer, pp. 127-147 (1995)

[45] T. Fukuda, Y. Hasegawa, K. Shimojima, F. Saito, Self Scaling Reinforcement Learning for Fuzzy Logic Controller, Proc. of 1996 IEEE Int'l Conf. on Evolutionary Computation, pp. 247-252 (1996)

[46] N. Kubota, K. Shimojima, T. Fukuda, Virus-Evolutionary Genetic Algorithm - Ecological Model on Planar Grid, Proc. of the Biennial Conference of the North American Fuzzy Information Processing Society (NAFIPS'96), IEEE, pp. 505-509 (1996)

[47] N. Kubota, T. Fukuda, K. Shimojima, Trajectory Planning of Redundant Manipulator Using Virus-Evolutionary Genetic Algorithm, IEEE-SMC Proc. of the Symposium on Robotics and Cybernetics - Computational Engineering in Systems Applications (CESA'96), pp. 728-733 (1996)

[48] N. Kubota, T. Fukuda, K. Shimojima, Trajectory Planning of Reconfigurable Redundant Manipulator Using Virus-Evolutionary Genetic Algorithm, Proc. of the 22nd Int'l Conf. on Industrial Electronics, Control, and Instrumentation, pp. 836-841 (1996)

[49] O. Khatib, Real-Time Obstacle Avoidance for Manipulators and Mobile Robots, Int'l J. of Robotics Research, Vol. 5, No. 1, pp. 90-98 (1986)

[50] N. Mitsumoto, T. Fukuda, K. Shimojima and A. Ogawa, "Micro Autonomous Robotic System and Biologically Inspired Immune Swarm Strategy as a Multi Agent Robotic System," Proc. of IEEE Int'l Conf. on Robotics and Automation (ICRA'95), Vol. 2, pp. 2187-2192 (1995)

[51] N. Mitsumoto, T. Fukuda and F. Arai, "Self-organizing Multiple Robotic System (A Population Control through Biologically Inspired Immune Network Architecture)," Proc. of the 1996 IEEE Int'l Conf. on Robotics and Automation (ICRA'96), Vol. 2, pp. 1614-1619 (1996)

[52] I. Roitt, "Essential Immunology", pp. 1-152, Blackwell Scientific Publications (1992)

[53] H. J. Bremermann, "Self-Organization in Evolution, Immune System, Economics, Neural Nets and Brains", On Self-Organization, Springer Series in Synergetics 61, Springer-Verlag, pp. 5-34 (1993)

[54] J. D. Farmer, S. A. Kauffman, N. H. Packard and A. S. Perelson, "Adaptive Dynamic Networks as Model for the Immune System and Autocatalytic Sets", Perspectives in Biological Dynamics and Theoretical Medicine, Annals of the New York Academy of Science, Vol. 504, pp. 118-131 (1987)


Hardware and Software Architectures for Soft Computing

Rinaldo Poluzzi

Corporate Advanced System Architectures (C.A.S.A), SGS-THOMSON Microelectronics, Via Olivetti 2, 20041 Agrate Brianza (Mi) Italy e-mail [email protected]

Abstract. The paper aims to present well-described examples of VLSI-dedicated architectures for soft computing applications, in terms of performance and internal functional structure. In addition, closely connected to the described hardware components, an example of a Neuro-Fuzzy software architecture able to perform a system identification process on available measurements is presented, with some reference applications.

1. Introduction

Soft computing, conceived as a subject of information technology aiming at the interactive exploitation of fuzzy set theory, neural nets, and genetic algorithms, promises to have a strong impact on different application fields, e.g. multimedia processing, process control, and diagnostic and decision making systems.

In fact soft computing utilises the winning "genes" of the three methodologies, as follows:

- Possibility to control complex and uncertain systems
- Capability to embed knowledge by fuzzy rules
- Optimisation capability by natural-like selection and mutation criteria
- Learning capability of complex functional relations.

The main research items today in Soft Computing can be synthetically summarised as follows:

1. The Neuro-Fuzzy networks, able to synthesize in terms of "if-then" rules the model of non-linear systems, with promising results in the areas of fuzzy clustering, classification, and control.

2. Genetic algorithms as an optimisation technique for complex fuzzy processes in which the fuzzy sets must be identified in terms of number and shape.

3. In the field of genetic algorithms, the implementation of a linguistic supervisor devoted to deciding the mutation and crossover probabilities and the death rate of the genes population, to drastically improve the best "candidate" identification process.



4. Neural nets as "fitness function" identifiers for genetic algorithms, starting from indirect measurements of the quality of the "phenotype" expression.

The present article is focused on the first research item, i.e. Neuro-Fuzzy networks, with the objective of describing hardware (hw) and software (sw) architectures for industrial applications of soft computing.

2. Hardware Architectures for Soft Computing

All the applications in the soft computing domain, from process control (static or adaptive) to decision making and pattern recognition systems, can normally be expressed in terms of "if-then" rules acting on suitable input/output fuzzy sets. Presently, the implementation approach for these large sets of rules representing the solution to a specific problem is based on pure sw using standard high/low level languages. Whenever the process to be implemented requires a large number of input/output variables and related rules with a very high system throughput (as, for example, in the robotic environment, electronic control, or real-time signal classification), the pure sw approach using non-dedicated hw platforms is not suitable for on-line applications.

To overcome the limits arising from a software implementation of large sets of rules, SGS-THOMSON has introduced a complete family of digital fuzzy coprocessors and fuzzy microcontrollers called W.A.R.P. (Weight Associative Rules Processor), precisely the W.A.R.P. 1.1/2 Coprocessors and the W.A.R.P. 3 Microcontroller Family.

The hardware implementation of a fuzzy set of rules makes it possible to remove all the limitations related to the software implementation. It is possible to obtain optimised code generation starting from a high level description language: in fact it is no longer necessary to use complex constructs like "GO TO", "WHILE loop" or jumps to subroutines. The designer must describe the control algorithm by using simple if-then rules. Considering that the compiler is developed according to the internal structure of the fuzzy processor, the size of the memory used for the fuzzy algorithm is automatically optimised and the computational time is reduced to the minimum. This allows the internal memory of the fuzzy processor to be reduced and the computational time to be optimised. Working with a high level description language, the debugging of the fuzzy code is easily obtained.

Thanks to the high computational power of this kind of fuzzy processor, real-time control can be implemented. For example, it is possible to carry out predictive controls based on the fuzzy modelization of the system, defining several sets of rules for different tasks: system modelization, system control, rules for monitoring and expert system implementations, and rules for signal analysis and classification.


2.1 W.A.R.P. Architecture and Performances

To cover a wide range of applications, a complete set of fuzzy machines has been developed by SGS-THOMSON. The family of fuzzy machines called W.A.R.P. includes fuzzy coprocessors (W.A.R.P. 1.1 and W.A.R.P. 2) and fuzzy microcontrollers (the W.A.R.P. 3 family).

In the first case, the fuzzy coprocessor structure is optimised to work with a standard microcontroller in order to implement complex algorithms for system control and signal processing. In this way the standard microcontroller performs the normal control tasks, while the W.A.R.P. fuzzy coprocessor is independently responsible for all the fuzzy-related computing. The following table summarises the key features of these two products:

W.A.R.P. 1.1                                      | W.A.R.P. 2
Up to 256 rules (4 antecedents, 1 consequent)     | Up to 256 rules (4 antecedents, 1 consequent)
Up to 16 input variables                          | Up to 8 input variables
Up to 16 membership functions per input variable  | Up to 16 membership functions per input variable
Membership functions with any shape               | Triangular and trapezoidal membership functions
Up to 16 output variables                         | Up to 4 output variables
Max-Dot inference method                          | Max-Dot inference method
Rules computation time 14 µsec for 256 rules      | Rules computation time 200 µsec for 256 rules
Clock frequency 40 MHz                            | Clock frequency 40 MHz

The W.A.R.P. architecture is based on the storage of the information in two main memory sections: one for the membership functions (M.F.s) of the left side of the rules (antecedent memory blocks) and one for those connected to the right side of the rules (consequent memory block). In order to represent the membership functions connected to the fuzzy variables of the left side of the rules, we adopted a vectorial representation of the membership functions based on 64 (2^6) or 128 (2^7) elements, each possessing 16 (2^4) truth levels. The utilisation of vectors for this phase of the fuzzy calculus has the great advantage that, in the case of a controller, for each rule the data involved in the computation are one or more M.F.s (representing the knowledge of the system) and one or more crisp values (representing the input from the "external" world). With this data representation, in order to find the match level between the input and the stored M.F.s, it is sufficient to read the truth level of the element located by the projection of the input onto the universe of discourse. Storing in succession all the values of a term set, representing the membership functions connected to the left side of the rule, it is possible to retrieve all the values of the term set using the crisp input value to calculate the address in the fuzzy memory device.
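The vectorial lookup described above can be illustrated with a short Python sketch; the universe bounds, the triangular term set, and all names are illustrative assumptions, not the actual W.A.R.P. memory layout.

ELEMENTS, LEVELS = 64, 16
LOW, HIGH = 0.0, 10.0                    # universe of discourse (assumed)

def make_triangle(centre, width):
    # discretize a triangular M.F. into ELEMENTS entries of LEVELS truth levels
    mf = []
    for i in range(ELEMENTS):
        x = LOW + (HIGH - LOW) * i / (ELEMENTS - 1)
        mu = max(0.0, 1.0 - abs(x - centre) / width)
        mf.append(int(mu * (LEVELS - 1)))
    return mf

term_set = [make_triangle(c, 2.5) for c in (2.0, 5.0, 8.0)]  # stored in succession

def match_levels(crisp):
    # one address computation retrieves the truth levels of the whole term set
    idx = round((crisp - LOW) / (HIGH - LOW) * (ELEMENTS - 1))
    return [mf[idx] for mf in term_set]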


Fig. 1. Global architecture of the W.A.R.P. 1.1 Coprocessor (fuzzifier and address decoding logic, four antecedent memory blocks with registers, fuzzy inference engine with Theta operator and temporary storage, consequent memory block with address decoding unit, and defuzzifier with adders and divider)

The global architecture of the W.A.R.P. 1.1 Coprocessor is illustrated in Fig. 1. The Fuzzifier section is devoted to the calculation of the memory location and to the retrieval of the values. Inside the memory blocks, the data representing the M.F.s are stored according to the scheme described above. To obtain higher performance, the memory device devoted to the M.F.s of the left side of the rules has been divided into 4 independent blocks. Each of these blocks contains all the values of one or more fuzzy variables, allowing the parallel retrieval of the values. The values found are memorised in a set of dedicated registers and then suitably processed in the fuzzy inference engine to calculate the value of each rule. The adoption of the vectorial data representation for the M.F.s of the left side of the rules allows this operation to be performed in a highly efficient way via the Theta operator (T-norm/T-conorm operator). The resulting values are used to calculate the address of the memory word inside the consequent memory block, where the membership functions related to the fuzzy variables of the right side of the rules are stored. The assembling of all the M.F.s comprising an output and the defuzzification process are carried out in the defuzzifier block. In the case of W.A.R.P., the defuzzification adopted is the so-called "method of the centroids". This particular architecture makes the splitting of the memory devoted to the M.F.s of the right side of the rules unfeasible, as this would also require an increase in the number of defuzzification blocks. In Fig. 1 the four identical memory blocks of the antecedent memory are clearly visible, along with the microcode and consequent memory blocks. The W.A.R.P. 2 Coprocessor is a variant of W.A.R.P. 1.1 with lower processing speed and parallelisation, designed for low-cost applications.
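As a concrete illustration of the centroid defuzzification mentioned above, the following Python sketch aggregates the rule-scaled consequent M.F.s in the Max-Dot style and returns the centroid; the function names, the discretized universe, and the data layout are our illustrative assumptions, not the W.A.R.P. internals.

def centroid_defuzzify(universe, consequents, activations):
    # universe: list of x values; consequents: one M.F. vector over the
    # universe per rule; activations: one rule activation per consequent.
    agg = [max(a * mu for a, mu in zip(activations, col))
           for col in zip(*consequents)]       # Max-Dot aggregation
    num = sum(x * mu for x, mu in zip(universe, agg))
    den = sum(agg) or 1.0
    return num / den                           # centroid of the aggregated set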

3. Software Architectures for Soft Computing

This section presents in detail an example of a software architecture for soft computing, namely a Neuro-Fuzzy system for automatic process modelling. The output of this system is a set of fuzzy if-then rules fully compatible with the hardware architectures described above, achieving the goal of implementing real-time applications in the control domain.

A Neuro-Fuzzy system utilises the learning capability typical of neural networks in order to find out the model of the process supplied as input, and uses the typical data structures of fuzzy logic. The necessity of such an approach comes out of the following considerations:

- Neural networks are powerful instruments for process identification and modeling. Their main disadvantage lies in their information storage methodology: as a matter of fact, the information is distributed over all the connections of the network, and thus it is very difficult for the user to operate on the stored knowledge base.

- Fuzzy logic, thanks to the possibility of relying on a linguistic approach, represents a useful way to model complex and/or non-linear systems. The drawback lies in the relative difficulty of defining the optimal knowledge base.

The Neuro-Fuzzy system treats the information in a different way with respect to a traditional neural net. The information handled by the tool is structured: the activation functions and the connection weights between the elements of the networks have a precise meaning according to fuzzy logic theory. In fact they represent either a fuzzy operator or a fuzzy data structure, as will be shown in the following paragraphs.

The learning phase supplies the user with a fuzzy logic based model of the process. Such a model overcomes the typical disadvantages of the traditional modeling approaches:

- Mathematical model: it is difficult to identify and it hardly handles the intrinsic imprecision of physical data and processes.

- Neural model: the process model is distributed all over the neural network connections. It means that the model is represented by the network itself, consequently it is difficult for the user to understand and manage it.


By stressing the advantages coming out from the combined use of fuzzy logic and neural networks, it is then possible to obtain a breakthrough in the modeling field, stepping up from the use of traditional models to the exploitation of fuzzy logic based ones.

Neuro-Fuzzy System

The Neuro-Fuzzy system that has been implemented represents an approach to the automatic synthesis of processes based on fuzzy logic data structures.

The modelling of a problem in terms of fuzzy logic requires the definition of inferencing rules (conditional sentences in the form "if antecedent then consequent") describing the correlation among the variables, and the tuning of each membership function associated with each fuzzy set.

Our system implements the following two-step procedure:

- Rules selection: identifies the effective number of rules necessary to describe the correlation between input and output variables

- Membership functions tuning: determines the shape of each membership function in order to achieve the best level of approximation.

This procedure has been realised by using two neural networks trained with pairs of input-output values called "patterns".

To perform the rules selection, the use of a fuzzy associative memory (FAM) table has been introduced.

Such a table is built up starting from the number of fuzzy sets fixed by the user for each variable, and represents the collection of all the possible fuzzy rules. The purpose of the network is to choose, among them, those necessary and sufficient to describe the correlation among the variables expressed by the patterns.

To solve this problem we have implemented a clustering technique realised through a two-level neural network with the following structure:

- Input level: it has a number of units equal to the total number of variables and it is activated by using the patterns describing the system;

- Output level: it has a number of units equal to the total number of possible rules, which are activated by using the squared distance between the input vector (the current pattern) and the connection vector entering the unit.

The structure of the network is illustrated in Fig. 2. The net is completely connected. The weights are modified by using an unsupervised learning algorithm based on the competitive learning model. Each weight vector entering an output unit j, (w_1,j, w_2,j, ..., w_m+n,j), represents the position of the j-th centre of the clustering. During the learning phase the following error function is to be minimised:

E = Σ_{t ∈ training set} (1/2) ||input_t - W_centre||²


Fig. 2. The structure of the neural network for clustering

Naming x the current pattern and W_j the j-th centre, the algorithm determines the centre nearest to the input pattern and modifies the corresponding weights according to the gradient descent method as follows:

ΔW_i,winner = -η(-(x_i - W_i,winner)) = η(x_i - W_i,winner)

At the end of the learning phase the centres are compared with the FAM table initially constructed and reclassified as fuzzy rules.
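The winner update above is ordinary competitive learning; a minimal Python sketch with illustrative data dimensions (799 patterns, 21 centres) follows.

import numpy as np

rng = np.random.default_rng(0)
patterns = rng.random((799, 4))         # (input, output) pairs, flattened
centres = rng.random((21, 4))           # one centre per candidate rule
eta = 0.05

for x in patterns:
    winner = np.argmin(np.sum((centres - x) ** 2, axis=1))  # nearest centre
    centres[winner] += eta * (x - centres[winner])          # the delta-W rule above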

Once the fuzzy rules have been determined, the second step of the modelling procedure gets started. The purpose of this second phase is the gradual optimisation of the approximation level, realised through a change in the shape and position of the membership function associated with each fuzzy set.

The structure of the neural network that performs the fuzzy inference, from the activation of the antecedents to the defuzzification of the output value, is shown in Fig. 3. Such a net has four levels:

- input level: it has a number of units equal to the number of input variables.

- L0 level: it represents the fuzzy sets defined for each input variable. Each input variable is associated with a group characterised by the fuzzy sets defined on it.

- L1 level: it is composed of a number of units equal to the number of rules selected in the previous phase.


- L2 level: it is the output level and is composed of a number of units equal to the number of output variables.

Fig. 3. The structure of the neural network that performs the fuzzy inference (input level, L0 level with L0 weights, L1 level, and L2 level with L2 weights)

The connections between the different levels shown in Fig. 3 have the following meanings:

- L0 level: each unit is characterised by a threshold corresponding to the centre of the represented membership function. The connections (L0 weights) between the input level and the L0 level are the parameters that, together with the centre, completely characterise the membership function. These parameters depend on the shape of the adopted function; the possible choices are triangular and gaussian shapes.

- L1 level: the connections (L1 weights) between the L0 level and the L1 level can assume binary values only (0 or 1) and serve to reproduce the selected fuzzy rules.

- L2 level: the values of the connections between the L1 level and the L2 level represent the centres of the membership functions defined for the consequent variables.

The parameters that will be learned during the execution are the L0 weights, the L0 thresholds and the L2 weights.

For a given input value, the neural network performs the fuzzy inference phase. In order to realise it, the following activation functions for the different levels have been defined:


- L0 level: the activation function represents the membership degree of a value in a fuzzy set.

- L1 level: the activation function represents the degree of activation of each rule, and corresponds to the application of one of the following fuzzy intersection operators:

  - the min function, applied to the antecedent membership degrees, with the disadvantage of not being differentiable;

  - the product function, applied to the antecedent membership degrees, with the advantage of being differentiable and the disadvantage of being very expensive during the learning phase.

- L2 level: the activation function for each output unit is a linear combination of the rule activation values. At this level the defuzzified values for the output variables are computed.

Naming b_ij the weight between the i-th unit of the L1 level and the j-th unit of the L2 level, and a the activation vector of the units of level L1, the activation function for the j-th unit of the output level is:

y_j = (Σ_i b_ij a_i) / (Σ_k a_k),    i = 1..r, with r the number of rules
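A minimal forward pass of the four-level net, assuming gaussian memberships at L0, the product operator at L1, and the normalized weighted sum reconstructed above at L2; all shapes and values are illustrative.

import numpy as np

def forward(x, centres, widths, rule_mask, b):
    # x: (n,) inputs; centres, widths: (n, sets) L0 parameters;
    # rule_mask: (rules, n), index of the fuzzy set each rule uses per input;
    # b: (rules,) consequent centres for one output variable.
    mu = np.exp(-(((x[:, None] - centres) / widths) ** 2))        # L0 activations
    a = np.array([np.prod([mu[i, rule_mask[r, i]] for i in range(len(x))])
                  for r in range(rule_mask.shape[0])])            # L1 (product)
    return float(b @ a / (a.sum() + 1e-12))                       # L2 (defuzzified)

x = np.array([0.2, 0.8])
centres = np.array([[0.0, 0.5, 1.0], [0.0, 0.5, 1.0]])
widths = np.full((2, 3), 0.3)
rule_mask = np.array([[0, 1], [2, 2]])   # e.g. rule 1: x1 is low AND x2 is medium
b = np.array([1.0, 3.0])
y = forward(x, centres, widths, rule_mask, b)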

The error function at the output level is:

E = (1/2) ||target - output||²

with target the process value corresponding to the input and output the defuzzified value computed by the network. To modify the parameters connected to the membership function shape in order to minimise the error function, the traditional gradient descent learning algorithm based on the following rule has been applied:

ΔW = -η ∂E/∂W

In the following, the implemented learning algorithm is presented. Define:

δ_j^(2): the backpropagation coefficient from the j-th unit of level L2 to the units of level L1

δ_ij^(0): the backpropagation coefficient from the units of level L1 to the ij-th unit of level L0

the learning algorithm is:


1) learning for b_ij (L2 weights):

Δb_ij = -η δ_j^(2) · a_i / Σ_k a_k

2) learning for w_ij and c_ij (L0 weights and L0 thresholds, respectively).

The values of ∂o_ij/∂c_ij and ∂o_ij/∂w_ij depend on the membership function adopted, as follows:

                 | Isosceles triangle       | Scalene triangle  | Gaussian shape
Function o_ij    | 1 - 2|x_i - c_ij|/w_ij   | w_ij (x_i - c_ij) | exp(-((x_i - c_ij)/w_ij)²)
∂o_ij/∂w_ij      | (1 - o_ij)/w_ij          | x_i - c_ij        | 2((x_i - c_ij)²/w_ij³) o_ij
∂o_ij/∂c_ij      | 2 sign(x_i - c_ij)/w_ij  | -w_ij             | 2((x_i - c_ij)/w_ij²) o_ij
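Using the gaussian column of the table above, a single tuning step for one membership function can be sketched as follows (delta stands for the backpropagated coefficient, eta for the learning rate; names are ours).

import numpy as np

def update_gaussian(c, w, x, delta, eta=0.1):
    o = np.exp(-((x - c) / w) ** 2)
    do_dc = 2.0 * (x - c) / w ** 2 * o        # d o / d c (gaussian column)
    do_dw = 2.0 * (x - c) ** 2 / w ** 3 * o   # d o / d w (gaussian column)
    # gradient descent: move centre and width along the negative gradient
    return c - eta * delta * do_dc, w - eta * delta * do_dw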

Neuro-Fuzzy Module

The Neuro-Fuzzy Module is a graphical interface, running under Windows 3.1, which implements the neural network models described in the previous sections. The purpose of this tool is to provide a user-friendly interface guiding the user through all the steps to obtain a fuzzy system starting from a set of patterns.

This tool provides the capability to communicate with the W.A.R.P. Software Development Tool, a development environment to program the fuzzy digital microcontroller W.A.R.P. For this purpose, an exporter translating a Neuro-Fuzzy project into a W.A.R.P.-SDT project file, containing all the information about the membership function shapes and the inferencing rules, has been developed. This step, which implies a transformation from a continuous model (Neuro-Fuzzy) to a discrete one (W.A.R.P.-SDT), is realised taking care to reproduce the input model in the best way possible. By using the W.A.R.P.-SDT facilities it is then possible to build a fuzzy model in a C or MATLAB environment. Defining the error as the difference between the original and the interpolated function, once the learning phase is over the expert can reduce (and in the optimum case nullify) the error by introducing local rules derived from his experience.

Figure 4 shows the linkage between Neuro-Fuzzy Module and W.A.R.P.-SDT tool.

Fig. 4. The linkage between Neuro-Fuzzy Module and W.A.R.P.-SDT tool (pattern file → Neuro-Fuzzy Module → exporter → W.A.R.P. Software Development Tool → C model / MATLAB model)

Other functionalities are provided in order to allow the simulation of the behaviour of the neural net starting either from a single pattern or from a file of patterns.

Example

In this section a modelling example of a process described by a 4th order function is provided. After the definition of a first fuzzy model of the process, the obtained fuzzy system is corrected by introducing some local rules. The example is concluded by comparing the Neuro-Fuzzy model and the one obtained from the translation to the W.A.R.P.-SDT tool.

The example is based on a process described by the following function:

y[k] = f(y[k-1], y[k-2], u[k-1], u[k-2])    (1)

where k is the discretized time, and u and y are the input and output variables.


f(·) = [e^(-(y²[k-1] + y²[k-2])) + √(u²[k] + u²[k-1])] / (1 + u²[k-1] + u²[k-2])   for k ≤ 400

f(·) = [e^(u[k]) + cos(απ(y²[k] + y²[k-1])) + 2] / (1 + u²[k-1] + u²[k-2])   for 400 < k ≤ 2500

where α = 3.5 and u is defined as follows:

u[k] = sin(2πk/250)   for 0 ≤ k ≤ 700 or 1800 < k ≤ 2500
u[k] = 0.6            for 700 < k ≤ 875
u[k] = 0.4            for 875 < k ≤ 1050
u[k] = -0.2           for 1050 < k ≤ 1400
u[k] = -0.6           for 1400 < k ≤ 1800

Variable | Minimum   | Maximum  | Range    | Fuzzy sets
1        | 0.549705  | 4.011129 | 3.461424 | 5
2        | 0.001000  | 4.010431 | 4.009431 | 5
3        | -0.999952 | 0.999985 | 1.999937 | 3
4        | -0.999993 | 0.999966 | 1.999959 | 3

Both neural nets have been trained with a file of 799 patterns obtained by sampling function (1).
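As an illustration of how such a pattern file might be produced, here is a minimal sketch that simulates process (1) with the input signal u[k] as reconstructed above. The branch forms and the cos argument are our reading of the damaged scan, not a verified transcription, and the paper samples only 799 of the generated points.

```python
import math

ALPHA = 3.5

def u_of_k(k):
    """Input signal u[k], as reconstructed above."""
    if k <= 700 or k > 1800:
        return math.sin(2 * math.pi * k / 250)
    if k <= 875:
        return 0.6
    if k <= 1050:
        return 0.4
    if k <= 1400:
        return -0.2
    return -0.6

def step(y1, y2, u1, u2, k):
    """One step of process (1); the branch split at k = 400 and the cos
    argument over lagged outputs are assumptions from the garbled scan."""
    denom = 1.0 + u1 ** 2 + u2 ** 2
    if k <= 400:
        return (math.exp(-y1 ** 2) + y2 ** 2
                + math.sqrt(u_of_k(k) ** 2 + u1 ** 2)) / denom
    return (math.exp(u_of_k(k))
            + math.cos(ALPHA * math.pi * (y1 ** 2 + y2 ** 2)) + 2.0) / denom

patterns, y1, y2 = [], 0.0, 0.0
for k in range(2, 2501):
    u1, u2 = u_of_k(k - 1), u_of_k(k - 2)
    y = step(y1, y2, u1, u2, k)
    patterns.append(((y1, y2, u1, u2), y))   # antecedents -> target y[k]
    y1, y2 = y, y1
```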

1. Rule selection: During this phase 21 rules were selected out of a maximum of 225 possible rules.

2. Membership function tuning: This phase required 230 iterations to determine the membership function shapes, yielding a mean square error of 0.007.

Figure 5 shows the original and the interpolated functions. The abscissa represents the i-th pattern and the ordinate represents y[i].

It has been possible to correct some errors near the 20th and the 102nd patterns by adding two local rules. For this purpose, Neuro-Fuzzy Module automatically built those local rules and their related fuzzy sets, requiring as parameters the correct pattern and the fuzzy set base width. The latter parameter allows the user to control the activation zone of the local rule.

          Antecedent Values                                Output
Rule 1    1.430300   1.430800   0.998000    0.993200      1.429000
Rule 2    1.430400   1.429200   -0.999900   -0.999500     1.430800
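For illustration, a sketch of how such a local rule might be assembled from its two parameters; the helper name and the base width value are hypothetical, not taken from the paper.

```python
def make_local_rule(pattern, target, base_width):
    """Build a local rule from a correct pattern: one triangular fuzzy set
    per antecedent, centred on the pattern value, with the given base width
    controlling the activation zone of the rule."""
    antecedents = [(x - base_width / 2.0, x, x + base_width / 2.0)
                   for x in pattern]        # (left foot, centre, right foot)
    return {"antecedents": antecedents, "output": target}

# e.g. Rule 1 from the table above, with a hypothetical base width:
rule1 = make_local_rule((1.4303, 1.4308, 0.998, 0.9932), 1.429, base_width=0.05)
```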


It is important to underline that precision is maintained during the translation step from the Neuro-Fuzzy Module to the W.A.R.P.-SDT tool. This second tool allows antecedent variables with 7-bit resolution and consequent variables with 10-bit resolution to be defined.


Fig. 5. The modelling example of a process described by a 4th-order function (-- original function, o interpolated function)

4. Conclusion

In this paper two examples of hardware and software architectures for soft computing have been discussed. The first one (the W.A.R.P. Coprocessor) is able to compute a large number of fuzzy if-then rules in the microsecond range for real-time applications.

The second one, closely connected to the W.A.R.P. Coprocessor, is a two-step neuro-fuzzy system based on the FAM approach and on membership function shape tuning, which allows a process to be modelled. The main features of the proposed tool can be summarised as follows:

1. Simple neural network structures, thanks to their low number of levels and units;

2. Process modelling based on rules and membership functions, which makes it easy to handle the intrinsic imprecision of the physical data used to train the network;

3. Fast convergence to an acceptable level of approximation.


Acknowledgment. The author would like to thank Dr. M. Lo Presti (Fuzzy Logic Group) and Dr. N. Serina (CASA Group) from SGS-THOMSON for their fundamental contribution.



Fuzzy Logic Control for Design and Control of Manufacturing Systems

Barış Tan

Graduate School of Business, Koç University, 80860 Istinye-Istanbul, Turkey

Abstract. In this study, recent applications of fuzzy logic in design and control of manufacturing systems are first briefly reviewed. Fuzzy logic control is presented as a technique to implement real-time control algorithms that can be embedded in workstations in the framework of intelligent hierarchical control of manufacturing systems. Then two applications of fuzzy logic in design and control of manufacturing systems are presented. The first application is a fuzzy decomposition method for performance evaluation of manufacturing systems. A model of a workcell with rework is used to describe the fuzzy decomposition method. Fuzzy logic is used as a gain scheduler in the search algorithm of the decomposition method. It is observed that fuzzy logic improves the convergence rate of the decomposition method substantially. The second application is a fuzzy flow controller that adjusts the production rate of a failure-prone manufacturing system in order to minimize total inventory carrying and backorder costs while satisfying uncertain demand. As an extension of this model, an adaptive fuzzy flow rate controller is discussed.

Keywords. Fuzzy logic control, design and control of manufacturing systems, hierarchical control, decomposition methods, queuing networks subject to blocking, real-time scheduling

1. Introduction

Fuzzy logic makes it very easy to encode expert knowledge and simulate human thinking. By using fewer rules and simpler programming, it is possible to introduce new products to market in a very short time period. At the same time, using embedded microcontrollers with fuzzy logic lowers the cost of introducing an intelligent product to market. Manufacturers also claim that the benefits for the customers are substantial, since such products are easier to program, give better results, and are more economical to run. Fuzzy logic is increasingly being used in a variety of consumer products, in the automotive industry, and to control complex industrial processes. For a comprehensive sample of applications in these areas, the reader is referred to (Terano et al. 1994).



Efficiency, effectiveness, and productivity, which translate into industrial competitiveness, have always been the main concern of manufacturing companies. Although fuzzy logic has been used in end products for over a decade, applications of fuzzy logic in manufacturing systems design and control are limited. Design and control of manufacturing systems have been the subject of industrial engineering for decades. Today, design and control of manufacturing systems are getting more complicated due to challenging market requirements: rapid response, high product variability, small batch sizes, and uncertain, highly variable demand. To cope with these requirements effectively, more and more artificial intelligence methods are being incorporated into the traditional techniques used in design and control of manufacturing systems. Future trends in design and control of manufacturing systems are in the areas of hierarchical control, real-time control, autonomous control, and intelligent manufacturing.

Over the last few decades, the single most important factor that favorably affected efficiency, effectiveness, and productivity is arguably the computerization of almost all aspects of industrial activity. Parallel to this development, real-time hierarchical control of production systems has been the subject of numerous studies in industrial engineering in recent years. At the same time, the cost of implementing a control policy in terms of fuzzy logic has been decreasing steadily. Today, it is possible to add intelligence to a workstation by using an embedded controller at a very low cost. Fuzzy-logic control can be considered as a general-purpose programming language for embedded control (Yeralan and Tan 1994). Thus it is possible to develop operational controllers for manufacturing systems very rapidly and at a very low cost by using embedded fuzzy logic control. Huang and Zhang (1995) review neural-expert hybrid approaches for intelligent manufacturing. They state that the learning capability of neural networks, combined with the ease of capturing expert knowledge in rule-based systems, is essential for intelligent manufacturing systems.

The organization of this paper is as follows: first, recent applications of fuzzy logic in design and control of manufacturing systems are briefly reviewed in Section 2. Fuzzy logic control is presented as a technique to implement real-time control algorithms that can be embedded in workstations in the framework of intelligent hierarchical control of manufacturing systems. Intelligent hierarchical control is also discussed in Section 2. Then two applications of fuzzy logic in design and control of manufacturing systems are presented. The first application, presented in Section 3, is a fuzzy decomposition method for performance evaluation of manufacturing systems. A model of a workcell with rework is used to describe the fuzzy decomposition method. The second application, presented in Section 4, is a fuzzy flow controller that adjusts the production rate of a failure-prone manufacturing system in order to minimize total inventory carrying and backorder costs while satisfying uncertain demand. Concluding remarks are given in Section 5.


2. Hierarchical Control of a Manufacturing System

A flexible manufacturing system consists of a set of workstations capable of performing a number of different operations and interconnected by a transportation mechanism. Since most manufacturing systems are large and complex, it is desirable to divide the control into a hierarchy consisting of a number of different levels. Each level of the hierarchy is characterized by the length of the planning horizon and the kind of data required for the decision-making process.

Kimenia and Gershwin (1983) propose a four-level hierarchy: the flow control level, the routing level, the sequence controller, and the generation of decision tables, where initial planning is made.

Hintz and Zimmermann (1989) also consider a hybrid system for production planning and control in Flexible Manufacturing Systems. They decompose the planning process into master scheduling, tool loading, release scheduling, and machine scheduling subproblems. They use fuzzy linear programming to solve the master scheduling problem and approximate reasoning for the release scheduling and machine scheduling problems. A heuristic is used for tool loading.

Similarly, Bai and Gershwin (1995) propose a three-level hierarchical controller to regulate production to compensate for workstation failures and changes in part requirements. At the first level the desirable buffer sizes and the target production levels for each operation are determined. At the middle level a production flow rate controller recalculates the production rates whenever a machine fails or is starved or blocked. The loading times for individual parts are determined at the bottom of the hierarchy.

At each level, fuzzy logic control can be used effectively in order to reduce the time required to develop an operational system. Türkşen (1988) shows the applicability of fuzzy reasoning in production planning. Furthermore, fuzzy set theory has been incorporated into traditional tools of operational research used in design and control of manufacturing systems. For a review of fuzzy set models in operations research, including logistics, transportation problems, fuzzy linear programming, control of flexible manufacturing systems, scheduling, inventory control, and location, the reader is referred to Zimmermann (1990).

At the first level of the control hierarchy, a method is needed to evaluate the performance of a manufacturing system efficiently. Traditionally, queueing networks and Markov models are used in the performance evaluation of manufacturing systems (Dallery and Gershwin 1992). Fuzzy logic has recently been incorporated into these existing techniques. Fuzzy queues have been used in the performance evaluation of computer networks and manufacturing systems (Negi and Lee 1992; Jo et al. 1994). In the first part of this study, a fuzzy decomposition method for performance evaluation of manufacturing systems is presented in detail. In this approximation method, fuzzy logic control is used as a gain scheduler in the search algorithm.

In hierarchical control of manufacturing systems, optimal control methods are used to determine the control strategy at the middle level. Namely, the problem of controlling the production rates of workstations in a failure-prone manufacturing system in order to optimize an objective function, such as


minimizing the discounted inventory costs, is studied by using optimal control methods. Deriving optimal strategies for general manufacturing systems by using optimal control techniques is very complicated. In the literature, there are only a few studies that present optimal strategies for rather restricted systems (for example, Akella and Kumar, 1986). Thus fuzzy logic control is an alternative for developing operational control strategies for general manufacturing systems. In the second part of this study, the use of fuzzy logic control as a flow controller in manufacturing systems is discussed.

Ben-Arieh and Lee (1995) present a real-time fuzzy logic controller for part routing in modern computerized manufacturing systems that produce many different kinds of products in small batches. The fuzzy logic controller uses processing time, slack time, queue length, and machine breakdown rate, described in linguistic terms, as inputs to determine a selectibility factor that is used in routing. They found that their results are comparable to, and sometimes better than, those of some other heuristics used in routing and scheduling.

Bugnon et al. (1995) present a neuro-fuzzy controller for real-time scheduling. The controller adapts dynamically to perturbations in the system. They modify techniques developed for task allocation on computers and apply them to job shop scheduling.

3. Fuzzy Logic Control for Design of Multistation Production Systems

A multistation production system is basically a queueing network consisting of workstations and finite-capacity buffers. Two important performance measures used in the design of manufacturing systems are the production rate, that is, the number of parts produced per unit time in the long run, and the average work-in-progress inventory levels. Mathematical models are developed to determine these performance measures as functions of the system parameters. Exact analysis of queueing network and Markov models of manufacturing systems is very complicated and, in most cases, mathematically intractable. Thus approximate analysis of manufacturing systems is of interest (Dallery and Gershwin 1992). Tan and Yeralan (1996) introduce a decomposition method for multistation production systems. The decomposition method is based on evaluating one-station one-buffer subsystems independently, incorporating the blocking process of the downstream stations and the arrival process from the upstream stations. This decomposition method is applicable to general arrangements of workstations. Tan and Yeralan (1995) use fuzzy logic control to increase the convergence rate of this decomposition method. It is observed that the convergence rate of the fuzzy decomposition method for multistation unbalanced production lines is ten times better than that of the original decomposition method. In this section, we consider a workcell with rework. The decomposition method developed for this system by Tan and Yeralan (1994) is extended by using fuzzy logic control.


3.1 Example of a Workcell with Rework

Consider a workcell consisting of a workstation, a rework station, and finite interstation buffers, fed by a Poisson stream of discrete parts with rate α_w. Figure 3.1 shows such a workcell.

[Figure 3.1 diagram: a workcell comprising a rework station with its rework buffer and a workstation with its work buffer]

Fig. 3.1. A workcell with workstation and rework station

Suppose the parts are delivered by a materials handling system, such as an automated guided vehicle or a recirculating conveyor. Parts arriving to a full buffer are sent back to the input pool. The output of the workstation is inspected. Those parts that fail inspection are routed to be reworked. The probability that a part passes inspection is p and the rejection probability is p̄ = 1 - p. Let there be two buffers, one in front of each station. The reworked parts are fed into the buffer along with incoming parts. Let the workstation and the rework station be subject to breakdown. For station i, α_i, μ_i, φ_i, η_i, γ_i, b_i, β_i, and B_i (i = 1, 2) are the arrival rate, processing rate, failure rate, repair rate, output rate, blocking probability, blocking removal rate, and buffer capacity, respectively. Similarly, γ_W is the production rate of the workcell. The decomposition method uses μ_i, φ_i, η_i, B_i (i = 1, 2) and α_w to determine γ_i, α_i, γ_W, and the average levels of B_1 and B_2.

In the decomposition method, the rate at which the parts are brought to the subsystem, α_i, is controlled to compensate for the losses due to the blocking of downstream subsystems. The blocking probability and the blocking removal rate of the downstream of the workstation are the blocking probability and the blocking removal rate of the rework station, respectively. Similarly, the blocking probability and the blocking removal rate of the downstream of the rework station are those of the workstation. The outputs of each station are calculated by using the method given in (Yeralan and Tan 1996), assuming that the failure, repair, and processing time distributions are


exponential. If items are conserved, the input to the rework station is equal to its output; symbolically, α_2 = γ_2. The output of the rework station is then added to α_w to find the next input to the workstation.

An algorithmic solution procedure is employed to adjust the input parameters of each subsystem based on the output parameters of the previous iteration. The procedure converges to the approximate solution. The control mechanism of α_i should be such that it satisfies the above condition when the system converges. The following control mechanisms for the input rates α_1 and α_2,

α_1^{t+1} = α_w + γ_2^t    (3.1)

α_2^{t+1} = α_2^t + K (γ_1^t - γ_2^t)    (3.2)

satisfy the conditions for convergence, where the superscript t denotes the discrete iteration count, α_w is the input rate to the workcell, and K > 0 is the proportionality constant. Note that when γ_1^t → γ_2^t, α_2^{t+1} → α_2^t and α_1^{t+1} → α_w + γ_2^t.
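A minimal sketch of the resulting iteration follows; station_output(i, alpha) is a hypothetical helper standing in for the subsystem evaluation of Yeralan and Tan (1996).

```python
def decompose(station_output, alpha_w, K=1.0, tol=1e-6, max_iter=1000):
    """Iterate updates (3.1)-(3.2) until the two output rates agree."""
    a1, a2 = alpha_w, 0.0
    for _ in range(max_iter):
        g1 = station_output(1, a1)     # workstation output rate gamma_1
        g2 = station_output(2, a2)     # rework station output rate gamma_2
        if abs(g1 - g2) < tol:
            break
        a1 = alpha_w + g2              # (3.1): reworked parts rejoin the input
        a2 = a2 + K * (g1 - g2)        # (3.2): proportional correction
    return a1, a2, g1
```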

3.2 Representation of the Decomposition Method as a Control Problem

The control of the input arrival rates in the algorithm can be viewed as a control problem. The system to be controlled in this case is not a physical system but an algorithm. The control problem is to adjust the input arrival rates in such a way that the algorithm converges to the solution as fast as possible. Figure 3.2 below depicts the block diagram of this system.

[Figure 3.2 diagram: workstation and rework station calculation blocks feed γ_1 - γ_2 into a gain calculation block that adjusts α_2]

Fig. 3.2. Block diagram of the decomposition technique for a multistation production line

The objective of this study is to replace the gain block in Figure 3.2 by a fuzzy K-value adjuster to improve the convergence rate. Equation (3.2) assumes a linear


relationship between α_2^t and γ_1^t - γ_2^t. Thus the search method given in equation (3.2) is a first-order method. Since the output rate of a station, γ_i, also depends on α_i, the linear dependence assumption is not valid. If K = (d(γ_2 - γ_1)/dα_2)^{-1}, then the search method is equivalent to Newton's method, which is a second-order method. Due to the mathematical intractability of the model, it is not possible to determine d(γ_2 - γ_1)/dα_2 in closed form. Thus the dependence of γ_2 - γ_1 on α_2 is investigated numerically. The parameters of the workcell used in the numerical work are tabulated in Table 3.1.

Table 3.1. Parameters of the workcell used in the numerical work

                              Workstation    Rework Station
processing rate μ             1              0.4
failure rate φ                0.1            0.1
repair rate η                 0.7            1
buffer size M                 5              3
external arrival rate α_w     0.7            -
rejection ratio 1-p           0.1            -

Figures 3.3 and 3.4 show γ_2 - γ_1 and d(γ_2 - γ_1)/dα_2 as functions of α_2, respectively. Figure 3.4 shows that when α_2 becomes very large, d(γ_2 - γ_1)/dα_2 becomes very small. Thus, at each iteration, multiplying the difference by a constant K gives a small change that is added to the previous value. Therefore it takes longer to reach the solution.


Fig. 3.3. γ_2 - γ_1 as a function of α_2



Fig. 3.4. d(γ_2 - γ_1)/dα_2 as a function of α_2
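Since d(γ_2 - γ_1)/dα_2 is not available in closed form, a Newton-like gain can only be approximated numerically, as Fig. 3.4 suggests. A minimal sketch, assuming a hypothetical helper gamma_diff(a2) that evaluates the coupled subsystems and returns γ_2 - γ_1:

```python
def newton_gain(gamma_diff, a2, h=1e-4, k_max=3.0):
    """Newton-like gain K ~ 1 / |d(gamma_2 - gamma_1)/d(alpha_2)|,
    estimated by a one-sided finite difference and clipped at k_max."""
    slope = (gamma_diff(a2 + h) - gamma_diff(a2)) / h
    return k_max if abs(slope) < 1e-9 else min(1.0 / abs(slope), k_max)
```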

In the previous discussion, linguistic terms such as very small and very large for the input rate, and small for the change, were used to describe the response of the decomposition method. As an extension of this argument, fuzzy logic control is proposed as a methodology to change the output term K as a function of the input terms described in linguistic terms. For example, when α_2 becomes very large and the change is very small, K must be set to a large value. Using a fuzzy K-value adjuster removes the difficulties associated with the unavailability of a function for d(γ_2 - γ_1)/dα_2 or with using an approximation of it.

3.3 Fuzzy K-Value Adjuster

The fuzzy K-value adjuster uses the input rate and the difference between the production rates of the workstation and the rework station to determine the gain to be used at the next iteration of the decomposition method. The internal structure of the fuzzy controller, namely the inputs, the output, the membership functions, the inference method, and the defuzzification method, is discussed below.

3.3.1 Inputs to the Fuzzy K-Value Adjuster

Let the absolute deviation between the production rates of the rework station and the workstation, normalized by the production rate of the workstation, be called the error (e), i.e., e = |γ_2 - γ_1|/γ_1. The inputs to the fuzzy K-value adjuster are the error and the input rate to the rework station (α_2). Associated with each input is a linguistic variable.


3.3.2 Membership Grades of the Terms of the Inputs

In this study, the qualitative knowledge gained from the numerical experiments and general knowledge about the operation of production systems are used to determine the membership functions. It is known that in a queueing system, if the input rate is much lower than the stand-alone production rate, then the output rate of the queueing system is approximately equal to the input rate; this is referred to as the light traffic approximation. Similarly, if the input rate is much higher than the stand-alone service rate of the system, then the system is saturated and the output rate is approximately equal to the stand-alone service rate of the station; this is referred to as the heavy traffic approximation. Accordingly, the input rate is assumed to have five terms, Very Low, Low, Medium, High, and Very High, according to the load of the server. In the definitions above it is not certain exactly how much lower or higher the input rate must be in order to use the light- and heavy-traffic approximations. We assume triangular membership functions for the terms of the input rate. The stand-alone production rate determines the time unit used in the model. When the input rate is equal to the stand-alone production rate of the rework station, i.e., when the input rate is one measured in stand-alone production rate time units, it is assumed to be medium with a membership grade of one. The other terms are defined accordingly. Figure 3.5 depicts the membership grades of the terms of the input variable as a function of the input rate.


Fig. 3.5. Terms of the linguistic variable "input rate"


Similarly, the membership grades of the terms of the input variable error are defined by using our judgment and knowledge. The input variable error is assumed to have five terms: Very Low, Low, Medium, High, and Very High. When the error is higher than 5%, that is, when the difference between the production rates of the workstation and the rework station is more than 5% of the production rate of the workstation, it is said to be very high. Similarly, when it is close to zero, it is said


to be very low. The terms in between and the overlaps are defined accordingly. Figure 3.6 depicts the membership grades of the terms of the input variable as a function of the error.


Fig. 3.6. Terms of the linguistic variable "error"

3.3.3 Output Term of the Fuzzy K-Value Adjuster

Numerical experiments show that if a constant K value is to be used in the algorithm, then setting K to one gives good results. It is proven that the decomposition method always converges to the unique solution when K is less than 2 (Tan and Yeralan 1996). The purpose of using a fuzzy K-value adjuster is to increase the convergence rate of the algorithm by increasing the K value when possible. When K is greater than 3, numerical problems arise. Thus the fuzzy K-value adjuster should adjust the K-value between 1 and 3.

To avoid extensive calculations in the defuzzification step for the case when the output also has fuzzy terms, it is assumed that the output term K has singleton outputs Very Low (VL), Low (L), Medium (M), High (H), and Very High (VH). The membership grades associated with the singleton outputs K_VL = 1, K_L = 1.5, K_M = 2, K_H = 2.5, and K_VH = 3 are denoted by m_KVL, m_KL, m_KM, m_KH, and m_KVH, respectively.

3.3.4 Knowledge Base

Once the inputs and the output are established and the terms of the linguistic variables are determined, the control task is written as a set of rules. These rules are written by using our numerical experience and knowledge about the decomposition method. The knowledge base is represented in compact form by the following rule matrix:


input rate:        very low    low         medium      high        very high
error very low     very low    very low    very low    low         low
error low          low         low         medium      medium      medium
error medium       low         medium      medium      high        high
error high         medium      high        high        high        very high
error very high    medium      very high   very high   very high   very high

In the rule matrix given above, the element in the top left-hand corner, for example, states that if the error is very low and the input rate is very low, then K is set to very low.

3.3.5 Generation of the Crisp Output of the Fuzzy K-Value Adjuster

Since the output term has singleton terms, the output value of K is calculated as

K = (K_VL m_KVL + K_L m_KL + K_M m_KM + K_H m_KH + K_VH m_KVH) / (m_KVL + m_KL + m_KM + m_KH + m_KVH)    (3.3)

where the summations are performed over the knowledge base, i.e., the membership grades of the same term of the output variable appearing in different rules of the knowledge base are added.
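Putting Sections 3.3.1-3.3.5 together, a compact sketch of the fuzzy K-value adjuster might look as follows. The term centres are our assumptions read off Figures 3.5 and 3.6 (rate 'medium' peaking at 1 in stand-alone units, error terms spanning 0 to 5%), not values given in the paper.

```python
TERMS = ["VL", "L", "M", "H", "VH"]
K_SINGLETON = {"VL": 1.0, "L": 1.5, "M": 2.0, "H": 2.5, "VH": 3.0}

# Rule matrix from Section 3.3.4: RULES[error_term][rate_column] -> K term.
RULES = {
    "VL": ["VL", "VL", "VL", "L",  "L"],
    "L":  ["L",  "L",  "M",  "M",  "M"],
    "M":  ["L",  "M",  "M",  "H",  "H"],
    "H":  ["M",  "H",  "H",  "H",  "VH"],
    "VH": ["M",  "VH", "VH", "VH", "VH"],
}

RATE_CENTRES  = [0.0, 0.5, 1.0, 1.5, 2.0]          # assumed, see Fig. 3.5
ERROR_CENTRES = [0.0, 0.0125, 0.025, 0.0375, 0.05]  # assumed, see Fig. 3.6

def tri(x, a, b, c):
    """Triangular membership with feet a, c and peak b."""
    return max(0.0, min((x - a) / (b - a), (c - x) / (c - b)))

def grades(x, centres):
    step = centres[1] - centres[0]
    return [tri(x, c - step, c, c + step) for c in centres]

def fuzzy_k(error, rate):
    """Crisp K via singleton (weighted-average) defuzzification, eq. (3.3)."""
    m = {t: 0.0 for t in TERMS}
    e_grades, r_grades = grades(error, ERROR_CENTRES), grades(rate, RATE_CENTRES)
    for i, et in enumerate(TERMS):
        for j, rt in enumerate(TERMS):
            m[RULES[et][j]] += min(e_grades[i], r_grades[j])  # rule strength
    num = sum(m[t] * K_SINGLETON[t] for t in TERMS)
    den = sum(m[t] for t in TERMS)
    return num / den if den > 0 else 1.0

# e.g. fuzzy_k(error=0.02, rate=1.3) returns a gain between 1 and 3
```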

3.4. Numerical Results

The method described here is implemented in software. Extensive numerical experiments show that the typical performance of the fuzzy K-value adjuster is better than using a constant value of K in the decomposition method. Table 3.2 below shows the outputs of the decomposition method together with the simulation results for the workcell given in Table 3.1.

Table 3.2 shows that the relative difference between the simulation and the decomposition method is less than 1%. Furthermore, although the results obtained from the fuzzy K-value adjuster and from using a constant K value are identical, the decomposition method that uses the fuzzy K-value adjuster converges in almost half the time required when a constant K value is used.

Note that it is possible to use different membership functions, or different defuzzification and inference methods. The final result obtained from the fuzzy decomposition method is an approximate value of the production rate of a production system. Using different defuzzification or inference methods does not improve the accuracy of the approximate results but may increase the rate of convergence at


which these results are obtained. Thus, future research will focus on improving the accuracy of the approximate result and then on tuning the fuzzy logic controller to obtain the best convergence rate possible.

Table 3.2. Numerical results for the decomposition of the workcell

                                        Decomposition            Simulation   % Error
                                        constant K    fuzzy K
production rate γ_W                     0.62977       0.62977    0.63376      0.6
expected work buffer level              2.015         2.015      2.020        0.25
expected rework buffer level            0.0135        0.0135     0.0130       3.8
input rate to the rework station α_2    0.0699        0.0699     0.0704       0.78
number of iterations                    22            13

4. Fuzzy Logic Control for Control of Manufacturing Systems

Control of manufacturing systems is a complicated problem due to the lack of comprehensive mathematical models, intractability of the mathematical models, nonlinearities in the system, lack of information, uncertainties, inaccuracies, and delays. Under these conditions, fuzzy logic control, incorporated with other intelligent control methods, is expected to be very effective for developing an operational controller in a very short time. In most cases, the effectiveness of fuzzy logic controllers is found to be comparable to, or even better than, that of traditional control techniques. Furthermore, the cost of implementing these control policies at the workstation level by using embedded controllers is decreasing. This cost, time, and typical-performance triad makes fuzzy logic control an appropriate technique to implement real-time control algorithms that can be embedded in workstations in the framework of intelligent hierarchical control of manufacturing systems.

4.1 Fuzzy Logic Flow Controller in an Unreliable Production Line

In this section, a sample problem of controlling the flow rates of machines in an unreliable production system to minimize the overall inventory carrying cost and backorder cost is studied. A fuzzy logic flow controller is used as a real-time controller to recalculate the production rates of machines, set at a higher level of the control hierarchy, whenever a machine fails, is starved, or is blocked.

For simplicity, we consider a manufacturing system producing a single product. There is a demand rate d for the product. The manufacturing system is unreliable. When the system is up and operating, it can produce at any rate u(t) at time t, up to a maximum production rate r. It is assumed that r > d > 0. Figure 4.1 depicts such a system.


[Figure 4.1 diagram: a manufacturing system producing at flow rate u(t) into a buffer of capacity M, whose inventory level faces demand d]

Fig. 4.1 A manufacturing system with variable flow rate and a buffer

This problem was analyzed using optimal control by Akella and Kumar (1986). They assumed that the transitions between the functional and non-functional states of the manufacturing system are Markovian, and they also assumed that the demand rate d is constant. They then studied the problem as the optimal control of a continuous-time system with jump Markov disturbances, with an infinite-horizon discounted cost criterion. For this model, they derived the optimal policy, which is as follows. Let u*(t) be the optimal flow rate of the manufacturing system at time t. Then

u*(t) = { r    if x(t) < z*
          d    if x(t) = z*
          0    if x(t) > z*    (4.1)

where x(t) is the inventory level at time t and z* is the critical inventory level, determined as a function of the system parameters. Equation (4.1) states that, whenever the manufacturing system is in the functional state, one should produce at the maximum rate r if the inventory level is less than the critical level z*, produce exactly enough to meet demand if the inventory is exactly equal to z*, and stop production if the inventory level is above z*.
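A sketch of this hedging-point policy (the function and parameter names are ours; `up` flags the functional state):

```python
def optimal_flow_rate(x, z_star, r, d, up):
    """Policy (4.1) of Akella and Kumar (1986): produce at full rate
    below z*, track demand exactly at z*, and idle above z*."""
    if not up:           # machine down: no production possible
        return 0.0
    if x < z_star:
        return r         # maximum production rate
    if x == z_star:
        return d         # produce exactly the demand rate
    return 0.0           # inventory above z*: stop production
```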

Although the optimal solution for this simple model can be found, it is not possible to derive optimal policies for general multistation manufacturing systems. In this case, fuzzy logic control can be used to translate the ideas presented in the theoretical model to the more general case. There is a trade-off between the complexity of the mathematical models and the assurance of performance and stability of controllers designed by using traditional mathematical control theory, on the one hand, and the simplicity, realism, and typical-case efficiency of heuristic control techniques on the other. Typical-case efficiency, simplicity, and realism assure the success of fuzzy logic control in practice.


4.2 Fuzzy Flow Rate Controller

Now let us consider an extension of this problem where the demand rate is uncertain. We assume that although the demand rate is uncertain, it is always less than the maximum production rate of the system; note that otherwise it is not possible to sustain positive inventory. The optimal solution of this problem under general assumptions is not known. We present a fuzzy logic flow controller for this problem; it cannot be proven to be optimal, but its typical-case efficiency may assure its success in practice.

4.2.1 Inputs to the Fuzzy Flow Rate Controller

The inputs to the fuzzy flow controller are the demand rate and the inventory level. The demand rate is described by five linguistic terms, very low, low, medium, high, and very high, relative to the maximum production rate r. The inventory level is described by three linguistic terms, low, medium, and high, relative to the buffer capacity M and a certain level z*, originally set to M/2. Figures 4.2 and 4.3 show the membership grades for the terms of demand rate and inventory level, respectively.

4.2.2 Output Term of the Fuzzy Flow Rate Controller

The output is the flow rate of the machine, also described by five linguistic terms, low, low-medium, medium, high-medium, and high, relative to the maximum production rate of the system r. Figure 4.4 depicts the membership functions of the terms of flow rate.

4.2.3 The Knowledge Base

The knowledge base is represented by the following rule matrix:

inventory level:        low         medium    high
demand rate very low    low         low       very low
demand rate low         low         low       low
demand rate medium      medium      medium    low
demand rate high        high        medium    medium
demand rate very high   very high   high      medium

In the rule matrix given above, the element in the lower right cell, for example, states that if the demand rate is very high and the inventory level is high, then the flow rate is set to medium.



Fig. 4.2. Terms of the linguistic variable "demand rate"


Fig. 4.3. Terms of the linguistic variable "inventory level"


Fig. 4.4 Terms of the linguistic variable "flow rate"



4.2.4 Generation of the Crisp Output of the Fuzzy Flow Rate Controller

Since the output of the fuzzy controller is described in fuzzy terms, the terms are first truncated at the minimal membership values for each rule. The truncated terms are then aggregated into a combined fuzzy set. Finally, the output of the fuzzy flow controller is calculated as a crisp value by computing the center of gravity of the resulting fuzzy set.
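A minimal sketch of this min-truncation, max-aggregation, centre-of-gravity scheme follows; the flow-rate term centres and the normalisation to R = 1 are assumptions, not values from the paper. rule_strengths holds one aggregated firing strength per output term, taken from the rule matrix above.

```python
import numpy as np

R = 1.0                                                  # maximum production rate (assumed)
FLOW_CENTRES = [0.0, 0.25 * R, 0.5 * R, 0.75 * R, R]     # term centres (assumed)

def tri(u, a, b, c):
    """Triangular membership evaluated over a numpy grid."""
    return np.maximum(0.0, np.minimum((u - a) / (b - a), (c - u) / (c - b)))

def defuzzify(rule_strengths, n=201):
    """Truncate each output term at its rule strength (min), combine the
    truncated terms by max, and return the centre of gravity."""
    u = np.linspace(0.0, R, n)
    agg = np.zeros_like(u)
    half = FLOW_CENTRES[1] - FLOW_CENTRES[0]
    for centre, strength in zip(FLOW_CENTRES, rule_strengths):
        term = tri(u, centre - half, centre, centre + half)
        agg = np.maximum(agg, np.minimum(term, strength))
    total = agg.sum()
    return float((u * agg).sum() / total) if total > 0 else 0.0
```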

4.3 Preliminary Results

The fuzzy flow rate controller described in this section is implemented in software. The performance of the system is investigated by using simulation. The rules and membership functions are updated manually according to the results of the simulation runs. Note that the most important parameter in the fuzzy flow controller is z*, the critical inventory level. In the first stage of this study, the z* that gives the best result is determined by trial and error. Simulation studies show that the typical performance of the controller is acceptable. Since there is no theoretical result to compare against for this system, in another experiment the demand rate is kept constant and the results obtained from the fuzzy flow controller are compared to those obtained by Akella and Kumar (1986). Preliminary results show that the fuzzy flow controller results are comparable to the optimal ones.

4.4 Extension to an Adaptive Fuzzy Flow Rate Controller

In the second stage of this study, a simulation-based search methodology is used to find the best value of z*. This simulation-based search method reduces the tuning time of the fuzzy controller dramatically. We have started testing an adaptive fuzzy flow rate controller. In this case, simulation runs are used to train a neuro-fuzzy system to update the membership functions. The proposed system is depicted in Figure 4.5.

[Figure 4.5 diagram: demand and the inventory level of the manufacturing system feed a simulation block that updates the controller parameters]

Fig. 4.5. Block diagram of the adaptive fuzzy flow controller
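A minimal sketch of such a simulation-based search over z*, assuming a hypothetical simulate_cost(z) helper that runs the fuzzy controller in simulation and returns the average inventory-carrying plus backorder cost:

```python
def tune_z_star(simulate_cost, M, n_grid=11):
    """Grid search for the critical inventory level z* in [0, M];
    simulate_cost(z) is a hypothetical simulation-based cost evaluator."""
    candidates = [M * i / (n_grid - 1) for i in range(n_grid)]
    return min(candidates, key=simulate_cost)
```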


5. Conclusions

In this study, fuzzy logic control is presented as an effective technique to be used in design and control of manufacturing systems. In particular, fuzzy logic control is very convenient for implementing real-time control algorithms that can be embedded in workstations by using microcontrollers. The reviewed applications and preliminary numerical experiments suggest that fuzzy logic control can be used at all levels of hierarchical control of manufacturing systems. It is expected that incorporating fuzzy logic control and other intelligent techniques into hierarchical control of manufacturing systems will increase the effectiveness, efficiency, and productivity of manufacturing systems, which will then translate into industrial competitiveness.

Elkan (1994) considers fuzzy logic, in the form used in industry, as heuristic control. There is a trade-off between the complexity of the mathematical models and the assurance of performance and stability of controllers designed by using traditional mathematical control theory, on the one hand, and the simplicity, realism, and typical-case efficiency of heuristic control techniques on the other. Typical-case efficiency, simplicity, and realism assure the success of fuzzy logic control in practice. The success of fuzzy logic controllers in industry is due to the rule-based formalism, with numerical factors qualifying rules, and to the ease of building and modifying the knowledge base by using expert knowledge or experience. The continuous nature of the inputs and outputs of controllers makes numerical factors qualifying rules a suitable interface between the environment and the controller. Furthermore, the behavior of the controller can be adjusted accurately by changing the numerical values.

References

Akella, R. and Kumar, P.R. (1986), "Optimal Control of Production Rate in a Failure Prone Manufacturing System," IEEE Transactions on Automatic Control, Vol. AC-31, No. 2, pp. 116-126.

Bai, S.X. and Gershwin, S.B. (1995), "Scheduling Manufacturing Systems with Work-In-Process Inventory Control: Single-Part-Type Systems," IIE Transactions, Vol. 27, pp. 599-617.

Ben-Arieh, D. and Lee, E.S. (1995), "Fuzzy Logic Controller for Part Routing," in: H.R. Parsaei and M. Jamshidi (eds), Design and Implementation of Intelligent Manufacturing Systems: From Expert Systems, Neural Networks, to Fuzzy Logic, Prentice Hall Inc., Englewood Cliffs, NJ, pp. 81-106.

Bugnon, B., Stoffel, K. and Widmer, M. (1995), "FUN: a Dynamic Method for Scheduling Problems," European Journal of Operational Research, Vol. 83, No. 2, pp. 271-282.

Dallery, Y. and Gershwin, S.B. (1992), "Manufacturing Flow Line Systems: A Review of Models and Analytical Results," Queueing Systems Theory and Applications, Special Issue on Queueing Models of Manufacturing Systems, Vol. 12, No. 1-2, December 1992, pp. 3-94.

Elkan, C. (1994), "The Paradoxical Success of Fuzzy Logic," IEEE Expert, Vol. 9, Iss. 4, pp. 3-8.

Hintz, G.W. and Zimmermann, H.-J. (1989), "A Method to Control Flexible Manufacturing Systems," European Journal of Operational Research, pp. 321-334.

Huang, S. and Zhang, H.C. (1995), "Neural-Expert Hybrid Approach for Intelligent Manufacturing: A Survey," Computers in Industry, Vol. 26, No. 2, pp. 107-126.

Jo, J.B., Tsujimura, Y., Gen, M. and Yamazaki, G. (1994), "Failure Analysis of Computer System Based on Fuzzy Queueing Theory," Computers & Industrial Engineering, Vol. 27, Nos. 1-4, pp. 425-428.

Kimenia, J. and Gershwin, S.B. (1983), "An Algorithm for the Computer Control of a Flexible Manufacturing System," IIE Transactions, Vol. 15, No. 4, pp. 353-362.

Moutaz, K. and Booth, D.E. (1995), "Fuzzy Clustering Procedure for Evaluation and Selection of Industrial Robots," Journal of Manufacturing Systems, Vol. 14, No. 4, pp. 244-251.

Negi, D.S. and Lee, E.S. (1992), "Analysis and Simulation of Fuzzy Queues," Fuzzy Sets and Systems, Vol. 46, pp. 321-330.

Tan, B. and Yeralan, S. (1994), "A Decomposition Method for General Queueing Networks Subject to Blocking," in: S. Kuru, M.U. Çağlayan, E. Gelenbe, H.L. Akın, C. Ersoy (eds), Proceedings of the Ninth International Symposium on Computer and Information Sciences, Boğaziçi University, Istanbul, Turkey, pp. 802-810.

Tan, B. and Yeralan, S. (1995), "A Fuzzy Decomposition Method for Multistation Production Systems Subject to Blocking," International Journal of Production Economics, Vol. 42, pp. 245-262.

Tan, B. and Yeralan, S. (1996), "Analysis of Multistation Production Systems with Limited Buffer Capacity, Part II," Mathematical and Computer Modeling, forthcoming.

Terano, T., Asai, K. and Sugeno, M. (1994), (eds, transl. C.G. Aschmann), Applied Fuzzy Systems, AP Professional, Boston.

Türkşen, I.B. (1988), "Approximate Reasoning for Production Planning," Fuzzy Sets and Systems, Vol. 26, pp. 1-15.

Yeralan, S. and Tan, B. (1995), "Fuzzy Logic Control as an Industrial Control Language for Embedded Controllers," in: H.R. Parsaei and M. Jamshidi (eds), Design and Implementation of Intelligent Manufacturing Systems: From Expert Systems, Neural Networks, to Fuzzy Logic, Prentice Hall Inc., Englewood Cliffs, NJ, pp. 107-140.

Yeralan, S. and Tan, B. (1996), "Analysis of Multistation Production Systems with Limited Buffer Capacity, Part I," Mathematical and Computer Modeling, forthcoming.

Zimmermann, H.-J. (1990), Fuzzy Set Theory and its Applications, Kluwer Academic Publishing, Dordrecht, Netherlands.


Applications Of Intelligent Multiobjective Fuzzy Decision Making

Enrique H. Ruspini

Artificial Intelligence Center, SRI International, Menlo Park, CA 94025, USA [email protected]

Abstract. We discuss the major characteristics of two classes of fuzzy-logic techniques for the planning and control of systems operating in highly uncertain environments. These applications are characterized by strong requirements for robust behavior and for reactive response to unexpected circumstances. These requirements demand that the decision/control policies be capable of attaining, to the highest possible degree, a number of purposive and reactive goals. Fuzzy logic is an attractive approach to this type of question because of its ability to combine numerical treatments of decision-making problems, its reliance on artificial-intelligence techniques for the context-dependent activation of control rules, and its conceptual relations to analogical-reasoning methods based on notions of similarity and resemblance.

Techniques in the first class - developed in the context of autonomous robot applications - are based on hierarchical supervisory approaches that divide decision/control responsibilities between low-level controllers - concerned with the attainment of specific goals - and high-level supervisors - deliberating about context-dependent goal attainability. Methods in the second class - developed for applications with stringent real-time requirements - are based on an axiomatic approach to the formal representation of knowledge about the relative importance of goals in various operational contexts.

1. Introduction

In this paper we discuss the basic principles underlying the successful application of fuzzy logic to the planning and control of complex systems operating under conditions of imprecision and uncertainty. The regulation of these systems poses particular problems that go well beyond those encountered in other control applications. Of particular importance among these questions are issues that stem from the need to react in a robust and smooth fashion to a wide variety of unexpected circumstances while attempting to attain multiple, conflicting objectives. Controllers regulating this type of system must, in addition to their ability to deal with explicit purposive goals, be able to deliberate about the attainability of goals, sense the need to respond to unexpected circumstances, and



choose courses of action that address such eventualities in the best possible manner.

We will discuss two applications of fuzzy-logic techniques to the reactive control of systems operating in complex, unstructured, uncertain environments. The first of these applications concerns the planning and control of an autonomous robot, while the second involves the real-time control of the power train of a hybrid electric vehicle (HEV). We have described details of these applications elsewhere [2, 6]. In this paper, however, our objective is to focus on the essential conceptual basis of the approach in order to squarely address the nature of the advantages provided by the methodology.

These advantages may be succinctly summarized as being related to the ability to generalize the rule-based approaches of artificial intelligence and to allow for the numerical combination of various control modes.

The ability to generalize rule-based approaches allows explicit statement of control actions and their scope of applicability. Furthermore, controllers may be readily modified and adjusted - a task that is considerably simplified by the ability to explain the rationale leading to particular decisions. These sound logical foundations are complemented by the ability to employ numerical methods to combine, to different degrees, the characteristics of multiple regulation modes to derive a combined strategy based on judicious tradeoffs between multiple conflicting objectives.

Beyond these general considerations, there are other, application-specific, advantages that, in our experience, suggest the applicability of fuzzy-logic techniques to the problems discussed in this paper. We start our discussion of these advantages in Section 2 by reviewing the general characteristics of the problems involved in reactive control of systems under conditions of imprecision and uncertainty. Section 3 is devoted to the discussion of the rule-based approach employed in our autonomous robot application, while Section 4 presents an axiomatic approach to the specification and synthesis of real-time controllers.

2. Control of Complex Systems Under Uncertainty

In this work we are concerned with the planning and control of the activities of complex systems operating in unstructured, i.e., highly dynamic and uncertain, environments. These systems have the following broad characteristics:

1. The systems are complex and difficult to model with precision. Even when adequate models are available - as is the case with our application to the control of the HEV power train - the complexity of these formal structures makes them unsuitable for treatment with existing analytical techniques.

2. The regulation of the system seeks to attain more than one objective. Often these multiple objectives are partially inconsistent with each other or can only be attained to limited degrees.


3. In addition to explicit control objectives, there exist certain implicit survivability, smoothness, robustness, and stability objectives that must also be addressed by the controller.

4. The system must, in particular, be able to sense and react to various unexpected circumstances while still trying to attain, to the best possible degree, explicit purposive objectives. In the applications discussed in this paper these unexpected circumstances range from unpredictable variations on system demand - exemplified by the driver load on an automobile power train - to unforeseen external events - such as the appearance of obstacles in the trajectory of an autonomous robot.

5. System goals, whether purposive or reactive, change dynamically as circumstances evolve, changing their relative priority and necessity.

Clearly, these inconvenient characteristics preclude the utilization of a controller without sufficient flexibility since changing conditions will soon render it useless. Any successful control system must, therefore, be able to consider a wide scope of environmental circumstances choosing control decisions that fully consider the situational context and the nature of the goals that are sought and achievable.

Fuzzy-logic techniques are a particularly attractive technological avenue for the treatment of this class of problems. The fundamental relation between utilitarian notions and fuzzy logic [1] facilitates the representation, selective relaxation, modification, and tradeoff of elastic goals and constraints employing a sound logical scheme. This reliance on logical methods also permits the production of explanations of the rationale leading to particular planning and control decisions. Furthermore, fuzzy-control techniques permit the explicit specification - usually in the form of inferential rules - of conditions for the context-dependent modification of control policies and of their associated actions (also specified in an elastic fashion).

In our studies we have also sought to gain a better understanding of the conceptual foundations of fuzzy-logic approaches as part of a broader effort to move the technology from heuristic procedures to analytical normative techniques.

The control and planning systems discussed in the following sections are based on supervisory-control principles, with controllers at one level mediating and regulating the actions of controllers at the immediately lower level. We first discuss metalevel control techniques developed for the regulation of autonomous robots, turning our attention later to axiomatic methods for real-time control of automotive power trains.

3. Metalevel Control and Analysis

Our approach to the planning and control of intelligent autonomous robots is based on the mediation, at the supervisory level, of objective-specific controllers. These low-level regulators - each seeking an individual objective or subset of objectives


- are activated and regulated - by higher-level controllers - in response to user-specific demands (purposive control) or to unexpected operational circumstances (reactive control). These controllers act as autonomous reasoning agents, measuring the utility [1], or desirability, of potential control actions from the viewpoint of specific goals. Desirability functions corresponding to individual goals may be combined using the logical operators of fuzzy logic, which are generalizations of their classical counterparts, to describe complex goal statements such as "attain goal G1 and, if goal G2 is attained, then attain goal G3."
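As a minimal illustration of this blending idea (all names here are ours, not taken from the paper), per-goal desirability functions can be combined with the fuzzy AND and the best-supported action selected:

```python
def best_action(actions, desirabilities):
    """Combine per-goal desirability functions with the fuzzy AND (min)
    and choose the action of maximal combined desirability.
    `desirabilities` maps a goal name to a function action -> [0, 1]."""
    def combined(a):
        return min(d(a) for d in desirabilities.values())
    return max(actions, key=combined)

# A conditional goal like "if G2 is attained, then attain G3" can enter the
# combination through a fuzzy implication, e.g. max(1 - d_G2(a), d_G3(a)).
```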

While we have relied, in our robotics application, on fuzzy-logic implementations of the low-level controllers, this methodology is particularly useful for implementing higher-level supervisory regulators. These modules apply context-dependent metarules to combine the desirability functions produced by the low-level controllers into a global desirability function. These metarules may also be thought of as specifications of the procedures that, on the basis of the current situation, weigh the relative importance of purposive and reactive objectives, trade off the demands imposed on the system by multiple goals, and, on the basis of these considerations, recommend a specific control action.

The context-dependent nature of these deliberations is exemplified by the operation of the controller when the robot approaches an obstacle. While far from the obstacle, low-level controllers seeking purposive goals (e.g., "reach the end of the hallway") essentially determine the actions of the robot. As the distance to the obstacle becomes smaller, reactive controllers - specially devised to avoid sensed obstacles - are given a priority that reflects the situation-dependent importance of their objective.

The theoretical bases of our approach, its reliance on the notion of behavior blending, and experimental results of its application, have been discussed in detail elsewhere [5]. In closing this section, we may summarize the salient conceptual bases of our methodology as its reliance on a sound approach permitting the formal characterization of the concepts of goal, goal attainability, goal satisfiability, and trajectory admissibility; the notion of contextual restriction of behaviors; and the ability to determine properties of the combined behavior from knowledge of the properties of the behaviors being combined.

4. Axiom-Based Integration of Control Goals and Constraints

The theoretical underpinnings for the axiomatic derivation of measures of context­dependent adequacy of control actions were first formulated by Ruspini [4] in the context of problems involving hierarchically organized objectives. In this section, we show the application of the basic notion behind these developments - the derivation of the most general fuzzy predicate satisfying certain axioms that formally represent control objectives, their relations, and importance - to a problem arising from the regulation of HEV power trains.

Formal statements describing preferred courses of action in a given context, such as

"Whenever the state x is in C (context), if goal G1 is not attainable, then goal G2 should be sought instead,"

are formally represented as axioms that, as a set, characterize the acceptability or adequacy of individual control actions.

Once these restrictions are formally transcribed as first-order logical formulas, the result is, in general, a set of fixed-point expressions that limit the scope of the predicates Adequate(x,u) that might be considered to determine the adequacy of control u in situation x. Once translated - employing fuzzy-logic operations - into their multivalued-logic counterparts, these fixed-point expressions become implicit numerical equations constraining the values of a global numerical utility measure of control adequacy. It is often possible - as exemplified by our application to HEV control - to determine, in closed form, the largest of these measures, i.e., the most general definition of integrated, relative control adequacy.
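For reference (a standard definition we assume here, since this section does not spell out the exact convention), the pseudoinverse ⊘ of a triangular norm ⊗ used in such translations is obtained by residuation:

\[
a \oslash b \;=\; \sup\{\, c \in [0,1] \;:\; b \otimes c \le a \,\},
\qquad\text{so that}\qquad
b \otimes c \le a \;\iff\; c \le a \oslash b .
\]

This equivalence is what turns translated implications into inequalities on the adequacy measure below.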

In our HEV application, we considered three low-level, independent objectives - G1, G2, and G3 - acting as multivalued logical constraints (i.e., possibility measures) on possible control actions. These elastic constraints were formally expressed in terms of three functions - G1, G2, and G3 - defined over the Cartesian product X × X × U of the state and control spaces, describing how well the control u promotes each of the goals when it steers the system state from x to y during a single control cycle. This representation essentially embodies the assumption, common in fuzzy-control applications [7], that the adequacy of control actions may be expressed as a combination of simple one-step, purpose-specific, look-ahead strategies.

In our HEV problem, the explicit overall control goal was to achieve both G1 and G2. If this objective proved to be unattainable, however, an alternative goal was defined as the conjunction of G1 and G3.

The control-synthesis problem was then defined as the determination of an overall desirability function A, defined in the control space U for each pair (x,y) in X × X, having as values fuzzy subsets of the real line (i.e., a generalized fuzzy set in U). When the current and future states x and y, respectively, are fixed, this function A assigns a fuzzy number - measuring the approximate adequacy of u - to each potential control action u.

The formal definition of this problem started with the informal expression of the overall control objectives as

1. The principal goal is to attain E1, the conjunction of G1 and G2.

2. If this is unfeasible, then the alternative goal is to reach E2, the conjunction of G1 and G3.


These informal statements were then formalized as fixed-point first-order expressions constraining the nature of the "acceptable relation" A between states and controls:

1. E1 ⊆ A, i.e., all controls consistent with E1 are acceptable.

2. A ⊆ E1 ∨ E2, i.e., all admissible controls achieve either E1 or E2.

3. If there exists a control u that is consistent with E1, then all controls u' that are admissible and consistent with E2 are also consistent with E1. In other words, E1 is preferable to E2: any control u satisfying E1 is preferable to all controls u' that satisfy E2. This condition may be expressed as

∀u, E1(u) → { ∀u', E2(u') ∧ A(u') → E1(u') },

where we have assumed, for simplicity, that x and y are fixed, denoting E1 and E2 as functions of u only.

These fixed-point logical equations were then reformulated as multivalued-logic expressions by replacing logical operators with their fuzzy-logic counterparts. Conjunctions ∧ were replaced by the triangular norm ⊗, disjunctions ∨ by the triangular conorm ⊕, and implication operators → by the pseudoinverse ⊘ of ⊗. Conventional predicates representing low-level and overall goals were replaced by possibility measures on their corresponding domains. We further simplified the translated system by noting that requiring that p(x,y,u) → q(x,y,u), for arbitrary predicates p and q, is equivalent to stating that p ≤ q. The third axiom, for example, was translated to

E1(u) ⊗ E2(u') ⊗ A(u') ≤ E1(u'),

which is valid for all u and u'. Solving the resulting set of inequalities results in the expression

A(u) = min{ (E1(u) ⊘ Ē1) ⊘ E2(u), E1(u) ⊕ E2(u) },

where

Ē1 = sup_u E1(u),

which defines the largest admissible set A that is consistent with the axiomatic restrictions. The ability to express, in closed form, a measure of global adequacy greatly reduces the computational effort associated with the determination of the relative desirability of potential control actions. Furthermore, the availability of such an expression facilitates analysis of the properties of the controlled system.


Our efforts were also considerably simplified by the availability of a sophisticated simulation model developed at the Research Laboratories of the Ford Motor Company [3]. While too complex to support the analysis typically involved in control synthesis, this model nonetheless accurately predicts the next state y, given the current state x and the control u. This predictive ability permits us to simplify the definition of the functions Gi, i = 1, 2, 3, as functions of the current state x and the control u.

References

1. Bellman R.E. and Zadeh L.A.: Decision-making in a fuzzy environment. Management Science, 17: B141-B164, 1970.

2. Berenji H. R. and Ruspini E. H.: Automated Controller Elicitation and Refinement for Power Trains of Hybrid Electric Vehicles. Proceedings of the 4th IEEE Conference on Control Applications, Albany, New York, 329-334, September 1995.

3. Powell B.K. and Pilutti T.E.: A Range Extender Hybrid Electric Vehicle Dynamic Model. In: Proceedings of the 33rd IEEE International Conference on Decision and Control, Lake Buena Vista, Florida, 1994.

4. Ruspini E. H.: On truth, similarity, and utility. Proceedings of the 1991 Conference on Intelligent Fuzzy Engineering Systems (IFES-91), Yokohama, Japan, 1991.

5. Saffiotti A., Konolige K., and Ruspini E.H.: A Multivalued Logic Approach to Integrating Planning and Control. Artificial Intelligence, 76, 481-526, 1995.

6. Saffiotti A., Ruspini E.H., and Konolige K.: Blending Reactivity and Goal-directedness in a Fuzzy Controller. Proceedings of the Second IEEE International Conference on Fuzzy Systems, San Francisco, California, 134-139, 1993.

7. Yasunobu S., Miyamoto S., and Ihara H.: Fuzzy Control for Automatic Train System Operation. In: Proceedings of the 4th IFAC/IFIP/IFORS International Conference on Control in Transportation Systems, Baden-Baden, Germany, pp. 33-39.


A Product Life Cycle Information Management System Infrastructure with CAD/CAE/CAM, Task Automation, and Intelligent Support Capabilities

Harold P. Frisch

NASA/Goddard Space Flight Center, Greenbelt, MD 20771, USA

Abstract. NASA is not unique in its quest for a product development cycle that is better, faster, and cheaper. Major advances in technical information management will be required to achieve significant and obvious process improvement goals. A vision of order for the associated systems of unstructured and unconnected files and databases is the first step towards organization. This is provided by examining the basic nature of technical information, item relationships, and the change and knowledge processing demands to be placed on any management system that supports all aspects of data representation and exchange during the product's full propose, design, develop and deploy life cycle. An infrastructure that partitions product technical information relative to the perspectives of creation time phase, type and the cause of change provides sufficient structure. This enables maximal use of existing CAD/CAE/CAM/... software tool systems and digital library data mining capabilities. Introducing the concept of packaging technical information in a machine-interpretable manner, at key life cycle deliverable and product review milestone points, provides the fastener needed for the attachment of the relevant soft computing and intelligent support capabilities discussed at the NATO Advanced Study Institute on Soft Computing and reported elsewhere within this volume. It also provides the basis upon which task automation capabilities can evolve.

1. Partitioning Technical Information

Technical information evolves in 3 stages:

• Computation with metaphors: This is the product creation process. Within an organization it takes place within the domain of social communication. In reference [1] evidence is presented to suggest that both communication and understanding in a social environment are carried out via metaphors. If these arguments are accepted, it follows that the product creation process is effectively a metaphorical feedback process that stabilizes around a fuzzy word definition of the product.



• Computation with words: This is rarely done in a strict mathematical sense. However, it is argued within the conclusions of Zadeh's 1996 paper, reference [9], that "computing with words is likely to emerge as a major field in its own right. ... there is much to be gained by exploiting the tolerance for imprecision." To compute with words, selected design parameters are assigned linguistic rather than numeric measures. These fuzzy word descriptors form the natural connective linkage between metaphoric and numeric product data representations. Computing with words within this interface domain enables imprecision, and the tolerance for imprecision, to be exploited. The associated first-order sensitivity and design trade-off studies accomplished with fuzzy words provide deep design insight and have the potential of providing early design value checks.

• Computation with numbers: This spans the complete, business-as-usual, womb-to-tomb spectrum of CAx1 software tool capabilities.

A vision of order within the complexity of a universe of discourse is the essential first step toward the development of an infrastructure. The most natural approach to ordering life cycle technical information is to partition it relative to the perspectives of time and type.

• Reference [5] details the NASA spacecraft system engineering process. By placing this work in a more generic setting, it is reasonable to suggest that a product's life cycle can be partitioned into 4 time phases:

1. Propose the product: Develop a product proposal which maximizes the ability to satisfy a set of broadly stated mission goals while minimizing life cycle system costs.

2. Design the product: Define the product's mission and its operational support system; establish and refine a baseline design.

3. Develop the product: Establish final design, fabricate subsystems, integrate them into the system, test, validate, verify and establish mission support infrastructure.

4. Deploy the product: Prepare and deploy the product, verify its operation, carry out mission objectives and dispose of product at mission end of life.

• Technical information may be partitioned into 4 type categories:

1. Unstructured collections of reports. These may be formal or informal. They are essentially text but contain a generous mix of graphics, tables, figures and equations.

2. Drawings and schematics from many CAD software systems.

3. Input/output data files from many CAE and CAM software systems.

4. Technical data packages: These are focused collections of structured technical information. For example:

1 CAx = CAD/CAE/CAM/...

Page 532: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications

523

• Specifications & requirements for product and subsystems,

• Plans & procedures for product and subsystems,

• Baseline system & subsystem descriptions,

• System & subsystem reviews,

• Test plans, procedures and results,

• Subsystem interface integration standards,

• Interface control documents,

• Subsystem function modeling reports.

Unstructured reports are best collected within digital libraries which have extensive data mining capability; see reference [4]. These enable complex queries to be launched, with query results returned to the user as images of the identified pages on their desktop computer screen. Cut-and-paste tools enable timely information reuse without transcription error. Query tools which simply do keyword searches and return a list of reports that contain the keywords are totally inadequate; this is not data mining.

CAx drawings, schematics, and input/output data files come from many different software systems. These systems usually provide an extensive range of options for viewing and databasing. Furthermore, user groups usually have a full range of legacy and proprietary post-processing, viewing, databasing and data reformatting tools for moving detailed modeling data between major CAx software systems2,3. Supporting the full range of associated data representation and exchange problems are several STEP4 standard projects in America, Europe and Asia5,6,7,8,9.

The author strongly supports these efforts to enhance data communication between software systems at the detailed modeling level. However, it is the author's opinion that managing the contents of technical data packages, with their formally defined views of system, subsystem, interface and function, is the true door to major life cycle cost saving potential. Task automation, software reuse, shortening and focusing review meetings, corporate knowledge capture, infusion

2 http://skipper2.mar.external.lmco.com/save/index.html - SAVE (Simulation Assessment Validation Environment, at Lockheed Martin)
3 http://www.ccad.uiowa.edu/~infoint/ - Information Integration for Simulation-Based Design and Manufacturing at The University of Iowa
4 Standard for the Exchange of Product Model Data
5 http://arioch.gsfc.nasa.gov/nasa_pdewg/nasa_pdewg.html - NASA Product Data Exchange Working Group
6 http://www.scra.org/uspro/ - U.S. Product Data Association, National Product Data Exchange Resource Center
7 http://www.cadlab.tu-berlin.de/~PDTAG/whois.html - Product Data Technology Advisory Group, ESPRIT 9049
8 http://www.univoa.pt/CRIIGR_SSNC/RESEARCH/projsip.html - SIP - STEP-based Integration Platform
9 http://www.hike.te.chiba-u.ac.jp/ikeda/documentation/STEP.html - STEP Home Page, Ikeda Lab, Japan


and reuse are all possible with today's state of the art in computational science and soft computing10,11.

1.1 Accommodating Information Relationships

Technical information accumulated over a product's life cycle can be viewed as an aggregation of information items linked together through a complex network of soft, fuzzy and crisp relationships.

The organizational group with the best knowledge of these relationships, the ability to avoid their rediscovery, and the ability to use them innovatively for system analysis and process automation will successfully meet the significant and obvious process improvement goals of "better, faster, cheaper."

It is easy to get lost within the microscopic views of subsystems and the details of their associated theory, modeling data and computational support systems. It is therefore important to view relationships from the macroscopic perspective that is captured within the technical data packages associated with reviews and summary reports.

Relationships between items of information within and across technical data packages can be categorized as:

• First principle relations. These are dictated by the product's configuration and the laws of nature. They cannot be violated under any circumstances. They range from the trivial (e.g., product mass shall be a positive real number) to exceedingly complex dynamic cross-coupling relations within a multidisciplinary system model; a relation of the trivial kind is sketched as a machine-checkable rule after this list.

• Standards. These are dictated by international, national and corporate standards-setting organizations. They are crisp, and their violation normally requires detailed supporting analysis and an extensive approval cycle.

• Best practices, corporate knowledge and lessons learned. These relationships are inherently fuzzy. The machine-interpretable representation of this knowledge and its utilization requires the instantiation of knowledge expressions with linguistic measures and evaluation via computation with words; see reference [9].

• Design tradeoffs. These attempt to minimize cost while maximizing performance. They are a mix of crisp, fuzzy, probabilistic, random, etc. relationships that are difficult to define and computationally evaluate.
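As an illustration (a minimal sketch in EXPRESS; the entity product and its mass attribute are hypothetical, not part of any application protocol cited here), a first-principle relation such as "product mass shall be a positive real number" can be stated as a global rule that an intelligent support system can check automatically:

ENTITY product;          -- hypothetical illustrative entity
  name : STRING;
  mass : REAL;           -- assumed to be expressed in kilograms
END_ENTITY;

RULE positive_mass FOR (product);
WHERE
  -- no product instance may carry a non-positive mass
  wr1 : SIZEOF(QUERY(p <* product | p.mass <= 0.0)) = 0;
END_RULE;

Standards and best-practice relations would be captured the same way, with crisp rules in the former case and linguistic, fuzzy-valued attributes in the latter.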

10 http://mecha.ee.boun.edu.tr/asi.html - NATO Advanced Study Institute on Soft Computing and its Applications
11 http://http.cs.berkeley.edu/projects/Bisc/bisc.welcome.html - Home page for BISC (Berkeley Initiative in Soft Computing)


1.2 Accommodating Information Change

Product technical information is dynamic. Fundamental to its life cycle accumulation is "change". If knowledge is to be linked with information then change too must be categorized. This can be done relative to the perspective of its "cause". The following 4 causes of change are considered:

• Evolutionary change: This is associated with the normal course of events during the product's propose, design, develop and deploy life cycle process. Evolutionary changes are discontinuous and distributed across the system. Normally these are collected at irregular time intervals and released as a configuration data update. The update is effectively time tagged by version number. Versioning tracks change while allowing users to have a fixed baseline reference model over programmatically convenient time periods. To support the "version release" process, an intelligent support system can be used to check for relationship violations between design parameters.

• Design change: This is introduced to improve product performance or to solve a problem uncovered during design analysis, testing or operation. There are lessons learned, design change rationale and other items of information important to manage for future reference and reuse.

• Dependency change: Design parameter dependencies transition from a state of softness, during initial trade studies, to a state of crispness as the life cycle matures. If one attempts to model this change as a continuous process one loses the ability to associate dependency change with an associated life cycle time phase and technical data package contents.

Dependency within and across data packages evolves as follows:

1. Design parameters start from a single-point expert best estimate and end as the output of a complex sequence of computational, logical and fuzzy reasoning procedures. From this perspective, the definitions of design parameter attributes remain fixed but their instantiation source changes.

2. Attempts to minimize cost while maximizing function involve a complex intertwined mix of physics, mission viability and performance tradeoff considerations. From this perspective, design parameter dependencies transition from a mix of soft and crisp dependencies during product proposal preparation and design, to a set of crisp dependencies during product development and deployment.

• Knowledge change: Design rules, best practices, corporate knowledge and other items of knowledge-base type information are usually linked to phases in the life cycle process. As the process matures, higher-fidelity knowledge constructs become relevant and must be introduced into the intelligent support system. Once they are articulated in clear text and translated into a machine-readable format, computational intelligence can be used to initiate the actions which provide intelligent support. This is not an easy task, since knowledge is usually derived from an expert. While the expert instinctively provides accurate and timely evaluations via mental computation with metaphors, they find that the articulation of these metaphors into fuzzy words, relations, constraints and numerics is extremely difficult.

1.3 Using STEP To Model Information

The ability to model information in a machine-parsable format provides the fastener necessary to attach intelligent support and a variety of soft computing capabilities to a technical information management system. The EXPRESS information modeling language provides the bridge between the clear text representation of information and its machine-interpretable representation. The EXPRESS language is an international standard and is a Part of the ISO 10303 STEP standards for Product Data Representation and Exchange12.

It is assumed that a clear text definition of information to be collected within technical data packages can be developed. If this can be obtained, then the associated information can be modeled via EXPRESS and presented as a "technical data package" in a STEP-compatible manner13. The technical information component associated with change can also be presented in a STEP-compatible manner14.

The most difficult aspect of the technical data packaging problem is to develop a textual definition which is clear, unambiguous, complete and nonredundant. Unaided, this is a near-impossible task. The language EXPRESS was specifically designed to aid this task. Once information is translated from a clear text definition into the machine-readable language EXPRESS, the EXPRESS source code is compiled. The compiler returns an error list of ambiguities, incompleteness and redundancies. This list is then used to adjust both the textual definition and the EXPRESS source code. This feedback process implies that clear text development and its EXPRESS translation is an iterative process.

2. Information Modeling

The ISO 10303 STEP standards provide the enabling technology needed for product data representation and exchange. While IGES, the Initial Graphics Exchange Specification, was developed with the expectation of direct human intervention to assure correct translations, STEP is being developed to be machine-to-machine processable with no human intervention. See references [3] and [6].

STEP provides a representation of product information along with the necessary mechanisms and definitions to enable product data to be exchanged among different computer systems and environments associated with the complete product life cycle process.

12 http://www.scra.org/uspro/stds/stepage.html - "STEP on a Page"
13 ISO 10303, Proposed Application Protocol AP-232: Technical Data Packaging Core Information and Exchange
14 ISO 10303, Proposed Application Protocol AP-208: Life Cycle Product Change Process


STEP uses the formal information modeling language EXPRESS to specify product information in a computer readable manner. This formal language enables precision and consistency of representation and facilitates the development of applications. STEP also uses application protocols (APs) to represent generic classes of information relevant to broad application areas.

The overall objective of STEP is to provide a mechanism that is capable of describing product data throughout the life cycle, independent of any particular system. The nature of this description makes it suitable not only for neutral file exchange, but also as a basis for implementing and sharing product databases and for archiving. The ultimate goal is an integrated product information database that is accessible and useful to all the resources necessary to support a product over its life cycle.

2.1 EXPRESS - Language for Information Modeling

The language EXPRESS (ISO 10303 Part 11) is the international standard language for information modeling. EXPRESS is a data-specification language as defined in ISO 10303 Part 1: Overview and Fundamental Principles. It consists of language elements which allow an unambiguous object definition and the specification of constraints on the objects so defined15. Reference [7] provides an excellent introduction to the language with illustrative examples.

Within this language the schema contains the information model relative to a focused universe of discourse. The entity defines some atomic level of information associated with the product within the universe of discourse, and attributes provide the properties of the entity relative to all relevant product views. Context, rules, relationships, functions and constraints between entities and their attributes can be defined within and across schemas.

The following is an example of information modeling with EXPRESS:

SCHEMA car_trace;

ENTITY history;
  item      : car;
  transfers : LIST [0:?] OF transfer;
END_ENTITY;

ENTITY transfer;
  item  : car;
  prior : owner;
  new   : owner;
  on    : date;
INVERSE
  must_be_in_history : history FOR transfers;
END_ENTITY;

ENTITY car;
  model_type      : car_model;
  made_by         : manufacture;
  mnfg_no         : STRING;
  registration_no : STRING;
  production_year : date;
UNIQUE
  joint  : made_by, mnfg_no;
  single : registration_no;
END_ENTITY;

ENTITY car_model;
  name        : STRING;
  made_by     : manufacture;
  consumption : REAL;
END_ENTITY;

ENTITY date;
  day   : INTEGER;
  month : INTEGER;
  year  : INTEGER;
WHERE
  days_ok   : {1 <= day <= 31};
  months_ok : {1 <= month <= 12};
  year_ok   : year > 0;
  date_ok   : valid_date(SELF);
END_ENTITY;

END_SCHEMA;

15 ISO 10303 Part 11: EXPRESS Reference Manual

The above example illustrates the definition of entities, attributes and associated constraints. It is part of a model for the process of tracking automobile registration. The entity history provides the arbitrary-length transfer list. The entity transfer defines exactly what information is recorded with each change of ownership. The entity car_model defines car model information. The entity car defines exactly what information is recorded for each car. Within the entity car two uniqueness constraints must be satisfied: registration numbers must be unique, and the manufacture number per manufacturer must be unique. The entity date has constraints on the day, month and year integer values and also uses the function valid_date (not shown) to check for proper day-month relations and leap years.
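Although the original function is not shown, a minimal sketch of what such a valid_date function could look like in EXPRESS follows (our illustration only; the month-length table and leap-year test are the usual Gregorian rules, not taken from the source):

FUNCTION valid_date(d : date) : BOOLEAN;
LOCAL
  month_days : ARRAY [1:12] OF INTEGER
             := [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31];
  max_day    : INTEGER;
END_LOCAL;
  max_day := month_days[d.month];
  -- February gains a day in Gregorian leap years
  IF (d.month = 2) AND
     (((d.year MOD 4 = 0) AND (d.year MOD 100 <> 0)) OR (d.year MOD 400 = 0))
  THEN
    max_day := 29;
  END_IF;
  RETURN (d.day <= max_day);
END_FUNCTION;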

Hopefully this short example provides sufficient material for the reader to develop the suspicion that the language might have sufficient information modeling capability to form a bridge between a clear text definition of technical information and a machine-readable translation that is also human readable.

2.2 Technical Data Packages

One method for collecting information is to design technical data collection packages that support the life cycle need to gather, review and distribute design information in a timely manner. A process enhancement approach is to design the packages to complement established documentation delivery requirements and product reviews. The packaged data provides a particular view of product information at a particular time point in the life cycle process. ISO 10303 AP 232 - "Technical Data Packaging Core Information and Exchange" - provides the structure to package and relate groups of product information so that configuration-controlled exchanges can be achieved. This Application Protocol provides an information structure that meets the requirements for collecting, organizing and managing the exchange of a complex set of data files and databases associated with the technical information being packaged.

2.3 Life Cycle Change Management

Product life cycle information management includes the exchange of product data relative to the identification of a problem and its causes. Product life cycle change management includes the identification of the reason for change, its cause, the approval and performance of the resulting changes to the product, and the authorization of corrective actions to prevent recurrence of the anomaly. ISO 10303 AP 208 - "Life Cycle Management - Change Process" - provides the structure necessary to support the management of change within the product and its subsystems.

3. Baseline Spacecraft System Definition

Section 1 provides a list of focused information collections that could be transformed into STEP-compatible technical data packages. For any organization with a well-established product line, significantly large segments of the focused information collections are repeated with each new product. These repeated segments can be expressed in such a manner that the information template will satisfy the data recording needs of all products within the product line.

The output of the NASA spacecraft proposal development phase provides an illustrative example. The deliverable is a definition of the baseline system. It is a well-structured collection of information developed by expert teams representing the subsystem design and development groups shown in Fig. 1. Relative to the needs of the systems engineer, this view of the baseline system is complete and unambiguous. To support this proposal development process, various NASA groups are developing highly advanced software support systems16,17,18.

16 http://gsfccost.gsfc.nasa.gov/mio/asset/asset.htm - Advanced System Synthesis and Evaluation Tool (ASSET)
17 http://pdc.jpl.nasa.gov/ - JPL's Project Design Center
18 http://mijuno.larc.nasa.gov/default.html - Design for Competitive Advantage


Fig. 1. Systems engineering view of NASA spacecraft subsystems

Irrespective of the proposal development approach, the resultant baseline system is effectively defined by a template of information, the items of which are referred to as design parameters. During the proposal development stage this template is instantiated with the best estimates that the expert teams and intelligent support aids can provide. Once this phase ends, the best estimates are refined by detailed work carried out within the subsystem design and development groups.

Baseline subsystems can also be defined by templates of information and technical data packages developed for them.

From a technical information management perspective it is important to note where the information comes from and how it was developed. The system must be able to identify both the initial creator team and all appropriate links and pointers to instantiation procedures.

Relative to STEP, within the baseline system template, each system design parameter and all associated elements of information become EXPRESS entities. Each entity has an associated list of attributes. These support the viewing needs of the different design and development groups. As will be shown, additional attributes can be added to support knowledge infusion.
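For concreteness, a design parameter entity of the kind described might be modeled as follows (a hypothetical sketch; the entity and attribute names are ours, drawn neither from the baseline template nor from any application protocol):

ENTITY design_parameter;
  name             : STRING;            -- e.g. 'dry_mass'
  numeric_value    : OPTIONAL REAL;     -- instantiated once estimates firm up
  linguistic_value : OPTIONAL STRING;   -- fuzzy word measure used early on
  created_by       : STRING;            -- initial creator team
  source           : STRING;            -- link or pointer to the instantiation procedure
END_ENTITY;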

The language EXPRESS defines the information elements of the template, while ISO 10303 AP 232 - "Technical Data Packaging Core Information and Exchange" - provides the additional structure needed to track the information creation trail: who did what, when, why, with what results, and who approved it?

As the volume of product technical information accumulates, the desire to infuse knowledge and intelligence into the process becomes a necessity. Database maintenance becomes a more and more difficult process. Data redundancy, ambiguity and completeness are critical issues requiring intelligent support aids. Furthermore, if the product is complex, intelligent aids are desired to support cross-department and cross-disciplinary design checking.


3.1 Capturing Intelligence

Intelligence is the capacity to acquire and apply knowledge. During one of Zadeh's NATO/ASI lectures it was suggested that the subject domain of artificial intelligence is based upon hard computing, that is, quantitative, precise and formal computation with numbers. It was also suggested that the subject domain of computational intelligence is based upon soft computing, that is, qualitative, imprecise and informal computation with words. Both subject areas are based upon the assumption that acquired knowledge can be presented in some machine-readable format.

The author proposes to take one step further, into the domain where acquired knowledge is not machine readable. In this domain knowledge is in the format of metaphors stored in the minds of product experts. It is suggested that creative intelligence is based upon computation with metaphors.

This concept aids in understanding the process and difficulties associated with acquiring knowledge from experts. The concept implies that an intelligent support capability must first translate the metaphors of experts into machine-readable constructs of hard-computation numbers and soft-computation words. It is not difficult to be convinced that this is very difficult. One can also be convinced that, for the foreseeable future, the creation process will remain in the minds of the experts, while what is mundane to the expert can be automated.

The latter has been proven many times over by the CAx software development community. The key to process automation is to make provision for the machine-readable acquisition of all knowledge needed to make tasks within the product life cycle process "mundane". If the task is mundane to the expert, and if all information needed for its automation is databased, then it can be entered into the "task to be automated" queue.

3.2 Knowledge Representation

The dictionary definition of knowledge19 goes well beyond what today's technology can hope to represent in a machine-readable format. This contribution bounds "knowledge representation" by the tools and capabilities available for its instantiation within an information management system.

To illustrate a method for infusing knowledge into such a system, consider the need to provide linguistic descriptors to the entity car, defined in Section 2.1, and label them car_body_condition, car_engine_condition and car_classification. The

19 1) the act, fact, or state of knowing; specifically, a) acquaintance or familiarity (with a fact, place, etc.), b) awareness, c) understanding. 2) acquaintance with facts; range of information, awareness, or understanding. 3) all that has been perceived or grasped by the mind; learning; enlightenment. 4) the body of facts accumulated by mankind. From: Webster's New World Dictionary of the American Language, 1966.


attributes car_body_condition and car_engine_condition are subjective linguistic measures provided by an inspector. The attribute car_classification is another linguistic measure. It may be provided directly by an appraiser using the estimates of the inspector, or it may be computationally inferred from the inspector's estimates in combination with other available information such as car model, age, manufacturer, owner history, etc. Within the insurance industry this is the claims adjustment process.

The ability to automate the instantiation of the attribute car_classification requires an ability to reason in the face of uncertain information. This capability is based upon the use of fuzzy sets, possibility theory and approximate reasoning, as outlined in references [2] and [8]. It requires an ability to aggregate fuzzy subsets and to use fuzzy sets to constrain variables.

The knowledge representation problem reduces to the need to supply all information needed for the operations of fuzzy set aggregation and for the statement of propositions. The following definitions for fuzzy sets and propositions are used.

• Assume X is a set serving as the universe of discourse. A fuzzy subset A of X is associated with a characteristic function μA such that

μA : X → [0, 1].   (1)

In the framework of fuzzy set theory the characteristic function is generally called the membership function associated with the fuzzy subset A. This terminology stresses the idea that a fuzzy set is defined by the pair of points x and μA(x). For every value of x within the universe of discourse X there is a different degree of belonging to the fuzzy set A. The degree of belonging is defined by 0 ≤ μA ≤ 1.

The following EXPRESS code captures this definition of fuzzy subsets and membership functions within the context of the above example:

ENTITY car_condition_set
SUPERTYPE OF (ONEOF (trash, poor, fair, average, good, very_good, mint));
  universe_of_discourse : car;
  constraint_type       : constraint_types;
END_ENTITY;

ENTITY trash
SUBTYPE OF (car_condition_set);
  membership : membership_function;
END_ENTITY;

etc.


ENTITY car_class_set
SUPERTYPE OF (ONEOF (junker, budget, economy, mid_range, family_range, deluxe, luxury));
  universe_of_discourse : car;
  constraint_type       : constraint_types;
END_ENTITY;

ENTITY junker
SUBTYPE OF (car_class_set);
  membership : membership_function;
END_ENTITY;

etc.

ENTITY membership_function;
  graph : LIST [0:?] OF x_mu_coordinate;
END_ENTITY;

ENTITY x_mu_coordinate;
  x_coordinate  : REAL;
  mu_coordinate : REAL;
WHERE
  mu_ok : {0.0 <= mu_coordinate <= 1.0};
END_ENTITY;

In the above code the entities car_condition_set and car_class_set are supertypes. Their attributes universe_of_discourse and constraint_type are inherited by each of their respective subtypes. The subtype entities trash, poor, ... and junker, budget, ... are the fuzzy subsets of the universe of discourse. Each will have a membership function. The membership function is not inherited, since it will be different for each fuzzy subset within the universe of discourse.

The entity membership_function provides an example of how the EXPRESS language enforces consistency. The EXPRESS specification requires that the function be databased as linear segments connecting the listed set of points. The entity x_mu_coordinate defines the point [x, μ(x)] as a real number pair and bounds μ(x) within the interval [0,1]. If users wish to communicate to the database using other functional formats, "user friendlies" must be provided. These are written using the programming objects that can be derived from the EXPRESS specification. This enables views to be tailored to in-house corporate preferences, while leaving the database untouched and readable by foreign groups having access to the EXPRESS specification.
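To make the piecewise-linear convention concrete, the following function sketches how a membership degree could be recovered from the stored graph (our illustration only; the name mu_at and the interpolation details are assumptions, and such evaluation code would belong in the "user friendly" support layer rather than the data model itself; the graph is assumed to have strictly increasing x_coordinate values):

FUNCTION mu_at(f : membership_function; x : REAL) : REAL;
LOCAL
  i      : INTEGER;
  p1, p2 : x_mu_coordinate;
END_LOCAL;
  IF SIZEOF(f.graph) = 0 THEN
    RETURN (0.0);                        -- empty graph: no membership anywhere
  END_IF;
  IF x <= f.graph[1].x_coordinate THEN
    RETURN (f.graph[1].mu_coordinate);   -- clamp below the first listed point
  END_IF;
  REPEAT i := 2 TO SIZEOF(f.graph);      -- linear interpolation between points
    p1 := f.graph[i-1];
    p2 := f.graph[i];
    IF x <= p2.x_coordinate THEN
      RETURN (p1.mu_coordinate + (p2.mu_coordinate - p1.mu_coordinate) *
              (x - p1.x_coordinate) / (p2.x_coordinate - p1.x_coordinate));
    END_IF;
  END_REPEAT;
  RETURN (f.graph[SIZEOF(f.graph)].mu_coordinate);  -- clamp above the last point
END_FUNCTION;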

• A proposition in a natural language is viewed as a network of fuzzy constraints. Upon aggregation, the constraints which are embodied in the proposition result in an overall fuzzy constraint which can be represented as an expression of the form

X is R   (2)


where R is a constraining fuzzy relation and X is the constrained variable. The generalized constraint is represented as

X isr R   (3)

where "isr" is a variable which defines the way in which R constrains X; see reference [9]. The "isr" variables identify the constraint relation type. This information is required for the operations of fuzzy subset aggregation. The associated keywords are identified via the semantic construct constraint_types, defined by the TYPE operator of EXPRESS. The following EXPRESS code captures this generalized definition of constraint type by providing an enumerated list of allowable constraint relation types:

TYPE constraint_types = ENUMERATION OF
  (equal, possibilistic, conjunctive, probabilistic, usuality,
   random_set, random_fuzzy_set, fuzzy_graph, rough_set);
END_TYPE;

3.3 EXPRESS - Not a Computing Language

The language EXPRESS is not a computing language. It effectively provides the template for information to be stored within an object-oriented database. Within this context, knowledge representation via the statement of propositions is accomplished during database instantiation. The following EXPRESS code provides the necessary template for car condition and appraisal knowledge representation. It uses the above EXPRESS definitions for fuzzy subsets and adds 3 new attributes to the attribute list of the entity car:

ENTITY car;
  model_type           : car_model;
  made_by              : manufacture;
  mnfg_no              : STRING;
  registration_no      : STRING;
  production_year      : date;
  car_body_condition   : car_condition_set;
  car_engine_condition : car_condition_set;
  car_classification   : car_class_set;
UNIQUE
  joint  : made_by, mnfg_no;
  single : registration_no;
END_ENTITY;

The entity car now contains three linguistic variables that may be instantiated with the fuzzy subsets identified in the respective "SUPERTYPE OF" lists of the


entities car_condition_set and car_class_set. The associated database instantiation process is the proposition statement process.

It is important to note that EXPRESS provides for the modeling of information. How it is to be used is of no concern to EXPRESS. In this example, the attribute car_classification may be provided directly by an appraiser or it may be derived via the fuzzy inference techniques associated with the theory of approximate reasoning.

3.4 Link to Soft Computing

The introduction of linguistic variables to an entity's attribute list, with an enumerated list of associated fuzzy set names, provides the necessary link to the realm of soft computing. Entities which must be viewed from several perspectives (mass, temperature, size, ...) will have several associated linguistic variable descriptors.

Attribute instantiation is the first problem. In many situations these are simply estimated via expert experience. In other situations a body of information exists which will allow the techniques of fuzzy cluster analysis to be applied to the attribute instantiation process.

Knowledge representation is the next problem. This is enabled by soft computing's ability to compute with words. Complex IF ... THEN ... word rules can be set up and evaluated via the theory of approximate reasoning (an illustrative pair of such rules is sketched after the list below). This can

• Support the engineering review process by creating a library of the IF ... THEN ... design rules used by the experts during the review process. These can be linked with the technical information management system's object oriented database to continuously perform background disciplinary and cross disciplinary checks on all baseline system design variables as they are entered into the system.

• Support the launching of data mining queries to a digital library for background information to be checked when certain design conditions are encountered - for example, when design parameters approach a past region of anomalous or reduced performance.

• Support the product proposal creation and design process by linguistic sensitivity analysis. This analysis yields the allowable linguistic ranges for design parameter variation. This knowledge can complement and add insight into the results obtained by traditional numerical sensitivity analysis.

• Automate the checking for standards, best practices and lessons learned violation.
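As an illustration of the kind of design-rule library meant here (a hypothetical pair of rules composed for the car appraisal model of Sections 2.1 and 3.2, not rules drawn from any cited system):

IF car_body_condition is good AND car_engine_condition is fair
THEN car_classification is mid_range;

IF car_body_condition is trash OR car_engine_condition is poor
THEN car_classification is junker;

Evaluated via approximate reasoning, each rule contributes a fuzzy constraint on car_classification; the aggregated constraint can then be reported linguistically or checked against the value an appraiser supplies.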

3.5 Computing with Words for Design Insight

In a metaphoric sense, consider the rectangular matrix equation

[A]{x} ≤ {b},   (4)

where {x} contains numeric measures of design parameters, and {b} contains numeric measures for design requirements, cost and performance objectives. The rectangular matrix [A] captures all design parameter dependencies associated with the physical and natural laws involved in modeling product requirements, cost and performance. The design process attempts to find an optimal solution for {x} relative to the measures contained in {b}. Of course the actual design process is more complex and carried out differently; but this is its metaphoric essence.

Now let the elements of {x} and {b} contain linguistic measures. Matrix [A] still captures all design parameter dependencies; however, the operations of summation and multiplication must now be replaced by fuzzy constraint aggregation and the operators used for generalized constraint specification, as defined in reference [9].
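One plausible way to read the linguistic version of equation (4) (our sketch; reference [9] leaves the choice of aggregation operators open) is as a generalized row-by-row composition:

\[
b_i \;=\; \bigoplus_{j} \big( a_{ij} \otimes x_j \big), \qquad i = 1, \dots, m,
\]

where ⊗ combines a dependency a_ij with the linguistic value x_j (a generalized-constraint combination taking the place of multiplication) and ⊕ aggregates the row's contributions (a fuzzy aggregation taking the place of summation).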

At a metaphoric level, one does not need to understand exactly how linguistic constraints are defined and aggregated to see that a powerful, insight rich, capability is evolving in the soft computing community.

The design equation [A]{x} ≤ {b} defines performance, realizability, manufacturability, etc. The inability to find a linguistic solution for {x} implies a product design with either a "null" or a single-point solution in design space. This implies that, for the defined mix of requirements, performance and cost objectives, all possible product designs will have the fundamental property that they will not be robust enough to satisfy the desired mix of objectives. The key here is that the linguistic solution {x} is not a point solution; it is a fuzzy region in design space. This has much deeper interpretive meaning, since the size of the fuzzy region can be directly associated with design solution robustness.

Computing with words has the potential for replacing numerical sensitivity analysis via parameter variation with a one-step process that defines the fuzzy region in which the optimal design solution exists.

4. STEP - Subsystem Interface Integration

The STEP-SII (Subsystem Interface Integration) project argues that subsystem interface specifications are technical information and that therefore they too can be presented in a machine-readable manner via EXPRESS. In this format the interface specifications become complete, unambiguous and nonredundant - a goal that is exceedingly hard to achieve via traditional textual documentation.

Figure 2 provides a big-picture view of the STEP-SII project. The process starts with a clear text definition of a subsystem's interface. The text of this Interface Control Document (ICD) is then translated into the EXPRESS language. Once in EXPRESS, a variety of commercial software tools20,21 are available to provide

20 http://www.steptools.com - STEP Tools Inc., STEP Software for World-wide Manufacturing
21 http://www.iti-oh.com - International TechneGroup Incorporated


access to object-oriented databases and to objects for use with programming languages such as C++ and JAVA. The EXPRESS model effectively defines the template for the technical information to be stored in the database. The next step is to instantiate the database. To support this, user friendlies are developed with a software toolbox of graphical user interfaces (GUIs). The availability of EXPRESS-derived objects and GUIs makes this a straightforward process. In Fig. 2, software linkages S/W 1 and S/W 2 instantiate the database with standards and project-specific data. S/W 3 enables data to be requested and S/W 4 enables it to be obtained. Since the database is object oriented, it is possible for the system itself to request missing items of data. This is done by S/W 5. Engineers use requested data via "established in-house lines of communication" to do whatever has to be done to satisfy the data request and deliver it to the database via S/W 6.

Fig. 2. Subsystem Interface Integration. (The figure shows project life cycle design, analysis, verification and validation needs linked by software interfaces S/W 1 through S/W 7 to an EXPRESS model of interfaces that defines an object-oriented ICD and the baseline system object-oriented database, connected via established in-house lines of communication to user-preferred design, analysis, verification and validation tools and capabilities.)

As this process proceeds there are many "mundane" tasks that must be done. These are mundane, to the expert, in the sense that what has to be done is clearly known and all the data needed to do the task is in the database. When this situation exists, the potential for task automation is high. S/W 7 is the link which provides for task automation. Along this link data is automatically requested, processed, and the results placed into the database. The STEP-SII project intends to demonstrate that this can be done. The demonstration will use an EXPRESS model of subsystem


characterization data to support a highly focused controls design analysis problem and a flight software development task22.

5. Conclusions

An infrastructure for the technical management of information has been presented. Areas requiring the support of state-of-the-art soft computing and intelligent support capabilities have been identified. Have no illusions: implementation of the proposed infrastructure will require commitment, and it will be slow and painful. Doing the necessary computer science should present no major problems. The EXPRESS translation of technical data packages should become easier after the first few have been done. The hardest part will be capturing the soft, crisp and fuzzy dependencies and relationships between the items of information that define the product. This must be overcome, and there appears to be no easy solution today. Nevertheless, the author is convinced that the road to significant and obvious process improvement goals is the road which takes a system-level view while providing intelligent support and enabling task automation within and across subsystems.

References

1. Peter Brown, M.D., "The Hypnotic Brain, Hypnotherapy and Social Communication," Yale University Press, 1991

2. Didier Dubois and Henri Prade, "Possibility Theory: An Approach to Computerized Processing of Uncertainty," Plenum Press, 1988

3. Julian Fowler, "STEP for Data Management, Exchange and Sharing," Technology Appraisals, UK, 1995

4. Jonathan T. Hujsak, "Digital Libraries and Corporate Technology Reuse," D-Lib Magazine, January 1996, http://www.dlib.org/dlib/january96/01hujsak.html

5. "NASA Systems Engineering Process for Programs and Projects," NASA/JSC, JSC-49040, Oct 1994

6. Jon Owen, "STEP: An Introduction," Information Geometers Ltd., 1993

7. Douglas A. Schenck and Peter R. Wilson, "Information Modeling: The EXPRESS Way," Oxford University Press, 1994

8. Ronald R. Yager and Dimitar P. Filev, "Essentials of Fuzzy Modeling and Control," John Wiley & Sons, 1994

9. Lotfi A. Zadeh, "Fuzzy Logic = Computing with Words," IEEE Transactions on Fuzzy Systems, Vol. 4, No. 2, May 1996, pp. 103-111.

22 http://skipper2.mar.external.lmco.com/sii/index.html - The NASA/GSFC-supported STEP-SII (Subsystems Interface Integration) Project, at Lockheed Martin

Page 548: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications

NATO ASI Series F

Including Special Programmes on Sensory Systems for Robotic Control (ROB) and on Advanced Educational Technology (AFT) Vol. 92: Hypermedia Courseware: Structures of Communication and Intelligent Help. Edited by A. Oliveira. X, 241 pages. 1992. (AET)

Vol. 93: Interactive Multimedia Leaming Environments. Human Factors and Technical Considerations on Design Issues. Edited by M. Giardina. VIII, 254 pages. 1992. (AET)

Vol. 94: Logic and Algebra of Specification. Edited by F. L. Bauer, W. Brauer, and H. Schwichtenberg. VII, 442 pages. 1993.

Vol. 95: Comprehensive Systems Design: A New Educational Technology. Edited by C. M. Reigeluth, B. H. Banathy, and J. R. Olson. IX, 437 pages. 1993. (AET)

Vol. 96: New Directions in Educational Technology. Edited by E. Scanlon and T. O'Shea. VIII, 251 pages. 1992. (AET)

Vol. 97: Advanced Models of Cognition for Medical Training and Practice. Edited by D. A. Evans and V. L. Patel. XI, 372 pages. 1992. (AET)

Vol. 98: Medical Images: Formation, Handling and Evaluation. Edited by A. E. Todd-Pokropek and M. A. Viergever. IX, 700 pages. 1992.

Vol. 99: Multisensor Fusion for Computer Vision. Edited by J. K. Aggarwal. XI, 456 pages. 1993. (ROB)

Vol. 100: Communication from an ArtificiaJ Intelligence Perspective. Theoretical and Applied Issues. Edited by A. Ortony, J. Slack and O. Stock. XII, 260 pages. 1992.

Vol. 101: Recent Developments in Decision Support Systems. Edited by C. W. Holsapple and A. B. Whinston. XI, 618 pages. 1993.

Vol. 102: Robots and Biological Systems: Towards a New Bionics? Edited by P. Dario, G. Sandini and P. Aebischer. XII, 786 pages. 1993.

Vol. 103: Parallel Computing on Distributed Memory Multiprocessors. Edited by F. OzgGner and F. En;al. VIII, 332 pages. 1993.

Vol. 104: Instructional Models in Computer-Based Learning Environments. Edited by S. Dijkstra, H. P. M. Krammer and J. J. G. van Merrienboer. X, 510 pages. 1993. (AET)

Vol. 105: Designing Environments for Constructive Leaming. Edited by T. M. Duffy, J. Lowyck and D. H. Jonassen. VIII, 374 pages. 1993. (AET)

Vol. 106: Software for Parallel Computation. Edited by J. S. Kowalik and L. Grandinetti. IX, 363 pages. 1993.

Vol. 107: Advanced Educational Technologies for Mathematics and Science. Edited by D. L. Ferguson. XII, 749 pages. 1993. (AET)

Vol. 108: Concurrent Engineering: Tools and Technologies for Mechanical System Design. Edited by E. J. Haug. XIII, 998 pages. 1993.

Vol. 109: Advanced Educational Technology in Technology Education. Edited by A. Gordon, M. Hacker and M. de Vries. VIII, 253 pages. 1993. (AET)

Vol. 110: Verification and Validation of Complex Systems: Human Factors Issues. Edited by J. A. Wise, V. D. Hopkin and P. Stager. XIII, 704 pages. 1993.

Vol. 111: Cognitive Models and Intelligent Environments for Leaming Programming. Edited by E. Lemut, B. du Boulay and G. Dettori. VIII, 305 pages. 1993. (AET)

Vol. 112: Item Banking: Interactive Testing and Self-Assessment. Edited by D. A. Leclercq and J. E. Bruno. VIII, 261 pages. 1993. (AET)

Vol. 113: Interactive Learning Technology for the Deaf. Edited by B. A. G. Elsendoom and F. Coninx. XIII, 285 pages. 1993. (AET)

Page 549: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications

NATO ASI Series F

Including Special Programmes on Sensory Systems for Robotic Control (ROB) and on Advanced Educational Technology (AB)

Vol. 114: Intelligent Systems: Safety, Reliability and Maintainability Issues. Edited by O. Kaynak, G. Honderd and E. Grant. XI, 340 pages. 1993.

Vol. 115: Leaming Electricity and Electronics with Advanced Educational Technology. Edited by M. Caillot. VII, 329 pages. 1993. (AET)

Vol. 116: Control Technology in Elementary Education. Edited by B. Denis. IX, 311 pages. 1993. (AET)

Vol. 117: Intelligent Learning Environments: The Case of Geometry. Edited by J.-M. Laborde. VIII, 267 pages. 1996. (AET)

Vol. 118: Program Design Calculi. Edited by M. Broy. VIII, 409 pages. 1993.

Vol. 119: Automating Instructional Design, Development, and Delivery. Edited by. R. D. Tennyson. VIII, 266 pages. 1994. (AET)

Vol. 120: Reliability and Safety Assessment of Dynamic Process Systems. Edited by T. Aldemir, N. O. Siu, A. Mosleh, P. C. Cacciabue and B. G. G5ktepe. X, 242 pages. 1994.

Vol. 121: Leaming from Computers: Mathematics Education and Technology. Edited by C. Keitel and K. Ruthven. XIII, 332 pages. 1993. (AET)

Vol. 122: Simulation-Based Experiential Learning. Edited by D. M. Towne, T. de Jong and H. Spada. XIV, 274 pages. 1993. (AET)

Vol. 123: User-Centred Requirements for Software Engineering Environments. Edited by D. J. Gilmore, R. L. Winder and F. Detienne. VII, 377 pages. 1994.

Vol. 124: Fundamentals in Handwriting Recognition. Edited by S. Impedovo. IX, 496 pages. 1994.

Vol. 125: Student Modelling: The Key to Individualized Knowledge-Based Instruction. Edited by J. E. Greer and G. I. McCalla. X, 383 pages. 1994. (AET)

Vol. 126: Shape in Picture. Mathematical Description of Shape in Grey-level Images. Edited by Y.-L. 0, A. Toet, D. Foster, H. J. A. M. Heijmans and P. Meer. XI, 676 pages. 1994.

Vol. 127: Real Time Computing. Edited by W. A. Halang and A. D. Stoyenko. XXII, 762 pages. 1994.

Vol. 128: Computer Supported Collaborative Learning. Edited by C. O'Malley. X, 303 pages. 1994. (AET)

Vol. 129: Human-Machine Communication for Educational Systems Design. Edited by M. D. Brouwer-Janse and T. L. Harrington. X, 342 pages. 1994. (AET)

Vol. 130: Advances in Object-Oriented Database Systems. Edited by A. Dogac, M. T. Özsu, A. Biliris and T. Sellis. XI, 515 pages. 1994.

Vol. 131: Constraint Programming. Edited by B. Mayoh, E. Tyugu and J. Penjam. VII, 452 pages. 1994.

Vol. 132: Mathematical Modelling Courses for Engineering Education. Edited by Y. Ersoy and A. O. Moscardini. X, 246 pages. 1994. (AET)

Vol. 133: Collaborative Dialogue Technologies in Distance Learning. Edited by M. F. Verdejo and S. A. Cerri. XIV, 296 pages. 1994. (AET)

Vol. 134: Computer Integrated Production Systems and Organizations. The Human-Centred Approach. Edited by F. Schmid, S. Evans, A. W. S. Ainger and R. J. Grieve. X, 347 pages. 1994.

Vol. 135: Technology Education in School and Industry. Emerging Didactics for Human Resource Development. Edited by D. Blandow and M. J. Dyrenfurth. XI, 367 pages. 1994. (AET)

Vol. 136: From Statistics to Neural Networks. Theory and Pattern Recognition Applications. Edited by V. Cherkassky, J. H. Friedman and H. Wechsler. XII, 394 pages. 1994.

Vol. 137: Technology-Based Learning Environments. Psychological and Educational Foundations. Edited by S. Vosniadou, E. De Corte and H. Mandl. X, 302 pages. 1994. (AET)

Vol. 138: Exploiting Mental Imagery with Computers in Mathematics Education. Edited by R. Sutherland and J. Mason. VIII, 326 pages. 1995. (AET)

Vol. 139: Proof and Computation. Edited by H. Schwichtenberg. VII, 470 pages. 1995.

Vol. 140: Automating Instructional Design: Computer-Based Development and Delivery Tools. Edited by R. D. Tennyson and A. E. Barron. IX, 618 pages. 1995. (AET)

Vol. 141: Organizational Learning and Technological Change. Edited by C. Zucchermaglio, S. Bagnara and S. U. Stucky. X, 368 pages. 1995. (AET)

Vol. 142: Dialogue and Instruction. Modeling Interaction in Intelligent Tutoring Systems. Edited by R.-J. Beun, M. Baker and M. Reiner. IX, 368 pages. 1995. (AET)

Vol. 143: Batch Processing Systems Engineering. Fundamentals of Chemical Engineering. Edited by G. V. Reklaitis, A. K. Sunol, D. W. T. Rippin and Ö. Hortaçsu. XIV, 868 pages. 1996.

Vol. 144: The Biology and Technology of Intelligent Autonomous Agents. Edited by Luc Steels. VIII, 517 pages. 1995.

Vol. 145: Advanced Educational Technology: Research Issues and Future Potential. Edited by T. T. Liao. VIII, 219 pages. 1996. (AET)

Vol. 146: Computers and Exploratory Learning. Edited by A. A. diSessa, C. Hoyles and R. Noss. VIII, 482 pages. 1995. (AET)

Vol. 147: Speech Recognition and Coding. New Advances and Trends. Edited by A. J. Rubio Ayuso and J. M. López Soler. XI, 505 pages. 1995.

Vol. 148: Knowledge Acquisition, Organization, and Use in Biology. Edited by K. M. Fisher and M. R. Kibby. X, 246 pages. 1996. (AET)

Vol. 149: Emergent Computing Methods in Engineering Design. Applications of Genetic Algorithms and Neural Networks. Edited by D. E. Grierson and P. Hajela. VIII, 350 pages. 1996.

Vol. 150: Speechreading by Humans and Machines. Edited by D. G. Stork and M. E. Hennecke. XV, 686 pages. 1996.

Vol. 151: Computational and Conversational Discourse. Burning Issues - An Interdisciplinary Account. Edited by E. H. Hovy and D. R. Scott. XII, 202 pages. 1996.

Vol. 152: Deductive Program Design. Edited by M. Broy. IX, 467 pages. 1996.

Vol. 153: Identification, Adaptation, Learning. Edited by S. Bittanti and G. Picci. XIV, 553 pages. 1996.

Vol. 154: Reliability and Maintenance of Complex Systems. Edited by S. Özekici. XI, 589 pages. 1996.

Vol. 155: Cooperation: Game-Theoretic Approaches. Edited by S. Hart and A. Mas-Colell. VIII, 328 pages. 1997.

Vol. 156: Microcomputer-Based Labs: Educational Research and Standards. Edited by R.F. Tinker. XIV, 398 pages. 1996. (AET)

Vol. 157: Logic of Computation. Edited by H. Schwichtenberg. VII, 396 pages. 1997.

Vol. 158: Mathematical Methods in Program Development. Edited by M. Broy and B. Schieder. VIII, 528 pages. 1997.

Vol. 159: Fractal Image Encoding and Analysis. Edited by Y. Fisher. XIX, 362 pages. 1998.

Vol. 160: Discourse, Tools, and Reasoning: Essays on Situated Cognition. Edited by L. B. Resnick, R. Säljö, C. Pontecorvo and B. Burge. XII, 474 pages. 1997. (AET)

Vol. 161: Computational Methods in Mechanical Systems. Edited by J. Angeles and E. Zakhariev. X, 425 pages. 1998.

Vol. 162: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications. Edited by O. Kaynak, L. A. Zadeh, B. Türkşen and I. J. Rudas. IX, 538 pages. 1998.

Vol. 163: Face Recognition: From Theory to Applications. Edited by H. Wechsler, P. J. Phillips, V. Bruce, F. Fogelman Soulié and T. S. Huang. IX, 626 pages. 1998.

Vol. 164: Workflow Management Systems and Interoperability. Edited by A. Doğaç, L. Kalinichenko, M. T. Özsu and A. Sheth. XVII, 481 pages. 1998.