Ch 6 final

Semantic NetsFramesSlots ExceptionsSlot – Values as ObjectsHandling UncertaintiesProbabilistic ReasoningUse of Certainty factorFuzzy Logic

Unit 6Structured Knowledge Representation Semantic Net

Learning Objectives

After reading this unit you should appreciate the following:

Semantic Nets

Frames

Slots Exceptions

Handling Uncertainties

Probabilistic Reasoning

Use of Certainty Factors

Fuzzy Logic

Top

Semantic Nets

In a semantic net, information is represented as a set of nodes connected to each other by a set of labelled ones, which represent relationships among the nodes. A fragment of a typical semantic net is shown in Figure 6.1.

ARTIFICIAL INTELLIGENCE

Figure 6.1: A Semantic Network

This network contains example of both the isa and instance relations, as well as some other, more domain-specific relations like team and uniform-color. In this network we would use inheritance to derive the additional relation.

has-part (Pee-Wee-Reese, Nose)

Intersection Search

Intersection search is one of the early ways that semantic nets used to find relationships among objects by spreading activation out from each of two nodes and seeing where the activation met. Using this process, it is possible to use the network of Figure 6.1 to answer questions such as “What is the connection between the Brooklyn Dodgers and blue?” This kind of reasoning exploits one of the important advantages that slot-and-filler structures have over purely logical representation because it takes advantage of the entity –based organization of knowledge that slot-and-filler representation provide.

To answer more structured questions, however, requires networks that are themselves more highly structured. In the next few sections we expand and refine our notion of a network in order to support more sophisticated reasoning.

Representing Nonbinary Predicates

Semantic nets are a natural way to represent relationships that would appear as ground instance of binary predicates in predicate logic. For example, some of the areas from Figure 6.1 could be represented in logic as. (Figure 6.2)

Figure 6.2: A Semantic Net for an n-Place Predicate

isa (Person, Mammal)instance (Pee-Wee-Reese, Person)team (Pee-Wee-Reese, Broklyn-Dodgers)uniform-color(Pee-Wee-Reese,Blue)

But the knowledge expressed by predicates of other airties can also be expressed in semantic nets. We have already seen that many unary predicates in logic can be though of a binary predicates using some very general-purpose predicates, such as isa and instance. So for example,

140

STRUCTURED KNOWLEDGE REPRESENTATION SEMANTIC NET

man(Marcus)

could be rewritten as

instance (Marcus, Man)

thereby making it easy to represent in a semantic net.

Three or more place predicates can also be converted to a binary form by creating one new object representing the entire predicate statement and then introducing binary predicates to describe the relationship to this new object of each of the original arguments. For example, suppose we know that

score (Cubs, Dodgers, 5-3)

This can be represented in a semantic net by creating a node to represent the specific game and then relating each of the three pieces of information to it. Doing this produces the network shown in Figure 6.2.

This technique is particularly useful for representing the contents of a typical declarative sentence that describes several aspects of a particular event. The sentence could be represented by the network show in Figure 6.3.

John gave the book to Mary

Figure 6.3: A Semantic Net Representing a Sentence

Making Some Important Distinctions

In the networks we have described so far, we have glossed over some distinctions that are important in reasoning. For example, there should be a difference between a link that defines a new entity and one that relates two existing entities. Consider the net

Both nodes represent objects that exist independent of their relationship to each other. But now suppose we want to represent the fact that John is taller than Bill, using the net.

141


The nodes H1 and H2 are new concepts representing John’s height and Bills’ height, respectively. They are defined by their relationships to the nodes John and Bill. Using these defined concepts, it is possible to represent such facts as that John’s height increased, which we could not do before. (The number 72 increased.)

Sometimes it is useful to introduce the arc value to make this distinction clear. Thus we might use the following net to represent the fact that John is 6 feet tall and that he is taller than Bill:

The procedures that operate on nets such as this can exploit the fact that some arcs, such as height, define new entities, while others, such as greater-than and value, merely describe relationship among existing entities.

Yet another example that we have missed is the difference between the properties of a node itself and the properties that a node simply holds and passes on to its instances. For example, it is a properly of the node Person that it is a subclass of the node Mammal. But the node Person does not have as one of its parts a nose. Instances of the node Person do, and we want them to inherit it.

It’s easier said than done to capture these distinctions without assigning more structure to our notions of node, link, and value. But first, we discuss a network-oriented solution to a simpler problem; this solution illustrates what can be done in the network model but at what price in complexity.

Partitioned Semantic Nets

Suppose we want to represent simple quantified expressions in semantic nets. One was to do this is to partition the semantic net into a hierarchical set of spaces, each of which corresponds to the scope of one or more variables. To see how this works, consider first the simple net shown in Figure 6.4. This net corresponds to the statement.

The dog bit the mail carrier.

The nodes Dogs, Bite, and Mail-Carrier represent the classes of dogs, bitings, and mail carriers, respectively, while the nodes d, b, and m represent a particular dog, a

142


particular biting, and a particular mail carrier. A single net with no partitioning can easily represent this fact.

But now suppose that we want to represent the fact

Every dog has bitten a mail carrier.

Figure 6.4: Using Partitioned Semantic Nets

It is necessary to encode the scope of the universally quantified variable x in order to represent this variable. This can be done using partitioning as shown in Figure 6.4(b). The node stands for the assertion given above. Node g is an instance of the special class GS of general statements about the world (i.e., those with universal quantifiers). Every element to GS has at least two attributes: a form, which states the relation that is being assert one or more connections, one for each of the universally quatified variable. In this example, there is only one such variable d, which can stand for any element the class Dogs. The other two variables in the form, b and m, are understood to existentially qualified. In other words, for every dog d, there exists a biting event and a mail carrier m, such that d is the assailant of b and m is the victim.

To see how partitioning makes variable quantification explicit, consider next similar sentence:

Every dog in town has bitten the constable

The representation of this sentence is shown in Figure 6.4 (c). In this net, the node representing the victim lies outside the form of the general statement. Thus it is not viewed as an existentially quantified variable whose value may depend on the value of d. Instead it is interpreted as standing for a specific entity (in this case, a particular table), just as do other nodes in a standard, no partitioned net. Figure 6.4(d) shows how yet another similar sentences

143


Every dog has bitten every mail carrier.

should be represented. In this case, g has two links, one pointing to d, which represents dog, and one pointing to m, representing ay mail carrier.

The spaces of a partitioned semantic net are related to each other by an inclusion Searchy. For example, in Figure 6.4(d), space SI is included in space SA. Whenever search process operates in a partitioned semantic net, it can explore nodes and arcs in space from which it starts and in other spaces that contain the starting point, but it does not go downward, except in special circumstances, such as when a form arc is being traversed. So, returning to figure 6.4(d), from node d it can be determined that d must be a dog. But if we were to start at the node Dogs and search for all known instances dogs by traversing is a likes, we would not find d since it and the link to it are in space SI, which is at a lower level than space SA, which contains Dogs. This is constant, since d does not stand for a particular dog; it is merely a variable that can be initiated with a value that represents a dog.

Student Activity 6.1

Before reading next section, answer the following questions.

1. How semantic nets are useful in knowledge representation?

2. Draw semantic nets for the following facts:

a. Every player hits the ball.

b. Every dog eats a cat.

If your answers are correct, then proceed to the next section.

The Evolution into Frames

The idea of a semantic net started out simply as a way to represent labelled connections among entities. But, as we have just seen, as we expand the rage of problem-solving tasks that the representation must support, the representation itself necessarily begins to become more complex. In particular, it becomes useful to assign more structure to nodes as well as to links. Although there is no clear distinction between a semantic net and a frame system, the more structure the system has, the more likely it is to be termed a frame system. In the next section we continue our discussion of structured slot and-filler representations by describing some of the most important capabilities that frame systems offer.

Top

Frames

A frame is a collection of attributes called slots and associated values (and possibly constraints on values) that describe some entity in the world. Sometimes a frame describes an entity in some absolute sense; sometimes it represents the entity from particular point of view. A single frame taken alone is rarely useful. Instead, we build frame systems out of collections of frames that are connected to each other by virtue of the fact that the value of an attribute of one frame may be another frame. In the rest of this section, we expand on this simple definition and explore ways that frame systems can be used to encode knowledge and support reasoning.

144


Default Frames

Set theory provides a good basis for understanding frame systems. Although not all frame systems are defined this way, we do so here. In this view, each frame represents either a class (a set) or an instance (an element of a class). To see how this works, consider the frame system shown in Figure 6.5. In this example, the frames Person, Adult-Male, ML-Baseball-Player (corresponding to major league baseball players) Pitter, and ML-Baseball-Team (for major league baseball team) are all classes. The frame Pee-Wee-Reese and Brooklyn-Dodgers are instances.

The is a relation that we have been using without a precise definition is in fact the subset relation. The isa of adult males is a subset of the set of people. The set of major league baseball players is a subset of the set of adult males, and so forth. Our instance relation corresponds to the relation element of Pee Wee Reese that is an element of the set of fielders. Thus he is also an element of all of the superset of fielders, including major league baseball players and people. The transitivity of isa that we have taken for granted in our description of property inheritance follows directly from the transitivity of the subset relation.

Both the isa and instance relations have inverse attributes, where we call subclasses and all-instances. We do not bother to write them explicitly in our examples unless we need to refer to them. We assume that the frame system maintains them automatically. Either explicitly or by computing them if necessary.

Since a class represents a set, there are two kinds of attributes that can be associated with it. There are attributes about the set itself, and there are attributes that are to be inherited by each element of the set. We indicate the difference between these two by prefixing the latter with an asterisk (*). For example, consider the class ML-Baseball-Player. We have shown only two properties of it as a set: It is a subset of the set of adult males. And it has cardinality 624 (i.e., there are 624 major league baseball players). We have listed five properties that all major league baseball players have (height, bats, batting-average, team, and uniform-colour), and we have specified default values for the first three of them. By providing both kinds of slots, we allow a class both to define a set of objects and to describe a prototypical object of the set.

Sometimes, the distinction between a set and an individual instance may not be seen clearly. For example, the team Brookln-Dodgers, which we have described as a instance of the class of major league baseball teams, could be thought of as a set of players in fact, notice that the value of the slot players is a set. Suppose, instead, what we want to represent the Dodgers as a class instead of an instance. Then its instances would be the individual players. It cannot stay where it is in the isa hierarchy; it cannot be a subclass of ML-Baseball-Team, because if it were, then its elements, namely the players would also, by the transitivity of subclass, be elements of ML-Baseball-team, which is not what we want so say. We have to put it somewhere else in the isa hierarchy. For example, we could make it a subclass of major league baseball players. Then its elements, the players, are also elements of ML-Baseball-Players, Adult-Male, and Person. That is acceptable. But if we do that, we lose the ability to inherit properties of the Dodges from general information about baseball teams. We can still inherit attributes for the elements of the team, but we cannot inherit properties of the team as a whole, i.e., of the set of players. For example, we might like to know what the default size of the team is, that it has a manager, and so

145


on. The easiest way to allow for this is to go back to the idea of the Dodgers as an instance of ML-Baseball-Team, with the set of players given as a slot value.

Personisa: Mammalcardinality: 6,000,000,000*handed: Right

Adult-Maleisa: Personcardinality: 2,000,000,000*height; 5-10

ML-Baseball-Playerisa: Adult-Malecardinality: 624*height: 6-1*bats: equal to handed*batting-average: . 252*team:*uniform-color:

FielderIsa: ML-Baseball-Playercardinality: 36*batting-average .262

Johaninsance: Fielderheight: 5-10bats: Rightbatting-average: . 309team: Brooklyn-Dodgersuniform-color: Blue

ML-Baseball-Teamisa: Teamcardinality: 26*team-size: 24*manager: 24

Brooklyn-Dodgersinstance: ML-Baseball-Teamteam-size: 24manager: Leo-Durocherplayers: (Johan,Pee-Wee-Reese,…)

Figure 6.5: A Simplified Frame System

But what we have encountered here is an example of a more general problem. A class is a set, and we want to be able to talk about properties that its elements

146


possess. We want to use inheritance to infer those properties from general knowledge about the set. But a class is also an entity in itself. It may possess properties that belong not to the individual instances but rather to the class as a whole. In the case of Brooklyn-Dodgers, such properties included team size and the existence of a manager. We may even want to inherit some of these properties from a more general kind of set. For example, the Dodgers can inherit a default team size from the set of all major league baseball teams. To support this, we need to view a class as two things simultaneously: a subset (isa) of a larger class that also contains its elements and an instance (instance) of a class of sets, from which it inherits its set-level properties.

To make this distinction clear, it is useful to distinguish between regular classes,

whose elements are individual entities, and meta classes, which are special classes

whose elements are themselves classes. A class is now an element of (instance)

some class (or classes) as well as a subclass (isa) of one or more classes. A class

inherits properties from the class of which it is an instance, just as any instance does.

In addition, a class passes inheritable properties down from is super classes to its

instances.

Let’s consider an example. Figure 6.6 shows how we would represent teams as

classes using this distinction. Figure 6.7 shows a graphic view of the same classes.

The most basic met class in the class represents the set of all classes. All classes are

instance of it, either directly or through one of its subclasses. In the example, Team

in a subclass (subset) of Class and ML-Baseball-Team is a subclass of Team. The class

introduces the attribute cardinality, which is to be inherited by all instances of Class

(including itself). This makes sense that all the instances of Class are sets and all sets

have cardinality.

Team represents a subset of the set of all sets, namely those elements are sets of

players on a team. It inherits the property of having cardinality from Class. Team

introduces the attribute team-size, which all its elements possess. Notice that team-

size is like cardinality in that it measures the size of a set. But it applies to something

different cardinality applies to sets of sets and is inherited by all elements of Class.

The slot team-size applies to the element of those sets that happen to be teams.

Those elements are of individuals.

ML-Baseball-Team is also an instance of Class, since it is a set. It inherits the property

of having cardinality from the set of which it is an instance, namely Class. But it is a

subset of Team. All of its instances will have the property of having a team-size since

they are also instances of the super class Team. We have added at this level the

147


additional fact that the default team size is 24; so all instance of ML-Baseball-Team

will inherit that as well. In addition, we have added the inheritable slot manager.

Brooklyn-Dodgers in an instance of a ML-Baseball-Team. It is not an instance of Class

because its elements are individuals, not sets. Brooklyn-Dodgers is a subclass of ML-

Baseball-Player since all of its elements are also elements of that set. Since it is an

instance of a ML-Baseball-Team, it inherits the properties team-size and manager, as

well as their default values. It specifies a new attribute uniform-colour, which is to be

inherited by all of its instances (who will be individual players).

Classinstance : Classisa : Class *cardinanality :

Team istance : Classisa : Classcardinality : { the number of teams that exist}* team size : { each team has a size}

ML – Baseball – Teaminstance : Classisa : Team

cardinality : 26 { the number of baseball teams that exist}

* team-size : 24 { default 24 players on a team}* manager :

Brooklyn-Dodgersinstance : ML – Baseball – Teamisa : ML-Baseball – Playerteam-size : 24manager : Leo – Durocher* unifoirm-color Blue

Pee-Wee – Reeseinstance : Broklyn – Dodgersinstance : Fielderuniform-color: Bluebatting –average : 309

Figure 6.6 : Representing the Class of All Teams as a Metaclass

148


Figure 6.7: Classes and Metaclasses

Finally, Pee-Wee-Reese is an instance of Brooklyn-Dodgers. That makes him also, by transitivity up isa links, an instance of ML-Baseball-Player. But recall that in earlier example we also used the class Fielder, to which we attached the fact that fielders have above average batting averages. To allow that here, we simply make Pee Wee an instance of Fielder as well. He will thus inherit properties from both Brooklyn-Dodgers and from fielder, as well as from the classes above these. We need to guarantee that when multiple inheritance occurs, as it does here, that it works correctly. Specified in this case, we need to assure that batting – average gets inherited from Fielder and not from ML-Baseball-Player through Brooklyn-Dodgers.

In all the frame system we illustrate, all classes are instances of the metaclass Class. As a result, they all have the attribute cardinality out of our description of our examples, though unless there is some particular reason to include them.

Every class is a set. But not every set should be described as a class. A class describes a set of entries that share significant properties. In particular, the default information associated with a class can be used as a basis for inferring values for the properties if it’s in individual elements. So there is an advantage to representing as a class these sets for which membership serves as a basis for nonmonotonic inheritance. Typically, these are sets in which membership is not highly ephemeral. Instead, membership is on some fundamental structural or functional properties. To see the difference, consider the following sets:

People

People who are major league baseball players

People who are on my plane to New York

The first two sets can be advantageously represented as classes, with which a sub-statistical number of inheritable attributes can be associated. The last, though, is

149


different. The only properties that all the elements of that set probably share are the definition of the set itself and some other properties that follow from the definition (e.g. they are being transported from one place to another). A simple set, with some associated assertions, is adequate to represent these facts: non-monotonic inheritance is not necessary.

Other Ways of Relating Classes to Each Other

We have talked up to this point about two ways in which classes(sets) can be related to each other. Class1 can be a subset of Class2. Or, if Class2 is a metaclass, then Class1 can be an instance of Class2. But there are other ways that classes can be related to each other, corresponding to ways that sets of objects in the world can be related.

One such relationship is mutually– disjoint– with, which relates a class to one or more other classes that are guaranteed to have no elements in common with it. Another important relationship is is-covered-by, which relates class to a set of subclasses, the man of which is equal to it. If a class is covered by a set S of mutually disjoint classes, then S is called a partition of the class.

For examples of these relationships, consider the classes shown in Figure 6.7, which represent two orthogonal ways of decomposing the class of major league baseball players. Everyone is either a pitcher a catcher, or a fielder (and no one is more than one if these). In addition, everyone plays in either the Normal League or the American League but not both.

Top

Slots Exceptions

We have provided a way to describe sets of objects and individual objects, both in terms of attributes and values. Thus we have made extensive use of attributes, which we have represented as slots attached to frames. But it turns out that there are several means why we would like to be able to represent attributes explicitly and describe their properties. Some of the properties we would like to be able to represent and use in meaning include:

The classes to which the attribute can be attached, i.e. for what classes does it make sense? For example, weight makes sense for physical objects but not for conceptual ones (except in some metaphorical sense).

A value that all instances of a class must have by the definition of the class.

A default value for the attribute.

Rules for inheriting values for the attribute. The usual rule is to inherit down isa and instance links. But some attributes inherit in other ways. For example last-name inherits down the child of link.

150


ML-Baseball-Playeris-covered by : { Pitcher,Catcher, Fielder}

{ American-Leaguer , National-Leagwer}Pitcher

isa : ML-Baseball –Playermutually – disjoint with: { catcher, Fielder}

Catcherisa : ML-Baseball – Playermutually-disjoint –with : {Pitcher, Fielder}

Fielderisa : ML-Baseball Playermutually –disjoint-with : { Pitcher, Catcher}

American – Leaguerisa : ML-Baseball –Playermutually-disjoint-with { National-Leaguer }

National Leaguerisa : ML-Baseball-Playermutually-disjoint-with : {american-Leaguer}

Three-Finger-Browninstance : Pitcherinstance : National – Leaguer

Figure 6.8 : Representing Relationships among Classes

Rules for computing a value separately from inheritance. One extreme form of such a rule is a procedure written in some procedural programming language such as LISP.

An inverse attribute.

Whether the slot is single – valued or multivalued.

In order to be able to represent these attributes of attributes, we need to describe attributes (slots) as frames. These frames will be organized into an isa hierarchy, just as any other frames for attributes of slots. Before we can describe such a hierarchy in detail, we need to formalize our notion of a slot.

A slot is a relation. It maps from elements of its domain (the classes for which it makes sense) to elements of its range (its possible values). A relation is a set of ordered pair. Thus it makes sense to say that one relation (R1) is a subset of another (R2). In the case, R1 is a specification of R2, so in our terminology is a (R1, R2). Since a slot is yet the set of all slots, which we will call Slot, is a metaclass. Its instances are slots, which may have subslots.

151


Figure 6.8 and 6.9 illustrate several examples of slots represented as frames of slot metaclass. Its instances are slots (each of which is a set of ordered pairs). Associated with the metaclass are attributes that each instance(i.e. each actual slot) will inherit. Each slot, since it is a relation, has a domain and a range. We represent the domain in the slot labelled domain. We break up the representation of the range into two parts: contains logical expressions that further constrain the range to be TRUE. The advantage to breaking the description apart into these two pieces is that type checking a such cheaper than is arbitrary constraint checking, so it is useful to be able to do it separately and early during some reasoning processes.

The other slots do what you would expect from their names. If there is a value for definition, it must be propagated to all instances of the slot. If there is a value for default, that value is inherited to all instance of the slot unless there is an overriding value. The attribute transfers lists other slots from which values for this slot can be derived through inheritance. The to-compute slot contains a procedure for deriving its value. Inverse, sometimes they are not useful enough in reasoning to be worth representing. And single valued is used to mark the special cases in which the slot is a function and can have only one value.

Of course, there is a no advantage of representing these properties of slots if there is a reasoning mechanism that exploits them. In the rest of our discussion we assume for the frame system interpreter knows how to reason with all of these slots of slots as part of its built-in reasoning capability. In particular, we assume that it is capable of forming the following reasoning actions.

Consistency checking to verify that when a slot value is added to a frame

- The slot makes sense for the frame. This relies on the domain attribute of the slot.

Slot isa : Classinstance : Classdomain :range :

152


range-constraint :definition :default :trangulars-through :to-compute :inverse :single-valued :

manager

instance : Slotdomain : ML-Baseball –Teamrange : Personrange-constraint : kx (baseball-experience x,

manager)default :inverse : manager – ofsingle – valued : TRUE

Figure 6. 9 ; Representing Slots as Frames , I

My – manager instance : Slotdomain : ML-Baseball Playerrange : Personrange-constraint : kx(baseball-experience x any

manager)to-compute : kx( x,team), managersingle-valued : TRUE

Colourinstance : Slotdomain : Physical-Objectrange : Colour-Settransfer-through : top-level-part-ofvisual-salience : Highsingle-valued : FALSE

Uniform-colourinstance : Slotisa : colourdomain : team – Playerrange : Colour – Setrange-constraint : non-Pinkvisual-sailence : Highsingle-valued : FALSE

Batsinstance : Slotdomain : ML-Baseball-Playerrange : ( Left,Right, Switch)to-compute : kx x, handedsingle-valued : TRUE

Figure 6.10 : Representing Slots as Frames II

- The value is a legal value for the slot. This relies on the range and range – constraint attributes.

Maintenance of consistency between the values for slots and their inverses whenever one is updated.

153


Propagation of definition values along isa and instance links.

Inheritance of default values along isa and instance links.

Computation of a value of a slot as needed. This relies on the to-compute and transfers through attributes.

Checking that only a single value is asserted for single –valued slots. Replacing an old value by the new one when it is asserted usually does this. An alternative is to force explicit retraction of the old value and to signal a contradiction if a new value is asserted when another is already there.

There is something slightly counterintuitive about this way of defining slots. We have defined properties range – constraint and default as parts of a slot. But we then think of them as being properties of a slot associated with a particular class. For example in Figure 6.11, we listed two defaults for the batting – average slot, one associated with major league baseball players and one associated with fielders. Figure 6.10 shows how this can be represented correctly by creating a specialization of batting average that can he associated with a specialization of ML-Baseball-Player to represent the more special information that is known about the specialized class. This seems cumbersome. It is natural, though given our definition of a slot as relation. There are really two relations here, one a specialization of the other. And below we will define inheritance so that it looks for values of either the slot it is given or any of that slot’s generations.

Unfortunately, although this model of slots is simple and it is internally consisted it is not easy to see. So we introduce some notational shorthand that allows the four most important properties of a slot (domain range definition and default) to be defined implicitly by how the slot is used in the definitions of the classes in its domain. We describe the domain implicitly to be the class where the slot appears. We describe the range and any range constraints with the clause MUST BE, as the value of an inherited slot. Figure 6.12 shows an example of this notation. And we describe the definition and the default. If they are present by inserting them as the value of the slot when one appears. The two will be distinguished by perplexing a definitional value with an assts (“). We then let the underlying book keeping of the frame system create the frames to represent slots as they are needed.

Now let’s look at examples of how these slots can be used. The slots bats and my manager illustrate the use of the to-compute attribute of a slot. The variable x will be bound to the frame to which the slot is attached. We are the notation to specify the value of a slot of a frame. Specially, x, y describes the value (s) of the y slot it frame x. So we know that to compute a frame a value for any manager, it is necessary find the frame’s value for team, then find the resulting team’s manager. We have simply composed two slots to form a new one. Computing the value of the bats slots is a even simpler. Just go get the value of the hand slot.

Batting averageinstance : Slotdomain : ML-Baseball Playerrange : Numberrange-constraint : kx( 0 < x range-constraint < 1 )default : 252single-valued : TRUE

Fielder batting averageinstance : Slotisa : batting-averagedomain : Fielderrange : Numberrange-constraint : kx 9o < x,range – constraint < 1)

154


default : 262 single-valued : TRUE

Figure 6.11 Associating Defaults with Slots

ML-Baseball-Player

Bats : MUST BE (Left, Right, Switch)

Figure 6.12. A Shorthand Notation For Slot – Range Specification

The manager slots illustrate the use of a range constraint. It is stated in terms of a variable x, which is bound to the frame whose manager slot is being described. It requires that any manager be not only a person but someone with baseball experience relies on the domain-specific function baseball experience, which must be defined somewhere in the system.

The slots colour and uniform–colour illustrate the arrangements of slots in is history. The relation colour is a fairly general one that holds between physical objects colour. The attribute uniform-colour is a restricted form of colour that applies only to team players and ones that are allowed for team uniform (anything but pink). Arranging slots in a hierarchy is useful for the same reason than arranging any thing else in a hierarchy is, it supports inheritance. In this example the general slot is known to have high visual salience. The more specific slot uniform colour then tests this property, so it too is known to have high visual salience.

The slot colour also illustrates the use of the transfer-through slot, which defines a way of computing a slot’s value by retrieving it from the same slot of a related object as its example. We used transfers through to capture the fact that if you take an object and chop it up into several top level parts (in other words, parts that are not contained for each other) then they will all be the same colour. For example, the arm of a sofa is the colour as the sofa. Formally what transfers through means in this example is

John

Height : 72

Bill

Height :

Figure 6.13 Representing Slot-Values

color(x,y) top-level–part-of( z,x) color(z,y)

In addition to these domain independent slot attributes slots may have domain specific properties that support problem solving in a particular domain. Since these slots are not treated explicity by the frame system interpreter, they will be useful precisely to the extent that the domain problem solver exploits them.

Top

Slot – Values as Objects

155


We have already retired with the notion of a slot by making it an explicit object that we could make assertions about. In some scene this was not necessary. A finite relation can be completely described by listing its elements. But in practical knowledge based system one often does not have that list. So, it can be very important to be able to make assertions about the list without knowing all of its elements. Rectification gave us a way to do this.

The next step along this path is to do the same thing to a particular attribute-value (an instance of a relation) that we did to the relation itself. We can rectify it and make it an object about which assertion can be made. To see why we might want to do this let us return to the example of John and Bill’s height that we discussed in previous section. Figure 6.13 shows a frame-based representation of some of the facts. We could easily record Bill’s height if we knew it. Suppose that we do not know it. All we know is that John is taller than Bill. We need a way to make an assertion about the and its value as an object.

We could attempt to do this the same way we made slots themselves into object namely by representing them explicitly as frames. There seems little advantages to doing that in this case, though, because the main advantage of frames does not apply to slot values, frames are organized into an in hierarchy and thus support inheritance. There is no basis for such an organization of slot values. So instead we augment our value representation language to allow the value of a slot to be stated as either or both:

A value of the type required by the slot.

A logical constraint on the value. This constraint may relate the slot’s value to the values of other slots or to domain constants.

John

Height : 72 : kx ( x , height > Bill height)

Bill

Height : kx ( x, height < John height)

Figure 6.14: Representating Slot – Values with Lambda Notation

If we do this to the frames of Figure 6.13 then we get the frames of Figure 6.14. We again use the lambda notation as a way to pick up the name of the frame that is being described.

Student Activity 6.2

Before reading next section, answer the following questions.

1. Describe the database of a cricket team using frames.

2. What is the difference between frames and meta-frames?

3. Write short note on slot exceptions.

If your answers are correct, then proceed to the next section.

Top

156


Handling Uncertainties

It was implicitly assumed that a sufficient amount of reliable knowledge (facts, rules, and the like) was available to defend confident conclusions. While this form of reasoning is important, it suffers from several limitations.

It is not possible to describe many envisioned or real-world concepts; that is, it is limited in expressive power.

There is no way to express uncertain, imprecise, hypothetical or vague knowledge, or the truth or falsity of such statements.

Available inference methods are known to be inefficient.

There is no way to produce new knowledge about the world. It is only possible to add what is derivable from the axioms and theorems in the knowledge base.

In other words, strict classical logic formalisms do not provide realistic representations of the world in which we live. On the contrary, intelligent beings are continuously required to make decisions under a veil of uncertainty.

Uncertainty can arise from a variety of sources. For one thing, the information we have available may be incomplete or highly volatile. Important facts and details that have bearing on the problems at hand may be missing or may change rapidly. In addition, many of the “facts’ available may be imprecise, vague, or fuzzy Indeed, some of the available information may be contradictory or even unbelievable. However, despite these shortcomings, we human miraculously deal with uncertainties on a daily basis and usually arrive at reasonable solutions. If it were otherwise, we would not be able to cope with the continually changing situations of our world.

In this and the following chapter, we shall discuss methods with which to accurately represent and deal with different forms of inconsistency, uncertainty, possibility, and beliefs. In other words, we shall be interested in representations and inference methods related to what is known as commonsense reasoning.

Consider the following real-life situation. Timothy enjoys shopping in the mall only when the stores are not crowded. He has agreed to accompany Sue there on the following Friday evening since this is normally a time when few people shop. Before the given date, several of the larger stores announce a one time, special sale starting on that Friday evening. Timothy, fearing large crowds, now retracts the offer to accompany Sue, promising to go on some future date. On the Thursday before the sale was to commence, weather forecasts predicted heavy snow. Now, believing the weather would discourage most shoppers. Timothy once again agreed to join Sue. But, unexpectedly, on the given Friday, the forecasts proved to be false, so Timothy once again declined to go.

This anecdote illustrates how one’s belief’s can change in a dynamic environment. And, while one’s beliefs may not fluctuate as much as Timothy’s in most situations, this form of belief revision is not uncommon. Indeed it is common enough that we label it as a form of commonsense reasoning, that is, reasoning with uncertain knowledge.

Non-monotonic Reasoning

157


The logics we studied in the previous chapter are known as monotonic logics. The conclusions derived using such logics are valid deduction, and they remain so. Adding new axioms increases the amount of knowledge contained in the knowledge base. Therefore, the set of facts and inferences in such systems can only grow larger, they cannot be reduced. That is they increase monotonically. The form of reasoning performed above by Timothy, on the other hand is non-monotonic. New facts became known which contradicted and invalidated old knowledge was retracted causing other dependent knowledge to become invalid, thereby requiring further retractions. The retractions led to a shrinkage or non-monotonic growth in the knowledge at times.

More formally, let KBL be a formal first order system consisting of a knowledge base and some logic L. Then, if KB1 and KB2 are knowledge bases where

KB1 = KBLKB2 = KBL F, for some wff F, thenKB1 KB2

In other words, a first order KB system can only grow monotonically with added knowledge.

When building knowledge-based systems, it is not reasonable to expect that all the knowledge needed for a set of tasks could be acquired, validated, and loaded into the system at the outset. More typically, the initial knowledge will be incomplete, contain redundancies, inconsistencies, and other sources of uncertainty. Even if it were possible to assemble complete, valid knowledge initially, it probably would not remain valid forever, not in a continually changing environment.

In an attempt to model real-world, commonsense reasoning, researchers have proposed extensions and alternatives to traditional logics such as PL and FOPL.

The extensions accommodate different forms of uncertainty and non-monotony. In some cases, the proposed methods have been implemented. In other cases they are still topics of research. In this and the following chapter, we will examine some of the more important of these methods.

We begin in the next section with a description of truth maintenance systems (TMS), systems that have been implemented to permit a form of non-monotonic reasoning by permitting the addition of changing (even contradictory) statements to a knowledge base. This is followed in further part. By a description of other methods that accommodate non-monotonic reasoning through default assumptions for incomplete knowledge bases. The assumptions are plausible most of the time, but may have to be retracted if other conflicting facts are learned. Methods to constrain the knowledge that must be considered for a given problem are considered next. These methods also relate to non-monotonic reasoning. Section 5.4 gives a brief treatment of modal and temporal logics, which extend the expressive power of classical logics to permit representations and reasoning about necessary and possible situations, temporal, and other related situations. The chapter is concluded with a brief presentation of a relatively new method for dealing with vague and imprecise information, namely fuzzy logic and language computation.

Truth Maintenance Systems

Truth maintenance systems (also known as belief revision or revision maintenance systems) are companion components to inference systems. The main job of the TMS

158


is to maintain consistency of the knowledge being used by the problem solver and not to perform any inference functions. As such, it frees the problem solver from any concerns of consistency and allows it to concentrate on the problem solution aspects. The TMS also gives the inference component the latitude to perform non-mononic inferences. When new discoveries are made, this more recent information can displace previous conclusions that are no longer valid. In this way, the set of beliefs available to the problem solver will continue to be current and consistent.

Figure 6.15 illustrates the role played by the TMS as part of the problem solver. The inference engine (IE) solves domain problems based on its current belief set, while the TMS maintains the currently active belief set. The updating process is incremental. After each inference, information is exchanged between the two components. The IE tells the, TMS what deductions it has made. The TMS, in turn, asks questions about current beliefs and reasons for failures. It maintains a consistent set of beliefs for the IE to work with even if new knowledge is added and removed.

For example, suppose the knowledge base (KB) contained only the propositions P, P Q, and modus pones. From this, the IE would rightfully conclude Q and add this conclusion to the KB. Later, if it was learned that ~P was appropriate, it would be added to the KB resulting in a contradiction. Consequently, it would be necessary to remove P to eliminate the inconsistency. But, with P now removed, Q is no longer a justified belief. It too should be removed. This type of belief revision is the job of the TMS.

Actually, the TMS does not discard conclusions like Q as suggested. That could be wasteful, since P may again become valid, which would require that Q and facts justified by Q be rederived. Instead, the TMS maintains dependency records for all such conclusions. These records determine which sets of beliefs are current (which are to be used by the IE). Thus, Q would be removed from the current belief set by making appropriate updates to the records and not by erasing Q. Since Q would not be lost, its reservation would not be necessary if P became valid once again.

The TMS maintains complete records of reasons or justifications for beliefs. Each proposition or statement having at least one valid justification is made a part of the current belief set. Statements lacking acceptable justifications are excluded from this set. When a contradiction is discovered, the statements responsible for the contradiction are identified and an appropriate one is retracted. This in turn may result in other retractions and additions. The procedure used to perform this process is called dependency-directed backtracking. This process is described later.

159


Figure 6.15: Architecture of the problem solver with a TMS

The TMS maintains records to reflect retractions and additions so that the IE will always know its current belief set. The records are maintained in the network. The nodes in the network represent KB entries such as premises, conclusions, inference rules, and the like. Attached to the nodes are justifications that represent the inference steps from which the node was derived. Nodes in the belief set must have valid justifications. A premise is a fundamental belief that is assumed to be always true. Premises need no justifications. They form a base from which all other currently active nodes can be explained in terms of valid justifications.

There are two types of justification records maintained for nodes, support lists (SL) and conceptual dependencies (CD). SLs are the most common type. They provide the supporting justifications for nodes. The data structure used for the SL contains two lists of other dependent node names, an in-list and an out-lists. It has the form

(SL < in-list> < out-list>)

In order for a node to be active and hence, labelled as IN the belief set, its SL must have at in least one valid node 1 in its inlist, and all nodes named in its outlist, if any, must be marked OUT of the belief set. For example, a current belief set that represents Cybil as a nonflying bird (an ostrich) might have the nodes and justifications listed in Table 6.1.

Each IN – node given in Table 6.1 is part of the current belief set. Nodes n1 and n5 are premises. They have empty support lists since they do not require justifications. Node n2 the belief that Cybil can fly is out because n3, a valid node, is in the out-list of n2,

Suppose it is discovered that Cybil is not an ostrich, thereby causing n5 to be retracted (marking its status as out). n1 which depends on n5, must also be retracted. This in turn, changes the status of n2 to be a justified node. The resultant belief set is now that the bird Cybil can fly.

To represent a belief network, the symbol conventions shown in Figure 6.16 are sometimes used. The meanings of the nodes shown in the figure are (1) a premises is a true proposition requiring no justifications, (2) an assumption is a current belief that could change, (3) a datum is either a currently assumed or 1E derived belief and (4) justifications are the beliefs (nodes) supports, consisting of supporting antecedent node links and a consequent node link.

160


Table 6.1 : Example Nodes In A Dependency Network

Node Status Meaning Support list Comments

n 1 IN Cybil is a bird (SL ( ) ( )) a premise

n 2 OUT Cybil can fly (SL (n1)(n3)) unjustified belief

n 3 IN Cybil cannot fly (SL (n5)(n4)) justified belief

n 4 OUT Cybil has wings (SL ( ) ( ) ) retracted premise

n 5 IN Cybil is an Ostrich (SL ( ) ( ) ) a premise

Figure 6.16: Belief network node meanings

An example of a typical network representation is given Figure 6. 17. The nodes T. U. and W are OUT. If the node labelled P is made IN for some reason, the TMS would update the network by propagating the “INness” support provided by node P to make T and W IN

As noted earlier, when a contradiction is discovered, the TMS locates the source of the contradiction and corrects it by retracting one of the contributing sources. It does this by checking the support lists of the contradictory node and going directly to the source of the contradiction. It goes directly to the source by examining the dependency structure supporting the justification and determining the offending nodes. This is in contrast to the native backtracking approach that would search a deduction tree sequentially, node by node until the contradictory node is reached. Backtracking directly to the node causing the contradiction is known as dependency – directed backtracking (DDB). This is clearly a more efficient search strategy than chronological backtracking.

161

Justifications


Figure 6.17: Typical Fragment of a Belief Network

Top

Probabilistic Reasoning

Here we will examine methods that use probabilistic representations for all knowledge and which reason by propagating the uncertainties from evidence and assertions to conclusions. As before, the uncertainties can arise from an inability to predict outcomes due to unreliable, vague, incomplete, or inconsistent knowledge.

The probability of an uncertain event A is a measure of the degree of likelihood of occurrence of that event. The set of all possible events is called the sample space; S A probability measure is a function P(A) which maps event outcome from S into real numbers and which satisfies the following axioms of probability:

1. for any event

2. =1, a certain outcome

3. For for all (the are mutually exclusive),

From these three axioms and the rules of set theory, the basic law of probability can be derived. Of course, the axioms are not sufficient to compute the probability of an outcome. That requires an understanding of the underlying distributions that must be established through one of the following approaches:

1. Use of a theoretical argument that accurately characterizes the processes.

2. Using one’s familiarity and understanding of the basic processes to assign subjective probabilities, or

3. Collecting experimental data from which statistical estimates of the underlying distributions can be made.

162

(a)


Since much of the knowledge we deal with is uncertain in nature, a number of our beliefs must be tenuous. Our conclusions are often based on available evidence and past experience, which is often far from complete. The conclusions are, therefore, no more than educated guesses. In a great many situations it is possible to obtain only partial knowledge concerning the possible outcome of some event. But, given that knowledge, one’s ability to predict the outcome is certainly better than with no knowledge at all. We manage quite well in drawing plausible conclusions from incomplete knowledge and past experiences.

Probabilistic reasoning is sometimes used when outcomes are unpredictable. For example, when a physician examines a patient, the patient’s history, symptoms, and test results provide some, but not conclusive, evidence of possible ailments. This knowledge, together with the physician’s experience with previous patients, improves the likelihood of predicting the unknown (disease) event, but there is still much uncertainty in most diagnoses. Likewise, weather forecasters “guess” at tomorrow’s weather based on available evidence such as temperature, humidity, barometric pressure, and cloud coverage observations. The physical relationships that overrun these phenomena are not fully understood; so predictability is far from certain. Even a business manager must make decisions based on uncertain predictions when the market for a new product is considered. Many interacting factors influence the market, including the target consumer’s lifestyle, population growth, potential consumer income, the general economic climate, and many other dependent factors.

In all of the above cases, the level of confidence placed in the hypothesized conclusions is dependent on the availability of reliable knowledge and the experience of the human prognosticator. Our objective in this chapter is to describe some approaches taken in AI systems to deal with reasoning under similar types of uncertain conditions.

Bayesian Probabilistic Inference

The form of probabilistic reasoning described in this section is based on the Bayesian method introduced by the clergyman Beyes in the eighteenth century. This form of reasoning depends on the use of conditional probabilities of specified events when it is known that other events have occurred. For two events H and E with the probability the conditional probability of event H, given that event E has occurred, is defined as

(6.1)

This expression can be given a frequency interpretation by considering a random experiment which is repeated a large number of times, n. The number of occurrences of the event E, say No. , and of the joint even H and E, No. are recorded and their relative frequencies computed as

(6.2)

When n is large, the two expressions (6.2) approach the corresponding probabilities respectively, and the ratio

163


then represents the proportion of times event H occurs relative to the occurrence of E, that is, the approximate conditional occurrence of H with E.

The conditional probability of event E given the event H occurred can likewise be written as

(6.3)

Solving 6.3 for and substituting this in equation 6.1 we obtain one form of Bayes Rule

(6.4)

This equation expresses the notion that the probability of event occurring when it is known that even E occurred is the same as the probability that E occurs when it is known that H occurred, multiplied by the ratio of the probabilities of the two events H and E occurring. As an example of the use of equation 6.4, consider the problem of determining the probability that a patient has a certain disease D1, given that a symptom E was observed. We wish to find

Suppose now it is known from previous experience that the prior (unconditional) probabilities P(D1) and P(E) for randomly chosen patients are P(D1) = 0.05, and P(E) = 0.15, respectively. Also, we assume that the conditional probability of the observed symptom given that a patient has disease D1 is known from experience to be P(E| D1) = 0.95. Then, we easily determine the value of P(D1| E) as

P(D1| E) =P (E|D1)P(D1) /P(E) = (0.95 x 0.05) / 0.15 =0.32

It may be the case that the probability P(E) is difficult to obtain. If that is the case, a different form of Bayes Rule may be used. To see this, we write equation 6.4 with

substituted in place of H to obtain

Next, we divide equation 6.4 by this result to eliminate P(E) and get

(6.5)

Note that equation 6.5 has two terms that are ratios of a probability of an event to the probability of its negation, P(H | E) / P(-H | E) and P (H) / P(-H). The ratio of the probability of an event E divided by the probability of its negation is called the odds of the event and is denoted as O (E). The remaining ratio P(E | H) / P(E | -H) in equation 6.5 is known as the likelihood ratio off with respect to H. We denote this quantity by L(E|H). Using these two new terms, the odds-likelihood form of Bayes' Rule for equation 6.5 may be written as

O(H | E) = L(E | H ) . O(H)

164


This form of Bayes Rule suggests how to compute the posterior odds 0(H\E) from the prior odds on H, 0(H). That value is proportional to the likelihood L(E | H). When L(E | H) is equal to one, the knowledge that E is true has no effect on the odds of H. Values of L(E | H) less than or greater than one decrease or increase the odds correspondingly. When L(E | H) cannot be computed, estimates may still be made by an expert having some knowledge of H and E. Estimating the ratio rather than the individual probabilities appears to be easier for individuals in such cases. This is sometimes done when developing expert systems where more reliable probabilities are not available.

In the example cited above, D1 is either true or false, and P(D1 | E) is the interpretation which assigns a measure of confidence that D1 is true when it is known that E is true. There is a similarity between E, P (D1 | E) and modus pones. For example, when E is known to be true and D1 and E are known to be related, one concludes the truth of D1 with a confidence level P(D1 | E).

One might wonder if it would not be simpler to assign probabilities to as many ground atoms E1, E2. . . , Ekas possible, and compute inferred probabilities (probabilities of E� j | H and H) directly from these. The answer is that in general this is not possible. To compute an inferred probability requires knowledge of the joint distributions of the ground predicates participating in the inference. From the joint distributions, the required marginal distributions may then be computed. The distributions and computations required for that approach are, in general, much more complex than the computations involving the use of Bayes Rule.

Consider now two events A and -A which are mutually exclusive (A -A = ) and exhaustive (A-A) = S. The probability of an arbitrary event B can always be expressed as

P(B) = P(B & A) + P(B & -A) = P(B|A)P(A) + P(B|-A)P(-A)

Using this result, equation 6.4 can be written as

P(H|E) = P(E|H)P(H) / [P(E|H)P(H) + P(E|-H)P(-H)] (6.6)

Equation 6.6 can be generalized for an arbitrary number of hypotheses H i, i = 1,..., k. Thus, suppose the Hi partition the universe; that is, the Hi are mutually exclusive and exhaustive. Then for any evidence E, we have

and hence,

(6.7)

Finally, to be more realistic and to accommodate multiple sources of evidence E1, E2,. . . , Em, we generalize equation 6.7 further to obtain

165


(6.8)

If there are several plausible hypotheses and a number of evidence sources, equation 6.8 can be fairly complex to compute. This is one of the serious drawbacks of the Bayesian approach. A large number of probabilities must be known in advance in order to apply an equation such as 6.8. If there were k hypotheses, H i, and m sources of evidence Ej, then k + m prior probabilities must be known in addition to the k likelihood probabilities. The real question then is where does one obtain such a large number of reliable probabilities?

To simplify equation 6.8, it is sometimes assumed that the E, are statistically independent. In that case, the numerator and denominator probability terms

factor into

resulting in a somewhat simpler form. But, even though the computations are straight- forward, the number of probabilities required in a moderately large system can still be prohibitive, and one may be forced to simply use subjective probabilities when more reliable values are not available. Furthermore, the E j are almost never completely independent. Consequently, this assumption may introduce intolerable errors.

The formulas presented above suggest how probabilistic evidence would be combined to produce a likelihood estimate for any given hypothesis. When a number of individual hypotheses are possible and several sources of evidence are available, it may be necessary to compute two or more alternative probabilities and select among them. This may mean that none, one, or possibly more than one of the hypotheses could be chosen. Normally, the one having the largest probability of occurrence would be selected, provided no other values were close. Before accepting such a choice, however, it may be desirable to require that the value exceed some threshold to avoid selecting weakly supported hypotheses. In previous section we had described a typical system that combines similar values and chooses only those conclusions that exceed a threshold of 0.2.

Bayesian Networks

Network representations of knowledge have been used to graphically exhibit the interdependencies that exist between related pieces of knowledge. Much work has been done in this area to develop a formal syntax and semantics for such representations. When we consider associative networks and conceptual graphs. Here, however, we are more interested in network representations that depict the degrees of belief of propositions and the causal dependencies that exist between them. Inferencing in a network amounts to probabilities the probabilities of given and related information through the network to one or more conclusion nodes.

Network representations for uncertain dependencies are further motivated by observations made earlier. If we wish to represent uncertain knowledge related to a set of prepositional variables

166


x1,. . ., xn by their joint distribution P(x1, ...xn), it will require some 2n entries to store the distribution explicitly. Furthermore, a determination of any of the marginal probabilities x, requires summing P(x1,…..xn) over the remaining n – 1 variables. Clearly, the time and storage requirements for such computations quickly become impractical. Inferring with such large numbers of probabilities does not appear to model the human process either. On the contrary, humans tend to single out only a few propositions that are known to be causally linked when reasoning with uncertain beliefs. This metaphor leads quite naturally to a form of network representation.

One useful way to portray the problem domain is with a network of nodes that represent propostional variables x1, connected by arcs that represent causal influences or dependencies among the nodes. The strengths of the influences are quantified by conditional probabilities of each variable. For example, to represent causal relationships between the prepositional variables x1…….x6 as illustrated in Figure 6.18, one can write the joint probability P(x1…….x6) by inspection as a product of (chain) conditional probabilities

Figure 6.18: Example of Bayesian belief Network

Once such a network is constructed, an inference engine can use it to maintain and propagate beliefs. When new information is received, the effects can be propagated throughout the network until equilibrium probabilities are reached. Pearl (1986, 1987) has proposed simplified methods for updating networks (trees and, more generally, graphs) of this type by fusing and propagating the effects of new evidence and beliefs such that equilibrium is reached in time proportional to the longest path through the network. At equilibrium, all propositions will have consistent probability assignments. Methods for graphs are more difficult. They require the use of dummy variables to transform them to equivalent tree structures that are then easier to work with.

To use the type of probabilistic inference we have been considering, it is first necessary to assign probabilities to all basic facts in the knowledge base. This requires the definition of an appropriate sample space and the assignment of a priori and conditional probabilities. In addition, some method must be chosen to compute the combined probabilities when pooling evidence in a sequence of inference steps (such as Pearl's method)is employed. Finally, when the outcome of an inference chain

167


results in one or more proposed conclusions, the alternatives must be compared, and one or more selected on the basis of its likelihood.

Top

Use of Certainty factor

MYCIN uses measures of both belief and disbelief to represent degrees of confirmation and disconfirmation respectively in a given hypothesis. The basic measure of belief, denoted by is actually a measure of increased belief in hypothesis H due to the evidence E. This is roughly equivalent to the estimated increase in probability of over given by an expert as a result of the knowledge gained by E. A value of 0 corresponds to no increase in belief and 1 corresponds to maximum increase or absolute belief. Likewise, is a measure of the increased disbelief in hypothesis H due to evidence E. MD ranges from 0 to +1 with +1 representing maximum increase in disbelief, (total disbelief) and 0 representing no increase. In both measures, the evidence E may be absent or may be replaced with another hypothesis, This represents the increased belief in

given is true.

In an attempt to formalize the uncertainty measure in MYCIN, definitions of MB and MD have been given in terms of prior and conditional probabilities. It should be remembered, however, the actual values are often subjective probability estimates provided by a physician. We have for the definitions.

1 If

otherwise

(6.11)

1 If

otherwise

(6.12)

Note that when and E and H are independent (So = then MB = MD = 0. This would be the case if E provided no useful information.

The two measures MB and MD are combined into a single measure called the certainty factor (CF), defined by

(6.13)

Note that the value of CF ranges from –1 (certain disbelief) to +1 (certain belief). Furthermore, a value of CF = 0 will result if E neither confirms nor unconfirms H (E and H are independent).

Top

Fuzzy Logic

168


In the techniques we have discussed so far, we have not modified the mathematical underpinnings provided by set theory and logic. We have instead augmented those ideas with additional constructs provided by probability theory. In this section, we take a different approach and briefly consider what happens if we make fundamental changes to our idea of set membership and corresponding changes to our definitions of logical operations.

The motivation for fuzzy sets is provided by the need to represent such propositions as:

John is very tall.

Mary is slightly till.

Sue and Linda are close friends.

Exceptions to the rule are nearly impossible.

Most Frenchmen are not very tall.

While traditional set theory defines set membership as a Boolean predicate, fuzzy set theory allows us to represent set membership as a possibility distribution. Once set membership has been redefined in this way, it is possible to define a reasoning system based on techniques for combining distributions, the papers in the journal (Fuzzy Sets and Systems). Such reasoners have been applied in control systems for devices as diverse as trains and washing machines.

Fuzzy logic is a superset of conventional (Boolean) logic that has been extended to handle the concept of partial truth -- truth-values between "completely true" and "completely false". Dr. Lotfi Zadeh of UC/Berkeley introduced it in the 1960's as a means to model the uncertainty of natural language. Zadeh says that rather than regarding fuzzy theory as a single theory, we should regard the process of ``fuzzification'' as a methodology to generalize ANY specific theory from a crisp (discrete) to a continuous (fuzzy) form.

Fuzzy Subsets

Just as there is a strong relationship between Boolean logic and the concept of a subset, there is a similar strong relationship between fuzzy logic and fuzzy subset theory.

In classical set theory, a subset U of a set S can be defined as a mapping from the elements of S to the elements of the set {0, 1},

U: S {0, 1}

This mapping may be represented as a set of ordered pairs, with exactly one ordered pair present for each element of S. The first element of the ordered pair is an element of the set S, and the second element is an element of the set {0, 1}. The value zero is used to represent non-membership, and the value one is used to represent membership. The truth or falsity of the statement x is in U and is determined by finding the ordered pair whose first element is x.

The statement is true if the second element of the ordered pair is 1, and the statement is false if it is 0. Similarly, a fuzzy subset F of a set S can be defined as a set of ordered pairs, each with the first element from S, and the second element from

169


the interval [0,1], with exactly one ordered pair present for each element of S. This defines a mapping between elements of the set S and values in the interval [0,1]. The value zero is used to represent complete non-membership, the value one is used to represent complete membership, and values in between are used to represent intermediate DEGREES OF MEMBERSHIP. The set S is referred to as the UNIVERSE OF DISCOURSE for the fuzzy subset F. Frequently, the mapping is described as a function, the MEMBERSHIP FUNCTION of F. The degree to which the statement x is in F is true is determined by finding the ordered pair whose first element is x. The DEGREE OF TRUTH of the statement is the second element of the ordered pair.

In practice, the terms "membership function" and fuzzy subset get used interchangeably.

That's a lot of mathematical baggage, so here's an example. Let's talk about people and "tallness". In this case the set S (the universe of discourse) is the set of people. Let's define a fuzzy subset TALL, which will answer the question "to what degree is person x tall?" Zadeh describes TALL as a LINGUISTIC VARIABLE, which represents our cognitive category of "tallness". To each person in the universe of discourse, we have to assign a degree of membership in the fuzzy subset TALL. The easiest way to do this is with a membership function based on the person's height.

tall(x) = { 0, if height(x) < 5 ft.,

(height(x)-5ft.)/2ft., if 5 ft. <= height (x) <= 7 ft.,

1, if height (x) > 7 ft. }

A graph of this looks like:

Given this definition, here are some example values:

Person Height degree of tallness

------------------------------------------------------------------

170

1.0

0.5

5.0 7.0 9.0

Ta ll(x)

He ig ht (ft)


Billy 3' 2" 0.00 [I think]

Yoke 5' 5" 0.21

Drew 5' 9" 0.38

Erik 5' 10" 0.42

Mark 6' 1" 0.54

Kareem 7' 2" 1.00 [depends on who you ask]

Expressions like "A is X" can be interpreted as degrees of truth,

e.g., "Drew is TALL" = 0.38.

Note: Membership functions used in most applications almost never have as simple a shape as tall(x). At minimum, they tend to be triangles pointing up, and they can be much more complex than that. Also, the discussion characterizes membership functions as if they always are based on a single criterion, but this isn't always the case, although it is quite common. One could, for example, want to have the membership function for TALL depend on both a person's height and their age (he's tall for his age). This is perfectly legitimate, and occasionally used in practice.

It's referred to as a two-dimensional membership function, or a "fuzzy relation". It's also possible to have even more criteria, or to have the membership function depend on elements from two completely different universes of discourse.

Logic Operations

Now that we know what a statement like "X is LOW" means in fuzzy logic, how do we interpret a statement like X is LOW and Y is HIGH or (not Z is MEDIUM).

The standard definitions in fuzzy logic are:

truth (not x) = 1.0 - truth (x)

truth (x and y) = minimum (truth(x), truth(y))

truth (x or y) = maximum (truth(x), truth(y))

Some researchers in fuzzy logic have explored the use of other interpretations of the AND and OR operations, but the definition for the NOT operation seems to be safe.

Note that if you plug just the values zero and one into these definitions, you get the same truth tables as you would expect from conventional Boolean logic. This is known as the EXTENSION PRINCIPLE, which states that the classical results of Boolean logic are recovered from fuzzy logic operations when all fuzzy membership grades are restricted to the traditional set {0, 1}. This effectively establishes fuzzy subsets and logic as a true generalization of classical set theory and logic. In fact, by this reasoning all crisp (traditional) subsets are fuzzy subsets of this very special type; and there is no conflict between fuzzy and crisp methods.

171


Some examples

Assume the same definition of TALL as above, and in addition, assume that we have a fuzzy subset OLD defined by the membership function:

old (x) = { 0, if age(x) < 18 yr.

(age(x)-18 yr.)/42 yr., if 18 yr. <= age(x) <= 60 yr.

1, if age(x) > 60 yr. }

And for compactness, let

a = X is TALL and X is OLD

b = X is TALL or X is OLD

c = not (X is TALL)

Then we can compute the following values. (for height 6.0ft and age 50 years)

height age X is TALL X is OLD a b c

6.0 50 0.5 0.76 0.5 0.76 -0.5

Uses of fuzzy logic

Fuzzy logic is used directly in very few applications. The Sony Palmtop apparently uses a fuzzy logic decision tree algorithm to perform and written in (well, computer light pen) Kanji character recognition.

A fuzzy expert system

A fuzzy expert system is an expert system that uses a collection of fuzzy membership functions and rules, instead of Boolean logic, to reason about data. The rules in a fuzzy expert system are usually of a form similar to the following: if x is low and y is high then z = medium where x and y are input variables (names for know data values), z is an output variable (a name for a data value to be computed), low is a membership function (fuzzy subset) defined on x, high is a membership function defined on y, and medium is a membership function defined on z.

The antecedent (the rule's premise) describes to what degree the rule applies, while the conclusion (the rule's consequent) assigns a membership function to each of one or more output variables. Most tools for working with fuzzy expert systems allow more than one conclusion per rule. The set of rules in a fuzzy expert system is known as the rule base or knowledge base.

The general inference process proceeds in three (or four) steps.

1. Under FUZZIFICATION, the membership functions defined on the input variables are applied to their actual values, to determine the degree of truth for each rule premise.

2. Under INFERENCE, the truth-value for the premise of each rule is computed, and applied to the conclusion part of each rule. This results in one fuzzy subset to be assigned to each output variable for each rule. Usually only MIN

172


or PRODUCT are used as inference rules. In MIN inferencing, the output membership function is clipped off at a height corresponding to the rule premise's computed degree of truth (fuzzy logic AND). In PRODUCT inferencing, the output membership function is scaled by the rule premise's computed degree of truth.

3. Under COMPOSITION, all of the fuzzy subsets assigned to each output variable are combined together to form a single fuzzy subset for each output variable. Again, usually MAX or SUM are used. In MAX composition, the combined output fuzzy subset is constructed by taking the point wise maximum over all of the fuzzy subsets assigned to variable by the inference rule (fuzzy logic OR). In SUM composition, the combined output fuzzy subset is constructed by taking the point wise sum over all of the fuzzy subsets assigned to the output variable by the inference rule.

4. Finally is the (optional) DEFUZZIFICATION, which is used when it is useful to convert the fuzzy output set to a crisp number. There are more defuzzification methods than you can shake a stick at (at least 30). Two of the more common techniques are the CENTROID and MAXIMUM methods. In the CENTROID method, the crisp value of the output variable is computed by finding the variable value of the center of gravity of the membership function for the fuzzy value. In the MAXIMUM method, one of the variable values at which the fuzzy subset has its maximum truth-value is chosen as the crisp value for the output variable.

Extended Example:

Assume that the variables x, y, and z all take on values in the interval [0,10], and that the following membership functions and rules are defined:

low(t) = 1 - ( t / 10 )

high(t) = t / 10

rule 1: if x is low and y is low then z is high

rule 2: if x is low and y is high then z is low

rule 3: if x is high and y is low then z is low

rule 4: if x is high and y is high then z is high

Notice that instead of assigning a single value to the output variable z, each rule assigns an entire fuzzy subset (low or high).

Notes:

1. In this example, low(t)+high(t)=1.0 for all t. This is not required, but it is fairly common.

2. The value of t at which low(t) is maximum is the same as the value of t at which high(t) is minimum, and vice-versa. This is also not required, but fairly common.

3. The same membership functions are used for all variables. This isn’t required, and is also not common.

173


In the fuzzification subprocess, the membership functions defined on the input variables are applied to their actual values, to determine the degree of truth for each rule premise. The degree of truth for a rule’s premise is sometimes referred to as its ALPHA. If a rule's premise has a nonzero degree of truth (if the rule applies at all...) then the rule is said to FIRE. For example,

x y low(x) high(x) low(y) high(y) alpha1 alpha2 alpha3 alpha4

0.00.0 1.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0

0.03.2 1.0 0.0 0.68 0.32 0.68 0.32 0.0 0.0

0.0 6.1 1.0 0.0 0.39 0.61 0.39 0.61 0.0 0.0

0.0 10.0 1.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0

3.2 0.0 0.68 0.32 1.0 0.0 0.68 0.0 0.32 0.0

6.1 0.0 0.39 0.61 1.0 0.0 0.39 0.0 0.61 0.0

10.0 0.0 0.0 1.0 1.0 0.0 0.0 0.0 1.0 0.0

3.2 3.1 0.68 0.32 0.69 0.31 0.68 0.31 0.32 0.31

3.2 3.3 0.68 0.32 0.67 0.33 0.67 0.33 0.32 0.32

10.0 10.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 1.0

In the inference subprocess, the truth-value for the premise of each rule is computed, and applied to the conclusion part of each rule. This results in one fuzzy subset to be assigned to each output variable for each rule.

MIN and PRODUCT are two INFERENCE METHODS or INFERENCE RULES. In MIN inferencing, the output membership function is clipped off at a height corresponding to the rule premise's computed degree of truth. This corresponds to the traditional interpretation of the fuzzy logic AND operation. In PRODUCT inferencing, the output membership function is scaled by the rule premise's computed degree of truth.

For example, let's look at rule 1 for x = 0.0 and y = 3.2. As shown in the table above, the premise degree of truth works out to 0.68. For this rule, MIN inferencing will assign z the fuzzy subset defined by the membership function:

rule1(z) = { z / 10, if z <= 6.8

0.68, if z >= 6.8 }

For the same conditions, PRODUCT inferencing will assign z the fuzzy subset defined by the membership function:

rule1(z) = 0.68 * high(z)

= 0.068 * z

Note: The terminology used here is slightly nonstandard. In most texts, the term "inference method" is used to mean the combination of the things referred to separately here as "inference" and "composition." Thus you'll see such terms as "MAX-MIN inference" and "SUM-PRODUCT inference" in the literature. They are the

174


combination of MAX composition and MIN inference, or SUM composition and PRODUCT inference, respectively.

You'll also see the reverse terms "MIN-MAX" and "PRODUCT-SUM" – these mean the same things as the reverse order. It seems clearer to describe the two processes separately.

In the composition subprocess, all of the fuzzy subsets assigned to each output variable are combined together to form a single fuzzy subset for each output variable.

MAX composition and SUM composition are two COMPOSITION RULES. In MAX composition, the combined output fuzzy subset is constructed by taking the pointwise maximum over all of the fuzzy subsets assigned to the output variable by the inference rule. In SUM composition, the combined output fuzzy subset is constructed by taking the pointwise sum over all of the fuzzy subsets assigned to the output variable by the inference rule. Note that this can result in truth-values greater than one. For this reason, SUM composition is only used when it will be followed by a defuzzification method, such as the CENTROID method, that doesn't have a problem with this odd case. Otherwise SUM composition can be combined with normalization and is therefore a general-purpose method again.

For example, assume x = 0.0 and y = 3.2. MIN inferencing would assign the following four fuzzy subsets to z:

rule1(z) = { z / 10, if z <= 6.8

0.68, if z >= 6.8 }

rule2(z) = { 0.32, if z <= 6.8

1 - z / 10, if z >= 6.8 }

rule3(z) = 0.0

rule4(z) = 0.0

MAX composition would result in the fuzzy subset:

fuzzy(z) = { 0.32, if z <= 3.2

z / 10, if 3.2 <= z <= 6.8

0.68, if z >= 6.8 }

PRODUCT inferencing would assign the following four fuzzy subsets to z:

rule1(z) = 0.068 * z

rule2(z) = 0.32 - 0.032 * z

rule3(z) = 0.0

rule4(z) = 0.0

SUM composition would result in the fuzzy subset:

fuzzy(z) = 0.32 + 0.036 * z

175


Sometimes it is useful to just examine the fuzzy subsets that are the result of the composition process, but more often, this FUZZY VALUE needs to be converted to a single number -- a CRISP VALUE. This is what the defuzzification subprocess does.

Fuzzy numbers and Fuzzy arithmetic

Fuzzy numbers are fuzzy subsets of the real line. They have a peak or plateau with membership grade 1, over which the members of the universe are completely in the set. The membership function is increasing towards the peak and decreasing away from it. Fuzzy numbers are used very widely in fuzzy control applications. A typical case is the triangular fuzzy number which is

one form of the fuzzy number 7. Slope and trapezoidal functions are also used, as are exponential curves similar to Gaussian probability densities.

There are more defuzzification methods than you can shake a stick at. A couple of years ago, Mizumoto did a short paper that compared about ten defuzzification methods. Two of the more common techniques are the CENTROID and MAXIMUM methods. In the CENTROID method, the crisp value of the output variable is computed by finding the variable value of the center of gravity of the membership function for the fuzzy value. In the MAXIMUM method, one of the variable values at which the fuzzy subset has its maximum truth-value is chosen as the crisp value for the output variable. There are several variations of the MAXIMUM method that differ only in what they do when there is more than one variable value at which this maximum truth-value occurs. One of these, the AVERAGE-OF-MAXIMA method, returns the average of the variable values at which the maximum truth-value occurs.

For example, go back to our previous examples. Using MAX-MIN inferencing and AVERAGE-OF-MAXIMA defuzzification results in a crisp value of 8.4 for z. Using PRODUCT-SUM inferencing and CENTROID defuzzification results in a crisp value of 5.6 for z, as follows.

Earlier in the FAQ, we stated that all variables (including z) take on values in the range [0, 10]. To compute the centroid of the function f(x), you divide the moment of the function by the area of the function. To compute the moment of f(x), you compute the integral of x*f(x) dx, and to compute the area of f(x), you compute the

176

1.0

0.5

5.07.0 9.0


integral of f(x) dx. In this case, we would compute the area as integral from 0 to 10 of (0.32+0.036*z) dz, which is

(0.32 * 10 + 0.018*100) =

(3.2 + 1.8) =

5.0

and the moment as the integral from 0 to 10 of (0.32*z+0.036*z*z) dz, which is

(0.16 * 10 * 10 + 0.012 * 10 * 10 * 10) =

(16 + 12) =

28

Finally, the centroid is 28/5 or 5.6.

Note: Sometimes the composition and defuzzification processes are combined, taking advantage of mathematical relationships that simplify the process of computing the final output variable values.

The Mizumoto reference is probably "Improvement Methods of Fuzzy Controls", in Proceedings of the 3rd IFSA Congress, pages 60-62, 1989.

Uses of fuzzy expert systems

To date, fuzzy expert systems are the most common use of fuzzy logic. They are used in several wide-ranging fields, including:

Linear and Nonlinear Control

Pattern Recognition

Financial Systems

Operation Research

Data Analysis

Summary

In a semantic net, information is represented as a set of nodes connected to each other by a set of labelled ones, which represent relationships among the nodes.

A frame is a collection of attributes (usually called slots) and associated values (and possibly constraints on values) that describe some entity in the world.

A class describes a set of entries that share significant properties.

A slot is a relation. It maps from elements of its domain (the classes for which it makes sense) to elements of its range (its possible values).

The probability of an uncertain event A is a measure of the degree of likelihood of occurrence of that event.

177


Probabilistic reasoning depends on the use of conditional probabilities of specified events when it is known that other events have occurred.

Strict classical logic formalisms do not provide realistic representations of the world in which we live.

MYCIN use measures of both belief and disbelief to represent degrees of confirmation and disconfirmation respectively in a given hypothesis.

Traditional set theory defines set membership as a Boolean predicate, fuzzy set theory allows the representation of set membership as a possibility distribution.

Self-assessment Questions

Fill in the blanks (Solved)

1. The number of arguments of a predicate is known as its ______________.

2. _______________ can represent grammatical form of a statement.

Answers

1. Airity

2. Semantic Net

True or False (Solved)

1. A slot is always a function.

2. Fuzzy logic deals with probabilistic elements in a set.

Answers

1. False

2. True

Fill in the blanks (Unsolved)

1. Partly tried nets one issued to represent __________ expressions.

2. The _________ solves the problems based on its unsent belief.

3. _______ is a method for computing the combined probabilities while pooling evidence in a sequence of inference steps.

True or False (Unsolved)

1. Baysian network does not employ fuzzy logic.

2. A semantic net is a way to represent inferences.

178


3. Frame is a collection of facts and rules.

Detailed Questions

1. Construct Semantic Net representation for the following:

a. Pompeian (Marcus), Black smith (Marcus)

b. Mary gave the green flowered vase to her favourite cousin.

2. Suppose we want to use a semantic net to discover relationship that could help in disambiguating the word “ bank “ in the sentence

John went downtown to deposit his money in the bank.

The financial institution meaning for bank should be preferred over the riverbank meaning.

a. Construct a semantic net that representation for the relevant concepts.

b. Show how intersection search could be find the connection between the correct meaning for bank and the rest of the sentence more easily than it can find connection with the incorrect meaning.

3. Construct partitioned semantic net representation for the following:

Every batter bit a ball.

Ball the batters like the pitcher

a. Construct one consistent frame representation of all the baseball knowledge that was described in this chapter. You will need to choose between the two representations for team that we considered.

4. Modify the property inheritance algorithm of Section 6.2 to work for multiple valued attributes such as the attributes believes in principles defined as follows:

Believes – in – principles

instance : Slot

domain : Person

range : Philosophical- Principles

single-valued : FALSE

5. Define the value of a multiple valued slot S of class C to be the union of the values that are found for S and all its generalizations at C and all its generalizations. Modify your technique to allow a class to exclude specific values that are associated with one or more of its superclasses.

6. Pick a problem area and represent some knowledge about it the way we represented baseball knowledge in this chapter.

179

Ch 6 final

Documents

Transcript of Ch 6 final