
Self-Organizing Hierarchical Knowledge Discovery by an ARTMAP Information Fusion System

Gail A. Carpenter, Siegfried Martens

Department of Cognitive and Neural Systems, 677 Beacon Street, Boston University

Boston, Massachusetts 02215 USA

[ gail, sig ] @cns.bu.edu

Proceedings of the International Joint Conference on Neural Networks (IJCNN’05), Montreal, July 31 - August 4, 2005

Submitted: January, 2005

Technical Report CAS/CNS TR-2005-002, Boston, MA: Boston University

Acknowledgements: This work was supported by research grants from the Air Force Office of Scientific Research (AFOSR F49620-01-1-0423), the National Geospatial-Intelligence Agency (NMA 201-01-1-2016), the National Science Foundation (NSF SBE-0354378), and the Office of Naval Research (ONR N00014-01-1-0624); and by postdoctoral fellowships from the National Geospatial-Intelligence Agency and the National Science Foundation for Siegfried Martens (NMA 501-03-1-2030 and NSF DGE-0221680).

Corresponding author: Professor Gail A. Carpenter, Department of Cognitive and Neural Systems, 677 Beacon Street, Boston University, Boston, MA 02215 USA. Tel.: +1-617-353-9483; fax: +1-617-353-7755. E-mail address: [email protected] (G. Carpenter).


Technical Report CAS/CNS TR-2005-002 IJCNN’05, Montreal 2

Self-Organizing Hierarchical Knowledge Discovery by an ARTMAP Information Fusion System

Gail A. Carpenter, Siegfried Martens
Boston University, Department of Cognitive and Neural Systems

677 Beacon Street, Boston, MA 02215 USA. [ gail, sig ] @cns.bu.edu http://cns.bu.edu/techlab/

Abstract – Classifying terrain or objects may require the resolution of conflicting information from sensors working at different times, locations, and scales, and from users with different goals and situations. Current fusion methods can help resolve such inconsistencies, as when evidence variously suggests that an object is a car, a truck, or an airplane. The methods described here define a complementary approach to the information fusion problem, considering the case where sensors and sources are both nominally inconsistent and reliable, as when evidence suggests that an object is a car, a vehicle, and man-made. Underlying relationships among classes are assumed to be unknown to the automated system or the human user. The ARTMAP self-organizing rule discovery procedure is illustrated with an image example, but is not limited to the image domain.

I. INTRODUCTION

Image fusion has been defined as “the acquisition, processing and synergistic combination of information provided by various sensors or by the same sensor in many measuring contexts” [1, p. 3]. When multiple sources provide inconsistent data, such methods are called upon to select the accurate information components. As quoted by the International Society of Information Fusion (http://www.inforfusion.org/terminology.htm): “Evaluating the reliability of different information sources is crucial when the received data reveal some inconsistencies and we have to choose among various options.” For example, independent sources might label an identified vehicle car or truck or airplane. A fusion method could address this problem by weighing the confidence and reliability of each source, merging complementary information, or gathering more data. In any case, at most one of these answers is correct.

The methods described here address a complementary and previously unexamined aspect of the information fusion problem, seeking to derive consistent knowledge from sources that are inconsistent – yet accurate. This is a problem that the human brain solves naturally. A young child who hears the family pet variously called Spot, puppy, dog, dalmatian, mammal, and animal is not only not alarmed by these conflicting labels but readily uses them to infer functional relationships. An analogous problem for information fusion methods seeks to classify the terrain and objects in an unfamiliar territory based on intelligence supplied by several reliable sources. Each source labels a portion of the region based on sensor data and observations collected at specific times and based on individual goals and interests. Across sources, a given pixel might be correctly but inconsistently labeled car, vehicle, and man-made. A human mapping analyst would, in this case, be able to apply a lifetime of experience to resolve the paradox by placing objects in a knowledge hierarchy, and a rule-based expert system could be constructed to codify this knowledge. Alternatively, an analyst could be faced with complex or unfamiliar labels, or the structure of object relationships may vary from one region to the next.

The current study shows how an ARTMAP neural network can act as a self-organizing expert system to derive hierarchical knowledge structures from nominally inconsistent training data. This ability is implicit in the network’s learning strategy, which creates one-to-many, as well as many-to-one, maps of the input space. During training, the system can learn that disparate pixels map to the output class car; but, if similar or identical pixels are later labeled vehicle or man-made, the system can associate multiple classes with a given input. During testing, distributed code activations predict multiple output class labels. A rule production algorithm uses the pattern of distributed network predictions to derive a knowledge hierarchy for the output classes. The resulting diagram of the relationships among classes can then guide the construction of consistent layered maps.

II. MULTI-CLASS PREDICTIONS BY ARTMAP NEURAL NETWORKS

While the earliest unsupervised ART [2] and supervised ARTMAP networks [3] feature winner-take-all code representations, many of the networks developed since the mid-1990s incorporate distributed code representations. Comparative analyses of these systems have led to the specification of a default ARTMAP network, which features simplicity of design and robust performance in many application domains [4]. Selection of one particular a priori algorithm is intended to facilitate technology transfer. This network, which here serves as the recognition engine of the information fusion system, uses winner-take-all coding during training and distributed coding during testing. Distributed test outputs have helped improve various methods for categorical decision-making. One such method, in a map production application, compares a baseline mapping procedure, which selects the class with the largest total output, with a procedure that enforces a priori output class probabilities and another one that selects class-specific output thresholds via validation [5]. Distributed coding supports each method, but the ultimate prediction is one output class per test input. This procedure also specifies a canonical training/testing method which partitions the area in question into four vertical or horizontal strips. A given simulation takes training pixels from two of these strips; uses the validation strip to choose parameters, if necessary; and tests on the fourth strip. Methods are thus compared with training and test sets that are not only disjoint but drawn from geographically separate locations. This separation tests for generalization to new regions, where output class distributions could typically be far from those of the training and validation sets.

Fig. 1. Testbed Boston image for ARTMAP information fusion methods, in grey-scale representation of preprocessed inputs. The city of Revere is at the center, surrounded by (clockwise from lower right) portions of Winthrop, East Boston, Chelsea, Everett, Malden, Melrose, Saugus, and Lynn. Logan Airport runways and Boston Harbor are at the lower center, with Revere Beach and the Atlantic Ocean at the right. The Saugus and Pines Rivers meet in the upper right, and the Chelsea River is in the lower left of the image. Dimensions: 360 x 600 pixels (15 m resolution) ≅ 5.4 km x 9 km. The image is divided into four vertical strips: two for training, one for validation (if needed), and one for testing. This protocol produces geographically distinct training and testing areas, to assess regional generalization. Typically, class label distributions vary substantially across strips.
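The four-strip protocol can be sketched in a few lines. This is an illustrative sketch, not the authors' code: the function and variable names are hypothetical, with dimensions matching the 360 x 600 Boston testbed.

```python
def vertical_strips(width, n_strips=4):
    """Partition image columns [0, width) into n_strips equal (x_min, x_max) ranges."""
    edges = [round(i * width / n_strips) for i in range(n_strips + 1)]
    return [(edges[i], edges[i + 1]) for i in range(n_strips)]

# Boston testbed: 360 columns -> four 90-pixel-wide vertical strips.
strips = vertical_strips(360)
train_strips = [strips[0], strips[2]]  # e.g., two strips supply training pixels
valid_strip = strips[1]                # one strip for validation, if needed
test_strip = strips[3]                 # the remaining strip for testing
```

Because training and test pixels come from disjoint column ranges, evaluation on the test strip measures generalization to a geographically separate region.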

The information fusion techniques developed in the current study modify the baseline mapping procedure by allowing the system to predict more than one output class during testing. A given test pixel either predicts the N classes receiving the largest net system outputs or predicts all classes whose net output exceeds a designated threshold Γ. A preliminary version of the ARTMAP information fusion system [6] chose a global selection parameter N or Γ based on analysis of the validation strip. This method succeeds when most validation and test items share a common number of correct output classes. The preferable procedure used here allows each test exemplar to choose its own number N of output class predictions. This per-pixel filtering method thus does not rely on the strong assumption that the correct number of output classes per item is approximately uniform across the test set.
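The two global selection schemes (top-N and threshold Γ) can be sketched as follows. The signal values and class names are hypothetical, and `signals` stands in for the net outputs σk of a trained network.

```python
def predict_top_n(signals, n):
    """Predict the n classes receiving the largest net outputs."""
    ranked = sorted(signals, key=signals.get, reverse=True)
    return set(ranked[:n])

def predict_above_gamma(signals, gamma):
    """Predict every class whose net output exceeds the threshold gamma."""
    return {k for k, s in signals.items() if s > gamma}

# Hypothetical net outputs for one test pixel:
signals = {"car": 0.81, "vehicle": 0.78, "man-made": 0.74, "road": 0.12}
predict_top_n(signals, 3)          # {'car', 'vehicle', 'man-made'}
predict_above_gamma(signals, 0.5)  # the same three classes here
```

Both schemes share the weakness noted above: a single global N or Γ only works well when most items have about the same number of correct classes.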

An image testbed demonstrates the robustness of the ARTMAP information fusion procedure. This example was derived from Landsat 7 Thematic Mapper (TM) data acquired on the morning of January 1, 2001 by the Earth Resources Observation System (EROS) Data Center, U.S. Geological Survey, Sioux Falls, SD (http://edc.usgs.gov). The area includes portions of northeast Boston and suburbs (Fig. 1), and encompasses mixed urban, suburban, industrial, water, and park spaces. Ground truth pixels are labeled ocean, ice, river, beach, park, road, residential, industrial, water, open space, built-up, natural, and man-made. During training, ARTMAP is given no information about relationships among the target classes.


III. DERIVING A KNOWLEDGE HIERARCHY FROM A TRAINED NETWORK: PREDICTIONS, RULES, AND GRAPHS

The ARTMAP fusion system provides a canonical procedure for assigning to each input an arbitrary number of output classes in a supervised learning setting. Information implicit in the distributed predictions of an ARTMAP network trained with prescribed protocols [7] can be used to generate a hierarchy of output class relationships. To accomplish this, each test pixel first produces a set of output class predictions. The resulting list of test predictions determines a list of rules x ⇒ y which define relationships between pairs of output classes, with each rule carrying a confidence value. The rules are then used to assign classes to levels, with rule antecedents x at lower levels and consequents y at higher levels. Classes connected by arrows that codify the list of rules and confidence values form a graphical representation of the knowledge hierarchy, as follows.

A. Predictions

A critical aspect of the default ARTMAP network (Fig. 2) is the distributed nature of its internal code representation, which produces continuous-valued predictions across output classes during testing. In response to a test input, distributed activations in the default ARTMAP coding field send a net signal σk to each output class k. A winner-take-all method predicts the single output class k = K receiving the largest signal σk. Alternatively, a single test input can predict multiple output classes. The per-pixel filtering method employed here allows the output activation pattern produced by each test pixel to determine the number of predicted classes. Namely, if the net signals σk projecting to the output classes k are arranged from largest to smallest, the system predicts all the classes up to the point of maximum decrease in the signal size from one class to the next. This strategy is motivated by the behavior of a hypothetical system that accurately represents all the output classes. In such a system, if a pixel should predict three classes (e.g., road, pavement, man-made), then the output signals σk to each of these classes would typically be large compared to those of the remaining classes. The maximum decrease in size would then occur between the third and fourth largest signal, and the per-pixel filtering method would predict three classes.
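The per-pixel filtering rule just described reduces to finding the maximum drop in the sorted signal list. A minimal sketch, assuming `signals` maps class names to hypothetical net outputs σk:

```python
def per_pixel_predictions(signals):
    """Predict all classes up to the point of maximum decrease in sorted signal size."""
    ranked = sorted(signals.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) < 2:
        return {k for k, _ in ranked}
    values = [v for _, v in ranked]
    gaps = [values[i] - values[i + 1] for i in range(len(values) - 1)]
    cut = gaps.index(max(gaps)) + 1    # cut just after the largest drop
    return {k for k, _ in ranked[:cut]}

# The largest drop falls between the third and fourth signals,
# so three classes are predicted, as in the road/pavement/man-made example.
signals = {"road": 0.90, "pavement": 0.85, "man-made": 0.80,
           "park": 0.10, "water": 0.05}
per_pixel_predictions(signals)   # {'road', 'pavement', 'man-made'}
```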

Fig. 2. Default ARTMAP notation: An M-dimensional feature vector a is complement coded to form the 2M-D ARTMAP input A. Vector y represents a winner-take-all code during training, when a single category node (j = J) is active, and a distributed code during testing. With fast learning, bottom-up weights wij equal top-down weights wji, and the weight vector wj represents their common values. When a coding node j is first selected during training, it is connected to the output class k of the current input (Wjk = 1). During testing, a distributed code y produces predictions σk distributed across output classes. In all simulations reported here, the baseline vigilance matching parameter ρ = 0. [4]


Fig. 3. For the Boston example, the ARTMAP fusion system correctly produces all class rules and levels.

B. Rules

Once each test pixel has produced a set of output class predictions {x, y, …} from its distributed signals σk, according to the per-pixel selection method, the list of multi-valued test set predictions is then used to deduce a list of output class implications of the form x ⇒ y, each carrying a confidence value C%. This rule creation method is related to the Apriori algorithm in the association rule literature [8, 9].

The five steps listed below produce the list of rules that label class relationships. The algorithm employs an equivalence parameter e% and a minimum confidence parameter c%. Rules with low confidence (C < c) are ignored, with one exception: if all rules that include a given class have confidence below c, then the list retains the rule derived from the pair predicted by the largest number of pixels. Although this “no extinction” clause may produce low-confidence rules, these may occasionally correspond to cases that are rare but important. The user can easily take these exceptions under advisement, since the summary graph displays each confidence value. Two classes x and y are treated as equivalent (x ≡ y) if both rules x ⇒ y and y ⇒ x hold with confidence greater than e. In this case, the class predicted by fewer pixels is ignored in subsequent computations, but equivalent classes are displayed as a single node on the final rule summary graph.

Reasonable default values set the equivalence parameter e in the range 90-95% and the minimum confidence parameter c in the range 50-70%. In all simulations reported here, parameter values were set a priori to e = 90% and c = 50%. Alternatively, e and c may be chosen by validation.

Step 1: List the number of test set pixels #(x) predicting each output class x. Order this list from the classes with the fewest predictions to the classes with the most.

Step 2: List the number of test set pixels #(x & y) simultaneously predicting each pair of distinct output classes. Omit pairs with no such pixels. Order the list so that #(x) ≤ #(y): classes x observe the order established in Step 1; and for each such class x, classes y observe the same order.

Step 3: Identify equivalent classes, where x ≡ y if [#(x & y) / #(y)] ≥ e%. Remove from the list all class pairs that include x (where #(x) ≤ #(y), as in Step 2).

Step 4: Each pair remaining on the list produces a rule x ⇒ y with confidence C% = [#(x & y) / #(x)]. If Step 3 determined that x ≡ y, record the confidence C ≥ e of each rule in the pair { x ⇒ y, y ⇒ x }.

Step 5: Remove from the list all rules with confidence C < c. Exception (no extinction): If all rules that include a given class have confidence below the minimum confidence c, then retain the rule x ⇒ y with maximal #(x & y) pixels.
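The five steps above can be sketched as a single function. This is a simplified illustration under stated assumptions: `predictions` is the list of per-pixel class sets produced by per-pixel filtering; e and c are the equivalence and minimum-confidence parameters in percent; an equivalent pair is handled by simply dropping its less-predicted member; and the class names in the demonstration are hypothetical.

```python
from collections import Counter
from itertools import combinations

def extract_rules(predictions, e=90.0, c=50.0):
    """Steps 1-5: turn multi-class pixel predictions into rules {(x, y): C%}."""
    single, pair = Counter(), Counter()
    for classes in predictions:                    # Steps 1-2: count classes
        single.update(classes)                     # and co-predicted pairs
        for a, b in combinations(sorted(classes), 2):
            pair[(a, b)] += 1
    ordered = {}                                   # order each pair so #(x) <= #(y)
    for (a, b), n in pair.items():
        x, y = (a, b) if single[a] <= single[b] else (b, a)
        ordered[(x, y)] = n
    # Step 3: x ≡ y equivalence; drop the less-predicted member
    dropped = {x for (x, y), n in ordered.items() if 100.0 * n / single[y] >= e}
    live = {(x, y): n for (x, y), n in ordered.items()
            if x not in dropped and y not in dropped}
    # Step 4: confidence C% = #(x & y) / #(x)
    rules = {(x, y): 100.0 * n / single[x] for (x, y), n in live.items()}
    # Step 5: minimum confidence, with the "no extinction" exception
    kept = {r: C for r, C in rules.items() if C >= c}
    for cls in single:
        if cls not in dropped and not any(cls in r for r in kept):
            cands = [r for r in rules if cls in r]
            if cands:
                best = max(cands, key=lambda r: live[r])
                kept[best] = rules[best]
    return kept

# Hypothetical pixel predictions echoing the car/vehicle/man-made example:
pixels = ([{"car", "vehicle", "man-made"}] * 10
          + [{"truck", "vehicle", "man-made"}] * 5
          + [{"building", "man-made"}] * 8)
rules = extract_rules(pixels)    # e.g., rules[("car", "vehicle")] == 100.0
```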

C. Graphs

A directed graph summarizes the list of implication rules. These rules suggest a natural hierarchy among output classes, with antecedents sitting below consequents. For each rule x ⇒ y, class x is located at a lower level of the hierarchy than class y, according to the iterative algorithm below. Once each class is situated on its level, a listed rule x ⇒ y produces an arrow from x to y. Each rule’s confidence is indicated on the arrow, with lower-confidence rules (say C < 90%) having dashed arrows. For arrows with no displayed confidence values, C = 100%.

The following procedure assigns each output class to alevel.


Top Level: Items that appear only as consequents y.

Level 1: Classes that do not appear as consequents in any rule. Remove from the list all rules x ⇒ y where x is in Level 1.

Next Level: Classes that do not appear as consequents in any remaining rule. Remove from the list all rules x ⇒ y where x is in this level.

Iterate: Repeat until all rules have been removed from thelist.

Note that Level 1 includes classes that do not appear inany rule as well as those that appear only as antecedents.
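The level-assignment iteration above can be sketched as follows. This is a simplified sketch: it places each class in the first round where it no longer appears as a consequent, so classes appearing only as consequents end up on the highest levels; the rule list and class names are hypothetical.

```python
def assign_levels(classes, rules):
    """Assign each class a level (1 = bottom) from (antecedent, consequent) rules."""
    levels, remaining, unplaced = {}, set(rules), set(classes)
    level = 1
    while unplaced:
        consequents = {y for _, y in remaining}
        current = {cls for cls in unplaced if cls not in consequents}
        if not current:              # guard against cyclic rule lists
            current = set(unplaced)
        for cls in current:
            levels[cls] = level
        remaining = {(x, y) for (x, y) in remaining if x not in current}
        unplaced -= current
        level += 1
    return levels

# Hypothetical rule list: car => vehicle => man-made, etc.
rules = [("car", "vehicle"), ("truck", "vehicle"),
         ("vehicle", "man-made"), ("building", "man-made")]
classes = {"car", "truck", "vehicle", "building", "man-made"}
levels = assign_levels(classes, rules)
# car, truck, building -> Level 1; vehicle -> Level 2; man-made -> Level 3
```

Classes that appear in no rule at all also land in Level 1, matching the note above.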

The graph in Fig. 3 depicts the implication rules, hierarchy levels, and confidence values derived for the Boston example. ARTMAP information fusion has placed each class at its correct level and has discovered all the correct rules.

IV. CONCLUSION

The ARTMAP neural network produces one-to-many mappings from input vectors to output classes, as well as the more traditional many-to-one mappings, as the normal product of its supervised learning laws. During training, a given input may learn associations to more than one output class. Some of these associations could be erroneous: when different observers label an image dog, coyote, or wolf, at most one of these classes is correct. Inconsistent data may, however, be completely correct, as when observers variously label the image wolf, mammal, and carnivore. By resolving such paradoxes during everyday knowledge acquisition, humans naturally infer complex, hierarchical relationships among classes without explicit specification of the rules underlying these relationships. One-to-many learning allows the ARTMAP information fusion system to associate any number of output classes with each input. Although inter-class information is not given with the training inputs, the system readily derives knowledge of the rules, confidence estimates, and multi-class hierarchical relationships from patterns of distributed test predictions.

The Boston image testbed example demonstrates how ARTMAP information fusion resolves apparent contradictions in input pixel labels by assigning output classes to levels in a knowledge hierarchy. This methodology is not, however, limited to the image domain illustrated here, and could be applied, for example, to infer patterns of drug resistance or to improve marketing suggestions to individual consumers. One such pilot study has created a hypothetical set of relationships among protease inhibitors, based on resistance patterns from genome sequences of HIV patients.


REFERENCES

[1] Simone, G., Farina, A., Morabito, F.C., Serpico, S.B., & Bruzzone, L. (2002). Image fusion techniques for remote sensing applications. Information Fusion, 3, 3-15.

[2] Carpenter, G.A. & Grossberg, S. (1987). A massively parallel architecture for a self-organizing neural pattern recognition machine. Computer Vision, Graphics, and Image Processing, 37, 54-115.

[3] Carpenter, G.A., Grossberg, S., & Reynolds, J.H. (1991). ARTMAP: Supervised real-time learning and classification of nonstationary data by a self-organizing neural network. Neural Networks, 4(5), 565-588.

[4] Carpenter, G.A. (2003). Default ARTMAP. In Proceedings of the international joint conference on neural networks (IJCNN’03), Portland, Oregon (pp. 1396-1401).

[5] Parsons, O. & Carpenter, G.A. (2003). ARTMAP neural networks for information fusion and data mining: map production and target recognition methodologies. Neural Networks, 16(7), 1075-1089.

[6] Carpenter, G.A., Martens, S., & Ogas, O.J. (2004). Self-organizing hierarchical knowledge discovery by an ARTMAP image fusion system. In Proceedings of the 7th international conference on information fusion, Stockholm, Sweden (pp. 235-242).

[7] Carpenter, G.A., Martens, S., & Ogas, O.J. (2005). Self-organizing information fusion and hierarchical knowledge discovery: a new framework using ARTMAP neural networks. Neural Networks, 18.

[8] Agrawal, R., Imielinski, T., & Swami, A. (1993). Mining association rules between sets of items in large databases. In Proceedings of the international conference on management of data (ACM SIGMOD), Washington, DC (pp. 207-216).

[9] Agrawal, R. & Srikant, R. (1994). Fast algorithms for mining association rules. In Proceedings of the 20th international conference on very large data bases (VLDB), Santiago, Chile (pp. 487-499).