Drobics, M. 2001: Data Mining Using Synergies Between Self-Organizing Maps and Inductive Learning of Fuzzy Rules

Data Mining Using Synergies Between Self-Organizing Maps and Inductive Learning of Fuzzy Rules

MARIO DROBICS, ULRICH BODENHOFER, WERNER WINIWARTER
Software Competence Center Hagenberg, A-4232 Hagenberg, Austria
{mario.drobics, ulrich.bodenhofer, werner.winiwarter}@scch.at

ERICH PETER KLEMENT
Fuzzy Logic Laboratorium Linz-Hagenberg, Johannes Kepler Universität Linz, A-4040 Linz, Austria
[email protected]

Abstract

Identifying structures in large data sets raises a number of problems. On the one hand, many methods cannot be applied to larger data sets, while, on the other hand, the results are often hard to interpret. We address these problems with a novel three-stage approach. First, we compute a small representation of the input data using a self-organizing map. This reduces the amount of data and allows us to create two-dimensional plots of the data. Then we use this preprocessed information to identify clusters of similarity. Finally, inductive learning methods are applied to generate sets of fuzzy descriptions of these clusters. This approach is applied to three case studies, including image data and real-world data sets. The results illustrate the generality and intuitiveness of the proposed method.

1. Introduction

In this paper, we address the problem of identifying and describing regions of interest in large data sets. Decision trees [17] or other inductive learning methods, which are often used for this task, are not always sufficient, or not even applicable. Clustering methods [2], on the other hand, do not offer sufficient insight into the data structure, as their results are often hard to display and interpret. This contribution is devoted to a novel three-stage fault-tolerant approach to data mining which is able to produce qualitative information from very large data sets.
In the first stage, self-organizing maps (SOMs) [13] are used to compress the data to a reasonable number of nodes which still contain all significant information, while eliminating possible data faults such as noise, outliers, and missing values. While other methods may be used for data reduction as well [7], SOMs have the advantage that they preserve the topology of the data space and allow the results to be displayed in two-dimensional plots [6].

The second step is concerned with identifying significant regions of interest within the SOM nodes using a modified fuzzy c-means algorithm [3, 10]. In our variant, the two main drawbacks of the standard fuzzy c-means algorithm are resolved in a very simple and elegant way. Firstly, the clustering is no longer influenced by outliers, as they have been eliminated by the SOM in the pre-processing step. Secondly, initialization is handled by first computing a crisp Ward clustering [2] to find initial values for the cluster centers.

In the third and last stage, we create linguistic descriptions of the centers of these clusters which help the analyst to interpret the results. As the number of data samples under consideration has been reduced tremendously by using the SOM nodes, we are able to apply inductive learning methods [16] to find fuzzy descriptions of the clusters. Using the previously found clusters, we can make use of supervised learning methods in an unsupervised environment by considering the cluster membership as the goal parameter. Descriptions are composed using fuzzy predicates of the form "x is/is not/is at least/is at most A", where x is the parameter under consideration and A is a linguistic expression modeled by a fuzzy set. We applied this method to several real-world data sets.

0-7803-7078-3/01/$10.00 © 2001 IEEE.
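The second-stage clustering described above can be sketched as follows. This is a minimal sketch, not the authors' exact variant: SciPy's Ward linkage stands in for the crisp Ward clustering used for initialization, and the standard fuzzy c-means membership/center updates stand in for the paper's modified algorithm. All function names and parameters are illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def ward_init(nodes, c):
    """Crisp Ward clustering to seed the fuzzy c-means centers."""
    labels = fcluster(linkage(nodes, method="ward"), t=c, criterion="maxclust")
    return np.array([nodes[labels == k].mean(axis=0) for k in range(1, c + 1)])

def fuzzy_c_means(nodes, c, m=2.0, n_iter=100, tol=1e-6):
    """Fuzzy c-means on the SOM nodes, initialized with Ward centers.

    nodes: (K, n) array of SOM codebook vectors; c: number of clusters;
    m: fuzzifier (> 1). Returns (centers, memberships)."""
    centers = ward_init(nodes, c)
    for _ in range(n_iter):
        # distance of every node to every center, shape (K, c)
        d = np.linalg.norm(nodes[:, None, :] - centers[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)                # guard against division by zero
        # membership update: u_ik proportional to d_ik^(-2/(m-1)), normalized
        u = d ** (-2.0 / (m - 1.0))
        u /= u.sum(axis=1, keepdims=True)
        # center update: fuzzy weighted mean of the nodes
        um = u.T ** m                           # shape (c, K)
        new_centers = um @ nodes / um.sum(axis=1, keepdims=True)
        if np.linalg.norm(new_centers - centers) < tol:
            centers = new_centers
            break
        centers = new_centers
    return centers, u
```

Because the input here is the (already outlier-free) set of SOM nodes rather than the raw data, K is small and the Ward linkage step is cheap.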
Through the combination of the two-dimensional projection of the data using the SOM and the generated fuzzy descriptions, the results are not only significant and accurate, but also very intuitive.
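The four predicate forms "x is/is not/is at least/is at most A" can be realized on a discretized domain as follows. This is a hedged sketch of one common construction, not necessarily the authors' exact operators: "at least A" is taken as the running supremum of the membership function from the left, and "at most A" from the right; `trapezoid` and the predicate helpers are illustrative names.

```python
import numpy as np

def trapezoid(x, a, b, c, d):
    """Trapezoidal fuzzy set: rises on [a, b], is 1 on [b, c], falls on [c, d]."""
    x = np.asarray(x, dtype=float)
    return np.clip(np.minimum((x - a) / (b - a), (d - x) / (d - c)), 0.0, 1.0)

def is_(mu):
    """'x is A': the membership degree itself."""
    return mu

def is_not(mu):
    """'x is not A': standard fuzzy negation."""
    return 1.0 - mu

def at_least(mu):
    """'x is at least A': sup of mu over all y <= x (running max, sorted grid)."""
    return np.maximum.accumulate(mu)

def at_most(mu):
    """'x is at most A': sup of mu over all y >= x (running max from the right)."""
    return np.maximum.accumulate(mu[::-1])[::-1]
```

On an ordered grid, `at_least(mu)` is by construction non-decreasing and `at_most(mu)` non-increasing, which matches the intuitive reading of the two ordering-based predicates.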



2. Preprocessing

To reduce computational effort without losing significant information, we use self-organizing maps to create a mapping of the data space. Self-organizing maps use unsupervised competitive learning to create a topology-preserving map of the data. Let us assume that we are given a data set I consisting of K samples, I = {x_1, ..., x_K}, from an n-dimensional real space ℝ^n.
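The SOM mapping introduced above can be sketched as follows, as a minimal sketch only: it assumes a rectangular grid, a Gaussian neighborhood, sequential (online) training, and linear decay schedules; the paper does not fix these choices, and the function name `train_som` and its parameters are illustrative.

```python
import numpy as np

def train_som(data, rows, cols, n_iter=2000, lr0=0.5, sigma0=None, seed=0):
    """Minimal online SOM. Returns the (rows*cols, n) codebook whose
    vectors summarize the K input samples while preserving topology."""
    rng = np.random.default_rng(seed)
    K, n = data.shape
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # grid coordinates of each node, used by the neighborhood function
    grid = np.array([(i, j) for i in range(rows) for j in range(cols)], float)
    # initialize codebook vectors with random samples drawn from the data
    w = data[rng.integers(0, K, rows * cols)].astype(float)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                  # decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 1e-3     # shrinking neighborhood radius
        x = data[rng.integers(0, K)]             # pick one sample at random
        bmu = np.argmin(((w - x) ** 2).sum(axis=1))   # best-matching unit
        # Gaussian neighborhood around the BMU on the 2-D grid
        h = np.exp(-((grid - grid[bmu]) ** 2).sum(axis=1) / (2.0 * sigma ** 2))
        # move each node toward the sample, weighted by its neighborhood degree
        w += lr * h[:, None] * (x - w)
    return w
```

After training, each of the K samples is represented by its nearest codebook vector, so the later clustering and rule-learning stages operate on rows*cols nodes instead of K raw samples.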