Web Page Clustering Using a Fuzzy Logic Based Representation and Self-Organizing Maps

Web Page Clustering Using a Fuzzy Logic Based Representation and Self-organizing Maps. Alberto P. García-Plaza, Víctor Fresno, Raquel Martínez. NLP & IR Group, UNED. December 12, 2008

http://nlp.uned.es/~alpgarcia/pub_index.php

Transcript of Web Page Clustering Using a Fuzzy Logic Based Representation and Self-Organizing Maps

  • 1. Web Page Clustering Using a Fuzzy Logic Based Representation and Self-organizing Maps. Alberto P. García-Plaza, Víctor Fresno, Raquel Martínez. NLP & IR Group, UNED. December 12, 2008

  • 2. Table of Contents
      1 Objectives
      2 Our Approach: Extended Fuzzy Combination of Criteria (EFCC)
      3 Experiment Description
      4 Results
      5 Conclusion

  • 4. Objectives
      - Group HTML documents by content similarity.
      - Self-Organizing Maps (SOM) to organize, visualize and navigate through the collection.
      - Term weighting function taking advantage of HTML tags.
      - Combining, by means of fuzzy logic, heuristic criteria based on the inherent semantics of some HTML tags and word positions in the document.
      Hypothesis: An improvement in document representation will involve an increase in map quality.

  • 5. Table of Contents: 2 Our Approach: Extended Fuzzy Combination of Criteria (EFCC)
      1 Fuzzy Logic
      2 EFCC
      3 Linguistic Variables
      4 Knowledge Base
  • 6. Fuzzy logic
      - Capturing human expert knowledge.
      - Close to natural language.
      - Knowledge base: defined by a set of IF-THEN rules.
      - Linguistic variables: defined using natural language words and fuzzy sets. These sets allow the description of the membership degree of an object to a particular class.

  • 8-18. Extended Fuzzy Combination of Criteria [figure slides]

  • 20-25. Linguistic Variables [figure slides]

  • 27-30. Knowledge Base [figure slides]

  • 31. Table of Contents: 3 Experiment Description
      1 Dimensionality Reduction
      2 Document Map
      3 Evaluation Methods
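The transcript gives no detail on the EFCC linguistic variables or rules, so the following Python sketch only illustrates the general mechanism named on the Fuzzy logic slide: fuzzy sets that return membership degrees, and IF-THEN rules that combine them. The input names (`title`, `frequency`), the output `relevance`, the triangular set shapes, the rule table and the weighted-average defuzzification are all assumptions made for illustration, not the EFCC definitions.

```python
def triangular(x, a, b, c):
    """Membership degree of x in a triangular fuzzy set (feet a and c, peak b)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical fuzzy sets shared by both input variables, defined on [0, 1].
SETS = {
    "low": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.5),
}

# Hypothetical knowledge base:
# IF title IS <t> AND frequency IS <f> THEN relevance IS <r>.
RULES = {
    ("low", "low"): "low", ("low", "medium"): "low", ("low", "high"): "medium",
    ("medium", "low"): "low", ("medium", "medium"): "medium", ("medium", "high"): "high",
    ("high", "low"): "medium", ("high", "medium"): "high", ("high", "high"): "high",
}

# Assumed crisp value representing each output label (a simplification).
OUTPUT_VALUE = {"low": 0.0, "medium": 0.5, "high": 1.0}

def fuzzify(x):
    """Map a crisp value in [0, 1] to its membership degree in each fuzzy set."""
    return {label: triangular(x, *abc) for label, abc in SETS.items()}

def relevance(title, frequency):
    """Fire every rule (AND = min), aggregate per output label (max), then defuzzify."""
    t, f = fuzzify(title), fuzzify(frequency)
    strength = {label: 0.0 for label in OUTPUT_VALUE}
    for (t_label, f_label), out in RULES.items():
        strength[out] = max(strength[out], min(t[t_label], f[f_label]))
    total = sum(strength.values())
    if total == 0.0:
        return 0.0
    return sum(OUTPUT_VALUE[k] * s for k, s in strength.items()) / total
```

For example, `relevance(1.0, 0.25)` fires the "high/low" and "high/medium" rules with strength 0.5 each and yields 0.75 under these assumed sets.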
  • 32. Dimensionality Reduction
      - Input vector dimensions ranging from 100 to 5000.
      - Stopwords, punctuation marks, suffixes, and words occurring fewer than 50 times in the whole corpus were removed.
      - Two well-known methods:
          - Document frequency reduction.
          - Random projection method.
      - Three proposed rank-based methods:
          - Most Valued Terms.
          - Fixed reduction method.
          - More Frequent Terms until n level (MFTn).

  • 34. Document Map Construction
      - Benchmark dataset for clustering: Banksearch [1]
          - 10000 documents
          - 10 classes
      - SOM size was set equal to the number of classes of input documents, i.e. 5x2, in order to compare clustering results.
      [1] M. P. Sinka and D. W. Corne. A large benchmark dataset for web document clustering. Soft Computing Systems: Design, Management, and Applications, 2002.
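To make the document map concrete, here is a minimal pure-Python sketch of a self-organizing map with the 5x2 grid mentioned on the Document Map Construction slide. Only the grid shape comes from the slides; the Gaussian neighbourhood, the linear decay of learning rate and radius, the initialisation from random input vectors and every parameter value (`epochs`, `lr0`, `sigma0`, `seed`) are assumptions, and this is not the authors' implementation.

```python
import math
import random

def best_matching_unit(weights, x):
    """Return the grid position whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda u: sum((w - xi) ** 2 for w, xi in zip(weights[u], x)),
    )

def train_som(vectors, rows=5, cols=2, epochs=20, lr0=0.5, sigma0=1.5, seed=0):
    """Train a rows x cols SOM; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    # Initialise each unit with a copy of a randomly chosen input vector.
    weights = {
        (r, c): list(rng.choice(vectors))
        for r in range(rows)
        for c in range(cols)
    }
    steps = epochs * len(vectors)
    step = 0
    for _ in range(epochs):
        order = list(vectors)
        rng.shuffle(order)
        for x in order:
            decay = 1.0 - step / steps          # goes linearly from 1 towards 0
            lr = lr0 * decay                    # learning rate
            sigma = max(sigma0 * decay, 0.1)    # neighbourhood radius
            bmu = best_matching_unit(weights, x)
            for unit, w in weights.items():
                grid_d2 = (unit[0] - bmu[0]) ** 2 + (unit[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                # Pull the unit towards x, more strongly near the winner.
                for i, xi in enumerate(x):
                    w[i] += lr * h * (xi - w[i])
            step += 1
    return weights
```

After training, `best_matching_unit(weights, doc_vector)` gives the map unit a document falls on, which is the mapping the evaluation step relies on.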
  • 36. Evaluation Methods
      - Weighted average of the F-measure for each class.
      - After mapping the collection onto the trained map, each neuron (unit) is labelled with the class that has the greatest number of documents mapped to it.
      - All document vectors in a neuron whose class differs from the neuron label are counted as errors.

  • 38. Best reduction for each term weighting function [figure slide]

  • 39. MFTn reduction provides stability [figure slide]
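The procedure on the Evaluation Methods slide (label each unit with its majority class, count the other documents as errors, then average the per-class F-measure) can be sketched in Python as follows. The slides do not state the weights, so weighting each class by its share of the documents is an assumption, as are the function and argument names.

```python
from collections import Counter, defaultdict

def weighted_f_measure(units, classes):
    """units[i] is the map unit of document i, classes[i] its true class."""
    # Count the classes of the documents mapped to each unit.
    per_unit = defaultdict(Counter)
    for unit, cls in zip(units, classes):
        per_unit[unit][cls] += 1
    # Label each unit with its majority class.
    label = {unit: counts.most_common(1)[0][0] for unit, counts in per_unit.items()}
    predicted = [label[unit] for unit in units]

    support = Counter(classes)
    total = len(classes)
    score = 0.0
    for cls, n in support.items():
        tp = sum(1 for p, c in zip(predicted, classes) if p == cls and c == cls)
        n_pred = sum(1 for p in predicted if p == cls)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n
        f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        # Assumed weighting: proportion of documents belonging to the class.
        score += (n / total) * f
    return score
```

With units `["a", "a", "a", "b", "b", "b"]` and classes `["x", "x", "y", "y", "y", "y"]`, unit `a` is labelled `x`, one document counts as an error, and the weighted F-measure is 88/105 (about 0.838).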
  • 40. EFCC+MFTn obtains its best results with the smallest number of features [figure slide]

  • 42. Conclusion
      - An unsupervised document representation method, based on fuzzy logic, focused on clustering HTML documents by means of self-organizing maps.
      - MFTn reduction is the most stable reduction in all cases.
      - EFCC representation obtains better results using a smaller vocabulary.
      - A smaller number of features is needed to represent the input documents and SOM unit vectors, which implies an improvement in computational cost.

  • 43. Thank You!
  • 44. Related Work

      Work                                                     VSM  Topic information  Document type  Weighting function   Modifies SOM
      Self organization of a massive document collection [2]   Yes  Yes                Text           Shannon's entropy    No
      Document clustering using phrases [3]                    Yes  No                 Text           Binary, TF, TF-IDF   No
      Document clustering using WordNet [4]                    Yes  Yes                Text           ESVM, HSVM, HyM      No
      Conceptional SOM [5]                                     Yes  No                 Text           TF                   Yes

      [2] T. Kohonen, S. Kaski, K. Lagus, J. Salojärvi, J. Honkela, V. Paatero, and A. Saarela. Self organization of a massive document collection. IEEE Trans. on Neural Networks, 2000.
      [3] J. Bakus, M. Hussin, and M. Kamel. A SOM-based document clustering using phrases. In ICONIP, 2002.
      [4] C. Hung and S. Wermter. Neural network based document clustering using WordNet ontologies. Int. J. Hybrid Intell. Syst., 2004.
      [5] Y. Liu, X. Wang, and C. Wu. ConSOM: A conceptional SOM model for text clustering. Neurocomputing, 2008.