Intelligent Database Systems Lab
Presenter: Chang, Chun-Chih
Authors: CHRISTOS BOURAS, VASSILIS TSOGKAS
2012, KBS
A clustering technique for news articles using WordNet
Outline
• Motivation
• Objectives
• Methodology
• Experiments
• Conclusions
• Comments
Motivation
• Document clustering is a powerful technique that has been widely used.
• However, it suffers from problems such as synonymy, ambiguity, and the lack of descriptive content labels for the generated clusters.
Objectives
• We propose enhancing the standard k-means algorithm with external knowledge from WordNet hypernyms.
• The proposed method significantly improves k-means while also generating useful, high-quality clusters.
Methodology - Framework
Methodology - Euclidean Distance & City-Block Distance
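A minimal Python sketch of the two distances, assuming each document is a plain list of term frequencies of equal length. The vectors `doc_a` and `doc_b` are hypothetical toy data, not taken from the paper.

```python
import math


def euclidean_distance(x, y):
    """Square root of the summed squared coordinate differences (L2 distance)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def city_block_distance(x, y):
    """Sum of absolute coordinate differences (Manhattan / L1 distance)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(x, y))


# Hypothetical term-frequency vectors for two short articles
doc_a = [3, 0, 1, 2]
doc_b = [1, 1, 0, 2]
print(euclidean_distance(doc_a, doc_b))   # about 2.449, i.e. sqrt(6)
print(city_block_distance(doc_a, doc_b))  # → 4
```

City-block distance grows linearly with each coordinate difference, while Euclidean distance penalizes a single large difference more heavily.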
Methodology - Pearson Correlation Distance
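A sketch assuming the common convention d = 1 - r, where r is Pearson's correlation coefficient. The slide's own formula is not reproduced in this transcript, so treat this as the textbook form rather than the authors' code. The distance ranges from 0 (perfectly correlated) to 2 (perfectly anti-correlated).

```python
import math


def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (sx * sy)


def pearson_distance(x, y):
    """1 - r: 0 for identical trends, 2 for exactly opposite trends."""
    return 1.0 - pearson_r(x, y)


print(pearson_distance([1, 2, 3], [2, 4, 6]))  # close to 0.0
print(pearson_distance([1, 2, 3], [3, 2, 1]))  # close to 2.0
```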
Methodology - Cosine Distance
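Cosine distance is usually taken as 1 minus the cosine of the angle between two vectors, so it depends on the direction of a term vector but not its length. A minimal sketch, again assuming plain Python lists as term vectors:

```python
import math


def cosine_distance(x, y):
    """1 - cos(angle between x and y); 0 when the vectors point the same way."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine is undefined for a zero vector")
    return 1.0 - dot / (norm_x * norm_y)


print(cosine_distance([1, 0], [0, 1]))  # → 1.0 (orthogonal vectors)
print(cosine_distance([1, 2], [2, 4]))  # close to 0.0 (same direction, different length)
```

The second call shows why cosine suits text: a long article and a short one with the same term proportions are treated as near-identical.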
Methodology - Spearman-rank Distance
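Spearman-rank distance can be computed as 1 - ρ, where ρ is the Pearson correlation of the two vectors' ranks. The sketch below assumes tied values receive the average of their rank positions, a common convention; `average_ranks` is a helper name introduced here for illustration.

```python
import math


def average_ranks(values):
    """Rank values 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of equal values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # 1-based mean position of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def _pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (sx * sy)


def spearman_distance(x, y):
    """1 - Spearman's rho, computed as Pearson's r on the ranks."""
    return 1.0 - _pearson_r(average_ranks(x), average_ranks(y))


print(average_ranks([10, 20, 20, 5]))                       # → [2.0, 3.5, 3.5, 1.0]
print(spearman_distance([1, 2, 3, 4], [10, 20, 30, 1000]))  # close to 0.0
```

Because only ranks matter, the outlier 1000 in the second vector does not affect the result, unlike plain Pearson distance.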
Methodology - Kendall Distance
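One common definition of Kendall distance is 1 - τ, where τ compares the number of concordant and discordant pairs. The sketch below assumes Kendall's tau-a, which makes no correction for ties, and uses a straightforward O(n²) loop over all pairs.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs; ties count as neither."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two vectors of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


def kendall_distance(x, y):
    """1 - tau: 0 for identical orderings, 2 for reversed orderings."""
    return 1.0 - kendall_tau_a(x, y)


print(kendall_distance([1, 2, 3, 4], [1, 2, 3, 4]))  # → 0.0
print(kendall_distance([1, 2, 3, 4], [4, 3, 2, 1]))  # → 2.0
```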
Methodology - Comparison of various methods
[Figure: comparison of the Euclidean, City-Block, Cosine, Kendall, Spearman and Pearson measures]
Methodology - heuristic function
Examples:
• ‘fruit’: d = 9, f = 2, W = 0.9954
• ‘edible fruit’: d = 7, f = 1, W = 0.8915
• ‘food’: d = 5, f = 1, W = 0.6534
Methodology - Enriching news articles using WordNet hypernyms
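The enrichment step itself appears only as a figure. As a hedged illustration of the general idea, the sketch below adds the hypernyms of each article term to that article's term vector. The `HYPERNYMS` table is a hypothetical stand-in for a real WordNet lookup, and the flat `weight` of 0.5 is an assumption standing in for the heuristic function W, whose formula is not given on the slides.

```python
from collections import Counter

# Hypothetical hypernym chains (nearest first); a real system would query WordNet.
HYPERNYMS = {
    "apple": ["edible fruit", "fruit", "food"],
    "banana": ["edible fruit", "fruit", "food"],
    "bread": ["baked goods", "food"],
}


def enrich(tokens, hypernyms=HYPERNYMS, weight=0.5):
    """Return term weights: original token counts plus down-weighted hypernym counts.

    `weight` is a hypothetical constant, not the paper's heuristic W.
    """
    vector = Counter(tokens)
    for token in tokens:
        for hypernym in hypernyms.get(token, []):
            vector[hypernym] += weight
    return dict(vector)


print(enrich(["apple", "banana", "apple"]))
```

Articles about apples and bananas now share the dimensions "edible fruit", "fruit" and "food", so a distance measure can see that they are related even though they share no original words.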
Methodology - Labeling clusters using WordNet hypernyms
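A simple frequency-based sketch of hypernym labeling: each cluster is labeled with the hypernyms that occur in the most of its articles. This selection rule, the `top_n` parameter, and the hypernym table passed in are assumptions for illustration, not the paper's exact labeling procedure.

```python
from collections import Counter


def label_cluster(articles, hypernyms, top_n=2):
    """Label a cluster with its most widespread hypernyms.

    articles:  list of token lists, one per article in the cluster
    hypernyms: dict mapping a token to a list of hypernym strings (hypothetical)
    """
    counts = Counter()
    for tokens in articles:
        # Count each hypernym at most once per article
        seen = set()
        for token in tokens:
            seen.update(hypernyms.get(token, []))
        counts.update(sorted(seen))
    return [hypernym for hypernym, _ in counts.most_common(top_n)]


hyp = {"apple": ["edible fruit", "food"], "pear": ["edible fruit", "food"], "bread": ["food"]}
print(label_cluster([["apple", "bread"], ["pear"], ["bread"]], hyp))  # → ['food', 'edible fruit']
```

Counting per article rather than per token keeps one long article from dominating the label.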
Methodology - Clustering news articles using W-k means
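W-k means builds on standard k-means. For reference, here is a plain k-means (Lloyd's algorithm) sketch with Euclidean distance over dense vectors. It does not reproduce the WordNet-specific parts of W-k means, and seeding with the first k points is a simplification chosen for determinism (random or k-means++ seeding is more usual).

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's k-means; returns (assignment per point, centroids)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    centroids = [list(p) for p in points[:k]]  # simplified seeding: first k points
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        new_assignment = [
            min(range(k), key=lambda c: euclidean(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


labels, centers = kmeans([[0, 0], [0, 1], [10, 10], [10, 11]], k=2)
print(labels)  # → [0, 0, 1, 1]
```

In the enriched setting, each point would be a document's term vector after hypernyms have been added, and the Euclidean helper could be swapped for any of the distances above.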
Experiments
[Figure: results with WordNet use vs. without WordNet use]
Conclusions
• Of the many similarity measures tested, k-means with Euclidean and cosine distance produced the best results.
• We have also presented a novel algorithmic approach towards enhancing the k-means algorithm using knowledge from an external database, WordNet.
Comments
• Advantages
  - The resulting labels have high precision.
• Applications
  - News clustering
  - Cluster labeling