Intelligent Database Systems Lab Presenter : Chang,Chun-Chih Authors : CHRISTOS BOURAS, VASSILIS...

Intelligent Database Systems Presenter : Chang,Chun-Chih Authors : CHRISTOS BOURAS , VASSILIS TSOGKAS 2012, KBS A clustering technique for news articles using WordNet


Page 1:

Intelligent Database Systems Lab

Presenter : Chang,Chun-Chih

Authors : CHRISTOS BOURAS , VASSILIS TSOGKAS

2012, KBS

A clustering technique for news articles using WordNet

Page 2:

Outline

• Motivation
• Objectives
• Methodology
• Experiments
• Conclusions
• Comments

Page 3:

Motivation

• Document clustering is a powerful technique that has been widely used.

• However, it suffers from problems such as synonymy, ambiguity, and the lack of descriptive labels for the generated clusters.

Page 4:

Objectives

• We propose enhancing the standard k-means algorithm with external knowledge from WordNet hypernyms.

• The proposed method significantly improves k-means and also generates useful, high-quality cluster labels.

Page 5:

Methodology-Framework

Page 6:

Methodology - Euclidean Distance & City-block Distance
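
The formulas on this slide did not survive in the transcript. As a minimal sketch of the two measures it names, assuming each article is represented as an equal-length term-weight vector (the function names and sample vectors are illustrative, not taken from the paper):

```python
import math

def euclidean_distance(a, b):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def city_block_distance(a, b):
    """Sum of absolute coordinate differences (Manhattan distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))

# Hypothetical term-weight vectors for two articles.
doc_a = [1.0, 0.0, 2.0]
doc_b = [4.0, 4.0, 2.0]
print(euclidean_distance(doc_a, doc_b))   # 5.0
print(city_block_distance(doc_a, doc_b))  # 7.0
```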

Page 7:

Methodology - Pearson
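
The slide's own formula is not in the transcript. One common convention, assumed here, defines the Pearson distance as 1 - r, where r is the Pearson correlation coefficient of the two vectors. The function name and inputs are illustrative:

```python
import math

def pearson_distance(a, b):
    """Return 1 - r, where r is the Pearson correlation of a and b.

    Assumes both vectors have non-zero variance; a constant vector
    would make the denominator zero.
    """
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    spread_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    spread_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return 1.0 - cov / (spread_a * spread_b)

print(pearson_distance([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # close to 0
print(pearson_distance([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))  # close to 2
```

Under this convention the distance ranges from 0 (perfectly correlated) to 2 (perfectly anti-correlated).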

Page 8:

Methodology - Cosine Distance
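
As a hedged illustration (the slide's formula is missing from the transcript), cosine distance is usually taken as 1 minus the cosine of the angle between two vectors. The sketch below assumes non-zero term-weight vectors; the names are illustrative:

```python
import math

def cosine_distance(a, b):
    """Return 1 - cos(angle between a and b); both must be non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0 (orthogonal vectors)
print(cosine_distance([1.0, 1.0], [2.0, 2.0]))  # close to 0 (same direction)
```

Because only direction matters, two articles with the same term proportions but different lengths end up at a distance near zero.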

Page 9:

Methodology - Spearman-rank Distance
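
The transcript does not preserve this slide's formula. A common reading, assumed here, is 1 - rho, where rho is the Pearson correlation computed on the ranks of the values (tied values share their average rank). All names below are illustrative:

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def pearson(a, b):
    """Pearson correlation; assumes neither vector is constant."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    spread_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    spread_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (spread_a * spread_b)

def spearman_distance(a, b):
    """Return 1 - rho, the rank-based analogue of the Pearson distance."""
    return 1.0 - pearson(average_ranks(a), average_ranks(b))

print(average_ranks([5, 5, 7]))                         # [1.5, 1.5, 3.0]
print(spearman_distance([10, 20, 30], [1, 100, 1000]))  # close to 0
```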

Page 10:

Methodology - Kendall Distance
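
Several definitions of a Kendall distance are in use, and the slide's version is not preserved. The sketch below assumes 1 - tau, where tau (the tau-a variant, which ignores ties) compares concordant and discordant pairs. The function name is illustrative:

```python
from itertools import combinations

def kendall_distance(a, b):
    """Return 1 - tau, with tau = (concordant - discordant) / total pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(a)), 2):
        # A pair is concordant when both vectors order it the same way.
        product = (a[i] - a[j]) * (b[i] - b[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    total_pairs = len(a) * (len(a) - 1) // 2
    tau = (concordant - discordant) / total_pairs
    return 1.0 - tau

print(kendall_distance([1, 2, 3], [1, 2, 3]))  # 0.0 (identical ordering)
print(kendall_distance([1, 2, 3], [3, 2, 1]))  # 2.0 (reversed ordering)
```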

Page 11:

Methodology - Comparison of various methods

[Figure: comparison panels for Euclidean, City-Block, Cosine, Kendall, Spearman, and Pearson distances]

Page 12:

Methodology - heuristic function

• Example: for ‘fruit’, d = 9 and f = 2, so W = 0.9954.

• Example: for ‘edible fruit’, d = 7 and f = 1, so W = 0.8915.

• Example: for ‘food’, d = 5 and f = 1, so W = 0.6534.

Page 13:

Methodology - Enriching news articles using WordNet hypernyms

Page 14:

Methodology - Labeling clusters using WordNet hypernyms

Page 15:

Methodology - News article clustering using W-k means
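
The W-k means pseudocode from this slide is not in the transcript. The sketch below shows only the standard k-means loop that the method builds on, under the assumption that the input vectors have already been enriched with hypernyms. Starting the centroids at the first k points is a simplification made here, not something the slides state:

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iterations=10):
    """Plain k-means; centroids start at the first k points (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids

# Hypothetical 2-D vectors standing in for enriched article vectors.
articles = [[0.0, 0.0], [5.0, 5.0], [0.0, 1.0], [5.0, 6.0]]
labels, centers = kmeans(articles, k=2)
print(labels)   # [0, 1, 0, 1]
print(centers)  # [[0.0, 0.5], [5.0, 5.5]]
```

Swapping `euclidean` for another distance function is how the different measures from the earlier slides would plug into the same loop.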

Page 16:

Experiments

Page 17:

Experiments

Page 18:

Experiments

[Figure: results with WordNet use versus without WordNet use; annotated values 0.0001, 0.0010, and 1.000]

Page 19:

Experiments

Page 20:

Experiments

Page 21:

Experiments

Page 22:

Conclusions

• Of the many similarity measures that were used, applying k-means with the Euclidean and cosine measures produced the best results.

• We have also presented a novel algorithmic approach towards enhancing the k-means algorithm using knowledge from an external database, WordNet.

Page 23:

Comments

• Advantages
  - The resulting labels have high precision.

• Applications
  - News clustering
  - Cluster labeling