The Google Similarity Distance
Intelligent Database Systems Lab
National Yunlin University of Science and Technology
Presenter: Chien-Hsing Chen
Authors: Rudi L. Cilibrasi and Paul M. B. Vitanyi
IEEE Transactions on Knowledge and Data Engineering (TKDE), 2007
Outline
Motivation, Objective, NGD, Experiments, Conclusions, Personal Opinion
Motivation
Designing structures capable of manipulating knowledge is very costly.
So is having knowledgeable human experts enter high-quality content into these structures.
Such efforts are long-running and large-scale.
Objective
The authors develop a method that uses only the names of objects to obtain knowledge about the similarity of those objects.
By contrast, regular FCA (formal concept analysis), as used in ontology work, derives similarity from objects and their attributes.
The Google Similarity Distance
Kolmogorov complexity
The Google Similarity Distance
NGD(horse, rider) = 0.443
"horse": 46,700,000 pages
"rider": 12,200,000 pages
"horse, rider": 2,630,000 pages
N = 8,058,044,651 indexed pages
Other examples:
NGD(pensi, cola) = 0.797
NGD(賓拉登, 攻擊) = 0.64 (Chinese for "bin Laden" and "attack")
NGD(horse, rider) = 0.898
NGD(book, drink) = 0.694
NGD(web, network) = 0.2768
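The value 0.443 for horse and rider can be reproduced from the page counts above. The slides do not show the formula; the sketch below uses the definition from the paper, NGD(x, y) = (max{log f(x), log f(y)} - log f(x, y)) / (log N - min{log f(x), log f(y)}), where f counts the pages containing the given term(s). The function and parameter names are illustrative choices, not taken from the slides, and all counts are assumed to be positive.

```python
import math

def ngd(fx: int, fy: int, fxy: int, n: int) -> float:
    """Normalized Google Distance computed from page counts.

    fx, fy: pages containing each term on its own
    fxy:    pages containing both terms
    n:      total number of indexed pages
    All counts are assumed to be positive (log(0) is not handled here).
    """
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))

# Page counts quoted on the slide for "horse" and "rider".
horse_rider = ngd(46_700_000, 12_200_000, 2_630_000, 8_058_044_651)
print(round(horse_rider, 3))  # 0.443
```

The base of the logarithm cancels in the ratio, so natural logarithms give the same result as base 2.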
Applications and Experiments
Hierarchical Clustering
Given a set of objects in a space provided with a distance measure, the distance matrix has as entries the pairwise distances between the objects.
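To make the role of the distance matrix concrete, here is a minimal sketch that builds a hierarchy from a matrix of pairwise distances. It uses plain single-linkage agglomerative clustering as a stand-in chosen for brevity; it is not claimed to be the clustering method used in the paper. The object names and distance values are hypothetical and serve only as an illustration.

```python
from itertools import combinations

def single_linkage(names, dist):
    """Agglomerative clustering; returns the merge hierarchy as nested tuples."""
    clusters = [(name,) for name in names]    # every object starts as its own cluster
    trees = {c: c[0] for c in clusters}       # cluster members -> subtree built so far
    while len(clusters) > 1:
        # Pick the two clusters whose closest members are nearest to each other.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: min(dist[x][y] for x in pair[0] for y in pair[1]),
        )
        merged = a + b
        trees[merged] = (trees.pop(a), trees.pop(b))
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
    return trees[clusters[0]]

# Hypothetical symmetric distance matrix (not measured NGD values).
names = ["a", "b", "c", "d"]
dist = {
    "a": {"a": 0.0, "b": 0.2, "c": 0.8, "d": 0.9},
    "b": {"a": 0.2, "b": 0.0, "c": 0.7, "d": 0.85},
    "c": {"a": 0.8, "b": 0.7, "c": 0.0, "d": 0.3},
    "d": {"a": 0.9, "b": 0.85, "c": 0.3, "d": 0.0},
}
tree = single_linkage(names, dist)
print(tree)  # (('a', 'b'), ('c', 'd'))
```

In an NGD setting, each entry dist[x][y] would be the NGD between the names x and y.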
Applications and Experiments
Hierarchical Clustering
Dataset: 17th-century painters
Applications and Experiments
SVM-NGD Learning
The authors use anchor words a1, …, a6 to convert each of the 40 training words w1, …, w40 into a 6-dimensional training vector v1, …, v40.
The entry vj,i of vj = (vj,1, …, vj,6) is defined as vj,i = NGD(wj, ai) (1 ≤ j ≤ 40, 1 ≤ i ≤ 6).
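The conversion of words into training vectors can be sketched as follows. This is an illustration under loud assumptions: a tiny in-memory list of "pages" stands in for search-engine page counts, only two anchors are used instead of six, and all words, anchors, and helper names (PAGES, count, feature_vector) are hypothetical.

```python
import math

# Hypothetical toy "index": each set holds the words of one page.
PAGES = [
    {"red", "blue", "color", "paint"},
    {"red", "color", "number"},
    {"blue", "color", "sky"},
    {"one", "two", "number"},
    {"one", "number", "color"},
    {"two", "number", "red"},
    {"blue", "one", "paint", "number"},
    {"two", "color", "sky"},
]

def count(*terms):
    """Number of pages that contain all of the given terms."""
    return sum(1 for page in PAGES if all(t in page for t in terms))

def ngd(x, y):
    """NGD from the toy counts; assumes x and y co-occur on at least one page."""
    fx, fy, fxy = math.log(count(x)), math.log(count(y)), math.log(count(x, y))
    return (max(fx, fy) - fxy) / (math.log(len(PAGES)) - min(fx, fy))

def feature_vector(word, anchors):
    """v_j = (NGD(w_j, a_1), ..., NGD(w_j, a_k)) for one training word w_j."""
    return [ngd(word, a) for a in anchors]

anchors = ["color", "number"]            # two anchors here; the slide uses six
words = ["red", "blue", "one", "two"]    # stand-ins for w_1, ..., w_40
vectors = {w: feature_vector(w, anchors) for w in words}
for w, v in vectors.items():
    print(w, [round(x, 3) for x in v])
```

Each vector, together with its class label, would then be passed to an SVM trainer. In this toy index, "one" ends up closer to the anchor "number" than to the anchor "color".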
NGD Translation
Comparison to WordNet semantics
100 semantic categories are selected at random from the WordNet database.
For each category, an SVM is trained on 50 labeled training samples.
Positive examples come from WordNet; the others come from a dictionary.
Each experiment uses a total of six anchors: three from WordNet and three from a dictionary.
The test set consists of 20 new examples.
100 experiments are run in total.
The authors ignore false negatives.
Conclusion
The WordNet knowledge base was created over the course of decades by paid human experts.
Google has already indexed more than 8 billion pages and shows no signs of slowing down.
The estimate of 8 billion indexed pages dates from 2004.
Opinion
Advantage: using the Google search engine as a similarity measure has recently gained respect.
Drawback: how the anchors are determined, and the accuracy measure (it ignores false negatives).
NGD is nothing novel, only a straightforward demonstration.
Application