1
Mining the Web to Determine Similarity Between Words, Objects, and Communities
Author : Mehran Sahami
Reporter : Tse Ho Lin
2007/9/10
FLAIRS, 2006
2
Outline
Motivation
Objectives
Methodology
  Words
  Objects
  Communities
Experiments
Conclusion
Personal Comments
3
Motivation
Words: Many similarity measures are term-wise.
  Example: Cos(“space exploration”, “NASA”) is 0 under a term-wise measure because the two texts share no terms.
Objects: Users may be looking for the same item sold by different vendors on the web.
Communities: Users are seeking to find others with similar interests.
4
Objectives
We begin by describing a robust method for measuring the semantic similarity between short texts.
We then examine the use of machine learning to produce similarity functions between semi-structured data elements.
We measure the similarity between on-line communities of users as part of a recommendation system.
5
Methodology – Words
[Figure: the same pipeline is applied to Query x and to Query y]
Query x / Query y
  -> Retrieved documents $d_1, d_2, \ldots, d_n$
  -> Compute the TFIDF term vector $v_i$ for each document $d_i$
  -> Truncate to the top m weighted terms
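A minimal Python sketch of the pipeline on this slide (retrieved documents, TFIDF term vectors, truncation to the top m weighted terms). Several parts are assumptions not stated on the slide: the retrieved documents are passed in as plain strings instead of coming from a search engine, the idf is computed over the retrieved set only, and the final step (centroid of the normalized vectors, then a dot product) is one plausible way to turn the truncated vectors into a score. All function names are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one TFIDF term vector (dict: term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed idf (assumption) so terms found in every document keep weight.
        vectors.append({t: c * math.log(1 + n / df[t]) for t, c in tf.items()})
    return vectors

def truncate(vector, m):
    """Keep only the m highest-weighted terms (ties broken alphabetically)."""
    top = sorted(vector.items(), key=lambda kv: (-kv[1], kv[0]))[:m]
    return dict(top)

def l2_normalize(vector):
    """Scale a sparse vector to unit length; empty or zero vectors give {}."""
    norm = math.sqrt(sum(w * w for w in vector.values()))
    return {t: w / norm for t, w in vector.items()} if norm else {}

def expand(retrieved_docs, m):
    """Normalized centroid of the truncated vectors (aggregation is an assumption)."""
    centroid = Counter()
    for v in tfidf_vectors(retrieved_docs):
        for t, w in l2_normalize(truncate(v, m)).items():
            centroid[t] += w
    return l2_normalize(centroid)

def similarity(docs_x, docs_y, m=50):
    """Dot product of the two expanded query representations."""
    qx, qy = expand(docs_x, m), expand(docs_y, m)
    return sum(w * qy.get(t, 0.0) for t, w in qx.items())
```

Usage: call `similarity(docs_x, docs_y, m)` with the documents retrieved for Query x and Query y. Two queries that share no words can still score above zero when their retrieved documents overlap in vocabulary.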
6
Methodology – Objects
Record  Product Name                              ISBN        Categorization
R_1     Me Talk Pretty One Day Paperback Edition  0316776963  Books
R_2     The Tiny Book of Boss Jokes               0007152604  Books

Field similarity functions, one per field:
$f_1(R_{11}, R_{21})$, $f_2(R_{12}, R_{22})$, $f_3(R_{13}, R_{23})$

[Figure: Compute similarity between fields -> Training the parameters -> Clustering]
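A small Python sketch of the first step on this slide, computing one similarity per field for two product records and combining them into a single score. The specific field functions (token Jaccard for the name, exact match for ISBN and categorization), the linear combination, and the weight values are all assumptions for illustration; the slide only says that the parameters are trained, so the weights here are hypothetical placeholders for learned values.

```python
def token_jaccard(a, b):
    """Jaccard overlap of lowercase word sets; 1.0 when both strings are empty."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 1.0

def exact_match(a, b):
    """1.0 if the two field values are identical, else 0.0."""
    return 1.0 if a == b else 0.0

# One similarity function per field (hypothetical choices for f_1, f_2, f_3).
FIELDS = [("name", token_jaccard), ("isbn", exact_match), ("category", exact_match)]

def field_similarities(r1, r2):
    """Return [f_1, f_2, f_3] evaluated on the matching fields of two records."""
    return [f(r1[key], r2[key]) for key, f in FIELDS]

def record_similarity(r1, r2, weights, bias=0.0):
    """Weighted sum of field similarities; weights stand in for trained parameters."""
    sims = field_similarities(r1, r2)
    return bias + sum(w * s for w, s in zip(weights, sims))

# The two records shown on the slide.
r1 = {"name": "Me Talk Pretty One Day Paperback Edition",
      "isbn": "0316776963", "category": "Books"}
r2 = {"name": "The Tiny Book of Boss Jokes",
      "isbn": "0007152604", "category": "Books"}

weights = [0.5, 0.4, 0.1]  # hypothetical, not learned from data
score = record_similarity(r1, r2, weights)
```

For these two records only the categorization field matches, so the combined score stays low, which is the behavior wanted when the later clustering step has to keep different products apart.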
7
Methodology – Communities
Joachims’ Combine Ranking
B, R: Community
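As a rough illustration of comparing two communities B and R, the Python sketch below treats each community as a set of member ids and scores the pair by normalized membership overlap. Reading the labels L1 and L2 as overlap divided by the product of the community sizes and by the square root of that product is an assumption, not something the slide states; the function names and member ids are hypothetical. The L2 form equals the cosine between binary membership vectors.

```python
import math

def l1_overlap(b, r):
    """Shared members divided by |B| * |R| (assumed reading of 'L1')."""
    return len(b & r) / (len(b) * len(r)) if b and r else 0.0

def l2_overlap(b, r):
    """Shared members divided by sqrt(|B| * |R|) (assumed reading of 'L2').

    This equals the cosine between the binary membership vectors of B and R.
    """
    return len(b & r) / math.sqrt(len(b) * len(r)) if b and r else 0.0

# Hypothetical member sets for two communities.
base = {"ann", "bob", "cy", "dee"}
related = {"bob", "cy", "eve"}

l1_score = l1_overlap(base, related)
l2_score = l2_overlap(base, related)
```

Ranking candidate communities R by such a score for a fixed B gives one simple recommendation list per measure.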
8
Experiments
Words
Objects
9
Experiments
Communities
m: The user is already a member of the recommended community
n: The user visits but does not join the recommended community
j: The user joins the recommended community
L2, MI1, MI2, IDF, L1, LogOdds.
10
Conclusion
In this paper we have presented several web-based applications where measuring the similarity between different entities is an important element for success.
11
Personal Comments
Application: Similarity measures, record linkage.
Advantage: The proposed approaches use the large quantity of available on-line information.
Drawback: The author does not compare against other related methods in the experiments.
  Parameter training