A Latent Semantic Indexing-based approach to multilingual document clustering
Chih-Ping Wei, Christopher C. Yang, Chia-Min Lin
Decision Support Systems 45 (2008) 606-620
Reporter : Yi Ru, Lee
Introduction
Latent Semantic Indexing (LSI)
LSI-based multilingual document clustering technique
Empirical evaluation
Conclusion
Translation-based
Synonymy
Polysemy
Vocabulary
Multilingual space
Latent Semantic Indexing (LSI)
Lexical matching
Reduce the dimensions
Singular Value Decomposition (SVD)
$X_k = U_k \Sigma_k V_k^T$
$d_i = U_k \Sigma_k \hat{d}_i$
$\hat{d}_i = \Sigma_k^{-1} U_k^T d_i$
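A minimal sketch of the two steps above, assuming NumPy is available: a rank-k truncated SVD of a term-by-document matrix, followed by projecting a document vector into the reduced space by multiplying with the transposed term matrix and dividing by the singular values. The toy matrix and the helper names `truncated_svd` and `fold_in` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def truncated_svd(X, k):
    """Return the rank-k factors (U_k, s_k, Vt_k) of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def fold_in(d, U_k, s_k):
    """Project a term-space document vector d into the k-dimensional space.

    Computes Sigma_k^{-1} U_k^T d; s_k holds the diagonal of Sigma_k.
    """
    return (U_k.T @ d) / s_k

# Hypothetical 4-term x 3-document matrix (rows = terms, columns = documents).
X = np.array([
    [1.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0],
    [0.0, 2.0, 1.0],
])

k = 2
U_k, s_k, Vt_k = truncated_svd(X, k)

# Rank-k approximation of X.
X_k = U_k @ np.diag(s_k) @ Vt_k

# Projecting a document that was part of X recovers its column of Vt_k.
d_hat = fold_in(X[:, 0], U_k, s_k)
```

In this sketch, a document that was not part of `X` would be projected with the same `fold_in` call, provided it is expressed over the same term rows.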
Multilingual semantic space analysis
Document folding-in
Dimension Selection
$DL(D_j) = \sum_i w_{ji}$
$D_j$ denotes LSI dimension $j$.
$w_{ji}$ is the weight of document $i$ in $D_j$.
Clustering
Hierarchical clustering algorithm
$\cos(X, Y) = \dfrac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}$
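The cosine similarity between two document vectors X and Y (the dot product divided by the product of the two vector lengths) can be sketched in plain Python as follows. The function name `cosine` and the choice to return 0.0 for an all-zero vector are assumptions made for this illustration; the slides do not say how that case is handled.

```python
import math

def cosine(x, y):
    """Cosine similarity between two equal-length numeric vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        # Assumed convention: a zero vector is similar to nothing.
        return 0.0
    return dot / (norm_x * norm_y)

# Orthogonal vectors have similarity 0; parallel vectors have similarity 1.
orthogonal = cosine([1, 0], [0, 1])
parallel = cosine([1, 2], [2, 4])
```

A hierarchical clustering algorithm would use such a function to score pairs of documents (or clusters) in the reduced LSI space before merging the most similar pair.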
Cluster recall: $CR = \dfrac{|CA|}{|TA|}$
Cluster precision: $CP = \dfrac{|CA|}{|GA|}$
TA is the set of associations in the true categories.
GA is the set of associations in the clusters generated by the document clustering technique.
CA is the set of correct associations that exist in both the clusters and the true categories.
Examples
TA={(e1−e2),(c1−c2), (e1−c1), (e1−c2), (e2−c1), (e2−c2), (e3−e4),(c3−c4), (c3−c5), (c4−c5), (e3−c3), (e3−c4), (e3−c5), (e4−c3), (e4−c4), (e4−c5)}
GA={(e1−e2), (c1−c3), (e1−c1), (e1−c3), (e2−c1), (e2−c3), (e3−e4), (e3−c2), (e4−c2), (c4−c5)}
CA={(e1−e2), (e1−c1), (e2−c1), (e3−e4), (c4−c5)}
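The sets in this example can be reproduced with a short sketch. Cluster recall is |CA| / |TA| and cluster precision is |CA| / |GA|. The cluster memberships below are inferred from the association sets listed above rather than stated on the slide, and the helper name `associations` is hypothetical.

```python
from itertools import combinations

def associations(clusters):
    """Return every unordered pair of documents that share a cluster."""
    return {frozenset(pair)
            for cluster in clusters
            for pair in combinations(cluster, 2)}

# Memberships inferred from the TA and GA sets in the example.
true_categories = [
    ["e1", "e2", "c1", "c2"],
    ["e3", "e4", "c3", "c4", "c5"],
]
generated_clusters = [
    ["e1", "e2", "c1", "c3"],
    ["e3", "e4", "c2"],
    ["c4", "c5"],
]

TA = associations(true_categories)
GA = associations(generated_clusters)
CA = TA & GA  # associations present in both

cluster_recall = len(CA) / len(TA)
cluster_precision = len(CA) / len(GA)
```

With 16 true associations, 10 generated associations, and 5 correct ones, this gives a cluster recall of 5/16 = 0.3125 and a cluster precision of 5/10 = 0.5.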
PRT curves of the LSI-based MLDC technique
Comparisons of different representation schemes
Effect of dimension selection (h=5 for MLDC with dimension selection; k=5 for MLDC without dimension selection)
Effect of dimension selection (h=20 for MLDC with dimension selection; k=20 for MLDC without dimension selection)
Best scenario versus best scenario comparison
PRT curves of overall, monolingual, and cross-lingual performance
monolingual PRT curve > overall PRT curve > cross-lingual PRT curve
Specific domain
Thank you