Chinese analogy search considering multi-relations
Zhao Lu
Department of Computer Science and Technology, East China Normal University, Shanghai, China
CSC2012
Our problem
Latent relation search is a recently proposed query-by-example technique: the user specifies a triplet of terms (A, B, C), and the search engine returns a fourth term D whose relationship with C is analogous to that between A and B.
For example, Huo Qigang and Guo Jingjing are a couple. Given the name Yao Ming, the search should return Yao Ming's wife, Ye Li.
The relation between Yao Ming and Ye Li is highly similar to that between Huo Qigang and Guo Jingjing.
Contribution
We propose a hybrid method that represents the relations between word pairs using both bag of words and lexical patterns.
We count the frequency and weight of each candidate relation word.
A k-means clustering method is used to extract the relation words that represent the different relationships between a word pair (A, B).
Three Kinds of Relation Mapping
OTO
OTM
MR
Extracting relation-words
1. Extract the complete sentences containing A and B.
2. Word segmentation and POS tagging.
Preprocessing module
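The first preprocessing step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `sentences_with_pair` and the choice of sentence-final punctuation are assumptions, and step 2 (word segmentation and POS tagging) would need a separate Chinese segmenter and tagger that is not shown here.

```python
import re

# Assumption: a sentence ends at Chinese or Western sentence-final punctuation.
# The slides do not say how sentence boundaries are detected.
SENTENCE = re.compile(r"[^。！？!?]+[。！？!?]?")

def sentences_with_pair(text, a, b):
    """Return the complete sentences of `text` that mention both a and b."""
    sentences = [s.strip() for s in SENTENCE.findall(text)]
    return [s for s in sentences if s and a in s and b in s]

# Hypothetical input text for the pair (Yao Ming, Ye Li).
sample = "姚明的妻子是叶莉。姚明出生于上海。叶莉曾是篮球运动员！"
pair_sentences = sentences_with_pair(sample, "姚明", "叶莉")
```

Only the first sentence of `sample` mentions both entities, so it is the only one kept for the later steps.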
Extract relation-words by lexical pattern
Table 1: Lexical patterns
Type | Patterns
XRY | XRvY, XRnY
X*RY | X对RnY, X之RnY, X与RnY, X的RnY, X和RnY, X会RvY
X*R*Y | X的Rn是Y, X的Rn叫Y, X及其RnY, X怎样RvY, X如何RvY, X完成RvY, XRv的RnY
X*R*Y | X做YRn, X由YRn, X等YRn, X出任YRn
XY*R | XY的Rn, XY是Rn
We count the frequency and weight of each word. The weight of a word is the number of times it occurs in sentences that match a lexical pattern.
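The counting step can be sketched as below, under several assumptions that are not in the slides: sentences are already segmented into (word, tag) pairs, the tagset uses "n" for nouns and "v" for verbs, only three patterns from Table 1 are encoded, and "weight" is read as the number of pattern matches in which the word fills the R slot. All function and variable names are hypothetical.

```python
from collections import Counter

# A few Table 1 patterns as token slots: "X"/"Y" are the entity pair,
# "Rn"/"Rv" a noun/verb relation word, anything else is a literal token.
PATTERNS = [
    ["X", "Rv", "Y"],
    ["X", "的", "Rn", "是", "Y"],
    ["X", "的", "Rn", "叫", "Y"],
]

def match_at(tokens, start, pattern, x, y):
    """Return the relation word if `pattern` matches at tokens[start], else None."""
    window = tokens[start:start + len(pattern)]
    if len(window) < len(pattern):
        return None
    relation = None
    for slot, (word, tag) in zip(pattern, window):
        if slot == "X":
            ok = word == x
        elif slot == "Y":
            ok = word == y
        elif slot in ("Rn", "Rv"):
            ok = tag == slot[1]  # assumed tagset: "n" noun, "v" verb
            relation = word
        else:
            ok = word == slot
        if not ok:
            return None
    return relation

def count_relation_words(sentences, x, y):
    """Count frequency (all occurrences) and weight (pattern matches) per word."""
    frequency, weight = Counter(), Counter()
    for tokens in sentences:
        for word, tag in tokens:
            if tag in ("n", "v") and word not in (x, y):
                frequency[word] += 1
        for pattern in PATTERNS:
            for start in range(len(tokens)):
                relation = match_at(tokens, start, pattern, x, y)
                if relation is not None:
                    weight[relation] += 1
    return frequency, weight

# Hypothetical segmented and tagged sentences for the pair (姚明, 叶莉).
tagged = [
    [("姚明", "nr"), ("的", "u"), ("妻子", "n"), ("是", "v"), ("叶莉", "nr")],
    [("姚明", "nr"), ("迎娶", "v"), ("叶莉", "nr")],
    [("姚明", "nr"), ("和", "c"), ("妻子", "n"), ("叶莉", "nr"), ("出席", "v"), ("活动", "n")],
]
frequency, weight = count_relation_words(tagged, "姚明", "叶莉")
```

In this toy input, 妻子 occurs in two sentences but fills a pattern slot in only one of them, which is the distinction between frequency and weight under the reading assumed here.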
Clustering using a k-means method
To separate words that stand for different relations, we use k-means clustering to group the candidate words into clusters.
After clustering, we select the word with the highest frequency and weight in each cluster as the relation-representing word.
Target words are extracted in the same way.
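A minimal sketch of the clustering and selection step follows. The slides do not say which features are clustered or how frequency and weight are combined, so the 2-D feature vectors, the deterministic initialisation, and the score "frequency + weight" are all assumptions made for illustration, as are the function names and the toy data.

```python
def kmeans(points, k, iterations=20):
    """Plain k-means on equal-length numeric tuples; returns one label per point."""
    centroids = [points[i] for i in range(k)]  # assumption: first k points as seeds
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])))
            for point in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels

def representative_words(words, vectors, frequency, weight, k):
    """Pick, per cluster, the word with the highest frequency + weight (assumed score)."""
    best = {}
    for word, label in zip(words, kmeans(vectors, k)):
        score = frequency[word] + weight[word]
        if label not in best or score > best[label][0]:
            best[label] = (score, word)
    return sorted(word for _, word in best.values())

# Hypothetical candidate words for a pair linked by two different relations.
words = ["wife", "coach", "spouse", "mentor"]
vectors = [(1.0, 0.0), (0.0, 1.0), (0.9, 0.1), (0.1, 0.9)]
frequency = {"wife": 5, "coach": 4, "spouse": 2, "mentor": 1}
weight = {"wife": 3, "coach": 2, "spouse": 1, "mentor": 0}
relation_words = representative_words(words, vectors, frequency, weight, k=2)
```

With k = 2 the toy words fall into a marriage-like cluster and a coaching-like cluster, and one representative is kept for each, which is how a single word pair can yield several relation words.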
Experiment evaluations
ID | A | B | C | D
Person name, Person name
5 | Li Yapeng | Li Yan | Zhao Benshan | Zhao Yufang
7 | Liu Xiang | Sun Haiping | Lin Dan | Tang Xianhu
9 | Huang Jiaqiang | Huang Jiaju | Jiang Wu | Jiang Wen
17 | Lin Daiyu | Chen Xiaoxu | Shaseng | Yan Huaili
29 | Zhao Benshan | Xiaoshenyang | Hou Yaowen | Guo Degang
31 | Cai Zhuoyan | Xu Hao | Cai Shaofen | Xu Yi
47 | Dan Lin | Xie Xingfang | Yao Ming | Ye Li
Place name, Common noun
11 | Korea | Won | Britain | Pound
19 | Cangshan | Garlic | Laiyang | Pear
21 | Korea | Muay Thai | Japan | Sumo
25 | India | Flying cake | Japan | Sushi
27 | Tibet | Buttered tea | Shandong | Pancake
43 | Egypt | Pyramid | Indonesia | Borobudur
Common noun, Common noun
13 | Butterfly | Pupa | Frog | Pollywog
45 | Panda | Bamboo | Eagle | Snake
33 | Whale | Mammalia | Ostrich | Birds
35 | Mars | Phobos | Jupiter | Io
Proper nouns, Person name
1 | Luo Guanzhong | Sanguoyanyi | Wu Chengen | Xiyouji
3 | Buddhism | Sakyamuni | Christianity | Jesus
39 | Chibi | Wu Yusen | Mei Lanfang | Chen Kaige
Person name, Place name
41 | Baima Temple | Yin Le | Shaolin Temple | Shi Yongxin
37 | Korean | Li Mingbo | Russian | Putin
Proper noun, Person name
15 | China Mobile | Wang Jianzhou | China Unicom | Chang Xiaobing
Place name, Place name
23 | France | Paris | Britain | London
Proper noun, Proper noun
49 | Oracle | Sun | Google | YouTube
Experiment Results
Fig. 2: Percentage of questions whose target word appears at each rank. Rank 1: 68%; Rank 2: 6%; Rank 3: 16%; Rank 4: 4%; miss: 6%.
Fig. 3: The relation-word ranks for the test cases (x-axis: test case ID, 1 to 49; y-axis: rank, 0 to 5). [Figure not reproduced.]
MRR and Percentage of Target Words at Different Rank
Method | MRR | @1 (%) | @5 (%) | @10 (%) | @20 (%)
CAS | 0.773 | 68.0 | 94.0 | 94.0 | 94.0
CMB | 0.474 | 40.0 | 57.3 | 61.3 | 62.3
CNJ | 0.545 | 43.3 | 68.3 | 72.3 | 76.0
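For reference, the two metrics can be computed with their standard definitions as in the sketch below. The list `ranks` is a hypothetical set of 50 queries built only to follow the rank distribution reported in Fig. 2 (68%, 6%, 16%, 4%, and 6% missed); the actual per-query ranks and the number of queries are assumptions, and the function names are illustrative.

```python
def mean_reciprocal_rank(ranks):
    """ranks: 1-based rank of the correct target word per query, None for a miss."""
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)

def percent_at(ranks, k):
    """Percentage of queries whose correct target word is ranked within the top k."""
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return 100.0 * hits / len(ranks)

# Hypothetical 50 queries: 34 at rank 1, 3 at rank 2, 8 at rank 3, 2 at rank 4, 3 misses.
ranks = [1] * 34 + [2] * 3 + [3] * 8 + [4] * 2 + [None] * 3

mrr = mean_reciprocal_rank(ranks)
top1 = percent_at(ranks, 1)
top5 = percent_at(ranks, 5)
```

Under this assumed distribution the values agree with the CAS row: an MRR of about 0.773, 68.0% at rank 1, and 94.0% within the top 5.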
Conclusion
A Chinese analogy search method is proposed. Different relationships between the entities are distinguished by k-means clustering. Our approach achieves an MRR of 0.773, which is higher than that of the compared methods (0.474 for CMB and 0.545 for CNJ).
Future work
• In the future, we will focus on distinguishing the three kinds of relation mapping automatically.
• Methods such as SVM will be applied to improve the accuracy of relation-word extraction.