Chinese analogy search considering multi-relations

Zhao Lu

Department of Computer Science and Technology, East China Normal University, Shanghai, China

CSC2012

Our problem

Latent Relation Search is a recently proposed query-by-example technique for queries in which the user specifies a triplet of terms (A, B, C) and asks a search engine for a fourth term D whose relationship with C is analogous to that between A and B.

For example, Huo Qigang and Guo Jingjing are a couple. Given the name Yao Ming, we can find Yao Ming's wife, Ye Li.

The relation between Yao Ming and Ye Li is highly similar to that between Huo Qigang and Guo Jingjing.

Contribution

We propose a hybrid method that represents the relations between word pairs using a bag of words and lexical patterns.

We count the frequency and weight of each word.

A k-means clustering method is used to extract all the relation words that represent the different relationships between the word pair (A, B).

Three Kinds of Relation Mapping

OTO

OTM

MR

Extracting relation-words

1. Extract the complete sentences containing A and B.

2. Apply word segmentation and POS tagging.

Preprocessing Module
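A minimal sketch of step 1, assuming the corpus is plain text and that a sentence ends at Chinese or Western terminal punctuation. The function name `extract_sentences` and the sample text are hypothetical and not taken from the slides. Step 2 would need a Chinese word segmenter and POS tagger, which is not shown here.

```python
import re

# One sentence = a run of non-terminal characters plus an optional terminator.
# The set of terminators is an assumption of this sketch.
_SENTENCE = re.compile(r"[^。！？!?]+[。！？!?]?")

def extract_sentences(text, a, b):
    """Return the complete sentences of `text` that contain both terms a and b."""
    sentences = [s.strip() for s in _SENTENCE.findall(text)]
    return [s for s in sentences if s and a in s and b in s]

# Hypothetical sample text with three sentences; only two mention both terms.
sample = "姚明的妻子是叶莉。姚明是篮球运动员。叶莉和姚明在上海结婚！"
selected = extract_sentences(sample, "姚明", "叶莉")
```

Here `selected` keeps the first and third sentences, since the second mentions only one of the two terms.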

Extract relation-words by lexical pattern

Table 1: Lexical patterns

Pattern type | Lexical patterns
XRY | XRvY, XRnY
X*RY | X对RnY, X之RnY, X与RnY, X的RnY, X和RnY, X会RvY
X*R*Y | X的Rn是Y, X的Rn叫Y, X及其RnY, X怎样RvY, X如何RvY, X完成RvY, XRv的RnY
X*R*Y | X做YRn, X由YRn, X等YRn, X出任YRn
XY*R | XY的Rn, XY是Rn

We count the frequency and weight of each word. The weight of a word is the number of times it occurs in a sentence that matches a lexical pattern.
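The counting step can be sketched as follows. This is an illustration under several assumptions, not the authors' implementation: sentences are assumed to be already segmented and POS-tagged as lists of `(word, tag)` pairs; Rn and Rv are read as a noun and a verb in the relation slot, with hypothetical tags `"n"` and `"v"`; frequency is counted over nouns and verbs only; and weight counts the sentences in which a word fills the relation slot of a matching pattern. Only four of the patterns from Table 1 are encoded, and all function names are hypothetical.

```python
from collections import Counter

# A few patterns from Table 1 as token sequences. "X" and "Y" are the query
# terms, ("R", tag) is the relation-word slot, other strings are literals.
PATTERNS = [
    ["X", ("R", "v"), "Y"],              # XRvY
    ["X", ("R", "n"), "Y"],              # XRnY
    ["X", "的", ("R", "n"), "Y"],         # X的RnY
    ["X", "的", ("R", "n"), "是", "Y"],    # X的Rn是Y
]

def match_pattern(tokens, pattern, x, y):
    """Return the word in the R slot if a contiguous window matches, else None."""
    n = len(pattern)
    for start in range(len(tokens) - n + 1):
        relation = None
        for (word, tag), slot in zip(tokens[start:start + n], pattern):
            if slot == "X":
                ok = word == x
            elif slot == "Y":
                ok = word == y
            elif isinstance(slot, tuple):
                ok = tag == slot[1]
                if ok:
                    relation = word
            else:
                ok = word == slot
            if not ok:
                break
        else:  # every slot in the window matched
            return relation
    return None

def count_frequency_and_weight(sentences, x, y):
    """Count candidate-word frequency and pattern-based weight (assumed definitions)."""
    frequency, weight = Counter(), Counter()
    for tokens in sentences:
        for word, tag in tokens:
            if word not in (x, y) and tag in ("n", "v"):
                frequency[word] += 1
        for pattern in PATTERNS:
            relation = match_pattern(tokens, pattern, x, y)
            if relation is not None:
                weight[relation] += 1
                break  # count each sentence at most once
    return frequency, weight

# Hypothetical tagged sentences for the pair (姚明, 叶莉).
tagged = [
    [("姚明", "nr"), ("的", "u"), ("妻子", "n"), ("是", "v"), ("叶莉", "nr")],
    [("姚明", "nr"), ("妻子", "n"), ("叶莉", "nr"), ("出席", "v"), ("活动", "n")],
    [("姚明", "nr"), ("和", "c"), ("叶莉", "nr"), ("结婚", "v")],
]
frequency, weight = count_frequency_and_weight(tagged, "姚明", "叶莉")
```

In this toy input, 妻子 occurs in two sentences and fills the relation slot in both, while 结婚 occurs once but never in a pattern match, so its weight stays at zero.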

Clustering using a k-means method

To distinguish the words that stand for different relations, we use k-means clustering to group the words into clusters.

After clustering, we select the word with the highest frequency and weight value in each cluster as the relation-representing word.
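A rough sketch of the clustering and selection step. The slides do not say which features the words are clustered on or how frequency and weight are combined, so both are assumptions here: each word gets a hypothetical numeric feature vector, and the score is the sum of frequency and weight. The k-means below is plain Lloyd's algorithm seeded with the first k vectors to keep the example deterministic.

```python
import math

def kmeans(vectors, k, iterations=20):
    """Plain Lloyd's k-means; the first k vectors seed the centroids."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: mean of each cluster's members (old centroid kept if empty).
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def representative_words(words, vectors, frequency, weight, k):
    """Pick, per cluster, the word with the highest frequency + weight (assumed score)."""
    labels = kmeans(vectors, k)
    best = {}
    for word, lab in zip(words, labels):
        score = frequency.get(word, 0) + weight.get(word, 0)
        if lab not in best or score > best[lab][1]:
            best[lab] = (word, score)
    return [best[lab][0] for lab in sorted(best)]

# Hypothetical candidate words with made-up two-dimensional feature vectors.
words = ["妻子", "队友", "老婆", "球员"]
vectors = [(1.0, 0.0), (0.0, 1.0), (0.9, 0.1), (0.1, 0.9)]
frequency = {"妻子": 5, "队友": 2, "老婆": 3, "球员": 4}
weight = {"妻子": 2, "队友": 1, "老婆": 0, "球员": 0}
chosen = representative_words(words, vectors, frequency, weight, k=2)
```

With these made-up inputs the two clusters are {妻子, 老婆} and {队友, 球员}, and the selected words are 妻子 and 球员.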

Target words are extracted in the same way.

Experiment evaluations

ID | A | B | C | D

Person name, Person name
5 | Li Yapeng | Li Yan | Zhao Benshan | Zhao Yufang
7 | Liu Xiang | Sun Haiping | Lin Dan | Tang Xianhu
9 | Huang Jiaqiang | Huang Jiaju | Jiang Wu | Jiang Wen
17 | Lin Daiyu | Chen Xiaoxu | Shaseng | Yan Huaili
29 | Zhao Benshan | Xiaoshenyang | Hou Yaowen | Guo Degang
31 | Cai Zhuoyan | Xu Hao | Cai Shaofen | Xu Yi
47 | Lin Dan | Xie Xingfang | Yao Ming | Ye Li

Place name, Common noun
11 | Korea | Won | Britain | Pound
19 | Cangshan | Garlic | Laiyang | Pear
21 | Korea | Muay Thai | Japan | Sumo
25 | India | Flying cake | Japan | Sushi
27 | Tibet | Buttered tea | Shandong | Pancake
43 | Egypt | Pyramid | Indonesia | Borobudur

ID | A | B | C | D

Common noun, Common noun
13 | Butterfly | Pupa | Frog | Pollywog
45 | Panda | Bamboo | Eagle | Snake
33 | Whale | Mammalia | Ostrich | Birds
35 | Mars | Phobos | Jupiter | Io

Proper nouns, Person name
1 | Luo Guanzhong | Sanguoyanyi | Wu Chengen | Xiyouji
3 | Buddhism | Sakyamuni | Christianity | Jesus
39 | Chibi | Wu Yusen | Mei Lanfang | Chen Kaige

Person name, place name
41 | Baima Temple | Yin Le | Shaolin Temple | Shi Yongxin
37 | Korean | Li Mingbo | Russian | Putin

Proper noun, Person name
15 | China Mobile | Wang Jianzhou | China Unicom | Chang Xiaobing

Place name, Place name
23 | France | Paris | Britain | London

Proper noun, Proper noun
49 | Oracle | Sun | Google | YouTube

Experiment Results

Fig.2 Percentage of questions whose target words appear at each rank: Rank 1: 68%, Rank 2: 6%, Rank 3: 16%, Rank 4: 4%, miss: 6%

Fig.3 The relation-word ranks for test cases (x-axis: test case, ticks 1 to 49; y-axis: rank, 0 to 5)

Bar values in order: 1 1 2 2 1 1 1 1 1 3 3 2 1 1 1 1 1 1 3 1 1 3 1 0 3 0 4 4 3 1 1 1 1 1 1 1 1 1 3 1 1 3 1 0 1 1 1 1 1 1

MRR and Percentage of Target Words at Different Ranks

Method | MRR | @1 (%) | @5 (%) | @10 (%) | @20 (%)
CAS | 0.773 | 68.0 | 94.0 | 94.0 | 94.0
CMB | 0.474 | 40.0 | 57.3 | 61.3 | 62.3
CNJ | 0.545 | 43.3 | 68.3 | 72.3 | 76.0
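For reference, the two metrics in this table can be computed as sketched below, using the standard definition of mean reciprocal rank (a missed question contributes zero). The rank list is built from the rank percentages reported under Experiment Results (68%, 6%, 16%, 4%, and 6% missed); spreading them over 50 questions is an assumption, chosen because every percentage is a multiple of 2%. The function names are hypothetical.

```python
def mean_reciprocal_rank(ranks):
    """ranks: 1-based rank of the correct answer per question, or None for a miss."""
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)

def percent_at_k(ranks, k):
    """Percentage of questions whose correct answer is ranked within the top k."""
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return 100.0 * hits / len(ranks)

# 34 questions at rank 1, 3 at rank 2, 8 at rank 3, 2 at rank 4, 3 missed
# (assumed total of 50 questions).
ranks = [1] * 34 + [2] * 3 + [3] * 8 + [4] * 2 + [None] * 3

mrr = round(mean_reciprocal_rank(ranks), 3)
at_1 = percent_at_k(ranks, 1)
at_5 = percent_at_k(ranks, 5)
```

Under these assumptions the values come out as 0.773, 68.0 and 94.0, which agrees with the CAS row.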

Conclusion

A Chinese analogy search method is proposed.

Different relationships between the entities are distinguished by k-means clustering.

Our approach achieves an MRR of 0.773, which is higher than the existing methods compared (CMB: 0.474, CNJ: 0.545).

Future work

• In the future, we will focus on distinguishing the three kinds of relation mapping automatically.

• A method such as SVM will be applied to improve the accuracy of relation-word extraction.
