Scaling Up Word Sense Disambiguation via Parallel Texts


Transcript of Scaling Up Word Sense Disambiguation via Parallel Texts

Page 1: Scaling Up  Word Sense Disambiguation via Parallel Texts

1

Scaling Up Word Sense Disambiguation via Parallel Texts

Yee Seng Chan

Hwee Tou Ng

Department of Computer Science

National University of Singapore

Page 2: Scaling Up  Word Sense Disambiguation via Parallel Texts

2

Supervised WSD

• Word Sense Disambiguation (WSD)
  – Identifying the correct meaning, or sense, of a word in context
• Supervised learning
  – Successful approach
  – Collect a corpus where each ambiguous word is annotated with the correct sense
  – Current systems usually rely on SEMCOR, a relatively small manually annotated corpus, affecting scalability

Page 3: Scaling Up  Word Sense Disambiguation via Parallel Texts

3

Data Acquisition

• Need to tackle the data acquisition bottleneck
• Manually annotated corpora:
  – DSO corpus (Ng & Lee, 1996)
  – Open Mind Word Expert (OMWE) (Chklovski & Mihalcea, 2002)
• Parallel texts:
  – Our prior work (Ng, Wang, & Chan, 2003) exploited English-Chinese parallel texts for WSD

Page 4: Scaling Up  Word Sense Disambiguation via Parallel Texts

4

WordNet Senses of channel

• Sense 1: A path over which electrical signals can pass
• Sense 2: A passage for water
• Sense 3: A long narrow furrow
• Sense 4: A relatively narrow body of water
• Sense 5: A means of communication or access
• Sense 6: A bodily passage or tube
• Sense 7: A television station and its programs

Page 5: Scaling Up  Word Sense Disambiguation via Parallel Texts

5

Chinese Translations of channel

• Sense 1: 频道 (pin dao)
• Sense 2: 水道 (shui dao), 水渠 (shui qu), 排水渠 (pai shui qu)
• Sense 3: 沟 (gou)
• Sense 4: 海峡 (hai xia)
• Sense 5: 途径 (tu jing)
• Sense 6: 导管 (dao guan)
• Sense 7: 频道 (pin dao)

Page 6: Scaling Up  Word Sense Disambiguation via Parallel Texts

6

Parallel Texts for WSD

…The institutions have already consulted the staff concerned through various channels, including discussion with the staff representatives.…

…有关院校已透过不同的途径征询校内有关员工的意见,包括与有关的职员代表磋商…

The aligned translation 途径 (tu jing) serves as the “sense tag”

Page 7: Scaling Up  Word Sense Disambiguation via Parallel Texts

7

Approach

1. Use manually translated English-Chinese parallel texts
2. Parallel text alignment
3. Manually provide Chinese translations for the WordNet senses of a word (serve as “sense tags”)
4. Gather training examples from the English portion of the parallel texts
5. Train WSD classifiers to disambiguate English words in new contexts
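Purely as an illustration (not code from the paper), a minimal Python sketch of how steps 2–5 might be wired together; the data layout, the sense labels such as "channel#5", and the function name are assumptions made for this sketch.

```python
def gather_examples(aligned_corpus, target_noun, sense_to_translations):
    """aligned_corpus: iterable of (english_tokens, chinese_tokens, alignment)
    triples, where alignment is a list of (english_index, chinese_index) pairs,
    e.g. produced by a word aligner such as GIZA++ (step 2).
    sense_to_translations: manually assigned Chinese translations per WordNet
    sense of target_noun (step 3)."""
    examples = []
    for eng_tokens, chi_tokens, alignment in aligned_corpus:
        for ei, ci in alignment:
            # A real system would match lemmas rather than surface forms.
            if eng_tokens[ei].lower() != target_noun:
                continue
            translation = chi_tokens[ci]
            for sense, translations in sense_to_translations.items():
                if translation in translations:
                    # The aligned Chinese translation acts as the sense tag (step 4).
                    examples.append((eng_tokens, ei, sense))
    return examples

# Toy usage with the running example from slide 6 ("channels" aligned to 途径):
sense_map = {"channel#5": {"途径"}, "channel#4": {"海峡"}}
corpus = [(
    "the institutions have already consulted the staff through various channels".split(),
    ["有关院校", "已", "透过", "不同", "的", "途径", "征询", "员工", "意见"],
    [(9, 5)],   # "channels" (index 9) aligned to "途径" (index 5)
)]
print(gather_examples(corpus, "channels", sense_map))
```

Step 5 would then train a classifier such as naïve Bayes on the returned examples.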

Page 8: Scaling Up  Word Sense Disambiguation via Parallel Texts

8

Issues

• (Ng, Wang, & Chan 2003) evaluated on 22 nouns. Can this approach scale up to a large set of nouns?

• Previous evaluation was on lumped senses. How would it perform in a fine-grained disambiguation setting?

• In practice, would any difficulties arise in the gathering of training examples from parallel texts?

Page 9: Scaling Up  Word Sense Disambiguation via Parallel Texts

9

Size of Parallel Corpora

Parallel Corpora                           English (Mwords/MB)   Chinese (Mchars/MB)
Hong Kong Hansards                         39.9 / 223.2          35.4 / 146.8
Hong Kong News                             16.8 / 96.4           15.3 / 67.6
Hong Kong Laws                             9.9 / 53.7            9.2 / 37.5
Sinorama                                   3.8 / 20.5            3.3 / 13.5
Xinhua News                                2.1 / 11.9            2.1 / 8.9
English Translation of Chinese Treebank    0.1 / 0.7             0.1 / 0.4
Sub-total                                  72.6 / 406.4          65.4 / 274.7
Total                                      138 / 681.1

Page 10: Scaling Up  Word Sense Disambiguation via Parallel Texts

10

Parallel Text Alignment

• Sentence alignment:
  – Corpora available in sentence-aligned form
• Pre-processing:
  – English: tokenization
  – Chinese: word segmentation
• Word alignment:
  – GIZA++ (Och & Ney, 2000)
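A hedged sketch of reading the aligned data into the triples consumed by the earlier gather_examples sketch. It assumes three line-parallel files (tokenized English, segmented Chinese, and word alignments already converted to the common "i-j" pairs-per-line format, which is not GIZA++'s native output); the file names are placeholders.

```python
def read_aligned_corpus(en_path, zh_path, align_path):
    """Yield (english_tokens, chinese_tokens, alignment) triples from three
    line-parallel files: tokenized English, word-segmented Chinese, and word
    alignments with one "i-j" pair per link (e.g. "0-0 2-1 9-5")."""
    with open(en_path, encoding="utf-8") as en, \
         open(zh_path, encoding="utf-8") as zh, \
         open(align_path, encoding="utf-8") as al:
        for en_line, zh_line, al_line in zip(en, zh, al):
            eng_tokens = en_line.split()
            chi_tokens = zh_line.split()
            alignment = [tuple(int(x) for x in pair.split("-"))
                         for pair in al_line.split()]
            yield eng_tokens, chi_tokens, alignment

# Usage (paths are placeholders):
# corpus = read_aligned_corpus("hansards.en.tok", "hansards.zh.seg", "hansards.align")
# examples = gather_examples(corpus, "channels", sense_map)
```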

Page 11: Scaling Up  Word Sense Disambiguation via Parallel Texts

11

Selection of Translations

• WordNet 1.7 as sense inventory
• Chinese translations from 2 sources:
  – Oxford Advanced Learner’s English-Chinese dictionary
  – Kingsoft Powerword 2003 (Chinese translation of the American Heritage dictionary)
• Providing Chinese translations for all the WordNet senses of a word takes 15 minutes on average
• If the same Chinese translation is assigned to several senses, only the lowest-numbered sense will have a valid translation

[Slide figure: Oxford and Kingsoft Powerword definition entries for channel, mapped to the WordNet sense entries for channel]
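The "lowest-numbered sense wins" rule can be made concrete with a small sketch; the dictionary layout is an assumption, and the translations are those listed for channel on slide 5.

```python
def resolve_duplicate_translations(sense_translations):
    """sense_translations: {sense_number: set of manually assigned Chinese
    translations}. If a translation appears under several senses, keep it
    only for the lowest-numbered sense (the rule on this slide)."""
    seen = set()
    resolved = {}
    for sense_no in sorted(sense_translations):
        resolved[sense_no] = {t for t in sense_translations[sense_no] if t not in seen}
        seen.update(sense_translations[sense_no])
    return resolved

channel = {1: {"频道"}, 2: {"水道", "水渠", "排水渠"}, 3: {"沟"}, 4: {"海峡"},
           5: {"途径"}, 6: {"导管"}, 7: {"频道"}}
print(resolve_duplicate_translations(channel))
# Sense 7 ends up with no valid translation, since 频道 already tags sense 1.
```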

Page 12: Scaling Up  Word Sense Disambiguation via Parallel Texts

12

Scope of Experiments

• Aim: scale up to a large set of nouns

• Frequently occurring nouns are highly ambiguous.

• Maximize benefits:
  – Select 800 most frequent noun types in the Brown corpus (BC)
  – Represents 60% of noun tokens in BC
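A sketch of how the 800 most frequent noun types might be identified using NLTK's copy of the Brown corpus. Treating every tag beginning with "NN" as a noun, and lowercasing, are assumptions about the counting procedure, which the slide does not specify.

```python
from collections import Counter
from nltk.corpus import brown   # requires nltk and nltk.download('brown')

noun_counts = Counter(
    word.lower()
    for word, tag in brown.tagged_words()
    if tag.startswith("NN")          # assumed noun criterion for the Brown tagset
)
top_nouns = [w for w, _ in noun_counts.most_common(800)]
coverage = sum(noun_counts[w] for w in top_nouns) / sum(noun_counts.values())
print(len(top_nouns), f"{coverage:.1%}")   # the slide reports ~60% token coverage
```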

Page 13: Scaling Up  Word Sense Disambiguation via Parallel Texts

13

WSD

• Used the WSD program of (Lee & Ng, 2002)

• Knowledge sources: parts-of-speech, surrounding words, local collocations

• Learning algorithm: Naïve Bayes

• Achieves state-of-the-art WSD accuracy
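To make the three knowledge sources concrete, here is a toy feature extractor; the actual feature templates of Lee & Ng (2002) are richer, so the offsets, feature names, and example POS tags below are illustrative assumptions only.

```python
def extract_features(tokens, pos_tags, i):
    """Toy versions of the three knowledge sources for the target word at
    position i: parts-of-speech of neighbours, surrounding words, and local
    collocations around the target."""
    feats = []
    # Parts-of-speech of words at offsets -2..+2 around the target.
    for off in (-2, -1, 1, 2):
        j = i + off
        if 0 <= j < len(tokens):
            feats.append((f"pos{off:+d}", pos_tags[j]))
    # Surrounding words: bag of words in the sentence, minus the target.
    for j, w in enumerate(tokens):
        if j != i:
            feats.append(("surr", w.lower()))
    # Local collocations: short patterns immediately around the target.
    if i >= 1:
        feats.append(("coll-1", tokens[i - 1].lower()))
    if i + 1 < len(tokens):
        feats.append(("coll+1", tokens[i + 1].lower()))
    if i >= 1 and i + 1 < len(tokens):
        feats.append(("coll-1,+1", f"{tokens[i-1].lower()}_{tokens[i+1].lower()}"))
    return feats

sent = "The institutions consulted the staff through various channels".split()
tags = ["DT", "NNS", "VBD", "DT", "NN", "IN", "JJ", "NNS"]
print(extract_features(sent, tags, sent.index("channels")))
```

These feature lists are what a naïve Bayes learner would be trained on.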

Page 14: Scaling Up  Word Sense Disambiguation via Parallel Texts

14

Evaluation Set

• Suitable evaluation data set: set of nouns in the SENSEVAL-2 English all-words task

Page 15: Scaling Up  Word Sense Disambiguation via Parallel Texts

15

Summary Figures

Noun set      No. of noun types   No. of noun tokens   WNs1 accuracy (%)   Avg. no. of senses
All nouns     437                 1067                 71.9                4.23
MFSet         212                 494                  61.1                5.89
All − MFSet   225                 573                  81.2                2.67

Page 16: Scaling Up  Word Sense Disambiguation via Parallel Texts

16

Evaluation on MFSet

• Gather parallel text examples for nouns in MFSet

• For comparison, what is the accuracy of training on manually annotated examples?
  – SEMCOR (SC)
  – SEMCOR + OMWE (SC+OM)

Page 17: Scaling Up  Word Sense Disambiguation via Parallel Texts

17

Evaluation Results (in %)

System                        Evaluation set: MFSet
S1 (best SE2 system)          72.9
S2                            65.4
S3                            64.4
WNs1 (WordNet sense 1)        61.1
SC (SEMCOR)                   67.8
SC+OM (SEMCOR + OMWE)         68.4
P1 (parallel text)            69.6

Page 18: Scaling Up  Word Sense Disambiguation via Parallel Texts

18

Evaluation on All Nouns

• Want an indication of P1 performance on all nouns

• Expanded evaluation set to all nouns in SENSEVAL-2 English all-words task

• Used WNs1 strategy for nouns where parallel text examples are not available

Page 19: Scaling Up  Word Sense Disambiguation via Parallel Texts

19

Evaluation Results (in %)

                              Evaluation set
System                        MFSet   All nouns
S1 (best SE2 system)          72.9    78.0
S2                            65.4    74.5
S3                            64.4    70.0
WNs1 (WordNet sense 1)        61.1    71.9
SC (SEMCOR)                   67.8    76.2
SC+OM (SEMCOR + OMWE)         68.4    76.5
P1 (parallel text)            69.6    75.8

Page 20: Scaling Up  Word Sense Disambiguation via Parallel Texts

20

Lack of Matches

• Lack of matching English occurrences for some Chinese translations:
  – Sense 7 of noun report:
    » “the general estimation that the public has for a person”
    » assigned translation “名声” (ming sheng)
  – In parallel corpus, no occurrences of report aligned to “名声” (ming sheng)
  – No examples gathered for sense 7 of report
  – Affects recall

Page 21: Scaling Up  Word Sense Disambiguation via Parallel Texts

21

Examples from other Nouns

• Can gather examples for sense 7 of report from other English nouns having the same corresponding Chinese translations:

  – 名声 (ming sheng) is the Chinese translation shared by:
    » Sense 7 of report: “the general estimation that the public has for a person”
    » Sense 3 of name: “a person’s reputation”
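A small sketch of how examples might be shared across nouns via a common Chinese translation; the input format and the sense numbering shown are illustrative assumptions.

```python
from collections import defaultdict

def build_translation_index(sense_translations_by_noun):
    """Map each Chinese translation to every (noun, sense) it was assigned to,
    so examples aligned to that translation can be reused across nouns.
    Input format (assumed): {noun: {sense_number: set_of_translations}}."""
    index = defaultdict(list)
    for noun, senses in sense_translations_by_noun.items():
        for sense_no, translations in senses.items():
            for t in translations:
                index[t].append((noun, sense_no))
    return index

index = build_translation_index({
    "report": {7: {"名声"}},
    "name":   {3: {"名声"}},
})
print(index["名声"])   # [('report', 7), ('name', 3)]: examples of "name"
                        # sense 3 can be reused for "report" sense 7
```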

Page 22: Scaling Up  Word Sense Disambiguation via Parallel Texts

22

Evaluation Results (in %)

                              Evaluation set
System                        MFSet   All nouns
S1 (best SE2 system)          72.9    78.0
S2                            65.4    74.5
S3                            64.4    70.0
WNs1 (WordNet sense 1)        61.1    71.9
SC (SEMCOR)                   67.8    76.2
SC+OM (SEMCOR + OMWE)         68.4    76.5
P1 (parallel text)            69.6    75.8
P2 (P1 + noun substitution)   70.7    76.3

Page 23: Scaling Up  Word Sense Disambiguation via Parallel Texts

23

JCN Measure

• The semantic distance measure of Jiang & Conrath (1997) provides a reliable estimate of the distance between two WordNet synsets, Dist(s1,s2)

• JCN
  – Information content (IC) of concept c: IC(c) = −log P(c), with P(c) estimated from corpus frequencies
  – Link strength of the edge between a child concept c and its parent p: LS(c,p) = IC(c) − IC(p)
  – Distance between two synsets: Dist(s1,s2) = IC(s1) + IC(s2) − 2 IC(lcs(s1,s2)), where lcs(s1,s2) is the most specific common ancestor of s1 and s2 in WordNet
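As a sanity check on the formulas above, a tiny sketch with made-up concept probabilities (not values from the paper):

```python
import math

def information_content(prob):
    """IC(c) = -log P(c), with P(c) estimated from corpus frequencies."""
    return -math.log(prob)

def jcn_distance(p_s1, p_s2, p_lcs):
    """Dist(s1, s2) = IC(s1) + IC(s2) - 2 * IC(lcs), where lcs is the most
    specific common ancestor of s1 and s2."""
    return (information_content(p_s1) + information_content(p_s2)
            - 2 * information_content(p_lcs))

# Toy illustration: two fairly specific synsets under a more general ancestor.
print(jcn_distance(p_s1=1e-5, p_s2=2e-5, p_lcs=1e-3))
```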

Page 24: Scaling Up  Word Sense Disambiguation via Parallel Texts

24

Similarity Measure

• We used the WordNet Similarity package (Pedersen, Patwardhan & Michelizzi, 2004):
  – Provides a similarity score between WordNet synsets based on the jcn measure: jcn(s1,s2) = 1/Dist(s1,s2)
  – In the earlier example, we obtain the similarity score jcn(s1,s2), where:
    » s1 = sense 7 of report
    » s2 = sense 3 of name
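The slides use the (Perl) WordNet Similarity package with WordNet 1.7. Purely as an accessible stand-in, NLTK exposes the same jcn measure; its WordNet version and sense numbering differ, so looking the two senses up by their glosses, as below, is an assumption of this sketch.

```python
from nltk.corpus import wordnet as wn
from nltk.corpus import wordnet_ic   # nltk.download('wordnet'); nltk.download('wordnet_ic')

brown_ic = wordnet_ic.ic('ic-brown.dat')   # information content from the Brown corpus

# Locate the two senses from the report/name example by their glosses rather
# than by sense number, since numbering differs from WordNet 1.7.
s1 = next(s for s in wn.synsets('report', pos=wn.NOUN)
          if 'estimation' in s.definition())
s2 = next(s for s in wn.synsets('name', pos=wn.NOUN)
          if 'reputation' in s.definition())

print(s1.name(), s2.name(), s1.jcn_similarity(s2, brown_ic))
```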

Page 25: Scaling Up  Word Sense Disambiguation via Parallel Texts

25

Incorporating JCN Measure

• In performing WSD with a naïve Bayes classifier, the sense s assigned to an example with features f1, …, fn is chosen so as to maximize P(s) P(f1|s) … P(fn|s), where the probabilities are estimated from Count(s) and Count(fj,s) over the training examples

• A training example gathered from another English noun based on a common Chinese translation contributes a fractional count to Count(s) and Count(fj,s), based on jcn(s1,s2).
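A sketch of naïve Bayes with fractional counts. The slide only says that a borrowed example contributes a fractional count based on jcn(s1,s2), so using the raw jcn score directly as the weight, and the add-alpha smoothing, are assumptions of this sketch rather than the authors' exact scheme.

```python
import math
from collections import defaultdict

def train_weighted_counts(examples):
    """examples: (features, sense, weight) triples. Native examples of the
    target noun carry weight 1.0; examples borrowed from another noun through
    a shared Chinese translation carry weight jcn(s1, s2)."""
    count_s = defaultdict(float)                          # Count(s), fractional
    count_fs = defaultdict(lambda: defaultdict(float))    # Count(f_j, s), fractional
    for features, sense, weight in examples:
        count_s[sense] += weight
        for f in features:
            count_fs[sense][f] += weight
    return count_s, count_fs

def best_sense(features, count_s, count_fs, alpha=0.1):
    """Pick the sense maximizing P(s) * product of P(f_j | s), with both
    probabilities estimated from the (fractional) counts."""
    total = sum(count_s.values())
    vocab = {f for per_sense in count_fs.values() for f in per_sense}
    def log_score(sense):
        score = math.log(count_s[sense] / total)
        for f in features:
            num = count_fs[sense][f] + alpha
            den = count_s[sense] + alpha * max(len(vocab), 1)
            score += math.log(num / den)
        return score
    return max(count_s, key=log_score)

examples = [
    (["good", "protect"], "report#7", 0.8),      # borrowed from "name", weight = jcn
    (["annual", "financial"], "report#1", 1.0),  # native example, full weight
]
count_s, count_fs = train_weighted_counts(examples)
print(best_sense(["protect", "good"], count_s, count_fs))
```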

Page 26: Scaling Up  Word Sense Disambiguation via Parallel Texts

26

Evaluation Results (in %)

                              Evaluation set
System                        MFSet   All nouns
S1 (best SE2 system)          72.9    78.0
S2                            65.4    74.5
S3                            64.4    70.0
WNs1 (WordNet sense 1)        61.1    71.9
SC (SEMCOR)                   67.8    76.2
SC+OM (SEMCOR + OMWE)         68.4    76.5
P1 (parallel texts)           69.6    75.8
P2 (P1 + noun substitution)   70.7    76.3
P2jcn (P2 + jcn)              72.7    77.2

Page 27: Scaling Up  Word Sense Disambiguation via Parallel Texts

27

Paired t-test for MFSet

System   S1   P1   P2   P2jcn   SC   SC+OM   WNs1
S1       *    ~    ~    ~       >>   >       >>
P1            *    ~    <<      ~    ~       >>
P2                 *    <       >    ~       >>
P2jcn                   *       >>   >       >>
SC                              *    ~       >>
SC+OM                                *       >>
WNs1                                         *

“>>”, “<<”: p-value ≤ 0.01   “>”, “<”: p-value in (0.01, 0.05]   “~”: p-value > 0.05
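The significance comparisons could be reproduced with a paired t-test over per-item scores. The pairing unit and the toy 0/1 scores below are assumptions; scipy is used here simply as a convenient, standard implementation.

```python
from scipy import stats

# Hypothetical per-instance scores (1 = correct, 0 = wrong) for two systems on
# the same evaluation instances; the real test uses the actual system outputs.
p1_scores    = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]
p2jcn_scores = [1, 1, 1, 1, 0, 1, 1, 1, 1, 1]

res = stats.ttest_rel(p2jcn_scores, p1_scores)
print(res.statistic, res.pvalue)   # p ≤ 0.01 corresponds to ">>" in the table
```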

Page 28: Scaling Up  Word Sense Disambiguation via Parallel Texts

28

Paired t-test for All Nouns

System   S1   P1   P2   P2jcn   SC   SC+OM   WNs1
S1       *    >    ~    ~       ~    ~       >>
P1            *    ~    <       ~    ~       >>
P2                 *    ~       ~    ~       >>
P2jcn                   *       ~    ~       >>
SC                              *    ~       >>
SC+OM                                *       >>
WNs1                                         *

“>>”, “<<”: p-value ≤ 0.01   “>”, “<”: p-value in (0.01, 0.05]   “~”: p-value > 0.05

Page 29: Scaling Up  Word Sense Disambiguation via Parallel Texts

29

Conclusion

• Tackling the data acquisition bottleneck is crucial

• Gathering examples for WSD from parallel texts is scalable to a large set of nouns

• Training on parallel text examples can outperform training on manually annotated data, and achieves performance comparable to the best system in the SENSEVAL-2 English all-words task