Scaling Up Word Sense Disambiguation via Parallel Texts
Transcript of Scaling Up Word Sense Disambiguation via Parallel Texts
1
Scaling Up Word Sense Disambiguation
via Parallel Texts
Yee Seng Chan
Hwee Tou Ng
Department of Computer Science
National University of Singapore
2
Supervised WSD
• Word Sense Disambiguation (WSD)
– Identifying the correct meaning, or sense, of a word in context
• Supervised learning
– Successful approach
– Collect corpus where each ambiguous word is annotated with the correct sense
– Current systems usually rely on SEMCOR, a relatively small manually annotated corpus, affecting scalability
3
Data Acquisition
• Need to tackle data acquisition bottleneck
• Manually annotated corpora:
– DSO corpus (Ng & Lee, 1996)
– Open Mind Word Expert (OMWE) (Chklovski & Mihalcea, 2002)
• Parallel texts:
– Our prior work (Ng, Wang, & Chan, 2003) exploited English-Chinese parallel texts for WSD
4
WordNet Senses of channel
• Sense 1: A path over which electrical signals can pass
• Sense 2: A passage for water
• Sense 3: A long narrow furrow
• Sense 4: A relatively narrow body of water
• Sense 5: A means of communication or access
• Sense 6: A bodily passage or tube
• Sense 7: A television station and its programs
5
Chinese Translations of channel
• Sense 1: 频道 (pin dao)
• Sense 2: 水道 (shui dao), 水渠 (shui qu), 排水渠 (pai shui qu)
• Sense 3: 沟 (gou)
• Sense 4: 海峡 (hai xia)
• Sense 5: 途径 (tu jing)
• Sense 6: 导管 (dao guan)
• Sense 7: 频道 (pin dao)
6
Parallel Texts for WSD
…The institutions have already consulted the staff concerned through various channels, including discussion with the staff representatives.…
…有关院校已透过不同的途径征询校内有关员工的意见,包括与有关的职员代表磋商…
途径(tu jing): “sense tag”
7
Approach
1. Use manually translated English-Chinese parallel texts
2. Parallel text alignment
3. Manually provide Chinese translations for WordNet senses of a word (serve as “sense-tags”)
4. Gather training examples from the English portion of parallel texts
5. Train WSD classifiers to disambiguate English words in new contexts
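Steps 3 and 4 can be illustrated with a minimal Python sketch. It assumes each sentence pair arrives as English tokens, segmented Chinese words, and a word alignment mapping English token positions to Chinese word positions; this input format, the names SENSE_TAGS and gather_examples, and the crude plural matching are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sense-tag table: Chinese translation -> WordNet sense number.
# The translations for "channel" are the ones listed on the earlier slide.
SENSE_TAGS = {
    "channel": {
        "频道": 1,
        "水道": 2, "水渠": 2, "排水渠": 2,
        "沟": 3,
        "海峡": 4,
        "途径": 5,
        "导管": 6,
    }
}


def gather_examples(noun, aligned_sentences, sense_tags=SENSE_TAGS):
    """Collect (english_tokens, position, sense) training examples.

    aligned_sentences: iterable of (english_tokens, chinese_words, alignment),
    where alignment maps an English token index to a Chinese word index.
    This input format is an assumption made for illustration.
    """
    translations = sense_tags[noun]
    examples = []
    for english, chinese, alignment in aligned_sentences:
        for i, token in enumerate(english):
            # Crude matching of singular and plural forms (assumption).
            if token.lower() not in (noun, noun + "s"):
                continue
            j = alignment.get(i)
            if j is None:
                continue  # this occurrence was left unaligned
            sense = translations.get(chinese[j])
            if sense is not None:
                # The aligned Chinese word acts as the sense tag.
                examples.append((english, i, sense))
    return examples


english = ["consulted", "the", "staff", "through", "various", "channels"]
chinese = ["透过", "不同", "的", "途径", "征询", "员工"]
alignment = {5: 3}  # "channels" aligned to 途径
examples = gather_examples("channel", [(english, chinese, alignment)])
```

Only the English side of each matched pair is kept as training data; the Chinese word is used solely to pick the sense label.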
8
Issues
• Ng, Wang, & Chan (2003) evaluated on 22 nouns. Can this approach scale up to a large set of nouns?
• Previous evaluation was on lumped senses. How would it perform in a fine-grained disambiguation setting?
• In practice, would any difficulties arise in the gathering of training examples from parallel texts?
9
Size of Parallel Corpora
Parallel corpus                           English (Mwords / MB)   Chinese (Mchars / MB)
Hong Kong Hansards                        39.9 / 223.2            35.4 / 146.8
Hong Kong News                            16.8 / 96.4             15.3 / 67.6
Hong Kong Laws                            9.9 / 53.7              9.2 / 37.5
Sinorama                                  3.8 / 20.5              3.3 / 13.5
Xinhua News                               2.1 / 11.9              2.1 / 8.9
English Translation of Chinese Treebank   0.1 / 0.7               0.1 / 0.4
Sub-total                                 72.6 / 406.4            65.4 / 274.7
Total                                     138 / 681.1
10
Parallel Text Alignment
• Sentence alignment:
– Corpora available in sentence-aligned form
• Pre-processing:
– English: tokenization
– Chinese: word segmentation
• Word alignment:
– GIZA++ (Och & Ney, 2000)
11
Selection of Translations
• WordNet 1.7 as sense inventory
• Chinese translations from 2 sources:
– Oxford Advanced Learner’s English-Chinese dictionary
– Kingsoft Powerword 2003 (Chinese translation of the American Heritage dictionary)
• Providing Chinese translations for all the WordNet senses of a word takes 15 minutes on average.
• If the same Chinese translation is assigned to several senses, only the least numbered sense will have a valid translation
[Figure: Oxford definition entries, Kingsoft Powerword definition entries, and WordNet sense entries for channel]
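The rule for a translation shared by several senses can be sketched in a few lines of Python. The function name valid_translations and the dictionary layout are assumptions for illustration; the translations are those given for channel on the earlier slide.

```python
def valid_translations(sense_translations):
    """Map each Chinese translation to the lowest-numbered sense listing it.

    sense_translations: dict of sense number -> list of Chinese translations.
    A translation repeated under a higher-numbered sense is ignored there.
    """
    owner = {}
    for sense in sorted(sense_translations):
        for translation in sense_translations[sense]:
            # setdefault keeps the first (lowest-numbered) sense seen.
            owner.setdefault(translation, sense)
    return owner


channel = {
    1: ["频道"],
    2: ["水道", "水渠", "排水渠"],
    3: ["沟"],
    4: ["海峡"],
    5: ["途径"],
    6: ["导管"],
    7: ["频道"],  # same translation as sense 1
}
tags = valid_translations(channel)
```

Under this rule 频道 tags sense 1 only, so sense 7 of channel is left without a valid translation and gathers no examples.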
12
Scope of Experiments
• Aim: scale up to a large set of nouns
• Frequently occurring nouns are highly ambiguous.
• Maximize benefits:
– Select 800 most frequent noun types in the Brown corpus (BC)
– Represents 60% of noun tokens in BC
13
WSD
• Used the WSD program of Lee & Ng (2002)
• Knowledge sources: parts-of-speech, surrounding words, local collocations
• Learning algorithm: Naïve Bayes
• Achieves state-of-the-art WSD accuracy
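A simplified Python sketch of the three knowledge sources follows. The window sizes, feature names, and padding symbol are assumptions chosen for brevity; they are not the exact feature set of the WSD program cited above.

```python
def extract_features(tokens, pos_tags, i, pos_window=1):
    """Build features for the ambiguous word at position i.

    tokens and pos_tags are parallel lists; pos_tags is assumed to come
    from an external part-of-speech tagger.
    """
    n = len(tokens)

    def word(k):
        return tokens[k].lower() if 0 <= k < n else "<pad>"

    def tag(k):
        return pos_tags[k] if 0 <= k < n else "<pad>"

    features = []
    # 1. Parts of speech of the target word and its neighbours.
    for offset in range(-pos_window, pos_window + 1):
        features.append(f"pos[{offset}]={tag(i + offset)}")
    # 2. Surrounding words: unordered words of the context.
    context = {t.lower() for k, t in enumerate(tokens) if k != i}
    for w in sorted(context):
        features.append(f"word={w}")
    # 3. Local collocations: ordered word sequences around the target.
    features.append(f"col[-1,-1]={word(i - 1)}")
    features.append(f"col[1,1]={word(i + 1)}")
    features.append(f"col[-1,1]={word(i - 1)}_{word(i + 1)}")
    return features


tokens = ["through", "various", "channels", "including", "discussion"]
pos_tags = ["IN", "JJ", "NNS", "VBG", "NN"]
features = extract_features(tokens, pos_tags, 2)
```

Each feature string can then be fed to a naïve Bayes learner as one of the features f1, …, fn of the example.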
14
Evaluation Set
• Suitable evaluation data set: set of nouns in the SENSEVAL-2 English all-words task
15
Summary Figures
Noun set      No. of noun types   No. of noun tokens   WNs1 accuracy (%)   Avg. no. of senses
All nouns     437                 1067                 71.9                4.23
MFSet         212                 494                  61.1                5.89
All − MFSet   225                 573                  81.2                2.67
16
Evaluation on MFSet
• Gather parallel text examples for nouns in MFSet
• For comparison, what is the accuracy of training on manually annotated examples?
– SEMCOR (SC)
– SEMCOR + OMWE (SC+OM)
17
Evaluation Results (in %)
                               Evaluation set
System                         MFSet
S1 (best SE2 system)           72.9
S2                             65.4
S3                             64.4
WNs1 (WordNet sense 1)         61.1
SC (SEMCOR)                    67.8
SC+OM (SEMCOR + OMWE)          68.4
P1 (parallel text)             69.6
18
Evaluation on All Nouns
• Want an indication of P1 performance on all nouns
• Expanded evaluation set to all nouns in SENSEVAL-2 English all-words task
• Used WNs1 strategy for nouns where parallel text examples are not available
19
Evaluation Results (in %)
                               Evaluation set
System                         MFSet   All nouns
S1 (best SE2 system)           72.9    78.0
S2                             65.4    74.5
S3                             64.4    70.0
WNs1 (WordNet sense 1)         61.1    71.9
SC (SEMCOR)                    67.8    76.2
SC+OM (SEMCOR + OMWE)          68.4    76.5
P1 (parallel text)             69.6    75.8
20
Lack of Matches
• Lack of matching English occurrences for some Chinese translations:
– Sense 7 of noun report:
» “the general estimation that the public has for a person”
» assigned translation “名声” (ming sheng)
– In parallel corpus, no occurrences of report aligned to “名声” (ming sheng)
– No examples gathered for sense 7 of report
– Affects recall
21
Examples from other Nouns
• Can gather examples for sense 7 of report from other English nouns having the same corresponding Chinese translations:
• 名声 (ming sheng) is the translation shared by:
– Sense 7 of report: “the general estimation that the public has for a person”
– Sense 3 of name: “a person’s reputation”
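Finding which other nouns can supply examples amounts to inverting the translation table. The Python sketch below assumes a dictionary keyed by (noun, sense) pairs; the names index_by_translation and substitute_sources are hypothetical.

```python
from collections import defaultdict


def index_by_translation(sense_tags):
    """Invert {(noun, sense): [translations]} into {translation: [(noun, sense)]}."""
    index = defaultdict(list)
    for (noun, sense), translations in sense_tags.items():
        for translation in translations:
            index[translation].append((noun, sense))
    return index


def substitute_sources(noun, sense, sense_tags):
    """List (noun, sense) pairs of other nouns sharing a translation."""
    index = index_by_translation(sense_tags)
    sources = []
    for translation in sense_tags[(noun, sense)]:
        for other in index[translation]:
            if other[0] != noun and other not in sources:
                sources.append(other)
    return sources


sense_tags = {
    ("report", 7): ["名声"],
    ("name", 3): ["名声"],
    ("channel", 5): ["途径"],
}
sources = substitute_sources("report", 7, sense_tags)
```

Occurrences of name aligned to 名声 can then be added to the training data for sense 7 of report.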
22
Evaluation Results (in %)
                               Evaluation set
System                         MFSet   All nouns
S1 (best SE2 system)           72.9    78.0
S2                             65.4    74.5
S3                             64.4    70.0
WNs1 (WordNet sense 1)         61.1    71.9
SC (SEMCOR)                    67.8    76.2
SC+OM (SEMCOR + OMWE)          68.4    76.5
P1 (parallel text)             69.6    75.8
P2 (P1 + noun substitution)    70.7    76.3
23
JCN Measure
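• Semantic distance measure of Jiang & Conrath (1997) provides a reliable estimate of the distance between two WordNet synsets: Dist(s1,s2)
• JCN
– Information content (IC) of concept c: IC(c) = −log P(c)
– Link strength LS(c,p) of edge between concept c and its parent p: LS(c,p) = IC(c) − IC(p)
– Distance between two synsets: Dist(s1,s2) = IC(s1) + IC(s2) − 2 × IC(lcs(s1,s2)), where lcs(s1,s2) is the most specific synset that subsumes both s1 and s2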
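As a worked illustration, the sketch below uses the standard Jiang & Conrath formulation: IC(c) = −log P(c), LS(c,p) = IC(c) − IC(p), and Dist(s1,s2) = IC(s1) + IC(s2) − 2 × IC(lcs(s1,s2)), where lcs is the most specific synset subsuming both. The probabilities are made-up toy values, not corpus estimates.

```python
import math


def information_content(prob):
    """IC(c) = -log P(c); rarer concepts carry more information."""
    return -math.log(prob)


def link_strength(p_child, p_parent):
    """LS(c, p) = IC(c) - IC(p) for an edge from concept c to its parent p."""
    return information_content(p_child) - information_content(p_parent)


def jcn_distance(p_s1, p_s2, p_lcs):
    """Dist(s1, s2) = IC(s1) + IC(s2) - 2 * IC(lcs(s1, s2))."""
    return (information_content(p_s1)
            + information_content(p_s2)
            - 2 * information_content(p_lcs))


# Toy probabilities (assumptions for illustration only).
p_s1, p_s2, p_lcs = 0.001, 0.002, 0.01
distance = jcn_distance(p_s1, p_s2, p_lcs)
```

When each synset is a direct child of the common subsumer, the distance is the sum of the two link strengths, which is why a more specific shared ancestor yields a smaller distance.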
24
Similarity Measure
• We used the WordNet Similarity package (Pedersen, Patwardhan & Michelizzi, 2004):
– provides a similarity score between WordNet synsets based on the jcn measure: jcn(s1,s2) = 1/Dist(s1,s2)
– In the earlier example, obtain similarity score jcn(s1,s2), where:
» s1 = sense 7 of report
» s2 = sense 3 of name
25
Incorporating JCN Measure
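• In performing WSD with a naïve Bayes classifier, sense s assigned to example with features f1, …, fn is chosen so as to maximize: p(s) × ∏ p(fj | s), with the product taken over j = 1, …, n, and with p(s) and p(fj | s) estimated from Count(s) and Count(fj,s)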
• A training example gathered from another English noun based on a common Chinese translation contributes a fractional count to Count(s) and Count(fj,s), based on jcn(s1,s2).
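A minimal Python sketch of a naïve Bayes classifier with fractional counts follows. The class name, the smoothing constant, and the way a jcn score is turned into a weight are assumptions; the slide only states that substituted examples contribute a fractional count to Count(s) and Count(fj,s).

```python
import math
from collections import defaultdict


class FractionalNaiveBayes:
    def __init__(self, smoothing=1e-3):
        self.sense_count = defaultdict(float)    # Count(s)
        self.feature_count = defaultdict(float)  # Count(fj, s)
        self.smoothing = smoothing               # hypothetical smoothing constant

    def add_example(self, sense, features, weight=1.0):
        """Add one example; weight < 1 for examples from another noun."""
        self.sense_count[sense] += weight
        for f in set(features):
            self.feature_count[(f, sense)] += weight

    def classify(self, features):
        """Return the sense maximizing log p(s) + sum_j log p(fj | s)."""
        total = sum(self.sense_count.values())
        best_sense, best_score = None, -math.inf
        for sense, count in self.sense_count.items():
            score = math.log(count / total)
            for f in set(features):
                seen = self.feature_count.get((f, sense), 0.0)
                score += math.log((seen + self.smoothing)
                                  / (count + 2 * self.smoothing))
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense


nb = FractionalNaiveBayes()
nb.add_example(7, ["word=reputation", "word=public"])  # example of "report"
nb.add_example(7, ["word=reputation"], weight=0.4)     # from "name", weighted
nb.add_example(1, ["word=written", "word=document"])
predicted = nb.classify(["word=reputation"])
```

Here the weight 0.4 stands in for a value derived from jcn(s1,s2); examples gathered directly for the target noun keep a full count of 1.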
26
Evaluation Results (in %)
                               Evaluation set
System                         MFSet   All nouns
S1 (best SE2 system)           72.9    78.0
S2                             65.4    74.5
S3                             64.4    70.0
WNs1 (WordNet sense 1)         61.1    71.9
SC (SEMCOR)                    67.8    76.2
SC+OM (SEMCOR + OMWE)          68.4    76.5
P1 (parallel text)             69.6    75.8
P2 (P1 + noun substitution)    70.7    76.3
P2jcn (P2 + jcn)               72.7    77.2
27
Paired t-test for MFSet
System   S1   P1   P2   P2jcn   SC   SC+OM   WNs1
S1       *    ~    ~    ~       >>   >       >>
P1            *    ~    <<      ~    ~       >>
P2                 *    <       >    ~       >>
P2jcn                   *       >>   >       >>
SC                              *    ~       >>
SC+OM                                *       >>
WNs1                                         *
“>>”, “<<”: p-value ≤ 0.01
“>”, “<”: p-value in (0.01, 0.05]
“~”: p-value > 0.05
28
Paired t-test for All Nouns
System   S1   P1   P2   P2jcn   SC   SC+OM   WNs1
S1       *    >    ~    ~       ~    ~       >>
P1            *    ~    <       ~    ~       >>
P2                 *    ~       ~    ~       >>
P2jcn                   *       ~    ~       >>
SC                              *    ~       >>
SC+OM                                *       >>
WNs1                                         *
“>>”, “<<”: p-value ≤ 0.01
“>”, “<”: p-value in (0.01, 0.05]
“~”: p-value > 0.05
29
Conclusion
• Tackling the data acquisition bottleneck is crucial
• Gathering examples for WSD from parallel texts is scalable to a large set of nouns
• Training on parallel text examples can outperform training on manually annotated data, and achieves performance comparable to the best system of the SENSEVAL-2 English all-words task