Word Sense Disambiguation
Computational Lexical Semantics
Gemma Boleda (1), Stefan Evert (2)
(1) Universitat Politècnica de Catalunya
(2) University of Osnabrück
ESSLLI. Bordeaux, France, July 2009.
Thanks
These slides are based on Jurafsky & Martin (2004: chapter 20) and material by Ann Copestake (course at UPF, 2008)
Outline
1 Overview
2 Supervised WSD
3 Evaluation
4 Dictionary and Thesaurus Methods
5 Discussion
Overview
Word Sense Disambiguation
The task of selecting the correct sense for a word in context.
potentially helpful in many applications
machine translation, question answering, information retrieval, ...
we focus on WSD as a stand-alone task
artificial!
WSD algorithm
basic form:
input: a word in context, and a fixed inventory of word senses
output: the correct word sense for that use
context?
words surrounding the target word: annotated? just the words in no particular order? context size?
inventory?
task-dependent
machine translation from English to Spanish: set of Spanish translations
speech synthesis: homographs with differing pronunciations (e.g., bass)
stand-alone task: a lexical resource (usually, WordNet)
An example
WordNet Sense   Target Word in Context
bass4           ...fish as Pacific salmon and striped bass and...
bass4           ...produce filets of smoked bass or sturgeon...
bass7           ...exciting jazz bass player since Ray Brown...
bass7           ...play bass because he doesn’t have to solo...

Figure: Possible inventory of sense tags for the word bass
Variants of the task
lexical sample task
WSD for a small set of target words
a number of corpus instances are selected and labeled
similar to the task in our case study
→ supervised approaches; word-specific classifiers
all-words
WSD for all content words in a text
similar to POS-tagging, but with a very large “tagset”! → data sparseness
not enough training data for every word
Feature extraction
supervised approach → need to identify features that are predictive of word senses
fundamental (and early) insight: look at the context words
bass:
...smoked bass or...
...jazz bass player...
window (e.g., 1-word window)
Method
process the dataset (POS-tagging, lemmatization, parsing)
build a feature representation encoding the relevant linguistic information
two main feature types:
1 collocational features
2 bag-of-words features
Collocational features
features that take order or syntactic relations into account
restricted to the immediate word context (usually a fixed window). For example:
lemma and part of speech within a two-word window
syntactic function of the target word
Collocational features: Example
Example: (20.1) An electric guitar and bass player stand off to one side, not really part of the scene, just as a sort of nod to gringo expectations perhaps.
2-word window representation, using parts of speech:
[guitar, NN, and, CC, player, NN, stand, VB]
[w−2, P−2, w−1, P−1, w+1, P+1, w+2, P+2]
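As a minimal Python sketch (illustrative, not from the slides), the following function extracts this kind of collocational representation; it assumes the sentence has already been POS-tagged into (word, tag) pairs:

def collocational_features(tagged, i, window=2):
    # Return [w-2, P-2, w-1, P-1, w+1, P+1, w+2, P+2] around position i.
    feats = []
    for offset in range(-window, window + 1):
        if offset == 0:
            continue  # skip the target word itself
        j = i + offset
        # pad at sentence boundaries
        word, tag = tagged[j] if 0 <= j < len(tagged) else ("<PAD>", "<PAD>")
        feats.extend([word, tag])
    return feats

tagged = [("An", "DT"), ("electric", "JJ"), ("guitar", "NN"), ("and", "CC"),
          ("bass", "NN"), ("player", "NN"), ("stand", "VB"), ("off", "RP")]
print(collocational_features(tagged, 4))
# ['guitar', 'NN', 'and', 'CC', 'player', 'NN', 'stand', 'VB']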
Bag-of-words features
lexical features
pre-selected words that are potentially relevant for sense distinctions. For example:
for the all-words task: frequent content words in the corpus
for the lexical sample task: content words in the sentences of the target word
test for presence/absence of a certain word in the selected context
Bag-of-words features: Example
Example: (20.1) An electric guitar and bass player stand off to one side, not really part of the scene, just as a sort of nod to gringo expectations perhaps.
pre-selected words:
[fishing, big, sound, player, fly]
feature vector:
[0, 0, 0, 1, 0]
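A minimal Python sketch of this step (illustrative; the vocabulary is the one pre-selected above):

def bow_vector(vocab, context):
    # 1 if the vocabulary word occurs anywhere in the context, else 0
    present = set(context)
    return [1 if w in present else 0 for w in vocab]

vocab = ["fishing", "big", "sound", "player", "fly"]
context = "an electric guitar and bass player stand off to one side".split()
print(bow_vector(vocab, context))  # [0, 0, 0, 1, 0]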
More on features
collocational cues account for:
“collocational” effects
bass+player=bass7
syntax-related sense differences
serve breakfast to customers vs. serve Philadelphia
bag-of-words features account for topic- and domain-related effects
resemblance to semantic fields, frames, ...
complementary information → both feature types are usually combined
Combined representation: Example
simplified representation for 2 sentences:
collocational features corresponding to a 1-word window:
...jazz bass player...
...smoked bass or...
bag-of-words features: only fishing and player
Combined representation
Weka format
@relation bass
@attribute wordL1 {jazz,smoke}
@attribute posL1 {CC,VBD}
@attribute wordR1 {player,or}
@attribute posR1 {CC,NN}
@attribute fishing {0,1}
@attribute player {0,1}
@attribute sense {s4,s7}
@data
jazz,CC,player,NN,0,1,s7
smoke,VBD,or,NN,0,0,s4

...jazz bass player...
...smoked bass or...
Method
any supervised algorithm
Decision Trees (for example, J48)
Decision Lists (similar to Decision Trees)
Naive Bayes (probabilistic)
...
and tool
Weka
R
SVMTool
your own implementation
...
Interim Summary
supervised approaches use sense-annotated datasets
need many annotated examples for every word
relevant information in the context:
lexico-syntactic information (collocational features)
lexical information (bag-of-words features)
the information is encoded in the form of features ...
and a classifier is trained to distinguish different senses of a given word
Extrinsic evaluation
long-term goal: improve performance in an end-to-end application
→ extrinsic evaluation (or task-based, end-to-end, in vivo evaluation)
example: Word Sense Disambiguation for (Cross-Lingual) Information Retrieval
http://ixa2.si.ehu.es/clirwsd
Intrinsic evaluation
however, extrinsic evaluation is difficult and time-consuming
→ intrinsic evaluation (or in vitro evaluation)
treat the WSD component as if it were a stand-alone system
measure: sense accuracy (percentage of words correctly tagged)
Accuracy = matches / total
method: held-out data from the same sense-tagged corpora used for training (train-test methodology)
to standardize datasets and methods: SensEval and SemEval competitions
example: our case study
Baseline
baseline: the performance we would get without much knowledge / with a simple approach
necessary for any Machine Learning experiment
(how good is 70%?)
simplest baseline: most frequent sense
WordNet: first sense heuristic (senses are ordered)
a very powerful baseline! → skewed distribution of senses in corpora
BUT we need access to annotated data for every word in the dataset to estimate sense frequencies
this is a “knowledge-laden” baseline
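A minimal Python sketch (illustrative, not from the slides) of computing this baseline from sense-tagged data:

from collections import Counter, defaultdict

def mfs_baseline(train, test):
    # train/test: lists of (word, sense) pairs
    counts = defaultdict(Counter)
    for word, sense in train:
        counts[word][sense] += 1
    mfs = {w: c.most_common(1)[0][0] for w, c in counts.items()}
    correct = sum(1 for w, s in test if mfs.get(w) == s)
    return correct / len(test)  # baseline sense accuracy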
Ceiling
ceiling or upper bound for performance: inter-coder agreement
all-words corpora using WordNet: Ao ≈ 0.75–0.8
more coarse-grained sense distinctions: Ao ≈ 0.9
another possibility: avoid annotation by using pseudowords
banana-door
however: unrealistic → real polysemy is not like banana-doors!
need to find better ways to create pseudowords
Overview
sense-labeled corpora give accurate information – but they are scarce!
need other sources: dictionaries, thesauri, selectional restrictions, ...
idea: use dictionaries as corpora (identifying related words in definitions and examples)
An example
Example: (20.10) The bank can guarantee deposits will eventually cover future tuition costs because it invests in adjustable-rate mortgage securities.
bank1 Gloss: a financial institution that accepts deposits and channels the money into lending activities
      Examples: “he cashed a check at the bank”; “that bank holds the mortgage on my home”
bank2 Gloss: sloping land (especially beside a body of water)
      Examples: “they pulled the canoe up on the bank”; “he sat on the bank of the river”
Figure: WordNet information for two senses of bank
Signatures
signature: a set of words that characterizes a given sense of a target word
extracted from dictionaries, thesauri, tagged corpora, ...
for example (20.10):
bank1: financial, institution, accept, deposit, channel, money, lending, activity, cash, check, hold, mortgage, home
bank2: sloping, land, body, water, pull, canoe, bank, sit, river
Lesk Algorithm
function SIMPLIFIED-LESK(word, sentence) returns best sense of word
  best-sense ← most frequent sense for word
  max-overlap ← 0
  context ← set of words in sentence
  for each sense in senses of word do
    signature ← set of words in the gloss and examples of sense
    overlap ← COMPUTE-OVERLAP(signature, context)
    if overlap > max-overlap then
      max-overlap ← overlap
      best-sense ← sense
  end
  return(best-sense)
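A direct Python transcription (a minimal sketch; the sense inventory and signatures are passed in as data, e.g. extracted from WordNet glosses and examples):

def simplified_lesk(sentence, senses):
    # senses: list of (sense_id, signature_word_set), most frequent sense first
    context = set(sentence.lower().split())
    best_sense = senses[0][0]  # default: most frequent sense
    max_overlap = 0
    for sense_id, signature in senses:
        overlap = len(set(signature) & context)
        if overlap > max_overlap:
            max_overlap, best_sense = overlap, sense_id
    return best_sense

senses = [("bank1", {"financial", "institution", "deposit", "mortgage"}),
          ("bank2", {"sloping", "land", "water", "canoe", "river"})]
print(simplified_lesk("she strolled by the river bank", senses))  # bank2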
Lesk Algorithm
Example: she strolled by the river bank.
best-sense ← bank1; max-overlap ← 0
context ← {she, stroll, river}
sense bank1:
  signature ← {financial, institution, accept, deposit, channel, money, lending, activity, cash, check, hold, mortgage, home}
  overlap ← 0; 0 > 0 fails
sense bank2:
  signature ← {sloping, land, body, water, pull, canoe, bank, sit, river}
  overlap ← 1; 1 > 0 succeeds
  best-sense ← bank2; max-overlap ← 1
return bank2
Discussion
right intuition: words that appear in dictionary definitions and examples are relevant to a given sense
problem: data sparseness: dictionary entries are short, and examples are not always given
→ the Lesk algorithm is currently used as a baseline
BUT many extensions are possible and have been tried (generalizations over lemmata, corpus data, weighting, ...)
AND dictionary-derived features can be used (are used) in standard supervised approaches
Interim Summary
information encoded in dictionaries (definitions, examples) is useful for WSD
can be used exclusively or in addition to other information (collocations, bag of words) for supervised approaches
the Lesk algorithm disambiguates solely on the basis of dictionary information
overlap between the dictionary entry and the context of the word occurrence
the most frequent sense and the Lesk algorithm are used as baselines for evaluation
Overview
we have a huge number of classes (senses)
need large hand-built resources:
supervised approaches need large annotated corpora (unrealistic)
dictionary methods need large dictionaries, which, even if available, often do not provide enough information
alternatives:
Minimally supervised WSD
Unsupervised WSD
both make use of unannotated data
these approaches are not as successful as supervised approaches
Minimally supervised WSD: Bootstrapping
for a given word, for example plant
start with a small number of annotated examples (seeds) for each sense
collect additional examples for each sense based on their similarity to the annotated examples
iterate
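As an illustrative sketch (toy data, not Yarowsky's full algorithm), the following Python code grows a labeled set from seeds using a one-sense-per-collocation style cue:

from collections import defaultdict

def bootstrap(seeds, unlabeled, rounds=5):
    # seeds: {context_tuple: sense}; unlabeled: list of context tuples
    labeled = dict(seeds)
    for _ in range(rounds):
        # 1. cues = words that (so far) occur with exactly one sense
        cue_senses = defaultdict(set)
        for context, sense in labeled.items():
            for w in context:
                cue_senses[w].add(sense)
        cues = {w: s.pop() for w, s in cue_senses.items() if len(s) == 1}
        # 2. label contexts whose cue words all point to the same sense
        grew = False
        for context in unlabeled:
            votes = {cues[w] for w in context if w in cues}
            if context not in labeled and len(votes) == 1:
                labeled[context] = votes.pop()
                grew = True
        if not grew:
            break  # no more confident additions
    return labeled

seeds = {("plant", "life"): "A", ("manufacturing", "plant"): "B"}
unlabeled = [("plant", "life", "animal"), ("animal", "species"),
             ("manufacturing", "plant", "equipment"), ("equipment", "factory")]
print(bootstrap(seeds, unlabeled))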
Bootstrapping: example
plant (Yarowsky 1995)
sense A: living entity; sense B: building
first examples: those that appear with life (sense A) and manufacturing (sense B)

Figure: Bootstrapping word senses. Figure 20.4 in Jurafsky & Martin.
Yarowsky 1995
Influential insights (used as heuristics in Yarowsky’s algorithm):
→ one sense per collocation
life+plant = plantA
manufacturing+plant = plantB
→ one sense per discourse
if a word appears multiple times in a text, all its occurrences will probably bear the same sense
also useful to enlarge datasets
Unsupervised WSD
no previous knowledge
no human-defined word senses
simply group examples according to their similarity
clustering
and infer senses from that
problem: hard to interpret and evaluate
Interim summary
WSD can be framed as a standard classification task
training data, feature definition, classifier, evaluation → supervised approaches
most useful information:
syntactic and lexical context (collocational features)
words related to the different senses of a given word (bag-of-words features)
words in dictionary (thesaurus, etc.) entries
other approaches try to make use of unannotated data
bootstrapping, unsupervised learning
would be great, but not as successful as supervised approaches (and harder to interpret and work with)
Useful empirical facts
skewed distribution of senses
→ most frequent sense baseline
→ heuristic when no other information is available
BUT the distribution varies with the text/corpus! (cone in a geometry textbook)
one sense per collocation
bass + player = bass7
→ simple cues for sense classification (heuristic)
one sense per discourse
different occurrences of a word in a given text tend to be used in the same sense
→ heuristic for classification and for data gathering
Conceptual problems
the task as currently defined does not allow for generalization over different words → learning is word-specific
number of classes = number of senses: equal to or greater than the number of words!
need training data for every sense of every word
most words have low frequency (Zipf’s law)
no chance with unknown words
this wouldn’t be a problem if word sense alternation were like bank1 − bank2 (homonymy)...
...but many alternations are systematic! (regular polysemy, metonymy, metaphor)
Regular polysemy
conversion
bank (N): financial institution
bank (V): put money in a bank
same for sugar, hammer, tango, etc. (also derivation: -ize)
adjectives (Boleda 2007)
qualitative vs. relational: cara familiar (‘familiar face’) vs. reunió familiar (‘family meeting’)
event-related vs. qualitative: fet sabut (‘known fact’) vs. home sabut (‘wise man’)
Regular polysemy: mass/count
animal/meat
chicken1: animal; chicken2: meat
lamb1: animal; lamb2: meat
...
portions/kinds: two beers
two servings of beer
two types of beer
generally: thing/derived substance (grinding)
After several lorries had run over the body, there was rabbit splattered all over the road.
Regular polysemy
verb alternations
causative/inchoative (Levin 1993)
John broke the window
The window broke
Spanish psychological verbs
Le preocupa la situación (Dative + Subject)
Bruna no quiere preocuparla (Subject + Accusative)
Contextual coercion / Logical metonymy
(Also see course by Louise McNally.)
object to eventuality (Pustejovsky 1995)
Mary enjoyed the book.
After three martinis, Kim felt much happier.
adjectives (Pustejovsky 1995): event selection
fast runner vs. fast typist vs. fast car
Metonymy
container/content
He drank a bottle of whisky.
Morphology again: He drank a bottleful of whisky. (-ful suffixation)
fruit/plant
olive, grapefruit, ...
Spanish: often the tree is masculine (olivo, naranjo), the fruit feminine (oliva, naranja)
figure/ground
Kim painted the door
Kim walked through the door
Metonymy
country names
Location: I live in China.
Government: The US and Libya have agreed to work together to solve...
Team (sports): England won last year’s World Cup.
more generally: institutions
Barcelona applied for the Olympic Games.
The banks won’t give credit now.
The newspapers criticized this policy.
object/person
The cello is playing badly.
Not so regular: contextual metaphor: The ham sandwich wants his check. (Lakoff & Johnson 1980)
Metaphor
physical → mental
depart1: physical transfer; arrive1: physical transfer; go1: physical transfer
depart2: mental transfer; arrive2: mental transfer; go2: mental transfer
concrete → abstract
aigua clara (‘clear water’) vs. estil clar (‘clear style’)
cabells negres (‘black hair’) vs. humor negre (‘black humour’)
To sum up
pervasive systematicity in sense alternations: regular polysemy, metonymy, metaphor
productive:
We found a little, hairy wampimuk sleeping behind the tree (McDonald & Ramscar 2001)
Wampimuk soup is delicious!
inherent property of language
analogical reasoning (psychology again)
WSD as currently handled cannot capture these regularities
theoretical and practical problem!
WSD and regularities: what one can do
generalize on FEATURES
e.g., jazz → MUSIC-STYLE → jazz, rock, blues, ...
provided some lexical resource is available that encodes this information
He is a jazz bass player. → I love bass solos in rock music.
problem: when (how) to generalize? when to stop?
WSD and regularities: what would be desirable
train on chicken and use the data for lamb, wampimuk, ...
Resources such as WordNet encode the meat/animal distinction:
WordNet info for chicken:
chicken1: the flesh of a chicken used for food.
chicken2: a domesticated gallinaceous bird (hyponym).
chicken3: a person who lacks confidence.
chicken4: a foolhardy competition.
WordNet info for lamb:
lamb1: young sheep.
lamb2: a person easily deceived or cheated.
lamb3: a sweet innocent mild-mannered person.
lamb4: the flesh of a young domestic sheep eaten as food.
WHAT IS MISSING: the link between chicken2 and lamb1, and between chicken1 and lamb4 (note the other senses)
Classifier example 1: Naive Bayes
probabilistic classifier (related to HMMs)
choosing the best sense amounts to choosing the most probable sense given the feature vector
a conditional probability
BUT it is impossible to train directly (too many feature combinations)
2 strategies:
decomposing the probabilities (Bayes’ rule) → easier to estimate
making an unrealistic assumption: words are independent (→ Naive Bayes)
training the classifier = estimating probabilities from the sense-tagged corpus
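Spelled out as a worked decision rule (standard Naive Bayes, matching the two strategies above):

ŝ = argmax_s P(s | f1, ..., fn)
  = argmax_s P(f1, ..., fn | s) · P(s)     (Bayes’ rule; the denominator is constant across senses)
  ≈ argmax_s P(s) · ∏j P(fj | s)           (naive independence assumption)

where the prior P(s) and the conditionals P(fj | s) are estimated as (smoothed) relative frequencies in the sense-tagged training corpus.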
Classifier example 2: Decision Lists
similar to decision trees (difference: only one condition)

Rule                    Sense
fish within window   →  bass4
striped bass         →  bass4
guitar within window →  bass7
play/V bass          →  bass7

Figure: Decision list for the word bass
to learn a decision list classifier:
generate and order tests according to the training data
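A minimal Python sketch of this learning step (toy data; tests are ranked by a smoothed log-likelihood ratio, in the spirit of Yarowsky's decision lists):

import math
from collections import Counter

def learn_decision_list(examples, alpha=0.1):
    # examples: list of (feature_set, sense) pairs with senses 's4'/'s7'
    counts = Counter()
    for feats, sense in examples:
        for f in feats:
            counts[(f, sense)] += 1
    rules = []
    for f in {f for f, _ in counts}:
        a = counts[(f, "s4")] + alpha   # smoothed counts per sense
        b = counts[(f, "s7")] + alpha
        rules.append((abs(math.log(a / b)), f, "s4" if a > b else "s7"))
    rules.sort(reverse=True)            # strongest test first
    return [(f, sense) for _, f, sense in rules]

def classify(rules, feats, default="s4"):
    for f, sense in rules:
        if f in feats:
            return sense                # the first matching test decides
    return default

examples = [({"fish", "striped"}, "s4"), ({"smoked", "fish"}, "s4"),
            ({"guitar", "player"}, "s7"), ({"play", "jazz"}, "s7")]
print(classify(learn_decision_list(examples), {"jazz", "player"}))  # s7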