Word Sense Disambiguation


Transcript of Word Sense Disambiguation

Page 1: Word Sense Disambiguation

Word Sense Disambiguation

• Many words have multiple meanings

  – E.g., river bank vs. financial bank

• Problem: Assign the proper sense to each ambiguous word in a text

• Applications:

  – Machine translation
  – Information retrieval
  – Semantic interpretation of text

Page 2: Word Sense Disambiguation

Tagging?

• Idea: Treat sense disambiguation like POS tagging, just with “semantic tags”

• The problems differ:

  – POS tags depend on specific structural cues (mostly neighboring tags)

  – Senses depend on semantic context – less structured, longer-distance dependencies

Page 3: Word Sense Disambiguation

Approaches

• Supervised learning: Learn from a pretagged corpus

• Dictionary-based learning: Learn to distinguish senses from dictionary entries

• Unsupervised learning: Automatically cluster word occurrences into different senses

Page 4: Word Sense Disambiguation

Evaluation

• Train and test on pretagged texts

  – Difficult to come by

• Artificial data: ‘merge’ two words to form an ‘ambiguous’ word with two ‘senses’

  – E.g., replace all occurrences of door and of window with doorwindow and see if the system figures out which is which
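The door/window merge can be sketched as a small helper (a hypothetical function; names are illustrative, not from the slides):

```python
def make_pseudoword(text, w1, w2, pseudo):
    """Merge two words into one artificial ambiguous word, keeping the
    original word as the gold 'sense' for later evaluation."""
    merged, gold = [], []
    for tok in text.split():
        if tok.lower() in (w1, w2):
            merged.append(pseudo)
            gold.append(tok.lower())  # the true sense is the original word
        else:
            merged.append(tok)
    return " ".join(merged), gold

text = "Close the door and open the window then shut the door"
merged, gold = make_pseudoword(text, "door", "window", "doorwindow")
# gold records which occurrences were door and which were window
```

A disambiguator is then scored by how often it recovers `gold` from `merged`.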

Page 5: Word Sense Disambiguation

Performance Bounds

• How good is (say) 83.2%?

• Evaluate performance relative to lower and upper bounds:

  – Baseline performance: how well does the simplest “reasonable” algorithm do?

  – Human performance: what percentage of the time do people agree on the classification?
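A concrete lower bound is the most-frequent-sense baseline, which can be sketched as follows (illustrative code, not from the slides):

```python
from collections import Counter

def mfs_baseline(train_senses, test_senses):
    """Predict the sense most frequent in training for every test token
    and report accuracy -- the usual 'simplest reasonable' baseline."""
    mfs = Counter(train_senses).most_common(1)[0][0]
    accuracy = sum(s == mfs for s in test_senses) / len(test_senses)
    return mfs, accuracy

mfs, acc = mfs_baseline(["river", "river", "finance"],
                        ["river", "finance", "river", "river"])
```

Any proposed system should beat this number before its accuracy is interesting.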

Page 6: Word Sense Disambiguation

Supervised Learning

• Each ambiguous word token wi = wk in the training data is tagged with a sense from sk1, …, sknk

• Each word token occurs in a context ci (usually defined as a window around the word occurrence – up to ~100 words long)

• Each context contains a set of words used as features vij

Page 7: Word Sense Disambiguation

Bayesian Classification

• Bayes decision rule: Classify s(wi) = arg maxs P(s | ci)

• Minimizes the probability of error

• How to compute? Use Bayes’ Theorem:

  P(sk | c) = P(c | sk) P(sk) / P(c)

Page 8: Word Sense Disambiguation

Bayes’ Classifier (cont.)

• Note that P(c) is constant for all senses, therefore:

  s = arg maxsk P(sk | c)

    = arg maxsk P(c | sk) P(sk) / P(c)

    = arg maxsk P(c | sk) P(sk)

    = arg maxsk [log P(c | sk) + log P(sk)]

Page 9: Word Sense Disambiguation

Naïve Bayes

• Assume:

  – Features are conditionally independent, given the example class

  – Feature order doesn’t matter (bag of words model – repetition counts)

• Then:

  P(c | sk) = P({vj : vj in c} | sk) = Πvj in c P(vj | sk)

  log P(c | sk) = Σvj in c log P(vj | sk)

Page 10: Word Sense Disambiguation

Naïve Bayes Training

• For all senses sk of w, do:

  – For all words vj in the vocabulary, do:

      P(vj | sk) = C(vj, sk) / C(sk)

• For all senses sk of w, do:

      P(sk) = C(sk) / C(w)

Page 11: Word Sense Disambiguation

Naïve Bayes Classification

• For all senses sk of wi, do:

– score(sk) = log P(sk)

– For all words vj in the context window ci, do:

• score(sk) += log P(vj | sk)

• Choose s(wi) = arg maxsk score(sk)
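Putting the training and classification slides together, a minimal Naïve Bayes disambiguator might look like this (an illustrative sketch; the add-one smoothing is an assumption, since the slides do not say how zero counts are handled):

```python
import math
from collections import Counter, defaultdict

class NaiveBayesWSD:
    """Minimal Naive Bayes word sense disambiguator.

    Training tallies C(vj, sk) and C(sk); classification computes
    score(sk) = log P(sk) + sum over context words of log P(vj | sk).
    """

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # sense -> context-word counts
        self.sense_counts = Counter()            # sense -> training examples
        self.vocab = set()

    def train(self, tagged_contexts):
        """tagged_contexts: iterable of (sense, list-of-context-words) pairs."""
        for sense, context in tagged_contexts:
            self.sense_counts[sense] += 1
            for v in context:
                self.word_counts[sense][v] += 1
                self.vocab.add(v)

    def classify(self, context):
        total = sum(self.sense_counts.values())
        best, best_score = None, float("-inf")
        for sense, c_s in self.sense_counts.items():
            score = math.log(c_s / total)        # log P(sk)
            # add-one smoothing (an assumption, not from the slides)
            denom = sum(self.word_counts[sense].values()) + len(self.vocab)
            for v in context:
                score += math.log((self.word_counts[sense][v] + 1) / denom)
            if score > best_score:
                best, best_score = sense, score
        return best
```

Contexts here are plain word lists; in practice the window would be the ~100-word context ci from the earlier slide.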

Page 12: Word Sense Disambiguation

Significant Features

• Senses of drug (Gale et al. 1992):

  ‘medication’:        prices, prescription, patent, increase, consumer, pharmaceutical

  ‘illegal substance’: abuse, paraphernalia, illicit, alcohol, cocaine, traffickers

Page 13: Word Sense Disambiguation

Dictionary-Based Disambiguation

Idea: Choose between senses of a word given in a dictionary based on the words in the definitions

Cone:

1. A mass of ovule-bearing or pollen-bearing scales in trees of the pine family or in cycads that are arranged usually on a somewhat elongated axis

2. Something that resembles a cone in shape: as a crisp cone-shaped wafer for holding ice cream

Page 14: Word Sense Disambiguation

Algorithm (Lesk 1986)

Define Di(w) as the bag of words in the ith definition for w

Define E(w) as ∪i Di(w)

• For all senses sk of w, do:

  – score(sk) = similarity(Dk(w), ∪vj in c E(vj))

• Choose s = arg maxsk score(sk)
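A stripped-down version of this idea can be sketched in a few lines of Python (a hypothetical `lesk_sense` helper; the expansion E(vj) over the context words' own definitions is omitted for brevity):

```python
def lesk_sense(context, definitions):
    """Simplified Lesk sketch: pick the sense whose dictionary gloss
    shares the most words with the context.
    definitions maps each sense to its gloss string."""
    context_bag = set(w.lower() for w in context)

    def score(sense):
        # bag-of-words overlap between context and this sense's gloss
        return len(context_bag & set(definitions[sense].lower().split()))

    return max(definitions, key=score)

glosses = {
    "tree":    "a tree of the olive family",
    "residue": "the solid residue left when combustible material is burned",
}
sense = lesk_sense("the solid gray residue left by the cigar".split(), glosses)
```

Real implementations filter stopwords and stem, since function words like "the" inflate the overlap.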

Page 15: Word Sense Disambiguation

Similarity Metrics

similarity(X, Y) =

  matching coefficient:  |X ∩ Y|

  Dice coefficient:      2|X ∩ Y| / (|X| + |Y|)

  Jaccard coefficient:   |X ∩ Y| / |X ∪ Y|

  overlap coefficient:   |X ∩ Y| / min(|X|, |Y|)
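Treating the word bags as Python sets, the four metrics are direct one-liners:

```python
def matching(X, Y):
    """Matching coefficient: |X ∩ Y|."""
    return len(X & Y)

def dice(X, Y):
    """Dice coefficient: 2|X ∩ Y| / (|X| + |Y|)."""
    return 2 * len(X & Y) / (len(X) + len(Y))

def jaccard(X, Y):
    """Jaccard coefficient: |X ∩ Y| / |X ∪ Y|."""
    return len(X & Y) / len(X | Y)

def overlap(X, Y):
    """Overlap coefficient: |X ∩ Y| / min(|X|, |Y|)."""
    return len(X & Y) / min(len(X), len(Y))
```

Only the matching coefficient is unnormalized; the other three stay in [0, 1], which matters when glosses differ greatly in length.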

Page 16: Word Sense Disambiguation

Simple Example

ash:

  s1: a tree of the olive family

  s2: the solid residue left when combustible material is burned

This cigar burns slowly and creates a stiff ash2

The ash1 is one of the last trees to come into leaf

After being struck by lightning the olive tree was reduced to ash?

Page 17: Word Sense Disambiguation

Some Improvements

• Lesk obtained results of 50-70% accuracy

Possible improvements:

• Run iteratively, each time only using definitions of “appropriate” senses for context words

• Expand each word to a set of synonyms, using a thesaurus as well

Page 18: Word Sense Disambiguation

Thesaurus-Based Disambiguation

• Thesaurus assigns subject codes to different words, assigning multiple codes to ambiguous words

• t(sk) = subject code of sense sk for word w in the thesaurus

δ(t, v) = 1 iff t is a subject code for word v

Page 19: Word Sense Disambiguation

Simple Algorithm

• Count up number of context words with same subject code:

for each sense sk of wi, do:

score(sk) = Σvj in ci δ(t(sk), vj)

s(wi) = arg maxsk score(sk)
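The counting step can be rendered in Python as follows (illustrative names and code tables; a real thesaurus such as Roget's would supply the subject codes):

```python
def thesaurus_disambiguate(sense_codes, context, word_codes):
    """sense_codes: sense sk -> its subject code t(sk).
    word_codes: word v -> set of subject codes the thesaurus lists for v,
    so delta(t, v) = 1 iff t in word_codes[v]."""
    def score(sense):
        t = sense_codes[sense]
        # count context words that carry this sense's subject code
        return sum(1 for v in context if t in word_codes.get(v, set()))

    return max(sense_codes, key=score)

# hypothetical subject codes for two senses of "mouse"
senses = {"animal": "ZOOLOGY", "device": "COMPUTING"}
codes = {"cursor": {"COMPUTING"}, "click": {"COMPUTING"},
         "tail": {"ZOOLOGY"}, "cheese": {"FOOD"}}
best = thesaurus_disambiguate(senses, ["cursor", "click", "tail"], codes)
```

With two COMPUTING words against one ZOOLOGY word, the device sense wins here.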

Page 20: Word Sense Disambiguation

Some Issues

• Domain-dependence: In computer manuals, “mouse” will not be evidence for topic “mammal”

• Coverage: “Michael Jordan” will not likely be in a thesaurus, but is an excellent indicator for topic “sports”

Page 21: Word Sense Disambiguation

Tuning for a Corpus

• Use a naïve-Bayes formulation:

  P(t | c) = P(t) Πvj in c P(vj | t) / Πvj in c P(vj)

• Initialize probabilities as uniform

• Re-estimate P(t) and P(vj | t) for each topic t and each word vj by evaluating all contexts in the corpus, assuming the context has topic t if P(t | c) > α (where α is a predefined threshold)

• Disambiguate by choosing the highest-probability topic

Page 22: Word Sense Disambiguation

Training (based on the paper):

for all topics tl, let Wl be the set of words listed under the topic

for all topics tl, let Tl = { c(w) | w ∈ Wl }

for all words vj, let Vj = { c | vj ∈ c }

for all words vj and topics tl, let:

  P(vj | tl) = |Vj ∩ Tl| / Σj |Vj ∩ Tl|   (with smoothing)

for all topics tl, let:

  P(tl) = Σj |Vj ∩ Tl| / Σl Σj |Vj ∩ Tl|

for all words vj, let:

  P(vj) = |Vj| / Nc   (where Nc is the total number of contexts)

Page 23: Word Sense Disambiguation

Using a Bilingual Corpus

• Use correlations between phrases in two languages to disambiguate

  E.g., interest = ‘legal share’ (acquire an interest)
                 = ‘attention’ (show interest)

  In German: Beteiligung erwerben
             Interesse zeigen

• Depending on where the translations of related words occur, determine which sense applies

Page 24: Word Sense Disambiguation

Scoring

• Given a context c in which a syntactic relation R(w, v) holds between w and a context word v:

  – The score of sense sk is the number of contexts c′ in the second language such that R(w′, v′) ∈ c′, where w′ is a translation of sk and v′ is a translation of v

– Choose highest-scoring sense

Page 25: Word Sense Disambiguation

Global Constraints

• One sense per discourse: The sense of a word tends to be consistent within a given document

• One sense per collocation: Nearby words provide strong and consistent clues for sense, depending on distance, order, and syntactic relationship

Page 26: Word Sense Disambiguation

Examples

D1: living  – existence of plant and animal life
    living  – classified as either plant or animal
    ???     – bacterial and plant cells are enclosed

D2: living  – contains varied plant and animal life
    living  – the most common plant life
    factory – protected by plant parts remaining

Page 27: Word Sense Disambiguation

Collocations

• Collocation: a pair of words that tend to occur together

• Rank collocations that occur with different senses by:

  rank(f) = P(sk1 | f) / P(sk2 | f)

• A higher rank for a collocation f means it is more highly indicative of its sense

Page 28: Word Sense Disambiguation

for all senses sk of w do:

Fk = all collocations in sk’s definition

Ek = { }

while at least one Ek changed, do:

for all senses sk of w do:

Ek = { ci | ∃ fm ∈ ci : fm ∈ Fk }

for all senses sk of w do:

Fk = { fm | P(sk | fm) / P(sn | fm) > α }

for all documents d do:

determine the majority sense s* of w in d

assign all occurrences of w to s*
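The bootstrapping loop above can be sketched in Python. This is a simplified, hypothetical rendering: the probability-ratio test P(sk | fm) / P(sn | fm) > α is approximated by a smoothed count ratio, and the final majority-sense-per-document pass is omitted:

```python
from collections import Counter

def bootstrap_senses(contexts, seeds, alpha=1.5, max_iter=10):
    """contexts: list of collocation sets, one per occurrence of w.
    seeds: sense -> set of seed collocations (from the definitions).
    Returns (E, F): the contexts claimed by each sense, and each
    sense's reliable collocations."""
    F = {s: set(cols) for s, cols in seeds.items()}
    E = {s: set() for s in seeds}
    for _ in range(max_iter):
        # Ek = contexts containing at least one collocation from Fk
        new_E = {s: {i for i, c in enumerate(contexts) if c & F[s]}
                 for s in F}
        if new_E == E:          # no Ek changed: stop
            break
        E = new_E
        for s in F:
            counts, other = Counter(), Counter()
            for i in E[s]:
                counts.update(contexts[i])
            for s2 in F:
                if s2 != s:
                    for i in E[s2]:
                        other.update(contexts[i])
            # keep collocations much more frequent with sense s than elsewhere
            F[s] = {f for f in counts
                    if counts[f] / (other[f] + 1) > alpha}
    return E, F
```

Starting from one seed collocation per sense, each pass claims the contexts the current collocations cover and then re-ranks collocations by how strongly they separate the two context sets, so coverage grows until it stabilizes.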