Transcript of Natural Language Processing - Hasso Plattner Institute
Natural Language Processing: Lexical Semantics
Word Sense Disambiguation and Word Similarity
Potsdam, 31 May 2012
Saeedeh Momtazi, Information Systems Group
based on the slides of the course book
Outline
1 Lexical Semantics: WordNet
2 Word Sense Disambiguation
3 Word Similarity
Saeedeh Momtazi | NLP | 31.05.2012
Word Meaning
Considering the meaning(s) of a word in addition to its written form

Word Sense: A discrete representation of an aspect of the meaning of a word
Word
Lexeme: An entry in a lexicon consisting of a pair: a form with a single meaning representation

Camel (animal) / Camel (music band)

Lemma: The grammatical form that is used to represent a lexeme

Camel
Homonymy
Words that share the same form but have different meanings

Homographs: Camel (animal) / Camel (music band)

Homophones: Write / Right
Semantic Relations
Realizing lexical relations among words
Hyponymy (is-a) {parent: hypernym, child: hyponym}: dog & animal

Meronymy (part-of): arm & body

Synonymy: fall & autumn

Antonymy: tall & short

Note: relations hold between senses rather than words
WordNet
A hierarchical database of lexical relations
Three separate sub-databases: nouns; verbs; adjectives and adverbs
Closed class words are not included
Each word is annotated with a set of senses
Available online: http://wordnetweb.princeton.edu/perl/webwn
Number of words in WordNet 3.0
Category   Entries
Noun       117,097
Verb       11,488
Adjective  22,141
Adverb     4,061
Average number of senses in WordNet 3.0
Category   Senses
Noun       1.23
Verb       2.16
Word Sense
Synset (synonym set)
Word Relations (Hypernym)
Word Relations (Sister)
Applications
Information retrieval
Machine translation
Speech synthesis
Information retrieval
Machine translation
Example
Sense: band 532736 Music N

The band made copious recordings now regarded as classic from 1941 to 1950. These were to have a tremendous influence on the worldwide jazz revival to come. During the war Lu led a 20-piece navy band in Hawaii.
Example
Sense: band 532838 Rubber-band N

He had assumed that so famous and distinguished a professor would have been given the best possible medical attention; it was the sort of assumption young men make. Here suspended from Lewis's person were pieces of tubing held on by rubber bands, an old wooden peg, a bit of cork.
Example
Sense: band 532734 Range N
There would be equal access to all currencies, financial instruments and financial services, and no major constitutional change. As realignments become more rare and exchange rates waver in narrower bands, the system could evolve into one of fixed exchange rates.
Word Sense Disambiguation
Input:
A word
The context of the word
A set of potential senses for the word

Output:
The best sense of the word for this context
Approaches
Thesaurus-based
Supervised learning
Semi-supervised learning
Thesaurus-based
Extracting sense definitions from existing sources:
Dictionaries
Thesauri
Wikipedia
The Lesk Algorithm
Selecting the sense whose definition shares the most words with the word's context
Simplified Algorithm [Kilgarriff and Rosenzweig, 2000]
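The simplified algorithm fits in a few lines. A minimal sketch follows; the senses and glosses for band are invented for illustration, not taken from a real dictionary:

```python
# A minimal sketch of simplified Lesk: pick the sense whose gloss shares
# the most words with the context. Glosses below are invented examples.

STOPWORDS = {"a", "an", "the", "of", "in", "to", "and", "or", "for", "is"}

def simplified_lesk(context, senses):
    """senses: dict mapping sense name -> gloss string."""
    context_words = {w for w in context.lower().split() if w not in STOPWORDS}
    best_sense, best_overlap = None, -1
    for sense, gloss in senses.items():
        gloss_words = {w for w in gloss.lower().split() if w not in STOPWORDS}
        overlap = len(context_words & gloss_words)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

band_senses = {
    "music":  "a group of musicians playing popular music for dancing",
    "rubber": "an elastic loop of rubber used to hold objects together",
    "range":  "a restricted interval between limits such as exchange rates",
}

print(simplified_lesk("exchange rates waver in narrower bands", band_senses))
# -> "range" (two overlapping words: "exchange", "rates")
```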
Simple to implement
No training data needed
Relatively bad results
Supervised Learning
Training data: a corpus in which each occurrence of the ambiguous word w is annotated with its correct sense

SemCor: 234,000 sense-tagged words from the Brown corpus
SENSEVAL-1: 34 target words
SENSEVAL-2: 73 target words
SENSEVAL-3: 57 target words (2081 sense-tagged instances)
Feature Selection
Using the words in the context with a specific window size
Collocation: Considering all words in a window (as well as their POS tags) and their position

Bag-of-words: Considering the frequent words regardless of their position
Deriving a set of the k most frequent words in the window from the training corpus
Representing each word in the data as a k-dimensional vector
Finding the frequency of the selected words in the context of the current observation
Collocation
Sense: band 532734 Range N
There would be equal access to all currencies, financial instruments and financial services, and no major constitutional change. As realignments become more rare and exchange rates waver in narrower bands, the system could evolve into one of fixed exchange rates.

Window size: +/- 3
Context: waver in narrower bands the system could
{Wn−3, Pn−3, Wn−2, Pn−2, Wn−1, Pn−1, Wn+1, Pn+1, Wn+2, Pn+2, Wn+3, Pn+3}
= {waver, NN, in, IN, narrower, JJ, the, DT, system, NN, could, MD}
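The feature extraction above can be sketched directly. The POS tags are supplied by hand here, copied from the slide's example; a real system would run a tagger:

```python
# Sketch of collocational feature extraction for a +/-3 window: interleaved
# word/POS features for the positions around the target word.

def collocation_features(tagged, target_index, window=3):
    """tagged: list of (word, pos) pairs; returns the word/POS features
    for positions -window..+window around the target (target excluded)."""
    feats = []
    for offset in range(-window, window + 1):
        if offset == 0:
            continue  # skip the target word itself
        i = target_index + offset
        if 0 <= i < len(tagged):
            feats.extend(tagged[i])  # append word, then its POS tag
    return feats

sent = [("waver", "NN"), ("in", "IN"), ("narrower", "JJ"),
        ("bands", "NNS"), ("the", "DT"), ("system", "NN"), ("could", "MD")]
print(collocation_features(sent, target_index=3))
# -> ['waver', 'NN', 'in', 'IN', 'narrower', 'JJ',
#     'the', 'DT', 'system', 'NN', 'could', 'MD']
```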
Bag-of-word
Sense: band 532734 Range N
There would be equal access to all currencies, financial instruments and financial services, and no major constitutional change. As realignments become more rare and exchange rates waver in narrower bands, the system could evolve into one of fixed exchange rates.

Window size: +/- 3
Context: waver in narrower bands the system could
k frequent words for band: {circle, dance, group, jewelry, music, narrow, ring, rubber, wave}
Vector: {0, 0, 0, 0, 0, 1, 0, 0, 1}
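A sketch of building that binary bag-of-words vector; prefix matching stands in for real stemming, so that "narrow" fires on "narrower" and "wave" on "waver":

```python
# Sketch of the bag-of-words feature vector. The k frequent words follow
# the slide's example; prefix matching is a crude stand-in for stemming.

K_WORDS = ["circle", "dance", "group", "jewelry", "music",
           "narrow", "ring", "rubber", "wave"]

def bow_vector(context_words, k_words=K_WORDS):
    vector = []
    for kw in k_words:
        hit = any(w.lower().startswith(kw) for w in context_words)
        vector.append(1 if hit else 0)
    return vector

context = ["waver", "in", "narrower", "the", "system", "could"]
print(bow_vector(context))  # -> [0, 0, 0, 0, 0, 1, 0, 0, 1]
```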
Naïve Bayes Classification
Choosing the best sense s out of all possible senses si for a feature vector f of the word w:

s = argmax_si P(si | f)

s = argmax_si P(f | si) · P(si) / P(f)

P(f) is constant across senses, so it has no effect:

s = argmax_si P(f | si) · P(si)
s = argmax_si P(si) · ∏_{j=1..m} P(fj | si)
          (prior)      (likelihood)

P(si) = #(si) / #(w)

#(si): number of times the sense si is used for the word w in the training data
#(w): the total number of samples for the word w
P(fj | si) = #(fj, si) / #(si)

#(fj, si): the number of times the feature fj occurred for the sense si of word w
#(si): the total number of samples of w with the sense si in the training data
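The count-based estimates above translate directly into code. A minimal sketch with invented training counts (unsmoothed, so unseen features zero out a sense):

```python
# Toy naive Bayes sense classifier built from the count-based estimates:
# P(si) = #(si)/#(w) and P(fj|si) = #(fj,si)/#(si). Counts are invented.
from collections import defaultdict

class NaiveBayesWSD:
    def __init__(self):
        self.sense_counts = defaultdict(int)                      # #(si)
        self.feat_counts = defaultdict(lambda: defaultdict(int))  # #(fj, si)

    def train(self, examples):
        """examples: list of (feature list, sense)."""
        for feats, sense in examples:
            self.sense_counts[sense] += 1
            for f in feats:
                self.feat_counts[sense][f] += 1

    def classify(self, feats):
        total = sum(self.sense_counts.values())                   # #(w)
        best, best_score = None, 0.0
        for sense, n in self.sense_counts.items():
            score = n / total                                     # P(si)
            for f in feats:
                score *= self.feat_counts[sense][f] / n  # P(fj|si), unsmoothed
            if score > best_score:
                best, best_score = sense, score
        return best

clf = NaiveBayesWSD()
clf.train([(["music", "play"], "Music"),
           (["play", "jazz"], "Music"),
           (["rubber", "elastic"], "Rubber")])
print(clf.classify(["play"]))  # -> "Music"
```

In practice the products of small probabilities are computed as sums of log probabilities, and the counts are smoothed (e.g. add-one) so that unseen features do not force a sense's score to zero.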
Semi-supervised Learning
What is the best approach when we do not have enough data to train a model?
A small amount of labeled data
A large amount of unlabeled data
Solution:
Finding the similarity between the labeled and unlabeled data
Predicting the labels of the unlabeled data
What is the best approach when we do not have enough data to train a model?

For each sense:
Select the most important word which frequently co-occurs with the target word only for this particular sense
Find the sentences from unlabeled data which contain the target word and the selected word
Label these sentences with the corresponding sense
Add the newly labeled sentences to the training data

Example for band:
Sense    Selected word
Music    play
Rubber   elastic
Range    spectrum
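The labeling step above can be sketched as follows; the seed words come from the table, while the target word, sentences, and function name are illustrative:

```python
# Sketch of the bootstrapping step: label unlabeled sentences that contain
# the target word together with a sense's seed word, then these pairs can
# be added to the training data. Seed words follow the table for "band".

SEEDS = {"Music": "play", "Rubber": "elastic", "Range": "spectrum"}

def bootstrap(unlabeled, target="band", seeds=SEEDS):
    newly_labeled = []
    for sentence in unlabeled:
        words = sentence.lower().split()
        if target not in words:
            continue
        for sense, seed in seeds.items():
            if seed in words:
                newly_labeled.append((sentence, sense))
                break
    return newly_labeled

sentences = [
    "the band will play tonight",
    "an elastic band held the papers",
    "the band was late",               # no seed word: stays unlabeled
]
print(bootstrap(sentences))
```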
Word Similarity
Task
Finding the similarity between two words
Covering a somewhat wider range of meaning relations (not just synonymy)
Defined with a score (degree of similarity)

Example:
bank (financial institution) & fund
car & bicycle
Applications
Information retrieval
Question answering
Document categorization
Machine translation
Language modeling
Word clustering
Information Retrieval & Question Answering
Approaches
Thesaurus-based:
Based on their distance in the thesaurus
Based on their definition in the thesaurus (gloss)

Distributional:
Based on the similarity between their contexts
Thesaurus-based Methods
Two concepts (senses) are similar if they are “nearby”, i.e. if there is a short path between them in the hypernym hierarchy
Path-based Similarity

pathlen(c1, c2) = 1 + number of edges in the shortest path between the sense nodes c1 and c2

sim_path(c1, c2) = − log pathlen(c1, c2)

wordsim(w1, w2) = max over c1 ∈ senses(w1), c2 ∈ senses(w2) of sim(c1, c2)

(used when we have no knowledge about the exact sense, which is the case when processing general text)
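The path measure can be sketched on a toy hypernym hierarchy; the child-to-parent map below is loosely modeled on the coin/money fragment used later in this lecture, not real WordNet data:

```python
# Path-based similarity on a toy hypernym hierarchy (child -> parent map).
import math

PARENT = {
    "nickel": "coin", "dime": "coin", "coin": "coinage",
    "coinage": "currency", "currency": "medium-of-exchange",
    "budget": "fund", "fund": "money", "money": "medium-of-exchange",
}

def ancestors(c):
    """The node itself plus all its ancestors, bottom-up."""
    path = [c]
    while c in PARENT:
        c = PARENT[c]
        path.append(c)
    return path

def pathlen(c1, c2):
    """1 + number of edges on the shortest path via a common ancestor."""
    up1, up2 = ancestors(c1), ancestors(c2)
    best = None
    for d1, node in enumerate(up1):
        if node in up2:
            d2 = up2.index(node)
            if best is None or d1 + d2 < best:
                best = d1 + d2
    return 1 + best

def sim_path(c1, c2):
    return -math.log(pathlen(c1, c2))

print(pathlen("nickel", "dime"))  # -> 3 (two edges via "coin", plus 1)
print(sim_path("nickel", "dime") > sim_path("nickel", "budget"))  # -> True
```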
Shortcoming:
Assumes that each link represents a uniform distance; nickel to money seems closer than nickel to standard

Solution:
Using a metric which represents the cost of each edge independently
⇒ Words connected only through abstract nodes are less similar
Information Content Similarity
Assigning a probability P(c) to each node of the thesaurus
P(c) is the probability that a randomly selected word in a corpus is an instance of concept c
⇒ P(root) = 1, since all words are subsumed by the root concept
The probability is trained by counting the words in a corpus
The lower a concept sits in the hierarchy, the lower its probability

P(c) = (∑_{w ∈ words(c)} #w) / N

words(c): the set of words subsumed by concept c
N: the total number of words in the corpus that are also present in the thesaurus
words(coin) = {nickel, dime}
words(coinage) = {nickel, dime, coin}
words(money) = {budget, fund}
words(medium of exchange) = {nickel, dime, coin, coinage, currency, budget, fund, money}
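Estimating P(c) from these sets can be sketched with invented corpus counts; the root here is medium-of-exchange rather than WordNet's real root, and the words(c) sets are trimmed to the counted words:

```python
# Sketch of estimating P(c): sum the corpus counts of all words subsumed
# by the concept and divide by N. Word counts are invented for illustration.

WORD_COUNTS = {"nickel": 20, "dime": 10, "coin": 10,
               "budget": 30, "fund": 30}
N = sum(WORD_COUNTS.values())  # total counted words: 100

WORDS = {
    "coin": {"nickel", "dime"},
    "coinage": {"nickel", "dime", "coin"},
    "money": {"budget", "fund"},
    "medium-of-exchange": set(WORD_COUNTS),  # toy root subsumes everything
}

def P(concept):
    return sum(WORD_COUNTS[w] for w in WORDS[concept]) / N

print(P("coin"))                 # -> 0.3
print(P("medium-of-exchange"))   # -> 1.0 at the (toy) root
```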
Augmenting each concept in the WordNet hierarchy with a probability P(c)
Information Content: IC(c) = − log P(c)

Lowest common subsumer: LCS(c1, c2) = the lowest node in the hierarchy that subsumes both c1 and c2
Resnik similarity
Measuring the common amount of information by the information content of the lowest common subsumer of the two concepts
sim_resnik(c1, c2) = − log P(LCS(c1, c2))

sim_resnik(hill, coast) = − log P(geological-formation)
Lin similarity

Measuring the difference between two concepts in addition to their commonality

sim_Lin(c1, c2) = 2 log P(LCS(c1, c2)) / (log P(c1) + log P(c2))

sim_Lin(hill, coast) = 2 log P(geological-formation) / (log P(hill) + log P(coast))
Jiang-Conrath similarity

sim_JC(c1, c2) = 1 / (2 log P(LCS(c1, c2)) − log P(c1) − log P(c2))

sim_JC(hill, coast) = 1 / (2 log P(geological-formation) − log P(hill) − log P(coast))
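The three IC-based measures can be computed side by side. The probabilities below are hand-picked toy values for hill, coast, and their lowest common subsumer geological-formation, not trained from a corpus:

```python
# Resnik, Lin, and Jiang-Conrath similarity from concept probabilities.
# PROB holds invented probabilities for illustration only.
import math

PROB = {"hill": 1.89e-5, "coast": 2.16e-5, "geological-formation": 1.76e-3}

def sim_resnik(p_lcs):
    """IC of the lowest common subsumer."""
    return -math.log(p_lcs)

def sim_lin(p1, p2, p_lcs):
    """Commonality (2*IC of LCS) normalized by the concepts' own IC."""
    return 2 * math.log(p_lcs) / (math.log(p1) + math.log(p2))

def sim_jc(p1, p2, p_lcs):
    """Inverse of the JC distance IC(c1) + IC(c2) - 2*IC(LCS)."""
    return 1 / (2 * math.log(p_lcs) - math.log(p1) - math.log(p2))

p1, p2, lcs = PROB["hill"], PROB["coast"], PROB["geological-formation"]
print(round(sim_resnik(lcs), 2))       # IC of geological-formation
print(round(sim_lin(p1, p2, lcs), 2))  # a value in (0, 1]
```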
Extended Lesk
Looking at word definitions in the thesaurus (glosses)
Measuring the similarity based on the number of common words in their definitions
Adding a score of n² for each n-word phrase that occurs in both glosses
Computing the overlap for other relations as well (glosses of hypernyms and hyponyms)

sim_eLesk(c1, c2) = ∑_{r,q ∈ RELS} overlap(gloss(r(c1)), gloss(q(c2)))
Drawing paper: paper that is specially prepared for use in drafting

Decal: the art of transferring designs from specially prepared paper to a wood or glass or metal surface

Common phrases: "specially prepared" and "paper"

sim_eLesk = 1² + 2² = 1 + 4 = 5
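The n² overlap score can be sketched with a greedy longest-phrase-first match, so that "specially prepared" counts once as a 2-gram rather than as two separate words; the function name and matching strategy are illustrative:

```python
# Sketch of the extended-Lesk overlap score: each common n-word phrase
# contributes n^2. Greedy longest-match over the first gloss's words.

def overlap_score(gloss1, gloss2):
    w1, w2 = gloss1.lower().split(), gloss2.lower().split()
    score = 0
    used = [False] * len(w1)        # words of gloss1 already in a phrase
    # try longer phrases first so "specially prepared" counts as one 2-gram
    for n in range(len(w1), 0, -1):
        for i in range(len(w1) - n + 1):
            if any(used[i:i + n]):
                continue
            phrase = w1[i:i + n]
            for j in range(len(w2) - n + 1):
                if w2[j:j + n] == phrase:
                    score += n * n
                    for k in range(i, i + n):
                        used[k] = True
                    break
    return score

g1 = "paper that is specially prepared for use in drafting"
g2 = ("the art of transferring designs from specially prepared paper "
      "to a wood or glass or metal surface")
print(overlap_score(g1, g2))  # -> 5 = 1^2 ("paper") + 2^2 ("specially prepared")
```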
Thesaurus-based Similarities
Overview
Available Libraries
WordNet::Similarity
Source: http://wn-similarity.sourceforge.net/

Web-based interface: http://marimba.d.umn.edu/cgi-bin/similarity/similarity.cgi
Thesaurus-based Methods
Shortcomings
Many words are missing from the thesaurus
Only uses hyponym info
Might be useful for nouns, but weak for adjectives, adverbs, and verbs
Many languages have no thesaurus

Alternative:
Using distributional methods for word similarity
Distributional Methods
Using context information to find the similarity between words
Guessing the meaning of a word based on its context

tezgüino?

A bottle of tezgüino is on the table
Everybody likes tezgüino
Tezgüino makes you drunk
We make tezgüino out of corn

⇒ An alcoholic beverage
Context Representations
Considering a target term t
Building a vocabulary of M words ({w1, w2, w3, ..., wM})
Creating a vector for t with M features (t = {f1, f2, f3, ..., fM})

fi is the number of times the word wi occurs in the context of t

tezgüino?

A bottle of tezgüino is on the table
Everybody likes tezgüino
Tezgüino makes you drunk
We make tezgüino out of corn

t = tezgüino
vocab = {book, bottle, city, drunk, like, water, ...}
t = {0, 1, 0, 1, 1, 0, ...}
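Building that context vector can be sketched as follows; prefix matching is a crude stand-in for lemmatization (so "likes" fires on "like"), and the binary output mirrors the example above:

```python
# Sketch of building the M-dimensional context vector for a target term
# from a small corpus, as in the tezgüino example (binary features here).

VOCAB = ["book", "bottle", "city", "drunk", "like", "water"]

def context_vector(target, sentences, vocab=VOCAB):
    counts = dict.fromkeys(vocab, 0)
    for s in sentences:
        words = [w.strip(".?!").lower() for w in s.split()]
        if target in words:                 # sentence is a context of t
            for w in words:
                for v in vocab:
                    if w.startswith(v):     # crude stand-in for lemmatization
                        counts[v] += 1
    return [1 if counts[v] > 0 else 0 for v in vocab]

corpus = [
    "A bottle of tezgüino is on the table",
    "Everybody likes tezgüino",
    "Tezgüino makes you drunk",
    "We make tezgüino out of corn",
]
print(context_vector("tezgüino", corpus))  # -> [0, 1, 0, 1, 1, 0]
```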
Term-term matrix:
The number of times the context word c appears close to the term t within a window

            art  boil  data  function  large  sugar  summarize  water
apricot       0     1     0         0      1      2          0      1
pineapple     0     1     0         0      1      1          0      1
digital       0     0     1         3      1      0          1      0
information   0     0     9         1      1      0          2      0

Goal:
Finding a metric that, based on the vectors of these four words, shows
apricot and pineapple to be highly similar
digital and information to be highly similar
the other four pairings (apricot & digital, apricot & information, pineapple & digital, pineapple & information) to be less similar
Distributional similarity
Three parameters must be specified:

How are the co-occurrence terms defined? (what is a neighbor?)

How are terms weighted?

What vector distance metric should be used?
Distributional similarity
How are the co-occurrence terms defined? (what is a neighbor?)

Window of k words
Sentence
Paragraph
Document
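The window-of-k-words option can be sketched as follows (a hypothetical helper, not from the slides: it counts every (term, context) pair within k positions of each other):

```python
from collections import defaultdict

def window_cooccurrences(tokens, k):
    """Count (term, context) pairs where the context word lies
    within k positions of the term on either side."""
    counts = defaultdict(int)
    for i, term in enumerate(tokens):
        lo, hi = max(0, i - k), min(len(tokens), i + k + 1)
        for j in range(lo, hi):
            if j != i:
                counts[(term, tokens[j])] += 1
    return dict(counts)

tokens = "we make tezguino out of corn".split()
pairs = window_cooccurrences(tokens, k=2)
print(("tezguino", "make") in pairs)  # True: one position away
print(("tezguino", "corn") in pairs)  # False: three positions away
```

With sentence, paragraph, or document contexts, the same counting scheme applies; only the span from which neighbors are drawn grows.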
Distributional similarity
How are terms weighted?

Binary
1 if the two words co-occur (no matter how often), 0 otherwise

Frequency
The number of times the two words co-occur, relative to the total size of the corpus
P(t, c) = #(t, c) / N

Pointwise Mutual Information
The number of times the two words co-occur, compared with what we would expect if they were independent
PMI(t, c) = log [ P(t, c) / (P(t) · P(c)) ]
Distributional similarity
#(t, c)

|             | art | boil | data | function | large | sugar | summarize | water |
|-------------|-----|------|------|----------|-------|-------|-----------|-------|
| apricot     | 0   | 1    | 0    | 0        | 1     | 2     | 0         | 1     |
| pineapple   | 0   | 1    | 0    | 0        | 1     | 1     | 0         | 1     |
| digital     | 0   | 0    | 1    | 3        | 1     | 0     | 1         | 0     |
| information | 0   | 0    | 9    | 1        | 1     | 0     | 2         | 0     |

P(t, c), with N = 28

|             | art | boil  | data  | function | large | sugar | summarize | water |
|-------------|-----|-------|-------|----------|-------|-------|-----------|-------|
| apricot     | 0   | 0.035 | 0     | 0        | 0.035 | 0.071 | 0         | 0.035 |
| pineapple   | 0   | 0.035 | 0     | 0        | 0.035 | 0.035 | 0         | 0.035 |
| digital     | 0   | 0     | 0.035 | 0.107    | 0.035 | 0     | 0.035     | 0     |
| information | 0   | 0     | 0.321 | 0.035    | 0.035 | 0     | 0.071     | 0     |
Pointwise Mutual Information

From the P(t, c) table above:

P(digital, summarize) = 0.035
P(information, function) = 0.035

⇒ P(digital, summarize) = P(information, function)

PMI(digital, summarize) = ?
PMI(information, function) = ?
Pointwise Mutual Information

P(digital, summarize) = 0.035
P(information, function) = 0.035

P(digital) = 0.212, P(summarize) = 0.106
P(information) = 0.462, P(function) = 0.142

PMI(digital, summarize) = log [ P(digital, summarize) / (P(digital) · P(summarize)) ] = log (0.035 / (0.212 × 0.106)) = log 1.557
PMI(information, function) = log [ P(information, function) / (P(information) · P(function)) ] = log (0.035 / (0.462 × 0.142)) = log 0.533

⇒ PMI(digital, summarize) > PMI(information, function)
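The arithmetic can be checked directly from the count matrix. Note that the slide values 1.557 and 0.533 come from probabilities rounded to three decimals; exact fractions give roughly 1.56 and 0.54, and the comparison is unchanged. The log is omitted in this sketch since it is monotonic and does not affect the ordering:

```python
# Counts from the term-term matrix; rows are terms, columns are context words.
context = ["art", "boil", "data", "function", "large", "sugar", "summarize", "water"]
counts = {
    "apricot":     [0, 1, 0, 0, 1, 2, 0, 1],
    "pineapple":   [0, 1, 0, 0, 1, 1, 0, 1],
    "digital":     [0, 0, 1, 3, 1, 0, 1, 0],
    "information": [0, 0, 9, 1, 1, 0, 2, 0],
}

N = sum(sum(row) for row in counts.values())  # total co-occurrence count: 28

def p_joint(t, c):   # P(t, c)
    return counts[t][context.index(c)] / N

def p_term(t):       # P(t): row marginal
    return sum(counts[t]) / N

def p_ctx(c):        # P(c): column marginal
    j = context.index(c)
    return sum(row[j] for row in counts.values()) / N

def pmi_ratio(t, c):  # P(t,c) / (P(t) · P(c)); PMI is the log of this
    return p_joint(t, c) / (p_term(t) * p_ctx(c))

print(round(pmi_ratio("digital", "summarize"), 2))     # ≈ 1.56
print(round(pmi_ratio("information", "function"), 2))  # ≈ 0.54
```

A ratio above 1 (positive PMI) means the pair co-occurs more often than independence would predict; below 1, less often.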
Distributional similarity
How are terms weighted?

Binary

Frequency

Pointwise Mutual Information
PMI(t, c) = log [ P(t, c) / (P(t) · P(c)) ]

t-test
t-test(t, c) = (P(t, c) − P(t) · P(c)) / √(P(t) · P(c))
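The t-test weight can be sketched as a small helper (a hypothetical function following the formula above; it is zero under independence and positive when t and c co-occur more often than chance):

```python
import math

def t_test_weight(p_tc, p_t, p_c):
    """t-test association: (P(t,c) - P(t)*P(c)) / sqrt(P(t)*P(c))."""
    return (p_tc - p_t * p_c) / math.sqrt(p_t * p_c)

# Under independence P(t,c) = P(t)*P(c), so the weight is exactly 0
# (0.25 and 0.5 are exact in binary floating point).
print(t_test_weight(0.125, 0.25, 0.5))          # 0.0
# digital/summarize with the rounded values from the earlier slide
print(t_test_weight(0.035, 0.212, 0.106) > 0)   # True: above chance
```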
Distributional similarity
What vector distance metric should be used?
Cosine

Sim_cosine(v, w) = Σ_i (v_i × w_i) / (√(Σ_i v_i²) × √(Σ_i w_i²))

Jaccard

Sim_jaccard(v, w) = Σ_i min(v_i, w_i) / Σ_i max(v_i, w_i)

Dice

Sim_dice(v, w) = 2 × Σ_i min(v_i, w_i) / Σ_i (v_i + w_i)
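The three metrics can be sketched directly from their definitions (hypothetical helpers, applied to count-vector rows of the term-term matrix above):

```python
import math

def cosine(v, w):
    # dot product divided by the product of the vector lengths
    dot = sum(vi * wi for vi, wi in zip(v, w))
    return dot / (math.sqrt(sum(vi * vi for vi in v)) *
                  math.sqrt(sum(wi * wi for wi in w)))

def jaccard(v, w):
    # elementwise min summed, over elementwise max summed
    return (sum(min(vi, wi) for vi, wi in zip(v, w)) /
            sum(max(vi, wi) for vi, wi in zip(v, w)))

def dice(v, w):
    # twice the elementwise min, over the total mass of both vectors
    return (2 * sum(min(vi, wi) for vi, wi in zip(v, w)) /
            sum(vi + wi for vi, wi in zip(v, w)))

# rows of the term-term matrix (art, boil, data, function, large, sugar, summarize, water)
apricot   = [0, 1, 0, 0, 1, 2, 0, 1]
pineapple = [0, 1, 0, 0, 1, 1, 0, 1]
digital   = [0, 0, 1, 3, 1, 0, 1, 0]

print(round(cosine(apricot, pineapple), 2))  # ≈ 0.94: similar contexts
print(round(cosine(apricot, digital), 2))    # ≈ 0.11: different contexts
```

All three behave as the goal slide demands: apricot and pineapple score high, while apricot and digital score low.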
Further Reading
Speech and Language Processing, Chapters 19 and 20