Icon 2007 Pedersen

Transcript of Icon 2007 Pedersen

Page 1: Icon 2007 Pedersen

The Semantic Quilt: Contexts, Co-occurrences, Kernels, and Ontologies

Ted Pedersen, University of Minnesota, Duluth

http://www.umn.edu/home/tpederse

Page 2: Icon 2007 Pedersen

Create by stitching together

Page 3: Icon 2007 Pedersen

Sew together different materials

Page 4: Icon 2007 Pedersen

Semantics in NLP

Believed to be useful for many applications
Machine Translation
Document or Story Understanding
Text Generation
Web Search
…

Can come from many sources
Not well integrated
Not well defined?

Page 5: Icon 2007 Pedersen

What do we mean by semantics? …it depends on our resources…

Ontologies – the semantics of a word are provided by its relationship to other concepts; similar words are related to similar concepts

Dictionary – the semantics of a word are provided by a definition; similar words have similar definitions

Contexts – a word is defined by the surrounding context; similar words occur in similar contexts

Co-occurrences – a word is defined by the company it keeps; words that occur with the same words are similar

Page 6: Icon 2007 Pedersen

What level of granularity?

words
terms / collocations
phrases
sentences
paragraphs
documents
books

Page 7: Icon 2007 Pedersen

The Terrible Tension: ambiguity versus granularity

Words are potentially very ambiguous
but we can list them (sort of)
we can define their meanings (sort of)
not ambiguous to a human reader, but hard for a computer to know which meaning is intended

Terms / collocations are less ambiguous
harder to enumerate in general because there are so many, but can be done in a domain (e.g., medicine)

Phrases can still be ambiguous, but usually only when that is the intent of the speaker / writer

Page 8: Icon 2007 Pedersen

The Current State of Affairs

Most of our resources and methods focus on word or term semantics
makes it possible to build resources (manually or automatically) that have reasonable coverage
but, techniques are very resource dependent
but, resources introduce language dependencies
but, introduces a lot of ambiguity
but, not clear how to bring together resources

Similarity is an important organizing principle
but, there are lots of ways to be similar

Page 9: Icon 2007 Pedersen

Things we can do now…

Identify associated words
fine wine
baseball bat

Identify similar contexts
I bought some food at the store
I purchased something to eat at the market

Identify similar (or related) concepts
frog : amphibian
Duluth : snow

Assign meanings to words
I went to the bank/[financial-inst.] to deposit my check

Page 10: Icon 2007 Pedersen

Things we want to do…

Integrate different resources and methods
Solve bigger problems
much of what we do now is a means to an unclear end

Language independence
Broad coverage
Less reliance on manually built resources
ontologies, dictionaries, training data…

Page 11: Icon 2007 Pedersen

Semantic Patches to Sew Together

Co-occurrences
Ngram Statistics Package : measures association between words, can identify collocations or terms

Contexts
SenseClusters : measures similarity between written texts (i.e., contexts)

Ontologies
WordNet-Similarity : measures similarity between concepts found in WordNet

Disclosure : All of these are projects at the University of Minnesota, Duluth

Page 12: Icon 2007 Pedersen

Co-occurrences

Ngram Statistics Package

http://ngram.sourceforge.net

Page 13: Icon 2007 Pedersen

Things we can do now…

Identify associated words
fine wine
baseball bat

Identify similar contexts
I bought some food at the store
I purchased something to eat at the market

Identify similar (or related) concepts
frog : amphibian
Duluth : snow

Assign meanings to words
I went to the bank/[financial-inst.] to deposit my check

Page 14: Icon 2007 Pedersen

Co-occurrences and semantics?

single words are very ambiguous
bat
line

pairs of words disambiguate each other
baseball bat / vampire … Transylvania
product line / speech … line

Page 15: Icon 2007 Pedersen

Why pairs of words?

Zipf's Law
most words are very rare
most bigrams are even more rare
most ngrams are rarer still

Pairs of words are much less ambiguous than individual words, and yet can be found with reasonable frequency even in relatively small corpora

Page 16: Icon 2007 Pedersen

Bigrams

Window size of 2
baseball bat, fine wine, apple orchard, bill clinton

Window size of 3
house of representatives, bottle of wine

Window size of 4
president of the republic, whispering in the wind

Selected using a small window size (2-4 words)
Objective is to capture a regular or localized pattern between two words (collocation?)
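To make the windowing concrete, here is a minimal Python sketch. NSP itself is a Perl package, so the function name and the exact window semantics below (window = maximum span covered by the pair, so a window of 2 means adjacent words) are illustrative assumptions, not NSP's API:

```python
from collections import Counter

def bigrams_in_window(tokens, window=2):
    """Count ordered word pairs (w1, w2) where w2 occurs within
    `window` positions of w1; window=2 yields adjacent bigrams."""
    pairs = Counter()
    for i, w1 in enumerate(tokens):
        for w2 in tokens[i + 1 : i + window]:
            pairs[(w1, w2)] += 1
    return pairs

# A window of 3 captures the non-adjacent pair (bottle, wine)
print(bigrams_in_window("a bottle of wine".split(), window=3))
```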

Page 17: Icon 2007 Pedersen

“occur together more often than expected by chance…”

Observed frequencies for two words occurring together and alone are stored in a 2x2 matrix

Expected values are calculated, based on the model of independence and the observed values
How often would you expect these words to occur together, if they only occurred together by chance?
If two words occur “significantly” more often than the expected value, then the words do not occur together by chance.

Page 18: Icon 2007 Pedersen

2x2 Contingency Table

                 Intelligence   not Intelligence     total
Artificial           100                               400
not Artificial
total                300                           100,000

Page 19: Icon 2007 Pedersen

2x2 Contingency Table

                 Intelligence   not Intelligence     total
Artificial           100              300              400
not Artificial       200           99,400           99,600
total                300           99,700          100,000

Page 20: Icon 2007 Pedersen

2x2 Contingency Table

                 Intelligence        not Intelligence         total
Artificial        100.0 / 1.2         300.0 / 398.8             400
not Artificial    200.0 / 298.8    99,400.0 / 99,301.2        99,600
total             300                99,700                  100,000

(each cell: observed count / expected count)

Page 21: Icon 2007 Pedersen

Measures of Association

G^2 = 2 \sum_{i,j} \mathrm{observed}(w_i, w_j) \, \log \frac{\mathrm{observed}(w_i, w_j)}{\mathrm{expected}(w_i, w_j)}

X^2 = \sum_{i,j} \frac{[\,\mathrm{observed}(w_i, w_j) - \mathrm{expected}(w_i, w_j)\,]^2}{\mathrm{expected}(w_i, w_j)}

where i, j ∈ {1, 2} range over the rows and columns of the 2x2 table.

Page 22: Icon 2007 Pedersen

Measures of Association

X^2 = 8191.78
G^2 = 750.88
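These values can be reproduced directly from the contingency table above. The following Python sketch is an illustration of the two formulas, not NSP itself (which is a Perl package):

```python
from math import log

# Observed counts from the slides: n(artificial, intelligence) = 100,
# n(artificial) = 400, n(intelligence) = 300, total bigrams N = 100,000.
n11, n1p, np1, n = 100.0, 400.0, 300.0, 100000.0

observed = [[n11, n1p - n11],
            [np1 - n11, n - n1p - np1 + n11]]
rows, cols = [n1p, n - n1p], [np1, n - np1]

# Expected counts under independence: row total * column total / N
expected = [[rows[i] * cols[j] / n for j in range(2)] for i in range(2)]

x2 = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
         for i in range(2) for j in range(2))
g2 = 2 * sum(observed[i][j] * log(observed[i][j] / expected[i][j])
             for i in range(2) for j in range(2))

print(round(x2, 2), round(g2, 2))  # 8191.78 750.88
```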

Page 23: Icon 2007 Pedersen

Interpreting the Scores…

Values above a certain level of significance can be considered grounds for rejecting the null hypothesis
H0: the words in the bigram are independent
3.84 is associated with 95% confidence that the null hypothesis should be rejected

Page 24: Icon 2007 Pedersen

Measures of Association, all supported in NSP
http://ngram.sourceforge.net

Log-likelihood Ratio
Mutual Information
Pointwise Mutual Information
Pearson's Chi-squared Test
Phi coefficient
Fisher's Exact Test
T-test
Dice Coefficient
Odds Ratio

Page 25: Icon 2007 Pedersen

What do we get at the end?

A list of bigrams or co-occurrences that are significant or interesting (meaningful?)
automatic
language independent

These can be viewed as fundamental building blocks for systems that do semantic processing
relatively unambiguous
often have high information content
can serve as a fingerprint for a document or book

What can we do with these, though?

Page 26: Icon 2007 Pedersen

Contexts

SenseClusters

http://senseclusters.sourceforge.net

Page 27: Icon 2007 Pedersen

Things we can do now…

Identify associated words
fine wine
baseball bat

Identify similar contexts
I bought some food at the store
I purchased something to eat at the market

Identify similar (or related) concepts
frog : amphibian
Duluth : snow

Assign meanings to words
I went to the bank/[financial-inst.] to deposit my check

Page 28: Icon 2007 Pedersen


Similar Contexts may have the same meaning…

Context 1: He drives his car fast
Context 2: Jim speeds in his auto

Car -> motor, garage, gasoline, insurance
Auto -> motor, insurance, gasoline, accident

Car and Auto share many co-occurrences…

Page 29: Icon 2007 Pedersen

Clustering Similar Contexts

A context is a short unit of text
often a phrase to a paragraph in length, although it can be longer

Input: N contexts
Output: k clusters
where the members of a cluster are more similar to each other than to the contexts found in other clusters

Page 30: Icon 2007 Pedersen

Contexts (input)

I can hear the ocean in that shell.
My operating system shell is bash.
The shells on the shore are lovely.
The shell command line is flexible.
An oyster shell is very hard and black.

Page 31: Icon 2007 Pedersen

Contexts (output)

Cluster 1:
My operating system shell is bash.
The shell command line is flexible.

Cluster 2:
The shells on the shore are lovely.
An oyster shell is very hard and black.
I can hear the ocean in that shell.

Page 32: Icon 2007 Pedersen

General Methodology

Represent contexts using second order co-occurrences

Reduce dimensionality of vectors
singular value decomposition

Cluster the context vectors
Find the number of clusters
Label the clusters

Evaluate and/or use the contexts!

Page 33: Icon 2007 Pedersen

Second Order Features

Second order features encode something ‘extra’ about a feature that occurs in a context, something not available in the context itself

Native SenseClusters : each feature is represented by a vector of the words with which it occurs

Latent Semantic Analysis : each feature is represented by a vector of the contexts in which it occurs

Page 34: Icon 2007 Pedersen


Second Order Context Representation

Bigrams used to create a word matrix
cell values = log-likelihood of the word pair

Rows are first order co-occurrence vectors for a word

Represent a context by averaging the vectors of the words in that context
the context includes the Cxt positions around the target, where Cxt is typically 5 or 20
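A minimal sketch of this averaging step. The words, feature columns, and scores here are hypothetical placeholders; SenseClusters builds the word matrix from real corpus counts:

```python
import numpy as np

# Toy word-by-word matrix: each row is a first order co-occurrence
# vector; real cell values would be log-likelihood scores from NSP.
features = ["baseball", "actor", "movie", "war"]      # hypothetical
word_vectors = {
    "won":   np.array([3.5, 0.9, 2.4, 0.0]),          # hypothetical scores
    "oscar": np.array([0.0, 4.1, 3.8, 0.2]),
    "guy":   np.array([0.5, 1.2, 0.9, 0.3]),
}

def context_vector(context_words):
    """Second order representation: average the first order vectors
    of the words found in the context."""
    rows = [word_vectors[w] for w in context_words if w in word_vectors]
    return np.mean(rows, axis=0) if rows else np.zeros(len(features))

print(context_vector(["won", "oscar", "guy"]).round(2))
```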

Page 35: Icon 2007 Pedersen

2nd Order Context Vectors

He won an Oscar, but Tom Hanks is still a nice guy.

[Table: first order co-occurrence vectors for “won”, “Oscar”, and “guy” over the features baseball, football, actor, movie, war, family, and needle, with log-likelihood scores as cell values; the O2 context vector is the average of the three rows. Cell values garbled in transcript.]

Page 36: Icon 2007 Pedersen

Second Order Co-Occurrences

These two contexts share no words in common, yet they are similar! disk and linux both occur with “Apple”, “IBM”, “data”, “graphics”, and “memory”

The two contexts are similar because they share many second order co-occurrences

         apple  blood  cells   ibm  data  tissue  graphics  memory  organ  plasma
disk      .76    .00    .01    1.3   2.1    .00      .91      .72    .00     .00
linux     .96    .00    .16    1.7   2.7    .03     1.1      1.0     .00     .13

• I got a new disk today!
• What do you think of linux?
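Measuring the two rows with the cosine makes the point numerically; a small sketch using the values from the table above:

```python
import numpy as np

# First order co-occurrence vectors for "disk" and "linux" (from the slide).
disk  = np.array([.76, .00, .01, 1.3, 2.1, .00, .91, .72, .00, .00])
linux = np.array([.96, .00, .16, 1.7, 2.7, .03, 1.1, 1.0, .00, .13])

# The two contexts share no words, yet their vectors nearly coincide.
cosine = disk @ linux / (np.linalg.norm(disk) * np.linalg.norm(linux))
print(round(float(cosine), 3))  # roughly 0.998
```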

Page 37: Icon 2007 Pedersen


After context representation…

Second order vector is an average of the word vectors that make up the context; captures indirect relationships
Reduced by SVD to principal components

Now, cluster the vectors!
We use the method of repeated bisections (CLUTO)

Page 38: Icon 2007 Pedersen

What do we get at the end?

contexts organized into some number of clusters based on the similarity of their co-occurrences

contexts which share words that tend to co-occur with the same other words are clustered together
2nd order co-occurrences

Page 39: Icon 2007 Pedersen

Finding Similar Contexts

Find phrases that say the same thing using different words
I went to the store
Ted drove to Wal-Mart

Find words that have the same meaning in different contexts
The line is moving pretty fast
I stood in line for 12 hours

Find different words that have the same meaning in different contexts
The line is moving pretty fast
I stood in the queue for 12 hours

Page 40: Icon 2007 Pedersen

Semantic Similarity

WordNet-Similarity

http://wn-similarity.sourceforge.net

Page 41: Icon 2007 Pedersen

Things we can do now…

Identify associated words
fine wine
baseball bat

Identify similar contexts
I bought some food at the store
I purchased something to eat at the market

Identify similar (or related) concepts
frog : amphibian
Duluth : snow

Assign meanings to words
I went to the bank/[financial-inst.] to deposit my check

Page 42: Icon 2007 Pedersen

Similarity and Relatedness

Two concepts are similar if they are connected by is-a relationships.
A frog is-a-kind-of amphibian
An illness is-a health_condition

Two concepts can be related many ways…
A human has-a-part liver
Duluth receives-a-lot-of snow

…similarity is one way to be related

Page 43: Icon 2007 Pedersen

Similarity as Organizing Principle

Measure word association using knowledge-lean methods that are based on co-occurrence information from large corpora

Measure contextual similarity using knowledge-lean methods that are based on co-occurrence information from large corpora

Measure conceptual similarity using a structured repository of knowledge
the lexical database WordNet

Page 44: Icon 2007 Pedersen

Why measure conceptual similarity?

A word will take the sense that is most related to the surrounding context
I love Java, especially the beaches and the weather.
I love Java, especially the support for concurrent programming.
I love java, especially first thing in the morning with a bagel.

Page 45: Icon 2007 Pedersen

Word Sense Disambiguation

…can be performed by finding the sense of a word most related to its neighbors

Here, we define similarity and relatedness with respect to WordNet
WordNet::Similarity
http://wn-similarity.sourceforge.net

WordNet::SenseRelate
AllWords – assign a sense to every content word
TargetWord – assign a sense to a given word
http://senserelate.sourceforge.net

Page 46: Icon 2007 Pedersen

WordNet-Similarity
http://wn-similarity.sourceforge.net

Path based measures
Shortest path (path)
Wu & Palmer (wup)
Leacock & Chodorow (lch)
Hirst & St-Onge (hso)

Information content measures
Resnik (res)
Jiang & Conrath (jcn)
Lin (lin)

Gloss based measures
Banerjee and Pedersen (lesk)
Patwardhan and Pedersen (vector, vector_pairs)

Page 47: Icon 2007 Pedersen

Path Finding

Find the shortest is-a path between two concepts
Rada et al. (1989)

Scaled by depth of hierarchy
• Leacock & Chodorow (1998)

Depth of subsuming concept scaled by sum of the depths of the individual concepts
• Wu and Palmer (1994)

Page 48: Icon 2007 Pedersen

[Figure: fragment of the WordNet is-a hierarchy, from Jiang and Conrath [1997], including the chains object → artifact → instrumentality → conveyance → vehicle → motor-vehicle → car, watercraft → boat → ark, and artifact → article → ware → table-ware → cutlery → fork]

Page 49: Icon 2007 Pedersen

Information Content

Measure of specificity in an is-a hierarchy (Resnik, 1995)
-log (probability of concept)
High information content values mean very specific concepts (like pitch-fork and basketball shoe)

Count how often a concept occurs in a corpus
Increment the count associated with that concept, and propagate the count up!
If based on word forms, increment all concepts associated with that form

Page 50: Icon 2007 Pedersen

Observed “car”...

*root* (32783 + 1)
  motor vehicle (327 + 1)
    car (73 + 1)
      cab (23)
        minicab (6)
      stock car (12)
    bus (17)

Page 51: Icon 2007 Pedersen

Observed “stock car”...

*root* (32784 + 1)
  motor vehicle (328 + 1)
    car (74 + 1)
      cab (23)
        minicab (6)
      stock car (12 + 1)
    bus (17)

Page 52: Icon 2007 Pedersen

After Counting Concepts...

*root* (32785)
  motor vehicle (329)  IC = 1.9
    car (75)
      cab (23)
        minicab (6)
      stock car (13)  IC = 3.1
    bus (17)  IC = 3.5

Page 53: Icon 2007 Pedersen

Similarity and Information Content

Resnik (1995) uses the information content of the least common subsumer to express similarity between two concepts

Lin (1998) scales the information content of the least common subsumer by the sum of the information content of the two concepts

Jiang & Conrath (1997) find the difference between the least common subsumer’s information content and the sum of the two individual concepts’

Page 54: Icon 2007 Pedersen

Why doesn’t this solve the problem?

Concepts must be organized in a hierarchy, and connected in that hierarchy
Limited to comparing nouns with nouns, or maybe verbs with verbs
Limited to similarity measures (is-a)

What about mixed parts of speech?
Murder (noun) and horrible (adjective)
Tobacco (noun) and drinking (verb)

Page 55: Icon 2007 Pedersen

Using Dictionary Glosses to Measure Relatedness

Lesk (1986) Algorithm – measure the relatedness of two concepts by counting the number of shared words in their definitions
Cold - a mild viral infection involving the nose and respiratory passages (but not the lungs)
Flu - an acute febrile highly contagious viral disease

Adapted Lesk (Banerjee & Pedersen, 2003) – expand glosses to include those of directly related concepts
Cold - a common cold affecting the nasal passages and resulting in congestion and sneezing and headache; mild viral infection involving the nose and respiratory passages (but not the lungs); a disease affecting the respiratory system
Flu - an acute and highly contagious respiratory disease of swine caused by the orthomyxovirus thought to be the same virus that caused the 1918 influenza pandemic; an acute febrile highly contagious viral disease; a disease that can be communicated from one person to another

Page 56: Icon 2007 Pedersen

Context/Gloss Vectors

Leskian approaches require exact matches in glosses
glosses are short, and use related but not identical words

Solution? Expand glosses by replacing each content word with a co-occurrence vector derived from corpora
rows are words in glosses, columns are the co-occurring words in a corpus, cell values are their log-likelihood ratios

Average the word vectors to create a single vector that represents the gloss/sense (Patwardhan & Pedersen, 2003)
2nd order co-occurrences

Measure relatedness using cosine rather than exact match!
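A sketch contrasting the two ideas. `word_vectors` is an assumed word-by-co-occurrence matrix of the kind built from corpora above; the function names are illustrative, not the WordNet::Similarity API:

```python
import numpy as np

def lesk(gloss1, gloss2):
    """Original Lesk: count exact word overlaps between definitions."""
    return len(set(gloss1.lower().split()) & set(gloss2.lower().split()))

def gloss_vector(gloss, word_vectors, dim):
    """Gloss vector: average the co-occurrence vectors of the gloss's
    words, giving a second order representation of the sense."""
    rows = [word_vectors[w] for w in gloss.lower().split()
            if w in word_vectors]
    return np.mean(rows, axis=0) if rows else np.zeros(dim)

def relatedness(g1, g2, word_vectors, dim):
    """Compare gloss vectors by cosine rather than exact match."""
    u = gloss_vector(g1, word_vectors, dim)
    v = gloss_vector(g2, word_vectors, dim)
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v) / denom if denom else 0.0
```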

Page 57: Icon 2007 Pedersen

Gloss/Context Vectors

Page 58: Icon 2007 Pedersen

Word Sense Disambiguation

…can be performed by finding the sense of a word most related to its neighbors

Here, we define similarity and relatedness with respect to WordNet-Similarity

WordNet-SenseRelate
AllWords – assign a sense to every content word
TargetWord – assign a sense to a given word
http://senserelate.sourceforge.net

Page 59: Icon 2007 Pedersen

WordNet-SenseRelate
http://senserelate.sourceforge.net

For each sense of a target word in context
For each content word in the context
• For each sense of that content word
Measure similarity/relatedness between the sense of the target word and the sense of the content word with WordNet::Similarity
Keep a running sum for the score of each sense of the target

Pick the sense of the target word with the highest score with the words in context
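The loop above, restated as a Python sketch. `senses_of` and `relatedness` are assumed callbacks standing in for WordNet lookups and a WordNet::Similarity measure:

```python
def senserelate(target_senses, context_words, senses_of, relatedness):
    """Score each sense of the target by summing its relatedness to
    every sense of every content word in the window, then pick the
    highest-scoring sense."""
    scores = {}
    for ts in target_senses:
        scores[ts] = sum(relatedness(ts, cs)
                         for w in context_words
                         for cs in senses_of(w))
    return max(scores, key=scores.get)
```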

Page 60: Icon 2007 Pedersen

WSD Experiment

Senseval-2 data consists of 73 nouns, verbs, and adjectives, approximately 8,600 “training” examples and 4,300 “test” examples.
Best supervised system 64%
SenseRelate 53% (lesk, vector)
Most frequent sense 48%

Window of context is defined by position, includes 2 content words to both the left and right which are measured against the word being disambiguated.
Positional proximity is not always associated with semantic similarity.

Page 61: Icon 2007 Pedersen

Human Relatedness Experiment

Miller and Charles (1991) created 30 pairs of nouns that were scored on a relatedness scale by over 50 human subjects

Vector measure correlates at over 80% with human relatedness judgements

Next closest measure is lesk (at 70%)
All other measures at less than 65%

Page 62: Icon 2007 Pedersen

Coverage…

WordNet
Nouns – 80,000 concepts
Verbs – 13,000 concepts
Adjectives – 18,000 concepts
Adverbs – 4,000 concepts

Words not found in WordNet can’t be disambiguated by SenseRelate

language and resource dependent…

Page 63: Icon 2007 Pedersen

Supervised WSD

http://www.d.umn.edu/~tpederse/supervised.html

Page 64: Icon 2007 Pedersen

Things we can do now…

Identify associated words
fine wine
baseball bat

Identify similar contexts
I bought some food at the store
I purchased something to eat at the market

Identify similar (or related) concepts
frog : amphibian
Duluth : snow

Assign meanings to words
I went to the bank/[financial-inst.] to deposit my check

Page 65: Icon 2007 Pedersen

Machine Learning Approach

Annotate text with sense tags
must select a sense inventory

Find interesting features
bigrams and co-occurrences are quite effective

Learn a model
Apply the model to untagged data

Works very well…given sufficient quantities of training data and sufficient coverage of your sense inventory
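A minimal sketch of this pipeline using scikit-learn, not the Duluth systems themselves; the sense-tagged examples for “bank” are hypothetical:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical sense-tagged training contexts for "bank".
train_contexts = ["I went to the bank to deposit my check",
                  "we fished from the bank of the river"]
train_senses = ["financial-inst.", "river-bank"]

# Unigram and bigram features feed a Naive Bayes learner.
model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), MultinomialNB())
model.fit(train_contexts, train_senses)

# Apply the learned model to untagged data.
print(model.predict(["she cashed a check at the bank"]))
```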

Page 66: Icon 2007 Pedersen

Clever Ways to Get Training Data

Parallel Text
Senseval-3 task: use Hindi translations of English words as sense tags

Mining the Web for contexts that include an unambiguous collocation
line is ambiguous, product line is not

Page 67: Icon 2007 Pedersen

Where does this leave us?

Ngram Statistics Package
finding co-occurrences and bigrams that carry some semantic weight

SenseClusters
clustering similar contexts

WordNet-Similarity
measuring similarity between concepts

SenseRelate
word sense disambiguation
knowledge/resource based method

Supervised WSD
building models that assign a sense to a given word

Page 68: Icon 2007 Pedersen

Integration that already exists…

NSP feeds SenseClusters
NSP feeds Supervised WSD
WordNet-Similarity feeds SenseRelate

Could do a lot more…one example: how to give supervised WSD information beyond what it finds in annotated text, perhaps reducing the amount of such text needed

Page 69: Icon 2007 Pedersen

Kernels are similarity matrices

NSP produces word by word similarity matrices, for use by SenseClusters
SenseClusters produces context by context similarity matrices based on co-occurrences
WordNet-Similarity produces concept by concept similarity matrices
SenseRelate produces context by context similarity matrices based on concept similarity

All of these could be used as kernels for Supervised WSD
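For instance, any such matrix could be handed to a learner as a precomputed kernel. A sketch with a hypothetical similarity matrix and sense labels (a valid kernel matrix should be positive semi-definite):

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical context-by-context similarity matrix for three
# training contexts, plus their sense labels.
K_train = np.array([[1.0, 0.8, 0.1],
                    [0.8, 1.0, 0.2],
                    [0.1, 0.2, 1.0]])
y = ["shell-os", "shell-os", "shell-sea"]

clf = SVC(kernel="precomputed").fit(K_train, y)

# A new context is represented by its similarity to the training contexts.
K_test = np.array([[0.9, 0.7, 0.2]])
print(clf.predict(K_test))
```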

Page 70: Icon 2007 Pedersen

Conclusion

Time to integrate what we have at the word and term level
look for ways to stitch semantic patches together

This will increase our coverage and decrease language dependence
make the quilt bigger and sturdier

We will then be able to look at a broader range of languages and semantic problems
calm problems with the warmth of your lovely quilt…

Page 71: Icon 2007 Pedersen

Many Thanks …

SenseClusters
Amruta Purandare • MS 2004, now Pitt PhD
Anagha Kulkarni • MS 2006, now CMU PhD
Mahesh Joshi • MS 2006, now CMU MS

WordNet-Similarity
Siddharth Patwardhan • MS 2003, now Utah PhD
Jason Michelizzi • MS 2005, now US Navy

Ngram Statistics Package
Satanjeev Banerjee • MS 2002, now CMU PhD
Saiyam Kohli • MS 2006, now Beckman-Coulter
Bridget Thomson-McInnes • MS 2004, now Minnesota PhD

Supervised WSD
Saif Mohammad • MS 2003, now Toronto PhD
Mahesh Joshi
Amruta Purandare

SenseRelate
Satanjeev Banerjee
Siddharth Patwardhan
Jason Michelizzi

Page 72: Icon 2007 Pedersen

URLs

Ngram Statistics Package
http://ngram.sourceforge.net

SenseClusters
http://senseclusters.sourceforge.net
IJCAI Tutorial on Jan 6 (afternoon)

WordNet-Similarity
http://wn-similarity.sourceforge.net

SenseRelate WSD
http://senserelate.sourceforge.net

Supervised WSD
http://www.d.umn.edu/~tpederse/supervised.html