Transcript of "Understanding Text with Knowledge-Bases and Random Walks"
-
Understanding Text with Knowledge-Bases and Random Walks
Eneko Agirre
ixa2.si.ehu.es/eneko
IXA NLP Group, University of the Basque Country
MAVIR, 2011
Agirre (UBC) Knowledge-Bases and Random Walks MAVIR 2011 1 / 54
-
Random Walks on Large Graphs
WWW, PageRank and Google
source: http://opte.org
-
Random Walks on Large Graphs
Linked Data
Wikipedia (DBpedia)
WordNet
Unified Medical Language System
sources: http://sixdegrees.hu/ http://www2.research.att.com/~yifanhu/ http://www.cise.ufl.edu/research/sparse/matrices/Gleich/ http://www.ebremer.com/
-
Text Understanding
Understanding of broad language: what's behind the surface strings
Barcelona boss says that Jose Mourinho is 'the best coach in the world'
End systems that we would like to build: natural dialogue, speech recognition, machine translation; improving parsing, semantic role labeling, information retrieval, question answering
-
Text Understanding
From string to semantic representation (First Order Logic)
Barcelona coach praises Jose Mourinho.
Exist e1, x1, x2, x3 such that FC Barcelona(x1) and coach:n:1(x2) and praise:v:2(e1, x2, x3) and José Mourinho(x3)
Disambiguation: Concepts, Entities and Semantic Roles
Quantifiers, modality, negation, etc.
Inference and Reasoning
Barcelona coach praises Mourinho ∼ Guardiola honors Mourinho
. . . with respect to some Knowledge Base
-
Text Understanding:Knowledge Bases and Random Walks
Focus on the following tasks on inference and disambiguation:
Map words in context to KB concepts (Word Sense Disambiguation)
Similarity between concepts and words
Similarity to improve ad-hoc information retrieval
Applied to WordNet(s), UMLS, Wikipedia
Excellent results
Open source software and data: http://ixa2.si.ehu.es/ukb/
-
Outline
1 WordNet, PageRank and Personalized PageRank
2 Random walks for WSD
3 Adapting WSD to domains
4 WSD on the biomedical domain
5 Random walks for similarity
6 Similarity and Information Retrieval
7 Conclusions
-
WordNet, PageRank and Personalized PageRank
-
WordNet, PageRank and Personalized PageRank (with Aitor Soroa)
WordNet is the most widely used hierarchically organized lexical database for English (Fellbaum, 1998)
Broad coverage of nouns, verbs, adjectives, adverbs
Main unit: synset (concept)
coach#1, manager#3, handler#2: someone in charge of training an athlete or a team.
Relations between concepts: synonymy (built-in), hyperonymy, antonymy, meronymy, entailment, derivation, gloss
Closely linked versions in several languages
-
WordNet
Example of hypernym relations:
coach#1 → trainer → leader → person → organism → ... → entity
synonyms: manager, handler
gloss words (and synsets): charge, train (verb), athlete, team
hyponyms: baseball coach, basketball coach, conditioner, football coach
instance: John McGraw
domain: sport, athletics
derivation: coach (verb), managership, manage (verb), handle (verb)
-
WordNet
Representing WordNet as a graph:
Nodes represent concepts
Edges represent relations (undirected)
In addition, directed edges from words to corresponding concepts (senses)
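A minimal sketch of this representation in plain Python (the synsets, relations and variable names below are toy illustrations, not actual WordNet data):

```python
# Toy graph in the spirit of the slide: undirected edges between concepts,
# plus directed edges from words to their candidate senses.
concept_edges = [                # undirected concept-concept relations
    ("coach#n1", "trainer#n1"),              # hyperonym
    ("coach#n5", "public_transport#n1"),     # hyperonym
    ("coach#n5", "fleet#n2"),                # holonym
]
word_to_senses = {               # directed word -> concept edges
    "coach": ["coach#n1", "coach#n2", "coach#n5"],
    "fleet": ["fleet#n2"],
}

# Build an adjacency map: concept edges both ways, word edges one way only.
graph = {}
for a, b in concept_edges:
    graph.setdefault(a, []).append(b)
    graph.setdefault(b, []).append(a)
for word, senses in word_to_senses.items():
    graph[word] = list(senses)   # no edges back from senses to words

print(graph["coach"])            # candidate senses reachable from the word
print(graph["trainer#n1"])       # concept neighbours, traversable both ways
```

The word nodes act as one-way entry points into the concept graph, which is what later lets a random walk start from surface words.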
-
WordNet
[Figure: fragment of the WordNet graph around the word coach: the senses coach#n1, coach#n2 and coach#n5 are linked to synsets such as trainer#n1, managership#n3, sport#n1, handle#v6, teacher#n1, tutorial#n1, public_transport#n1, fleet#n2 and seat#n1 through hyperonym, holonym, domain and derivation edges]
-
Random Walks: PageRank
Given a graph, PageRank ranks nodes according to their relative structural importance
If an edge from ni to nj exists, a vote from ni to nj is produced
Its strength depends on the rank of ni: the more important ni is, the more strength its votes will have
PageRank is more commonly viewed as the result of a random walk process
The rank of ni represents the probability of a random walk over the graph ending on ni, at a sufficiently large time
-
Random Walks: PageRank
G: graph with N nodes n1, ..., nN
di: outdegree of node i
M: N × N transition matrix, where Mji = 1/di if an edge from i to j exists, and 0 otherwise
PageRank equation: Pr = cMPr + (1 − c)v
First term: the surfer follows edges; second term: the surfer randomly jumps to any node (teleport)
c: damping factor, controlling the way in which these two terms are combined
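The fixed point of Pr = cMPr + (1 − c)v is commonly computed by power iteration. The sketch below is a minimal plain-Python version on a toy three-node graph; the graph, function name and default parameters are illustrative assumptions, not the UKB implementation:

```python
# Power-iteration PageRank for Pr = c*M*Pr + (1 - c)*v,
# on a directed graph given as {node: [successors]}.
def pagerank(graph, c=0.85, v=None, iters=30):
    nodes = list(graph)
    n = len(nodes)
    if v is None:                          # uniform teleport vector: plain PageRank
        v = {u: 1.0 / n for u in nodes}
    pr = dict(v)                           # start from the teleport distribution
    for _ in range(iters):
        new = {u: (1 - c) * v[u] for u in nodes}   # teleport term
        for i in nodes:
            out = graph[i]
            if not out:                    # dangling node: spread its mass via v
                for u in nodes:
                    new[u] += c * pr[i] * v[u]
            else:
                share = c * pr[i] / len(out)       # M_ji = 1/d_i
                for j in out:
                    new[j] += share
        pr = new
    return pr

toy = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(toy)
print(max(ranks, key=ranks.get))           # "c": it receives votes from both a and b
```

Each iteration redistributes the current rank mass along the edges (weighted by c) and mixes in the teleport vector (weighted by 1 − c), so the total mass stays 1.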
-
Random Walks: Personalized PageRank
Pr = cMPr + (1 − c)v
PageRank: v is a stochastic normalized vector, with all elements equal to 1/N
Equal probabilities for all nodes in case of random jumps
Personalized PageRank: non-uniform v (Haveliwala, 2002)
Assign stronger probabilities to certain kinds of nodes
Bias PageRank to prefer these nodes
For example, if we concentrate all the mass on node ni:
All random jumps return to ni
The rank of ni will be high
The high rank of ni will make all the nodes in its vicinity also receive a high rank
The importance of node ni given by the initial v spreads along the graph
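A minimal sketch of this effect (the ring graph, helper function and parameter values are illustrative assumptions): concentrating the teleport mass on one node gives it the highest rank, with rank decaying over its vicinity.

```python
# Personalized PageRank: same update Pr = c*M*Pr + (1 - c)*v, but v
# concentrates the teleport mass on chosen nodes instead of being uniform.
def pagerank(graph, v, c=0.85, iters=30):
    nodes = list(graph)
    pr = dict(v)
    for _ in range(iters):
        new = {u: (1 - c) * v[u] for u in nodes}
        for i in nodes:
            share = c * pr[i] / len(graph[i])
            for j in graph[i]:
                new[j] += share
        pr = new
    return pr

# Ring of four nodes; all teleport mass on node "a".
ring = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["a"]}
v = {"a": 1.0, "b": 0.0, "c": 0.0, "d": 0.0}
ppr = pagerank(ring, v)
# "a" ranks highest, and rank decays with distance from "a" along the ring.
print(sorted(ppr, key=ppr.get, reverse=True))
```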
-
Random walks for WSD
-
Word Sense Disambiguation (WSD)
Goal: determine the senses of the open-class words in a text.
"Nadal is sharing a house with his uncle and coach, Toni."
"Our fleet comprises coaches from 35 to 58 seats."
Knowledge Base (e.g. WordNet):
coach#1: someone in charge of training an athlete or a team.
coach#2: a person who gives private instruction (as in singing, acting, etc.).
coach#3: a railcar where passengers ride.
coach#4: a carriage pulled by four horses with one driver.
coach#5: a vehicle carrying many passengers; used for public transport.
-
Word Sense Disambiguation (WSD)
Supervised corpus-based WSD performs best
Train classifiers on hand-tagged data (typically SemCor)
Data sparseness, e.g. coach 20 examples (20,0,0,0,0,0), bank 48 examples (25,20,2,1,0,...)
Results decrease when train and test come from different sources (even Brown vs. BNC)
They decrease even more when train and test come from different domains
Knowledge-based WSD
Uses the information in a KB (WordNet)
Performs close to, but below, the Most Frequent Sense baseline (MFS, supervised)
Vocabulary coverage
Relation coverage
-
Domain adaptation
Deploying NLP techniques in real applications is challenging, especially for WSD:
Sense distributions change across domains
Data sparseness hurts more
Context overlap is reduced
New senses, new terms
But... some words get fewer interpretations in domains:
bank in finance, coach in sports
-
Knowledge-based WSD using random walks (with Aitor Soroa, Oier Lopez de Lacalle)
Use the information in WordNet for disambiguation:
"Our fleet comprises coaches from 35 to 58 seats."
Traditional approach (Patwardhan et al., 2007):
Compare each target sense of coach with those of the words in the context (fleet, comprise, blue, seat)
Using semantic relatedness between pairs of senses (6×4×3×8×9 = 5184 combinations)
As an alternative to the combinatorial explosion, each word can be disambiguated individually (6×(4+3+8+9) = 144 comparisons)
sim(coach#1,fleet#1) + sim(coach#1,fleet#2) + ... + sim(coach#1,seat#1) + ...
sim(coach#2,fleet#1) + sim(coach#2,fleet#2) + ... + sim(coach#2,seat#1) + ...
...
Graph-based methods:
Exploit the structural properties of the graph underlying WordNet
Find globally optimal solutions
Disambiguate large portions of text in one go
Principled solution to the combinatorial explosion
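The two counts can be checked directly (taking the per-word sense counts as given on the slide):

```python
import math

# Sense counts per word, as assumed on the slide.
senses = [6, 4, 3, 8, 9]                 # coach, fleet, comprise, blue, seat
joint = math.prod(senses)                # score every sense combination jointly
per_word = senses[0] * sum(senses[1:])   # target senses x context senses, pairwise
print(joint, per_word)                   # 5184 144
```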
-
Using PageRank for WSD
Given a graph representation of the LKB:
PageRank over the whole of WordNet would give a context-independent ranking of word senses
We would like: given an input text, to disambiguate all open-class words in the input, taking the rest as context
Two alternatives:
1. Create a context-sensitive subgraph and apply PageRank over it (Navigli and Lapata, 2007; Agirre et al., 2008)
2. Use Personalized PageRank over the complete graph, initializing v with the context words
-
Using Personalized PageRank (PPR)
For each word Wi, i = 1...m, in the context:
Initialize v with uniform probabilities over the words Wi
Context words act as source nodes, injecting probability mass into the concept graph
Run Personalized PageRank
Choose the highest-ranking sense for the target word
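The steps above can be sketched end to end; the toy graph, sense inventory and `pagerank` helper below are illustrative assumptions (the same power iteration as before), not the actual UKB code:

```python
# Disambiguate a target word with Personalized PageRank: teleport mass is
# spread uniformly over the context words, which pass it on to their senses.
def pagerank(graph, v, c=0.85, iters=30):
    pr = {u: v.get(u, 0.0) for u in graph}
    for _ in range(iters):
        new = {u: (1 - c) * v.get(u, 0.0) for u in graph}
        for i in graph:
            if graph[i]:
                share = c * pr[i] / len(graph[i])
                for j in graph[i]:
                    new[j] += share
        pr = new
    return pr

graph = {
    # directed word -> sense edges
    "coach": ["coach#n1", "coach#n5"],
    "fleet": ["fleet#n2"],
    "seat":  ["seat#n1"],
    # undirected concept-concept edges, listed in both directions
    "coach#n1": ["trainer#n1"],
    "trainer#n1": ["coach#n1"],
    "coach#n5": ["public_transport#n1", "fleet#n2", "seat#n1"],
    "public_transport#n1": ["coach#n5"],
    "fleet#n2": ["coach#n5"],
    "seat#n1": ["coach#n5"],
}

context = ["coach", "fleet", "seat"]
v = {w: 1.0 / len(context) for w in context}   # uniform mass over context words
pr = pagerank(graph, v)

senses_of_coach = ["coach#n1", "coach#n5"]
best = max(senses_of_coach, key=pr.get)
print(best)   # coach#n5: the transport sense wins in this context
```

Because fleet#n2 and seat#n1 are connected to coach#n5, the mass injected by the context words accumulates on the transport sense rather than on the trainer sense.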
-
Using Personalized PageRank (PPR)
[Figure: the context words coach, fleet, comprise, ..., seat inject probability mass into the WordNet graph, reaching senses such as coach#n1, coach#n2, coach#n5, trainer#n1, managership#n3, sport#n1, handle#n8, teacher#n1, tutorial#n1, public_transport#n1, fleet#n2, seat#n1 and comprise#v1]
-
Experiment setting
Two datasets:
Senseval-2 All Words (S2AW)
Senseval-3 All Words (S3AW)
Both labelled with WordNet 1.7 tags
Create input contexts of at least 20 words, adding the sentences immediately before and after if the original is too short
PageRank settings: damping factor c = 0.85, end after 30 iterations
-
Results and comparison to related work (S2AW)
Senseval-2 All Words dataset
System    All   N     V     Adj.  Adv.
Mih05     54.2  57.5  36.5  56.7  70.9
Sinha07   56.4  65.6  32.3  61.4  60.2
Tsatsa07  49.2  -     -     -     -
PPR       58.6  70.4  38.9  58.3  70.1
MFS       60.1  71.2  39.0  61.1  75.4
(Mihalcea, 2005): pairwise Lesk between senses, then PageRank.
(Sinha & Mihalcea, 2007): several similarity measures, voting, fine-tuning for each PoS; development over S3AW.
(Tsatsaronis et al., 2007): subgraph BFS over WordNet 1.7 and eXtended WordNet, then spreading activation.
-
Comparison to related work (S3AW)
System    All   N     V     Adj.  Adv.
Mih05     52.2  -     -     -     -
Sinha07   52.4  60.5  40.6  54.1  100.0
Nav10     -     61.9  36.1  62.8  -
PPR       57.4  64.1  46.9  62.6  92.9
MFS       62.3  69.3  53.6  63.7  92.9
(Mihalcea, 2005): pairwise Lesk between senses, then PageRank.
(Sinha & Mihalcea, 2007): several similarity measures, voting, fine-tuning for each PoS; development over S3AW.
(Navigli & Lapata, 2010): subgraph DFS(3) over WordNet 2.0 plus proprietary relations, several centrality algorithms.
-
Adapting WSD to domains
-
Adapting WSD to domains
How could we improve WSD performance without tagging new data from the domain or adapting WordNet manually to the domain?
What would happen if we applied PPR-based WSD to specific domains?
Personalized PageRank over the context:
"... has never won a league title as coach but took Parma to success ..."
Personalized PageRank over related words:
Get related words from a distributional thesaurus
coach: manager, captain, player, team, striker, ...
-
Experiments
Dataset with examples from the BNC and the Sports and Finance sections of Reuters (Koeling et al., 2005)
41 nouns: salient in either domain, or with senses linked to these domains
Sense inventory: WordNet 1.7.1
300 examples for each of the 41 nouns
Roughly 100 examples per word and corpus
Experiments:
Supervised: train MFS, SVM, k-NN on SemCor examples
Personalized PageRank (same damping factor and iterations as before)
Using the context
Using 50 related words (Koeling et al., 2005) (BNC, Sports, Finance)
-
Results
Systems                              BNC    Sports  Finances
Baselines   Random                   *19.7  *19.2   *19.5
            SemCor MFS               *34.9  *19.6   *37.1
Supervised  SVM                      *38.7  *25.3   *38.7
            k-NN                      42.8  *30.3   *43.4
Context     PPR                       43.8  *35.6   *46.9
Related     PPR                      *37.7   51.5    59.3
words       (Koeling et al. 2005)    *40.7  *43.3   *49.7
Skyline     Test MFS                 *52.0  *77.8   *82.3
Supervised systems (MFS, SVM, k-NN) score very low (see test MFS)
PPR on the context: best for BNC (* marks statistical significance)
PPR on related words: best for Sports and Finance, and improves over Koeling et al., who use pairwise WordNet similarity.
-
WSD on the biomedical domain
-
UMLS and biomedical text (with Aitor Soroa and Mark Stevenson)
Ambiguities are believed not to occur in specific domains:
"On the Use of Cold Water as a Powerful Remedial Agent in Chronic Disease."
"Intranasal ipratropium bromide for the common cold."
11.7% of the phrases in abstracts added to MEDLINE in 1998 were ambiguous (Weeber et al., 2001)
Unified Medical Language System (UMLS) Metathesaurus: Concept Unique Identifiers (CUIs)
C0234192: Cold (Cold Sensation) [Physiologic Function]
C0009264: Cold (cold temperature) [Natural Phenomenon or Process]
C0009443: Cold (Common Cold) [Disease or Syndrome]
-
WSD and biomedical text
Thesauri in the Metathesaurus:
Alcohol and Other Drugs, Medical Subject Headings, CRISP Thesaurus, SNOMED Clinical Terms, etc.
Relations in the Metathesaurus between CUIs:
parent, can be qualified by, related possibly synonymous, related other
We applied random walks over a graph of CUIs.
Evaluated on NLM-WSD: 50 ambiguous terms (100 instances each)
KB                          #CUIs    #relations   Acc.  Terms
AOD                          15,901      58,998   51.5      4
MSH                         278,297   1,098,547   44.7      9
CSP                          16,703      73,200   60.2      3
SNOMED CT                   304,443   1,237,571   62.5     29
all above                   572,105   2,433,324   64.4     48
all relations                     -   5,352,190   68.1     50
combined with cooc.               -           -   73.7     50
(Jimeno and Aronson, 2011)        -           -   68.4     50
-
Random walks for similarity
-
Similarity
Given two words or multiword expressions, estimate how similar they are.
cord smile
gem jewel
magician oracle
Similarity: features shared, belonging to the same class
Relatedness is a more general relationship, including other relations like topical relatedness or meronymy.
king cabbage
movie star
journey voyage
Typically implemented as calculating a numeric value of similarity/relatedness.
-
Similarity and WSD
gem jewel
movie star
Both WSD and similarity are closely intertwined:
Similarity between words is based on similarity between senses (implicitly doing disambiguation)
WSD uses similarity between senses in context
-
Similarity examples
RG dataset                      WordSim353 dataset
cord smile            0.02      king cabbage          0.23
rooster voyage        0.04      professor cucumber    0.31
...                             ...
glass jewel           1.78      investigation effort  4.59
magician oracle       1.82      movie star            7.38
...                             ...
cemetery graveyard    3.88      journey voyage        9.29
automobile car        3.92      midday noon           9.29
midday noon           3.94      tiger tiger          10.00
80 pairs, 51 subjects           353 pairs, 16 subjects
Similarity                      Similarity and relatedness
-
Random walks for similarity
Similarity
Two main approaches:
Knowledge-based (Roget's Thesaurus, WordNet, etc.)
Corpus-based, also known as distributional similarity (co-occurrences)
Many potential applications:
Overcome brittleness (word match)
NLP subtasks (parsing, semantic role labeling)
Information retrieval
Question answering
Summarization
Machine translation optimization and evaluation
Inference (textual entailment)
Agirre (UBC) Knowledge-Bases and Random Walks MAVIR 2011 38 / 54
-
Random walks for similarity
Random walks for similarity
(with Aitor Soroa, Montse Cuadros, German Rigau)
Based on (Hughes and Ramage, 2007)
Given a pair of words (w1, w2):
Initialize teleport probability mass on w1
Run Personalized PageRank, obtaining ~w1
Initialize w2 and obtain ~w2
Measure similarity between ~w1 and ~w2 (e.g. cosine)
Experiment settings:
Damping value c = 0.85
Calculations finish after 30 iterations
Variations for Knowledge Base:
WordNet 3.0
WordNet relations
Gloss relations
Other relations
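The procedure on this slide can be sketched with a toy implementation. The graph, node names and edges below are illustrative stand-ins for WordNet 3.0; only the damping value (0.85), the 30 iterations and the cosine comparison come from the slide.

```python
import math

def personalized_pagerank(graph, seeds, damping=0.85, iterations=30):
    """Power iteration of Personalized PageRank: the teleport mass is
    concentrated on the seed node(s) instead of being uniform."""
    nodes = list(graph)
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        new_rank = {}
        for n in nodes:
            # Mass flowing into n from every node m that links to n.
            incoming = sum(rank[m] / len(graph[m]) for m in nodes if n in graph[m])
            new_rank[n] = (1 - damping) * teleport[n] + damping * incoming
        rank = new_rank
    return rank

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u)
    norm = math.sqrt(sum(x * x for x in u.values())) \
         * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

# Toy KB: undirected edges loosely inspired by WordNet-style relations.
graph = {
    "car": {"automobile", "vehicle"},
    "automobile": {"car", "vehicle"},
    "vehicle": {"car", "automobile", "journey"},
    "journey": {"vehicle", "voyage"},
    "voyage": {"journey"},
}
v1 = personalized_pagerank(graph, {"car"})
v2 = personalized_pagerank(graph, {"automobile"})
v3 = personalized_pagerank(graph, {"voyage"})
print(cosine(v1, v2) > cosine(v1, v3))  # "car" is closer to "automobile" → True
```

Each word yields a probability distribution over the whole graph, so the cosine compares how the two words "see" the entire knowledge base, not just their direct neighbours.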
Agirre (UBC) Knowledge-Bases and Random Walks MAVIR 2011 39 / 54
-
Random walks for similarity
Dataset and results
Method                               Source     Spearman
(Gabrilovich and Markovitch, 2007)   Wikipedia  0.75
WordNet 3.0 + Knownets               WordNet    0.71
WordNet 3.0 + glosses                WordNet    0.68
(Agirre et al., 2009)                Corpora    0.66
(Finkelstein et al., 2007)           LSA        0.56
(Hughes and Ramage, 2007)            WordNet    0.55
(Jarmasz, 2003)                      WordNet    0.35
Agirre (UBC) Knowledge-Bases and Random Walks MAVIR 2011 40 / 54
-
Similarity and Information Retrieval
Outline
1 WordNet, PageRank and Personalized PageRank
2 Random walks for WSD
3 Adapting WSD to domains
4 WSD on the biomedical domain
5 Random walks for similarity
6 Similarity and Information Retrieval
7 Conclusions
Agirre (UBC) Knowledge-Bases and Random Walks MAVIR 2011 41 / 54
-
Similarity and Information Retrieval
Similarity and Information Retrieval
(with Arantxa Otegi and Xabier Arregi)
Document expansion (aka clustering and smoothing) has been shown to be successful in ad-hoc IR
Use WordNet and similarity to expand documents
Example:
I can't install DSL because of the antivirus program, any hints?
You should turn off virus and anti-spy software. And thats done within each of the softwares themselves. Then turn them back on later after setting up any DSL softwares.
Method:
Initialize random walk with document words
Retrieve top k synsets
Introduce words on those k synsets in a secondary index
When retrieving, use both primary and secondary indexes
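A minimal sketch of the expansion step, assuming a Personalized PageRank vector over synsets has already been computed from the document words; the function name, synset identifiers and word lists below are hypothetical, not the actual system's API.

```python
def expansion_terms(ppr_vector, synset_words, k=100):
    """Keep the k synsets with highest random-walk probability and
    collect their words for the secondary index."""
    top_k = sorted(ppr_vector, key=ppr_vector.get, reverse=True)[:k]
    terms = []
    for synset in top_k:
        terms.extend(synset_words.get(synset, []))
    return terms

# Toy data: synsets reachable from a document about antivirus software
# (identifiers and probabilities are made up for illustration).
ppr_vector = {"software.n.01": 0.4, "virus.n.02": 0.3, "cat.n.01": 0.01}
synset_words = {
    "software.n.01": ["software", "program", "package"],
    "virus.n.02": ["virus", "malware"],
    "cat.n.01": ["cat"],
}
print(expansion_terms(ppr_vector, synset_words, k=2))
# → ['software', 'program', 'package', 'virus', 'malware']
```

The expansion terms go into a secondary index, so a query containing "program" can match a document that only said "software".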
Agirre (UBC) Knowledge-Bases and Random Walks MAVIR 2011 42 / 54
-
Similarity and Information Retrieval
Example
You should turn off virus and anti-spy software. And thats done within each of the softwares themselves. Then turn them back on later after setting up any DSL softwares.
Agirre (UBC) Knowledge-Bases and Random Walks MAVIR 2011 43 / 54
-
Similarity and Information Retrieval
Example
Agirre (UBC) Knowledge-Bases and Random Walks MAVIR 2011 44 / 54
-
Similarity and Information Retrieval
Example
I can’t install DSL because of the antivirus program, any hints?
Agirre (UBC) Knowledge-Bases and Random Walks MAVIR 2011 45 / 54
-
Similarity and Information Retrieval
Experiments
BM25 ranking function
Combine 2 indexes: original words and expansion terms
Parameters: k1, b (BM25); λ (indices); k (concepts in expansion)
Three collections:
Robust at CLEF 2009
Yahoo! Answers
ResPubliQA (IR for QA)
Summary of results:
Default parameters: 1.43% - 4.90% improvement in all 3 datasets
Optimized parameters: 0.98% - 2.20% improvement in 2 datasets
Carrying parameters: 5.77% - 19.77% improvement in 4 out of 6
Robustness
Particularly on short documents
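The slide only names the parameters (k1, b, λ, k); the per-term scoring below is the standard BM25 formula, and the linear interpolation of the two indexes with weight λ is an assumption for illustration, not the paper's exact combination.

```python
import math

def bm25_term(tf, df, n_docs, doc_len, avg_len, k1=1.2, b=0.75):
    """Standard BM25 contribution of one term in one document."""
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    return idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_len))

def combined_score(score_words, score_expansion, lam=0.5):
    """Interpolate the scores from the original-word index and the
    expansion-term index (hypothetical combination with weight lam)."""
    return (1 - lam) * score_words + lam * score_expansion

# A term occurring twice in a short document scores higher than once.
s2 = bm25_term(tf=2, df=10, n_docs=1000, doc_len=80, avg_len=120)
s1 = bm25_term(tf=1, df=10, n_docs=1000, doc_len=80, avg_len=120)
print(s2 > s1, combined_score(s2, s1, lam=0.25) > s1)  # → True True
```

With λ = 0 the system falls back to plain BM25 over the original words, which is why the expansion can add robustness without risking large losses.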
Agirre (UBC) Knowledge-Bases and Random Walks MAVIR 2011 46 / 54
-
Conclusions
Outline
1 WordNet, PageRank and Personalized PageRank
2 Random walks for WSD
3 Adapting WSD to domains
4 WSD on the biomedical domain
5 Random walks for similarity
6 Similarity and Information Retrieval
7 Conclusions
Agirre (UBC) Knowledge-Bases and Random Walks MAVIR 2011 47 / 54
-
Conclusions
Conclusions
Knowledge-based method for similarity and WSD
Based on random walks
Exploits whole structure of underlying KB efficiently
Performance:
Similarity: best KB algorithm, comparable with 1.6 Tword, slightly below ESA
WSD: best KB algorithm on S2AW, S3AW, Domains datasets
WSD and domains:
Better than supervised WSD when adapting to domains (Sports, Finance)
Best KB algorithm on biomedical texts
Agirre (UBC) Knowledge-Bases and Random Walks MAVIR 2011 48 / 54
-
Conclusions
Conclusions
Applications: ad-hoc Information Retrieval
Performance gains and robustness
Easily ported to other languages
Provides cross-lingual similarity
Only requirement: having a WordNet
Publicly available at http://ixa2.si.ehu.es/ukb
Both programs and data (WordNet, UMLS)
Including a program to construct graphs from new KBs (e.g. Wikipedia)
GPL license, open source, free
Agirre (UBC) Knowledge-Bases and Random Walks MAVIR 2011 49 / 54
-
Conclusions
Future work
Domains and WSD: interrelation of domain classification and WSD
Named Entity Disambiguation using Wikipedia
Information Retrieval: other options to combine similarity information
Moving to sentence similarity: SemEval task on Semantic Text Similarity
http://www.cs.york.ac.uk/semeval-2012/task6/
Agirre (UBC) Knowledge-Bases and Random Walks MAVIR 2011 50 / 54
-
Conclusions
Understanding Text with Knowledge-Bases and Random Walks
Eneko Agirre
ixa2.si.ehu.es/eneko
IXA NLP Group
University of the Basque Country
MAVIR, 2011
Agirre (UBC) Knowledge-Bases and Random Walks MAVIR 2011 51 / 54
-
Conclusions
References I
Agirre, E., Arregi, X. and Otegi, A. (2010). Document Expansion Based on WordNet for Robust IR. In Proceedings of the 23rd International Conference on Computational Linguistics (Coling), pp. 9–17.
Agirre, E., de Lacalle, O. L. and Soroa, A. (2009). Knowledge-Based WSD on Specific Domains: Performing better than Generic Supervised WSD. In Proceedings of IJCAI, Pasadena, USA.
Agirre, E. and Soroa, A. (2009). Personalizing PageRank for Word Sense Disambiguation. In Proceedings of EACL-09, Athens, Greece.
Agirre (UBC) Knowledge-Bases and Random Walks MAVIR 2011 52 / 54
-
Conclusions
References II
Agirre, E., Soroa, A., Alfonseca, E., Hall, K., Kravalova, J. and Pasca, M. (2009). A Study on Similarity and Relatedness Using Distributional and WordNet-based Approaches. In Proceedings of the Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL), Boulder, USA.
Agirre, E., Soroa, A. and Stevenson, M. (2010). Graph-based Word Sense Disambiguation of Biomedical Documents. Bioinformatics 26, 2889–2896.
Agirre, E., Cuadros, M., Rigau, G. and Soroa, A. (2010). Exploring Knowledge Bases for Similarity. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), pp. 373–377, European Language Resources Association (ELRA), Valletta, Malta.
Agirre (UBC) Knowledge-Bases and Random Walks MAVIR 2011 53 / 54
-
Conclusions
References III
Otegi, A., Arregi, X. and Agirre, E. (2011). Query Expansion for IR using Knowledge-Based Relatedness. In Proceedings of the International Joint Conference on Natural Language Processing.
Stevenson, M., Agirre, E. and Soroa, A. (2011). Exploiting Domain Information for Word Sense Disambiguation of Medical Documents. Journal of the American Medical Informatics Association, in press, 1–6.
Yeh, E., Ramage, D., Manning, C., Agirre, E. and Soroa, A. (2009). WikiWalk: Random Walks on Wikipedia for Semantic Relatedness. In ACL Workshop "TextGraphs-4: Graph-based Methods for Natural Language Processing".
Agirre (UBC) Knowledge-Bases and Random Walks MAVIR 2011 54 / 54