Transcript of “Understanding Text with Knowledge-Bases and Random Walks”

Eneko Agirre (ixa2.si.ehu.es/eneko), IXA NLP Group, University of the Basque Country. MAVIR, 2011.

  • Understanding Text with Knowledge-Bases and Random Walks

    Eneko Agirre
    ixa2.si.ehu.es/eneko

    IXA NLP Group, University of the Basque Country

    MAVIR, 2011

  • Random Walks on Large Graphs

    WWW, PageRank and Google

    source: http://opte.org

  • Random Walks on Large Graphs

    Linked Data

  • Random Walks on Large Graphs

    Wikipedia (DBpedia)

  • Random Walks on Large Graphs

    WordNet

  • Random Walks on Large Graphs

    Unified Medical Language System

    sources: http://sixdegrees.hu/ http://www2.research.att.com/~yifanhu/ http://www.cise.ufl.edu/research/sparse/matrices/Gleich/ http://www.ebremer.com/

  • Text Understanding

    Understanding of broad language, what’s behind the surface strings

    Barcelona boss says that Jose Mourinho is ‘the best coach in the world’

    End systems that we would like to build: natural dialogue, speech recognition, machine translation; improving parsing, semantic role labeling, information retrieval, question answering

  • Text Understanding

    From string to semantic representation (First Order Logic)

    Barcelona coach praises Jose Mourinho.

    Exist e1, x1, x2, x3 such that
    FC Barcelona(x1) and coach:n:1(x2) and praise:v:2(e1,x2,x3) and José Mourinho(x3)

    Disambiguation: Concepts, Entities and Semantic Roles
    Quantifiers, modality, negation, etc.

    Inference and Reasoning

    Barcelona coach praises Mourinho ∼ Guardiola honors Mourinho

    . . . with respect to some Knowledge Base

  • Text Understanding: Knowledge Bases and Random Walks

    Focus on the following tasks on inference and disambiguation:
    Map words in context to KB concepts (Word Sense Disambiguation)
    Similarity between concepts and words
    Similarity to improve ad-hoc information retrieval

    Applied to WordNet(s), UMLS, Wikipedia
    Excellent results
    Open source software and data: http://ixa2.si.ehu.es/ukb/

  • Outline

    1 WordNet, PageRank and Personalized PageRank

    2 Random walks for WSD

    3 Adapting WSD to domains

    4 WSD on the biomedical domain

    5 Random walks for similarity

    6 Similarity and Information Retrieval

    7 Conclusions



  • WordNet, PageRank and Personalized PageRank

    WordNet, PageRank and Personalized PageRank (with Aitor Soroa)

    WordNet is the most widely used hierarchically organized lexical database for English (Fellbaum, 1998)

    Broad coverage of nouns, verbs, adjectives, adverbs

    Main unit: synset (concept)

    coach#1, manager#3, handler#2
    someone in charge of training an athlete or a team.

    Relations between concepts: synonymy (built-in), hyperonymy, antonymy, meronymy, entailment, derivation, gloss

    Closely linked versions in several languages

  • WordNet, PageRank and Personalized PageRank

    WordNet

    Example of hypernym relations:

    coach#1 → trainer → leader → person → organism → . . . → entity

    synonyms: manager, handler
    gloss words (and synsets): charge, train (verb), athlete, team
    hyponyms: baseball coach, basketball coach, conditioner, football coach
    instance: John McGraw
    domain: sport, athletics
    derivation: coach (verb), managership, manage (verb), handle (verb)

  • WordNet, PageRank and Personalized PageRank

    WordNet

    Representing WordNet as a graph:
    Nodes represent concepts
    Edges represent relations (undirected)
    In addition, directed edges from words to corresponding concepts (senses)
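This encoding can be sketched with a plain adjacency structure. A minimal sketch, using an invented toy fragment (the node names imitate WordNet but are not real data):

```python
# Concept-concept relations are undirected (stored in both directions);
# word nodes get directed edges to their senses, with no edge back.
graph = {}  # node -> set of successor nodes

def add_undirected(c1, c2):
    """Concept-concept relation (hyperonymy, meronymy, ...)."""
    graph.setdefault(c1, set()).add(c2)
    graph.setdefault(c2, set()).add(c1)

def add_word(word, senses):
    """Directed word -> sense edges."""
    graph.setdefault(word, set()).update(senses)
    for s in senses:
        graph.setdefault(s, set())

add_undirected("coach#n1", "trainer#n1")
add_undirected("coach#n5", "public_transport#n1")
add_word("coach", ["coach#n1", "coach#n5"])

print(sorted(graph["coach"]))        # senses reachable from the word
print("coach" in graph["coach#n1"])  # False: no edge back to the word
```

The asymmetry matters: random walks can flow from words into the concept graph, but probability mass never flows back out to word nodes.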

  • WordNet, PageRank and Personalized PageRank

    WordNet

    [Figure: a fragment of the WordNet graph around the word coach. The word node coach has directed edges to its senses coach#n1, coach#n2 and coach#n5, which connect to trainer#n1, managership#n3, sport#n1, handle#v6, teacher#n1, tutorial#n1, public_transport#n1, fleet#n2 and seat#n1 through hyperonym, holonym, domain and derivation edges.]

  • WordNet, PageRank and Personalized PageRank

    Random Walks: PageRank

    Given a graph, PageRank ranks nodes according to their relative structural importance

    If an edge from ni to nj exists, a vote from ni to nj is produced
    Its strength depends on the rank of ni
    The more important ni is, the more strength its votes will have

    PageRank is more commonly viewed as the result of a random walk process

    The rank of ni represents the probability of a random walk over the graph ending on ni, at a sufficiently large time.

  • WordNet, PageRank and Personalized PageRank

    Random Walks: PageRank

    G: graph with N nodes n1, . . . , nN
    di: outdegree of node i
    M: N × N matrix

    Mji = 1/di  if an edge from i to j exists
          0     otherwise

    PageRank equation: Pr = cM Pr + (1 − c)v

    First term: the surfer follows edges
    Second term: the surfer randomly jumps to any node (teleport)

    c: damping factor, controlling the way in which these two terms are combined
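The equation above is usually solved by power iteration. A minimal sketch (the toy graph and variable names are illustrative, not from the talk; c = 0.85 and 30 iterations match the settings used later in the experiments):

```python
# Power-iteration PageRank: repeatedly apply Pr = c*M*Pr + (1-c)*v.
def pagerank(edges, n, c=0.85, iters=30):
    out = [0] * n                      # outdegree d_i of each node
    for i, _ in edges:
        out[i] += 1
    pr = [1.0 / n] * n                 # initial uniform distribution
    v = [1.0 / n] * n                  # uniform teleport vector
    for _ in range(iters):
        new = [(1 - c) * v[k] for k in range(n)]   # teleport term
        for i, j in edges:
            new[j] += c * pr[i] / out[i]           # each node votes for its successors
        pr = new
    return pr

# Toy graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0. Node 2 collects the most votes.
ranks = pagerank([(0, 1), (0, 2), (1, 2), (2, 0)], 3)
```

Since every node here has at least one outgoing edge, the total probability mass stays at 1 throughout the iteration.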

  • WordNet, PageRank and Personalized PageRank

    Random Walks: Personalized PageRank

    Pr = cM Pr + (1 − c)v

    PageRank: v is a stochastic normalized vector, with elements 1/N
    Equal probabilities to all nodes in case of random jumps

    Personalized PageRank: non-uniform v (Haveliwala 2002)
    Assign stronger probabilities to certain kinds of nodes
    Bias PageRank to prefer these nodes

    For example, if we concentrate all mass on node i:
    All random jumps return to ni
    The rank of i will be high
    The high rank of i will make all the nodes in its vicinity also receive a high rank
    The importance of node i given by the initial v spreads along the graph
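Only the teleport vector changes with respect to plain PageRank. A sketch of the personalized variant on an invented toy chain (not from the talk):

```python
# Personalized PageRank: same iteration as PageRank, but the teleport
# vector v concentrates probability mass on chosen seed nodes.
def personalized_pagerank(edges, n, seeds, c=0.85, iters=30):
    out = [0] * n
    for i, _ in edges:
        out[i] += 1
    v = [1.0 / len(seeds) if k in seeds else 0.0 for k in range(n)]
    pr = v[:]
    for _ in range(iters):
        new = [(1 - c) * v[k] for k in range(n)]
        for i, j in edges:
            new[j] += c * pr[i] / out[i]
        pr = new
    return pr

# Undirected chain 0 - 1 - 2 - 3, encoded as edges in both directions.
chain = [(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2)]
ppr = personalized_pagerank(chain, 4, seeds={0})
```

Seeding on node 0 gives its neighbour (node 1) a much higher rank than the distant node 3: the importance of the seed spreads along the graph, exactly as described above.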


  • Random walks for WSD

    Word Sense Disambiguation (WSD)

    Goal: determine senses of the open-class words in a text.
    “Nadal is sharing a house with his uncle and coach, Toni.”
    “Our fleet comprises coaches from 35 to 58 seats.”

    Knowledge Base (e.g. WordNet):
    coach#1 someone in charge of training an athlete or a team.
    coach#2 a person who gives private instruction (as in singing, acting, etc.).
    coach#3 a railcar where passengers ride.
    coach#4 a carriage pulled by four horses with one driver.
    coach#5 a vehicle carrying many passengers; used for public transport.

  • Random walks for WSD

    Word Sense Disambiguation (WSD)

    Supervised corpus-based WSD performs best
    Train classifiers on hand-tagged data (typically SemCor)
    Data sparseness, e.g. coach 20 examples (20,0,0,0,0,0), bank 48 examples (25,20,2,1,0 . . . )
    Results decrease when train/test come from different sources (even Brown, BNC)
    They decrease even more when train/test come from different domains

    Knowledge-based WSD
    Uses information in a KB (WordNet)
    Performs close to but lower than the Most Frequent Sense baseline (MFS, supervised)
    Vocabulary coverage
    Relation coverage

  • Random walks for WSD

    Domain adaptation

    Deploying NLP techniques in real applications is challenging, especially for WSD:

    Sense distributions change across domains
    Data sparseness hurts more
    Context overlap is reduced
    New senses, new terms

    But . . . some words get fewer interpretations in domains:
    bank in finance, coach in sports

  • Random walks for WSD

    Knowledge-based WSD using random walks (with Aitor Soroa, Oier Lopez de Lacalle)

    Use information in WordNet for disambiguation:
    “Our fleet comprises coaches from 35 to 58 seats.”

    Traditional approach (Patwardhan et al. 2007):
    Compare each target sense of coach with those of the words in the context (fleet, comprise, blue, seat)
    Using semantic relatedness between pairs of senses (6×4×3×8×9 = 5184)
    Alternative to the combinatorial explosion: each word disambiguated individually (6×(4+3+8+9) = 144)

    sim(coach#1,fleet#1) + sim(coach#1,fleet#2) + . . . sim(coach#1,seat#1) . . .
    sim(coach#2,fleet#1) + sim(coach#2,fleet#2) + . . . sim(coach#2,seat#1) . . .
    . . .

    Graph-based methods
    Exploit the structural properties of the graph underlying WordNet
    Find globally optimal solutions
    Disambiguate large portions of text in one go
    A principled solution to the combinatorial explosion

  • Random walks for WSD

    Using PageRank for WSD

    Given a graph representation of the LKB:
    PageRank over the whole WordNet would yield a context-independent ranking of word senses

    We would like:
    Given an input text, disambiguate all open-class words in the input, taking the rest as context

    Two alternatives:
    1 Create a context-sensitive subgraph and apply PageRank over it (Navigli and Lapata, 2007; Agirre et al. 2008)
    2 Use Personalized PageRank over the complete graph, initializing v with the context words

  • Random walks for WSD

    Using Personalized PageRank (PPR)

    For each word Wi, i = 1 . . . m in the context:
    Initialize v with uniform probabilities over the words Wi
    Context words act as source nodes, injecting probability mass into the concept graph

    Run Personalized PageRank

    Choose the highest-ranking sense for the target word
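The steps above can be sketched on a toy fragment of the graph. The node names imitate WordNet senses, but the graph, its edges, and the resulting numbers are invented for illustration (the real method runs over all of WordNet):

```python
# PPR over a node-name graph: adj maps each node to its successors,
# v is the teleport distribution over context word nodes.
def ppr(adj, v, c=0.85, iters=30):
    nodes = set(adj) | {j for js in adj.values() for j in js}
    pr = {k: v.get(k, 0.0) for k in nodes}
    for _ in range(iters):
        new = {k: (1 - c) * v.get(k, 0.0) for k in nodes}
        for i, succs in adj.items():
            for j in succs:
                new[j] += c * pr[i] / len(succs)
        pr = new
    return pr

# Word nodes point to their senses (directed); sense-sense relations
# are undirected, so they appear here in both directions.
adj = {
    "coach": ["coach#n1", "coach#n5"],
    "fleet": ["fleet#n2"],
    "seat": ["seat#n1"],
    "coach#n1": ["trainer#n1"], "trainer#n1": ["coach#n1"],
    "coach#n5": ["public_transport#n1"],
    "public_transport#n1": ["coach#n5", "fleet#n2", "seat#n1"],
    "fleet#n2": ["public_transport#n1"],
    "seat#n1": ["public_transport#n1"],
}

# Personalize on the context words of
# "Our fleet comprises coaches from 35 to 58 seats."
context = ["coach", "fleet", "seat"]
v = {w: 1.0 / len(context) for w in context}
ranks = ppr(adj, v)
best = max(["coach#n1", "coach#n5"], key=lambda s: ranks[s])
print(best)  # coach#n5: the vehicle sense wins in this context
```

Mass injected at fleet and seat flows into public_transport#n1 and on to coach#n5, so the vehicle sense outranks the trainer sense, as the context demands.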

  • Random walks for WSD

    Using Personalized PageRank (PPR)

    [Figure: the WordNet fragment around coach, with the context words coach, fleet, comprise and seat acting as source nodes that inject probability mass into their senses (coach#n1, coach#n2, coach#n5, fleet#n2, comprise#v1, seat#n1, . . . ) and, through them, into related concepts such as trainer#n1, managership#n3, sport#n1, handle#n8, teacher#n1, tutorial#n1 and public_transport#n1.]

  • Random walks for WSD

    Experiment setting

    Two datasets:
    Senseval-2 All Words (S2AW)
    Senseval-3 All Words (S3AW)
    Both labelled with WordNet 1.7 tags

    Create input contexts of at least 20 words,
    adding the sentences immediately before and after if the original is too short

    PageRank settings:
    Damping factor (c): 0.85
    End after 30 iterations

  • Random walks for WSD

    Results and comparison to related work (S2AW)

    Senseval-2 All Words dataset

    System    All   N     V     Adj.  Adv.
    Mih05     54.2  57.5  36.5  56.7  70.9
    Sinha07   56.4  65.6  32.3  61.4  60.2
    Tsatsa07  49.2  –     –     –     –
    PPR       58.6  70.4  38.9  58.3  70.1
    MFS       60.1  71.2  39.0  61.1  75.4

    (Mihalcea, 2005) Pairwise Lesk between senses, then PageRank.
    (Sinha & Mihalcea, 2007) Several similarity measures, voting, fine-tuning for each PoS. Development over S3AW.
    (Tsatsaronis et al., 2007) Subgraph BFS over WordNet 1.7 and eXtended WN, then spreading activation.

  • Random walks for WSD

    Comparison to related work (S3AW)

    System    All   N     V     Adj.  Adv.
    Mih05     52.2  –     –     –     –
    Sinha07   52.4  60.5  40.6  54.1  100.0
    Nav10     –     61.9  36.1  62.8  –
    PPR       57.4  64.1  46.9  62.6  92.9
    MFS       62.3  69.3  53.6  63.7  92.9

    (Mihalcea, 2005) Pairwise Lesk between senses, then PageRank.
    (Sinha & Mihalcea, 2007) Several similarity measures, voting, fine-tuning for each PoS. Development over S3AW.
    (Navigli & Lapata, 2010) Subgraph DFS(3) over WordNet 2.0 plus proprietary relations, several centrality algorithms.


  • Adapting WSD to domains

    Adapting WSD to domains

    How could we improve WSD performance without tagging new data from the domain or adapting WordNet manually to the domain?

    What would happen if we applied PPR-based WSD to specific domains?

    Personalized PageRank over context:
    “. . . has never won a league title as coach but took Parma to success . . . ”

    Personalized PageRank over related words:
    Get related words from a distributional thesaurus
    coach: manager, captain, player, team, striker, . . .

  • Adapting WSD to domains

    Experiments

    Dataset with examples from the BNC and the Sports and Finance sections of Reuters (Koeling et al. 2005)

    41 nouns: salient in either domain or with senses linked to these domains
    Sense inventory: WordNet v. 1.7.1

    300 examples for each of the 41 nouns
    Roughly 100 examples for each word and corpus

    Experiments:
    Supervised: train MFS, SVM, k-NN on SemCor examples
    Personalized PageRank (same damping factor and iterations)
    Use context
    50 related words (Koeling et al. 2005) (BNC, Sports, Finance)

  • Adapting WSD to domains

    Results

    Systems                              BNC    Sports  Finances
    Baselines   Random                   ∗19.7  ∗19.2   ∗19.5
                SemCor MFS               ∗34.9  ∗19.6   ∗37.1
    Supervised  SVM                      ∗38.7  ∗25.3   ∗38.7
                k-NN                     42.8   ∗30.3   ∗43.4
    Context     PPR                      43.8   ∗35.6   ∗46.9
    Related     PPR                      ∗37.7  51.5    59.3
    words       (Koeling et al. 2005)    ∗40.7  ∗43.3   ∗49.7
    Skyline     Test MFS                 ∗52.0  ∗77.8   ∗82.3

    Supervised systems (MFS, SVM, k-NN) score very low (see Test MFS)
    PPR on context: best for BNC (∗ marks statistical significance)
    PPR on related words: best for Sports and Finance, and improves over Koeling et al., who use pairwise WordNet similarity.


  • WSD on the biomedical domain

    UMLS and biomedical text (with Aitor Soroa and Mark Stevenson)

    Ambiguities are believed not to occur in specific domains:
    On the Use of Cold Water as a Powerful Remedial Agent in Chronic Disease.
    Intranasal ipratropium bromide for the common cold.

    11.7% of the phrases in abstracts added to MEDLINE in 1998 were ambiguous (Weeber et al. 2011)

    Unified Medical Language System (UMLS) Metathesaurus
    Concept Unique Identifiers (CUIs):
    C0234192: Cold (Cold Sensation) [Physiologic Function]
    C0009264: Cold (cold temperature) [Natural Phenomenon or Process]
    C0009443: Cold (Common Cold) [Disease or Syndrome]

  • WSD on the biomedical domain

    WSD and biomedical text

    Thesauri in the Metathesaurus:
    Alcohol and Other Drugs, Medical Subject Headings, CRISP Thesaurus, SNOMED Clinical Terms, etc.
    Relations in the Metathesaurus between CUIs:
    parent, can be qualified by, related possibly synonymous, related other

    We applied random walks over a graph of CUIs.
    Evaluated on NLM-WSD, 50 ambiguous terms (100 instances each)

    KB                          #CUIs    #relations  Acc.  Terms
    AOD                         15,901   58,998      51.5  4
    MSH                         278,297  1,098,547   44.7  9
    CSP                         16,703   73,200      60.2  3
    SNOMEDCT                    304,443  1,237,571   62.5  29
    all above                   572,105  2,433,324   64.4  48
    all relations               –        5,352,190   68.1  50
    combined with cooc.         –        –           73.7  50
    (Jimeno and Aronson, 2011)  –        –           68.4  50

  • Random walks for similarity

    Outline

    1 WordNet, PageRank and Personalized PageRank

    2 Random walks for WSD

    3 Adapting WSD to domains

    4 WSD on the biomedical domain

    5 Random walks for similarity

    6 Similarity and Information Retrieval

    7 Conclusions


    Similarity

Given two words or multiword expressions, estimate how similar they are.

cord - smile
gem - jewel
magician - oracle

Features shared, belonging to the same class.

Relatedness is a more general relationship, including other relations like topical relatedness or meronymy.

king - cabbage
movie - star
journey - voyage

Typically implemented as calculating a numeric value of similarity/relatedness.


    Similarity and WSD

gem - jewel
movie - star

WSD and similarity are closely intertwined:

Similarity between words is based on similarity between senses (implicitly doing disambiguation).

WSD uses similarity between senses in context.


    Similarity examples

RG dataset                         WordSim353 dataset
cord smile             0.02        king cabbage             0.23
rooster voyage         0.04        professor cucumber       0.31
. . .                              . . .
glass jewel            1.78        investigation effort     4.59
magician oracle        1.82        movie star               7.38
. . .                              . . .
cemetery graveyard     3.88        journey voyage           9.29
automobile car         3.92        midday noon              9.29
midday noon            3.94        tiger tiger             10.00
80 pairs, 51 subjects              353 pairs, 16 subjects
Similarity                         Similarity and relatedness


    Similarity

Two main approaches:
Knowledge-based (Roget's Thesaurus, WordNet, etc.)
Corpus-based, also known as distributional similarity (co-occurrences)

Many potential applications:
Overcome brittleness (word match)
NLP subtasks (parsing, semantic role labeling)
Information retrieval
Question answering
Summarization
Machine translation optimization and evaluation
Inference (textual entailment)


Random walks for similarity
(with Aitor Soroa, Montse Cuadros, German Rigau)

Based on (Hughes and Ramage, 2007). Given a pair of words (w1, w2):

Initialize teleport probability mass on w1
Run Personalized PageRank, obtaining ~w1
Initialize on w2 and obtain ~w2
Measure similarity between ~w1 and ~w2 (e.g. cosine)

Experiment settings:
Damping value c = 0.85
Calculations finish after 30 iterations

Variations for the knowledge base:
WordNet 3.0
WordNet relations
Gloss relations
Other relations
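A minimal sketch of the pipeline above, on an invented five-node graph (the real graph is WordNet 3.0); c = 0.85 and 30 iterations follow the slide settings:

```python
import math

# Sketch of the similarity pipeline on an invented five-node graph
# (the real graph is WordNet 3.0); c = 0.85, 30 iterations as above.
graph = {
    "gem": ["jewel", "stone"],
    "jewel": ["gem", "stone"],
    "stone": ["gem", "jewel"],
    "car": ["vehicle"],
    "vehicle": ["car"],
}

def ppr(seed, c=0.85, iters=30):
    """Personalized PageRank with all teleport mass on one node."""
    v = {n: (1.0 if n == seed else 0.0) for n in graph}
    pr = dict(v)
    for _ in range(iters):
        nxt = {n: (1 - c) * v[n] for n in graph}
        for node, nbs in graph.items():
            share = c * pr[node] / len(nbs)
            for nb in nbs:
                nxt[nb] += share
        pr = nxt
    return pr

def cosine(a, b):
    dot = sum(a[n] * b[n] for n in graph)
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb)

sim_close = cosine(ppr("gem"), ppr("jewel"))  # same neighbourhood
sim_far = cosine(ppr("gem"), ppr("car"))      # disconnected components
```

Words whose random walks visit the same neighbourhood get near-identical probability vectors and a high cosine; words in unconnected parts of the graph get disjoint vectors and a cosine of zero.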


    Dataset and results

Method                               Source      Spearman
(Gabrilovich and Markovitch, 2007)   Wikipedia       0.75
WordNet 3.0 + Knownets               WordNet         0.71
WordNet 3.0 + glosses                WordNet         0.68
(Agirre et al., 2009)                Corpora         0.66
(Finkelstein et al., 2007)           LSA             0.56
(Hughes and Ramage, 2007)            WordNet         0.55
(Jarmasz, 2003)                      WordNet         0.35
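The Spearman column is the rank correlation between system scores and the human judgements. A minimal self-contained version, with toy score lists rather than the real datasets:

```python
# Minimal Spearman rank correlation against human judgements.
# The two score lists below are toy numbers, not the real datasets.
def ranks(xs):
    """1-based ranks, averaging over ties."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(xs, ys):
    """Pearson correlation of the rank vectors."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx)
           * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den

gold = [0.02, 0.04, 1.78, 1.82, 3.88, 3.92]   # human scores (toy)
system = [0.1, 0.0, 0.5, 0.6, 0.9, 0.95]      # system scores (toy)
rho = spearman(gold, system)
```

Only the ordering of the pairs matters, which is why systems producing scores on very different scales can still be compared.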


  • Similarity and Information Retrieval

    Outline

    1 WordNet, PageRank and Personalized PageRank

    2 Random walks for WSD

    3 Adapting WSD to domains

    4 WSD on the biomedical domain

    5 Random walks for similarity

    6 Similarity and Information Retrieval

    7 Conclusions


Similarity and Information Retrieval
(with Arantxa Otegi and Xabier Arregi)

    Document expansion (aka clustering and smoothing) has been shown tobe successful in ad-hoc IR

    Use WordNet and similarity to expand documents

Example:
I can’t install DSL because of the antivirus program, any hints?
You should turn off virus and anti-spy software. And thats done within each of the softwares themselves. Then turn them back on later after setting up any DSL softwares.

Method:
Initialize random walk with document words
Retrieve top k synsets
Introduce the words of those k synsets in a secondary index
When retrieving, use both primary and secondary indexes
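The expansion step of the method can be sketched as below; the synset identifiers, scores and word lists are invented WordNet-style examples, not output of the actual system:

```python
import heapq

# Sketch of the expansion step: given random-walk scores over synsets
# and each synset's words, keep the top-k synsets and emit their words
# for the secondary index. Identifiers, scores and word lists are
# invented WordNet-style examples.
synset_scores = {"program.n.07": 0.031, "software.n.01": 0.042,
                 "virus.n.03": 0.025, "dog.n.01": 0.001}
synset_words = {"program.n.07": ["program", "programme", "computer_program"],
                "software.n.01": ["software", "software_package", "package"],
                "virus.n.03": ["virus", "computer_virus"],
                "dog.n.01": ["dog", "domestic_dog"]}

def expansion_terms(scores, words, k):
    top = heapq.nlargest(k, scores, key=scores.get)  # top-k synsets
    terms = []
    for s in top:
        terms.extend(words[s])
    return terms

expanded = expansion_terms(synset_scores, synset_words, k=2)
```

The expansion terms go into the secondary index only, so the original document text is untouched and the two evidence sources can be weighted separately at retrieval time.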


    Example

You should turn off virus and anti-spy software. And thats done within each of the softwares themselves. Then turn them back on later after setting up any DSL softwares.



    Example

    I can’t install DSL because of the antivirus program, any hints?


    Experiments

BM25 ranking function
Combine 2 indexes: original words and expansion terms
Parameters: k1, b (BM25); λ (index combination); k (concepts in expansion)
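One plausible reading of the two-index combination, sketched with a textbook Okapi BM25 and a λ interpolation; the toy documents and expansion terms are invented, and this is not the exact system code:

```python
import math

# Generic Okapi BM25 over a list-of-token documents; k1 and b are the
# usual BM25 parameters from the slide.
def bm25(query, doc, docs, k1=1.2, b=0.75):
    avgdl = sum(len(d) for d in docs) / len(docs)
    score = 0.0
    for term in set(query):
        df = sum(1 for d in docs if term in d)
        if df == 0:
            continue
        idf = math.log(1 + (len(docs) - df + 0.5) / (df + 0.5))
        tf = doc.count(term)
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
    return score

# Lambda-interpolate the original-word index and the expansion index.
def combined(query, words, expansion, all_words, all_exp, lam=0.5):
    return (lam * bm25(query, words, all_words)
            + (1 - lam) * bm25(query, expansion, all_exp))

# Toy corpus: two documents, each with an original-words field and an
# invented expansion field.
all_words = [["install", "dsl", "antivirus", "program"],
             ["weather", "today", "sunny"]]
all_exp = [["software", "computer_virus"],
           ["rain", "forecast"]]
query = ["antivirus", "software"]
s0 = combined(query, all_words[0], all_exp[0], all_words, all_exp)
s1 = combined(query, all_words[1], all_exp[1], all_words, all_exp)
```

Here "software" matches only through the expansion index, so the first document is rewarded even though the query term never occurs in its original text.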

Three collections:
Robust at CLEF 2009
Yahoo! Answers
ResPubliQA (IR for QA)

Summary of results:
Default parameters: 1.43% - 4.90% improvement in all 3 datasets
Optimized parameters: 0.98% - 2.20% improvement in 2 datasets
Carrying parameters across collections: 5.77% - 19.77% improvement in 4 out of 6 cases

Robustness, particularly on short documents.


  • Conclusions

    Outline

    1 WordNet, PageRank and Personalized PageRank

    2 Random walks for WSD

    3 Adapting WSD to domains

    4 WSD on the biomedical domain

    5 Random walks for similarity

    6 Similarity and Information Retrieval

    7 Conclusions


    Conclusions

Knowledge-based method for similarity and WSD:
Based on random walks
Exploits the whole structure of the underlying KB efficiently

Performance:
Similarity: best KB algorithm, comparable with distributional similarity from a 1.6 Tword corpus, slightly below ESA
WSD: best KB algorithm on the S2AW, S3AW and Domains datasets
WSD and domains:
Better than supervised WSD when adapting to domains (Sports, Finance)
Best KB algorithm on biomedical texts


    Conclusions

Applications: ad-hoc Information Retrieval
Performance gains and robustness

Easily ported to other languages
Provides cross-lingual similarity
Only requirement is having a WordNet

Publicly available at http://ixa2.si.ehu.es/ukb
Both programs and data (WordNet, UMLS)
Including a program to construct graphs from a new KB (e.g. Wikipedia)
GPL license, open source, free


    Future work

    Domains and WSD: interrelation of domain classification and WSD

    Named Entity Disambiguation using Wikipedia

    Information Retrieval: other options to combine similarity information

Moving to sentence similarity: SemEval task on Semantic Text Similarity
http://www.cs.york.ac.uk/semeval-2012/task6/


Understanding Text with Knowledge-Bases and Random Walks

Eneko Agirre
ixa2.si.ehu.es/eneko

IXA NLP Group
University of the Basque Country

    MAVIR, 2011


    References I

Agirre, E., Arregi, X. and Otegi, A. (2010). Document Expansion Based on WordNet for Robust IR. In Proceedings of the 23rd International Conference on Computational Linguistics (Coling), pp. 9–17.

Agirre, E., de Lacalle, O. L. and Soroa, A. (2009). Knowledge-Based WSD on Specific Domains: Performing better than Generic Supervised WSD. In Proceedings of IJCAI, Pasadena, USA.

Agirre, E. and Soroa, A. (2009). Personalizing PageRank for Word Sense Disambiguation. In Proceedings of EACL-09, Athens, Greece.


    References II

Agirre, E., Soroa, A., Alfonseca, E., Hall, K., Kravalova, J. and Pasca, M. (2009). A Study on Similarity and Relatedness Using Distributional and WordNet-based Approaches. In Proceedings of the Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL), Boulder, USA.

Agirre, E., Soroa, A. and Stevenson, M. (2010). Graph-based Word Sense Disambiguation of Biomedical Documents. Bioinformatics 26, 2889–2896.

Agirre, E., Cuadros, M., Rigau, G. and Soroa, A. (2010). Exploring Knowledge Bases for Similarity. In Proceedings of the Seventh Conference on International Language Resources and Evaluation (LREC'10), pp. 373–377, European Language Resources Association (ELRA), Valletta, Malta.


    References III

Otegi, A., Arregi, X. and Agirre, E. (2011). Query Expansion for IR using Knowledge-Based Relatedness. In Proceedings of the International Joint Conference on Natural Language Processing.

Stevenson, M., Agirre, E. and Soroa, A. (2011). Exploiting Domain Information for Word Sense Disambiguation of Medical Documents. Journal of the American Medical Informatics Association, in press, 1–6.

Yeh, E., Ramage, D., Manning, C., Agirre, E. and Soroa, A. (2009). WikiWalk: Random Walks on Wikipedia for Semantic Relatedness. In Proceedings of the ACL Workshop TextGraphs-4: Graph-based Methods for Natural Language Processing.
