Babelfy: Entity Linking meets Word Sense Disambiguation.

Transcript of Babelfy: Entity Linking meets Word Sense Disambiguation.

Entity Linking meets Word Sense Disambiguation: a Unified Approach

Paper by: Andrea Moro, Alessandro Raganato, Roberto Navigli. Dipartimento di Informatica, Sapienza Università di Roma

Presentation by: Antonio Quirós, Grupo LaBDA (Laboratorio de Bases de Datos Avanzadas)

Universidad Carlos III de Madrid

Babelfy is a unified, multilingual, graph-based approach to Entity Linking and Word Sense Disambiguation based on a loose identification of candidate meanings coupled with a densest subgraph heuristic which selects high-coherence semantic interpretations.

Babelfy is based on the BabelNet 3.0 multilingual semantic network and jointly performs disambiguation and entity linking.

Entity Linking: Discovering mentions of entities within a text and linking them to a Knowledge Base.

Word Sense Disambiguation: Assigning meanings to word occurrences within a text.

Babelfy combines Entity Linking and Word Sense Disambiguation.

EL & WSD

- Unlike WSD, Babelfy allows overlapping fragments of text.

e.g.: “Major League Baseball”

It identifies and disambiguates several nominal and entity mentions:

“Major League Baseball” - “Major League” - “League” - “Baseball”

- Unlike EL, it links not only Named Entity Mentions (“Major League Baseball”) but also nominal mentions (“Major League”) to their corresponding meaning in the Knowledge Base.

Babelfy approach in three steps:

One: Associate each vertex of the Semantic Network with a Semantic Signature.

Two: Given an input text, extract all the linkable fragments and for each fragment list the possible meanings according to the Semantic Network.

Three: Create a graph-based semantic interpretation of the whole text by linking the candidate meanings of the fragments using the Semantic Signatures created in the first step, and then, extract a dense subgraph of this representation and select the best candidate meaning for each fragment.

(Slide notes: the Semantic Signature is a set of highly related vertices, and step one is performed only once; each candidate meaning is either a concept or a named entity; step three is the novel approach.)

Step One: (Creating the Semantic Signatures)

Assign a higher weight to edges that are involved in more densely connected areas.

This is accomplished by using “directed triangles” (cycles of length 3): each edge is weighted by the number of triangles it occurs in.

Step One: (Creating the Semantic Signatures)

weight(v, v′) := |{(v, v′, v″) : (v, v′), (v′, v″), (v″, v) ∈ E}| + 1

(Figure: an example semantic network with vertices Football, Ball, Basketball, Field, Sports, and Court.)

Step One: (Creating the Semantic Signatures)

weight(Football, Sports) = |{(Football, Sports, Ball), (Football, Sports, Field)}| + 1 = 2 + 1 = 3

(Figure: the same network annotated with the resulting edge weights of 2 and 3.)
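The triangle-based edge weighting can be sketched in plain Python. This is an illustrative reconstruction, not the authors' code; the toy graph below mirrors the slide's example, with every relation present in both directions.

```python
from collections import defaultdict

def triangle_weights(edges):
    """Weight each directed edge (v, v') by the number of directed
    triangles (v, v', v'') it occurs in, plus 1."""
    succ = defaultdict(set)
    for u, v in edges:
        succ[u].add(v)
    weights = {}
    for u, v in edges:
        # count w such that u -> v -> w -> u is a cycle of length 3
        weights[(u, v)] = sum(1 for w in succ[v] if u in succ[w]) + 1
    return weights

# Toy network modeled on the slide; every relation goes both ways.
pairs = [("Football", "Sports"), ("Football", "Ball"), ("Sports", "Ball"),
         ("Football", "Field"), ("Sports", "Field"),
         ("Basketball", "Sports"), ("Basketball", "Ball"),
         ("Basketball", "Court"), ("Sports", "Court")]
edges = pairs + [(v, u) for u, v in pairs]
w = triangle_weights(edges)
print(w[("Football", "Sports")])  # 3 (two triangles, plus 1)
```

The Football–Sports edge closes triangles through Ball and through Field, reproducing the weight of 3 computed on the slide.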

Step One: (Creating the Semantic Signatures)

After assigning weights to each edge, perform a Random Walk with Restart to create the Semantic Signature: a set of highly related vertices.

For a fixed number of steps, run an RWR from every vertex v of the Semantic Network and keep track of the encountered vertices; eliminate weakly related vertices, keeping only those that were hit at least η times.

Finally, return the remaining vertices as semSign_v: the Semantic Signature of v.

Step One: (Creating the Semantic Signatures)

input: v, the starting vertex; α, the restart probability; n, the number of steps to be executed; P, the transition probabilities; η, the frequency threshold.
output: semSign_v, the set of related vertices for v.

function RWR(v, α, n, P, η)
    v′ := v
    counts := new Map<Synset, Integer>
    while n > 0 do
        if random() > α then
            choose a random neighbor v″ of v′ according to the transition probabilities P(·|v′)
            v′ := v″
            counts[v′]++
        else
            restart the walk: v′ := v
        n := n − 1
    for each v′ in counts.keys() do
        if counts[v′] < η then
            remove v′ from counts.keys()
    return semSign_v := counts.keys()

where the transition probabilities are the normalized edge weights:

P(v′ | v) = weight(v, v′) / Σ_{v″ ∈ V} weight(v, v″)
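The RWR procedure can be sketched in Python. This is a hedged reconstruction, not the authors' implementation: `weights` is an assumed adjacency map `{vertex: {neighbor: edge weight}}`, and, as in the pseudocode, α is the restart probability (the walk continues when random() > α).

```python
import random
from collections import Counter

def rwr(v, alpha, n, weights, eta, seed=0):
    """Random Walk with Restart from vertex v.

    weights: assumed adjacency map {vertex: {neighbor: edge weight}}.
    Returns semSign_v: the set of vertices hit at least eta times."""
    rng = random.Random(seed)   # fixed seed for reproducibility
    cur, counts = v, Counter()
    for _ in range(n):
        if rng.random() > alpha:
            nbrs = weights.get(cur, {})
            if nbrs:
                # sample a neighbor: P(v'|v) = weight / total edge weight
                cur = rng.choices(list(nbrs), list(nbrs.values()))[0]
                counts[cur] += 1
            else:
                cur = v         # dead end: restart
        else:
            cur = v             # restart the walk
    return {u for u, c in counts.items() if c >= eta}
```

With the triangle-based weights from the previous step and a large enough η, only the strongly related vertices survive the filtering.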

Step Two: (Candidate Identification)

Using part-of-speech tagging, identify the set F of all textual fragments that contain at least one noun and are substrings of lexicalizations in BabelNet.

For each f ∈ F, look for the candidate meanings cand(f): the vertices containing f (or, only for named entities, a superstring of f) as their lexicalization.

Babelfy uses a loose candidate identification based on superstring matching, instead of exact matching.

Step Two: (Candidate Identification)

Example:

Word: Sports
Candidates: Sports, Water sports, …, Skateboarding {…, Extreme Sports, …}, …

(These are the vertices containing f, plus the vertices having a superstring of f as one of their lexicalizations/senses.)
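Loose candidate identification can be illustrated with a toy lexicon standing in for BabelNet; all vertex names and lexicalizations below are hypothetical, and the named-entity-only superstring rule follows the description above.

```python
def candidates(fragment, lexicon, named_entities):
    """Loose candidate identification: a vertex is a candidate for a
    fragment if one of its lexicalizations matches the fragment exactly,
    or (for named entities only) is a superstring of it."""
    frag = fragment.lower()
    cands = set()
    for vertex, lexicalizations in lexicon.items():
        for lex in lexicalizations:
            lex = lex.lower()
            if lex == frag or (vertex in named_entities and frag in lex):
                cands.add(vertex)
    return cands

# Hypothetical mini-lexicon standing in for BabelNet.
lexicon = {
    "Sports": ["sports"],
    "Water_sports": ["water sports"],
    "Skateboarding": ["skateboarding", "extreme sports"],
    "Court": ["court"],
}
named_entities = {"Water_sports", "Skateboarding"}
# "sports" matches Sports exactly, and Water_sports / Skateboarding
# via the superstrings "water sports" and "extreme sports".
print(sorted(candidates("sports", lexicon, named_entities)))
# ['Skateboarding', 'Sports', 'Water_sports']
```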

Step Three: (Candidate Disambiguation)

Create a directed graph G_I = (V_I, E_I) of the semantic interpretations of the input text.

V_I contains all candidate meanings of all fragments:

V_I := {(v, f) : v ∈ cand(f), f ∈ F}

E_I connects two candidate meanings of different fragments if one is in the semantic signature of the other:

add an edge from (v, f) to (v′, f′) iff f ≠ f′ and v′ ∈ semSign_v

Step Three: (Candidate Disambiguation)

Once G_I (the graph representation of all possible interpretations) has been created, apply the densest-subgraph heuristic.

The result is a subgraph containing the semantic interpretations that are most coherent with each other. But this subgraph might still contain multiple interpretations for the same fragment.

So, the final step is to select the most suitable candidate meaning for each fragment f, given a threshold to discard semantically unrelated candidate meanings.

Step Three: (Candidate Disambiguation)

input: F, the fragments in the input text; semSign, the semantic signatures; µ, the ambiguity level to be reached; cand, from fragments to candidate meanings.
output: selected, the disambiguated fragments.

function DISAMB(F, semSign, µ, cand)
    V_I := ∅; E_I := ∅
    G_I := (V_I, E_I)
    for each fragment f ∈ F do
        for each candidate v ∈ cand(f) do
            V_I := V_I ∪ {(v, f)}
    for each ((v, f), (v′, f′)) ∈ V_I × V_I do
        if f ≠ f′ and v′ ∈ semSign_v then
            E_I := E_I ∪ {((v, f), (v′, f′))}
    G*_I := DENSSUB(F, cand, G_I, µ)        // the function with the novel approach
    selected := new Map<String, Synset>
    for each f ∈ F s.t. ∃(v, f) ∈ V*_I do
        cand*(f) := {v : (v, f) ∈ V*_I}
        v* := argmax_{v ∈ cand*(f)} score((v, f))
        if score((v*, f)) ≥ θ then
            selected(f) := v*
    return selected
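The graph-construction part of DISAMB can be sketched as follows; `cand` and `sem_sign` are assumed toy dictionaries for illustration, not real BabelNet data.

```python
from itertools import product

def build_graph(fragments, cand, sem_sign):
    """Build the semantic interpretation graph G_I.

    Vertices are (meaning, fragment) pairs; there is an edge from
    (v, f) to (v2, f2) iff f != f2 and v2 is in semSign of v."""
    V = {(v, f) for f in fragments for v in cand[f]}
    E = {(a, b) for a, b in product(V, V)
         if a[1] != b[1] and b[0] in sem_sign.get(a[0], set())}
    return V, E

# Hypothetical candidates and signatures for two fragments.
fragments = ["leaf", "tree"]
cand = {"leaf": {"Leaf", "Leaf_Band"}, "tree": {"Tree"}}
sem_sign = {"Leaf": {"Tree"}, "Tree": {"Leaf"}, "Leaf_Band": {"Music"}}
V, E = build_graph(fragments, cand, sem_sign)
print(len(V), len(E))  # 3 2
```

Only (Leaf, leaf) and (Tree, tree) appear in each other's signatures, so only those two edges are created.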

Step Three: (Candidate Disambiguation)

Let's see an example:

“The leaf is falling from the tree on my head”

- Leaf has many candidate meanings.
- Falling also has many candidate meanings.
- Tree also has many candidate meanings.
- And, as you might have guessed, head also has many candidate meanings.

Step Three: “The leaf is falling from the tree on my head”

(Generate a graph representation with all possible meanings.)

cand(f) | semSign_v
( Leaf, leaf ) | Fall, Woods, Tree, Forest, Flora, Fall
( Leaf (Book), leaf ) | Text, Side, Right, Left, Book, Novel
( Nissan Leaf, leaf ) | Car, Motor, Vehicle, Japan, Tree
( Leaf (Japanese Co.), leaf ) | Games, Visual Novel, Publisher
( Leaf (Band), leaf ) | Music, Pop, Dutch, Falling (Song)
( Fall, falling ) | Physics, Descend, Sky, High
( Falling (Song), falling ) | Music, Alicia Keys, Album
( Falling (Accident), falling ) | Pain, Hit, Push, Trauma
( Falling (Movie), falling ) | Action, Hollywood, Cinema
( Tree, tree ) | Nature, Fall, Earth, Oxygen, Leaf
( Tree (Data Structure), tree ) | Leaf, Storage, Father, Son, Binary
( Tree (Graph Theory), tree ) | Node, Euler, Binary, Math, Path
( Tree (Album), tree ) | Music, Disc, Record, Rock
( Mind, head ) | Thoughts, Feelings, Reason
( Head, head ) | Body, Anatomy, Falling (Accident)
( Leader, head ) | Guide, Group, Team, Boss
( Header, head ) | Book, Text, Paragraph, Novel

Step Three: (Candidate Disambiguation)

Following the algorithm, create an edge between two vertices if and only if they do not belong to the same fragment and one is part of the Semantic Signature of the other.


Step Three: Apply the densest-subgraph heuristic to obtain a subgraph containing the semantic interpretations that are most coherent with each other:

DENSSUB(F, cand, G_I, µ)

(We'll come back to it later.)

Step Three: “The leaf is falling from the tree on my head”

Let's assume this is the output of the black box:

cand(f) | semSign_v
( Leaf, leaf ) | Fall, Woods, Tree, Forest, Flora, Fall
( Leaf (Band), leaf ) | Music, Pop, Dutch, Falling (Song)
( Fall, falling ) | Physics, Descend, Sky, High
( Falling (Accident), falling ) | Pain, Hit, Push, Trauma, Tree
( Tree, tree ) | Nature, Root, Earth, Oxygen, Fall
( Tree (Data Structure), tree ) | Leaf, Storage, Father, Son, Binary
( Head, head ) | Body, Anatomy, Falling (Accident)
( Header, head ) | Book, Text, Paragraph, Novel

Step Three: Then we have to select the most suitable candidate meaning for each fragment f.

We use a given threshold θ to discard semantically unrelated candidates.

For each fragment f, we compute the score of each candidate for that fragment and keep those candidates whose score is higher than θ:

score((v, f)) = w(v, f) · deg((v, f)) / Σ_{v′ ∈ cand(f)} w(v′, f) · deg((v′, f))

w(v, f) := |{f′ ∈ F : ∃v′ s.t. ((v, f), (v′, f′)) ∈ E_I or ((v′, f′), (v, f)) ∈ E_I}| / (|F| − 1)

deg((v, f)) is the overall number of incoming and outgoing edges: deg := deg⁺ + deg⁻

Step Three: In other words, we compute the score of each meaning by calculating its normalized weighted degree.

Calculate the weight of the meaning, multiply it by its degree, and divide it by the sum of these products over all the candidates for that fragment.

The weight is calculated as the fraction of fragments the candidate meaning v connects to. In other words, count the number of fragments the vertex v connects to and divide it by the number of fragments minus one.

(Note: fragments, not vertices. If the vertex v connects to v′ and v″ and they both belong to the same fragment, they count as one.)
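The scoring step can be sketched in Python. This is an illustrative reconstruction under the definitions above; the toy graph mirrors the worked example, with hypothetical node names.

```python
def scores(fragments, V, E):
    """score((v, f)) = w * deg, normalized over the candidates of f.

    w(v, f) is the fraction of *other fragments* that (v, f) is linked
    to (denominator |F| - 1); deg counts incoming plus outgoing edges."""
    deg = {node: 0 for node in V}
    linked = {node: set() for node in V}   # fragments each node touches
    for a, b in E:
        deg[a] += 1; deg[b] += 1
        linked[a].add(b[1]); linked[b].add(a[1])
    raw = {n: len(linked[n]) / (len(fragments) - 1) * deg[n] for n in V}
    total = {f: sum(raw[n] for n in V if n[1] == f) for f in fragments}
    return {n: raw[n] / total[n[1]] if total[n[1]] else 0.0 for n in V}

# Toy graph mirroring the worked example: (Leaf, leaf) has degree 4
# and is linked to 2 of the 3 other fragments, so w = 2/3 and
# score = (2/3 * 4) / (2/3 * 4) = 1.000.
fragments = ["leaf", "falling", "tree", "head"]
V = {("Leaf", "leaf"), ("Leaf_Band", "leaf"), ("Fall", "falling"),
     ("Tree", "tree"), ("Head", "head")}
E = {(("Leaf", "leaf"), ("Tree", "tree")),
     (("Tree", "tree"), ("Leaf", "leaf")),
     (("Leaf", "leaf"), ("Fall", "falling")),
     (("Fall", "falling"), ("Leaf", "leaf"))}
print(scores(fragments, V, E)[("Leaf", "leaf")])  # 1.0
```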


Let's compute the weight of (Leaf, leaf): the number of fragments (Leaf, leaf) is linked to, divided by the number of fragments minus one:

w((Leaf, leaf)) = |{falling, tree}| / (4 − 1) = 2/3


And the degree of (Leaf, leaf) is the number of incoming and outgoing edges:

deg((Leaf, leaf)) = 4

Step Three: For our example, the computed weights and degrees are in the following table. (Table not reproduced in this transcript.)

Step Three: Now we can calculate the score for every candidate meaning.

For each candidate, multiply its weight by its degree (w·d). Then, again for each candidate, divide w·d by the sum of all w·d for that fragment.

For example, for (Leaf, leaf):

weight((Leaf, leaf)) = 2/3
degree((Leaf, leaf)) = 4
w·d = 2/3 · 4 = 8/3

The sum of w·d over all candidates for that fragment (leaf) is also 8/3, since the other candidate, (Leaf (Band), leaf), is disconnected and contributes 0. So:

score((Leaf, leaf)) = (8/3) / (8/3) = 1.000

Step Three: For our example, the computed scores are in the following table. (Table not reproduced in this transcript.)

Step Three: Finally, we link each fragment with its highest-ranking candidate meaning v* if its score is higher than the fixed threshold.

For our example, with a threshold of 0.7, we keep:

- Leaf (plant)
- Fall
- Tree
- Head (as body part)

Which is correct.

Densest Sub-Graph

DENSSUB(F, cand, G_I, µ)

Back to the black box!

Densest Sub-Graph: This is an approach to drastically reduce the level of ambiguity of the initial semantic interpretation graph.

It is based on the assumption that the most suitable meanings of each text fragment will belong to the densest area of the graph.

Identifying the densest subgraph of size at least k is NP-hard, so Babelfy uses a heuristic for k-partite graphs inspired by a 2-approximation greedy algorithm for arbitrary graphs.

Babelfy's strategy is based on the iterative removal of low-coherence vertices.

Densest Sub-Graph: First, start with the initial semantic interpretation graph G_I^(0) at step 0.

At each step, identify the most ambiguous fragment fmax (the one with the maximum number of candidate meanings).

Then, discard the weakest interpretation of the current fragment fmax. This is done by determining the lexical and semantic coherence of each candidate meaning using the score formula shown before.

The vertex with the minimum score is removed from the graph.

Densest Sub-Graph

Then, in the next step, repeat the low-coherence removal, and stop when the number of remaining candidates for each fragment is at most µ.

During each iteration, compute the average degree of the current graph, and finally keep the densest intermediate subgraph of the initial semantic interpretation graph: the one that maximizes the average degree.

Densest Sub-Graph

input: F, the set of all fragments in the input text; cand, from fragments to candidate meanings; G_I^(0), the full semantic interpretation graph; µ, the ambiguity level to be reached.
output: G*_I, a dense subgraph.

function DENSSUB(F, cand, G_I^(0), µ)
    t := 0
    G*_I := G_I^(0)
    while true do
        fmax := argmax_{f ∈ F} |{v : ∃(v, f) ∈ V_I^(t)}|
        if |{v : ∃(v, fmax) ∈ V_I^(t)}| ≤ µ then
            break
        vmin := argmin_{v ∈ cand(fmax)} score((v, fmax))
        V_I^(t+1) := V_I^(t) \ {(vmin, fmax)}
        E_I^(t+1) := E_I^(t) ∩ (V_I^(t+1) × V_I^(t+1))
        G_I^(t+1) := (V_I^(t+1), E_I^(t+1))
        if avgdeg(G_I^(t+1)) > avgdeg(G*_I) then
            G*_I := G_I^(t+1)
        t := t + 1
    return G*_I
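The DENSSUB loop can be sketched in Python. For brevity this sketch uses plain vertex degree as the coherence score when picking the weakest candidate, in place of the paper's normalized weighted score; everything else (most-ambiguous-fragment selection, removal, average-degree tracking) follows the pseudocode.

```python
def denssub(fragments, V, E, mu):
    """Iteratively remove the weakest candidate of the most ambiguous
    fragment until every fragment has at most mu candidates; return the
    intermediate graph with the highest average degree.

    Plain degree stands in for the paper's normalized weighted score."""
    V, E = set(V), set(E)

    def avgdeg(Vs, Es):
        # each directed edge contributes one incoming and one outgoing
        return 2 * len(Es) / len(Vs) if Vs else 0.0

    best = (set(V), set(E))
    while True:
        counts = {f: sum(1 for n in V if n[1] == f) for f in fragments}
        fmax = max(counts, key=counts.get)      # most ambiguous fragment
        if counts[fmax] <= mu:
            break
        deg = {n: sum(1 for e in E if n in e) for n in V}
        vmin = min((n for n in V if n[1] == fmax), key=deg.get)
        V.discard(vmin)                         # drop weakest candidate
        E = {(a, b) for a, b in E if a in V and b in V}
        if avgdeg(V, E) > avgdeg(*best):        # track the densest graph
            best = (set(V), set(E))
    return best

# Toy example: three candidates for "leaf", one for "tree"; only
# (Leaf, leaf) and (Tree, tree) are connected.
fragments = ["leaf", "tree"]
V = {("Leaf", "leaf"), ("Leaf_Band", "leaf"),
     ("Nissan_Leaf", "leaf"), ("Tree", "tree")}
E = {(("Leaf", "leaf"), ("Tree", "tree")),
     (("Tree", "tree"), ("Leaf", "leaf"))}
Vb, Eb = denssub(fragments, V, E, 1)
print(sorted(Vb))
```

The two disconnected "leaf" candidates are removed one per iteration, and the final two-vertex graph has the highest average degree.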

Links

Reference paper about Babelfy:

A. Moro, A. Raganato, R. Navigli. Entity Linking meets Word Sense Disambiguation: a Unified Approach. Transactions of the Association for Computational Linguistics (TACL), 2, pp. 231-244, 2014. http://wwwusers.di.uniroma1.it/~navigli/pubs/TACL_2014_Babelfy.pdf

Babelfy website: http://babelfy.org/
BabelNet website: http://babelnet.org/
Grupo LaBDA: http://labda.inf.uc3m.es/