UNIBA: Exploiting a Distributional Semantic Model for Disambiguating and Linking Entities in Tweets

Post on 21-Jul-2015



Pierpaolo Basile, Annalina Caputo, Giovanni Semeraro, Fedelucio Narducci

{fedelucio.narducci, pierpaolo.basile}@uniba.it

#Microposts2015, NEEL Challenge, Florence 18th May 2015

The Challenge

Just watched Frozen for the first time ever and knew the words to all the songs... How?! #productplacement

Problem: Find and link entities in tweets

Entity type: Product

Our Approach

• Entity Recognition
  • using PoS tags
  • relying on n-grams
• Disambiguation
  • knowledge-based method that combines a Distributional Semantic Model (DSM) with a prior probability assigned to each DBpedia concept
• Type
  • manual mapping of all types defined in the dbpedia-owl ontology to the respective types in the task

Entity Recognition: Indexing

Frozen → <dbpedia.org/resource/Frozen_(Madonna_song)>

Frozen → <dbpedia.org/resource/Frozen_(2013_film)>

Apple → <dbpedia.org/resource/Apple_Inc.>

Apple Inc. → <dbpedia.org/resource/Apple_Inc.>

Barack Obama → <http://dbpedia.org/resource/Barack_Obama>

DBpedia titles file and DBpedia NLP resources http://wifo5-04.informatik.uni-mannheim.de/downloads/datasets/

[Diagram: Indexing. Labels: Search Score, Levenshtein Distance, Jaccard Index]
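The surface-form-to-concept pairs on this slide can be pictured as a small lookup structure. The following is a minimal sketch, assuming a plain in-memory dictionary; the names `normalize` and `build_index` and the lowercasing step are hypothetical, and the actual system may well rely on a full-text search engine instead.

```python
from collections import defaultdict

def normalize(surface_form):
    # Assumption: lowercase and collapse whitespace; the slides do not
    # spell out the normalization that is actually applied.
    return " ".join(surface_form.lower().split())

def build_index(pairs):
    """Map each normalized surface form to the set of URIs it may denote."""
    index = defaultdict(set)
    for surface_form, uri in pairs:
        index[normalize(surface_form)].add(uri)
    return index

# The pairs shown on the slide.
pairs = [
    ("Frozen", "dbpedia.org/resource/Frozen_(Madonna_song)"),
    ("Frozen", "dbpedia.org/resource/Frozen_(2013_film)"),
    ("Apple", "dbpedia.org/resource/Apple_Inc."),
    ("Apple Inc.", "dbpedia.org/resource/Apple_Inc."),
    ("Barack Obama", "http://dbpedia.org/resource/Barack_Obama"),
]
index = build_index(pairs)
```

In this sketch an ambiguous surface form such as "Frozen" maps to two concepts, which is exactly the case the disambiguation step has to resolve.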

Entity Recognition…

Tweet → Tokenization and Normalization → PoS-tagger / N-grams generation → Candidate list of surface forms
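As an illustration of the n-gram route to candidate surface forms, here is a minimal sketch. It assumes whitespace tokenization as a stand-in for the real tokenizer, and the `max_n` limit is a hypothetical parameter that the slides do not specify.

```python
def ngrams(tokens, max_n=3):
    """Return all contiguous n-grams of length 1..max_n as strings."""
    out = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            out.append(" ".join(tokens[i:i + n]))
    return out

# Toy input: whitespace splitting stands in for real tweet tokenization.
tokens = "just watched frozen".split()
candidates = ngrams(tokens, max_n=2)
```

Every n-gram produced this way is only a candidate; it becomes an entity mention if the later search and filtering step finds a matching concept.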

…Entity Recognition

Candidate list of surface forms → Search and Filtering (Search Score, Levenshtein Distance, Jaccard Index) → Candidate entities and list of possible concepts
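The two string measures named in the filtering step can be sketched as follows. This is an illustration only: the slides give no thresholds and do not say whether the Jaccard index is computed over words or characters, so the word-level choice here is an assumption.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def jaccard(a, b):
    """Jaccard index over lowercased word sets (assumption: word-level)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)
```

A filter could then keep a candidate only when its distance to an indexed title is small and its overlap is high; the concrete cut-offs would have to be tuned, since the slides do not report them.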

Disambiguation

3-step approach:

• Building the glosses
• Building the context
• Semantic Ranking

Disambiguation: Building the glosses

"Frozen" is a song by American singer-songwriter Madonna…

Frozen is a 2013 American 3D computer-animated musical…

DBpedia extended abstracts

Disambiguation: Building the context

Just watched Frozen for the first time ever and knew the words to all the songs... How?! #productplacement

Context: <just, watched, first, time, knew, words, all, songs, how, product, placement>

Disambiguation: Semantic Ranking 1/3

• Words as points in a mathematical space
• Close words are similar
• Word space is built by analyzing word co-occurrences in a large corpus
• Vector composition using superposition (+)
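Superposition here means component-wise vector addition. A minimal sketch, using a toy three-dimensional word space with made-up values (the `word_space` contents and the name `superpose` are purely illustrative):

```python
def superpose(vectors):
    """Compose vectors by component-wise sum (superposition)."""
    dims = len(vectors[0])
    total = [0.0] * dims
    for v in vectors:
        for k in range(dims):
            total[k] += v[k]
    return total

# Toy word space; real vectors would come from the distributional model.
word_space = {
    "watched": [1.0, 0.0, 2.0],
    "songs": [0.5, 1.0, 0.0],
}
context_vector = superpose([word_space[w] for w in ["watched", "songs"]])
```

The same composition can be applied to the words of a gloss, so that glosses and contexts end up as single vectors in the same space.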

Disambiguation: Semantic Ranking 2/3

word2vec: https://code.google.com/p/word2vec/

Distributional Semantic Model built on Wikipedia

Context

• Cosine similarity between the gloss and the context

• Linear combination with a function which takes into account the usage of concepts in Wikipedia
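The ranking described in these two bullets can be sketched as a cosine similarity combined linearly with a usage-based prior. The weight `alpha` and the function name `rank_score` are hypothetical; the slides do not state how the two terms are weighted.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention for zero vectors in this sketch
    return dot / (norm_u * norm_v)

def rank_score(gloss_vec, context_vec, prior, alpha=0.5):
    # Assumption: a convex combination; alpha is an illustrative weight.
    return alpha * cosine(gloss_vec, context_vec) + (1 - alpha) * prior
```

Under this sketch, the candidate concept with the highest `rank_score` would be chosen for the entity.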

Disambiguation: Semantic Ranking 3/3

Statistics about the usage of concepts in Wikipedia

Concept probability given the entity:

$$p(c_{ij} \mid e_i) = \frac{t(e_i, c_{ij}) + 1}{\#e_i + |C_i|}$$

• $t(e_i, c_{ij})$: number of times $e_i$ is linked as $c_{ij}$
• $|C_i|$: number of concepts assigned to $e_i$
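The smoothed probability on this slide can be worked through with a small example. The counts below are made up for illustration, and the reading of the #e_i term as the total number of times the entity is linked is an assumption, since the slide does not define it.

```python
def concept_probability(link_count, entity_count, num_concepts):
    """Add-one smoothed p(c_ij | e_i).

    link_count   -- t(e_i, c_ij): times e_i is linked as c_ij
    entity_count -- the #e_i term (assumed: total links for e_i)
    num_concepts -- |C_i|: number of concepts assigned to e_i
    """
    return (link_count + 1) / (entity_count + num_concepts)

# Toy counts (hypothetical): an entity linked 7 times to one concept
# and once to another.
toy_links = {"film": 7, "song": 1}
total = sum(toy_links.values())
probs = {c: concept_probability(t, total, len(toy_links))
         for c, t in toy_links.items()}
```

With these toy counts the two probabilities are 8/10 and 2/10; under the stated assumption about #e_i, the smoothed values still sum to 1, and a concept that was never linked would keep a small non-zero probability.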

Evaluation

• Development set
  • 500 manually annotated tweets
• Metrics
  • SLM: Strong Link Match
  • STMM: Strong Typed Mention Match
  • MC: Mention CEAF
• System setup
  • TweetNLP for tokenization and PoS-tagging
  • word2vec for DSM building: 400 vector dimensions, analyzing only terms that occur at least 25 times
  • Developed in Java

Results

• Low performance in entity recognition

• Good results in disambiguation: F=0.825 considering correct recognition and no-NIL instances

Entity Recognition    F-SLM    F-STMM    F-MC
PoS-tag               0.362    0.267     0.389
N-grams               0.258    0.191     0.306