UNIBA: Exploiting a Distributional Semantic Model for Disambiguating and Linking Entities in Tweets
Pierpaolo Basile, Annalina Caputo, Giovanni Semeraro, Fedelucio Narducci
{fedelucio.narducci, pierpaolo.basile}@uniba.it
#Microposts2015, NEEL Challenge, Florence 18th May 2015
The Challenge
Just watched Frozen for the first time ever and knew the words to all the songs... How?! #productplacement
Problem: Find and link entities in tweets
Entity type: Product
Our Approach
• Entity Recognition
  • using PoS-tag
  • relying on n-grams
• Disambiguation
  • knowledge-based method that combines a Distributional Semantic Model (DSM) with the prior probability assigned to each DBpedia concept
• Type
  • manual map from all types defined in the dbpedia-owl ontology to the respective types in the task
Entity Recognition: Indexing
Surface form → DBpedia concept
• Frozen → <dbpedia.org/resource/Frozen_(Madonna_song)>
• Frozen → <dbpedia.org/resource/Frozen_(2013_film)>
• Apple → <dbpedia.org/resource/Apple_Inc.>
• Apple Inc. → <dbpedia.org/resource/Apple_Inc.>
• Barack Obama → <http://dbpedia.org/resource/Barack_Obama>
Sources: DBpedia titles file and DBpedia NLP resources (http://wifo5-04.informatik.uni-mannheim.de/downloads/datasets/)
Search Score Levenshtein Distance Jaccard Index
Indexing
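The indexing idea can be sketched as a map from each surface form to every DBpedia concept it may denote. This is a minimal in-memory illustration in Python, not the system's actual index: the `normalize` step and the dictionary layout are assumptions, and only the pairs shown on the slide are used as data.

```python
from collections import defaultdict

# (surface form, DBpedia URI) pairs copied from the slide's examples.
PAIRS = [
    ("Frozen", "dbpedia.org/resource/Frozen_(Madonna_song)"),
    ("Frozen", "dbpedia.org/resource/Frozen_(2013_film)"),
    ("Apple", "dbpedia.org/resource/Apple_Inc."),
    ("Apple Inc.", "dbpedia.org/resource/Apple_Inc."),
    ("Barack Obama", "http://dbpedia.org/resource/Barack_Obama"),
]


def normalize(surface_form):
    # Assumed normalization: lowercase and collapse whitespace.
    return " ".join(surface_form.lower().split())


def build_index(pairs):
    """Map each normalized surface form to the list of concepts it can denote."""
    index = defaultdict(list)
    for surface, uri in pairs:
        key = normalize(surface)
        if uri not in index[key]:
            index[key].append(uri)
    return dict(index)


index = build_index(PAIRS)
for uri in index["frozen"]:
    print(uri)
```

An ambiguous surface form such as "Frozen" maps to several concepts, which is what makes the later disambiguation step necessary.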
Entity Recognition…
Tweet → Tokenization and Normalization → PoS-tagger or N-grams generation → Candidate list of surface forms
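As an illustration of the n-gram route, the sketch below enumerates every contiguous n-gram of a tokenized tweet as a candidate surface form. The function name and the `max_n` limit are assumptions made for the example; the slides do not state the maximum n-gram length used.

```python
def ngrams(tokens, max_n=3):
    """Return all contiguous n-grams (joined by spaces) for n = 1..max_n."""
    result = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            result.append(" ".join(tokens[i:i + n]))
    return result


# Tokens are assumed to be already tokenized and normalized.
tokens = ["just", "watched", "frozen"]
candidates = ngrams(tokens, max_n=2)
print(candidates)
```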
…Entity Recognition
Candidate list of surface forms → Search and Filtering (Search Score, Levenshtein Distance, Jaccard Index) → Candidate entities and list of possible concepts
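The two string measures named in the filtering step can be sketched as follows. This is a plain Python illustration rather than the system's code; in particular, computing the Jaccard index over whitespace-separated tokens is an assumption, since the slides do not say whether tokens or characters are compared.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def jaccard(a, b):
    """Jaccard index over lowercased token sets (assumed granularity)."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


print(levenshtein("apple", "apple inc."), jaccard("apple", "apple inc."))
```

A candidate surface form would then be kept only if it is close enough to an indexed title under these measures; the thresholds are not given in the slides.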
Disambiguation
3-step approach:
• Building the glosses
• Building the context
• Semantic Ranking
Disambiguation: Building the glosses
"Frozen" is a song by American singer-songwriter Madonna…
Frozen is a 2013 American 3D computer-animated musical…
DBpedia extended abstracts
Disambiguation: Building the context
Just watched Frozen for the first time ever and knew the words to all the songs... How?! #productplacement
Context: <just, watched, first, time, knew, words, all, songs, how, product, placement>
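One way to arrive at such a context is sketched below: lowercase the tweet, drop the target mention and stop words, and split hashtags into words. The stop-word list and the hashtag lexicon are hypothetical and were chosen only so that the slide's example is reproduced; the actual resources used by the system are not given in the slides.

```python
import re

STOPWORDS = {"for", "the", "ever", "and", "to"}  # hypothetical stop-word list
LEXICON = {"product", "placement"}               # hypothetical hashtag lexicon


def split_hashtag(tag, lexicon):
    """Greedy longest-match segmentation; returns [tag] if segmentation fails."""
    words, i = [], 0
    while i < len(tag):
        for j in range(len(tag), i, -1):
            if tag[i:j] in lexicon:
                words.append(tag[i:j])
                i = j
                break
        else:
            return [tag]
    return words


def build_context(tweet, mention):
    """Return the context words of a tweet for a single-token mention."""
    context = []
    for token in re.findall(r"#?\w+", tweet.lower()):
        if token.startswith("#"):
            parts = split_hashtag(token[1:], LEXICON)
        else:
            parts = [token]
        context.extend(p for p in parts
                       if p not in STOPWORDS and p != mention.lower())
    return context


tweet = ("Just watched Frozen for the first time ever and knew the words "
         "to all the songs... How?! #productplacement")
context = build_context(tweet, "Frozen")
print(context)
```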
Disambiguation: Semantic Ranking 1/3
• Words as points in a mathematical space
• Close words are similar
• Word space is built by analyzing word co-occurrences in a large corpus
• Vector composition using superposition (+)
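Superposition amounts to summing the word vectors component by component. The sketch below assumes a dictionary from words to equal-length vectors and simply skips words without a vector; the toy two-dimensional vectors are invented for illustration and are not taken from any trained model.

```python
def superpose(words, vectors):
    """Sum the vectors of the given words; words without a vector are skipped."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    for word in words:
        vec = vectors.get(word)
        if vec is None:
            continue
        total = [t + v for t, v in zip(total, vec)]
    return total


# Toy vectors, hypothetical values.
toy_vectors = {"songs": [1.0, 0.0], "words": [0.5, 2.0]}
context_vector = superpose(["songs", "words", "unknown"], toy_vectors)
print(context_vector)
```

The same operation can be applied to the words of a gloss and to the words of the tweet context, giving one vector for each.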
Disambiguation: Semantic Ranking 2/3
word2vec: https://code.google.com/p/word2vec/
Distributional Semantic Model built on Wikipedia
• Cosine similarity between the gloss and the context
• Linear combination with a function which takes into account the usage of concepts in Wikipedia
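A minimal sketch of this ranking score, assuming the gloss and the context are already represented as vectors and the usage-based function returns a value in [0, 1]. The weight `alpha` and the function names are hypothetical: the slides say only that the two terms are combined linearly, not how they are weighted.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def score(gloss_vec, context_vec, prior, alpha=0.5):
    """Linear combination of semantic similarity and concept prior.

    alpha is a hypothetical weight, not a value reported in the slides.
    """
    return alpha * cosine(gloss_vec, context_vec) + (1 - alpha) * prior


print(score([1.0, 0.0], [1.0, 0.0], prior=0.2))
```

The candidate concept with the highest score would be selected for the mention.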
Disambiguation: Semantic Ranking 3/3
Statistics about the usage of concepts in Wikipedia
Concept probability given the entity:
$$p(c_{ij} \mid e_i) = \frac{t(e_i, c_{ij}) + 1}{\#e_i + |C_i|}$$
• $t(e_i, c_{ij})$: number of times $e_i$ is linked as $c_{ij}$
• $|C_i|$: number of concepts assigned to $e_i$
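The formula is an add-one smoothed relative frequency, so a concept that was never linked still receives a small non-zero probability. The sketch below assumes that #e_i is the total number of times e_i occurs as a link, computed here as the sum of the per-concept counts; the slides do not define #e_i, so this is an assumption, and the counts in the example are invented.

```python
def concept_probability(link_counts, concept, entity_count=None):
    """p(c_ij | e_i) = (t(e_i, c_ij) + 1) / (#e_i + |C_i|).

    link_counts maps each concept c_ij of one entity e_i to t(e_i, c_ij).
    entity_count is #e_i; if omitted, it is ASSUMED to be the sum of the counts.
    """
    if entity_count is None:
        entity_count = sum(link_counts.values())
    return (link_counts.get(concept, 0) + 1) / (entity_count + len(link_counts))


# Hypothetical link counts for the surface form "Frozen".
counts = {"Frozen_(2013_film)": 8, "Frozen_(Madonna_song)": 2}
p_film = concept_probability(counts, "Frozen_(2013_film)")
p_song = concept_probability(counts, "Frozen_(Madonna_song)")
print(p_film, p_song)
```

Under this assumption the probabilities of all concepts of an entity sum to 1.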
Evaluation
• Development set
  • 500 manually annotated tweets
• Metrics
  • SLM: Strong Link Match
  • STMM: Strong Typed Mention Match
  • MC: Mention CEAF
• System setup
  • TweetNLP for tokenization and PoS-tagging
  • word2vec for DSM building: 400 vector dimensions, analyzing only terms that occur at least 25 times
  • Developed in Java
Results
• Low performance in entity recognition
• Good results in disambiguation: F=0.825 considering correct recognition and no-NIL instances
Entity Recognition   F-SLM   F-STMM   F-MC
PoS-tag              0.362   0.267    0.389
N-grams              0.258   0.191    0.306