EmojiNet: An Open Service and API for Emoji Sense Discovery


Transcript of EmojiNet: An Open Service and API for Emoji Sense Discovery

Page 1: EmojiNet: An Open Service and API for Emoji Sense Discovery

EmojiNet: An Open Service and API for Emoji Sense Discovery

Presented By - Sanjaya Wijeratne

Sanjaya Wijeratne, Lakshika Balasuriya, Amit Sheth, Derek Doran, EmojiNet: An Open Service and API for Emoji Sense Discovery, In 11th International AAAI Conference on Web and Social Media (ICWSM 2017). Montreal, Canada; 2017.

Sanjaya Wijeratne, Lakshika Balasuriya, Amit Sheth, Derek Doran

Page 2: EmojiNet: An Open Service and API for Emoji Sense Discovery

Problems with current State-of-the-art

● The current version of EmojiNet has the following limitations:

○ Supports only 35% of all emoji supported by the Unicode Consortium (845 out of 2,389)

○ Emoji sense definitions are very short (10 to 15 words)

○ No support for platform-specific emoji meanings

○ Not available for download as a dataset

○ No REST API access


Page 3: EmojiNet: An Open Service and API for Emoji Sense Discovery

What is new in EmojiNet

● Covers all 2,389 emoji supported by the Unicode Consortium

○ 2,389 emoji (about a 3-fold increase)

○ 12,904 sense definitions (a 4-fold increase)

● Sense embeddings learned over text corpora

○ Twitter and Google News corpora are used to learn word embeddings that further strengthen the sense definitions

● Platform-specific meanings for 40 commonly misunderstood emoji, obtained through an Amazon Mechanical Turk task

● Public release of the EmojiNet dataset with REST API access

Page 4: EmojiNet: An Open Service and API for Emoji Sense Discovery

Building EmojiNet


Page 5: EmojiNet: An Open Service and API for Emoji Sense Discovery

Sense Extraction from Web Resources


Page 6: EmojiNet: An Open Service and API for Emoji Sense Discovery

Sense Filtering

● Our sense pool contained 50,115 senses in total

○ 21,779 of them were not valid English

○ We evaluated the remaining 28,336 sense labels

■ 15,432 sense labels were removed because they were incorrect (noisy data extracted from Emoji Dictionary)

○ The remaining 12,904 sense labels were considered for sense disambiguation
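
The filtering counts on this slide can be checked with a few lines of arithmetic. The sketch below only restates the numbers given above; the variable names are illustrative and not taken from the EmojiNet code.

```python
# Counts as reported on the slide; variable names are illustrative only.
total_senses = 50_115        # all senses in the initial sense pool
not_valid_english = 21_779   # dropped at the English validity step
removed_as_noise = 15_432    # dropped after evaluation (noisy Emoji Dictionary data)

evaluated = total_senses - not_valid_english   # labels that went on to evaluation
kept = evaluated - removed_as_noise            # labels kept for sense disambiguation

print(evaluated, kept)
print(f"share of the original pool that was kept: {kept / total_senses:.1%}")
```

Both differences match the figures reported on the slide (28,336 labels evaluated, 12,904 kept).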


Page 7: EmojiNet: An Open Service and API for Emoji Sense Discovery

Linking Sense Labels with BabelNet Senses


Page 8: EmojiNet: An Open Service and API for Emoji Sense Discovery

Emoji Sense Distribution


Page 9: EmojiNet: An Open Service and API for Emoji Sense Discovery

EmojiNet Resource Evaluation

● Resource linking based on image similarity performed with 96.27% accuracy


Page 10: EmojiNet: An Open Service and API for Emoji Sense Discovery

Adding Word Embeddings to EmojiNet

● We trained a Twitter word embedding model using 110 million tweets with emoji. We also used a publicly available Google News word embedding model to learn word vectors

● For every emoji, each word in each emoji sense was replaced by the 20 most related words learned by the word embedding models. This led to 3 contexts for each emoji sense:

○ BabelNet-based context words

○ Twitter-based context words

○ Google News-based context words
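
One way to picture the context expansion step is a nearest-neighbour lookup in the embedding space. The sketch below is a minimal illustration under stated assumptions: it uses a tiny hand-made vector table (`TOY_VECTORS`) instead of the trained Twitter or Google News models, ranks neighbours by cosine similarity, and the function names `most_related` and `expand_context` are hypothetical, not part of EmojiNet.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def most_related(word, vectors, k=20):
    """Return up to k words closest to `word` (hypothetical helper)."""
    if word not in vectors:
        return []
    target = vectors[word]
    scored = [(cosine(target, vec), other)
              for other, vec in vectors.items() if other != word]
    # Highest similarity first; ties are broken alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:k]]

def expand_context(sense_words, vectors, k=20):
    """Replace each sense word by its k most related words, without duplicates."""
    context = []
    for word in sense_words:
        for related in most_related(word, vectors, k):
            if related not in context:
                context.append(related)
    return context

# Toy 3-dimensional vectors, made up for illustration only.
TOY_VECTORS = {
    "laugh": [0.9, 0.1, 0.0],
    "funny": [0.8, 0.2, 0.0],
    "joy":   [0.7, 0.3, 0.1],
    "cry":   [0.1, 0.9, 0.0],
    "sad":   [0.0, 1.0, 0.1],
}

print(most_related("laugh", TOY_VECTORS, k=2))
print(expand_context(["laugh", "cry"], TOY_VECTORS, k=1))
```

With real models, the same lookup would be run once per embedding model, which is how a sense ends up with separate Twitter-based and Google News-based context word lists.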

Page 11: EmojiNet: An Open Service and API for Emoji Sense Discovery

Adding Platform-specific senses to EmojiNet

● We conducted an experiment on Amazon Mechanical Turk to understand which emoji senses are platform-specific for a given emoji

○ We selected 40 commonly misunderstood emoji for this experiment

○ We created 14,448 tasks, where each task asked workers to evaluate whether a particular platform-specific sense is valid

○ 1,128 tasks were filtered out as spam


Page 12: EmojiNet: An Open Service and API for Emoji Sense Discovery

Emoji Sense Disambiguation

● We selected the 25 most misunderstood emoji, based on past work, for an emoji sense disambiguation task

○ Randomly selected 50 tweets for each emoji

○ Used the Simplified Lesk algorithm for disambiguation
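
Simplified Lesk picks the sense whose context words overlap most with the words surrounding the target, here the words of the tweet. The sketch below is a minimal version under stated assumptions: whitespace tokenisation, no stop-word removal, ties resolved in favour of the first sense listed, and a made-up sense inventory. The function name and the sense labels are illustrative, not the EmojiNet API.

```python
def simplified_lesk(tweet_tokens, sense_contexts):
    """Return (best_sense, overlap) for the sense sharing most words with the tweet.

    sense_contexts maps a sense label to its list of context words.
    """
    tweet_words = {token.lower() for token in tweet_tokens}
    best_sense, best_overlap = None, -1
    for sense, context_words in sense_contexts.items():
        overlap = len(tweet_words & {word.lower() for word in context_words})
        if overlap > best_overlap:  # strict '>' keeps the first sense on ties
            best_sense, best_overlap = sense, overlap
    return best_sense, best_overlap

# Hypothetical sense inventory for a single emoji, for illustration only.
SENSES = {
    "laugh(noun)": ["funny", "joke", "hilarious", "laughing"],
    "cry(verb)": ["sad", "tears", "weep", "upset"],
}

tweet = "that joke was so funny i am laughing".split()
print(simplified_lesk(tweet, SENSES))
```

Swapping the context word lists (BabelNet-based, Twitter-based, or Google News-based) changes only the `sense_contexts` argument, which makes the three contexts easy to compare on the same tweets.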


Page 13: EmojiNet: An Open Service and API for Emoji Sense Discovery

Emoji Similarity

● We used the 100 emoji available in the EmoTwi50 dataset to create a graph based on emoji similarity

○ Emoji are represented as nodes

○ If two emoji share the same sense label, they are connected by an edge

● We used a label propagation algorithm to find clusters in our emoji graph
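
The graph construction and clustering described on this slide can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: emoji are represented by made-up short names, the sense sets are invented, and the label propagation variant is a deterministic asynchronous one (fixed visiting order, a node keeps its label when it is already among the most frequent neighbour labels). The slides do not say which variant was used.

```python
from collections import Counter
from itertools import combinations

def build_graph(emoji_senses):
    """Connect two emoji with an edge when they share at least one sense label."""
    graph = {emoji: set() for emoji in emoji_senses}
    for a, b in combinations(emoji_senses, 2):
        if emoji_senses[a] & emoji_senses[b]:
            graph[a].add(b)
            graph[b].add(a)
    return graph

def label_propagation(graph, max_rounds=100):
    """Cluster nodes by repeatedly adopting the most common neighbour label."""
    labels = {node: node for node in graph}  # every node starts in its own cluster
    for _ in range(max_rounds):
        changed = False
        for node in sorted(graph):           # fixed order keeps the run repeatable
            if not graph[node]:
                continue                     # isolated nodes keep their own label
            counts = Counter(labels[n] for n in graph[node])
            top = max(counts.values())
            candidates = sorted(l for l, c in counts.items() if c == top)
            if labels[node] not in candidates:
                labels[node] = candidates[0]
                changed = True
        if not changed:
            break
    clusters = {}
    for node, label in labels.items():
        clusters.setdefault(label, set()).add(node)
    return sorted(sorted(members) for members in clusters.values())

# Invented emoji names and sense labels, for illustration only.
TOY_SENSES = {
    "grin": {"happy", "smile"},
    "joy": {"happy", "laugh"},
    "smile": {"happy", "smile"},
    "cry": {"sad", "tears"},
    "sob": {"sad", "tears"},
}

toy_graph = build_graph(TOY_SENSES)
print(label_propagation(toy_graph))
```

On the toy data the three emoji sharing "happy" form one cluster and the two sharing "sad" and "tears" form another. Standard label propagation visits nodes in random order and breaks ties randomly, so results on a real graph can vary between runs.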


Page 14: EmojiNet: An Open Service and API for Emoji Sense Discovery

Emoji Similarity Graph


Page 15: EmojiNet: An Open Service and API for Emoji Sense Discovery

Calculate Emoji Similarity using Jaccard Coefficient

● In another experiment, we used Jaccard Similarity on emoji senses to find emoji similarity
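
The Jaccard coefficient of two sets is the size of their intersection divided by the size of their union. A minimal sketch, assuming each emoji is represented by its set of sense labels (the labels below are invented for illustration):

```python
def jaccard(senses_a, senses_b):
    """Jaccard coefficient of two collections of sense labels, in [0, 1]."""
    a, b = set(senses_a), set(senses_b)
    union = a | b
    # Two empty sets are treated as having similarity 0.0 by convention here.
    return len(a & b) / len(union) if union else 0.0

# One shared label ("happy") out of three distinct labels overall.
score = jaccard({"happy", "smile"}, {"happy", "laugh"})
print(round(score, 3))
```

Unlike the shared-label edge used for the graph, this gives a graded score, so pairs of emoji can be ranked by how many senses they have in common.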


Page 16: EmojiNet: An Open Service and API for Emoji Sense Discovery

Questions?

Thank You!


Read more about EmojiNet at http://wiki.knoesis.org/index.php/EmojiNet