Dmitry Ustalov — TagBag: Annotating a Foreign Language Lexical Resource with Pictures

Annotating a Foreign Language Lexical Resource with Pictures Dmitry Ustalov IMM UB RAS / UrFU Yekaterinburg, Russia


Page 1: Dmitry Ustalov — TagBag: Annotating a Foreign Language Lexical Resource with Pictures

Annotating a Foreign Language Lexical Resource with Pictures

Dmitry Ustalov
IMM UB RAS / UrFU
Yekaterinburg, Russia


Outline

• Introduction
• Related Work
• Approach
• Evaluation
• Results
• Discussion
• Conclusion


Introduction

• Mapping images to word senses matters for several tasks:
  • multimedia search,
  • text illustration,
  • quality assessment.
• It is also interesting to assess the Yet Another RussNet lexical resource (Braslavski et al., 2014).


Related Work

• PicNet, a proprietary resource (Mihalcea & Leong, 2008).
• ImageNet annotates WordNet with pictures & bounding boxes (Deng et al., 2009).
  • Intersection with WordNet.ru is negligible.
• ImageCLEF creates software and datasets for image indexing (Müller et al., 2010).


Related Work: Flickr

• Single-query image retrieval (Reiter et al., 2007).
• Semantic Web-based approach (Trojahn et al., 2008).
• Wikipedia-based approach (Stampouli et al., 2010).
• Flickr tags with visual saliency of images (Jiang et al., 2014).


Problem

Given an annotated image I, a bilingual dictionary B, and a lexical resource S, produce a mapping I → s of the image to a synset s in S.

“cat”, “tomcat”, “kitten” → «кот, кошка, котёночек» (a Russian synset: tomcat, cat, little kitten)


TagBag: Assumptions

• Most image tags are nouns.
• Tags may be polysemous, and redundant tags may be present.
  • Is “crane” «журавль» (the bird) or «кран» (the machine)?
• The image has a “main” object.


TagBag

• Tag. Initialize an empty vector.
  • Iterate over image tags and retrieve all the translations for each tag.
  • Add each occurrence to a dimension.
• Bag. Prune that vector.
  • Remove the low frequency dimensions with the cut-off value.
  • Return the resulting vector.
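The two steps above can be sketched in Python. This is only an illustration under stated assumptions, not the author's implementation: the function name `tagbag`, the toy dictionary, and the rule that a dimension survives when its count is at least `cutoff` are all hypothetical choices made for the example.

```python
from collections import Counter

def tagbag(tags, dictionary, cutoff=2):
    """Build a pruned bag of translations for one image (illustrative sketch)."""
    vector = Counter()  # Tag step: start from an empty vector
    for tag in tags:
        # Retrieve every translation of the tag; unknown tags contribute nothing.
        for translation in dictionary.get(tag.lower(), []):
            vector[translation] += 1  # each occurrence goes to its own dimension
    # Bag step: drop dimensions below the cut-off
    # (assumption: "low frequency" means count < cutoff).
    return {word: count for word, count in vector.items() if count >= cutoff}

# Hypothetical toy dictionary standing in for a real bilingual dictionary.
toy_dictionary = {
    "cat": ["кот", "кошка"],
    "tomcat": ["кот"],
    "kitten": ["котёнок", "кошка"],
    "crane": ["журавль", "кран"],
}
toy_vector = tagbag(["cat", "tomcat", "kitten", "crane"], toy_dictionary)
```

With these toy inputs, the translations that several tags agree on survive the pruning, while the one-off translations of the redundant tag “crane” are dropped.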


TagBag: Pseudocode


Evaluation

• The present approach is pretty simple. Let’s evaluate it empirically.
• Take the top 1500 English nouns and search for Flickr photos: http://www.talkenglish.com/Vocabulary/Top-1500-Nouns.aspx
• Get V. K. Mueller’s dictionary: http://ustalov.imm.uran.ru/pub/mueller.tar.gz


Experimental Setup

• Yet Another RussNet (CC BY-SA): http://russianword.net/
• Similarity measures:
  • cosine similarity,
  • Jaccard index.
• Ask three annotators to submit judgements.
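The two similarity measures named in the setup can be sketched as follows. The slides do not say how a tag vector is compared with a synset, so this example assumes, purely for illustration, that each synset is treated as a binary bag of its words and that the highest-scoring synset is chosen; the helper `best_synset` and the sample data are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(key, 0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def jaccard(a, b):
    """Jaccard index over the sets of non-empty dimensions of two vectors."""
    keys_a, keys_b = set(a), set(b)
    union = keys_a | keys_b
    return len(keys_a & keys_b) / len(union) if union else 0.0

def best_synset(vector, synsets, measure):
    """Pick the synset most similar to the vector (assumed selection rule)."""
    return max(synsets, key=lambda s: measure(vector, {word: 1 for word in s}))

# Hypothetical tag vector and candidate synsets.
sample_vector = {"кот": 2, "кошка": 2}
sample_synsets = [("кот", "кошка", "котёночек"), ("журавль",), ("кран",)]
chosen_by_cosine = best_synset(sample_vector, sample_synsets, cosine)
chosen_by_jaccard = best_synset(sample_vector, sample_synsets, jaccard)
```

Cosine similarity takes the tag counts into account, whereas the Jaccard index only looks at which words are present, so the two measures can in principle pick different synsets for the same image.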


призрак, тень, намёк (ghost, shadow, hint)

https://www.flickr.com/photos/127324269@N03/16217604730

труд, работа, занятие (labour, work, occupation)

https://www.flickr.com/photos/79304587@N07/16192772090

мужчина, парень, юноша (man, guy, youth)

https://www.flickr.com/photos/94029069@N03/15797009873

футбол (football)

https://www.flickr.com/photos/113780395@N05/15789001293

пища, провизия, питание, корм (food, provisions, nutrition, feed)

https://www.flickr.com/photos/80972943@N00/16396295195

Results

• The accuracy is moderately high and the agreement level is good.
• Both measures demonstrate the same performance.


http://ustalov.imm.uran.ru/pub/tagbag-aist.tar.gz


Discussion

• Some mappings are the same under both similarity measures, and 13 of these 43 mappings are wrong.
• Three sources of errors:
  • sloppy image tags (7 of 13),
  • actual mapping errors (3 of 13),
  • batch uploads (3 of 13).


Conclusion

• TagBag is an unsupervised approach for mapping images to synsets.
• The performance depends on both image tags and ontology bias.
• Visual saliency and spam filtering may increase the quality.


Further Work


Thank you!

Dmitry Ustalov, a post-graduate student at IMM UB RAS, Yekaterinburg, Russia.

https://ustalov.name/
[email protected]

The present work is supported by the Russian Foundation for the Humanities, project no. 13-04-12020.
