Improving Flickr discovery through Wikipedias

Position paper presented at the "Between Ontologies and Folksonomies" (BOF) workshop at CCT2007.

Index Introduction Folksonomies Linguistic issues Introducing Flickrpedia Concluding Remarks

Federico Gobbo
federico.gobbo@uninsubria.it

Università degli Studi dell’Insubria, Varese, Italy

(cc) Some rights reserved.

1 Introduction: Why folksonomies are interesting

2 Folksonomies: Why do folksonomies differ?

3 Linguistic issues: Augmented folksonomies through natural language

4 Introducing Flickrpedia: Multilingual diversity as the source of knowledge

5 Concluding Remarks

Why folksonomies are interesting

A key question of information retrieval today

How can we add meaningful metadata to web content, so as to increase the utility of information by improving the precision of information retrieval in search engines?

Folksonomies, a tentative answer. What are they?

folksonomy = folks + taxonomy

A folksonomy is made of tags or labels, usually single-word metadata attached to online items (documents, photos, videos, etc.) in order to add contextual meaning to the items themselves.

Folksonomies are a tentative effort toward the goal of improving the precision of information retrieval.

Why do folksonomies differ?

Folksonomies and traditional taxonomies

Unlike traditional taxonomies, there is no explicit hierarchy between tags, nor are tags exclusive. For example, the photo of a cat may be tagged as ‘cat’, ‘european’ and ‘animal’, but nothing says that all cats are animals: tags can be seen as common facets of the item itself (Schmitz 2006). There is no central authority, and this is the main reason why folksonomies are becoming more and more popular among web resource users.

The two different scopes of folksonomies

Each tag has two different scopes at the same time:

personomy, the user-defined one (Quintarelli 2005);

consensus, the socially shared meaning.

Consensus is becoming more and more important, as the widespread use of tag-suggestion interfaces in web applications shows.

Folksonomies and the Long Tail (see the video!)

The key concept of serendipity

Consensus permits serendipity, i.e. users dig through the web via tags, finding new, unexpected and useful content that is not easily accessible via traditional search engines.

Tags are used as filters, i.e. a query on several tags returns the items tagged with any of the given tags, or with all of them, depending on the application (Golder and Huberman 2006).
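
The two filtering modes can be illustrated with a minimal Ruby sketch. The item names, their tags and the helper `filter_by_tags` are hypothetical, made up for illustration only; no real application is claimed to work exactly this way.

```ruby
# Hypothetical in-memory folksonomy: item name => list of tags.
items = {
  'photo1' => %w[cat european animal],
  'photo2' => %w[cat kitten],
  'photo3' => %w[dog animal]
}

# Return the names of the items matching the query tags.
# mode :any keeps items sharing at least one tag with the query;
# mode :all keeps only items carrying every query tag.
def filter_by_tags(items, query_tags, mode: :any)
  matching = items.select do |_name, tags|
    if mode == :all
      (query_tags - tags).empty?
    else
      !(query_tags & tags).empty?
    end
  end
  matching.keys
end

puts filter_by_tags(items, %w[cat animal], mode: :any).inspect
puts filter_by_tags(items, %w[cat animal], mode: :all).inspect
```

With these sample items, the `:any` query matches all three photos, while the `:all` query matches only the photo tagged with both ‘cat’ and ‘animal’.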

The purpose of this paper is to improve serendipity by allowing people to dig through folksonomies regardless of the natural language(s) they master.

Augmented folksonomies through natural language

Tags as linguistic objects

Tags are words, i.e. alphabetical strings that are meaningful in some natural language. There is no controlled language. In particular, the following features go unrecognized:

synonymy (different word strings, analogous meaning);

homography (identical word string, totally different meaning);

different encoding strategies are possible (e.g. ‘28-03-2008’, ‘2008March3’, ‘3rd March 2008’);

misspellings are very frequent, so standard NLP techniques are ruled out.

Guy and Tonkin (2006) even advocated tag literacy education.

The linguistic divide in folksonomies

Multilingualism is an issue not yet fully explored in folksonomies. In fact, tags are written in a human language, and users are inclined to write in the languages they are comfortable in.

It is certainly desirable for a user who is not comfortable in English or another big language (in terms of presence on the web) to search and find tags using a search engine interface in his or her own language, while the engine searches the corresponding tags in English and in other major human languages.

Multilingual diversity as the source of knowledge

How to overcome the linguistic divide?

A proposal: use a special web application which extracts the language-tag pairs in every available language before passing the tags to the folksonomy search engine.

The claim is an improvement in serendipity: when searching in 20 natural languages at the same time, some interesting data will be found that would remain undiscovered through a single-language search.

Flickr and its API

Flickr is one of the most popular web applications for photos (a search for ‘flowers’ currently returns more than 2 million photos). Photos are freely tagged by users, so it can be considered a folksonomy.

Open source APIs for the major programming languages are available, and people can query the Flickr repository with an authentication key given on request.

http://www.flickr.com/services/api
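
As a rough illustration of such a query, the Ruby sketch below only builds the URL of a tag search and does not send it. The method name `flickr.photos.search` and the parameters `tags`, `tag_mode` and `per_page` follow the public Flickr REST API as we understand it and should be checked against the official documentation; the helper `flickr_search_url` and the key `YOUR_API_KEY` are placeholders, not part of Flickrpedia.

```ruby
require 'uri'

# Build the URL of a tag search on the Flickr REST API without sending it.
# api_key is a placeholder: a real key must be requested from Flickr.
def flickr_search_url(tags, api_key:, tag_mode: 'any', per_page: 60)
  params = {
    method: 'flickr.photos.search',
    api_key: api_key,
    tags: tags.join(','),  # comma-delimited tag list
    tag_mode: tag_mode,    # 'any' (OR) or 'all' (AND)
    per_page: per_page,
    format: 'json',
    nojsoncallback: 1
  }
  "https://api.flickr.com/services/rest/?#{URI.encode_www_form(params)}"
end

puts flickr_search_url(%w[Flugzeug airplane avion], api_key: 'YOUR_API_KEY')
```

Setting `tag_mode` to `'any'` is what lets a single request collect photos tagged in any of the languages.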

Flickrpedia = Flickr + Wikipedias

Flickrpedia is built on a Ruby API and on the Ruby on Rails development framework (Thomas 2005, Thomas and Heinemeier-Hansson 2005). Users can query Flickr by writing a tag and specifying its natural language.

The system crawls the Wikipedia in the corresponding language and looks for an appropriate page. With the help of regular expressions, Flickrpedia parses the web page and extracts, from the appropriate web page box, the existing language pairs for the same topic in other languages.
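
The extraction step might look like the following Ruby sketch. It is not the Flickrpedia source: the HTML fragment is a hypothetical approximation of the language box (the real markup may differ and has changed over time), and the names `interwiki_link` and `language_pairs` are invented for illustration. No network access is performed.

```ruby
# Hypothetical fragment of the "other languages" box of a Wikipedia page.
html = <<~HTML
  <li class="interwiki-en"><a href="http://en.wikipedia.org/wiki/Airplane">English</a></li>
  <li class="interwiki-fr"><a href="http://fr.wikipedia.org/wiki/Avion">Français</a></li>
  <li class="interwiki-eu"><a href="http://eu.wikipedia.org/wiki/Hegazkin">Euskara</a></li>
HTML

# Capture the language code (subdomain) and the page title of each link.
interwiki_link = %r{href="http://([a-z\-]+)\.wikipedia\.org/wiki/([^"]+)"}

# Return a hash mapping language code => page title, usable as a tag.
def language_pairs(html, pattern)
  html.scan(pattern).map { |lang, title| [lang, title.tr('_', ' ')] }.to_h
end

pairs = language_pairs(html, interwiki_link)
puts pairs.inspect
puts pairs.values.inspect # the tags to pass on to the photo search
```

Regular expressions are fragile against markup changes; a structured source of interlanguage links, where available, would be a more robust design choice.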

How Flickrpedia works

[Diagram: a German user enters the query ‘Flugzeug’ (German) in Flickrpedia; the system crawls Wikipedia, parsing with the help of regular expressions, and finds ‘Airplane’ (English), ‘Avion’ (French), ‘Hegazkin’ (Basque), ...; the German user obtains the desired photos from Flickr!]

[Screenshot: the web page box for “alternate languages” in Wikipedia. An example: the German word ‘Flugzeug’.]

The results of the German word ‘Flugzeug’

As of 11 April 2007, Flickr finds fewer than 10,000 photos, while Flickrpedia finds more than 20,000 for the same query, including many unexpected and relevant photos.

Don’t trust me: try it yourself! Word searched: ‘Flugzeug’, i.e. airplane in German

http://buffy.sciva.uninsubria.it/~rl608838/search

Flickrpedia until now

Flickrpedia only needs to store the wikipedias corresponding to the existing natural languages (currently 85). Large and extemporaneous shared information repositories, like Flickr, can be managed through other semi-structured information repositories such as the wikipedias.

Flickrpedia, if refined beyond its current prototype phase, may help users with poor knowledge of major languages to retrieve information using only their lesser-used languages.

Further directions for Flickrpedia

Flickrpedia is far from perfect: homographs are still not handled, even though wikipedias have disambiguation pages, and it is not clear which wikipedias to choose in order to optimize serendipity.

For now, the parsed wikipedias are the largest ones in terms of wiki pages, but this gives no guarantee of increased serendipity.

Finally, the API provided by Flickr imposes a severe limit: up to 20 tags can be inserted in a single query request, and up to 60 thumbnails may be returned.
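
One possible way to live with the 20-tag limit, shown here only as a sketch and not as what Flickrpedia does, is to split the translated tags into several requests. The helper `tag_batches` and the placeholder tag names are hypothetical; the batch size of 20 and the figure of 85 languages come from the slides.

```ruby
# Split a list of translated tags into batches that respect the
# per-request tag limit (20 according to the slides).
def tag_batches(tags, max_per_query: 20)
  tags.uniq.each_slice(max_per_query).to_a
end

# 85 placeholder tags, one per wikipedia language mentioned in the slides.
tags = (1..85).map { |i| "tag#{i}" }
batches = tag_batches(tags)

puts batches.size               # number of requests needed
puts batches.map(&:size).inspect # tags sent in each request
```

With 85 distinct tags this yields five requests, four of 20 tags and one of 5; the results of the separate requests would then have to be merged and deduplicated.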

Beyond Flickrpedia

This approach isn’t limited to Flickr as the underlying folksonomy. Our research direction is towards generalization, i.e. users can choose the appropriate folksonomy when performing multilingual queries.

It remains to be shown how to apply this approach to folksonomies whose semantic referents are different from photos: an airplane or a flower is still so in almost every human language, more or less.

The real underlying problem is how to measure serendipity, i.e. specific and precise metrics for serendipity are needed.

Thank you. Any questions?

Download these slides at the following permalink:

http://purl.org/net/fgobbo

(cc) F. Gobbo 2007. Published in Italy. Attribution, NonCommercial, ShareAlike 2.5