Social Event Detection at MediaEval 2014: Challenges, Datasets, and Evaluation

Social Event Detection at MediaEval 2014: Challenges, Datasets and Evaluation. Vasileios Mezaris (ITI-CERTH), Symeon Papadopoulos (ITI-CERTH), Georgios Petkos (ITI-CERTH)

Transcript of Social Event Detection at MediaEval 2014: Challenges, Datasets, and Evaluation

Page 1: Social Event Detection at MediaEval 2014: Challenges, Datasets, and Evaluation

Social Event Detection at MediaEval 2014:

Challenges, Datasets and Evaluation

Vasileios Mezaris, ITI-CERTH

Symeon Papadopoulos, ITI-CERTH

Georgios Petkos, ITI-CERTH

Page 2: Social Event Detection at MediaEval 2014: Challenges, Datasets, and Evaluation

#2

Social events?

• Events that are organized and attended by people and are represented by multimedia posted online by different people.

• For instance: concerts, sports events, public celebrations, protests, etc.

• Why are social events interesting?
  – A significant part of human activity is centred around social events.
  – Detection of social events may be of interest to professional journalists who would like to discover new social events.
  – Casual users may also like to organize their photo collections (and possibly, the collections of their friends) around attended events.

Page 3: Social Event Detection at MediaEval 2014: Challenges, Datasets, and Evaluation

#3

Two subtasks

• Subtask 1: Detection.
  – Participants are asked to cluster a collection of images, so that each cluster corresponds to a distinct social event.
  – Images come with metadata of different types: a multimodal clustering problem.

• Subtask 2: Retrieval.
  – Participants are asked to retrieve those events that match specific criteria (type of event, location, time, involved entities).
  – 10 test queries are provided. For instance: “Find all music events that took place in Canada”.

Page 4: Social Event Detection at MediaEval 2014: Challenges, Datasets, and Evaluation

#4

Datasets and evaluation

• The first dataset consists of ~360,000 Flickr images corresponding to ~18,000 social events. It is used for development purposes in the first subtask, as well as for development and testing purposes in the second subtask.

• The second dataset consists of ~110,000 Flickr images and is used for testing in the first subtask.

• Each image comes with metadata that includes time-stamps, geographic information, tags, title, description, etc. There is a certain amount of noise, and additionally some features are missing for some images (for instance, only around 20% of the images are geotagged).

• The dataset is already publicly available at: http://mklab.iti.gr/project/sed2014

• For the first subtask, results were evaluated using F1 and NMI (Normalized Mutual Information), whereas for the second subtask results were evaluated using Precision, Recall and F1.
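
As a rough illustration of how such clustering scores can be computed, here is a minimal Python sketch (assuming scikit-learn is available). The F1 shown is the pairwise formulation over "same event" decisions, which is one common choice and may differ from the organizers' exact definition; NMI comes directly from scikit-learn.

```python
# Minimal sketch of clustering evaluation (assumption: pairwise F1 variant,
# which may differ from the task organizers' exact definition).
from itertools import combinations
from sklearn.metrics import normalized_mutual_info_score

def pairwise_f1(true_labels, pred_labels):
    """Precision/recall over 'same event' decisions for all image pairs."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_pred and same_true:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_true:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Toy example: 6 images, ground-truth event labels vs. a predicted clustering.
truth = [0, 0, 0, 1, 1, 2]
pred  = [0, 0, 1, 1, 1, 2]
print("pairwise F1:", round(pairwise_f1(truth, pred), 3))
print("NMI        :", round(normalized_mutual_info_score(truth, pred), 3))
```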

Page 5: Social Event Detection at MediaEval 2014: Challenges, Datasets, and Evaluation

#5

1st subtask: overview of applied methods

• 6 teams submitted to the first subtask.

• At a very high level there were two types of approaches:

a) Applying a sequence of clustering operations. E.g., treat each user's images independently and cluster them according to temporal criteria, then merge the first-level clusters according to textual, spatial or temporal similarity. Visual similarity was not considered. (LIMSI, UPC, ATU) A rough sketch of this kind of pipeline is given after this list.

b) Learning a similarity metric between images and using it to do the clustering. Visual similarity was considered. (CERTH, ADMRG-QUT, SAIVT-ADMRG)
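
To make approach (a) concrete, below is a minimal, hypothetical Python sketch of such a two-stage pipeline; the function names and thresholds (time gap, tag-overlap threshold) are illustrative choices, not any team's actual settings. Approach (b) would instead train a model that predicts, from the multimodal features of an image pair, whether the two images belong to the same event, and feed those learned similarities into a clustering algorithm.

```python
# Hypothetical sketch of approach (a): per-user temporal clustering followed
# by a cross-user merge on tag overlap. Thresholds and structure are
# illustrative only, not any participant's actual pipeline.
from collections import defaultdict

def per_user_temporal_clusters(photos, gap_hours=12.0):
    """photos: list of dicts with 'user', 'time' (in hours), 'tags' (set)."""
    by_user = defaultdict(list)
    for p in photos:
        by_user[p["user"]].append(p)
    clusters = []
    for user_photos in by_user.values():
        user_photos.sort(key=lambda p: p["time"])
        current = [user_photos[0]]
        for p in user_photos[1:]:
            if p["time"] - current[-1]["time"] <= gap_hours:
                current.append(p)        # same candidate event for this user
            else:
                clusters.append(current)
                current = [p]            # start a new candidate event
        clusters.append(current)
    return clusters

def merge_by_tags(clusters, min_jaccard=0.3):
    """Greedily merge first-level clusters whose pooled tag sets overlap enough."""
    merged = []
    for cluster in clusters:
        tags = set().union(*(p["tags"] for p in cluster))
        for event in merged:
            overlap = len(tags & event["tags"]) / len(tags | event["tags"])
            if overlap >= min_jaccard:
                event["photos"].extend(cluster)
                event["tags"] |= tags
                break
        else:
            merged.append({"photos": list(cluster), "tags": set(tags)})
    return [event["photos"] for event in merged]

# Toy run: two users, one shared concert plus an unrelated protest.
photos = [
    {"user": "u1", "time": 0.0,  "tags": {"concert", "rock"}},
    {"user": "u1", "time": 2.0,  "tags": {"concert", "stage"}},
    {"user": "u2", "time": 1.0,  "tags": {"rock", "stage"}},
    {"user": "u2", "time": 40.0, "tags": {"protest", "march"}},
]
events = merge_by_tags(per_user_temporal_clusters(photos))
print([len(e) for e in events])  # -> [3, 1]: the concert and the protest
```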

Page 6: Social Event Detection at MediaEval 2014: Challenges, Datasets, and Evaluation

#6

1st subtask: Results

Ranking                        Team          F1       NMI      Type
1                              ATU           0.9476   0.9886   a
2                              UPC           0.9240   0.9820   a
Organizing team (no ranking)   CERTH         0.9161   0.9818   b
3                              LIMSI         0.8214   0.9554   a
4                              ADMRG-QUT     0.7533   0.9024   b
5                              SAIVT-ADMRG   0.7525   0.9018   b

• Very good results were obtained using only the metadata of the images and not the visual content.

• Careful consideration of the data and the problem led the best-performing teams to come up with an appropriate sequence of clustering steps based on particular modalities, whereas an approach based on learning a multimodal similarity performed almost as well.

Page 7: Social Event Detection at MediaEval 2014: Challenges, Datasets, and Evaluation

#7

2nd subtask: overview of applied methods

• 2 teams submitted to the second subtask.

• One team (ATU) utilized a similarity measure that accounts for temporal, spatial and textual similarity of the events to the queries, as well as a one-class SVM to account for event types. They also applied query expansion to the queries using WordNet synsets (a toy illustration of this kind of expansion follows after this list).

• The other team (CERTH) learned language models for each considered criterion in the set of test queries and used them to classify events.
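
As a small illustration of WordNet-based query expansion, here is a hedged sketch using NLTK's WordNet interface (it assumes the WordNet corpus has been downloaded via nltk.download('wordnet'); pooling the lemmas of all noun synsets is a simplification, not the team's exact procedure).

```python
# Hedged sketch of WordNet-based query expansion, using NLTK's WordNet API.
# Requires: pip install nltk  and  nltk.download('wordnet').
from nltk.corpus import wordnet as wn

def expand_query_term(term):
    """Return the term together with lemma names from its WordNet noun synsets."""
    expanded = {term}
    for synset in wn.synsets(term, pos=wn.NOUN):
        for lemma in synset.lemmas():
            expanded.add(lemma.name().replace("_", " ").lower())
    return expanded

# Example: expanding the event-type term of the query
# "Find all music events that took place in Canada".
print(sorted(expand_query_term("music")))
```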

Page 8: Social Event Detection at MediaEval 2014: Challenges, Datasets, and Evaluation

#8

2nd subtask: Results

Ranking                        Team    F1       Precision   Recall
Organizing team (no ranking)   CERTH   0.4604   0.3905      0.7080
1                              ATU     0.2877   0.4203      0.4057

• Queries that involved location criteria appeared to be easier than those that involved only the type of event.

• The difference in performance between the teams may be because CERTH utilized language models learned from Flickr data whereas ATU used WordNet's synsets to do query expansion.

Page 9: Social Event Detection at MediaEval 2014: Challenges, Datasets, and Evaluation

#9

Acknowledgements