MediaEval Workshop 2011


Transcript of MediaEval Workshop 2011

Page 1: MediaEval Workshop 2011

MediaEval Workshop 2011

Pisa, Italy, 1-2 September 2011

Page 2: MediaEval Workshop 2011

Introduction

• Genre Tagging task: Given 1727 videos and 26 genre tags, decide which tag goes to which video.

• Genres included art, health, literature, technology, sports, blogs, religion, travel, etc.

• Videos were from an online video hosting site called blip.tv

Page 3: MediaEval Workshop 2011

Introduction (cont.)

• Data given to us: videos, speech transcripts, metadata, and some user-defined tags.

• The total data/videos were divided into two sets:

– Development set: 247 videos for which we were given the ground truth, so that we could experiment with our algorithm.

– Test set: 1727 videos for which we were not given the ground truth, and for which we had to submit our results to the workshop.

Page 4: MediaEval Workshop 2011

TUD-MIR at MediaEval 2011 Genre Tagging Task: Query Expansion from a Limited Number of Labeled Videos

Page 5: MediaEval Workshop 2011

Main Idea

• Information Retrieval approach.

• Used only the textual data.

• Used a relatively small number of labeled videos in the development set to mine query expansion terms that are characteristic of each genre.

Page 6: MediaEval Workshop 2011

Approach

• Combine all the videos of the same genre in the development set into a single genre document.

• Apply preprocessing such as stop-word removal and stemming.

• Weight and rank all the terms in the development-set vocabulary.

• Use the top 20 terms from each genre document as expanded query terms (sketched below).
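A minimal Python sketch of this pipeline, assuming NLTK for tokenization, stop-word removal, and stemming (the slides do not name the tools used); `genre_query_terms` and `rank_term` are hypothetical names, and the ranking function is a placeholder for the Offer Weight defined on the next slide.

```python
from collections import Counter

from nltk.corpus import stopwords          # requires nltk data: stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize    # requires nltk data: punkt

STOP = set(stopwords.words("english"))
STEMMER = PorterStemmer()

def preprocess(text):
    """Tokenize, drop stop words, and stem (the preprocessing step above)."""
    return [STEMMER.stem(t) for t in word_tokenize(text.lower())
            if t.isalpha() and t not in STOP]

def genre_query_terms(genre_docs, rank_term, top_k=20):
    """genre_docs: the texts of all videos of one genre (the combined genre document).
    rank_term: scores a term, e.g. a closure around the Offer Weight below.
    Returns the top_k terms to be used as expanded query terms."""
    vocab = Counter(t for doc in genre_docs for t in preprocess(doc))
    return sorted(vocab, key=rank_term, reverse=True)[:top_k]
```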

Page 7: MediaEval Workshop 2011

Offer Weighting Formula

OW(i) = r * log( ((r + 0.5) * (N - n - R + r + 0.5)) / ((n - r + 0.5) * (R - r + 0.5)) )

In the formula above, r is the number of videos of a particular genre in which term t(i) appears, R is the total number of videos of that genre, N is the total number of videos in the collection, and n is the number of videos in the collection in which term t(i) appears.
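A direct transcription of the formula into Python, using the slide's own variable names:

```python
import math

def offer_weight(r, R, n, N):
    """Offer Weight OW(i) for a term t(i):
    r: videos of the genre containing t(i);  R: videos of that genre;
    n: videos in the collection containing t(i);  N: collection size."""
    numerator = (r + 0.5) * (N - n - R + r + 0.5)
    denominator = (n - r + 0.5) * (R - r + 0.5)
    return r * math.log(numerator / denominator)
```

The 0.5 terms smooth the counts so the weight stays defined even when a term appears in every video of a genre, or in none.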

Page 8: MediaEval Workshop 2011

Other Query Expansion Techniques

• They also ran several other query expansions: pseudo-relevance feedback (PRF), WordNet, Google Sets, and YouTube.

• To expand queries via YouTube, they first downloaded the metadata (e.g. title, description, and tags) of the top 50 ranked videos returned by YouTube for each genre label (except the default category), and then sampled 20 expansion terms from that metadata using the Offer Weight explained earlier (sketched below).
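A hedged sketch of the YouTube-based expansion, reusing `genre_query_terms` from the earlier sketch; the metadata records and their `title`/`description`/`tags` fields are hypothetical stand-ins for whatever the authors actually fetched from YouTube.

```python
def youtube_expansion(metadata, rank_term, top_k=20):
    """metadata: records for the top-50 YouTube results for one genre label,
    each assumed to carry 'title', 'description', and 'tags' fields.
    Builds one pseudo-document per video, then samples expansion terms
    with the same Offer Weight ranking as before."""
    docs = [" ".join([m["title"], m["description"], " ".join(m["tags"])])
            for m in metadata]
    return genre_query_terms(docs, rank_term, top_k=top_k)
```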

Page 9: MediaEval Workshop 2011

LIA @ MediaEval 2011: Compact Representation of Heterogeneous Descriptors for Video Genre Classification

Page 10: MediaEval Workshop 2011

Main Idea

• Classification approach.

• A method that extracts a low-dimensional feature space based on text, audio, and video information.

• Late fusion of the SVM results for each modality.

Page 11: MediaEval Workshop 2011

Data Collection

• The training data set was collected from the web.

• They first expanded the query terms using Latent Dirichlet Allocation (LDA) on the Gigaword corpus and then used the top 10 expanded terms for each genre (sketched after this list).

• They queried YouTube and Dailymotion for the videos (3120 videos in total).

• For textual data they used web pages from Google (1560 documents/web pages).
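A sketch of the LDA step using gensim; `tokenized_docs` stands in for the Gigaword corpus the authors used, and `num_topics=50` is an assumption (the slides do not give a topic count).

```python
from gensim import corpora, models

def lda_expand(genre_term, tokenized_docs, topn=10, num_topics=50):
    """Return the top terms of the LDA topic most associated with genre_term."""
    dictionary = corpora.Dictionary(tokenized_docs)
    corpus = [dictionary.doc2bow(doc) for doc in tokenized_docs]
    lda = models.LdaModel(corpus, id2word=dictionary, num_topics=num_topics)
    term_id = dictionary.token2id[genre_term]
    # Find the topic in which the genre term has the highest probability.
    topics = lda.get_term_topics(term_id, minimum_probability=0.0)
    topic_id = max(topics, key=lambda t: t[1])[0] if topics else 0
    return [dictionary[wid] for wid, _ in lda.get_topic_terms(topic_id, topn=topn)]
```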

Page 12: MediaEval Workshop 2011

Features Extracted

• Features:

– Text: TF-IDF weights.

– Audio: MFCC acoustic frames computed every 10 ms over a 20 ms Hamming window (sketched after this list).

– Visual: color descriptors (color structure, dominant color) and texture descriptors (homogeneous texture, edge histogram). Texture was the best feature according to them.
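A sketch of that audio feature with librosa (the slides do not say which toolkit was used); `n_mfcc=13` is an assumed coefficient count.

```python
import librosa

def mfcc_frames(audio_path, n_mfcc=13):
    """MFCC frames every 10 ms over a 20 ms Hamming window, as on the slide."""
    y, sr = librosa.load(audio_path, sr=None)
    return librosa.feature.mfcc(
        y=y, sr=sr, n_mfcc=n_mfcc,
        hop_length=int(0.010 * sr),   # 10 ms frame step
        win_length=int(0.020 * sr),   # 20 ms analysis window
        window="hamming",
    )
```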

Page 13: MediaEval Workshop 2011

Classification

• Each modality is given separately to an SVM classifier, and the per-modality scores are combined using linear interpolation (sketched below).
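A minimal scikit-learn sketch of that late fusion; the interpolation weights are an assumption (the slides do not say how they were set, e.g. by tuning on the development set).

```python
from sklearn.svm import SVC

def late_fusion_scores(train_by_modality, labels, test_by_modality, weights):
    """Train one SVM per modality (text, audio, visual) and linearly
    interpolate the per-class probability scores."""
    fused = 0.0
    for X_train, X_test, w in zip(train_by_modality, test_by_modality, weights):
        clf = SVC(probability=True).fit(X_train, labels)
        fused = fused + w * clf.predict_proba(X_test)
    return fused  # one row of fused genre scores per test video
```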

Page 14: MediaEval Workshop 2011

User Name Similarity

• They also tried to use user-name similarity with the training set. They refer to the relation between genres and user names as a knowledge base and use it to boost the genre scores.

• So they increase the score of a genre for a video if the user name of that video exists in the knowledge base (development set), as sketched below.
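A minimal sketch of that boost; the factor of 1.5 is purely illustrative, since the slides do not state how large the boost was.

```python
def boost_by_username(scores, username, knowledge_base, boost=1.5):
    """scores: genre -> score for one video.
    knowledge_base: user name -> genres seen for that user in the
    development set. Boosts every genre linked to this user name."""
    for genre in knowledge_base.get(username, ()):
        if genre in scores:
            scores[genre] *= boost
    return scores
```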

Page 15: MediaEval Workshop 2011

TUB @ MediaEval 2011 Genre Tagging Task: Prediction using Bag-of-(visual)-Words Approaches

Page 16: MediaEval Workshop 2011

Main Idea

• Classification task.

• Bag-of-words approaches with different features derived from the visual content and the associated textual information.

Page 17: MediaEval Workshop 2011

Features Extracted

• Mainly textual features:

– They translated foreign-language program ASR transcripts into English using Google Translate.

– They used a bag-of-words (TF-IDF) model for the textual features.

• Visual features:

– They used SURF local features extracted from each key frame of the video sequence (see the bag-of-visual-words sketch below).
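A sketch of a standard bag-of-visual-words construction over SURF descriptors, assuming OpenCV with the non-free contrib modules; the k-means codebook and its size (`n_words=500`) are conventional choices not stated on the slides.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def bovw_histograms(keyframe_paths, n_words=500):
    """SURF descriptors per key frame -> k-means visual vocabulary ->
    one visual-word histogram per key frame."""
    surf = cv2.xfeatures2d.SURF_create()   # needs opencv-contrib (non-free)
    per_frame = []
    for path in keyframe_paths:
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        _, desc = surf.detectAndCompute(img, None)
        if desc is not None:
            per_frame.append(desc)
    codebook = KMeans(n_clusters=n_words, n_init=10).fit(np.vstack(per_frame))
    return [np.bincount(codebook.predict(d), minlength=n_words) for d in per_frame]
```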

Page 18: MediaEval Workshop 2011

Classification

• Fusion:

– Early fusion of the visual and textual features, followed by SVM classification.

• Classification:

– Used a multi-class SVM, Multinomial Naïve Bayes, and Nearest Neighbor for classification (sketched below).
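A sketch of the early fusion and the three classifiers named above, using scikit-learn with default hyperparameters (the slides do not give the settings used):

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

def early_fusion(text_features, visual_features):
    """Early fusion: concatenate the per-video feature vectors before training."""
    return np.hstack([text_features, visual_features])

# SVC is multi-class out of the box (one-vs-one); k=1 for Nearest Neighbor
# is an assumption. TF-IDF weights and visual-word counts are non-negative,
# as MultinomialNB requires.
classifiers = [SVC(), MultinomialNB(), KNeighborsClassifier(n_neighbors=1)]
```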

Page 19: MediaEval Workshop 2011

SINAI: Genre tagging of videos based on information retrieval and semantic similarity using WordNet

Page 20: MediaEval Workshop 2011

Main Idea

• IR approach.

• Query expansion using WordNet.

• A different similarity measure instead of cosine similarity.

Page 21: MediaEval Workshop 2011

Approach

• Query expansion: produce a bag of words for each genre term using WordNet's synonyms, hyponyms, and domain terms (see the sketch below).

• An existing framework, the Terrier IR system, was used to obtain a measure of relatedness between the videos and the genre terms.
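A sketch of the WordNet expansion with NLTK (the slides do not say which WordNet interface was used):

```python
from nltk.corpus import wordnet as wn   # requires nltk data: wordnet

def wordnet_bag(genre_term):
    """Bag of words for one genre term from WordNet synonyms, hyponyms,
    and domain terms, as described above."""
    bag = set()
    for syn in wn.synsets(genre_term):
        bag.update(l.name() for l in syn.lemmas())        # synonyms
        for hypo in syn.hyponyms():                       # hyponyms
            bag.update(l.name() for l in hypo.lemmas())
        for dom in syn.topic_domains():                   # domain terms
            bag.update(l.name() for l in dom.lemmas())
    return bag
```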

Page 22: MediaEval Workshop 2011

Second Approach

• They also used a formula proposed by Lin, which is based on WordNet, to measure the semantic similarity between the nouns detected in each test video and the bags of words generated for each genre category.

Page 23: MediaEval Workshop 2011

• They then kept only the matches whose score exceeded a threshold of 0.75.

• Finally, the accumulated similarity score was divided by the number of words detected in the video, giving the final semantic similarity score (sketched below).
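A sketch of this scoring with NLTK's Lin similarity; the Brown information-content file and the first-sense heuristic are assumptions, since the slides do not specify either.

```python
from nltk.corpus import wordnet as wn
from nltk.corpus import wordnet_ic      # requires nltk data: wordnet_ic

BROWN_IC = wordnet_ic.ic("ic-brown.dat")

def video_genre_score(video_nouns, genre_bag, threshold=0.75):
    """Sum Lin similarities above the 0.75 threshold, then normalize by the
    number of nouns detected in the video, as on the slide."""
    total = 0.0
    for noun in video_nouns:
        for word in genre_bag:
            s1, s2 = wn.synsets(noun, "n"), wn.synsets(word, "n")
            if s1 and s2:
                sim = s1[0].lin_similarity(s2[0], BROWN_IC)  # first senses only
                if sim and sim > threshold:
                    total += sim
    return total / max(len(video_nouns), 1)
```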

Page 24: MediaEval Workshop 2011

Results for All

Work                       MAP Score
TUD-MIR (IR approach)      0.3212
LIA (Classification)       0.1828
TUB (Classification)       0.3049
SINAI (IR approach)        0.1115
Our Result (IR approach)   0.1081