MediaEval Workshop 2011


Transcript of MediaEval Workshop 2011

Page 1: MediaEval Workshop 2011

MediaEval Workshop 2011

Pisa, Italy, 1-2 September 2011

Page 2: MediaEval Workshop 2011

Introduction

• Genre Tagging task: Given 1727 videos and 26 genre tags, decide which tag goes to which video.

• Genres included art, health, literature, technology, sports, blogs, religion, travel, etc.

• Videos were from an online video hosting site called blip.tv

Page 3: MediaEval Workshop 2011

Introduction (cont.)

• Data given to us: videos, speech transcripts, metadata, and some user-defined tags.

• The total data/videos were divided into two sets:

– Development set: 247 videos for which we were given the ground truth, so that we could experiment with our algorithm.

– Test set: 1727 videos for which we were not given the ground truth, and for which we had to submit our results to the workshop.

Page 4: MediaEval Workshop 2011

TUD-MIR at MediaEval 2011 Genre Tagging Task: Query Expansion from a Limited Number of Labeled Videos

Page 5: MediaEval Workshop 2011

Main Idea

• Information Retrieval approach.

• Used only the textual data.

• Used a relatively small number of labeled videos in the development set to mine query expansion terms that are characteristic of each genre.

Page 6: MediaEval Workshop 2011

Approach

• Combine all the videos of the same genre in the development set into a single genre document.

• Apply preprocessing such as stop-word removal and stemming.

• Weight and rank all the terms in the development-set vocabulary.

• Use the top 20 terms from each genre document as expanded query terms (sketched below).
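A minimal Python sketch of this pipeline, assuming NLTK for tokenization, stop-word removal, and stemming (the slides do not name the tools used); `genre_query_terms` and `rank_term` are hypothetical names, and the ranking function is a placeholder for the Offer Weight defined on the next slide.

```python
from collections import Counter

from nltk.corpus import stopwords          # requires nltk data: stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize    # requires nltk data: punkt

STOP = set(stopwords.words("english"))
STEMMER = PorterStemmer()

def preprocess(text):
    """Tokenize, drop stop words, and stem (the preprocessing step above)."""
    return [STEMMER.stem(t) for t in word_tokenize(text.lower())
            if t.isalpha() and t not in STOP]

def genre_query_terms(genre_docs, rank_term, top_k=20):
    """genre_docs: the texts of all videos of one genre (the combined genre document).
    rank_term: scores a term, e.g. a closure around the Offer Weight below.
    Returns the top_k terms to be used as expanded query terms."""
    vocab = Counter(t for doc in genre_docs for t in preprocess(doc))
    return sorted(vocab, key=rank_term, reverse=True)[:top_k]
```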

Page 7: MediaEval Workshop 2011

Offer Weighting Formula

OW(i) = r * log( ((r + 0.5) * (N - n - R + r + 0.5)) / ((n - r + 0.5) * (R - r + 0.5)) )

In the formula above, r is the number of videos of a particular genre in which term t(i) appears, R is the total number of videos of that genre, N is the total number of videos in the collection, and n is the number of videos in the collection in which term t(i) appears.
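A direct transcription of the formula into Python, using the slide's own variable names:

```python
import math

def offer_weight(r, R, n, N):
    """Offer Weight OW(i) for a term t(i):
    r: videos of the genre containing t(i);  R: videos of that genre;
    n: videos in the collection containing t(i);  N: collection size."""
    numerator = (r + 0.5) * (N - n - R + r + 0.5)
    denominator = (n - r + 0.5) * (R - r + 0.5)
    return r * math.log(numerator / denominator)
```

The 0.5 terms smooth the counts so the weight stays defined even when a term appears in every video of a genre, or in none.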

Page 8: MediaEval Workshop 2011

Other Query Expansion Techniques

• They also ran several other query expansions: pseudo-relevance feedback (PRF), WordNet, Google Sets, and YouTube.

• To expand queries via YouTube, they first downloaded the metadata (e.g. title, description, and tags) of the top 50 ranked videos returned by YouTube for each genre label (except the default category), and then sampled 20 expansion terms from that metadata using the Offer Weight explained earlier (sketched below).
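A hedged sketch of the YouTube-based expansion, reusing `genre_query_terms` from the earlier sketch; the metadata records and their `title`/`description`/`tags` fields are hypothetical stand-ins for whatever the authors actually fetched from YouTube.

```python
def youtube_expansion(metadata, rank_term, top_k=20):
    """metadata: records for the top-50 YouTube results for one genre label,
    each assumed to carry 'title', 'description', and 'tags' fields.
    Builds one pseudo-document per video, then samples expansion terms
    with the same Offer Weight ranking as before."""
    docs = [" ".join([m["title"], m["description"], " ".join(m["tags"])])
            for m in metadata]
    return genre_query_terms(docs, rank_term, top_k=top_k)
```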

Page 9: MediaEval Workshop 2011

LIA @ MediaEval 2011: Compact Representation of Heterogeneous Descriptors for Video Genre Classification

Page 10: MediaEval Workshop 2011

Main Idea

• Classification approach.

• A method that extracts a low-dimensional feature space based on text, audio, and video information.

• Late fusion of the SVM results for each modality.

Page 11: MediaEval Workshop 2011

Data Collection

• The training data set was collected from the web.

• They first expanded the query terms using Latent Dirichlet Allocation (LDA) on the Gigaword corpus and then used the top 10 expanded terms for each genre (sketched after this list).

• They queried YouTube and Dailymotion for the videos (3120 videos in total).

• For textual data they used web pages from Google (1560 documents/web pages).
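A sketch of the LDA step using gensim; `tokenized_docs` stands in for the Gigaword corpus the authors used, and `num_topics=50` is an assumption (the slides do not give a topic count).

```python
from gensim import corpora, models

def lda_expand(genre_term, tokenized_docs, topn=10, num_topics=50):
    """Return the top terms of the LDA topic most associated with genre_term."""
    dictionary = corpora.Dictionary(tokenized_docs)
    corpus = [dictionary.doc2bow(doc) for doc in tokenized_docs]
    lda = models.LdaModel(corpus, id2word=dictionary, num_topics=num_topics)
    term_id = dictionary.token2id[genre_term]
    # Find the topic in which the genre term has the highest probability.
    topics = lda.get_term_topics(term_id, minimum_probability=0.0)
    topic_id = max(topics, key=lambda t: t[1])[0] if topics else 0
    return [dictionary[wid] for wid, _ in lda.get_topic_terms(topic_id, topn=topn)]
```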

Page 12: MediaEval Workshop 2011

Features Extracted

• Features:

– Text: TF-IDF weights.

– Audio: MFCC acoustic frames computed every 10 ms over a 20 ms Hamming window (sketched after this list).

– Visual: color descriptors (color structure, dominant color) and texture descriptors (homogeneous texture, edge histogram). Texture was the best feature according to them.
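A sketch of that audio feature with librosa (the slides do not say which toolkit was used); `n_mfcc=13` is an assumed coefficient count.

```python
import librosa

def mfcc_frames(audio_path, n_mfcc=13):
    """MFCC frames every 10 ms over a 20 ms Hamming window, as on the slide."""
    y, sr = librosa.load(audio_path, sr=None)
    return librosa.feature.mfcc(
        y=y, sr=sr, n_mfcc=n_mfcc,
        hop_length=int(0.010 * sr),   # 10 ms frame step
        win_length=int(0.020 * sr),   # 20 ms analysis window
        window="hamming",
    )
```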

Page 13: MediaEval Workshop 2011

Classification

• Each modality is given separately to an SVM classifier, and the per-modality scores are combined using linear interpolation (sketched below).
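A minimal scikit-learn sketch of that late fusion; the interpolation weights are an assumption (the slides do not say how they were set, e.g. by tuning on the development set).

```python
from sklearn.svm import SVC

def late_fusion_scores(train_by_modality, labels, test_by_modality, weights):
    """Train one SVM per modality (text, audio, visual) and linearly
    interpolate the per-class probability scores."""
    fused = 0.0
    for X_train, X_test, w in zip(train_by_modality, test_by_modality, weights):
        clf = SVC(probability=True).fit(X_train, labels)
        fused = fused + w * clf.predict_proba(X_test)
    return fused  # one row of fused genre scores per test video
```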

Page 14: MediaEval Workshop 2011

User Name Similarity

• They also tried to use user-name similarity with the training set. They refer to the relation between genres and user names as a knowledge base and use it to boost the genre scores.

• So they increase the score of a genre for a video if the user name of that video exists in the knowledge base (development set), as sketched below.
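A minimal sketch of that boost; the factor of 1.5 is purely illustrative, since the slides do not state how large the boost was.

```python
def boost_by_username(scores, username, knowledge_base, boost=1.5):
    """scores: genre -> score for one video.
    knowledge_base: user name -> genres seen for that user in the
    development set. Boosts every genre linked to this user name."""
    for genre in knowledge_base.get(username, ()):
        if genre in scores:
            scores[genre] *= boost
    return scores
```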

Page 15: MediaEval Workshop 2011

TUB @ MediaEval 2011 Genre Tagging Task: Prediction using Bag-of-(visual)-Words Approaches

Page 16: MediaEval Workshop 2011

Main Idea

• Classification task.

• Bag-of-words approaches with different features derived from the visual content and the associated textual information.

Page 17: MediaEval Workshop 2011

Features Extracted

• Mainly textual features:

– They translated foreign-language program ASR transcripts into English using Google Translate.

– They used a bag-of-words (TF-IDF) model for the textual features.

• Visual features:

– They used SURF local features extracted from each key frame of the video sequence (see the bag-of-visual-words sketch below).
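A sketch of a standard bag-of-visual-words construction over SURF descriptors, assuming OpenCV with the non-free contrib modules; the k-means codebook and its size (`n_words=500`) are conventional choices not stated on the slides.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def bovw_histograms(keyframe_paths, n_words=500):
    """SURF descriptors per key frame -> k-means visual vocabulary ->
    one visual-word histogram per key frame."""
    surf = cv2.xfeatures2d.SURF_create()   # needs opencv-contrib (non-free)
    per_frame = []
    for path in keyframe_paths:
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        _, desc = surf.detectAndCompute(img, None)
        if desc is not None:
            per_frame.append(desc)
    codebook = KMeans(n_clusters=n_words, n_init=10).fit(np.vstack(per_frame))
    return [np.bincount(codebook.predict(d), minlength=n_words) for d in per_frame]
```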

Page 18: MediaEval Workshop 2011

Classification

• Fusion:

– Early fusion of the visual and textual features, followed by SVM classification.

• Classification:

– Used a multi-class SVM, Multinomial Naïve Bayes, and Nearest Neighbor for classification (sketched below).
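A sketch of the early fusion and the three classifiers named above, using scikit-learn with default hyperparameters (the slides do not give the settings used):

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

def early_fusion(text_features, visual_features):
    """Early fusion: concatenate the per-video feature vectors before training."""
    return np.hstack([text_features, visual_features])

# SVC is multi-class out of the box (one-vs-one); k=1 for Nearest Neighbor
# is an assumption. TF-IDF weights and visual-word counts are non-negative,
# as MultinomialNB requires.
classifiers = [SVC(), MultinomialNB(), KNeighborsClassifier(n_neighbors=1)]
```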

Page 19: MediaEval Workshop 2011

SINAI: Genre tagging of videos based on information retrieval and semantic similarity using WordNet

Page 20: MediaEval Workshop 2011

Main Idea

• IR approach.

• Query expansion using WordNet.

• A different similarity measure instead of cosine similarity.

Page 21: MediaEval Workshop 2011

Approach

• Query expansion: produce a bag of words for each genre term using WordNet's synonyms, hyponyms, and domain terms (see the sketch below).

• An existing framework, the Terrier IR system, was used to obtain a measure of relatedness between the videos and the genre terms.
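A sketch of the WordNet expansion with NLTK (the slides do not say which WordNet interface was used):

```python
from nltk.corpus import wordnet as wn   # requires nltk data: wordnet

def wordnet_bag(genre_term):
    """Bag of words for one genre term from WordNet synonyms, hyponyms,
    and domain terms, as described above."""
    bag = set()
    for syn in wn.synsets(genre_term):
        bag.update(l.name() for l in syn.lemmas())        # synonyms
        for hypo in syn.hyponyms():                       # hyponyms
            bag.update(l.name() for l in hypo.lemmas())
        for dom in syn.topic_domains():                   # domain terms
            bag.update(l.name() for l in dom.lemmas())
    return bag
```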

Page 22: MediaEval Workshop 2011

Second Approach

• They also used a formula proposed by Lin, which is based on WordNet, to measure the semantic similarity between the nouns detected in each test video and the bags of words generated for each genre category.

Page 23: MediaEval Workshop 2011

• They then kept only the matches whose score exceeded a threshold of 0.75.

• Finally, the accumulated similarity score was divided by the number of words detected in the video, giving the final semantic similarity score (sketched below).
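A sketch of this scoring with NLTK's Lin similarity; the Brown information-content file and the first-sense heuristic are assumptions, since the slides do not specify either.

```python
from nltk.corpus import wordnet as wn
from nltk.corpus import wordnet_ic      # requires nltk data: wordnet_ic

BROWN_IC = wordnet_ic.ic("ic-brown.dat")

def video_genre_score(video_nouns, genre_bag, threshold=0.75):
    """Sum Lin similarities above the 0.75 threshold, then normalize by the
    number of nouns detected in the video, as on the slide."""
    total = 0.0
    for noun in video_nouns:
        for word in genre_bag:
            s1, s2 = wn.synsets(noun, "n"), wn.synsets(word, "n")
            if s1 and s2:
                sim = s1[0].lin_similarity(s2[0], BROWN_IC)  # first senses only
                if sim and sim > threshold:
                    total += sim
    return total / max(len(video_nouns), 1)
```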

Page 24: MediaEval Workshop 2011

Results for All

Work                       MAP Score
TUD-MIR (IR approach)      0.3212
LIA (Classification)       0.1828
TUB (Classification)       0.3049
SINAI (IR approach)        0.1115
Our Result (IR approach)   0.1081