TubeTagger – YouTube-based Concept Detection

TubeTagger— YouTube-based Concept Detection —

Adrian Ulges, Markus KochDamian Borth, Thomas M. Breuel

German Research Center for Artificial Intelligence (DFKI) &University of Kaiserslautern

December 6 2009

D.Borth: : TubeTagger 1 December 6 2009

Outline

Motivation

TubeTagger System

TubeTagger Web Demo

Summary

Data, data, data

”...TV, video on demand, Internet video, and P2P video willaccount for over 91 percent of global consumer traffic by 2013...”

”Cisco Visual Networking Index”, 2009

”...more than 20 hours of video uploaded to YouTube everyminute, 1 billion views per day...”

, 2009

Increasing importance of video live streams: Obama’s inauguration,Michael Jackson’s memorial service, music video debuts...

, 2009

Data, data, data

, 2009

Data, data, data

, 2009

Data, data, data

, 2009

Content-based Video Retrieval

How to search in large video databases?

interview

soccer

I Video Concept Detection [MediaMill, Columbia, IBM,...]

→ as key building block of CBVR

interview

soccer

Supervised Machine Learning

I Video Concept Detection [MediaMill, Columbia, IBM,...]

→ as key building block of CBVR

I Supervised Machine Learning→ need labeled training data

Training Data

State-of-the-art Datasets

I TRECVID

I LSCOM

Acquisition

I collaborative [Ayache07]

I precise & high quality

I time consuming effort

I vocabulary size does not scale→ from hundreds to thousands of concepts [Hauptmann07]

I missing new emerging concepts of interest→ e.g. ”Michael Jackson”, ”Obama”, ”iPod”1

1top ranked searches 2009-Q3 by “Google Insights for Search” for web search, news searchand product search respectively

Training Data

State-of-the-art Datasets

I TRECVID

I LSCOM

Acquisition

I collaborative [Ayache07]

I precise & high quality

I time consuming effort

I vocabulary size does not scale→ from hundreds to thousands of concepts [Hauptmann07]

I missing new emerging concepts of interest→ e.g. ”Michael Jackson”, ”Obama”, ”iPod”1

1top ranked searches 2009-Q3 by “Google Insights for Search” for web search, news searchand product search respectively

Web-video as training data

I available at large scale

I high variability of dataI enriched with

I tagsI commentsI ratings

I could allow automaticconcept learning

ConsI weakly labeled

I incomplete / subjectiveI coarse

I focus of interest

I domain change

I video portal as black box

I focus of interest

I domain change

I focus of interest

I domain change

TubeTagger System Overview

Notation

I t ∈ T := tags (≈concepts)

I x ∈ Xdb := keyframe of a videos

I P(t|x) := probability of tag t for keyframe x

Notation

I t ∈ T := tags (≈concepts)

I x ∈ Xdb := keyframe of a videos

I P(t|x) := probability of tag t for keyframe x

Training Data Acquisition

I use YouTube as primarytraining data source

I use tags as weak labelsI pos. = taggedI neg. = all non-tagged

Learning Pipelines

I Visual Concept Learning

I Semantic ConceptLearning

Visual Concept Learning

Features

I keyframe extraction→ x ∈ Xdb

I bag-of-visual-wordsdescriptors [Sivic06]

I SIFT [Lowe99]

I SURF [Bay06]

Statistical Models

I SVM [Scholkopf01]

I PAMIR [Grangier08, Paredes09]

I Max. Entropy [Deselaers05]

→ output: P(t|x)

Features

I SIFT [Lowe99]

I SURF [Bay06]

Statistical Models

I SVM [Scholkopf01]

→ output: P(t|x)

Features

I SIFT [Lowe99]

I SURF [Bay06]

Statistical Models

I SVM [Scholkopf01]

→ output: P(t|x)

Features

I SIFT [Lowe99]

I SURF [Bay06]

Statistical Models

I SVM [Scholkopf01]

→ output: P(t|x)

Semantic Concept Learning

Query Formulation

I concepts 6= textual queriesI mapping of text queries to concept vocabulary

I ”computer” → Monitor, Windows-Desktop, iPhoneI ”funny” → Muppets, Cats, Commercials

I learning of tag co-occurrences

Query Formulation

I concepts 6= textual queriesI mapping of text queries to concept vocabulary

I ”computer” → Monitor, Windows-Desktop, iPhoneI ”funny” → Muppets, Cats, Commercials

I learning of tag co-occurrences

FeatureI bag-of-words features

I t ∈ T → ht

I q := hq

I mapping to concepts asweights

I w(q, t) :=< ht , hq >

Approach

I T5 = w(q, t)top5

I final fusionI P(q|x) =∑

t∈T5

w(q,t)Pt∈T5

w(q,t) P(t|x)

FeatureI bag-of-words features

I t ∈ T → ht

I q := hq

I mapping to concepts asweights

I w(q, t) :=< ht , hq >

Approach

I T5 = w(q, t)top5

I final fusionI P(q|x) =∑

t∈T5

w(q,t)Pt∈T5

w(q,t) P(t|x)

Experiments

Dataset

I 1200 hrs. (= 750k keyframes)

I 233 conceptsI per concept:

I 150 videos for trainingI 50 videos for testing ”soccer” ”traffic”

Clips/Concept Evaluation

I 10 concepts

I trained on N clips

I tested on 200k keyframesI saturation at 100− 150

clips/concepts for SURF+PAMIR

Experiments

Dataset

I 10 concepts

Experiments

Dataset

I 10 concepts

Experiments

Feature & Classifier Evaluation

I 81 concepts

I video level testing (avg. fusion)I results:

I MAP: 22.4%I MAP: 15.4% (6 times faster)

featuremodel SURF SIFT

SVMs 20.4 22.4PAMIR 15.4 18.4

MAXENT 14.1 15.5

Performance Distribution

restaurantorigami operation-roomsoccer busshowing 78 representative concepts out of 233

Experiments

Feature & Classifier Evaluation

I 81 concepts

I video level testing (avg. fusion)I results:

I MAP: 22.4%I MAP: 15.4% (6 times faster)

featuremodel SURF SIFT

SVMs 20.4 22.4PAMIR 15.4 18.4

MAXENT 14.1 15.5

Performance Distribution

restaurantorigami operation-roomsoccer busshowing 78 representative concepts out of 233

Insights in Web-based Concept Detection

Good Concepts

I ”boat/ship”, ”pyramids”

I broad community of YouTube users

I often ”interesting”, ”spectacular”

Redundant Concepts

I ”drummer”, ”fencing”

I series of clips, not sufficiently diverse

I problem: generalization

Bad Concepts

I ”fence”, ”operation-room”

I not regularly used as a tag

”boat ship”

”drummer”

”fence”

TubeTagger Web Demo

User Interface

1. selected a concept

2. or enter a query

3. switch between video or keyframe level

TubeTagger Web Demo

Deep Tagging

I frame-accurate concept detection beyond coarse tags

I find concept (e.g. ”Christmas Tree”) within a video clip

TubeTagger Web Demo

Text-based Search

I matches queries to concepts

I e.g. ”diving” → ”underwater”, ”shipwreck”, ”fish”, . . .

Summary

TubeTagger - YouTube-based Concept Detection

I utilize YouTube videos as training source for visualconcept detector learning

I utilize YouTube tags for semantic model generationI web demo providing:

I deep taggingI tag recommendationI text-based search

questions?

project site: www.dfki.de/moonvid

web demo at: http://madm.dfki.de/demo/tubetagger/

TubeTagger – YouTube-based Concept Detection

Technology

Transcript of TubeTagger – YouTube-based Concept Detection