TubeTagger – YouTube-based Concept Detection

TubeTagger – YouTube-based Concept Detection. Adrian Ulges, Markus Koch, Damian Borth, Thomas M. Breuel. German Research Center for Artificial Intelligence (DFKI) & University of Kaiserslautern. December 6, 2009.

description

Talk at the IEEE ICDM Workshop on Internet Multimedia Mining in Miami, USA.
Link to original publication: http://madm.dfki.de/publication&pubid=4492

Transcript of TubeTagger – YouTube-based Concept Detection

Page 1: TubeTagger – YouTube-based Concept Detection

TubeTagger – YouTube-based Concept Detection

Adrian Ulges, Markus Koch, Damian Borth, Thomas M. Breuel

German Research Center for Artificial Intelligence (DFKI) & University of Kaiserslautern

December 6, 2009

D.Borth: : TubeTagger 1 December 6 2009

Page 2: TubeTagger – YouTube-based Concept Detection

Outline

Motivation

TubeTagger System

TubeTagger Web Demo

Summary


Page 3: TubeTagger – YouTube-based Concept Detection

Data, data, data

"...TV, video on demand, Internet video, and P2P video will account for over 91 percent of global consumer traffic by 2013..."
("Cisco Visual Networking Index", 2009)

"...more than 20 hours of video uploaded to YouTube every minute, 1 billion views per day..."
(2009)

Increasing importance of video live streams: Obama’s inauguration, Michael Jackson’s memorial service, music video debuts...
(2009)

Page 10: TubeTagger – YouTube-based Concept Detection

Content-based Video Retrieval

How to search in large video databases?

Example concepts (figure labels): interview, soccer, beach

- Video Concept Detection [MediaMill, Columbia, IBM, ...]
  → as key building block of CBVR
- Supervised Machine Learning
  → need labeled training data

Page 11: TubeTagger – YouTube-based Concept Detection

Training Data

State-of-the-art Datasets

- TRECVID
- LSCOM

Acquisition

- collaborative [Ayache07]
- precise & high quality
- time-consuming effort
- vocabulary size does not scale
  → from hundreds to thousands of concepts [Hauptmann07]
- missing new emerging concepts of interest
  → e.g. "Michael Jackson", "Obama", "iPod" [1]

[1] Top-ranked searches in 2009-Q3 by "Google Insights for Search" for web search, news search and product search, respectively.

Page 13: TubeTagger – YouTube-based Concept Detection

Web-video as training data

Pros

- available at large scale
- high variability of data
- enriched with
  - tags
  - comments
  - ratings
- could allow automatic concept learning

Cons

- weakly labeled
  - incomplete / subjective
  - coarse
- focus of interest
- domain change
- video portal as black box

Page 16: TubeTagger – YouTube-based Concept Detection

TubeTagger System Overview

Notation

- t ∈ T := tags (≈ concepts)
- x ∈ X_db := keyframe of a video
- P(t|x) := probability of tag t for keyframe x

Page 20: TubeTagger – YouTube-based Concept Detection

TubeTagger System Overview

Training Data Acquisition

- use YouTube as primary training data source
- use tags as weak labels
  - pos. = tagged
  - neg. = all non-tagged

Learning Pipelines

- Visual Concept Learning
- Semantic Concept Learning
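
The weak-labeling rule from the training data acquisition step (pos. = tagged, neg. = all non-tagged) can be sketched as follows. This is only an illustration: the function name `weak_label_split`, the dictionary layout and the video ids are assumptions, not part of the TubeTagger system.

```python
from typing import Dict, List, Set, Tuple


def weak_label_split(
    video_tags: Dict[str, Set[str]], concept: str
) -> Tuple[List[str], List[str]]:
    """Split video ids into weak positives and negatives for one concept.

    Positives are all videos whose tag set contains the concept,
    negatives are all remaining (non-tagged) videos.
    """
    positives = sorted(v for v, tags in video_tags.items() if concept in tags)
    negatives = sorted(v for v, tags in video_tags.items() if concept not in tags)
    return positives, negatives


# Hypothetical toy metadata: video id -> set of user tags.
video_tags = {
    "vid_a": {"soccer", "goal"},
    "vid_b": {"beach", "holiday"},
    "vid_c": {"soccer", "beach"},
}

pos, neg = weak_label_split(video_tags, "soccer")
print(pos, neg)
```

Because the labels come from user tags, both sets are noisy: a video tagged "soccer" may contain many keyframes that do not show soccer, and an untagged video may still show it.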

Page 21: TubeTagger – YouTube-based Concept Detection

Visual Concept Learning

Features

- keyframe extraction → x ∈ X_db
- bag-of-visual-words descriptors [Sivic06]
  - SIFT [Lowe99]
  - SURF [Bay06]

Statistical Models

- SVM [Scholkopf01]
- PAMIR [Grangier08, Paredes09]
- Max. Entropy [Deselaers05]

→ output: P(t|x)
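
As a rough illustration of the bag-of-visual-words step, the sketch below quantizes local descriptors of one keyframe against a codebook and builds a normalized histogram. It assumes the codebook is already given (in practice it is typically obtained by clustering descriptors); the toy 2-D descriptors stand in for real SIFT or SURF vectors, and the function name is hypothetical.

```python
import numpy as np


def bovw_histogram(descriptors: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Map local descriptors (n x d) to a normalized visual-word histogram.

    Each descriptor is assigned to its nearest codeword (Euclidean distance),
    and the assignment counts are normalized to sum to one.
    """
    # Squared distances between every descriptor and every codeword: (n, k).
    diffs = descriptors[:, None, :] - codebook[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)
    nearest = np.argmin(dists, axis=1)
    hist = np.bincount(nearest, minlength=codebook.shape[0]).astype(float)
    total = hist.sum()
    return hist / total if total > 0 else hist


# Toy data (assumption): 2 codewords, 4 descriptors of dimension 2.
codebook = np.array([[0.0, 0.0], [10.0, 10.0]])
descriptors = np.array([[1.0, 0.0], [0.0, 1.0], [9.0, 10.0], [0.5, 0.5]])

hist = bovw_histogram(descriptors, codebook)
print(hist)
```

The resulting histogram is the kind of fixed-length feature vector that a statistical model such as an SVM can consume to produce a score for P(t|x).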

Page 25: TubeTagger – YouTube-based Concept Detection

Semantic Concept Learning

Query Formulation

- concepts ≠ textual queries
- mapping of text queries to concept vocabulary
  - "computer" → Monitor, Windows-Desktop, iPhone
  - "funny" → Muppets, Cats, Commercials
- learning of tag co-occurrences
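
One simple way to learn tag co-occurrences is to count, for every tag, which other tags appear on the same video. The sketch below does this for toy tag lists; the slides do not specify how co-occurrences are actually learned, so the counting scheme, the function name and the example data are all assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List


def cooccurrence_histograms(tag_lists: Iterable[List[str]]) -> Dict[str, Counter]:
    """For each tag, count how often every other tag appears on the same video."""
    hists: Dict[str, Counter] = defaultdict(Counter)
    for tags in tag_lists:
        unique = set(tags)  # ignore repeated tags within one video
        for tag in unique:
            for other in unique:
                if other != tag:
                    hists[tag][other] += 1
    return dict(hists)


# Hypothetical tag lists, one per video.
tag_lists = [
    ["diving", "underwater", "fish"],
    ["diving", "shipwreck"],
    ["soccer", "goal"],
]

hists = cooccurrence_histograms(tag_lists)
print(hists["diving"])
```

With such counts, a free-text query like "diving" can be related to concepts in the vocabulary ("underwater", "shipwreck", "fish") even when the query word itself is not a trained concept.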

Page 27: TubeTagger – YouTube-based Concept Detection

Semantic Concept Learning

Feature

- bag-of-words features
  - t ∈ T → h_t
  - q := h_q
- mapping to concepts as weights
  - w(q, t) := ⟨h_t, h_q⟩

Approach

- T_5 := the five concepts t ∈ T with the highest weights w(q, t)
- final fusion

  P(q|x) = \sum_{t \in T_5} \frac{w(q,t)}{\sum_{t' \in T_5} w(q,t')} \, P(t|x)
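
The fusion rule can be sketched in a few lines of NumPy: compute w(q, t) as an inner product of bag-of-words vectors, keep the top-k concepts, normalize their weights and combine the detector outputs P(t|x). The vectors and detector scores below are toy values, the function name is hypothetical, and the example uses k = 2 instead of 5 only to keep the numbers small.

```python
import numpy as np


def fuse_query_score(
    h_q: np.ndarray, H_t: np.ndarray, p_t_given_x: np.ndarray, k: int = 5
) -> float:
    """Score a keyframe for a text query via its top-k related concepts.

    h_q:          bag-of-words vector of the query, shape (d,)
    H_t:          bag-of-words vectors of all concepts, shape (n_concepts, d)
    p_t_given_x:  detector outputs P(t|x) for one keyframe, shape (n_concepts,)
    """
    w = H_t @ h_q                     # w(q, t) = <h_t, h_q> for every concept
    top = np.argsort(w)[::-1][:k]     # indices of the k highest weights (T_k)
    w_top = w[top]
    total = w_top.sum()
    if total == 0:                    # query unrelated to every concept
        return 0.0
    return float(np.dot(w_top / total, p_t_given_x[top]))


# Toy example (assumption): 3 concepts, vocabulary of 3 words.
H_t = np.array([[1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0],
                [1.0, 1.0, 0.0]])
h_q = np.array([2.0, 1.0, 0.0])
p_t_given_x = np.array([0.8, 0.1, 0.4])

score = fuse_query_score(h_q, H_t, p_t_given_x, k=2)
print(score)
```

Here the weights are w = (2, 1, 3), so the two selected concepts are the third and the first, and the fused score is (3 · 0.4 + 2 · 0.8) / 5 = 0.56.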

Page 29: TubeTagger – YouTube-based Concept Detection

Experiments

Dataset

- 1200 hrs. (= 750k keyframes)
- 233 concepts
- per concept:
  - 150 videos for training
  - 50 videos for testing

[Figure: example keyframes for "soccer" and "traffic"]

Clips/Concept Evaluation

- 10 concepts
- trained on N clips
- tested on 200k keyframes
- saturation at 100-150 clips/concept for SURF+PAMIR

Page 32: TubeTagger – YouTube-based Concept Detection

Experiments

Feature & Classifier Evaluation

- 81 concepts
- video-level testing (avg. fusion)
- results:
  - MAP: 22.4% (SVMs with SIFT in the table below)
  - MAP: 15.4% (PAMIR with SURF in the table below; 6 times faster)

MAP (%) by model (rows) and feature (columns):

model     SURF    SIFT
SVMs      20.4    22.4
PAMIR     15.4    18.4
MAXENT    14.1    15.5

Performance Distribution

[Figure: performance distribution, showing 78 representative concepts out of 233; labeled examples: restaurant, origami, operation-room, soccer, bus]
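
To make the evaluation terms concrete, the sketch below averages keyframe scores into one video-level score (avg. fusion) and computes average precision for a ranked list of test videos; MAP is then the mean of this value over all concepts. The slides do not state which AP variant was used, so the standard non-interpolated definition is assumed here, and all scores and labels are toy values.

```python
import numpy as np


def video_score(keyframe_scores) -> float:
    """Average fusion: a video's score is the mean of its keyframe scores."""
    return float(np.mean(keyframe_scores))


def average_precision(scores, relevant) -> float:
    """Non-interpolated average precision of a ranking induced by `scores`."""
    order = np.argsort(scores)[::-1]              # rank videos by score, best first
    rel = np.asarray(relevant, dtype=bool)[order]
    if rel.sum() == 0:
        return 0.0
    hits = np.cumsum(rel)                         # relevant videos seen so far
    ranks = np.arange(1, len(rel) + 1)
    precision_at_hits = (hits / ranks)[rel]       # precision at each relevant rank
    return float(precision_at_hits.mean())


# Hypothetical keyframe scores P(t|x) for four test videos of one concept.
keyframes = {
    "v1": [0.9, 0.9],
    "v2": [0.7, 0.9],
    "v3": [0.2, 0.4],
    "v4": [0.1, 0.1],
}
relevant = [1, 0, 1, 0]  # ground truth for v1..v4

scores = [video_score(keyframes[v]) for v in ("v1", "v2", "v3", "v4")]
ap = average_precision(scores, relevant)
print(ap)
```

In this toy ranking the relevant videos sit at ranks 1 and 3, giving AP = (1/1 + 2/3) / 2 ≈ 0.83.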

Page 34: TubeTagger – YouTube-based Concept Detection

Insights in Web-based Concept Detection

Good Concepts

- "boat/ship", "pyramids"
- broad community of YouTube users
- often "interesting", "spectacular"

Redundant Concepts

- "drummer", "fencing"
- series of clips, not sufficiently diverse
- problem: generalization

Bad Concepts

- "fence", "operation-room"
- not regularly used as a tag

[Figure: example keyframes for "boat ship", "drummer", "fence"]

Page 35: TubeTagger – YouTube-based Concept Detection

TubeTagger Web Demo

User Interface

1. select a concept

2. or enter a query

3. switch between video and keyframe level

Page 36: TubeTagger – YouTube-based Concept Detection

TubeTagger Web Demo

Deep Tagging

- frame-accurate concept detection beyond coarse tags
- find concept (e.g. "Christmas Tree") within a video clip
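
Deep tagging can be pictured as locating the parts of a clip where the keyframe scores P(t|x) are high. The sketch below groups consecutive keyframes above a threshold into time segments. The fixed threshold, the function name and the sample scores are assumptions for illustration; the slides do not describe how the demo selects frames.

```python
from typing import List, Sequence, Tuple


def detect_segments(
    timestamps: Sequence[float], scores: Sequence[float], threshold: float = 0.5
) -> List[Tuple[float, float]]:
    """Return (start, end) times of runs of keyframes scoring >= threshold."""
    segments: List[Tuple[float, float]] = []
    start = None
    last_hit = None
    for ts, score in zip(timestamps, scores):
        if score >= threshold:
            if start is None:
                start = ts            # a new run begins at this keyframe
            last_hit = ts
        elif start is not None:
            segments.append((start, last_hit))
            start = None
    if start is not None:             # close a run that reaches the clip end
        segments.append((start, last_hit))
    return segments


# Hypothetical keyframe times (seconds) and scores for one concept.
timestamps = [0, 5, 10, 15, 20, 25]
scores = [0.1, 0.7, 0.8, 0.2, 0.9, 0.3]

segments = detect_segments(timestamps, scores)
print(segments)
```

For these values the concept is found from second 5 to second 10 and again at second 20, which is the kind of within-clip localization that a single video-level tag cannot express.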

Page 37: TubeTagger – YouTube-based Concept Detection

TubeTagger Web Demo

Text-based Search

- matches queries to concepts
- e.g. "diving" → "underwater", "shipwreck", "fish", ...

Page 38: TubeTagger – YouTube-based Concept Detection

Summary

TubeTagger - YouTube-based Concept Detection

- utilize YouTube videos as training source for visual concept detector learning
- utilize YouTube tags for semantic model generation
- web demo providing:
  - deep tagging
  - tag recommendation
  - text-based search

Page 39: TubeTagger – YouTube-based Concept Detection

questions?

project site: www.dfki.de/moonvid

web demo at: http://madm.dfki.de/demo/tubetagger/
