TubeTagger – YouTube-based Concept Detection
Adrian Ulges, Markus Koch, Damian Borth, Thomas M. Breuel
German Research Center for Artificial Intelligence (DFKI) & University of Kaiserslautern
December 6, 2009
D.Borth: : TubeTagger 1 December 6 2009
Outline
Motivation
TubeTagger System
TubeTagger Web Demo
Summary
Data, data, data
"...TV, video on demand, Internet video, and P2P video will account for over 91 percent of global consumer traffic by 2013..."
"Cisco Visual Networking Index", 2009
"...more than 20 hours of video uploaded to YouTube every minute, 1 billion views per day..."
2009
Increasing importance of video live streams: Obama's inauguration, Michael Jackson's memorial service, music video debuts...
2009
Content-based Video Retrieval
How to search in large video databases?
Example concepts: interview, soccer, beach
- Video Concept Detection [MediaMill, Columbia, IBM, ...]
  → a key building block of CBVR
- Supervised Machine Learning
  → needs labeled training data
Training Data
State-of-the-art Datasets
- TRECVID
- LSCOM
Acquisition
- collaborative [Ayache07]
- precise & high quality
- time-consuming effort
- vocabulary size does not scale
  → from hundreds to thousands of concepts [Hauptmann07]
- missing newly emerging concepts of interest
  → e.g. "Michael Jackson", "Obama", "iPod" [1]
[1] Top-ranked searches in 2009-Q3 according to "Google Insights for Search" for web search, news search, and product search, respectively.
Web-video as training data
Pros
- available at large scale
- high variability of data
- enriched with
  - tags
  - comments
  - ratings
- could allow automatic concept learning
Cons
- weakly labeled
  - incomplete / subjective
  - coarse
- focus of interest
- domain change
- video portal as black box
TubeTagger System Overview
Notation
- t ∈ T := tags (≈ concepts)
- x ∈ X_db := keyframe of a video
- P(t|x) := probability of tag t for keyframe x
TubeTagger System Overview
Training Data Acquisition
- use YouTube as primary training data source
- use tags as weak labels
  - pos. = tagged
  - neg. = all non-tagged
Learning Pipelines
- Visual Concept Learning
- Semantic Concept Learning
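The weak-labeling rule above (positives are the videos carrying the tag, negatives are all others) can be sketched in a few lines. This is only an illustration: the function name `weak_labels` and the toy metadata are hypothetical, and the slides do not say how tags are actually fetched or stored.

```python
from typing import Dict, List, Set, Tuple


def weak_labels(video_tags: Dict[str, Set[str]], concept: str) -> Tuple[List[str], List[str]]:
    """Split video ids into weak positives (tagged) and negatives (all non-tagged)."""
    positives = sorted(v for v, tags in video_tags.items() if concept in tags)
    negatives = sorted(v for v, tags in video_tags.items() if concept not in tags)
    return positives, negatives


# Hypothetical toy metadata: video id -> set of user-supplied tags.
videos = {
    "v1": {"soccer", "goal"},
    "v2": {"beach", "holiday"},
    "v3": {"soccer", "interview"},
}
pos, neg = weak_labels(videos, "soccer")
print(pos, neg)
```

Because tags are incomplete and subjective, some of the "negatives" will in fact show the concept; that is the weak-label noise listed under Cons earlier in the talk.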
Visual Concept Learning
Features
- keyframe extraction → x ∈ X_db
- bag-of-visual-words descriptors [Sivic06]
  - SIFT [Lowe99]
  - SURF [Bay06]
Statistical Models
- SVM [Scholkopf01]
- PAMIR [Grangier08, Paredes09]
- Max. Entropy [Deselaers05]
→ output: P(t|x)
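As a rough illustration of the bag-of-visual-words step, the sketch below assigns each local descriptor of a keyframe to its nearest codebook entry and builds a normalized histogram. The codebook and descriptors are toy 2-D values chosen for readability (real SIFT and SURF descriptors typically have 128 and 64 dimensions), and how TubeTagger builds its codebook is not stated in the slides; a clustering step such as k-means is a common choice and is assumed here.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bovw_histogram(descriptors: List[Vector], codebook: List[Vector]) -> List[float]:
    """Assign each descriptor to its nearest visual word and
    return the L1-normalized word-count histogram."""
    counts = [0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)), key=lambda k: squared_distance(d, codebook[k]))
        counts[nearest] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)


# Toy codebook of three visual words and four local descriptors (assumed values).
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, -0.1), (0.9, 1.2), (1.1, 0.8), (4.0, 6.0)]
hist = bovw_histogram(descriptors, codebook)
print(hist)
```

The resulting fixed-length histogram is what a classifier such as an SVM would take as input to estimate P(t|x).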
Semantic Concept Learning
Query Formulation
- concepts ≠ textual queries
- mapping of text queries to concept vocabulary
  - "computer" → Monitor, Windows-Desktop, iPhone
  - "funny" → Muppets, Cats, Commercials
- learning of tag co-occurrences
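One simple way to learn tag co-occurrences is to count, for every tag, which other tags appear on the same video. The sketch below does exactly that; the function name and toy tag sets are hypothetical, and whether TubeTagger uses raw counts or some weighted variant is not stated in the slides.

```python
from collections import Counter, defaultdict
from itertools import permutations
from typing import Dict, Iterable, Set


def cooccurrence_profiles(tag_sets: Iterable[Set[str]]) -> Dict[str, Counter]:
    """For every tag, count how often each other tag appears on the same video."""
    profiles: Dict[str, Counter] = defaultdict(Counter)
    for tags in tag_sets:
        # Every ordered pair (a, b) of distinct tags on one video counts once.
        for a, b in permutations(sorted(tags), 2):
            profiles[a][b] += 1
    return profiles


# Hypothetical tag sets, one per video.
tag_sets = [
    {"underwater", "diving", "fish"},
    {"underwater", "diving", "shipwreck"},
    {"soccer", "goal"},
]
profiles = cooccurrence_profiles(tag_sets)
print(profiles["underwater"].most_common())
```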
Semantic Concept Learning
Feature
- bag-of-words features
  - t ∈ T → h_t
  - q := h_q
- mapping to concepts as weights
  - w(q, t) := ⟨h_t, h_q⟩
Approach
- T_5 := the five tags t ∈ T with the highest weights w(q, t)
- final fusion
  - P(q|x) = \sum_{t \in T_5} \frac{w(q,t)}{\sum_{t' \in T_5} w(q,t')} \, P(t|x)
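The mapping and fusion steps above can be sketched as follows: compute the inner product between the query vector and each concept vector, keep the five highest-weighted concepts, and combine their detector outputs with normalized weights. The sparse-dictionary representation, the function names, and all numbers are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

BagOfWords = Dict[str, float]


def dot(h_a: BagOfWords, h_b: BagOfWords) -> float:
    """Inner product <h_a, h_b> of two sparse bag-of-words vectors."""
    return sum(v * h_b.get(word, 0.0) for word, v in h_a.items())


def top_k_concepts(h_q: BagOfWords, concept_bows: Dict[str, BagOfWords],
                   k: int = 5) -> List[Tuple[str, float]]:
    """Return the k concepts t with the largest weights w(q, t) = <h_t, h_q>."""
    weights = [(t, dot(h_t, h_q)) for t, h_t in concept_bows.items()]
    return sorted(weights, key=lambda tw: tw[1], reverse=True)[:k]


def fuse(top: List[Tuple[str, float]], concept_probs: Dict[str, float]) -> float:
    """P(q|x): weighted average of P(t|x) over the selected concepts."""
    total = sum(w for _, w in top)
    if total == 0:
        return 0.0
    return sum(w / total * concept_probs[t] for t, w in top)


# Hypothetical concept vectors h_t and query vector h_q for the query "diving".
concept_bows = {
    "underwater": {"diving": 2.0, "fish": 1.0},
    "shipwreck": {"diving": 1.0, "wreck": 1.0},
    "soccer": {"goal": 1.0},
}
h_q = {"diving": 1.0}
top = top_k_concepts(h_q, concept_bows)

# Hypothetical detector outputs P(t|x) for one keyframe x.
probs = {"underwater": 0.9, "shipwreck": 0.3, "soccer": 0.1}
score = fuse(top, probs)
print(top, score)
```

Concepts with zero weight contribute nothing to the sum, so a query that matches fewer than five concepts is still handled without special cases.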
Experiments
Dataset
- 1200 hrs. (= 750k keyframes)
- 233 concepts
- per concept:
  - 150 videos for training
  - 50 videos for testing
[Figure: examples for "soccer" and "traffic"]
Clips/Concept Evaluation
- 10 concepts
- trained on N clips
- tested on 200k keyframes
- saturation at 100-150 clips/concept for SURF+PAMIR
Experiments
Feature & Classifier Evaluation
- 81 concepts
- video level testing (avg. fusion)
- results:
  - MAP: 22.4% (SVMs + SIFT)
  - MAP: 15.4% (PAMIR + SURF, 6 times faster)

MAP (%) by model and feature:
model     SURF   SIFT
SVMs      20.4   22.4
PAMIR     15.4   18.4
MAXENT    14.1   15.5

Performance Distribution
[Figure: per-concept performance distribution, showing 78 representative concepts out of 233; labeled concepts include restaurant, origami, operation-room, soccer, bus]
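The slides do not spell out the evaluation protocol, so the following is only a sketch under common assumptions: "avg. fusion" is taken to mean averaging the keyframe scores of each video, and average precision is computed over the full ranked list of test videos. MAP would then be the mean of these per-concept values. All names and numbers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def video_scores(keyframe_scores: List[Tuple[str, float]]) -> Dict[str, float]:
    """Average the keyframe scores P(t|x) of each video (average fusion)."""
    per_video: Dict[str, List[float]] = defaultdict(list)
    for video_id, score in keyframe_scores:
        per_video[video_id].append(score)
    return {vid: sum(s) / len(s) for vid, s in per_video.items()}


def average_precision(ranked_relevance: List[bool]) -> float:
    """Average precision of a complete ranking (True = relevant item)."""
    hits, precision_sum = 0, 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


# Hypothetical keyframe scores for one concept: (video id, P(t|x)).
keyframes = [("a", 0.9), ("a", 0.7), ("b", 0.2), ("b", 0.4), ("c", 0.6)]
scores = video_scores(keyframes)
ranking = sorted(scores, key=scores.get, reverse=True)

relevant = {"a", "b"}  # assumed ground truth for this concept
ap = average_precision([vid in relevant for vid in ranking])
print(ranking, ap)
```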
Insights in Web-based Concept Detection
Good Concepts
- "boat/ship", "pyramids"
- broad community of YouTube users
- often "interesting", "spectacular"
Redundant Concepts
- "drummer", "fencing"
- series of clips, not sufficiently diverse
- problem: generalization
Bad Concepts
- "fence", "operation-room"
- not regularly used as a tag
[Figure: examples for "boat/ship", "drummer", and "fence"]
TubeTagger Web Demo
User Interface
1. select a concept
2. or enter a query
3. switch between video and keyframe level
TubeTagger Web Demo
Deep Tagging
- frame-accurate concept detection beyond coarse tags
- find a concept (e.g. "Christmas Tree") within a video clip
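One plausible way to locate a concept inside a clip is to threshold the per-keyframe scores P(t|x) and merge consecutive hits into time segments. The sketch below illustrates that idea only; the threshold of 0.5, the function name, and the timestamps are assumptions, and the slides do not describe how the demo actually does it.

```python
from typing import List, Optional, Tuple


def concept_segments(scores: List[Tuple[float, float]],
                     threshold: float = 0.5) -> List[Tuple[float, float]]:
    """Group consecutive keyframes whose score reaches the threshold
    into (start_time, end_time) segments. Input is (time, P(t|x)) in time order."""
    segments: List[Tuple[float, float]] = []
    start: Optional[float] = None
    end: Optional[float] = None
    for time, p in scores:
        if p >= threshold:
            if start is None:
                start = time  # a new segment begins here
            end = time
        elif start is not None:
            segments.append((start, end))
            start = None
    if start is not None:
        segments.append((start, end))  # close a segment that runs to the end
    return segments


# Hypothetical keyframe scores for "Christmas Tree": (timestamp in seconds, P(t|x)).
scores = [(0.0, 0.1), (2.0, 0.7), (4.0, 0.8), (6.0, 0.2), (8.0, 0.9)]
segments = concept_segments(scores)
print(segments)
```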
TubeTagger Web Demo
Text-based Search
- matches queries to concepts
- e.g. "diving" → "underwater", "shipwreck", "fish", ...
Summary
TubeTagger - YouTube-based Concept Detection
- utilize YouTube videos as training source for visual concept detector learning
- utilize YouTube tags for semantic model generation
- web demo providing:
  - deep tagging
  - tag recommendation
  - text-based search
questions?
project site: www.dfki.de/moonvid
web demo at: http://madm.dfki.de/demo/tubetagger/