Relevance Filtering meets Active Learning: Improving Web-based Concept Detectors

Damian Borth*, Adrian Ulges, Thomas M. Breuel

German Research Center for Artificial Intelligence (DFKI) & University of Kaiserslautern, Germany

March 29 2010

Outline

Introduction

Approach: Active Relevance Filtering

Experimental Results

Summary

Digital Video

”...about 24 hours of video is uploaded every minute, 1 billion views per day...”

, 2010

”...TV, video on demand, Internet video, and P2P video will account for over 91 percent of global consumer traffic by 2013...”

, 2009

Information Overload vs. Video Retrieval

- high demand for automatic machine indexing

- one solution: concept detection [Snoek09], [Smeaton09], ...

→ as key building block of content-based video retrieval (CBVR)

Concept Detection - Framework

- unknown video shot $X$

- concept vocabulary $t_1, \dots, t_n$

- statistical model estimating concept presence $P(t_i \mid X)$

- expert labels are used as training data

- time-consuming effort [Ayache07]

→ datasets are limited in vocabulary size [Hauptmann07], prone to overfitting [Yang08], and narrow in flexibility

- proposal: web video as training source [Ulges07]

- use tags as class labels

- allows autonomous concept learning

- label noise problem: tags are subjective and coarse

- relevance filtering: adapt concept learning to noisy labels and perform label refinement

Relevance Filtering Approaches

[Diagram: weak labels → automatic relevance filtering, combined with manual annotation via active learning (together: active relevance filtering) → filtered labels]

Relevance Filtering

- automatic relevance filtering

- active learning

- combination of both → active relevance filtering

Automatic Relevance Filtering

- take label noise into account during model training

- identify false positives and filter them

Related Work

- joint probabilities of tags and content [Bernard03], [Feng04]

- neighbor voting [Snoek09]

- sample reweighting according to inferred relevance [Ulges08]

Approach [Ulges10]

- training data: $X = \{x_1, \dots, x_n\}$

- training labels: $\tilde{Y} = \{\tilde{y}_1, \dots, \tilde{y}_n\}$ (known)

- true labels: $Y = \{y_1, \dots, y_n\}$ (unknown)
  - $\tilde{y}_i = -1 \rightarrow y_i = -1$
  - $\tilde{y}_i = 1 \rightarrow y_i \in \{1, -1\}$ (true positive or false positive)

- statistical model: kernel densities

- infer $y_i$ by estimating relevance scores $\beta_i = P(y_i \mid x_i, \tilde{y}_i = 1)$

- fitted by EM

- model extension: $\phi(X, \tilde{Y}) \rightarrow \phi(X, \tilde{Y}, \beta)$
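
The relevance-score idea above can be illustrated with a small sketch. It assumes Gaussian kernel densities with a shared bandwidth, a fixed prior equal to the expected label precision, leave-one-out density estimates, and an EM-style alternating update; all function and parameter names are hypothetical, and this is not necessarily the exact procedure of [Ulges10].

```python
import math

def gauss_kernel(a, b, bandwidth):
    """Unnormalised isotropic Gaussian kernel between two feature vectors."""
    sq_dist = sum((u - v) ** 2 for u, v in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def estimate_relevance(weak_pos, neg, prior=0.2, bandwidth=1.0, iters=20):
    """Estimate a relevance score in [0, 1] for each weakly positive sample.

    Assumption: the positive density is built from weak positives weighted
    by beta, the negative density from negatives plus weak positives
    weighted by (1 - beta). The shared bandwidth makes the kernel
    normalisation constant cancel in the posterior ratio.
    """
    n = len(weak_pos)
    beta = [prior] * n
    k_pp = [[gauss_kernel(a, b, bandwidth) for b in weak_pos] for a in weak_pos]
    k_pn = [[gauss_kernel(a, b, bandwidth) for b in neg] for a in weak_pos]
    for _ in range(iters):
        new_beta = []
        for i in range(n):
            others = [j for j in range(n) if j != i]  # leave-one-out
            pos_mass = sum(beta[j] for j in others)
            neg_mass = len(neg) + sum(1.0 - beta[j] for j in others)
            p_pos = sum(beta[j] * k_pp[i][j] for j in others) / max(pos_mass, 1e-12)
            p_neg = (sum(k_pn[i])
                     + sum((1.0 - beta[j]) * k_pp[i][j] for j in others)) / max(neg_mass, 1e-12)
            num = prior * p_pos
            den = num + (1.0 - prior) * p_neg
            # Fall back to the prior if both densities underflow to zero.
            new_beta.append(num / den if den > 0 else prior)
        beta = new_beta
    return beta

# Toy data: three weak positives form their own cluster, two sit among the negatives.
demo_weak_pos = [[5.0, 5.0], [5.2, 4.9], [4.8, 5.1], [0.1, 0.0], [0.0, 0.2]]
demo_neg = [[0.0, 0.0], [0.2, 0.1], [-0.1, 0.1], [0.1, -0.2]]
demo_beta = estimate_relevance(demo_weak_pos, demo_neg, prior=0.5)
```

In this toy setting the clustered weak positives receive high scores, while the two samples that look like negatives are pushed toward zero.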

Active Learning

- select informative samples for manual labeling

Related Work

- text classification [Lewis94], [Tong02], ...

- image retrieval [Tong01], [Chang05], ...

- video data labeling [Ayache07], [Hua08], ...

Sample Selection Methods

1. most relevant sampling

2. uncertainty sampling

3. most relevant sampling + density weighted repulsion (DWR)
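
A rough sketch of the selection strategies listed above may help. The first two functions follow the usual definitions of most relevant and uncertainty sampling. The slides name DWR but do not define it, so the third function is only a hypothetical stand-in: it applies a greedy repulsion penalty near already-picked samples and omits any density weighting. All names and the `bandwidth` parameter are assumptions.

```python
import math

def most_relevant(scores, k):
    """Indices of the k samples with the highest relevance score."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])[:k]

def uncertainty(scores, k):
    """Indices of the k samples whose score is closest to 0.5."""
    return sorted(range(len(scores)), key=lambda i: abs(scores[i] - 0.5))[:k]

def relevant_with_repulsion(scores, features, k, bandwidth=1.0):
    """Greedy pick of high-scoring samples, discounted near earlier picks.

    Hypothetical illustration only, not the DWR definition from the talk.
    """
    picked = []
    candidates = set(range(len(scores)))

    def similarity(i, j):
        sq_dist = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
        return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

    while candidates and len(picked) < k:
        def value(i):
            # Penalty is the similarity to the closest already-picked sample.
            penalty = max((similarity(i, j) for j in picked), default=0.0)
            return scores[i] * (1.0 - penalty)
        best = max(sorted(candidates), key=value)
        picked.append(best)
        candidates.remove(best)
    return picked

demo_scores = [0.9, 0.85, 0.5, 0.1, 0.8]
demo_features = [[0.0, 0.0], [0.1, 0.0], [3.0, 3.0], [5.0, 5.0], [4.0, 0.0]]
demo_top = most_relevant(demo_scores, 2)
demo_unc = uncertainty(demo_scores, 2)
demo_rep = relevant_with_repulsion(demo_scores, demo_features, 2)
```

On the toy data, plain most relevant sampling picks two near-duplicate samples, whereas the repulsion variant skips the near-duplicate in favour of a distant high-scoring sample.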

Active Learning

- pool-based active learning

- selects samples to label according to the model

- each newly labeled sample helps further selection

Our Approach: Active Relevance Filtering

- active learning + automatic relevance filtering

- selects samples to label according to the filtered model

- each newly labeled sample helps further filtering and selection
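
The loop described above could be sketched as follows. The relevance filter (`refit`), the selection rule (`select`) and the annotator (`oracle`) are passed in as callables; these names, and the toy implementations used for the demo, are assumptions for illustration rather than the components used in the talk.

```python
def active_relevance_filtering(pool, oracle, refit, select, budget):
    """Pool-based loop: refit scores, query one sample, store its label, repeat."""
    known = {}                    # index -> manually confirmed label (1 or 0)
    scores = refit(pool, known)   # initial automatic filtering, no manual labels
    for _ in range(budget):
        unlabeled = [i for i in range(len(pool)) if i not in known]
        if not unlabeled:
            break
        query = select(scores, unlabeled)
        known[query] = oracle(query)
        scores = refit(pool, known)  # the new label feeds back into filtering
    return scores, known

# Toy components (hypothetical): 1-D features, nearest confirmed label within
# distance 1 pulls a score toward that label, otherwise the score stays at 0.5.
demo_pool = [0.0, 0.1, 0.2, 5.0, 5.1]
demo_truth = [0, 0, 0, 1, 1]

def demo_refit(pool, known):
    scores = []
    for i, x in enumerate(pool):
        if i in known:
            scores.append(float(known[i]))
            continue
        near = [j for j in known if abs(pool[j] - x) <= 1.0]
        if not near:
            scores.append(0.5)
        else:
            j = min(near, key=lambda j: abs(pool[j] - x))
            scores.append(0.9 if known[j] == 1 else 0.1)
    return scores

def demo_select(scores, unlabeled):
    # Uncertainty sampling: the unlabeled sample closest to 0.5.
    return min(unlabeled, key=lambda i: abs(scores[i] - 0.5))

demo_scores, demo_known = active_relevance_filtering(
    demo_pool, lambda i: demo_truth[i], demo_refit, demo_select, budget=2)
```

With a budget of two queries, the toy run labels one sample from each group and the refit step propagates those labels to their neighbours.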

Experiments

YouTube-22Concepts-Dataset

- 100 videos per concept

- keyframes extracted

- features:
  - SIFT [Lowe99]
  - visual words [Sivic03]

- example concepts shown: "swimming", "cats"

- subset of 10 concepts

- trained on:
  - 500 noisy pos. samples
  - 1000 neg. samples

- tested on:
  - 500 pos. samples
  - 1500 neg. samples

Noisy Pos. Samples

- label precision of web video: 20-50% [Ulges10]

- for these experiments: 20%

- 500 noisy pos. samples:
  - 100 true pos. samples
  - 400 false pos. samples
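
As a minimal sketch of how such a noisy positive set could be assembled at a fixed label precision: the helper below draws true and false positives in the stated 100/400 split. The function name, the sample identifiers and the sampling procedure are assumptions; the slides only state the resulting counts.

```python
import random

def make_noisy_positives(true_pos, false_pos, n_total=500, precision=0.2, seed=0):
    """Return (sample, true_label) pairs that all carry a positive weak label."""
    rng = random.Random(seed)
    n_true = round(n_total * precision)   # 100 for the setup on the slide
    n_false = n_total - n_true            # 400 for the setup on the slide
    noisy = [(x, 1) for x in rng.sample(true_pos, n_true)]
    noisy += [(x, -1) for x in rng.sample(false_pos, n_false)]
    rng.shuffle(noisy)
    return noisy

# Hypothetical keyframe identifiers, for illustration only.
demo_true = [f"concept_{i}" for i in range(300)]
demo_false = [f"offtopic_{i}" for i in range(1000)]
demo_set = make_noisy_positives(demo_true, demo_false)
```

The true label is kept alongside each sample only so that filtering quality can be evaluated afterwards; the learner would see every sample as positive.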

Experiments - Impact of Label Noise

Relevance Filtering

[Figure: comparison of three settings: no relevance filtering, automatic relevance filtering, ground truth]

System Performance

- Mean Average Precision (MAP)

- automatic relevance filtering helps

- a potential gap for improvement remains

system                       MAP
noisy data                   0.455
auto. relevance filtering    0.482
ground truth                 0.557
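
For readers unfamiliar with the metric, MAP can be computed roughly as below. This sketch assumes the common non-interpolated average precision over a full ranking per concept; the slides do not state which variant was used.

```python
def average_precision(ranked_labels):
    """Non-interpolated AP for one ranking; labels are 1 (relevant) or 0."""
    hits = 0
    total = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label == 1:
            hits += 1
            total += hits / rank   # precision at this relevant position
    return total / hits if hits else 0.0

def mean_average_precision(rankings):
    """Mean of the per-concept average precision values."""
    return sum(average_precision(r) for r in rankings) / len(rankings)

# Two toy concepts, each with test keyframes ranked by detector score.
demo_rankings = [[1, 0, 1, 0], [0, 1]]
demo_map = mean_average_precision(demo_rankings)
```

Each ranking lists the test keyframes of one concept in descending detector score, so a detector that places relevant keyframes early obtains a higher value.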

Experiments - Relevance Filtering

[Plot "Active Learning": x-axis: labeled samples (0 to 500); curves: DWR, random, most relevant, uncertainty; reference lines: ground truth labels, automatic relevance filtering, no relevance filtering]

[Plot "Active Relevance Filtering": x-axis: labeled samples (0 to 500); curves: DWR, most relevant, uncertainty, random; reference line: ground truth labels]

Active Learning

- active learning can outperform random selection

Active Rel. Filtering

- initial automatic relevance filtering helps

- improves active learning further

Direct Comparison

- DWR sampling approach (AL: active learning, ARF: active relevance filtering)

# refined samples    AL       ARF
0                    0.455    0.482
50                   0.474    0.541
250                  0.525    0.557
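
One way to read these numbers is as the share of the gap between noisy training (MAP 0.455) and ground-truth training (MAP 0.557) that each method closes. The short calculation below assumes both reference values apply to the same setup as this table; the helper name is hypothetical.

```python
MAP_NOISY = 0.455         # no relevance filtering
MAP_GROUND_TRUTH = 0.557  # training on ground-truth labels

def gap_closed(map_value, lower=MAP_NOISY, upper=MAP_GROUND_TRUTH):
    """Fraction of the noisy-to-ground-truth gap recovered by a system."""
    return (map_value - lower) / (upper - lower)

# (method, number of refined samples) -> MAP, copied from the table.
results = {
    ("AL", 0): 0.455, ("ARF", 0): 0.482,
    ("AL", 50): 0.474, ("ARF", 50): 0.541,
    ("AL", 250): 0.525, ("ARF", 250): 0.557,
}
closed = {key: gap_closed(value) for key, value in results.items()}
```

Under this reading, active relevance filtering with 50 refined samples closes about 84% of the gap, against about 19% for plain active learning, and reaches the ground-truth level at 250 samples.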

Experiments - Top Ranked Keyframes

[Figure: top-ranked keyframes for the concepts "basketball" and "eiffeltower" under a) no relevance filtering, b) automatic relevance filtering, c) active relevance filtering]

Discussion

Contributions

- concept learning from noisy (= weakly labeled) web video

- evaluation of different refinement approaches

- proposed approach: active relevance filtering

- automatic relevance filtering helps but is limited

- active learning outperforms random selection

- active relevance filtering improves active learning further

- automatic relevance filtering + active learning

Questions?
