Relevance Filtering meets Active Learning: Improving Web-based Concept Detectors

Damian Borth*, Adrian Ulges, Thomas M. Breuel. German Research Center for Artificial Intelligence (DFKI) & University of Kaiserslautern, Germany. March 29, 2010.

description

Talk at the ACM Int. Conference on Multimedia Information Retrieval (MIR) in Philadelphia, USA. Link to original publication: http://madm.dfki.de/publication&pubid=4535

Transcript of Relevance Filtering meets Active Learning: Improving Web-based Concept Detectors

Page 1: Relevance Filtering meets Active Learning:  Improving Web-based Concept Detectors

Relevance Filtering meets Active Learning: Improving Web-based Concept Detectors

Damian Borth*, Adrian Ulges, Thomas M. Breuel

German Research Center for Artificial Intelligence (DFKI) & University of Kaiserslautern, Germany

March 29 2010

D.Borth: : Relevance Filtering meets Active Learning 1 March 29 2010

Page 2: Relevance Filtering meets Active Learning:  Improving Web-based Concept Detectors

Outline

Introduction

Approach: Active Relevance Filtering

Experimental Results

Summary

Page 3: Relevance Filtering meets Active Learning:  Improving Web-based Concept Detectors

Digital Video

"...about 24 hours of video is uploaded every minute, 1 billion views per day..." (2010)

"...TV, video on demand, Internet video, and P2P video will account for over 91 percent of global consumer traffic by 2013..." (2009)

Information Overload vs. Video Retrieval

- high demand for automatic machine indexing
- one solution: concept detection [Snoek09], [Smeaton09], ...
  → as key building block of content-based video retrieval (CBVR)

Page 8: Relevance Filtering meets Active Learning:  Improving Web-based Concept Detectors

Concept Detection - Framework

- unknown video shot $X$
- concept vocabulary $t_1, \dots, t_n$
- statistical model estimating concept presence $P(t_i \mid X)$

Page 9: Relevance Filtering meets Active Learning:  Improving Web-based Concept Detectors

Concept Detection - Framework

- expert labels are used as training data
- time-consuming effort [Ayache07]
  → datasets are limited in vocabulary size [Hauptmann07], prone to overfitting [Yang08], and limited in flexibility

Page 10: Relevance Filtering meets Active Learning:  Improving Web-based Concept Detectors

Concept Detection - Framework

- proposal: web video as training source [Ulges07]
- use tags as class labels
- allows autonomous concept learning

Page 11: Relevance Filtering meets Active Learning:  Improving Web-based Concept Detectors

Concept Detection - Framework

- label noise problem
  - subjective
  - coarse

Page 12: Relevance Filtering meets Active Learning:  Improving Web-based Concept Detectors

Concept Detection - Framework

- relevance filtering
  - adapt concept learning to noisy labels
  - perform label refinement

Page 13: Relevance Filtering meets Active Learning:  Improving Web-based Concept Detectors

Relevance Filtering Approaches

[Figure: relevance filtering approaches, from weak labels to filtered labels: automatic relevance filtering, manual annotation with active learning, and active relevance filtering (both combined)]

Relevance Filtering

- auto. relevance filtering
- active learning
- combination of both → active relevance filtering

Page 14: Relevance Filtering meets Active Learning:  Improving Web-based Concept Detectors

Automatic Relevance Filtering

Idea

- take label noise into account during model training
- identify false positives and filter them out

Related Work

- joint probabilities of tags and content [Bernard03], [Feng04]
- neighbor voting [Snoek09]
- sample reweighting according to inferred relevance [Ulges08]

Page 16: Relevance Filtering meets Active Learning:  Improving Web-based Concept Detectors

Automatic Relevance Filtering

Approach [Ulges10]

- training data: $X = \{x_1, \dots, x_n\}$
- training labels: $Y = \{y_1, \dots, y_n\}$ (known)
- true labels: $\tilde{Y} = \{\tilde{y}_1, \dots, \tilde{y}_n\}$ (unknown)
  - $y_i = -1 \rightarrow \tilde{y}_i = -1$
  - $y_i = 1 \rightarrow \tilde{y}_i \in \{1, -1\}$ (true pos. or false pos.)
- statistical model: kernel densities
  - infer $\tilde{y}_i$ by estimating relevance scores $\beta_i = P(\tilde{y}_i = 1 \mid x_i, y_i = 1)$
  - fitted by EM
- model extension: $\phi(X, Y) \rightarrow \phi(X, Y, \beta)$
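The estimation of relevance scores can be sketched roughly as follows. This is a minimal 1-D illustration, not the model from [Ulges10]: the function names, the Gaussian kernel, the bandwidth `h`, the fixed prior `alpha`, and the leave-one-out weighting are all assumptions made for the example.

```python
import math
import random


def gauss(u, h):
    """Gaussian kernel with bandwidth h."""
    return math.exp(-0.5 * (u / h) ** 2) / (h * math.sqrt(2.0 * math.pi))


def kde(x, points, weights, h):
    """Weighted kernel density estimate at x (0.0 if all weights vanish)."""
    total = sum(weights)
    if total <= 0.0:
        return 0.0
    return sum(w * gauss(x - p, h) for p, w in zip(points, weights)) / total


def relevance_scores(noisy_pos, neg, alpha=0.2, h=0.5, iters=20):
    """EM-style estimate of a relevance score for every weakly positive sample.

    alpha is an assumed, fixed prior that a weak positive is truly relevant.
    """
    beta = [alpha] * len(noisy_pos)
    # Negative labels are trusted, so the background density stays fixed.
    p0 = [kde(x, neg, [1.0] * len(neg), h) for x in noisy_pos]
    for _ in range(iters):
        updated = []
        for i, x in enumerate(noisy_pos):
            # M-step: concept density from the other samples, weighted by beta
            # (leave-one-out, so a sample cannot vouch for itself).
            weights = beta[:i] + [0.0] + beta[i + 1:]
            p1 = kde(x, noisy_pos, weights, h)
            # E-step: posterior that sample i is a true positive.
            num = alpha * p1
            den = num + (1.0 - alpha) * p0[i]
            updated.append(num / den if den > 0.0 else 0.0)
        beta = updated
    return beta


# Toy data: true positives cluster around 3.0, everything else around 0.0.
rng = random.Random(0)
true_pos = [rng.gauss(3.0, 0.5) for _ in range(10)]
false_pos = [rng.gauss(0.0, 0.5) for _ in range(40)]
neg = [rng.gauss(0.0, 0.5) for _ in range(100)]

beta = relevance_scores(true_pos + false_pos, neg)
mean_true = sum(beta[:10]) / 10
mean_false = sum(beta[10:]) / 40
print(f"mean score, true positives:  {mean_true:.2f}")
print(f"mean score, false positives: {mean_false:.2f}")
```

Weak positives that look like the trusted negatives are pushed toward a score of 0, while samples in regions the negatives do not cover keep a high score. Holding `alpha` fixed is a simplification chosen here to keep the toy mixture well-behaved.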

Page 18: Relevance Filtering meets Active Learning:  Improving Web-based Concept Detectors

Active Learning

Idea

- select informative samples for manual labeling

Related Work

- text classification [Lewis94], [Tong02], ...
- image retrieval [Tong01], [Chang05], ...
- video data labeling [Ayache07], [Hua08], ...

Sample Selection Methods

1. most relevant sampling

2. uncertainty sampling

3. most relevant sampling + density weighted repulsion (DWR)
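The three selection methods can be sketched as below. The first two follow their usual definitions. The slides only name density weighted repulsion (DWR) and give no formula, so the `dwr` function is one hypothetical reading (relevance times local density, damped by similarity to samples already selected) and should not be taken as the authors' definition. All function names and the 1-D features are illustrative.

```python
import math


def most_relevant(scores, candidates):
    """Pick the candidate with the highest relevance score."""
    return max(candidates, key=lambda i: scores[i])


def most_uncertain(scores, candidates):
    """Pick the candidate whose score is closest to 0.5."""
    return min(candidates, key=lambda i: abs(scores[i] - 0.5))


def kernel(a, b, h=1.0):
    """Gaussian similarity between two 1-D features."""
    return math.exp(-0.5 * ((a - b) / h) ** 2)


def dwr(scores, features, candidates, selected, h=1.0):
    """Hypothetical density weighted repulsion (assumed form, not from the slides)."""
    def value(i):
        # Density: how typical the sample is within the candidate pool.
        density = sum(kernel(features[i], features[j], h) for j in candidates)
        density /= len(candidates)
        # Repulsion: penalize samples close to ones already selected.
        repulsion = sum(kernel(features[i], features[j], h) for j in selected)
        return scores[i] * density / (1.0 + repulsion)
    return max(candidates, key=value)


scores = [0.9, 0.85, 0.5, 0.1]
features = [0.0, 0.1, 2.0, 4.0]

first = most_relevant(scores, [0, 1, 2, 3])
uncertain = most_uncertain(scores, [0, 1, 2, 3])
# After sample 0 has been selected, its near-duplicate (sample 1) is repelled.
next_plain = most_relevant(scores, [1, 2, 3])
next_dwr = dwr(scores, features, [1, 2, 3], selected=[0])
print(first, uncertain, next_plain, next_dwr)
```

In this toy pool, plain most relevant sampling would follow sample 0 with its near-duplicate, while the repulsion term steers the next query to a different region.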

Page 21: Relevance Filtering meets Active Learning:  Improving Web-based Concept Detectors

Active Learning

- pool-based active learning
- selects samples for labeling according to the current model
- each newly labeled sample helps further selection
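A pool-based loop of this kind might look like the sketch below. The class-mean model is a toy stand-in for a concept detector, and `fit`, `score`, `active_learning`, and the `oracle` callback (representing the human annotator) are hypothetical names introduced for the example.

```python
def fit(labeled):
    """Toy model: class means of a 1-D feature."""
    pos = [x for x, y in labeled if y == 1]
    neg = [x for x, y in labeled if y == -1]
    return sum(pos) / len(pos), sum(neg) / len(neg)


def score(model, x):
    """Pseudo-probability that x is positive, from distances to the class means."""
    mu_pos, mu_neg = model
    d_pos, d_neg = abs(x - mu_pos), abs(x - mu_neg)
    if d_pos + d_neg == 0:
        return 0.5
    return d_neg / (d_pos + d_neg)


def active_learning(pool, oracle, seed, rounds):
    """Repeatedly retrain, query the most uncertain pool sample, and add its label."""
    labeled = list(seed)
    unlabeled = list(pool)
    for _ in range(rounds):
        if not unlabeled:
            break
        model = fit(labeled)
        pick = min(unlabeled, key=lambda x: abs(score(model, x) - 0.5))
        unlabeled.remove(pick)
        labeled.append((pick, oracle(pick)))  # manual label from the annotator
    return fit(labeled), labeled


seed = [(0.0, -1), (4.0, 1)]            # one labeled sample per class to start
pool = [0.5, 1.0, 1.8, 2.1, 3.0, 3.5]
model, labeled = active_learning(pool, lambda x: 1 if x > 2.0 else -1, seed, rounds=3)
print(labeled[2:])
```

Each query is chosen with the model trained on all labels gathered so far, which is why the first queries land near the class boundary.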

Page 22: Relevance Filtering meets Active Learning:  Improving Web-based Concept Detectors

Our Approach: Active Relevance Filtering

- active learning + auto. relevance filtering
- selects samples for labeling according to the filtered model
- each newly labeled sample helps further filtering & selection
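One way to picture the combined loop: manually labeled samples get their relevance score clamped to 0 or 1, and the automatic filtering is re-run around them before the next query. The sketch below is an assumption-laden toy (1-D features, unnormalized Gaussian kernels, fixed prior `alpha`, uncertainty sampling as the selection rule); none of these names or choices come from the slides.

```python
import math


def k(a, b, h=0.5):
    """Unnormalized Gaussian kernel; the constant cancels in the density ratio."""
    return math.exp(-0.5 * ((a - b) / h) ** 2)


def filter_scores(xs, neg, beta, fixed, alpha=0.2, iters=10):
    """Re-estimate relevance scores; samples in `fixed` keep their manual value."""
    p0 = [sum(k(x, n) for n in neg) / len(neg) for x in xs]
    for _ in range(iters):
        updated = []
        for i, x in enumerate(xs):
            if i in fixed:
                updated.append(fixed[i])
                continue
            # Concept density from all other samples, weighted by their scores.
            w = sum(b for j, b in enumerate(beta) if j != i)
            s = sum(b * k(x, xs[j]) for j, b in enumerate(beta) if j != i)
            p1 = s / w if w > 0.0 else 0.0
            den = alpha * p1 + (1.0 - alpha) * p0[i]
            updated.append(alpha * p1 / den if den > 0.0 else 0.0)
        beta = updated
    return beta


def active_relevance_filtering(xs, neg, oracle, budget):
    """Alternate automatic filtering with manual refinement of uncertain samples."""
    fixed = {}
    beta = filter_scores(xs, neg, [0.5] * len(xs), fixed)
    for _ in range(budget):
        candidates = [i for i in range(len(xs)) if i not in fixed]
        if not candidates:
            break
        pick = min(candidates, key=lambda i: abs(beta[i] - 0.5))
        fixed[pick] = 1.0 if oracle(xs[pick]) else 0.0  # manual label
        beta[pick] = fixed[pick]
        beta = filter_scores(xs, neg, beta, fixed)       # refilter with new label
    return beta, fixed


xs = [2.6, 2.8, 3.0, 3.2, 3.4, -0.6, -0.3, 0.0, 0.3, 0.6, 1.5]  # weak positives
neg = [-0.8, -0.4, 0.0, 0.4, 0.8]                                # trusted negatives
beta, fixed = active_relevance_filtering(xs, neg, lambda x: x > 2.0, budget=3)
print(sorted(fixed.items()))
```

The manual labels act as anchors: a sample confirmed as relevant raises the concept density around it, and a rejected sample stops contributing to it.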

Page 23: Relevance Filtering meets Active Learning:  Improving Web-based Concept Detectors

Experiments

YouTube-22Concepts-Dataset

- 100 videos per concept
- keyframes extracted
- features:
  - SIFT [Lowe99]
  - visual words [Sivic03]

[Figure: example keyframes for the concepts "swimming" and "cats"]

Setup

- subset of 10 concepts
- trained on:
  - 500 noisy pos. samples
  - 1000 neg. samples
- tested on:
  - 500 pos. samples
  - 1500 neg. samples

Noisy Pos. Samples

- label precision of web video: 20-50% [Ulges10]
- for these experiments: 20%
- 500 noisy pos. samples:
  - 100 true pos. samples
  - 400 false pos. samples
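The composition of the noisy positive set follows directly from the label precision: 500 × 0.20 = 100 true positives, leaving 400 false positives. A small sketch of building such a set with controlled noise is shown below; how the authors actually assembled theirs is not stated in the slides, and `make_noisy_positives` and the item names are hypothetical.

```python
import random


def make_noisy_positives(true_pos, distractors, n_total, precision, seed=0):
    """Mix true positives and distractors so that `precision` of the set is correct.

    Every returned item carries the weak label +1; the second tuple element is
    the hidden true label, kept only for evaluation.
    """
    rng = random.Random(seed)
    n_true = round(n_total * precision)
    n_false = n_total - n_true
    samples = [(x, 1) for x in rng.sample(true_pos, n_true)]
    samples += [(x, -1) for x in rng.sample(distractors, n_false)]
    rng.shuffle(samples)
    return samples


true_pos = [f"pos_{i}" for i in range(500)]
distractors = [f"other_{i}" for i in range(2000)]
noisy = make_noisy_positives(true_pos, distractors, n_total=500, precision=0.20)

n_true = sum(1 for _, y in noisy if y == 1)
print(n_true, len(noisy) - n_true)
```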

Page 27: Relevance Filtering meets Active Learning:  Improving Web-based Concept Detectors

Experiments - Impact of Label Noise

[Figure: bar chart "Relevance Filtering"; y-axis: mean avg. precision (0.30 to 0.60); bars: no relevance filtering, automatic relevance filtering, ground truth]

System Performance

- Mean Average Precision (MAP)
- auto. relevance filtering helps
- a gap for potential improvement remains

system                      MAP
noisy data                  0.455
auto. relevance filtering   0.482
ground truth                0.557
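For reference, MAP is the mean over concepts of the average precision of each concept's ranked result list. The sketch below uses the common non-interpolated definition; the slides do not say which variant was used for evaluation, so this is an assumption, and the ranked lists are made-up toy data.

```python
def average_precision(ranked_labels):
    """Non-interpolated AP of a ranked list of 0/1 relevance labels."""
    hits = 0
    total = 0.0
    for rank, relevant in enumerate(ranked_labels, start=1):
        if relevant:
            hits += 1
            total += hits / rank  # precision at this relevant item's rank
    return total / hits if hits else 0.0


def mean_average_precision(rankings):
    """Mean of the per-concept average precision values."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


# Two toy concepts: relevance of the returned keyframes, best-ranked first.
rankings = [[1, 0, 1, 0], [0, 1]]
ap_first = average_precision(rankings[0])
map_value = mean_average_precision(rankings)
print(f"AP = {ap_first:.3f}, MAP = {map_value:.3f}")
```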

Page 28: Relevance Filtering meets Active Learning:  Improving Web-based Concept Detectors

Experiments - Relevance Filtering

[Figure, two panels, "Active Learning" and "Active Relevance Filtering": mean avg. precision (y-axis, 0.46 to 0.54) vs. labeled samples (x-axis, 0 to 500) for the sample selection methods DWR, most relevant, uncertainty, and random; reference levels: ground truth labels, automatic relevance filtering, no relevance filtering]

Active Learning

- active learning can outperform random selection

Active Rel. Filtering

- initial auto. relevance filtering helps
- improves active learning further

Page 29: Relevance Filtering meets Active Learning:  Improving Web-based Concept Detectors

Experiments - Relevance Filtering

Direct Comparison

- DWR sampling approach

# refined samples   AL      ARF
0                   0.455   0.482
50                  0.474   0.541
250                 0.525   0.557
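One way to read these numbers is as the share of the gap between the noisy baseline (MAP 0.455) and ground truth labels (MAP 0.557) that each method closes. The slides do not report this measure; it is an illustrative calculation on the reported values, and `gap_closed` is a name introduced here.

```python
# MAP values from the comparison: refined samples -> (AL, ARF)
results = {0: (0.455, 0.482), 50: (0.474, 0.541), 250: (0.525, 0.557)}
noisy, ground_truth = 0.455, 0.557


def gap_closed(map_value):
    """Fraction of the noisy-to-ground-truth MAP gap that has been closed."""
    return (map_value - noisy) / (ground_truth - noisy)


for n, (al, arf) in sorted(results.items()):
    print(f"{n:>3} refined samples: AL {gap_closed(al):.0%}, ARF {gap_closed(arf):.0%}")
```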

Page 30: Relevance Filtering meets Active Learning:  Improving Web-based Concept Detectors

Experiments - Top Ranked Keyframes

concept: basketball

a) no relevance filtering, b) automatic relevance filtering, c) active relevance filtering

Page 33: Relevance Filtering meets Active Learning:  Improving Web-based Concept Detectors

Experiments - Top Ranked Keyframes

concept: eiffeltower

a) no relevance filtering, b) automatic relevance filtering, c) active relevance filtering

Page 36: Relevance Filtering meets Active Learning:  Improving Web-based Concept Detectors

Discussion

Contributions

- concept learning from noisy (= weakly labeled) web video
- evaluation of different refinement approaches
- proposed approach: active relevance filtering

Experimental Results

- automatic relevance filtering helps but is limited
- active learning outperforms random selection
- active relevance filtering is able to improve active learning
  - auto. relevance filtering + active learning

Page 38: Relevance Filtering meets Active Learning:  Improving Web-based Concept Detectors

Questions?
