


Staple: Complementary Learners for Real-Time Tracking

Luca Bertinetto, Jack Valmadre, Stuart Golodetz, Ondrej Miksik, Philip Torr

Torr Vision Group, Department of Engineering Science, University of Oxford, UK

Model-free, short-term, single-target tracking

Problem: track any given object in a video.

- Model-free: the algorithm has to be agnostic to the class of the object.

- Short-term: full occlusions are not addressed, i.e. there is no re-detection logic.

- Single-target: there is no data association problem.

Main challenges

- Huge variety of scenes and transformations.

- Stability-plasticity dilemma.

Related work and motivation

Correlation filters

- Fast evaluation in the Fourier domain.

- Dense sampling of the target object and its surroundings.

- HOG features: robust to blur and illumination changes, but sensitive to deformation.
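The two speed-related points above (dense sampling and Fourier-domain evaluation) can be illustrated with a minimal 1D sketch. It is not the tracker's implementation: it uses a single-channel signal, a naive DFT written with the standard library, and made-up values for `template` and `signal`. It only shows that scoring every cyclic shift explicitly gives the same result as one element-wise product in the Fourier domain.

```python
import cmath

def dft(signal, inverse=False):
    """Naive O(n^2) discrete Fourier transform (a real tracker would use an FFT)."""
    n = len(signal)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(signal[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                    for t in range(n))
        out.append(total / n if inverse else total)
    return out

def circular_correlation_direct(template, signal):
    """Score every cyclic shift s explicitly: r[s] = sum_u template[u] * signal[(u + s) % n]."""
    n = len(signal)
    return [sum(template[u] * signal[(u + s) % n] for u in range(n))
            for s in range(n)]

def circular_correlation_fourier(template, signal):
    """Same scores from one element-wise product in the Fourier domain."""
    t_hat = dft(template)
    s_hat = dft(signal)
    product = [t.conjugate() * s for t, s in zip(t_hat, s_hat)]
    return [value.real for value in dft(product, inverse=True)]

# Illustrative values only: the pattern (1, 2) sits at offset 3 in the signal.
template = [1.0, 2.0, 0.0, 0.0, 0.0, 0.0]
signal = [0.0, 0.0, 0.0, 1.0, 2.0, 0.0]

direct = circular_correlation_direct(template, signal)
fourier = circular_correlation_fourier(template, signal)
best_shift = max(range(len(direct)), key=direct.__getitem__)
```

With an FFT in place of the naive `dft`, the Fourier route costs O(n log n) instead of the O(n^2) of the direct loop, which is where the speed advantage comes from.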

Colour histograms

- Fast evaluation with integral images.

- Dense sampling.

- Colour statistics with no concept of locality: sensitive to blur and illumination changes, but robust to deformation.
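As a rough illustration of why integral images make dense evaluation cheap, the sketch below builds a summed-area table over a small grid of per-pixel scores and reads off the mean score of every window with four lookups each. The function names and the `scores` values are hypothetical and chosen only for this example.

```python
def integral_image(image):
    """Return an (H+1) x (W+1) summed-area table with a zero first row and column."""
    height, width = len(image), len(image[0])
    table = [[0.0] * (width + 1) for _ in range(height + 1)]
    for r in range(height):
        row_sum = 0.0
        for c in range(width):
            row_sum += image[r][c]
            table[r + 1][c + 1] = table[r][c + 1] + row_sum
    return table

def box_sum(table, top, left, height, width):
    """Sum of the image over a height x width box, using four table lookups."""
    bottom, right = top + height, left + width
    return (table[bottom][right] - table[top][right]
            - table[bottom][left] + table[top][left])

def window_means(image, win_h, win_w):
    """Mean value inside every win_h x win_w window that fits in the image."""
    table = integral_image(image)
    area = win_h * win_w
    return [[box_sum(table, r, c, win_h, win_w) / area
             for c in range(len(image[0]) - win_w + 1)]
            for r in range(len(image) - win_h + 1)]

# Illustrative per-pixel scores (e.g. how object-like each pixel's colour is).
scores = [[0.1, 0.2, 0.9, 0.8],
          [0.0, 0.3, 0.7, 0.9],
          [0.1, 0.1, 0.2, 0.3]]
means = window_means(scores, 2, 2)
```

The table is built once per frame; after that, the cost of each window is constant regardless of the window size.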

Ensemble methods

- Run several trackers in parallel to alleviate their individual inaccuracies.

- Model each tracker's confidence.

- Merge only the final estimates.

Figure 1: The sampling space is a circulant matrix and can be diagonalized with the DFT (image from [Henriques12]).

Figure 2: a) Samples for object and surrounding. b) Samples for object and distractors. c) Likelihood map obtained by combining the models built on a) and b) (image from [Possegger15]).

Figure 3: HMM-TxD merges the final predictions of an array of independent trackers and a detector (image from [Vojir15]).

Formulation

- Staple: Sum of Template and Pixel-wise Learners, i.e. a combined score function $f(x) = \gamma_{tmpl} f_{tmpl}(x) + \gamma_{hist} f_{hist}(x)$ built from:
  - Complementary cues
  - Compatible (dense) responses

- Compatibility of the responses is ensured by the objective functions, both with targets in [0, 1] in a ridge regression framework.
  - $f_{tmpl}(x; h) = \sum_{u \in \mathcal{T}} h[u]^{\top} \phi_x[u]$.
  - $f_{hist}(x; \beta) = g(\psi_x; \beta) = \frac{1}{|\mathcal{H}|} \sum_{u \in \mathcal{H}} \zeta_{(\beta,\psi)}[u]$.

- Both template and histogram features are feature images; we can use correlation in the Fourier domain and integral images for fast sliding-window search.
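The combined score can be sketched on a toy grid as follows. Several things here are assumptions made for illustration and are not taken from the poster: the per-pixel score is modelled as a lookup of a weight `beta[b]` for the pixel's colour bin `b` (which is what a linear score reduces to if the histogram features are one-hot), the template response `f_tmpl` is a hand-written placeholder rather than the output of a real correlation filter, and the values of `gamma_tmpl` and `gamma_hist` are arbitrary.

```python
def histogram_response(bin_image, beta, win_h, win_w):
    """Mean per-pixel score over every win_h x win_w window (valid positions only)."""
    zeta = [[beta[b] for b in row] for row in bin_image]  # per-pixel score lookup
    rows = len(zeta) - win_h + 1
    cols = len(zeta[0]) - win_w + 1
    area = win_h * win_w
    return [[sum(zeta[r + dr][c + dc]
                 for dr in range(win_h) for dc in range(win_w)) / area
             for c in range(cols)]
            for r in range(rows)]

def merge_responses(f_tmpl, f_hist, gamma_tmpl, gamma_hist):
    """Linear combination of two dense response maps of identical shape."""
    return [[gamma_tmpl * t + gamma_hist * h for t, h in zip(t_row, h_row)]
            for t_row, h_row in zip(f_tmpl, f_hist)]

def argmax_2d(response):
    """(row, col) of the highest value in a 2D response map."""
    positions = ((r, c) for r in range(len(response))
                 for c in range(len(response[0])))
    return max(positions, key=lambda rc: response[rc[0]][rc[1]])

# Hypothetical inputs: colour-bin index per pixel and one weight per bin.
bin_image = [[0, 0, 1, 1],
             [0, 1, 1, 1],
             [0, 0, 0, 1]]
beta = [0.1, 0.9]
f_hist = histogram_response(bin_image, beta, 2, 2)

# Placeholder template response on the same 2 x 3 grid of window positions.
f_tmpl = [[0.2, 0.8, 0.6],
          [0.1, 0.4, 0.3]]

f = merge_responses(f_tmpl, f_hist, gamma_tmpl=0.7, gamma_hist=0.3)
best = argmax_2d(f)
```

Because both learners produce a dense map over the same window positions with scores on a comparable scale, the merge is a plain weighted sum followed by a single argmax. In this toy example the histogram map alone peaks at a different position than the merged map, which shows how the weights arbitrate between the two cues.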

Results

Despite its simplicity, Staple outperforms the state of the art and runs at approximately 90 FPS (on a 4 GHz machine).

- No dataset overfitting: VOT15 is used as the validation fold, VOT14 and OTB-13 as test folds.

- Up-to-date comparison with very recent trackers.

- VOT14

[Figure: VOT14 report_challenge ranking plot, with markers for #1 (Staple), #2, #3 and the VOT14 winner.]

- OTB-13

Success plots of OPE (x-axis: overlap threshold; y-axis: success rate; legend scores in brackets)

SRDCF (2015) [0.626]

Staple [0.600]

CNN-SVM (2015) [0.597]

SO-DLT (2015) [0.592]

MEEM (2013) [0.572]

DSST (2014) [0.554]

EBT (2014) [0.532]

TGPR (2014) [0.529]

SCM (2012) [0.499]

Struck (2011) [0.474]

Success plots of TRE (x-axis: overlap threshold; y-axis: success rate; legend scores in brackets)

SRDCF (2015) [0.647]

Staple [0.617]

SO-DLT (2015) [0.585]

MEEM (2013) [0.585]

DSST (2014) [0.566]

ACT (2014) [0.516]

Struck (2011) [0.514]

SCM (2012) [0.514]

ASLA (2012) [0.485]

CXT (2011) [0.463]

Success plots of SRE (x-axis: overlap threshold; y-axis: success rate; legend scores in brackets)

SRDCF (2015) [0.570]

SO-DLT (2015) [0.559]

Staple [0.545]

MEEM (2013) [0.518]

DSST (2014) [0.494]

Struck (2011) [0.439]

ASLA (2012) [0.421]

ACT (2014) [0.420]

SCM (2012) [0.420]

TLD (2010) [0.402]
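For readers unfamiliar with success plots, the sketch below shows one common way such a curve is computed: the success rate at a threshold is the fraction of frames whose bounding-box overlap (intersection over union) with the ground truth exceeds that threshold. The box format `(x, y, width, height)`, the strict `>` comparison and the example boxes are assumptions made for illustration, not details taken from the poster or the benchmark toolkit.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x, y, width, height) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def success_curve(predicted, ground_truth, thresholds):
    """Fraction of frames whose overlap exceeds each threshold."""
    overlaps = [iou(p, g) for p, g in zip(predicted, ground_truth)]
    return [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]

# Three hypothetical frames: perfect overlap, partial overlap, complete miss.
ground_truth = [(0, 0, 10, 10), (0, 0, 10, 10), (0, 0, 10, 10)]
predicted = [(0, 0, 10, 10), (5, 0, 10, 10), (20, 20, 10, 10)]

thresholds = [i / 10 for i in range(11)]  # 0, 0.1, ..., 1
curve = success_curve(predicted, ground_truth, thresholds)
```

A single summary number per tracker can then be obtained from such a curve, for example by averaging it over the thresholds.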

Staple pipeline

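[Figure: Staple pipeline diagram. Panels: TRAINING (frame x_t, position p_t) and TESTING (frame x_{t+1}, position p_t). Template-related branch: HOG training against the desired response y, update, HOG testing, correlation, template response. Histogram-related branch: colour training, update, colour testing, integral image, histogram response. The two responses are combined into the final response, which gives p_{t+1}.]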

Supported by Apical - www.apical.co.uk

[email protected]

www.robots.ox.ac.uk/~luca/staple.html