
 

 

  

  

Object Tracking in Thermal Infrared Imagery based on Channel Coded Distribution Fields   

Amanda Berg, Jörgen Ahlberg and Michael Felsberg

Conference article, oral presentation

Cite this conference article as:

Berg, A., Ahlberg, J., Felsberg, M. Object Tracking in Thermal Infrared Imagery based on Channel Coded Distribution Fields, 2017.

Svenska sällskapet för automatiserad bildanalys (SSBA)

The self-archived postprint version of this orally presented conference article is available at Linköping University Institutional Repository (DiVA): http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-136743

  

 


Object Tracking in Thermal Infrared Imagery based on Channel Coded Distribution Fields

Amanda Berg∗†, Jörgen Ahlberg∗† and Michael Felsberg†

∗Termisk Systemteknik AB, Diskettgatan 11 B, 583 35 Linköping, Sweden

Email: {amanda.,jorgen.ahl}[email protected]

†Computer Vision Laboratory, Dept. EE, Linköping University, 581 83 Linköping, Sweden

Email: {amanda.,jorgen.ahl,michael.fels}[email protected]

Abstract—We address short-term, single-object tracking, a topic that is currently seeing fast progress for visual video, for the case of thermal infrared (TIR) imagery. Tracking methods designed for TIR are often subject to a number of constraints, e.g., warm objects, low spatial resolution, and static camera. As TIR cameras become less noisy and get higher resolution, these constraints are less relevant, and for emerging civilian applications, e.g., surveillance and automotive safety, new tracking methods are needed.

Due to the special characteristics of TIR imagery, we argue that template-based trackers based on distribution fields should have an advantage over trackers based on spatial structure features. In this paper, we propose a template-based tracking method (ABCD) designed specifically for TIR and not restricted by any of the constraints above. The proposed tracker is evaluated on the VOT-TIR2015 and VOT2015 datasets using the VOT evaluation toolkit, and a comparison of the relative ranking of all common participating trackers in the challenges is provided. Experimental results show that the ABCD tracker performs particularly well on thermal infrared sequences.

I. INTRODUCTION

Tracking of objects in video is a problem that has been subject to extensive research. In recent years, the sub-topic of short-term single-object tracking (STSO) has seen significant progress and is important as it is at the core of more complex (long-term, multi-camera, multi-object) tracking systems. Indicators of the popularity of this research topic are challenges and benchmarks like the recurring Visual Object Tracking (VOT) challenge [1], [2], [3], the Online Object Tracking (OTB) benchmarks [4], [5], and the series of workshops on Performance Evaluation of Tracking and Surveillance (PETS) [6].

Thermal infrared tracking has historically been of interest mainly for military purposes. Thermal cameras have delivered noisy images with low resolution, used mainly for tracking small objects (point targets) against colder backgrounds. However, in recent years, thermal cameras have decreased in both price and size while image quality and resolution have improved, which has opened up new application areas [7]. The main advantages of thermal cameras are their ability to see in total darkness, their robustness to illumination changes and shadow effects, and less intrusion on privacy. In 2015, the first challenge on STSO tracking in thermal infrared imagery (VOT-TIR) was organized [8], an indication of an increasing interest from the community.

In this paper, we will discuss the differences between thermal and visual tracking, argue that template-based trackers based on distribution fields are suited for thermal tracking, and propose three enhancements to such tracking methods: First, we propose a method for improving distribution field-based trackers using background weighting of object template updates. Second, we propose a method for improving the search phase in such trackers using an adaptive object region. Third, a scale estimation technique employing background information is evaluated. Finally, we show that these improvements are complementary, and evaluate the resulting tracker on both RGB and TIR sequences.

II. THERMAL IMAGING AND TRACKING

The infrared wavelength band is usually divided into different bands according to their different properties: near infrared (NIR, wavelengths 0.7–1 µm), shortwave infrared (SWIR, 1–3 µm), midwave infrared (MWIR, 3–5 µm), and longwave infrared (LWIR, 7.5–12 µm). Other definitions exist as well. LWIR, and sometimes MWIR, is commonly referred to as thermal infrared (TIR). TIR cameras should not be confused with NIR cameras, which are dependent on illumination and in general behave in a similar way as visual cameras.

In thermal infrared, most of the captured radiation is emitted from the observed objects, in contrast to visual and near infrared, where most of the radiation is reflected. Thus, knowing or assuming material and environmental properties, temperatures can be measured using a thermal camera (i.e., the camera is said to be radiometric).

There are two common misconceptions regarding object tracking in TIR. One is that it is all about hotspot tracking, that is, tracking warm (bright) objects against a cold (dark) background. In certain military applications, such as missile warning, this assumption is valid, but for most other applications the situation is more complex and hotspot tracking less suitable (this is backed by our experimental results given in Sec. V). The other misconception is that TIR tracking is identical to tracking in grayscale visual imagery and, as a consequence, that a tracker that is good for visual tracking is good for TIR tracking. However, there are differences between the two types of imagery that indicate that this is not the case:

First, there are no shadows in TIR. A tracker that is optimized to handle shadows might thus be suboptimal for TIR (e.g., a tracker employing foreground/background detection that includes shadow removal [9]).

Second, the noise characteristics are different. Compared to a visual camera, a TIR camera typically has more blooming¹, lower resolution, and a larger percentage of dead pixels. As a consequence, a tracker that depends heavily on features based on (high resolution) spatial structure is presumably suboptimal for TIR imagery.

Third, visual color patterns are discernible in TIR only if they correspond to variations in material or temperature. Again, the consequence is that trackers relying on (high resolution) spatial patterns might be suboptimal for TIR imagery. Moreover, re-identification and resolving occlusions might need to be done differently. For example, two persons with differently patterned or colored clothes might look similar in TIR.

Fourth, in most applications, the emitted radiation changes much more slowly than the reflected radiation. That is, an object moving from a dark room into the sunlight will not immediately change its appearance (as it would in visual imagery). Thus, trackers that exploit the absolute levels (for example, distribution field trackers) should have an advantage in TIR. This is especially relevant for radiometric 16-bit cameras, since they have a dynamic range large enough to accommodate relevant temperature intervals without adapting the dynamic range to each frame.

III. RELATED WORK

A common approach to TIR tracking is to combine a detector with a motion tracking filter such as a Kalman filter (KF) or a particle filter (PF). The detector is typically based on thresholding, i.e., hotspot detection, or a pre-trained classifier. This approach is intuitive when tracking warm objects in low-resolution images and has historically been the main use of thermal cameras, since typical objects of interest often generate kinetic energy in order to move (e.g., airborne and ground vehicles). Extensions include the work of Padole and Alexandre [10], [11], who use a PF for motion tracking and combine spatial and temporal information. Lee et al. [12] improve tracking performance in the case of repetitive patterns using a KF and a curve matching framework. Gade and Moeslund [13] use hotspot detection followed by splitting and connection of blobs in order to track sports players and maintain the players' identities. Goubet et al. [14] also rely on high contrast between object and background. In contrast, Skoglar et al. [15] use a pre-trained boosting-based detector and improve tracking in a surveillance application using road network information and a multimodal PF. Portmann et al. [16] combine background subtraction and a part-based detector using a support vector machine to classify Histograms of Gradients. Jüngling and Arens [17] combine tracking and detection in a single framework, extracting SURF features and using a KF for predicting the motion of individual features.

¹TIR cameras have blooming, but not the same kind of blooming as CCD arrays.

In the visual domain, methods based on matching a template (object model) that is trained and updated online are currently subject to intensive research [4]. Few papers can be found where the recent ideas leading to such fast progress for visual video are transferred to TIR tracking. The existing template-based TIR-tracking methods often assume small targets and low signal-to-noise ratio. Further, they frequently use separate detection and tracking phases, where the target tracking step is based on spatio-temporal correlation. In contrast, many of the methods used in the recent RGB-tracking benchmarks employ joint detection and tracking. Some TIR template-based tracking methods activate template matching only for the purpose of recovery [18], [19]. Bal et al. [18] base the activation on a Cartesian distance metric while Lamberti et al. [19] use a motion prediction-based metric. The latter strategy improves the robustness of the tracker. Johnston et al. [20] use a dual domain approach with AM-FM consistency checks. The method automatically detects the need for a template update through a combination of pixel and modulation domain correlation trackers.

Alam and Bhuiyan [21] provide an overview of the characteristics of matched filter correlation techniques for detection and tracking in TIR imagery. The filters are, however, pre-trained and not adaptively updated. He et al. [22] employ some of the ideas of RGB-tracking methods and present an infrared target tracking method under a tracking-by-detection framework based on a weighted correlation filter.

Methods based on distribution field tracking (DFT) [23] rely neither on color nor on features based on sharp edges. Instead, the object model is a distribution field, i.e., an array of probability distributions; one distribution for each pixel in the template. In [23], the probability distributions are represented by local histograms. Felsberg [24] showed that changing the representation of the probability distributions to channel coded vectors [25], [26] improves tracking performance. A channel vector basically consists of sampled values of overlapping basis functions, e.g., cos² or B-spline functions. Each of the N elements (channels) of the channel vector describes to what extent the encoded value activates the n'th basis function.
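As an illustration of the channel representation described above, the following is a minimal pure-Python sketch of encoding a scalar pixel value with overlapping cos² basis functions. The function name, channel layout, and kernel width are illustrative assumptions, not the authors' implementation:

```python
import math

def channel_encode(value, n_channels, vmin=0.0, vmax=255.0):
    """Encode a scalar into an N-channel vector using overlapping
    cos^2 basis functions (one of the kernels mentioned in the text).
    Channel centers are spread evenly over [vmin, vmax]; each basis
    function has a support of three channel spacings."""
    spacing = (vmax - vmin) / (n_channels - 1)
    channels = []
    for n in range(n_channels):
        center = vmin + n * spacing
        d = abs(value - center) / spacing  # distance in channel units
        if d < 1.5:                        # inside the cos^2 kernel support
            channels.append(math.cos(math.pi * d / 3.0) ** 2)
        else:
            channels.append(0.0)
    return channels

vec = channel_encode(100.0, n_channels=8)
```

Note that a value activates only the few channels whose centers are nearby; this gives a smooth yet localized representation of the pixel distribution.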

In the particular case of EDFT [24] (Enhanced Distribution Field Tracking), the object model is built by channel encoding an image patch of the object, which is then convolved with a 2D Gaussian kernel. This model is updated in each new frame using an update factor α ∈ [0, 1] as

m_obj^t(i, j, n) = (1 − α) m_obj^{t−1}(i, j, n) + α p^t(i, j, n)    (1)

where m_obj^t is the object model at time t, and p is the best matching channel encoded patch in the current frame. i ∈ [0, I − 1] and j ∈ [0, J − 1], where I, J are the width and height of the model, and n ∈ [0, N − 1], where N is the number of channels.
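The running-average update in (1) can be sketched in a few lines; a minimal pure-Python version with the model stored as a nested list indexed [i][j][n] (the data layout and names are illustrative assumptions):

```python
def update_object_model(model, patch, alpha):
    """Running-average update of the channel coded object model,
    m^t = (1 - alpha) * m^{t-1} + alpha * p^t  (Eq. 1).
    `model` and `patch` are nested lists indexed [i][j][n]."""
    return [[[(1 - alpha) * m + alpha * p
              for m, p in zip(m_ij, p_ij)]
             for m_ij, p_ij in zip(m_i, p_i)]
            for m_i, p_i in zip(model, patch)]

model = [[[0.0, 1.0]]]   # 1x1 template with 2 channels
patch = [[[1.0, 0.0]]]
# each pixel distribution drifts slowly toward the new patch
new = update_object_model(model, patch, alpha=0.1)
```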

The best matching patch p is found by performing a local search starting from a predicted image position. The search procedure is equal to that of DFT [23]. A distance measure, d, is calculated as the absolute difference of the object model and the channel encoded query patch, q (2). p is found at the position where d is minimized.

d = Σ_{i=0}^{I−1} Σ_{j=0}^{J−1} Σ_{n=0}^{N−1} |m_obj(i, j, n) − q(i, j, n)|    (2)
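The search step, minimizing the distance d of (2) over candidate positions, might be sketched as follows. Candidate patch generation is omitted and the names are illustrative assumptions, not the authors' implementation:

```python
def l1_distance(model, query):
    """Eq. (2): sum of absolute channel differences between the
    object model and a channel coded query patch (nested [i][j][n])."""
    return sum(abs(m - q)
               for m_i, q_i in zip(model, query)
               for m_ij, q_ij in zip(m_i, q_i)
               for m, q in zip(m_ij, q_ij))

def best_match(model, candidates):
    """Pick the candidate patch minimizing d (local search sketch);
    `candidates` maps image positions to channel coded patches."""
    return min(candidates, key=lambda pos: l1_distance(model, candidates[pos]))

model = [[[1.0, 0.0]]]
candidates = {(0, 0): [[[0.0, 1.0]]], (1, 0): [[[0.9, 0.1]]]}
pos = best_match(model, candidates)
```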

IV. PROPOSED TRACKING METHOD

The proposed tracking method is based on Enhanced Distribution Field Tracking (EDFT). The word "enhanced" refers to the modifications of the DFT tracker introduced in [24], the most important being the change of representation of the distribution field from soft histograms to B-spline kernel channel coded features. EDFT has achieved good results in the VOT challenges 2013 and 2014 for visual sequences. EDFT relies neither on color nor on features based on sharp edges, which makes it suitable for thermal infrared imagery; however, like all template-based trackers, it suffers from background contamination and, moreover, it cannot adaptively rescale to changing object size.

A. Background contamination in tracking

In all template-based tracking methods, there is a risk that the spatial region to be tracked contains background pixels. This leads to two problems, one in the search phase and one in the template update phase. First, if the region contains background pixels, which is highly probable in practice, the tracker will try to track not only the object but also some of the surrounding background. Second, if the template (object model) is continuously updated, background pixels might be included. With an increasing number of background pixels in the model, the risk of losing track grows. We address these two problems in two different ways below.

The same principle used to build the object model (1) in EDFT can be used to create and update a background model. Each pixel is represented by a channel vector, and in each frame, the background model m_bg is updated using a background update factor β:

m_bg^t(x, y, n) = (1 − β) m_bg^{t−1}(x, y, n) + β z^t(x, y, n)    (3)

x ∈ [0, W − 1] and y ∈ [0, H − 1], where W and H are the width and height of the image, respectively. z^t(x, y, n) is the current frame, the image at time t. Our approach is to use the additional information from the background model when tracking an object. Note that we do not need to build the model in advance – the background model is continuously built and updated for a region around the object (we use I + 50 and J + 30 pixels) only. If the camera is moving, the background model is not corrected for the camera motion.

Also note that any other model for background pixel distributions can be used, for example a mixture of Gaussians, a kernel density estimator [27], or a soft histogram [23]. However, since the channel coded distribution is already available from the EDFT, we can use this without extra computational cost.


Fig. 1: Example of image patches and corresponding foreground masks from sequences: (a) horse (image), (b) horse (mask), (c) crouching (image), (d) crouching (mask).

B. Background weighted model update (B-EDFT)

We propose a method to mitigate the problem of background pixels contaminating the process of updating the object model. From the background model described above, a soft mask consisting of the probabilities of pixels belonging to the foreground can be made. The ℓ1 norm of a second order B-spline kernel channel vector is one [28]. This implies that

Σ_{n=0}^{N−1} |m_bg(i, j, n) − p(i, j, n)| ∈ [0, 2]

for some pixel (i, j). m_bg(i, j, n) is the background model at the position of the best matching patch p(i, j, n), and N is the number of channels. The individual elements of the I × J mask b, with values in [0, 1], are then calculated as

b(i, j) = (1/2) Σ_{n=0}^{N−1} |m_bg(i, j, n) − p(i, j, n)|    (4)

A lower value of b(i, j) indicates a higher similarity between pixel (i, j) and the background. The elements of the foreground mask b can be used as weights for the different pixels when the object model is updated.

In order to incorporate the mask into the update of the object model, (1) is modified to

m_obj^t(i, j, n) = (1 − αb(i, j)) m_obj^{t−1}(i, j, n) + αb(i, j) p(i, j, n).    (5)

That is, the more likely a pixel is to belong to the background, the slower the corresponding distribution in the object model is updated. Thus, the risk of background contamination in the model is reduced.
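Equations (4) and (5) together might be sketched as follows in pure Python; the nested-list layout and function names are illustrative assumptions:

```python
def foreground_mask(bg_model, patch):
    """Eq. (4): per-pixel foreground probability from the L1 distance
    between the background model and the best matching patch. Since
    second order B-spline channel vectors have unit L1 norm, the
    distance lies in [0, 2], so halving maps it to [0, 1]."""
    I, J = len(patch), len(patch[0])
    return [[sum(abs(bg_model[i][j][n] - patch[i][j][n])
                 for n in range(len(patch[i][j]))) / 2.0
             for j in range(J)] for i in range(I)]

def weighted_update(model, patch, mask, alpha):
    """Eq. (5): pixels similar to the background (low b) update slowly."""
    return [[[(1 - alpha * mask[i][j]) * model[i][j][n]
              + alpha * mask[i][j] * patch[i][j][n]
              for n in range(len(model[i][j]))]
             for j in range(len(model[i]))]
            for i in range(len(model))]
```

A pixel whose channel vector coincides with the background model gets mask value 0 and leaves the object model untouched.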

In Fig. 1, two examples of foreground masks are provided. Note that in the horse sequence, the camera is moving and there is a high contrast between the object and the background. Hence, the outline of the object in the mask is wider and has higher intensity compared to the outline of the person in the crouching sequence.

If the object slows down to a standstill, it will be considered background and the elements of the foreground mask will be low. Hence, the object model will not be updated, but since the object does not change significantly, the tracker will be able to localize the object anyway.

C. Adaptive object region (A-EDFT)

Second, the problem of the tracked region encompassing background pixels in the search phase is addressed. A subregion suitable for tracking is adaptively selected. In visual imagery, this might correspond to choosing good features to track, whereas in thermal imagery, selecting a region is typically more suitable. In thermal imagery,


Fig. 2: Illustration of how the inner bounding box is adaptively selected for sequences horse (left) and car (right). The blue curves represent the projected pixel values of the mask onto the x- and y-axes, respectively.

there is less structure and fewer sharp edges, and the tracker exploits intensity rather than structure. Thus, within the initial region to track, we select an inner region that is used for the actual tracking. In this particular case, the region is a bounding box.

An illustration of how the inner bounding box is selected from the first mask is provided in Fig. 2. The values within the mask are projected onto the x- and y-axes, respectively (blue curves), and the means, µx, µy, and standard deviations, σx, σy, of each axis are computed. The inner bounding box is placed at (µx, µy) with width 1.8σx and height 1.8σy. Also, the outer bounding box, the one being reported as tracking result, is centred around (µx, µy). The size of the inner bounding box is fixed throughout the sequence unless reinitialized or rescaled.
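The projection-based selection of the inner bounding box described above can be sketched as follows; a minimal version assuming the mask is given as nested lists, with illustrative names and return convention:

```python
def inner_bbox(mask, k=1.8):
    """Project the mask onto the x- and y-axes, compute the mean and
    standard deviation of each projection, and place a box of size
    k * sigma at the mean (k = 1.8 as in the text).
    Returns (cx, cy, width, height)."""
    H, W = len(mask), len(mask[0])
    px = [sum(mask[y][x] for y in range(H)) for x in range(W)]  # x-projection
    py = [sum(mask[y][x] for x in range(W)) for y in range(H)]  # y-projection

    def mean_std(proj):
        total = sum(proj)
        mu = sum(i * p for i, p in enumerate(proj)) / total
        var = sum(p * (i - mu) ** 2 for i, p in enumerate(proj)) / total
        return mu, var ** 0.5

    mx, sx = mean_std(px)
    my, sy = mean_std(py)
    return mx, my, k * sx, k * sy

# Toy 5x5 mask with a short horizontal foreground segment.
mask = [[0] * 5 for _ in range(5)]
mask[2][1] = mask[2][2] = mask[2][3] = 1
cx, cy, w, h = inner_bbox(mask)
```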

There is a major positive effect of using an inner region whose size and placement are estimated adaptively. Even if the detector (user, annotator) has marked a region larger than the actual object, the tracker will still select a trackable region instead of being confused by too much background in the object model.

D. Scale change estimation

Correct estimation of object scale changes is crucial for correct tracking of an object if the object is subject to large scale variations. This is, for example, the case if the object increases or reduces its distance to the camera. We propose to exploit the probability mass within the foreground mask, b(i, j), in order to detect scale changes of the tracked object. If the scale of the object increases, the probability mass within the mask will also increase. However, if another foreground object passes behind or in front of the tracked object, the probability mass within the object patch will also increase. Therefore, the flow of probability mass must be considered. A scale change implies probability mass changes in all directions, while another object entering the tracked area will cause the


Fig. 3: Scaling principle examples: (a) Horse mask with overlaid grid lines. (b) The probability mass (pm) within each grid cell has increased; thus, the size should be increased. (c) Another object enters from the right, and the pm in the rightmost grid cells has increased. The pm in the lower left cell has not; therefore, the size should not be increased.

probability mass to change on one side of the patch only. The flow of probability mass is roughly estimated by dividing the image patch into a grid of 2×2 cells, see Fig. 3. If all grid cells have increased their mean probability mass from time t − sw to t, where sw is a constant time interval, the scale of the patch is increased by a scale step ss. The opposite applies to the case of decreasing probability mass. In order to achieve robustness to noise in the probability mass, the mean probability mass value of each grid cell is updated using an update factor, and the scale step, ss, is kept relatively low. Further, in order to avoid rounding errors, the width, I, of the bounding box is updated iff mod(∆I, 2) = 0 and the height, J, iff mod(∆J, 2) = 0.
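A rough sketch of the 2×2 grid test follows, under the simplifying assumption that the mean cell masses from time t − sw are already available; the temporal smoothing of cell means and the scale step ss are omitted, and all names are illustrative:

```python
def grid_mass(mask):
    """Mean probability mass in each cell of a 2x2 grid over the mask.
    Cell order: top-left, top-right, bottom-left, bottom-right."""
    H, W = len(mask), len(mask[0])
    cells = []
    for ys in (range(0, H // 2), range(H // 2, H)):
        for xs in (range(0, W // 2), range(W // 2, W)):
            vals = [mask[y][x] for y in ys for x in xs]
            cells.append(sum(vals) / len(vals))
    return cells

def scale_decision(prev_cells, cells):
    """+1 if every cell gained mass (grow), -1 if every cell lost mass
    (shrink), else 0 (e.g. another object entering from one side)."""
    if all(c > p for p, c in zip(prev_cells, cells)):
        return 1
    if all(c < p for p, c in zip(prev_cells, cells)):
        return -1
    return 0
```

The all-cells condition is what distinguishes a genuine scale change from a second object entering on one side of the patch.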

E. Combining the three methods (ABCD)

The three proposed methods are independent of each other and can thus easily be combined. The resulting tracker is called ABCD – Adaptive object region + Background weighted scaled Channel coded Distribution field tracking. In the next section, evaluation results for A-EDFT, B-EDFT, the combination AB-EDFT, as well as ABCD (AB-EDFT + scale estimation) will be presented.

V. EVALUATION AND RESULTS

In this section, the evaluation procedure and experimental results are presented. Two datasets have been used for evaluation: the LTIR dataset² [29] and the VOT2015 RGB dataset [3]. LTIR is a thermal infrared dataset consisting of 20 sequences that has recently been used in the thermal infrared visual object tracking VOT-TIR2015 challenge [8].

A. Evaluation methodology

The evaluation of trackers has been performed in accordance with the VOT evaluation procedure using the VOT evaluation toolkit³. The tracker is initialized with the ground truth bounding box in the first frame and the performance is evaluated using four performance measures: accuracy, robustness, speed, and ranking. The accuracy at time t, A_t, measures the overlap between the

²http://www.cvl.isy.liu.se/research/datasets/ltir/
³https://github.com/vicoslab/vot-toolkit


          ρA     ρR     Φ
ABCD      0.65   1.30   0.32
AB-EDFT   0.65   1.55   0.31
A-EDFT    0.61   1.70   0.28
B-EDFT    0.65   2.80   0.22
EDFT      0.61   3.00   0.22

TABLE I: Average accuracy (ρA), robustness (average number of failures) (ρR), and expected average overlap (Φ) on the VOT-TIR2015 dataset.

bounding boxes given by the annotated ground truth O_t^G and the tracker O_t^T as

A_t = (O_t^G ∩ O_t^T) / (O_t^G ∪ O_t^T)    (6)

Robustness measures the failure rate, i.e., the number of times A_t = 0 during a sequence. When a failure is detected, the tracker is reinitialized five frames later, and for another 10 frames the achieved accuracy is not included when per-sequence accuracy is computed. Per-sequence accuracy is calculated as the average accuracy for the set of valid frames in all experiment repetitions. The per-experiment measures, ρA and ρR, are the average accuracy and robustness (number of failures) over all sequences. Finally, the trackers are ranked for each attribute and an average ranking for the two experiments is computed.
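The overlap measure (6) for axis-aligned (x, y, w, h) boxes might be computed as below. Note that VOT ground truth may use rotated bounding boxes; the axis-aligned form here is a simplifying assumption:

```python
def accuracy(gt, pred):
    """Eq. (6): intersection-over-union of two axis-aligned boxes
    given as (x, y, w, h). Returns 0 when the boxes do not overlap."""
    ix = max(0, min(gt[0] + gt[2], pred[0] + pred[2]) - max(gt[0], pred[0]))
    iy = max(0, min(gt[1] + gt[3], pred[1] + pred[3]) - max(gt[1], pred[1]))
    inter = ix * iy
    union = gt[2] * gt[3] + pred[2] * pred[3] - inter
    return inter / union
```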

The expected average overlap, Φ = ⟨Φ_{N_s}⟩, was introduced in VOT2015. The measure combines accuracy and robustness by averaging the average overlaps, Φ, on a large set of sequences of length N_s frames. Φ_{N_s} is the average of per-frame overlaps, Φ_i:

Φ_{N_s} = (1/N_s) Σ_{i=1}^{N_s} Φ_i    (7)

If the tracker fails at some point during the N_s-frame-long sequence, it is not reinitialized.
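A minimal sketch of (7) for a single run, padding with zero overlap after a failure since the tracker is not reinitialized (the function name is illustrative):

```python
def expected_average_overlap(per_frame_overlaps, Ns):
    """Eq. (7) for one run: mean per-frame overlap over the first Ns
    frames; a run that failed early contributes zero overlap from the
    failure frame onward (no reinitialization)."""
    phis = (per_frame_overlaps + [0.0] * Ns)[:Ns]  # pad failed runs with 0
    return sum(phis) / Ns

# Averaging these per-run values over many sequences gives the
# expected average overlap Phi reported in the tables.
```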

B. Experiments

Three different experiments have been performed. First, the EDFT tracker and the extensions A-EDFT, B-EDFT, AB-EDFT, and ABCD, all of which have been proposed in this paper, are evaluated on the VOT-TIR2015 dataset. Second, the ABCD and EDFT trackers are evaluated on the VOT2015 (RGB sequences) and VOT-TIR2015 (TIR sequences) datasets against other participating trackers in the challenges. Third, the ranking results of the ABCD tracker are added to the confusion matrix in [8].

C. Results

Table I lists the average accuracy and average number of failures per sequence for the EDFT tracker as well as the proposed extensions A-EDFT, B-EDFT, AB-EDFT, and ABCD on the VOT-TIR2015 dataset. It is clear that B-EDFT mainly improves accuracy while A-EDFT improves robustness. When both extensions are combined in AB-EDFT, the robustness is improved even further. Adding the ability to scale to AB-EDFT in ABCD improves robustness without a decrease in accuracy.

          VOT2015                   VOT-TIR2015
          ρA     ρR     Φ     r    ρA     ρR     Φ     r
ABCD      0.45   3.12   0.14  50   0.65   1.30   0.32  6
EDFT      0.45   3.50   0.14  49   0.61   3.00   0.22  15

TABLE II: Average accuracy (ρA), robustness (average number of failures) (ρR), expected average overlap (Φ), and ranking (r) for the EDFT and ABCD trackers in the baseline experiment of the VOT2015 and VOT-TIR2015 challenges, respectively.

Fig. 4: Comparison of relative ranking of 21 trackers in VOT and VOT-TIR.

In total, the combined extensions (ABCD) give an increase in average accuracy of 6.6%, a decrease of the average number of failures of 57%, and an increase of the expected average overlap of 45% compared to EDFT.

Regarding speed, the extension from EDFT to ABCD benefits by exploiting information that is already computed, i.e., the channel coded image. That is, the improvement in accuracy and robustness implies no significant addition in computation time. In the VOT-TIR2015 experiments, the ABCD tracker achieved a tracking speed of 6.88 in EFO (equivalent filter operations) units [8]. The ABCD tracker is implemented in Matlab.

Table II shows the average accuracy, number of failures, and rankings of the EDFT and ABCD trackers when compared to the participating trackers in the VOT2015 and VOT-TIR2015 challenges. The extension of the EDFT tracker provides a significant improvement in ranking for the VOT-TIR2015 challenge while remaining essentially unchanged for the VOT2015 challenge. A comparison of the relative rankings of all common trackers in VOT2015 and VOT-TIR2015, including ABCD, is shown in Fig. 4.

D. Discussion

The results of the evaluation of the different proposed extensions of the EDFT tracker presented in Table I indicate that the combination of extensions (ABCD) is preferable to each individual extension by itself. Further, when the ABCD and EDFT trackers were evaluated on the VOT2015 dataset (Table II and Fig. 4), it became clear that the extensions are particularly beneficial for thermal infrared data. Thermal infrared imagery contains less distinct edges and structures for the tracker to attach to. Therefore, the ABCD tracker was designed to exploit the absolute values of the object rather than relying on the spatial structure. A distribution field approach was utilized for this purpose. Background information was incorporated in the template update phase as well as in the choice of an inner bounding box, reducing the background contamination of the object model.

Further, it should be emphasised that the sizes of targets differ between the VOT2015 and VOT-TIR2015 datasets. VOT has, in general, more high-resolution targets than VOT-TIR. Higher resolution targets provide more spatial structure to exploit, making them more easily tracked by trackers employing such features.

VI. CONCLUSIONS

We have compared trackers based on spatial structure features with trackers based on distribution fields and come to the conclusion that distribution fields are more suitable for TIR tracking. Since the state-of-the-art distribution field tracker (EDFT) was not able to adapt to changing object scale, we have developed a novel method to achieve this adaptivity. Moreover, we make the observation that template-based trackers have two inherent problems of background information contaminating the template of the object to be tracked: in the search phase and in the template update phase. We propose how to mitigate both these problems by exploiting a channel coded background distribution and show that this improves the tracking. The resulting tracker, ABCD, has been evaluated on both RGB and TIR sequences, and the results show that the proposed extensions are particularly beneficial for TIR imagery.

ACKNOWLEDGEMENTS

The research was funded by the Swedish Research Council through the project Learning Systems for Remote Thermography, grant no. D0570301, as well as by the European Community Framework Programme 7, Privacy Preserving Perimeter Protection Project (P5), grant agreement no. 312784, and Intelligent Piracy Avoidance using Threat detection and Countermeasure Heuristics (IPATCH), grant agreement no. 607567. The Swedish Research Council has also funded the research through a framework grant for the project Energy Minimization for Computational Cameras (2014-6227), and the work has been supported by ELLIIT, the Strategic Area for ICT research, funded by the Swedish Government.

REFERENCES

[1] M. Kristan et al., "The Visual Object Tracking VOT2013 challenge results," in ICCVW, 2013.

[2] M. Kristan et al., "The Visual Object Tracking VOT2014 challenge results," in ECCVW, ser. LNCS. Springer, 2014, pp. 1–27.

[3] M. Kristan et al., "The Visual Object Tracking VOT2015 challenge results," in ICCVW, 2015.

[4] Y. Wu, J. Lim, and M.-H. Yang, "Online object tracking: A benchmark," in CVPR, 2013, pp. 2411–2418.

[5] Y. Wu, J. Lim, and M.-H. Yang, "Object tracking benchmark," TPAMI, vol. PP, no. 99, pp. 1–1, 2015.

[6] D. P. Young and J. M. Ferryman, "PETS metrics: On-line performance evaluation service," in Proceedings of the 14th International Conference on Computer Communications and Networks, ser. ICCCN '05, 2005, pp. 317–324.

[7] R. Gade and T. Moeslund, "Thermal cameras and applications: A survey," Machine Vision & Applications, vol. 25, no. 1, 2014.

[8] M. Felsberg, A. Berg, G. Häger, J. Ahlberg, M. Kristan, J. Matas, A. Leonardis, L. Čehovin, G. Fernández, et al., "The Thermal Infrared Visual Object Tracking VOT-TIR2015 challenge results," in ICCVW, 2015, pp. 639–651.

[9] A. Prati, I. Mikic, M. M. Trivedi, and R. Cucchiara, "Detecting moving shadows: algorithms and evaluation," TPAMI, vol. 25, no. 7, pp. 918–923, July 2003.

[10] C. Padole and L. Alexandre, "Wigner distribution based motion tracking of human beings using thermal imaging," in CVPRW, June 2010, pp. 9–14.

[11] C. N. Padole and L. A. Alexandre, "Motion based particle filter for human tracking with thermal imaging," in Emerging Trends in Engineering & Technology, International Conference on, 2010, pp. 158–162.

[12] S. Lee, G. Shah, A. Bhattacharya, and Y. Motai, "Human tracking with an infrared camera using a curve matching framework," EURASIP Journal on Advances in Signal Processing, vol. 2012, no. 1, 2012.

[13] R. Gade and T. B. Moeslund, "Thermal tracking of sports players," Sensors, vol. 14, no. 8, pp. 13679–13691, 2014.

[14] E. Goubet, J. Katz, and F. Porikli, "Pedestrian tracking using thermal infrared imaging," in SPIE Conference on Infrared Technology and Applications, vol. 6206, Jun. 2006, pp. 797–808.

[15] P. Skoglar, U. Orguner, D. Törnqvist, and F. Gustafsson, "Pedestrian tracking with an infrared sensor using road network information," EURASIP Journal on Advances in Signal Processing, vol. 1, no. 26, 2012.

[16] J. Portmann, S. Lynen, M. Chli, and R. Siegwart, "People detection and tracking from aerial thermal views," in ICRA, 2014.

[17] K. Jüngling and M. Arens, "Local feature based person detection and tracking beyond the visible spectrum," in Machine Vision Beyond Visible Spectrum, ser. Augmented Vision and Reality. Springer Berlin Heidelberg, 2011, vol. 1, pp. 3–32.

[18] A. Bal and M. S. Alam, "Automatic target tracking in FLIR image sequences using intensity variation function and template modeling," IEEE Transactions on Instrumentation and Measurement, vol. 54, no. 5, pp. 1846–1852, Oct 2005.

[19] F. Lamberti, A. Sanna, and G. Paravati, "Improving robustness of infrared target tracking algorithms based on template matching," IEEE Transactions on Aerospace and Electronic Systems, vol. 47, no. 2, pp. 1467–1480, April 2011.

[20] C. M. Johnston, N. Mould, J. P. Havlicek, and G. Fan, "Dual domain auxiliary particle filter with integrated target signature update," in CVPRW, 2009, pp. 54–59.

[21] M. S. Alam and S. M. A. Bhuiyan, "Trends in correlation-based pattern recognition and tracking in forward-looking infrared imagery," Sensors, vol. 14, no. 8, p. 13437, 2014. [Online]. Available: http://www.mdpi.com/1424-8220/14/8/13437

[22] Y.-J. He, M. Li, J. Zhang, and J.-P. Yao, "Infrared target tracking via weighted correlation filter," Infrared Physics and Technology, vol. 73, pp. 103–114, Nov. 2015.

[23] L. Sevilla-Lara and E. G. Learned-Miller, "Distribution fields for tracking," in CVPR. IEEE, 2012, pp. 1910–1917.

[24] M. Felsberg, "Enhanced distribution field tracking using channel representations," in ICCVW. IEEE, 2013, pp. 121–128.

[25] M. Felsberg, P.-E. Forssén, and H. Scharr, "Channel smoothing: Efficient robust smoothing of low-level signal features," TPAMI, vol. 28, no. 2, pp. 209–222, February 2006.

[26] G. H. Granlund, "An associative perception-action structure using a localized space variant information representation," in AFPAC, 2000.

[27] E. Parzen, "On estimation of a probability density function and mode," The Annals of Mathematical Statistics, vol. 33, no. 3, pp. 1065–1076, 1962.

[28] P.-E. Forssén, "Low and medium level vision using channel representations," Ph.D. dissertation, Linköping University, SE-581 83 Linköping, Sweden, March 2004, dissertation No. 858, ISBN 91-7373-876-X.

[29] A. Berg, J. Ahlberg, and M. Felsberg, "A thermal object tracking benchmark," in AVSS, 2015.