LaSOT: A High-quality Benchmark for Large-scale …LaSOT: A High-quality Benchmark for Large-scale...

18
LaSOT: A High-quality Benchmark for Large-scale Single Object Tracking Heng Fan 1* Liting Lin 2* Fan Yang 1* Peng Chu 1* Ge Deng 1 Sijia Yu 1 Hexin Bai 1 Yong Xu 2 Chunyuan Liao 3 Haibin Ling 11 Department of Computer and Information Sciences, Temple University, Philadelphia, USA 2 School of Computer Science & Engineering, South China Univ. of Tech., Guangzhou, Peng Cheng Laboratory, Shenzhen, China 3 Meitu HiScene Lab, HiScene Information Technologies, Shanghai, China https://cis.temple.edu/lasot/ Abstract In this paper, we present LaSOT, a high-quality bench- mark for Large-scale Single Object Tracking. LaSOT con- sists of 1,400 sequences with more than 3.5M frames in to- tal. Each frame in these sequences is carefully and man- ually annotated with a bounding box, making LaSOT the largest, to the best of our knowledge, densely annotated tracking benchmark. The average video length of LaSOT is more than 2,500 frames, and each sequence comprises various challenges deriving from the wild where target ob- jects may disappear and re-appear again in the view. By re- leasing LaSOT, we expect to provide the community with a large-scale dedicated benchmark with high quality for both the training of deep trackers and the veritable evaluation of tracking algorithms. Moreover, considering the close con- nections of visual appearance and natural language, we en- rich LaSOT by providing additional language specification, aiming at encouraging the exploration of natural linguistic feature for tracking. A thorough experimental evaluation of 35 tracking algorithms on LaSOT is presented with detailed analysis, and the results demonstrate that there is still a big room for improvements. 1. Introduction Visual tracking, aiming to locate an arbitrary target in a video with an initial bounding box in the first frame, has been one of the most important problems in computer vision with many applications such as video surveillance, robotics, human-computer interaction and so forth [32, 47, 54]. With considerable progresses in the tracking community, numer- ous algorithms have been proposed. In this process, track- ing benchmarks have played a vital role in objectively eval- * Authors make equal contributions to this work. Corresponding author. NUS-PRO TC-128 OTB-2015 OTB-2013 UAV123 UAV20L VOT-2014 VOT-2017 LaSOT Figure 1. Summaries of existing tracking benchmarks with high- quality dense (per frame) annotations, including OTB-2013 [52], OTB-2015 [53], TC-128 [35], NUS-PRO [28], UAV123 [39], UAV20L [39], VOT-2014 [26], VOT-2017 [27] and LaSOT. The circle diameter is in proportion to the number of frames of a bench- mark. The proposed LaSOT is larger than all other benchmarks, and focused on long-term tracking. Best viewed in color. uating and comparing different trackers. Nevertheless, fur- ther development and assessment of tracking algorithms are restricted by existing benchmarks with several issues: Small-scale. Deep representations have been popularly ap- plied to modern object tracking algorithms, and demon- strated state-of-the-art performances. However, it is diffi- cult to train a deep tracker using tracking-specific videos due to the scarcity of large-scale tracking datasets. As shown in Fig. 1, existing datasets seldom have more than 400 sequences. As a result, researchers are restricted to leverage either the pre-trained models (e.g., [46] and [18]) from image classification for deep feature extraction or the sequences from video object detection (e.g., [45] and [43]) for deep feature learning, which may result in suboptimal tracking performance because of the intrinsic differences among different tasks [55]. Moreover, large scale bench- marks are desired for more reliable evaluation results. Lack of high-quality dense annotations. For tracking, 1 arXiv:1809.07845v2 [cs.CV] 27 Mar 2019

Transcript of LaSOT: A High-quality Benchmark for Large-scale …LaSOT: A High-quality Benchmark for Large-scale...

Page 1: LaSOT: A High-quality Benchmark for Large-scale …LaSOT: A High-quality Benchmark for Large-scale Single Object Tracking Heng Fan 1Liting Lin2 Fan Yang Peng Chu 1Ge Deng Sijia Yu

LaSOT: A High-quality Benchmark for Large-scale Single Object Tracking

Heng Fan1∗ Liting Lin2∗ Fan Yang1∗ Peng Chu1∗ Ge Deng1 Sijia Yu1 Hexin Bai1

Yong Xu2 Chunyuan Liao3 Haibin Ling1†

1Department of Computer and Information Sciences, Temple University, Philadelphia, USA2School of Computer Science & Engineering, South China Univ. of Tech., Guangzhou,

Peng Cheng Laboratory, Shenzhen, China3Meitu HiScene Lab, HiScene Information Technologies, Shanghai, China

https://cis.temple.edu/lasot/

Abstract

In this paper, we present LaSOT, a high-quality bench-mark for Large-scale Single Object Tracking. LaSOT con-sists of 1,400 sequences with more than 3.5M frames in to-tal. Each frame in these sequences is carefully and man-ually annotated with a bounding box, making LaSOT thelargest, to the best of our knowledge, densely annotatedtracking benchmark. The average video length of LaSOTis more than 2,500 frames, and each sequence comprisesvarious challenges deriving from the wild where target ob-jects may disappear and re-appear again in the view. By re-leasing LaSOT, we expect to provide the community with alarge-scale dedicated benchmark with high quality for boththe training of deep trackers and the veritable evaluation oftracking algorithms. Moreover, considering the close con-nections of visual appearance and natural language, we en-rich LaSOT by providing additional language specification,aiming at encouraging the exploration of natural linguisticfeature for tracking. A thorough experimental evaluation of35 tracking algorithms on LaSOT is presented with detailedanalysis, and the results demonstrate that there is still a bigroom for improvements.

1. IntroductionVisual tracking, aiming to locate an arbitrary target in

a video with an initial bounding box in the first frame, hasbeen one of the most important problems in computer visionwith many applications such as video surveillance, robotics,human-computer interaction and so forth [32, 47, 54]. Withconsiderable progresses in the tracking community, numer-ous algorithms have been proposed. In this process, track-ing benchmarks have played a vital role in objectively eval-

∗Authors make equal contributions to this work.†Corresponding author.

NUS-PROTC-128

OTB-2015OTB-2013

UAV123

UAV20L

VOT-2014VOT-2017

LaSOT

Figure 1. Summaries of existing tracking benchmarks with high-quality dense (per frame) annotations, including OTB-2013 [52],OTB-2015 [53], TC-128 [35], NUS-PRO [28], UAV123 [39],UAV20L [39], VOT-2014 [26], VOT-2017 [27] and LaSOT. Thecircle diameter is in proportion to the number of frames of a bench-mark. The proposed LaSOT is larger than all other benchmarks,and focused on long-term tracking. Best viewed in color.

uating and comparing different trackers. Nevertheless, fur-ther development and assessment of tracking algorithms arerestricted by existing benchmarks with several issues:Small-scale. Deep representations have been popularly ap-plied to modern object tracking algorithms, and demon-strated state-of-the-art performances. However, it is diffi-cult to train a deep tracker using tracking-specific videosdue to the scarcity of large-scale tracking datasets. Asshown in Fig. 1, existing datasets seldom have more than400 sequences. As a result, researchers are restricted toleverage either the pre-trained models (e.g., [46] and [18])from image classification for deep feature extraction or thesequences from video object detection (e.g., [45] and [43])for deep feature learning, which may result in suboptimaltracking performance because of the intrinsic differencesamong different tasks [55]. Moreover, large scale bench-marks are desired for more reliable evaluation results.Lack of high-quality dense annotations. For tracking,

1

arX

iv:1

809.

0784

5v2

[cs

.CV

] 2

7 M

ar 2

019

Page 2: LaSOT: A High-quality Benchmark for Large-scale …LaSOT: A High-quality Benchmark for Large-scale Single Object Tracking Heng Fan 1Liting Lin2 Fan Yang Peng Chu 1Ge Deng Sijia Yu

Table 1. Comparison of LaSOT with the most popular dense benchmarks in the literatures.Benchmark Videos Min

framesMeanframes

Medianframes

Maxframes

Totalframes

Totalduration

framerate

Absentlabels

Objectclasses

Classbalance

Num. ofattributes

Lingualfeature

OTB-2013 [52] 51 71 578 392 3,872 29K 16.4 min 30 fps 7 10 7 11 7OTB-2015 [53] 100 71 590 393 3,872 59K 32.8 min 30 fps 7 16 7 11 7

TC-128 [35] 128 71 429 365 3,872 55K 30.7 min 30 fps 7 27 7 11 7VOT-2014 [26] 25 164 409 307 1,210 10K 5.7 min 30 fps 7 11 7 n/a 7VOT-2017 [27] 60 41 356 293 1,500 21K 11.9 min 30 fps 7 24 7 n/a 7NUS-PRO [28] 365 146 371 300 5,040 135K 75.2 min 30 fps 7 8 7 n/a 7

UAV123 [39] 123 109 915 882 3,085 113K 62.5 min 30 fps 7 9 7 12 7UAV20L [39] 20 1,717 2,934 2,626 5,527 59K 32.6 min 30 fps 7 5 7 12 7

NfS [14] 100 169 3,830 2,448 20,665 383K 26.6 min 240 fps 7 17 7 9 7GOT-10k [22] 10,000 - - - - 1.5M - 10 fps 3 563 7 6 7

LaSOT 1,400 1,000 2,506 2,053 11,397 3.52M 32.5 hours 30 fps 3 70 3 14 3

dense (i.e., per frame) annotations with high precision areof importance for several reasons. (i) They ensure more ac-curate and reliable evaluations; (ii) they offer desired train-ing samples for the training of tracking algorithms; and(iii) they provide rich temporal contexts among consecu-tive frames that are important for tracking tasks. It is worthnoting that there are recently proposed benchmarks towardlarge-scale and long-term tracking, such as [41] and [51],their annotations are however either semi-automatic (e.g.,generated by a tracking algorithm) or sparse (e.g., labeledevery 30 frames), limiting their usabilities.Short-term tracking. A desired tracker is expected to becapable of locating the target in a relative long period, inwhich the target may disappear and re-enter the view. How-ever, most existing benchmarks have been focused on short-term tracking where the average sequence length is less than600 frames (i.e., 20 seconds for 30 fps, see again Fig. 1) andthe target almost always appears in the video frame. Theevaluations on such short-term benchmarks may not reflectthe real performance of a tracker in real-world applications,and thus restrain the deployment in practice.Category bias. A robust tracking system should exhibitstable performance insensitive to the category the target be-longs to, which signifies that the category bias (or class im-balance) should be inhibited in both training and evaluatingtracking algorithms. However, existing benchmarks usuallycomprise only a few categories (see Tab. 1) with unbalancednumbers of videos.

In the literature, many datasets have been proposed todeal with the issues above: e.g., [39, 51] for long-termtracking, [41] for large-scale, [52, 35, 25] for precise denseannotations. Nevertheless, none of them addresses all theissues, which motivates the proposal of LaSOT.

1.1. Contribution

With the above motivations, we provide the communitya novel benchmark for Large-scale Single Object Tracking(LaSOT) with multi-fold contributions:1) LaSOT consists of 1,400 videos with average 2,512

frames per sequence. Each frame is carefully inspectedand manually labeled, and the result visually double-checked and corrected when needed. This way, we gen-

erate around 3.52 million high-quality bounding box an-notations. Moreover, LaSOT contains 70 categories witheach consisting of twenty sequences. To our knowledge,LaSOT is the largest benchmark with high-quality man-ual dense annotations for object tracking to date. By re-leasing LaSOT, we aim to offer a dedicated platform forthe development and assessment of tracking algorithms.

2) Different from existing datasets, LaSOT provides bothvisual bounding box annotations and rich natural lan-guage specification, which has recently been proven tobe beneficial for various vision tasks (e.g., [21, 31]) in-cluding visual tracking [34]. By doing so, we aim to en-courage and facilitate explorations of integrating visualand lingual features for robust tracking performance.

3) To assess existing trackers and provide extensive base-lines for future comparisons on LaSOT, we evaluate 35representative trackers under different protocols, and an-alyze their performances using different metrics.

2. Related WorkWith considerable progresses in the tracking community,

many trackers and benchmarks have been proposed in re-cent decades. In this section, we mainly focus on the track-ing benchmarks that are relevant to our work, and refer thereaders to surveys [32, 47, 54, 30] for tracking algorithms.

For a systematic review, we intentionally classify track-ing benchmarks into two types: one with dense manual an-notations (referred to as dense benchmark for short) and theother one with sparse and/or (semi-)automatic annotations.In the following, we review each of these two categories.

2.1. Dense Benchmarks

Dense tracking benchmark provides dense bounding boxannotations for each video sequence. To ensure high qual-ity, the bounding boxes are usually manually labeled withcareful inspection. For the visual tracking task, these highlyprecise annotations are desired for both training and assess-ing trackers. Currently, the popular dense benchmarks con-tain OTB [52, 53], TC-128 [35], VOT [25], NUS-PRO [28],UAV [39], NfS [14] and GOT-10k [22].OTB. OTB-2013 [52] firstly contributes a testing dataset

2

Page 3: LaSOT: A High-quality Benchmark for Large-scale …LaSOT: A High-quality Benchmark for Large-scale Single Object Tracking Heng Fan 1Liting Lin2 Fan Yang Peng Chu 1Ge Deng Sijia Yu

by collecting 51 videos with manually annotated boundingbox in each frame. The sequences are labeled with 11 at-tributes for further analysis of tracking performance. Later,OTB-2013 is extended to the larger OTB-2015 [53] by in-troducing extra 50 sequences.TC-128. TC-128 [35] comprises 128 videos that are specif-ically designated to evaluate color-enhanced trackers. Thevideos are labeled with 11 similar attributes as in OTB [52].VOT. VOT [25] introduces a series of tracking competitionswith up to 60 sequences in each of them, aiming to evalu-ate the performance of a tracker in a relative short duration.Each frame in the VOT datasets is annotated with a rotatedbounding box with several attributes.NUS-PRO. NUS-PRO [28] contains 365 sequences with afocus on human and rigid object tracking. Each sequence inNUS-PRO is annotated with both target location and occlu-sion level for evaluation.UAV. UAV123 and UAV20L [39] are utilized for unmannedaerial vehicle (UAV) tracking, comprising 123 short and 20long sequences, respectively. Both UAV123 and UAV20Lare labeled with 12 attributes.NfS. NfS [14] provides 100 sequences with a high framer-ate of 240 fps, aiming to analyze the effects of appearancevariations on tracking performance.GOT-10k. GOT-10k [22] consists of 10,000 videos, aim-ing to provide rich motion trajectories for developing andevaluating trackers.

LaSOT belongs to the category of dense tracking dataset.Compared to others, LaSOT is the largest with 3.52 millionframes and an average sequence length of 2,512 frames. Inaddition, LaSOT provides extra lingual description for eachvideo while others do not. Tab. 1 provides a detailed com-parison of LaSOT with existing dense benchmarks.

2.2. Other Benchmarks

In addition to the dense tracking benchmarks, there ex-ist other benchmarks which may not provide high-qualityannotations for each frame. Instead, these benchmarks areeither annotated sparsely (e.g., every 30 frames) or labeled(semi-)automatically by tracking algorithms. Representa-tives of this type of benchmarks include ALOV [47], Track-ingNet [41] and OxUvA [51]. ALOV [47] consists of 314sequences labeled in 14 attributes. Instead of densely an-notating each frame, ALOV provides annotations every 5frames. TrackingNet [41] is a subset of the video objectdetection benchmark YT-BB [43] by selecting 30K videos,each of which is annotated by a tracker. Though the trackerused for annotation is proven to be reliable in a short period(i.e., 1 second) on OTB 2015 [53], it is difficult to guaranteethe same performance on a harder benchmark. Besides, theaverage sequence length of TrackingNet does not exceed500 frames, which may not demonstrate the performance ofa tracker in long-term scenarios. OxUvA [51] also comes

from YT-BB [43]. Unlike TrackingNet, OxUvA is focusedon long-term tracking. It contains 366 videos with an av-erage length of around 4,200 frames. However, a problemwith OxUvA is that it does not provide dense annotationsin consecutive frames. Each video in OxUvA is annotatedevery 30 frames, ignoring rich temporal context betweenconsecutive frames when developing a tracking algorithm.

Despite reduction of annotation cost, the evaluations onthese benchmarks may not faithfully reflect the true per-formances of tracking algorithms. Moreover, it may causeproblems for some trackers that need to learn temporal mod-els from annotations, since the temporal context in thesebenchmarks may be either lost due to sparse annotation orinaccurate due to potentially unreliable annotation. By con-trast, LaSOT provides a large set of sequences with high-quality dense bounding box annotations, which makes itmore suitable for developing deep trackers as well as evalu-ating long-term tracking in practical application.

3. The Proposed LaSOT Benchmark3.1. Design Principle

LaSOT aims to offer the community a dedicated datasetfor training and assessing trackers. To such purpose, we fol-low five principles in constructing LaSOT, including large-scale, high-quality dense annotations, long-term tracking,category balance and comprehensive labeling.1) Large-scale. One of the key motivations of LaSOT is to

provide a dataset for training data-hungry deep trackers,which require a large set of annotated sequences. Ac-cordingly, we expect such a dataset to contain at least athousand videos with at least a million frames.

2) High-quality dense annotations. As mentioned before,a tracking dataset is desired to have high-quality densebounding box annotations, which are crucial for trainingrobust trackers as well as for faithful evaluation. For thispurpose, each sequence in LaSOT is manually annotatedwith additional careful inspection and fine-tuning.

3) Long-term tracking. In comparison with short-termtracking, long-term tracking can reflect more practicalperformance of a tracker in the wild. We ensure that eachsequence comprises at least 1,000 frames, and the aver-age sequence length in LaSOT is around 2,500 frames.

4) Category balance. A robust tracker is expected to per-form consistently regardless of the category the targetobject belongs to. For this purpose, in LaSOT we in-clude a diverse set of objects from 70 classes and eachclass contains equal number of videos.

5) Comprehensive labeling. As a complex task, trackinghas recently seen improvements from natural languagespecification. To stimulate more explorations, a princi-ple of LaSOT is to provide comprehensive labeling forvideos, including both visual and lingual annotations.

3

Page 4: LaSOT: A High-quality Benchmark for Large-scale …LaSOT: A High-quality Benchmark for Large-scale Single Object Tracking Heng Fan 1Liting Lin2 Fan Yang Peng Chu 1Ge Deng Sijia Yu

3.2. Data Collection

Our benchmark covers a wide range of object categoriesin diverse contexts. Specifically, LaSOT consists of 70 ob-ject categories. Most of the categories are selected from the1,000 classes from ImageNet [12], with a few exceptions(e.g., drone) that are carefully chosen for popular trackingapplications. Different from existing dense benchmarks thathave less than 30 categories and typically are unevenly dis-tributed, LaSOT provides the same number of sequences foreach category to alleviate potential category bias. Details ofthe dataset can be found in the supplementary material.

After determining the 70 object categories in LaSOT, wehave searched for the videos of each class from YouTube.Initially, we collect over 5,000 videos. With a joint consid-eration of the quality of videos for tracking and the designprinciples of LaSOT, we pick out 1,400 videos. However,these 1,400 sequences are not immediately available for thetracking task because of a large amount of irrelevant con-tents. For example, for the video of person category (e.g., asporter), it often contains some introduction content of eachsporter in the beginning, which is undesirable for tracking.Therefore, we carefully filter out these unrelated contents ineach video and retain an usable clip for tracking. In addi-tion, each category in LaSOT consists of 20 targets, reflect-ing the category balance and varieties of natural scenes.

Eventually, we have compiled a large-scale dataset bygathering 1,400 sequences with 3.52 million frames fromYouTube under Creative Commons licence. The averagevideo length of LaSOT is 2,512 frames (i.e., 84 seconds for30 fps). The shortest video contains 1,000 frames (i.e., 33seconds), while the longest one consists of 11,397 frames(i.e., 378 seconds).

3.3. Annotation

In order to provide consistent bounding box annotation,we define a deterministic annotation strategy. Given a videowith a specific tracking target, for each frame, if the targetobject appears in the frame, a labeler manually draws/editsits bounding box as the tightest up-right one to fit any vis-ible part of the target; otherwise, the labeler gives an ab-sent label, either out-of-view or full occlusion, to the frame.Note that, such strategy can not guarantee to minimize thebackground area in the box, as observed in any other bench-marks. However, the strategy does provide a consistent an-notation that is relatively stable for learning the dynamics.

While the above strategy works great most of the time,exceptions exist. Some objects, e.g. a mouse, may have longand thin and highly deformable part, e.g. a tail, which notonly causes serious noise in object appearance and shape,but also provides little information for localizing of the tar-get object. We carefully identify such objects and associ-ated videos in LaSOT, and design specific rules for their an-notation (e.g., exclude the tails of mice when drawing their

Bear-12: “white bear walking on grass around the river bank”

Bus-19: “red bus running on the highway”

Horse-1: “brown horse running on the ground”

Person-14: “boy in black suit dancing in front of people”

Mouse-6: “white mouse moving on the ground around another white mouse”

Figure 2. Example sequences and annotations of our LaSOT. Wefocus on long-term videos in which target objects may disappear,and then re-enter the view again. In addition, we provide naturallanguage specification for each sequence. Best viewed in color.

initial annotation fine-tuned annotation

Figure 3. Examples of fine-tuning initial annotations.

bounding boxes). An example of such cases is shown in thelast row of Fig. 2.

The natural language specification of a sequence is repre-sented by a sentence that describes the color, behavior andsurroundings of the target. For LaSOT, we provide 1,400sentences for all videos. Note that the lingual descriptionaims to provide auxiliary help for tracking. For instance,if a tracker generates proposals for further processing, thelingual specification can assist in reducing the ambiguityamong them by serving as a global semantic guidance.

The greatest effort for constructing a high-quality densetracking dataset is, apparently, the manual labeling, double-checking, and error correcting. For this task, we have as-sembled an annotation team containing several Ph.D. stu-dents working on related areas and about 10 volunteers. Toguarantee high-quality annotation, each video is processedby teams: a labeling team and a validation team. A labelingteam is composed of a volunteer and an expert (Ph.D. stu-dent). The volunteer manually draws/edits the target bound-ing box in each frame, and the expert inspects the resultsand adjusts them if necessary. Then, the annotation resultsare reviewed by the validation team containing several (typ-

4

Page 5: LaSOT: A High-quality Benchmark for Large-scale …LaSOT: A High-quality Benchmark for Large-scale Single Object Tracking Heng Fan 1Liting Lin2 Fan Yang Peng Chu 1Ge Deng Sijia Yu

Table 2. Descriptions of 14 different attributes in LaSOT.Attribute Definition Attribute Definition

CM Abrupt motion of the camera VC Viewpoint affects target appearance significantlyROT The target rotates in the image SV The ratio of bounding box is outside the rage [0.5, 2]DEF The target is deformable during tracking BC The background has the similar appearance as the targetFOC The target is fully occluded in the sequence MB The target region is blurred due to target or camera motion

IV The illumination in the target region changes ARC The ratio of bounding box aspect ratio is outside the rage [0.5, 2]OV The target completely leaves the video frame LR The target box is smaller than 1000 pixels in at least one frame

POC The target is partially occluded in the sequence FM The motion of the target is larger than the size of its bounding box

(a) Distribution of sequences in each attribute on LaSOT (b) Distribution comparison in common attributes on different benchmarks

Figure 4. Distribution of sequences in each attribute on LaSOT and comparison with other benchmarks. Best viewed in color.

ically three) experts. If an annotation result is not unani-mously agreed by the members of validation team, it willbe sent back to the original labeling team to revise.

To improve the annotation quality as much as possible,our team checks the annotation results very carefully andrevises them frequently. Around 40% of the initial annota-tions fail in the first round of validation. And many framesare revised more than three times. Some challenging exam-ples of frames that are initially labeled incorrectly or inac-curately are given in Fig. 3. With all these efforts, we finallyreach a benchmark with high-quality dense annotation, withsome examples shown in Fig. 2.

3.4. Attributes

To enable further performance analysis of trackers, welabel each sequence with 14 attributes, including illumi-nation variation (IV), full occlusion (FOC), partial occlu-sion (POC), deformation (DEF), motion blur (MB), fast mo-tion (FM), scale variation (SV), camera motion (CM), rota-tion (ROT), background clutter (BC), low resolution (LR),viewpoint change (VC), out-of-view (OV) and aspect ratiochange (ARC). The attributions are defined in Tab. 2, andFig. 4 (a) shows the distribution of videos in each attribute.

From Fig. 4 (a), we observe that the most common chal-lenge factors in LaSOT are scale changes (SV and ARC),occlusion (POC and FOC), deformation (DEF) and rota-tion (ROT), which are well-known challenges for trackingin real-world applications. Besides, Fig. 4 (b) demonstratesthe distribution of attributes of LaSOT compared to OTB-2015 [53] and TC-128 [35] on overlapping attributes. Fromthe figure we observe that more than 1,300 videos in La-SOT are involved with scale variations. Compared with

OTB-2015 and TC-128 with less than 70 videos with scalechanges, LaSOT is more challenging for scale changes. Inaddition, on the out-of-view attribute, LaSOT comprises477 sequences, much larger than existing benchmarks.

3.5. Evaluation Protocols

Though there is no restriction to use LaSOT, we suggesttwo evaluation protocols for evaluating tracking algorithms,and conduct evaluations accordingly.Protocol I. In protocol I, we use all 1,400 sequences to eval-uate tracking performance. Researchers are allowed to em-ploy any sequences except for those in LaSOT to developtracking algorithms. Protocol I aims to provide large-scaleevaluation of trackers.Protocol II. In protocol II, we split LaSOT into training andtesting subsets. According to the 80/20 principle (i.e., thePareto principle), we select 16 out of 20 videos in each cate-gory for training, and the rest is for testing1. In specific, thetraining subset contains 1,120 videos with 2.83M frames,and the testing subset consists of 280 sequences with 690Kframes. The evaluation of trackers is performed on the test-ing subset. Protocol II aims to provide a large set of videosfor training and assessing trackers in the mean time.

4. Evaluation4.1. Evaluation Metric

Following popular protocols (e.g. OTB-2015 [53]), weperform an One-Pass Evaluation (OPE) and measure theprecision, normalized precision and success of differenttracking algorithms under two protocols.

1The training/testing split is shown in the supplementary material.

5

Page 6: LaSOT: A High-quality Benchmark for Large-scale …LaSOT: A High-quality Benchmark for Large-scale Single Object Tracking Heng Fan 1Liting Lin2 Fan Yang Peng Chu 1Ge Deng Sijia Yu

The precision is computed by comparing the distancebetween tracking result and groundtruth bounding box inpixels. Different trackers are ranked with this metric ona threshold (e.g., 20 pixels). Since the precision metric issensitive to target size and image resolution, we normalizethe precision as in [41]. With the normalized precision met-ric, we rank tracking algorithms using the Area Under theCurve (AUC) between 0 to 0.5. Please refer to [41] aboutthe normalized precision metric. The success is computedas the Intersection over Union (IoU) between tracking resultand groundtruth bounding box. The tracking algorithms areranked using the AUC between 0 to 1.

4.2. Evaluated TrackersWe evaluate 35 algorithms on LaSOT to provide exten-

sive baselines, comprising deep trackers (e.g., MDNet [42],TRACA [5], CFNet [50], SiamFC [4], StructSiam [59],DSiam [16], SINT [49] and VITAL [48]), correlation fil-ter trackers with hand-crafted features (e.g., ECO HC [7],DSST [8], CN [11], CSK [19], KCF [20], fDSST [9],SAMF [33], SCT4 [6], STC [57] and Staple [3]) or deepfeatures (e.g., HCFT [37] and ECO [7]) and regularizationtechniques (e.g., BACF [15], SRDCF [10], CSRDCF [36],Staple CA [40] and STRCF [29]), ensemble trackers (e.g.,PTAV [13], LCT [38], MEEM [56] and TLD [24]), sparsetrackers (e.g., L1APG [2] and ASLA [23]), other represen-tatives (e.g., CT [58], IVT [44], MIL [1] and Struck [17]).Tab. 3 summarizes these trackers with their representationschemes and search strategies in a chronological order.

4.3. Evaluation Results with Protocol IOverall performance. Protocol I aims at providing large-scale evaluations on all 1,400 videos in LaSOT. Each trackeris used as it is for evaluation, without any modification. Wereport the evaluation results in OPE using precision, nor-malized precision and success, as shown in Fig. 5. MD-Net achieves the best precision score of 0.374 and successscore of 0.413, and VITAL obtains the best normalized pre-cision score of 0.484. Both MDNet and VITAL are trainedin an online fashion, resulting in expensive computation andslow running speeds. SimaFC tracker, which learns off-linea matching function from a large set of videos using deepnetwork, achieves competitive results with 0.341 precisionscore, 0.449 normalized precision score and 0.358 successscore, respectively. Without time-consuming online modeladaption, SiamFC runs efficiently in real-time. The bestcorrelation filter tracker is ECO with 0.298 precision score,0.358 normalized precision score and 0.34 success score.

Compared to the typical tracking performances on ex-isting dense benchmarks (e.g., OTB-2015 [53]), the perfor-mances on LaSOT are severely degraded because of a largemount of non-rigid target objects and challenging factors in-volved in LaSOT. An interesting observation from Fig. 5 isthat all the top seven trackers leverage deep feature, demon-

Table 3. Summary of evaluated trackers. Representation: Sparse -Sparse Representation, Color - Color Names or Histograms, Pixel- Pixel Intensity, HoG - Histogram of Oriented Gradients, H or B -Haar or Binary, Deep - Deep Feature. Search: PF - Particle Filter,RS - Random Sampling, DS - Dense Sampling.

Representation Search

PCA

Spar

se

Col

or

Pixe

l

HoG

Hor

B

Dee

p

PF RS

DS

IVT [44] IJCV08 3 3MIL [1] CVPR09 H 3

Struck [17] ICCV11 H 3L1APG [2] CVPR12 3 3ASLA [23] CVPR12 3 3

CSK [19] ECCV12 3 3CT [58] ECCV12 H 3

TLD [24] PAMI12 B 3CN [11] CVPR14 3 3 3

DSST [8] BMVC14 3 3 3MEEM [56] ECCV14 3 3

STC [57] ECCV14 3 3SAMF [33] ECCVW14 3 3 3 3

LCT [38] CVPR15 3 3 3SRDCF [10] ICCV15 3 3

HCFT [37] ICCV15 3 3KCF [20] PAMI15 3 3Staple [3] CVPR16 3 3 3SINT [49] CVPR16 3 3SCT4 [6] CVPR16 3 3

MDNet [42] CVPR16 3 3SiamFC [4] ECCVW16 3 3

Staple CA[40] CVPR17 3 3 3ECO HC [7] CVPR17 3 3

ECO [7] CVPR17 3 3CFNet [50] CVPR17 3 3

CSRDCF [36] CVPR17 3 3 3 3PTAV [13] ICCV17 3 3 3 3

DSiam [16] ICCV17 3 3BACF [15] ICCV17 3 3fDSST [9] PAMI17 3 3 3

VITAL [48] CVPR18 3 3TRACA [5] CVPR18 3 3STRCF [29] CVPR18 3 3

StructSiam [59] ECCV18 3 3

strating its advantages in handling appearance changes.Attribute-based performance. To analyze different chal-lenges faced by existing trackers, we evaluate all trackingalgorithms on 14 attributes. We show the results on threemost challenging attributes, i.e., fast motion, out-of-viewand full occlusion, in Fig. 6 and refer the readers to supple-mentary material for detailed attribute-based evaluation.Qualitative evaluation. To qualitatively analyze differenttrackers and provide guidance for future research, we showthe qualitative evaluation results of six representative track-ers, including MDNet, SiamFC, ECO, PTAV, Staple andMEEM, in six typical hard challenges containing fast mo-tion, full occlusion, low resolution, out-of-view, aspect ra-tio change and background clutter in Fig. 7. From Fig. 7,we observe that, for videos with fast motion, full occlusionand out-of-view (e.g., Yoyo-3, Goldfish-4 and Basketball-15), the trackers are prone to lose the target because existingtrackers usually perform localization from a small local re-gion. To handle these challenges, a potential solution is toleverage an instance-specific detector to locate the target forsubsequent tracking. Trackers easily drift in video with low

6

Page 7: LaSOT: A High-quality Benchmark for Large-scale …LaSOT: A High-quality Benchmark for Large-scale Single Object Tracking Heng Fan 1Liting Lin2 Fan Yang Peng Chu 1Ge Deng Sijia Yu

0 5 10 15 20 25 30 35 40 45 50

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Pre

cis

ion

Precision plots of OPE on LaSOT

[0.374] MDNet

[0.372] VITAL

[0.341] SiamFC

[0.340] StructSiam

[0.329] DSiam

[0.299] SINT

[0.298] ECO

[0.292] STRCF

[0.272] ECO_HC

[0.265] CFNet

[0.250] HCFT

[0.243] PTAV

[0.239] BACF

[0.237] TRACA

[0.231] Staple

[0.231] CSRDCF

[0.231] Staple_CA

[0.227] SRDCF

[0.224] MEEM

[0.214] SAMF

[0.196] DSST

[0.193] LCT

[0.192] fDSST

[0.186] SCT4

[0.185] Struck

[0.184] KCF

[0.184] TLD

[0.170] ASLA

[0.158] CN

[0.154] L1APG

[0.142] STC

[0.136] CSK

[0.131] IVT

[0.102] CT

[0.078] MIL

0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Pre

cis

ion

Normalized Precision plots of OPE on LaSOT

[0.484] VITAL

[0.481] MDNet

[0.449] SiamFC

[0.443] StructSiam

[0.432] DSiam

[0.383] SINT

[0.358] ECO

[0.353] STRCF

[0.330] CFNet

[0.327] ECO_HC

[0.311] HCFT

[0.307] BACF

[0.298] Staple_CA

[0.298] TRACA

[0.298] MEEM

[0.297] Staple

[0.283] PTAV

[0.279] CSRDCF

[0.279] SRDCF

[0.271] SAMF

[0.242] LCT

[0.239] DSST

[0.236] fDSST

[0.231] SCT4

[0.228] KCF

[0.228] Struck

[0.224] TLD

[0.213] ASLA

[0.194] L1APG

[0.194] CN

[0.176] STC

[0.174] CSK

[0.173] IVT

[0.145] CT

[0.117] MIL

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Overlap threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Success r

ate

Success plots of OPE on LaSOT

[0.413] MDNet

[0.412] VITAL

[0.358] SiamFC

[0.356] StructSiam

[0.353] DSiam

[0.340] ECO

[0.339] SINT

[0.315] STRCF

[0.311] ECO_HC

[0.296] CFNet

[0.285] TRACA

[0.280] MEEM

[0.277] BACF

[0.272] HCFT

[0.271] SRDCF

[0.269] PTAV

[0.266] Staple

[0.263] CSRDCF

[0.262] Staple_CA

[0.258] SAMF

[0.246] LCT

[0.234] Struck

[0.233] DSST

[0.232] fDSST

[0.228] TLD

[0.214] SCT4

[0.211] ASLA

[0.211] KCF

[0.186] CN

[0.178] CT

[0.172] CSK

[0.168] L1APG

[0.163] MIL

[0.151] STC

[0.136] IVT

Figure 5. Evaluation results on LaSOT under protocol I using precision, normalized precision and success. Best viewed in color.

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Overlap threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Success r

ate

Success plots of OPE - Fast Motion (296)

[0.250] MDNet

[0.245] VITAL

[0.217] ECO

[0.214] SINT

[0.208] DSiam

[0.205] StructSiam

[0.200] SiamFC

[0.183] ECO_HC

[0.180] STRCF

[0.168] MEEM

[0.157] CFNet

[0.150] TLD

[0.150] PTAV

[0.149] SRDCF

[0.146] TRACA

[0.145] BACF

[0.140] CSRDCF

[0.138] HCFT

[0.130] Staple

[0.129] LCT

[0.127] Staple_CA

[0.126] SAMF

[0.121] fDSST

[0.117] Struck

[0.109] DSST

[0.109] SCT4

[0.097] KCF

[0.092] ASLA

[0.085] CN

[0.078] CT

[0.074] CSK

[0.073] MIL

[0.069] STC

[0.069] L1APG

[0.056] IVT

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Overlap threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Success r

ate

Success plots of OPE - Full Occlusion (542)

[0.304] MDNet

[0.304] VITAL

[0.260] StructSiam

[0.260] DSiam

[0.255] SINT

[0.253] SiamFC

[0.247] ECO

[0.218] ECO_HC

[0.215] STRCF

[0.213] MEEM

[0.197] TRACA

[0.195] CFNet

[0.189] SRDCF

[0.182] PTAV

[0.182] HCFT

[0.180] TLD

[0.176] BACF

[0.173] CSRDCF

[0.172] SAMF

[0.166] LCT

[0.164] Staple

[0.164] Staple_CA

[0.162] Struck

[0.151] fDSST

[0.151] SCT4

[0.142] ASLA

[0.139] KCF

[0.138] DSST

[0.119] CN

[0.117] CT

[0.108] MIL

[0.108] CSK

[0.105] L1APG

[0.097] STC

[0.079] IVT

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Overlap threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Success r

ate

Success plots of OPE - Low Resolution (661)

[0.308] VITAL

[0.308] MDNet

[0.263] ECO

[0.259] DSiam

[0.257] StructSiam

[0.254] SiamFC

[0.245] SINT

[0.224] STRCF

[0.220] ECO_HC

[0.197] MEEM

[0.192] CFNet

[0.189] PTAV

[0.187] TRACA

[0.186] SRDCF

[0.184] BACF

[0.180] HCFT

[0.176] CSRDCF

[0.174] TLD

[0.168] Staple

[0.162] Staple_CA

[0.159] LCT

[0.158] SAMF

[0.155] Struck

[0.151] fDSST

[0.137] DSST

[0.133] SCT4

[0.129] ASLA

[0.125] KCF

[0.114] CN

[0.101] CT

[0.101] L1APG

[0.096] CSK

[0.093] STC

[0.093] MIL

[0.070] IVT

Figure 6. Performances of trackers on three most challenging attributes under protocol I using success. Best viewed in color.

MEEM Staple PTAV ECO SiamFC MDNet GT

Figure 7. Qualitative evaluation in six typical hard challenges: Yoyo-3 with fast motion, Goldfish-4 with full occlusion, Pool-4 with low-resolution, Basketball-15 with out-of-view, Train-1 with aspect ration change and Person-2 with background clutter. Best viewed in color.

resolution (e.g., Pool-4) due to the ineffective representationfor small target. A solution for deep feature based trackersis to combine features from multiple scales to incorporatedetails into representation. Video with aspect ratio changeis difficult as most existing trackers either ignore this issueor adopt a simple method (e.g., random search or pyramidstrategy) to deal with it. Inspired from the success of deeplearning based object detection, a generic regressor can beleveraged to reduce the effect of aspect ratio change (andscale change) on tracking. For sequence with background

clutter, trackers drift due to less discriminative representa-tion for target and background. A possible solution to alle-viate this problem is to utilize the contextual information toenhance the discriminability.

4.4. Evaluation Results with Protocol IIUnder protocol II, we split LaSOT into training and test-

ing sets. Researchers are allowed to leverage the sequencesin the training set to develop their trackers and assess theirperformances on the test set. In order to provide baselines

7

Page 8: LaSOT: A High-quality Benchmark for Large-scale …LaSOT: A High-quality Benchmark for Large-scale Single Object Tracking Heng Fan 1Liting Lin2 Fan Yang Peng Chu 1Ge Deng Sijia Yu

0 5 10 15 20 25 30 35 40 45 50

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Pre

cis

ion

Precision plots of OPE on LaSOT Testing Set

[0.373] MDNet

[0.360] VITAL

[0.339] SiamFC

[0.333] StructSiam

[0.322] DSiam

[0.301] ECO

[0.298] STRCF

[0.295] SINT

[0.279] ECO_HC

[0.259] CFNet

[0.254] PTAV

[0.241] HCFT

[0.239] Staple

[0.239] BACF

[0.235] Staple_CA

[0.227] TRACA

[0.227] MEEM

[0.220] CSRDCF

[0.219] SRDCF

[0.203] SAMF

[0.190] LCT

[0.189] DSST

[0.184] fDSST

[0.181] Struck

[0.179] SCT4

[0.174] TLD

[0.168] ASLA

[0.166] KCF

[0.163] CN

[0.155] L1APG

[0.137] STC

[0.131] CSK

[0.122] IVT

[0.101] CT

[0.077] MIL

0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Pre

cis

ion

Normalized Precision plots of OPE on LaSOT Testing Set

[0.460] MDNet

[0.453] VITAL

[0.420] SiamFC

[0.418] StructSiam

[0.405] DSiam

[0.354] SINT

[0.340] STRCF

[0.338] ECO

[0.320] ECO_HC

[0.312] CFNet

[0.286] HCFT

[0.283] BACF

[0.278] TRACA

[0.278] Staple

[0.274] PTAV

[0.270] Staple_CA

[0.265] MEEM

[0.254] CSRDCF

[0.248] SRDCF

[0.239] SAMF

[0.213] DSST

[0.209] LCT

[0.208] fDSST

[0.207] Struck

[0.200] SCT4

[0.193] TLD

[0.190] KCF

[0.189] ASLA

[0.177] CN

[0.174] L1APG

[0.156] STC

[0.152] IVT

[0.149] CSK

[0.122] CT

[0.097] MIL

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Overlap threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Success r

ate

Success plots of OPE on LaSOT Testing Set

[0.397] MDNet

[0.390] VITAL

[0.336] SiamFC

[0.335] StructSiam

[0.333] DSiam

[0.324] ECO

[0.314] SINT

[0.308] STRCF

[0.304] ECO_HC

[0.275] CFNet

[0.259] BACF

[0.257] TRACA

[0.257] MEEM

[0.250] HCFT

[0.250] PTAV

[0.245] SRDCF

[0.244] CSRDCF

[0.243] Staple

[0.238] Staple_CA

[0.233] SAMF

[0.221] LCT

[0.212] Struck

[0.210] TLD

[0.207] DSST

[0.203] fDSST

[0.194] ASLA

[0.191] SCT4

[0.178] KCF

[0.170] CN

[0.158] CT

[0.155] L1APG

[0.149] CSK

[0.139] MIL

[0.138] STC

[0.118] IVT

Figure 8. Evaluation results on LaSOT under protocol II using precision, normalized precision and success. Best viewed in color.

and comparisons on the testing set, we evaluate the 35 track-ing algorithms. Each tracker is used as it is for evalua-tion without any modification or re-training. The evalua-tion results are shown in Fig. 8 using precision, normal-ized precision and success. We observe consistent results asin protocol I. MDNet and VITAL show top performanceswith precision scores of 0.373 and 0.36, normalized preci-sion scores of 0.46 and 0.453 and success scores of 0.397and 0.39. Next, SiamFC achieves the third-ranked perfor-mance with a 0.339 precision score, a 0.42 normalized pre-cision score and a 0.336 success score, respectively. Despiteslightly lower scores in accuracy than MDNet and VITAL,SiamFC runs much faster and achieves real-time runningspeed, showing good balance between accuracy and effi-ciency. For attribute-based evaluation of trackers on LaSOTtesting set, we refer the readers to supplementary materialbecause of limited space.

In addition to evaluating each tracking algorithm as it is,we conduct experiments by re-training two representativedeep trackers, MDNet [42] and SiamFC [4], on the trainingset of LaSOT and assessing them. The evaluation resultsshow similar performances for these trackers as without re-training. A potential reason is that our re-training may notfollow the same configurations used by the original authors.Besides, since LaSOT are in general more challenging thanprevious datasets (e.g., all sequences are long-term), dedi-cated configuration may be needed for training these track-ers. We leave this part as a future work since it is beyondthe scope of this benchmark.

4.5. Retraining Experiment on LaSOTWe conduct the experiment by retraining SiamFC [4] on

the training set of LaSOT to demonstrate how deep learningbased tracker is improved using more data. Tab. 4 reportsthe results on OTB-2013 [52] and OTB-2015 [53] and com-parisons with the performance of original SiamFC trainedon ImageNet Video [45]. Note that, we utilize color imagesfor training, and apply a pyramid with 3 scales for track-ing, i.e., SiamFC-3s (color). All parameters for training and

Table 4. Retraining of SiamFC [4] on LaSOT.SiamFC-3s (color)

Training dataImageNetVideo [45]

LaSOTtraining set

OTB-2013 [52]Precision 0.803 0.816 (↑1.3%)

Success 0.588 0.608 (↑2.0%)

OTB-2015 [53]Precision 0.756 0.777 (↑2.1%)

Success 0.565 0.582 (↑1.7%)

tracking are kept the same in these two experiments. FromTab. 4, we observe consistent performance gains on the twobenchmarks, showing the importance of specific large-scaletraining set for deep trackers.

5. ConclusionWe present LaSOT with high-quality dense bounding

box annotations for visual object tracking. To the best ofour knowledge, LaSOT is the largest tracking benchmarkwith high quality annotations to date. By releasing LaSOT,we expect to provide the tracking community a dedicatedplatform for training deep trackers and assessing long-termtracking performance. Besides, LaSOT provides lingual an-notations for each sequence, aiming to encourage the explo-ration on integrating visual and lingual features for robusttracking. By releasing LaSOT, we hope to narrow the gapbetween the increasing number of deep trackers and the lackof large dedicated datasets for training, and meanwhile pro-vide more veritable evaluations for different trackers in thewild. Extensive evaluations on LaSOT under two protocolsimply a large room to improvement for visual tracking.Acknowledgement. We sincerely thank B. Huang, X. Li, Q.Zhou, L. Chen, J. Liang, J. Wang and anonymous volunteersfor their help in constructing LaSOT. This work is supported inpart by the China National Key Research and Development Plan(Grant No. 2016YFB1001200), and in part by US NSF Grants1618398, 1449860 and 1350521, and Yong Xu thanks the sup-ports by National Nature Science Foundation of China (U1611461and 61672241), the Cultivation Project of Major Basic Research ofNSF-Guangdong Province (2016A030308013).

8

Page 9: LaSOT: A High-quality Benchmark for Large-scale …LaSOT: A High-quality Benchmark for Large-scale Single Object Tracking Heng Fan 1Liting Lin2 Fan Yang Peng Chu 1Ge Deng Sijia Yu

References[1] Boris Babenko, Ming-Hsuan Yang, and Serge Belongie. Vi-

sual tracking with online multiple instance learning. InCVPR, 2009. 6

[2] Chenglong Bao, Yi Wu, Haibin Ling, and Hui Ji. Realtime robust l1 tracker using accelerated proximal gradientapproach. In CVPR, 2012. 6

[3] Luca Bertinetto, Jack Valmadre, Stuart Golodetz, OndrejMiksik, and Philip HS Torr. Staple: Complementary learnersfor real-time tracking. In CVPR, 2016. 6

[4] Luca Bertinetto, Jack Valmadre, Joao F Henriques, AndreaVedaldi, and Philip HS Torr. Fully-convolutional siamesenetworks for object tracking. In ECCVW, 2016. 6, 8

[5] Jongwon Choi, Hyung Jin Chang, Tobias Fischer, SangdooYun, Kyuewang Lee, Jiyeoup Jeong, Yiannis Demiris, andJin Young Choi. Context-aware deep feature compressionfor high-speed visual tracking. In CVPR, 2018. 6

[6] Jongwon Choi, Hyung Jin Chang, Jiyeoup Jeong, Yian-nis Demiris, and Jin Young Choi. Visual tracking usingattention-modulated disintegration and integration. In CVPR,2016. 6

[7] Martin Danelljan, Goutam Bhat, Fahad Shahbaz Khan, andMichael Felsberg. Eco: Efficient convolution operators fortracking. In CVPR, 2017. 6

[8] Martin Danelljan, Gustav Hager, Fahad Khan, and MichaelFelsberg. Accurate scale estimation for robust visual track-ing. In BMVC, 2014. 6

[9] Martin Danelljan, Gustav Hager, Fahad Shahbaz Khan, andMichael Felsberg. Discriminative scale space tracking.TPAMI, 39(8):1561–1575, 2017. 6

[10] Martin Danelljan, Gustav Hager, Fahad Shahbaz Khan, andMichael Felsberg. Learning spatially regularized correlationfilters for visual tracking. In ICCV, 2015. 6

[11] Martin Danelljan, Fahad Shahbaz Khan, Michael Felsberg,and Joost Van de Weijer. Adaptive color attributes for real-time visual tracking. In CVPR, 2014. 6

[12] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li,and Li Fei-Fei. Imagenet: A large-scale hierarchical imagedatabase. In CVPR, 2009. 4, 12

[13] Heng Fan and Haibin Ling. Parallel tracking and verifying:A framework for real-time and high accuracy visual tracking.In ICCV, 2017. 6

[14] Hamed Kiani Galoogahi, Ashton Fagg, Chen Huang, DevaRamanan, and Simon Lucey. Need for speed: A benchmarkfor higher frame rate object tracking. In ICCV, 2017. 2, 3,11

[15] Hamed Kiani Galoogahi, Ashton Fagg, and Simon Lucey.Learning background-aware correlation filters for visualtracking. In ICCV, 2017. 6

[16] Qing Guo, Wei Feng, Ce Zhou, Rui Huang, Liang Wan, andSong Wang. Learning dynamic siamese network for visualobject tracking. In ICCV, 2017. 6

[17] Sam Hare, Amir Saffari, and Philip H. S. Torr. Struck: Struc-tured output tracking with kernels. In ICCV, 2011. 6

[18] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun.Deep residual learning for image recognition. In CVPR,2016. 1

[19] Joao F Henriques, Rui Caseiro, Pedro Martins, and JorgeBatista. Exploiting the circulant structure of tracking-by-detection with kernels. In ECCV, 2012. 6

[20] Joao F Henriques, Rui Caseiro, Pedro Martins, and JorgeBatista. High-speed tracking with kernelized correlation fil-ters. TPAMI, 37(3):583–596, 2015. 6

[21] Ronghang Hu, Huazhe Xu, Marcus Rohrbach, Jiashi Feng,Kate Saenko, and Trevor Darrell. Natural language objectretrieval. In CVPR, 2016. 2

[22] Lianghua Huang, Xin Zhao, and Kaiqi Huang. Got-10k: Alarge high-diversity benchmark for generic object tracking inthe wild. arXiv:1810.11981, 2018. 2, 3

[23] Xu Jia, Huchuan Lu, and Ming-Hsuan Yang. Visual track-ing via adaptive structural local sparse appearance model. InCVPR, 2012. 6

[24] Zdenek Kalal, Krystian Mikolajczyk, and Jiri Matas.Tracking-learning-detection. TPAMI, 34(7):1409–1422,2012. 6

[25] Matej Kristan, Jiri Matas, Ales Leonardis, Tomas Vojır,Roman Pflugfelder, Gustavo Fernandez, Georg Nebehay,Fatih Porikli, and Luka Cehovin. A novel performanceevaluation methodology for single-target trackers. TPAMI,38(11):2137–2155, 2016. 2, 3

[26] Matej Kristan et al. The visual object tracking vot2014 chal-lenge results. In ECCVW, 2014. 1, 2

[27] Matej Kristan et al. The visual object tracking vot2017 chal-lenge results. In ICCVW, 2017. 1, 2, 11

[28] Annan Li, Min Lin, Yi Wu, Ming-Hsuan Yang, andShuicheng Yan. Nus-pro: A new visual tracking challenge.TPAMI, 38(2):335–349, 2016. 1, 2, 3, 11

[29] Feng Li, Cheng Tian, Wangmeng Zuo, Lei Zhang, and Ming-Hsuan Yang. Learning spatial-temporal regularized correla-tion filters for visual tracking. In CVPR, 2018. 6

[30] Peixia Li, Dong Wang, Lijun Wang, and Huchuan Lu. Deepvisual tracking: Review and experimental comparison. PR,76:323–338, 2018. 2

[31] Shuang Li, Tong Xiao, Hongsheng Li, Bolei Zhou, DayuYue, and Xiaogang Wang. Person search with natural lan-guage description. In CVPR, 2017. 2

[32] Xi Li, Weiming Hu, Chunhua Shen, Zhongfei Zhang, An-thony Dick, and Anton Van Den Hengel. A survey of appear-ance models in visual object tracking. ACM TIST, 4(4):58,2013. 1, 2

[33] Yang Li and Jianke Zhu. A scale adaptive kernel correlationfilter tracker with feature integration. In ECCVW, 2014. 6

[34] Zhenyang Li, Ran Tao, Efstratios Gavves, Cees GM Snoek,Arnold WM Smeulders, et al. Tracking by natural languagespecification. In CVPR, 2017. 2

[35] Pengpeng Liang, Erik Blasch, and Haibin Ling. Encodingcolor information for visual tracking: Algorithms and bench-mark. TIP, 24(12):5630–5644, 2015. 1, 2, 3, 5, 11, 12

[36] Alan Lukezic, Tomas Vojir, Luka Cehovin Zajc, Jiri Matas,and Matej Kristan. Discriminative correlation filter withchannel and spatial reliability. In CVPR, 2017. 6

[37] Chao Ma, Jia-Bin Huang, Xiaokang Yang, and Ming-HsuanYang. Hierarchical convolutional features for visual tracking.In ICCV, 2015. 6

9

Page 10: LaSOT: A High-quality Benchmark for Large-scale …LaSOT: A High-quality Benchmark for Large-scale Single Object Tracking Heng Fan 1Liting Lin2 Fan Yang Peng Chu 1Ge Deng Sijia Yu

[38] Chao Ma, Xiaokang Yang, Chongyang Zhang, and Ming-Hsuan Yang. Long-term correlation tracking. In CVPR,2015. 6

[39] Matthias Mueller, Neil Smith, and Bernard Ghanem. Abenchmark and simulator for uav tracking. In ECCV, 2016.1, 2, 3, 11

[40] Matthias Mueller, Neil Smith, and Bernard Ghanem.Context-aware correlation filter tracking. In CVPR, 2017.6

[41] Matthias Muller, Adel Bibi, Silvio Giancola, Salman Al-Subaihi, and Bernard Ghanem. Trackingnet: A large-scaledataset and benchmark for object tracking in the wild. InECCV, 2018. 2, 3, 6

[42] Hyeonseob Nam and Bohyung Han. Learning multi-domainconvolutional neural networks for visual tracking. In CVPR,2016. 6, 8

[43] Esteban Real, Jonathon Shlens, Stefano Mazzocchi, Xin Pan,and Vincent Vanhoucke. Youtube-boundingboxes: A largehigh-precision human-annotated data set for object detectionin video. In CVPR, 2017. 1, 3

[44] David A Ross, Jongwoo Lim, Ruei-Sung Lin, and Ming-Hsuan Yang. Incremental learning for robust visual tracking.IJCV, 77(1-3):125–141, 2008. 6

[45] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, San-jeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy,Aditya Khosla, Michael Bernstein, et al. Imagenet largescale visual recognition challenge. IJCV, 115(3):211–252,2015. 1, 8

[46] Karen Simonyan and Andrew Zisserman. Very deep convo-lutional networks for large-scale image recognition. In ICLR,2015. 1

[47] Arnold WM Smeulders, Dung M Chu, Rita Cucchiara, Si-mone Calderara, Afshin Dehghan, and Mubarak Shah. Vi-sual tracking: An experimental survey. TPAMI, 36(7):1442–1468, 2014. 1, 2, 3

[48] Yibing Song, Chao Ma, Xiaohe Wu, Lijun Gong, LinchaoBao, Wangmeng Zuo, Chunhua Shen, Rynson Lau, andMing-Hsuan Yang. Vital: Visual tracking via adversariallearning. In CVPR, 2018. 6

[49] Ran Tao, Efstratios Gavves, and Arnold WM Smeulders.Siamese instance search for tracking. In CVPR, 2016. 6

[50] Jack Valmadre, Luca Bertinetto, Joao Henriques, AndreaVedaldi, and Philip HS Torr. End-to-end representationlearning for correlation filter based tracking. In CVPR, 2017.6

[51] Jack Valmadre, Luca Bertinetto, Joao F Henriques, Ran Tao,Andrea Vedaldi, Arnold Smeulders, Philip Torr, and Efstra-tios Gavves. Long-term tracking in the wild: A benchmark.In ECCV, 2018. 2, 3

[52] Yi Wu, Jongwoo Lim, and Ming-Hsuan Yang. Online objecttracking: A benchmark. In CVPR, 2013. 1, 2, 3, 8

[53] Yi Wu, Jongwoo Lim, and Ming-Hsuan Yang. Object track-ing benchmark. TPAMI, 37(9):1834–1848, 2015. 1, 2, 3, 5,6, 8, 11

[54] Alper Yilmaz, Omar Javed, and Mubarak Shah. Object track-ing: A survey. ACM CSUR, 38(4):13, 2006. 1, 2

[55] Jason Yosinski, Jeff Clune, Yoshua Bengio, and Hod Lipson.How transferable are features in deep neural networks? InNIPS, 2014. 1

[56] Jianming Zhang, Shugao Ma, and Stan Sclaroff. Meem: ro-bust tracking via multiple experts using entropy minimiza-tion. In ECCV, 2014. 6

[57] Kaihua Zhang, Lei Zhang, Qingshan Liu, David Zhang, andMing-Hsuan Yang. Fast visual tracking via dense spatio-temporal context learning. In ECCV, 2014. 6

[58] Kaihua Zhang, Lei Zhang, and Ming-Hsuan Yang. Real-timecompressive tracking. In ECCV, 2012. 6

[59] Yunhua Zhang, Lijun Wang, Jinqing Qi, Dong Wang,Mengyang Feng, and Huchuan Lu. Structured siamese net-work for real-time visual tracking. In ECCV, 2018. 6

10

Page 11: LaSOT: A High-quality Benchmark for Large-scale …LaSOT: A High-quality Benchmark for Large-scale Single Object Tracking Heng Fan 1Liting Lin2 Fan Yang Peng Chu 1Ge Deng Sijia Yu

LaSOT: A High-quality Benchmark for Large-scale Single Object Tracking—– Supplementary Material —–

Table 5. Details of 70 object categories in LaSOT and comparison with existing dense benchmark. Best viewed when zoomed-in.NUS-PRO [28] OTB-2015 [53] TC-128 [35] UAV123 [39] VOT-17 [27] NfS [14] LaSOTclass # entries class # entries class # entries class # entries class # entries class # entries class # entries

person 193 person 36 person 45 person 48 person 19 ball 21 airplane 20head 60 head 26 head 16 car 30 head 5 person 20 basketball 20car 31 car 12 sphere 8 drone 10 fish 4 animal 10 bear 20

airplane 20 toy 8 2D print 5 wakeboard 10 motorcycle 4 vehicle 9 bicycle 20boat 20 2D print 4 bicycle 5 boat 9 car 3 shuffeboard 8 bird 20

helicopter 20 cuboid 3 car 5 building 5 drone 3 face 6 boat 20motorcycle 20 bird 2 ball 4 truck 5 ant 2 cup 4 book 20

drone 1 motorcycle 1 toy 4 bicycle 3 ball 2 dollar 4 bottle 20- - deer 1 hand 3 bird 3 bird 2 aircraft 4 bus 20- - bottle 1 kite 3 - - toy 2 airboard 2 car 20- - panda 1 logo 3 - - bag 1 fish 2 cat 20- - board 1 cuboid 3 - - book 1 motorcycle 2 cattle 20- - can 1 boat 2 - - butterfly 1 drone 2 chameleon 20- - dog 1 cup 2 - - cable 1 bicycle 2 coin 20- - transformer 1 fish 2 - - crab 1 bird 2 crab 20- - bicycle 1 guitar 2 - - cat 1 bag 1 crocodile 20- - - - bird 2 - - flamingo 1 yoyo 1 cup 20- - - - microphone 2 - - frisbee 1 - - deer 20- - - - torso 2 - - glove 1 - - dog 20- - - - motorcycle 2 - - hand 1 - - drone 20- - - - airplane 2 - - helicopter 1 - - electricFan 20- - - - board 1 - - leaf 1 - - elephant 20- - - - bottle 1 - - rabbit 1 - - flag 20- - - - can 1 - - sheep 1 - - fox 20- - - - deer 1 - - - - - - frog 20- - - - ring 1 - - - - - - gameTarget 20- - - - torus 1 - - - - - - gecko 20- - - - - - - - - - - - giraffe 20- - - - - - - - - - - - goldfish 20- - - - - - - - - - - - gorilla 20- - - - - - - - - - - - guitar 20- - - - - - - - - - - - hand 20- - - - - - - - - - - - hat 20- - - - - - - - - - - - helmet 20- - - - - - - - - - - - hippo 20- - - - - - - - - - - - horse 20- - - - - - - - - - - - kangaroo 20- - - - - - - - - - - - kite 20- - - - - - - - - - - - leopard 20- - - - - - - - - - - - licensePlate 20- - - - - - - - - - - - lion 20- - - - - - - - - - - - lizard 20- - - - - - - - - - - - microphone 20- - - - - - - - - - - - monkey 20- - - - - - - - - - - - motorcycle 20- - - - - - - - - - - - mouse 20- - - - - - - - - - - - person 20- - - - - - - - - - - - pig 20- - - - - - - - - - - - pool 20- - - - - - - - - - - - rabbit 20- - - - - - - - - - - - racing 20- - - - - - - - - - - - robot 20- - - - - - - - - - - - rubicCube 20- - - - - - - - - - - - sepia 20- - - - - - - - - - - - shark 20- - - - - - - - - - - - shark 20- - - - - - - - - - - - sheep 20- - - - - - - - - - - - skateboard 20- - - - - - - - - - - - spider 20- - - - - - - - - - - - squirrel 20- - - - - - - - - - - - surfboard 20- - - - - - - - - - - - swing 20- - - - - - - - - - - - tank 20- - - - - - - - - - - - tiger 20- - - - - - - - - - - - train 20- - - - - - - - - - - - truck 20- - - - - - - - - - - - turtle 20- - - - - - - - - - - - umbrella 20- - - - - - - - - - - - volleyball 20- - - - - - - - - - - - yoyo 20

11

Page 12: LaSOT: A High-quality Benchmark for Large-scale …LaSOT: A High-quality Benchmark for Large-scale Single Object Tracking Heng Fan 1Liting Lin2 Fan Yang Peng Chu 1Ge Deng Sijia Yu

1. Details of 70 Object Categories in LaSOT and Comparison with Existing Dense BenchmarksLaSOT consists of 70 object categories with each containing 20 videos, as shown in Tab. 5. Most of 70 classes are chosen

form the 1,000 classes in ImageNet [12], with a few exceptions such as drone and gametarget, which are carefully selectedby the experts for tracking. The selection of each category must be agreed upon by all the experts to ensure its usability forvisual tracking. In addition, we also compare the object categories of different dense benchmarks. As shown in Tab. 5, thenumber of object categories in LaSOT is two times more than that of existing benchmarks (e.g., TC-128 [35] with 27 classes).Moreover, LaSOT eliminates the category bias of dataset for tracking while others do not.

2. Traing/Testing Split in Protocol IIIn protocol II, we split LaSOT into training and testing sets. The training set contains of 1,120 videos (i.e., 16 sequences

for each category) with 2.83M frames in total. The rest 280 videos (i.e., 4 sequences for each category) with 690K framesare used for testing.

Table 6. Comparison between training and testing sets of LaSOT.Video Min frames Mean frames Median frames Max frames Total frames Total duration

LaSOTtraining 1,120 1,000 2,529 2,043 11,397 2.83M 26.2 hours

LaSOTtesting 280 1,000 2,448 2,102 9,999 690K 6.3 hours

LaSOT 1,400 1,000 2,506 2,053 11,397 3.52M 32.5 hours

Figure 9. Comparison of sequence distribution in each attribute between training and testing sets. Best viewed in color.

Tab. 6 reports the detailed comparison between the training and the testing sets of LaSOT. We observe that the min frames,mean frames, median frames and max frames are similar between these two subsets. In addition, as shown in Fig. 9, we cansee that the ratios of sequences in all 14 attributes are similar. Both Tab. 6 and Fig. 9 evidence the consistency of ourtraining/testing split.

12

Page 13: LaSOT: A High-quality Benchmark for Large-scale …LaSOT: A High-quality Benchmark for Large-scale Single Object Tracking Heng Fan 1Liting Lin2 Fan Yang Peng Chu 1Ge Deng Sijia Yu

3. Detailed Attribute-based Performance under Protocol I

Fig. 10 shows the performance of trackers on each attribute using precision under protocol I.

0 5 10 15 20 25 30 35 40 45 50

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Pre

cis

ion

Precision plots of OPE - Aspect Ration Change (1201)

[0.337] VITAL

[0.336] MDNet

[0.312] SiamFC

[0.309] StructSiam

[0.300] DSiam

[0.268] SINT

[0.255] ECO

[0.249] STRCF

[0.233] ECO_HC

[0.224] CFNet

[0.215] HCFT

[0.200] BACF

[0.199] PTAV

[0.197] MEEM

[0.197] TRACA

[0.195] CSRDCF

[0.192] Staple

[0.191] Staple_CA

[0.184] SRDCF

[0.179] SAMF

[0.162] LCT

[0.159] DSST

[0.156] Struck

[0.155] TLD

[0.155] SCT4

[0.152] fDSST

[0.151] KCF

[0.134] ASLA

[0.129] CN

[0.123] L1APG

[0.111] STC

[0.109] CSK

[0.102] IVT

[0.089] CT

[0.069] MIL

0 5 10 15 20 25 30 35 40 45 50

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Pre

cis

ion

Precision plots of OPE - Background Clutter (434)

[0.340] MDNet

[0.335] VITAL

[0.305] StructSiam

[0.300] SiamFC

[0.293] DSiam

[0.282] ECO

[0.276] STRCF

[0.272] SINT

[0.258] ECO_HC

[0.246] HCFT

[0.244] CFNet

[0.236] BACF

[0.231] PTAV

[0.230] TRACA

[0.229] SRDCF

[0.225] Staple

[0.223] Staple_CA

[0.221] MEEM

[0.214] CSRDCF

[0.210] SAMF

[0.200] LCT

[0.191] DSST

[0.186] fDSST

[0.186] SCT4

[0.182] TLD

[0.181] KCF

[0.177] Struck

[0.160] ASLA

[0.158] CN

[0.157] L1APG

[0.137] CSK

[0.136] STC

[0.123] IVT

[0.098] CT

[0.079] MIL

0 5 10 15 20 25 30 35 40 45 50

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Pre

cis

ion

Precision plots of OPE - Camera Motion (433)

[0.398] MDNet

[0.398] VITAL

[0.367] SiamFC

[0.364] StructSiam

[0.343] DSiam

[0.310] STRCF

[0.310] ECO

[0.302] SINT

[0.290] ECO_HC

[0.277] CFNet

[0.266] HCFT

[0.261] BACF

[0.256] PTAV

[0.252] Staple

[0.251] TRACA

[0.247] Staple_CA

[0.238] CSRDCF

[0.236] SAMF

[0.233] SRDCF

[0.232] MEEM

[0.201] LCT

[0.198] DSST

[0.195] SCT4

[0.193] KCF

[0.190] fDSST

[0.189] TLD

[0.178] Struck

[0.158] L1APG

[0.153] ASLA

[0.149] CN

[0.134] CSK

[0.133] IVT

[0.133] STC

[0.108] CT

[0.082] MIL

0 5 10 15 20 25 30 35 40 45 50

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Pre

cis

ion

Precision plots of OPE - Deformation (725)

[0.295] SiamFC

[0.293] MDNet

[0.290] VITAL

[0.289] StructSiam

[0.280] DSiam

[0.226] SINT

[0.207] STRCF

[0.196] CFNet

[0.192] ECO

[0.191] HCFT

[0.181] ECO_HC

[0.173] BACF

[0.172] Staple_CA

[0.167] CSRDCF

[0.165] TRACA

[0.163] Staple

[0.158] MEEM

[0.154] PTAV

[0.150] SAMF

[0.136] Struck

[0.136] SRDCF

[0.132] DSST

[0.130] LCT

[0.124] SCT4

[0.123] KCF

[0.119] ASLA

[0.114] L1APG

[0.108] fDSST

[0.104] CN

[0.102] TLD

[0.096] CSK

[0.095] IVT

[0.093] STC

[0.082] CT

[0.059] MIL

0 5 10 15 20 25 30 35 40 45 50

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1P

recis

ion

Precision plots of OPE - Fast Motion (296)

[0.287] MDNet

[0.279] VITAL

[0.249] SINT

[0.240] StructSiam

[0.234] ECO

[0.233] DSiam

[0.232] SiamFC

[0.211] ECO_HC

[0.200] STRCF

[0.189] TLD

[0.184] MEEM

[0.176] HCFT

[0.173] CFNet

[0.171] PTAV

[0.157] CSRDCF

[0.156] BACF

[0.153] SRDCF

[0.145] TRACA

[0.142] Staple

[0.135] Staple_CA

[0.129] LCT

[0.128] SAMF

[0.128] fDSST

[0.122] Struck

[0.120] SCT4

[0.114] DSST

[0.110] KCF

[0.092] CN

[0.086] ASLA

[0.075] STC

[0.074] CSK

[0.073] L1APG

[0.062] IVT

[0.062] CT

[0.055] MIL

0 5 10 15 20 25 30 35 40 45 50

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Pre

cis

ion

Precision plots of OPE - Full Occlusion (542)

[0.309] VITAL

[0.305] MDNet

[0.271] StructSiam

[0.269] DSiam

[0.265] SiamFC

[0.256] SINT

[0.244] ECO

[0.223] STRCF

[0.220] ECO_HC

[0.202] MEEM

[0.194] HCFT

[0.193] CFNet

[0.187] TRACA

[0.182] PTAV

[0.180] SRDCF

[0.178] TLD

[0.168] BACF

[0.167] CSRDCF

[0.166] Staple

[0.163] Staple_CA

[0.159] SAMF

[0.157] LCT

[0.153] SCT4

[0.145] fDSST

[0.144] Struck

[0.137] KCF

[0.131] DSST

[0.125] ASLA

[0.114] CN

[0.113] L1APG

[0.098] STC

[0.091] CSK

[0.088] IVT

[0.080] CT

[0.063] MIL

0 5 10 15 20 25 30 35 40 45 50

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Pre

cis

ion

Precision plots of OPE - Illumination Variation (224)

[0.390] VITAL

[0.381] MDNet

[0.365] StructSiam

[0.355] DSiam

[0.351] SiamFC

[0.348] STRCF

[0.332] ECO

[0.322] CFNet

[0.308] SINT

[0.307] ECO_HC

[0.305] BACF

[0.289] SRDCF

[0.283] PTAV

[0.277] CSRDCF

[0.271] Staple

[0.269] TRACA

[0.269] HCFT

[0.266] Staple_CA

[0.249] SAMF

[0.245] DSST

[0.239] fDSST

[0.229] KCF

[0.229] MEEM

[0.222] LCT

[0.207] SCT4

[0.205] L1APG

[0.195] TLD

[0.194] ASLA

[0.188] Struck

[0.184] CN

[0.155] STC

[0.152] IVT

[0.149] CSK

[0.103] CT

[0.071] MIL

0 5 10 15 20 25 30 35 40 45 50

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Pre

cis

ion

Precision plots of OPE - Low Resolution (661)

[0.377] VITAL

[0.376] MDNet

[0.324] StructSiam

[0.324] SiamFC

[0.322] DSiam

[0.311] ECO

[0.301] SINT

[0.278] STRCF

[0.272] ECO_HC

[0.242] PTAV

[0.241] MEEM

[0.240] HCFT

[0.239] CFNet

[0.223] CSRDCF

[0.219] TLD

[0.218] BACF

[0.216] TRACA

[0.213] SRDCF

[0.210] Staple

[0.201] Staple_CA

[0.188] SAMF

[0.184] Struck

[0.182] LCT

[0.181] fDSST

[0.173] SCT4

[0.168] DSST

[0.161] KCF

[0.150] ASLA

[0.146] CN

[0.128] L1APG

[0.123] STC

[0.112] CSK

[0.098] CT

[0.094] IVT

[0.082] MIL

0 5 10 15 20 25 30 35 40 45 50

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Pre

cis

ion

Precision plots of OPE - Motion Blur (449)

[0.335] MDNet

[0.330] VITAL

[0.302] SiamFC

[0.297] StructSiam

[0.285] DSiam

[0.268] SINT

[0.249] STRCF

[0.249] ECO

[0.232] ECO_HC

[0.209] HCFT

[0.204] CFNet

[0.202] BACF

[0.201] PTAV

[0.198] Staple

[0.198] MEEM

[0.195] Staple_CA

[0.187] CSRDCF

[0.185] TRACA

[0.181] SRDCF

[0.179] SAMF

[0.162] LCT

[0.161] TLD

[0.159] DSST

[0.155] SCT4

[0.152] KCF

[0.145] fDSST

[0.143] Struck

[0.123] CN

[0.114] L1APG

[0.112] ASLA

[0.104] STC

[0.099] CSK

[0.099] IVT

[0.081] CT

[0.062] MIL

0 5 10 15 20 25 30 35 40 45 50

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Pre

cis

ion

Precision plots of OPE - Out-of-View (477)

[0.284] MDNet

[0.282] VITAL

[0.258] StructSiam

[0.246] SiamFC

[0.238] DSiam

[0.225] SINT

[0.209] ECO

[0.202] STRCF

[0.188] ECO_HC

[0.173] HCFT

[0.167] MEEM

[0.166] CFNet

[0.159] PTAV

[0.156] CSRDCF

[0.154] TRACA

[0.150] BACF

[0.149] SRDCF

[0.148] Staple

[0.146] TLD

[0.144] SAMF

[0.139] Staple_CA

[0.133] LCT

[0.123] fDSST

[0.121] SCT4

[0.120] Struck

[0.114] DSST

[0.113] KCF

[0.104] CN

[0.096] ASLA

[0.093] STC

[0.091] L1APG

[0.082] CSK

[0.072] CT

[0.071] IVT

[0.060] MIL

0 5 10 15 20 25 30 35 40 45 50

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Pre

cis

ion

Precision plots of OPE - Partial Occlusion (913)

[0.325] MDNet

[0.325] VITAL

[0.296] SiamFC

[0.294] StructSiam

[0.287] DSiam

[0.263] SINT

[0.259] ECO

[0.248] STRCF

[0.229] ECO_HC

[0.226] CFNet

[0.218] HCFT

[0.204] TRACA

[0.198] PTAV

[0.197] BACF

[0.197] SRDCF

[0.196] Staple

[0.196] MEEM

[0.196] Staple_CA

[0.188] CSRDCF

[0.184] SAMF

[0.170] LCT

[0.165] fDSST

[0.163] SCT4

[0.163] DSST

[0.159] TLD

[0.154] KCF

[0.151] Struck

[0.144] ASLA

[0.134] CN

[0.131] L1APG

[0.120] STC

[0.117] CSK

[0.111] IVT

[0.086] CT

[0.067] MIL

0 5 10 15 20 25 30 35 40 45 50

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Pre

cis

ion

Precision plots of OPE - Rotation (812)

[0.340] VITAL

[0.337] MDNet

[0.311] SiamFC

[0.308] StructSiam

[0.302] DSiam

[0.273] SINT

[0.242] ECO

[0.242] STRCF

[0.222] ECO_HC

[0.215] CFNet

[0.214] HCFT

[0.197] MEEM

[0.196] PTAV

[0.194] BACF

[0.191] TRACA

[0.191] Staple_CA

[0.190] CSRDCF

[0.184] Staple

[0.175] SAMF

[0.174] SRDCF

[0.164] Struck

[0.154] LCT

[0.153] SCT4

[0.149] fDSST

[0.148] DSST

[0.144] KCF

[0.143] TLD

[0.137] ASLA

[0.128] CN

[0.121] L1APG

[0.114] STC

[0.112] CSK

[0.104] IVT

[0.088] CT

[0.068] MIL

0 5 10 15 20 25 30 35 40 45 50

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Pre

cis

ion

Precision plots of OPE - Scale Variation (1321)

[0.369] MDNet

[0.366] VITAL

[0.338] SiamFC

[0.336] StructSiam

[0.326] DSiam

[0.289] ECO

[0.289] SINT

[0.281] STRCF

[0.263] ECO_HC

[0.255] CFNet

[0.237] HCFT

[0.234] PTAV

[0.231] BACF

[0.228] TRACA

[0.223] CSRDCF

[0.222] Staple

[0.221] Staple_CA

[0.217] SRDCF

[0.214] MEEM

[0.206] SAMF

[0.187] DSST

[0.184] LCT

[0.184] fDSST

[0.175] TLD

[0.174] SCT4

[0.174] KCF

[0.173] Struck

[0.161] ASLA

[0.150] CN

[0.146] L1APG

[0.132] STC

[0.128] CSK

[0.121] IVT

[0.097] CT

[0.076] MIL

0 5 10 15 20 25 30 35 40 45 50

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Pre

cis

ion

Precision plots of OPE - Viewpoint Change (153)

[0.411] MDNet

[0.407] VITAL

[0.368] SiamFC

[0.362] ECO

[0.356] StructSiam

[0.347] DSiam

[0.322] STRCF

[0.305] ECO_HC

[0.301] CFNet

[0.299] SINT

[0.277] TRACA

[0.275] SRDCF

[0.267] PTAV

[0.266] CSRDCF

[0.261] BACF

[0.258] Staple

[0.248] HCFT

[0.248] MEEM

[0.244] fDSST

[0.243] Staple_CA

[0.241] DSST

[0.232] SAMF

[0.216] TLD

[0.208] Struck

[0.202] KCF

[0.201] ASLA

[0.201] LCT

[0.201] CN

[0.185] L1APG

[0.179] SCT4

[0.164] CSK

[0.155] STC

[0.145] IVT

[0.115] CT

[0.073] MIL

Figure 10. Performance of trackers on each attribute using precision under protocol I. Best viewed in color.

13

Page 14: LaSOT: A High-quality Benchmark for Large-scale …LaSOT: A High-quality Benchmark for Large-scale Single Object Tracking Heng Fan 1Liting Lin2 Fan Yang Peng Chu 1Ge Deng Sijia Yu

Fig. 11 shows the performance of trackers on each attribute using normalized precision under protocol I.

0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Pre

cis

ion

Normalized Precision plots of OPE - Aspect Ration Change (1201)

[0.447] VITAL

[0.445] MDNet

[0.420] SiamFC

[0.413] StructSiam

[0.402] DSiam

[0.354] SINT

[0.309] ECO

[0.307] STRCF

[0.288] CFNet

[0.284] ECO_HC

[0.275] HCFT

[0.271] MEEM

[0.261] BACF

[0.255] Staple

[0.255] TRACA

[0.255] Staple_CA

[0.240] CSRDCF

[0.235] PTAV

[0.232] SRDCF

[0.231] SAMF

[0.209] LCT

[0.199] DSST

[0.197] SCT4

[0.196] TLD

[0.195] Struck

[0.192] KCF

[0.190] fDSST

[0.174] ASLA

[0.161] CN

[0.159] L1APG

[0.144] CSK

[0.144] STC

[0.142] IVT

[0.129] CT

[0.106] MIL

0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Pre

cis

ion

Normalized Precision plots of OPE - Background Clutter (434)

[0.416] MDNet

[0.410] VITAL

[0.384] StructSiam

[0.376] SiamFC

[0.365] DSiam

[0.332] SINT

[0.330] ECO

[0.327] STRCF

[0.303] CFNet

[0.295] ECO_HC

[0.287] BACF

[0.287] HCFT

[0.278] TRACA

[0.273] Staple_CA

[0.273] Staple

[0.268] MEEM

[0.264] SRDCF

[0.261] PTAV

[0.258] SAMF

[0.252] CSRDCF

[0.239] LCT

[0.225] DSST

[0.221] fDSST

[0.216] SCT4

[0.215] KCF

[0.209] Struck

[0.206] TLD

[0.191] ASLA

[0.187] CN

[0.177] L1APG

[0.168] CSK

[0.163] STC

[0.153] IVT

[0.120] CT

[0.094] MIL

0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Pre

cis

ion

Normalized Precision plots of OPE - Camera Motion (433)

[0.528] MDNet

[0.520] VITAL

[0.486] SiamFC

[0.478] StructSiam

[0.467] DSiam

[0.397] SINT

[0.389] ECO

[0.389] STRCF

[0.364] CFNet

[0.360] ECO_HC

[0.355] HCFT

[0.341] BACF

[0.333] TRACA

[0.331] Staple_CA

[0.329] Staple

[0.319] MEEM

[0.310] PTAV

[0.306] SAMF

[0.305] CSRDCF

[0.301] SRDCF

[0.260] LCT

[0.259] DSST

[0.255] KCF

[0.252] SCT4

[0.248] fDSST

[0.236] TLD

[0.225] Struck

[0.205] L1APG

[0.200] CN

[0.189] ASLA

[0.181] CSK

[0.178] IVT

[0.176] STC

[0.148] CT

[0.114] MIL

0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Pre

cis

ion

Normalized Precision plots of OPE - Deformation (725)

[0.461] SiamFC

[0.458] VITAL

[0.458] MDNet

[0.450] StructSiam

[0.437] DSiam

[0.369] SINT

[0.307] CFNet

[0.307] STRCF

[0.299] HCFT

[0.290] ECO

[0.270] Staple_CA

[0.268] MEEM

[0.267] ECO_HC

[0.266] BACF

[0.258] TRACA

[0.256] Staple

[0.250] CSRDCF

[0.237] SAMF

[0.218] PTAV

[0.211] SRDCF

[0.207] Struck

[0.202] LCT

[0.197] DSST

[0.195] SCT4

[0.191] KCF

[0.181] ASLA

[0.176] L1APG

[0.173] TLD

[0.170] fDSST

[0.159] CN

[0.154] IVT

[0.148] CSK

[0.147] STC

[0.144] CT

[0.117] MIL

0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Pre

cis

ion

Normalized Precision plots of OPE - Fast Motion (296)

[0.295] VITAL

[0.293] MDNet

[0.255] StructSiam

[0.252] DSiam

[0.248] SiamFC

[0.229] ECO

[0.229] SINT

[0.205] STRCF

[0.195] ECO_HC

[0.184] MEEM

[0.167] CFNet

[0.159] BACF

[0.158] HCFT

[0.157] PTAV

[0.153] TLD

[0.150] SRDCF

[0.148] TRACA

[0.143] CSRDCF

[0.142] Staple

[0.141] Staple_CA

[0.130] SAMF

[0.125] LCT

[0.123] fDSST

[0.114] SCT4

[0.113] DSST

[0.103] KCF

[0.102] Struck

[0.091] ASLA

[0.090] CN

[0.080] L1APG

[0.078] CSK

[0.071] STC

[0.067] IVT

[0.055] CT

[0.051] MIL

0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Pre

cis

ion

Normalized Precision plots of OPE - Full Occlusion (542)

[0.349] VITAL

[0.346] MDNet

[0.313] StructSiam

[0.306] SiamFC

[0.306] DSiam

[0.281] SINT

[0.254] ECO

[0.238] STRCF

[0.230] ECO_HC

[0.225] MEEM

[0.207] TRACA

[0.206] CFNet

[0.196] HCFT

[0.193] SRDCF

[0.187] BACF

[0.185] PTAV

[0.180] Staple

[0.179] Staple_CA

[0.178] TLD

[0.175] CSRDCF

[0.172] SAMF

[0.167] LCT

[0.161] SCT4

[0.152] fDSST

[0.148] Struck

[0.147] KCF

[0.142] DSST

[0.140] ASLA

[0.124] CN

[0.121] L1APG

[0.105] CSK

[0.103] IVT

[0.103] STC

[0.091] CT

[0.071] MIL

0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Pre

cis

ion

Normalized Precision plots of OPE - Illumination Variation (224)

[0.534] VITAL

[0.522] MDNet

[0.489] StructSiam

[0.482] DSiam

[0.470] SiamFC

[0.421] STRCF

[0.407] ECO

[0.401] CFNet

[0.399] SINT

[0.383] BACF

[0.377] ECO_HC

[0.343] SRDCF

[0.342] Staple_CA

[0.341] Staple

[0.340] HCFT

[0.339] TRACA

[0.330] CSRDCF

[0.323] PTAV

[0.313] MEEM

[0.311] SAMF

[0.292] DSST

[0.281] fDSST

[0.274] KCF

[0.270] LCT

[0.261] SCT4

[0.249] Struck

[0.245] L1APG

[0.243] ASLA

[0.239] TLD

[0.222] CN

[0.202] IVT

[0.188] CSK

[0.181] STC

[0.141] CT

[0.110] MIL

0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Pre

cis

ion

Normalized Precision plots of OPE - Low Resolution (661)

[0.367] VITAL

[0.364] MDNet

[0.320] DSiam

[0.320] SiamFC

[0.319] StructSiam

[0.280] ECO

[0.274] SINT

[0.257] STRCF

[0.234] ECO_HC

[0.213] CFNet

[0.207] MEEM

[0.203] BACF

[0.202] PTAV

[0.196] TRACA

[0.195] HCFT

[0.195] SRDCF

[0.192] Staple

[0.187] Staple_CA

[0.181] CSRDCF

[0.169] TLD

[0.163] SAMF

[0.156] LCT

[0.155] fDSST

[0.146] Struck

[0.145] DSST

[0.143] SCT4

[0.135] ASLA

[0.134] KCF

[0.123] CN

[0.117] L1APG

[0.101] STC

[0.101] CSK

[0.087] IVT

[0.078] CT

[0.063] MIL

0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Pre

cis

ion

Normalized Precision plots of OPE - Motion Blur (449)

[0.448] VITAL

[0.443] MDNet

[0.412] SiamFC

[0.401] StructSiam

[0.387] DSiam

[0.352] SINT

[0.315] STRCF

[0.313] ECO

[0.289] ECO_HC

[0.280] HCFT

[0.278] CFNet

[0.273] MEEM

[0.271] BACF

[0.267] Staple_CA

[0.265] Staple

[0.256] TRACA

[0.243] CSRDCF

[0.240] PTAV

[0.236] SAMF

[0.234] SRDCF

[0.211] LCT

[0.205] DSST

[0.200] SCT4

[0.194] KCF

[0.191] TLD

[0.189] fDSST

[0.181] Struck

[0.161] CN

[0.156] L1APG

[0.141] ASLA

[0.139] CSK

[0.138] STC

[0.133] IVT

[0.112] CT

[0.092] MIL

0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Pre

cis

ion

Normalized Precision plots of OPE - Out-of-View (477)

[0.358] VITAL

[0.357] MDNet

[0.330] StructSiam

[0.321] SiamFC

[0.311] DSiam

[0.279] SINT

[0.240] ECO

[0.236] STRCF

[0.211] MEEM

[0.209] ECO_HC

[0.203] CFNet

[0.195] HCFT

[0.185] TRACA

[0.183] BACF

[0.177] Staple

[0.174] SRDCF

[0.171] PTAV

[0.170] Staple_CA

[0.168] CSRDCF

[0.162] SAMF

[0.160] TLD

[0.154] LCT

[0.142] fDSST

[0.135] DSST

[0.135] SCT4

[0.135] Struck

[0.131] KCF

[0.114] CN

[0.111] ASLA

[0.110] L1APG

[0.101] STC

[0.098] CSK

[0.090] IVT

[0.082] CT

[0.072] MIL

0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Pre

cis

ion

Normalized Precision plots of OPE - Partial Occlusion (913)

[0.435] VITAL

[0.431] MDNet

[0.402] SiamFC

[0.399] StructSiam

[0.385] DSiam

[0.350] SINT

[0.315] ECO

[0.306] STRCF

[0.287] CFNet

[0.285] ECO_HC

[0.281] HCFT

[0.272] MEEM

[0.262] TRACA

[0.259] Staple_CA

[0.259] BACF

[0.255] Staple

[0.244] PTAV

[0.243] SRDCF

[0.241] CSRDCF

[0.237] SAMF

[0.219] LCT

[0.209] SCT4

[0.207] fDSST

[0.205] DSST

[0.203] TLD

[0.199] KCF

[0.195] Struck

[0.183] ASLA

[0.169] CN

[0.165] L1APG

[0.153] CSK

[0.150] STC

[0.148] IVT

[0.126] CT

[0.107] MIL

0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Pre

cis

ion

Normalized Precision plots of OPE - Rotation (812)

[0.441] VITAL

[0.436] MDNet

[0.409] SiamFC

[0.401] StructSiam

[0.393] DSiam

[0.356] SINT

[0.291] ECO

[0.289] STRCF

[0.271] CFNet

[0.265] HCFT

[0.264] ECO_HC

[0.262] MEEM

[0.246] Staple_CA

[0.243] BACF

[0.239] Staple

[0.238] TRACA

[0.228] CSRDCF

[0.219] PTAV

[0.217] SAMF

[0.209] SRDCF

[0.195] Struck

[0.193] LCT

[0.185] SCT4

[0.181] TLD

[0.179] DSST

[0.178] fDSST

[0.176] KCF

[0.166] ASLA

[0.151] L1APG

[0.151] CN

[0.140] STC

[0.139] CSK

[0.137] IVT

[0.123] CT

[0.102] MIL

0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Pre

cis

ion

Normalized Precision plots of OPE - Scale Variation (1321)

[0.474] VITAL

[0.474] MDNet

[0.441] SiamFC

[0.436] StructSiam

[0.425] DSiam

[0.369] SINT

[0.344] ECO

[0.339] STRCF

[0.316] CFNet

[0.313] ECO_HC

[0.294] HCFT

[0.294] BACF

[0.283] TRACA

[0.283] Staple_CA

[0.283] MEEM

[0.283] Staple

[0.269] PTAV

[0.267] CSRDCF

[0.264] SRDCF

[0.255] SAMF

[0.227] LCT

[0.225] DSST

[0.222] fDSST

[0.214] SCT4

[0.213] TLD

[0.212] KCF

[0.212] Struck

[0.200] ASLA

[0.183] L1APG

[0.181] CN

[0.162] STC

[0.161] CSK

[0.160] IVT

[0.134] CT

[0.110] MIL

0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Pre

cis

ion

Normalized Precision plots of OPE - Viewpoint Change (153)

[0.509] MDNet

[0.505] VITAL

[0.462] SiamFC

[0.439] StructSiam

[0.438] DSiam

[0.418] ECO

[0.366] SINT

[0.365] STRCF

[0.363] CFNet

[0.349] ECO_HC

[0.341] BACF

[0.338] SRDCF

[0.332] Staple

[0.329] TRACA

[0.312] Staple_CA

[0.300] PTAV

[0.291] SAMF

[0.288] MEEM

[0.287] CSRDCF

[0.285] fDSST

[0.281] HCFT

[0.273] DSST

[0.262] ASLA

[0.244] LCT

[0.244] TLD

[0.236] KCF

[0.235] Struck

[0.225] L1APG

[0.223] SCT4

[0.211] CN

[0.198] IVT

[0.181] CSK

[0.179] STC

[0.150] CT

[0.104] MIL

Figure 11. Performance of trackers on each attribute using precision under protocol I. Best viewed in color.

14

Page 15: LaSOT: A High-quality Benchmark for Large-scale …LaSOT: A High-quality Benchmark for Large-scale Single Object Tracking Heng Fan 1Liting Lin2 Fan Yang Peng Chu 1Ge Deng Sijia Yu

Fig. 12 shows the performance of trackers on each attribute using success under protocol I.

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Overlap threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Success r

ate

Success plots of OPE - Aspect Ration Change (1201)

[0.387] MDNet

[0.384] VITAL

[0.336] SiamFC

[0.333] StructSiam

[0.329] DSiam

[0.319] SINT

[0.302] ECO

[0.279] ECO_HC

[0.277] STRCF

[0.266] CFNet

[0.261] MEEM

[0.254] TRACA

[0.250] HCFT

[0.244] BACF

[0.235] Staple

[0.234] CSRDCF

[0.233] SRDCF

[0.232] PTAV

[0.231] SAMF

[0.228] Staple_CA

[0.218] LCT

[0.213] Struck

[0.205] TLD

[0.201] DSST

[0.195] fDSST

[0.191] SCT4

[0.187] KCF

[0.181] ASLA

[0.164] CT

[0.162] CN

[0.152] MIL

[0.151] CSK

[0.140] L1APG

[0.132] STC

[0.111] IVT

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Overlap threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Success r

ate

Success plots of OPE - Background Clutter (434)

[0.368] MDNet

[0.359] VITAL

[0.319] StructSiam

[0.314] DSiam

[0.311] SiamFC

[0.311] ECO

[0.300] SINT

[0.292] STRCF

[0.280] ECO_HC

[0.268] CFNet

[0.265] TRACA

[0.258] BACF

[0.258] MEEM

[0.255] SRDCF

[0.252] HCFT

[0.245] Staple

[0.244] PTAV

[0.243] Staple_CA

[0.242] SAMF

[0.240] CSRDCF

[0.235] LCT

[0.216] DSST

[0.215] Struck

[0.213] fDSST

[0.210] TLD

[0.205] SCT4

[0.199] KCF

[0.195] ASLA

[0.178] CN

[0.167] CSK

[0.159] CT

[0.152] L1APG

[0.144] MIL

[0.143] STC

[0.123] IVT

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Overlap threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Success r

ate

Success plots of OPE - Camera Motion (433)

[0.439] MDNet

[0.436] VITAL

[0.376] SiamFC

[0.374] StructSiam

[0.371] DSiam

[0.361] ECO

[0.344] SINT

[0.341] STRCF

[0.335] ECO_HC

[0.322] CFNet

[0.313] TRACA

[0.301] BACF

[0.288] Staple

[0.287] SRDCF

[0.285] PTAV

[0.284] Staple_CA

[0.284] HCFT

[0.284] CSRDCF

[0.283] MEEM

[0.280] SAMF

[0.261] LCT

[0.247] DSST

[0.240] fDSST

[0.235] TLD

[0.229] Struck

[0.220] SCT4

[0.218] KCF

[0.193] ASLA

[0.187] CN

[0.181] CT

[0.174] L1APG

[0.173] CSK

[0.164] MIL

[0.150] STC

[0.137] IVT

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Overlap threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Success r

ate

Success plots of OPE - Deformation (725)

[0.407] MDNet

[0.401] VITAL

[0.371] SiamFC

[0.365] StructSiam

[0.361] DSiam

[0.339] SINT

[0.296] ECO

[0.289] CFNet

[0.284] STRCF

[0.278] ECO_HC

[0.276] MEEM

[0.271] HCFT

[0.266] TRACA

[0.260] BACF

[0.248] CSRDCF

[0.247] SAMF

[0.245] Staple

[0.245] Staple_CA

[0.233] Struck

[0.227] PTAV

[0.226] SRDCF

[0.225] LCT

[0.208] DSST

[0.201] SCT4

[0.199] KCF

[0.199] TLD

[0.194] ASLA

[0.188] fDSST

[0.187] CT

[0.174] CN

[0.174] MIL

[0.167] CSK

[0.154] L1APG

[0.140] STC

[0.122] IVT

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Overlap threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Success r

ate

Success plots of OPE - Fast Motion (296)

[0.250] MDNet

[0.245] VITAL

[0.217] ECO

[0.214] SINT

[0.208] DSiam

[0.205] StructSiam

[0.200] SiamFC

[0.183] ECO_HC

[0.180] STRCF

[0.168] MEEM

[0.157] CFNet

[0.150] TLD

[0.150] PTAV

[0.149] SRDCF

[0.146] TRACA

[0.145] BACF

[0.140] CSRDCF

[0.138] HCFT

[0.130] Staple

[0.129] LCT

[0.127] Staple_CA

[0.126] SAMF

[0.121] fDSST

[0.117] Struck

[0.109] DSST

[0.109] SCT4

[0.097] KCF

[0.092] ASLA

[0.085] CN

[0.078] CT

[0.074] CSK

[0.073] MIL

[0.069] STC

[0.069] L1APG

[0.056] IVT

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Overlap threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Success r

ate

Success plots of OPE - Full Occlusion (542)

[0.304] MDNet

[0.304] VITAL

[0.260] StructSiam

[0.260] DSiam

[0.255] SINT

[0.253] SiamFC

[0.247] ECO

[0.218] ECO_HC

[0.215] STRCF

[0.213] MEEM

[0.197] TRACA

[0.195] CFNet

[0.189] SRDCF

[0.182] PTAV

[0.182] HCFT

[0.180] TLD

[0.176] BACF

[0.173] CSRDCF

[0.172] SAMF

[0.166] LCT

[0.164] Staple

[0.164] Staple_CA

[0.162] Struck

[0.151] fDSST

[0.151] SCT4

[0.142] ASLA

[0.139] KCF

[0.138] DSST

[0.119] CN

[0.117] CT

[0.108] MIL

[0.108] CSK

[0.105] L1APG

[0.097] STC

[0.079] IVT

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Overlap threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Success r

ate

Success plots of OPE - Illumination Variation (224)

[0.438] VITAL

[0.435] MDNet

[0.377] ECO

[0.376] DSiam

[0.371] StructSiam

[0.358] SiamFC

[0.358] STRCF

[0.347] SINT

[0.340] ECO_HC

[0.337] CFNet

[0.323] BACF

[0.317] SRDCF

[0.311] TRACA

[0.296] CSRDCF

[0.295] Staple

[0.292] PTAV

[0.289] Staple_CA

[0.284] HCFT

[0.284] MEEM

[0.279] SAMF

[0.275] DSST

[0.267] fDSST

[0.266] LCT

[0.239] Struck

[0.233] TLD

[0.230] KCF

[0.228] SCT4

[0.227] ASLA

[0.202] CN

[0.197] L1APG

[0.182] CSK

[0.177] CT

[0.154] MIL

[0.153] IVT

[0.151] STC

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Overlap threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Success r

ate

Success plots of OPE - Low Resolution (661)

[0.308] VITAL

[0.308] MDNet

[0.263] ECO

[0.259] DSiam

[0.257] StructSiam

[0.254] SiamFC

[0.245] SINT

[0.224] STRCF

[0.220] ECO_HC

[0.197] MEEM

[0.192] CFNet

[0.189] PTAV

[0.187] TRACA

[0.186] SRDCF

[0.184] BACF

[0.180] HCFT

[0.176] CSRDCF

[0.174] TLD

[0.168] Staple

[0.162] Staple_CA

[0.159] LCT

[0.158] SAMF

[0.155] Struck

[0.151] fDSST

[0.137] DSST

[0.133] SCT4

[0.129] ASLA

[0.125] KCF

[0.114] CN

[0.101] CT

[0.101] L1APG

[0.096] CSK

[0.093] STC

[0.093] MIL

[0.070] IVT

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Overlap threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Success r

ate

Success plots of OPE - Motion Blur (449)

[0.379] VITAL

[0.378] MDNet

[0.324] SiamFC

[0.320] StructSiam

[0.314] DSiam

[0.309] SINT

[0.303] ECO

[0.282] STRCF

[0.278] ECO_HC

[0.259] CFNet

[0.255] MEEM

[0.245] TRACA

[0.242] BACF

[0.241] HCFT

[0.240] Staple

[0.233] Staple_CA

[0.231] CSRDCF

[0.231] PTAV

[0.230] SRDCF

[0.228] SAMF

[0.216] LCT

[0.202] TLD

[0.201] DSST

[0.197] Struck

[0.192] fDSST

[0.188] SCT4

[0.179] KCF

[0.156] CN

[0.153] ASLA

[0.146] CT

[0.140] CSK

[0.137] MIL

[0.134] L1APG

[0.121] STC

[0.102] IVT

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Overlap threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Success r

ate

Success plots of OPE - Out-of-View (477)

[0.314] MDNet

[0.310] VITAL

[0.271] StructSiam

[0.262] SiamFC

[0.259] DSiam

[0.255] SINT

[0.242] ECO

[0.216] STRCF

[0.210] ECO_HC

[0.204] MEEM

[0.195] CFNet

[0.182] TRACA

[0.182] HCFT

[0.179] SRDCF

[0.177] BACF

[0.173] CSRDCF

[0.172] PTAV

[0.170] TLD

[0.167] SAMF

[0.167] Staple

[0.165] LCT

[0.158] Staple_CA

[0.154] Struck

[0.148] fDSST

[0.137] DSST

[0.137] SCT4

[0.135] KCF

[0.124] ASLA

[0.118] CT

[0.115] CN

[0.114] MIL

[0.103] CSK

[0.099] L1APG

[0.095] STC

[0.071] IVT

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Overlap threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Success r

ate

Success plots of OPE - Partial Occlusion (913)

[0.380] MDNet

[0.379] VITAL

[0.330] StructSiam

[0.329] SiamFC

[0.325] DSiam

[0.320] SINT

[0.309] ECO

[0.283] STRCF

[0.283] ECO_HC

[0.269] CFNet

[0.265] MEEM

[0.259] TRACA

[0.254] HCFT

[0.245] SRDCF

[0.242] BACF

[0.241] PTAV

[0.239] Staple

[0.238] SAMF

[0.236] Staple_CA

[0.233] CSRDCF

[0.226] LCT

[0.213] Struck

[0.210] TLD

[0.210] fDSST

[0.207] DSST

[0.204] SCT4

[0.195] KCF

[0.192] ASLA

[0.170] CN

[0.166] CT

[0.159] CSK

[0.156] MIL

[0.144] L1APG

[0.135] STC

[0.118] IVT

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Overlap threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Success r

ate

Success plots of OPE - Rotation (812)

[0.385] MDNet

[0.384] VITAL

[0.332] SiamFC

[0.329] StructSiam

[0.326] DSiam

[0.323] SINT

[0.291] ECO

[0.267] STRCF

[0.266] ECO_HC

[0.260] MEEM

[0.257] CFNet

[0.251] HCFT

[0.242] TRACA

[0.236] BACF

[0.227] CSRDCF

[0.227] Staple

[0.226] Staple_CA

[0.225] SAMF

[0.223] PTAV

[0.218] SRDCF

[0.215] Struck

[0.208] LCT

[0.196] TLD

[0.188] SCT4

[0.187] DSST

[0.186] fDSST

[0.183] KCF

[0.177] ASLA

[0.162] CT

[0.159] CN

[0.152] MIL

[0.150] CSK

[0.133] L1APG

[0.132] STC

[0.106] IVT

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Overlap threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Success r

ate

Success plots of OPE - Scale Variation (1321)

[0.407] MDNet

[0.404] VITAL

[0.351] SiamFC

[0.349] StructSiam

[0.346] DSiam

[0.329] SINT

[0.328] ECO

[0.304] STRCF

[0.301] ECO_HC

[0.286] CFNet

[0.274] TRACA

[0.268] BACF

[0.266] MEEM

[0.259] SRDCF

[0.257] HCFT

[0.257] PTAV

[0.255] Staple

[0.254] CSRDCF

[0.249] Staple_CA

[0.246] SAMF

[0.234] LCT

[0.221] DSST

[0.220] Struck

[0.219] fDSST

[0.217] TLD

[0.200] ASLA

[0.200] SCT4

[0.197] KCF

[0.174] CN

[0.168] CT

[0.161] CSK

[0.159] L1APG

[0.155] MIL

[0.142] STC

[0.125] IVT

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Overlap threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Success r

ate

Success plots of OPE - Viewpoint Change (153)

[0.424] MDNet

[0.414] VITAL

[0.372] ECO

[0.359] SiamFC

[0.346] DSiam

[0.341] StructSiam

[0.336] ECO_HC

[0.317] SINT

[0.313] STRCF

[0.308] SRDCF

[0.306] TRACA

[0.303] CFNet

[0.299] BACF

[0.281] Staple

[0.276] PTAV

[0.272] CSRDCF

[0.259] fDSST

[0.258] Staple_CA

[0.250] MEEM

[0.247] SAMF

[0.240] DSST

[0.239] HCFT

[0.237] LCT

[0.233] TLD

[0.228] ASLA

[0.225] Struck

[0.195] L1APG

[0.194] KCF

[0.178] SCT4

[0.170] CN

[0.166] CT

[0.156] CSK

[0.144] IVT

[0.140] STC

[0.137] MIL

Figure 12. Performance of trackers on each attribute using success under protocol I. Best viewed in color.

15

Page 16: LaSOT: A High-quality Benchmark for Large-scale …LaSOT: A High-quality Benchmark for Large-scale Single Object Tracking Heng Fan 1Liting Lin2 Fan Yang Peng Chu 1Ge Deng Sijia Yu

4. Detailed Attribute-based Performance under Protocol II

Fig. 13 shows the performance of trackers on each attribute using precision under protocol II.

0 5 10 15 20 25 30 35 40 45 50

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Pre

cis

ion

Precision plots of OPE - Aspect Ration Change (249)

[0.336] MDNet

[0.324] VITAL

[0.305] SiamFC

[0.302] StructSiam

[0.292] DSiam

[0.267] SINT

[0.265] ECO

[0.257] STRCF

[0.241] ECO_HC

[0.224] CFNet

[0.215] HCFT

[0.213] PTAV

[0.210] MEEM

[0.208] BACF

[0.203] Staple

[0.197] Staple_CA

[0.196] TRACA

[0.189] CSRDCF

[0.181] SRDCF

[0.175] SAMF

[0.173] LCT

[0.161] TLD

[0.159] DSST

[0.158] Struck

[0.157] SCT4

[0.148] fDSST

[0.144] KCF

[0.137] CN

[0.136] ASLA

[0.122] L1APG

[0.109] STC

[0.104] CSK

[0.094] CT

[0.094] IVT

[0.072] MIL

0 5 10 15 20 25 30 35 40 45 50

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Pre

cis

ion

Precision plots of OPE - Background Clutter (100)

[0.363] MDNet

[0.346] VITAL

[0.331] StructSiam

[0.325] SiamFC

[0.317] STRCF

[0.308] DSiam

[0.302] ECO

[0.288] SINT

[0.283] ECO_HC

[0.271] CFNet

[0.260] Staple

[0.259] PTAV

[0.256] BACF

[0.254] Staple_CA

[0.253] HCFT

[0.251] TRACA

[0.243] SRDCF

[0.225] SAMF

[0.220] MEEM

[0.208] CSRDCF

[0.205] DSST

[0.203] LCT

[0.200] SCT4

[0.198] fDSST

[0.195] L1APG

[0.192] Struck

[0.185] KCF

[0.184] TLD

[0.182] CN

[0.179] ASLA

[0.160] STC

[0.153] CSK

[0.145] IVT

[0.107] CT

[0.089] MIL

0 5 10 15 20 25 30 35 40 45 50

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Pre

cis

ion

Precision plots of OPE - Camera Motion (86)

[0.381] MDNet

[0.352] VITAL

[0.340] SiamFC

[0.334] STRCF

[0.326] ECO

[0.324] StructSiam

[0.321] ECO_HC

[0.316] DSiam

[0.297] SINT

[0.273] PTAV

[0.262] BACF

[0.257] HCFT

[0.253] CFNet

[0.249] Staple_CA

[0.245] Staple

[0.240] TRACA

[0.232] MEEM

[0.229] SRDCF

[0.222] CSRDCF

[0.209] SAMF

[0.200] LCT

[0.189] SCT4

[0.185] TLD

[0.163] fDSST

[0.151] DSST

[0.143] KCF

[0.142] Struck

[0.142] L1APG

[0.135] ASLA

[0.122] CN

[0.118] STC

[0.117] IVT

[0.101] CSK

[0.082] MIL

[0.080] CT

0 5 10 15 20 25 30 35 40 45 50

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Pre

cis

ion

Precision plots of OPE - Deformation (142)

[0.310] StructSiam

[0.308] MDNet

[0.303] VITAL

[0.298] SiamFC

[0.279] DSiam

[0.244] SINT

[0.225] STRCF

[0.206] CFNet

[0.202] ECO

[0.197] HCFT

[0.195] ECO_HC

[0.192] Staple_CA

[0.189] CSRDCF

[0.189] MEEM

[0.187] Staple

[0.180] PTAV

[0.178] BACF

[0.169] TRACA

[0.158] SAMF

[0.158] Struck

[0.153] LCT

[0.137] DSST

[0.137] ASLA

[0.136] SRDCF

[0.128] SCT4

[0.124] KCF

[0.122] L1APG

[0.117] CN

[0.110] fDSST

[0.106] CSK

[0.101] TLD

[0.097] STC

[0.097] IVT

[0.093] CT

[0.062] MIL

0 5 10 15 20 25 30 35 40 45 50

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1P

recis

ion

Precision plots of OPE - Fast Motion (53)

[0.312] MDNet

[0.302] VITAL

[0.265] ECO

[0.262] StructSiam

[0.243] ECO_HC

[0.238] SINT

[0.231] SiamFC

[0.225] DSiam

[0.219] PTAV

[0.216] STRCF

[0.199] HCFT

[0.194] BACF

[0.189] MEEM

[0.185] TRACA

[0.185] TLD

[0.183] CFNet

[0.177] LCT

[0.164] CSRDCF

[0.164] SRDCF

[0.151] Staple

[0.145] Staple_CA

[0.138] fDSST

[0.132] DSST

[0.130] Struck

[0.127] SAMF

[0.123] SCT4

[0.113] ASLA

[0.110] KCF

[0.107] CN

[0.080] STC

[0.076] L1APG

[0.075] CT

[0.071] CSK

[0.056] IVT

[0.036] MIL

0 5 10 15 20 25 30 35 40 45 50

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Pre

cis

ion

Precision plots of OPE - Full Occlusion (118)

[0.327] MDNet

[0.317] VITAL

[0.278] DSiam

[0.273] SiamFC

[0.267] StructSiam

[0.259] ECO

[0.257] SINT

[0.245] ECO_HC

[0.243] STRCF

[0.216] MEEM

[0.214] CFNet

[0.208] HCFT

[0.195] PTAV

[0.192] TRACA

[0.189] TLD

[0.187] BACF

[0.183] SRDCF

[0.180] Staple

[0.174] Staple_CA

[0.173] SAMF

[0.171] CSRDCF

[0.165] SCT4

[0.163] LCT

[0.154] DSST

[0.150] Struck

[0.150] KCF

[0.149] fDSST

[0.137] CN

[0.128] ASLA

[0.123] L1APG

[0.104] STC

[0.101] CSK

[0.088] CT

[0.083] IVT

[0.061] MIL

0 5 10 15 20 25 30 35 40 45 50

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Pre

cis

ion

Precision plots of OPE - Illumination Variation (47)

[0.398] MDNet

[0.390] SiamFC

[0.380] StructSiam

[0.377] VITAL

[0.364] DSiam

[0.359] ECO

[0.356] STRCF

[0.340] ECO_HC

[0.338] CFNet

[0.326] SRDCF

[0.325] Staple

[0.324] PTAV

[0.319] Staple_CA

[0.316] BACF

[0.314] SINT

[0.307] TRACA

[0.293] CSRDCF

[0.286] HCFT

[0.282] fDSST

[0.280] SAMF

[0.258] DSST

[0.232] MEEM

[0.217] L1APG

[0.211] KCF

[0.209] LCT

[0.205] SCT4

[0.199] ASLA

[0.198] TLD

[0.197] CN

[0.196] Struck

[0.155] STC

[0.151] CSK

[0.147] IVT

[0.111] CT

[0.076] MIL

0 5 10 15 20 25 30 35 40 45 50

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Pre

cis

ion

Precision plots of OPE - Low Resolution (141)

[0.386] MDNet

[0.374] VITAL

[0.340] SiamFC

[0.336] StructSiam

[0.333] DSiam

[0.330] ECO

[0.304] SINT

[0.303] STRCF

[0.288] ECO_HC

[0.262] PTAV

[0.259] CFNet

[0.245] HCFT

[0.244] MEEM

[0.234] BACF

[0.229] Staple

[0.226] SRDCF

[0.222] TRACA

[0.221] CSRDCF

[0.220] TLD

[0.212] Staple_CA

[0.202] SAMF

[0.196] LCT

[0.194] Struck

[0.190] DSST

[0.184] SCT4

[0.183] fDSST

[0.167] ASLA

[0.164] CN

[0.164] KCF

[0.139] L1APG

[0.130] STC

[0.123] CSK

[0.115] CT

[0.091] IVT

[0.077] MIL

0 5 10 15 20 25 30 35 40 45 50

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Pre

cis

ion

Precision plots of OPE - Motion Blur (89)

[0.380] MDNet

[0.348] VITAL

[0.331] SiamFC

[0.323] STRCF

[0.314] StructSiam

[0.302] DSiam

[0.291] SINT

[0.273] ECO

[0.268] ECO_HC

[0.249] PTAV

[0.236] BACF

[0.232] Staple

[0.228] Staple_CA

[0.225] MEEM

[0.222] HCFT

[0.218] CFNet

[0.210] CSRDCF

[0.204] TRACA

[0.197] SRDCF

[0.191] SAMF

[0.187] LCT

[0.178] TLD

[0.174] fDSST

[0.166] DSST

[0.155] SCT4

[0.139] L1APG

[0.134] Struck

[0.134] KCF

[0.132] CN

[0.118] STC

[0.110] ASLA

[0.101] CSK

[0.096] IVT

[0.067] MIL

[0.065] CT

0 5 10 15 20 25 30 35 40 45 50

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Pre

cis

ion

Precision plots of OPE - Out-of-View (104)

[0.299] MDNet

[0.267] VITAL

[0.266] StructSiam

[0.256] SiamFC

[0.235] DSiam

[0.228] SINT

[0.222] STRCF

[0.218] ECO

[0.204] ECO_HC

[0.178] MEEM

[0.176] Staple

[0.175] HCFT

[0.167] BACF

[0.166] PTAV

[0.162] CFNet

[0.161] Staple_CA

[0.158] CSRDCF

[0.158] LCT

[0.157] TRACA

[0.152] TLD

[0.149] SRDCF

[0.139] SAMF

[0.133] SCT4

[0.130] Struck

[0.118] fDSST

[0.115] KCF

[0.111] DSST

[0.110] CN

[0.089] ASLA

[0.086] STC

[0.082] CSK

[0.080] L1APG

[0.072] CT

[0.067] IVT

[0.059] MIL

0 5 10 15 20 25 30 35 40 45 50

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Pre

cis

ion

Precision plots of OPE - Partial Occlusion (187)

[0.339] MDNet

[0.327] VITAL

[0.295] StructSiam

[0.294] SiamFC

[0.281] DSiam

[0.263] SINT

[0.261] ECO

[0.252] STRCF

[0.239] ECO_HC

[0.225] CFNet

[0.210] HCFT

[0.207] PTAV

[0.201] TRACA

[0.199] MEEM

[0.196] Staple

[0.194] Staple_CA

[0.194] BACF

[0.184] SRDCF

[0.183] CSRDCF

[0.175] SAMF

[0.162] LCT

[0.162] DSST

[0.151] TLD

[0.149] SCT4

[0.149] Struck

[0.147] fDSST

[0.145] ASLA

[0.142] KCF

[0.141] CN

[0.136] L1APG

[0.113] CSK

[0.112] STC

[0.099] IVT

[0.085] CT

[0.065] MIL

0 5 10 15 20 25 30 35 40 45 50

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Pre

cis

ion

Precision plots of OPE - Rotation (175)

[0.351] MDNet

[0.342] VITAL

[0.310] StructSiam

[0.310] SiamFC

[0.308] DSiam

[0.280] SINT

[0.261] ECO

[0.253] STRCF

[0.243] ECO_HC

[0.222] HCFT

[0.221] CFNet

[0.216] PTAV

[0.211] MEEM

[0.208] BACF

[0.203] Staple

[0.196] TRACA

[0.195] CSRDCF

[0.193] Staple_CA

[0.176] SAMF

[0.176] Struck

[0.171] SRDCF

[0.166] LCT

[0.160] DSST

[0.160] SCT4

[0.155] fDSST

[0.153] TLD

[0.152] ASLA

[0.149] CN

[0.142] KCF

[0.134] L1APG

[0.122] CSK

[0.116] STC

[0.102] IVT

[0.099] CT

[0.071] MIL

0 5 10 15 20 25 30 35 40 45 50

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Pre

cis

ion

Precision plots of OPE - Scale Variation (273)

[0.367] MDNet

[0.354] VITAL

[0.334] SiamFC

[0.326] StructSiam

[0.317] DSiam

[0.292] ECO

[0.291] STRCF

[0.284] SINT

[0.270] ECO_HC

[0.253] CFNet

[0.244] PTAV

[0.235] BACF

[0.234] HCFT

[0.233] Staple

[0.228] Staple_CA

[0.222] TRACA

[0.218] MEEM

[0.216] CSRDCF

[0.213] SRDCF

[0.197] SAMF

[0.185] LCT

[0.184] DSST

[0.177] fDSST

[0.174] Struck

[0.173] SCT4

[0.171] TLD

[0.160] KCF

[0.160] ASLA

[0.158] CN

[0.148] L1APG

[0.130] STC

[0.127] CSK

[0.114] IVT

[0.100] CT

[0.078] MIL

0 5 10 15 20 25 30 35 40 45 50

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Pre

cis

ion

Precision plots of OPE - Viewpoint Change (33)

[0.343] MDNet

[0.327] VITAL

[0.321] ECO

[0.313] STRCF

[0.307] DSiam

[0.288] CFNet

[0.283] SiamFC

[0.273] ECO_HC

[0.257] StructSiam

[0.249] SINT

[0.239] BACF

[0.228] CSRDCF

[0.220] Staple

[0.220] SRDCF

[0.218] PTAV

[0.215] TRACA

[0.206] MEEM

[0.205] Staple_CA

[0.199] DSST

[0.189] HCFT

[0.174] SAMF

[0.160] TLD

[0.154] Struck

[0.150] LCT

[0.149] ASLA

[0.139] fDSST

[0.123] CN

[0.117] L1APG

[0.108] SCT4

[0.107] STC

[0.105] KCF

[0.101] CT

[0.097] CSK

[0.093] IVT

[0.067] MIL

Figure 13. Performance of trackers on each attribute using precision under protocol II. Best viewed in color.

16

Page 17: LaSOT: A High-quality Benchmark for Large-scale …LaSOT: A High-quality Benchmark for Large-scale Single Object Tracking Heng Fan 1Liting Lin2 Fan Yang Peng Chu 1Ge Deng Sijia Yu

Fig. 14 shows the performance of trackers on each attribute using normalized precision under protocol II.

0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Pre

cis

ion

Normalized Precision plots of OPE - Aspect Ration Change (249)

[0.422] MDNet

[0.410] VITAL

[0.391] StructSiam

[0.389] SiamFC

[0.373] DSiam

[0.323] SINT

[0.297] STRCF

[0.295] ECO

[0.277] ECO_HC

[0.274] CFNet

[0.256] HCFT

[0.249] MEEM

[0.246] BACF

[0.244] TRACA

[0.239] Staple

[0.229] Staple_CA

[0.228] PTAV

[0.219] CSRDCF

[0.208] SRDCF

[0.208] SAMF

[0.187] LCT

[0.179] DSST

[0.176] TLD

[0.175] Struck

[0.173] SCT4

[0.168] fDSST

[0.162] KCF

[0.152] ASLA

[0.143] CN

[0.136] L1APG

[0.127] STC

[0.121] IVT

[0.115] CSK

[0.111] CT

[0.091] MIL

0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Pre

cis

ion

Normalized Precision plots of OPE - Background Clutter (100)

[0.429] MDNet

[0.420] VITAL

[0.387] StructSiam

[0.366] SiamFC

[0.354] STRCF

[0.350] DSiam

[0.340] ECO

[0.337] SINT

[0.323] ECO_HC

[0.305] CFNet

[0.305] TRACA

[0.294] BACF

[0.290] HCFT

[0.289] PTAV

[0.285] Staple

[0.273] Staple_CA

[0.265] SRDCF

[0.258] SAMF

[0.246] CSRDCF

[0.240] MEEM

[0.233] DSST

[0.233] fDSST

[0.219] LCT

[0.219] SCT4

[0.215] KCF

[0.214] Struck

[0.212] L1APG

[0.204] TLD

[0.201] CN

[0.193] ASLA

[0.173] CSK

[0.171] STC

[0.166] IVT

[0.121] CT

[0.090] MIL

0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Pre

cis

ion

Normalized Precision plots of OPE - Camera Motion (86)

[0.492] MDNet

[0.464] VITAL

[0.423] SiamFC

[0.414] StructSiam

[0.409] DSiam

[0.386] ECO_HC

[0.378] ECO

[0.376] STRCF

[0.357] SINT

[0.327] BACF

[0.326] HCFT

[0.321] CFNet

[0.319] TRACA

[0.318] PTAV

[0.285] Staple

[0.285] MEEM

[0.281] SRDCF

[0.278] Staple_CA

[0.266] CSRDCF

[0.261] SAMF

[0.225] SCT4

[0.221] LCT

[0.217] TLD

[0.212] fDSST

[0.202] DSST

[0.201] KCF

[0.181] Struck

[0.170] L1APG

[0.162] CN

[0.161] STC

[0.153] ASLA

[0.149] IVT

[0.142] CSK

[0.115] CT

[0.094] MIL

0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Pre

cis

ion

Normalized Precision plots of OPE - Deformation (142)

[0.452] StructSiam

[0.443] MDNet

[0.439] SiamFC

[0.436] VITAL

[0.414] DSiam

[0.344] SINT

[0.295] STRCF

[0.295] CFNet

[0.278] HCFT

[0.278] ECO

[0.257] ECO_HC

[0.249] MEEM

[0.246] Staple_CA

[0.246] BACF

[0.242] Staple

[0.238] CSRDCF

[0.237] TRACA

[0.216] SAMF

[0.207] PTAV

[0.201] Struck

[0.185] LCT

[0.185] SRDCF

[0.166] ASLA

[0.166] DSST

[0.165] SCT4

[0.158] KCF

[0.156] L1APG

[0.145] IVT

[0.143] fDSST

[0.134] CN

[0.134] TLD

[0.130] CSK

[0.126] STC

[0.120] CT

[0.100] MIL

0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Pre

cis

ion

Normalized Precision plots of OPE - Fast Motion (53)

[0.313] MDNet

[0.302] VITAL

[0.277] StructSiam

[0.242] ECO

[0.236] SiamFC

[0.234] DSiam

[0.226] ECO_HC

[0.226] STRCF

[0.215] SINT

[0.207] PTAV

[0.195] BACF

[0.189] CFNet

[0.182] MEEM

[0.179] HCFT

[0.177] TRACA

[0.153] CSRDCF

[0.153] SRDCF

[0.147] TLD

[0.146] Staple

[0.144] LCT

[0.142] SAMF

[0.141] Staple_CA

[0.134] fDSST

[0.131] DSST

[0.121] SCT4

[0.107] KCF

[0.104] Struck

[0.101] ASLA

[0.096] CN

[0.085] L1APG

[0.073] STC

[0.067] CSK

[0.062] IVT

[0.045] MIL

[0.044] CT

0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Pre

cis

ion

Normalized Precision plots of OPE - Full Occlusion (118)

[0.346] MDNet

[0.337] VITAL

[0.296] SiamFC

[0.295] DSiam

[0.295] StructSiam

[0.261] SINT

[0.257] ECO

[0.251] STRCF

[0.251] ECO_HC

[0.227] TRACA

[0.221] CFNet

[0.213] MEEM

[0.206] HCFT

[0.199] BACF

[0.199] PTAV

[0.188] TLD

[0.186] Staple

[0.185] SAMF

[0.181] Staple_CA

[0.178] CSRDCF

[0.177] SRDCF

[0.164] SCT4

[0.161] LCT

[0.154] DSST

[0.152] KCF

[0.151] fDSST

[0.140] Struck

[0.139] ASLA

[0.134] CN

[0.127] L1APG

[0.103] STC

[0.103] CSK

[0.096] IVT

[0.086] CT

[0.057] MIL

0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Pre

cis

ion

Normalized Precision plots of OPE - Illumination Variation (47)

[0.502] StructSiam

[0.500] DSiam

[0.492] MDNet

[0.475] VITAL

[0.473] SiamFC

[0.410] ECO

[0.405] CFNet

[0.388] STRCF

[0.376] ECO_HC

[0.356] SINT

[0.355] BACF

[0.355] HCFT

[0.346] Staple

[0.344] SRDCF

[0.344] PTAV

[0.343] Staple_CA

[0.340] TRACA

[0.324] SAMF

[0.317] CSRDCF

[0.304] fDSST

[0.293] DSST

[0.274] MEEM

[0.239] KCF

[0.237] L1APG

[0.227] SCT4

[0.219] Struck

[0.218] TLD

[0.216] LCT

[0.216] CN

[0.214] ASLA

[0.162] CSK

[0.160] STC

[0.147] IVT

[0.117] CT

[0.095] MIL

0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Pre

cis

ion

Normalized Precision plots of OPE - Low Resolution (141)

[0.380] MDNet

[0.372] VITAL

[0.325] SiamFC

[0.325] StructSiam

[0.320] DSiam

[0.293] ECO

[0.284] STRCF

[0.274] SINT

[0.257] ECO_HC

[0.241] CFNet

[0.225] PTAV

[0.222] BACF

[0.215] Staple

[0.209] TRACA

[0.207] HCFT

[0.203] SRDCF

[0.197] Staple_CA

[0.195] MEEM

[0.190] CSRDCF

[0.180] SAMF

[0.180] TLD

[0.164] DSST

[0.163] SCT4

[0.159] LCT

[0.158] fDSST

[0.154] Struck

[0.151] ASLA

[0.142] KCF

[0.129] CN

[0.126] L1APG

[0.109] STC

[0.099] CSK

[0.087] IVT

[0.079] CT

[0.049] MIL

0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Pre

cis

ion

Normalized Precision plots of OPE - Motion Blur (89)

[0.434] MDNet

[0.421] VITAL

[0.390] SiamFC

[0.381] StructSiam

[0.356] DSiam

[0.342] STRCF

[0.339] SINT

[0.311] ECO_HC

[0.298] ECO

[0.271] BACF

[0.271] PTAV

[0.262] HCFT

[0.259] MEEM

[0.251] TRACA

[0.251] Staple

[0.247] CFNet

[0.244] Staple_CA

[0.226] CSRDCF

[0.224] SAMF

[0.215] SRDCF

[0.199] fDSST

[0.197] LCT

[0.183] DSST

[0.179] TLD

[0.171] SCT4

[0.168] KCF

[0.157] L1APG

[0.154] Struck

[0.140] CN

[0.135] STC

[0.122] ASLA

[0.118] CSK

[0.114] IVT

[0.074] CT

[0.066] MIL

0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Pre

cis

ion

Normalized Precision plots of OPE - Out-of-View (104)

[0.377] MDNet

[0.359] StructSiam

[0.351] VITAL

[0.334] SiamFC

[0.318] DSiam

[0.275] SINT

[0.257] STRCF

[0.248] ECO

[0.232] ECO_HC

[0.212] MEEM

[0.203] CFNet

[0.202] Staple

[0.198] BACF

[0.195] HCFT

[0.189] TRACA

[0.187] PTAV

[0.182] Staple_CA

[0.172] CSRDCF

[0.167] SRDCF

[0.162] SAMF

[0.161] LCT

[0.148] TLD

[0.146] SCT4

[0.138] fDSST

[0.138] KCF

[0.135] Struck

[0.135] DSST

[0.107] CN

[0.102] L1APG

[0.101] ASLA

[0.097] STC

[0.087] CSK

[0.084] IVT

[0.071] CT

[0.064] MIL

0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Pre

cis

ion

Normalized Precision plots of OPE - Partial Occlusion (187)

[0.421] MDNet

[0.413] VITAL

[0.380] StructSiam

[0.374] SiamFC

[0.357] DSiam

[0.326] SINT

[0.296] ECO

[0.294] STRCF

[0.277] ECO_HC

[0.267] CFNet

[0.265] HCFT

[0.256] TRACA

[0.244] MEEM

[0.241] PTAV

[0.237] BACF

[0.232] Staple

[0.227] Staple_CA

[0.223] CSRDCF

[0.215] SAMF

[0.208] SRDCF

[0.192] LCT

[0.186] DSST

[0.180] fDSST

[0.176] SCT4

[0.173] TLD

[0.171] Struck

[0.170] KCF

[0.169] ASLA

[0.156] CN

[0.155] L1APG

[0.130] IVT

[0.130] CSK

[0.124] STC

[0.104] CT

[0.085] MIL

0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Pre

cis

ion

Normalized Precision plots of OPE - Rotation (175)

[0.431] MDNet

[0.425] VITAL

[0.393] StructSiam

[0.388] DSiam

[0.385] SiamFC

[0.333] SINT

[0.283] ECO

[0.282] STRCF

[0.269] ECO_HC

[0.267] CFNet

[0.250] HCFT

[0.244] MEEM

[0.232] Staple

[0.232] BACF

[0.230] TRACA

[0.226] Staple_CA

[0.217] PTAV

[0.213] CSRDCF

[0.199] SAMF

[0.187] Struck

[0.185] LCT

[0.184] SRDCF

[0.172] TLD

[0.167] DSST

[0.166] ASLA

[0.165] SCT4

[0.161] fDSST

[0.148] KCF

[0.147] CN

[0.141] L1APG

[0.129] STC

[0.127] CSK

[0.124] IVT

[0.116] CT

[0.081] MIL

0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Pre

cis

ion

Normalized Precision plots of OPE - Scale Variation (273)

[0.455] MDNet

[0.447] VITAL

[0.415] SiamFC

[0.413] StructSiam

[0.399] DSiam

[0.344] SINT

[0.332] STRCF

[0.329] ECO

[0.311] ECO_HC

[0.303] CFNet

[0.280] HCFT

[0.276] BACF

[0.271] TRACA

[0.269] Staple

[0.263] PTAV

[0.261] Staple_CA

[0.255] MEEM

[0.245] CSRDCF

[0.241] SRDCF

[0.229] SAMF

[0.204] DSST

[0.199] LCT

[0.198] fDSST

[0.197] Struck

[0.192] SCT4

[0.186] TLD

[0.181] KCF

[0.180] ASLA

[0.167] CN

[0.166] L1APG

[0.147] STC

[0.142] IVT

[0.141] CSK

[0.117] CT

[0.094] MIL

0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5

Location error threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Pre

cis

ion

Normalized Precision plots of OPE - Viewpoint Change (33)

[0.426] MDNet

[0.404] VITAL

[0.366] DSiam

[0.366] SiamFC

[0.348] STRCF

[0.342] CFNet

[0.338] ECO

[0.325] StructSiam

[0.293] ECO_HC

[0.290] BACF

[0.279] SINT

[0.258] Staple

[0.245] TRACA

[0.242] SRDCF

[0.236] CSRDCF

[0.236] Staple_CA

[0.211] MEEM

[0.206] PTAV

[0.203] SAMF

[0.196] DSST

[0.186] ASLA

[0.181] HCFT

[0.168] TLD

[0.157] Struck

[0.156] L1APG

[0.136] LCT

[0.130] fDSST

[0.127] SCT4

[0.126] CN

[0.118] IVT

[0.111] STC

[0.110] KCF

[0.105] CT

[0.094] CSK

[0.071] MIL

Figure 14. Performance of trackers on each attribute using precision under protocol II. Best viewed in color.

17

Page 18: LaSOT: A High-quality Benchmark for Large-scale …LaSOT: A High-quality Benchmark for Large-scale Single Object Tracking Heng Fan 1Liting Lin2 Fan Yang Peng Chu 1Ge Deng Sijia Yu

Fig. 15 shows the performance of trackers on each attribute using success under protocol II.

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Overlap threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Success r

ate

Success plots of OPE - Aspect Ration Change (249)

[0.366] MDNet

[0.358] VITAL

[0.313] StructSiam

[0.308] SiamFC

[0.307] DSiam

[0.294] SINT

[0.288] ECO

[0.269] STRCF

[0.267] ECO_HC

[0.243] MEEM

[0.243] CFNet

[0.234] HCFT

[0.228] BACF

[0.227] TRACA

[0.215] CSRDCF

[0.213] Staple

[0.212] PTAV

[0.211] SAMF

[0.208] SRDCF

[0.206] Staple_CA

[0.198] LCT

[0.193] Struck

[0.192] TLD

[0.180] DSST

[0.171] SCT4

[0.168] fDSST

[0.163] ASLA

[0.158] KCF

[0.149] CN

[0.144] CT

[0.128] MIL

[0.127] CSK

[0.125] L1APG

[0.119] STC

[0.094] IVT

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Overlap threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Success r

ate

Success plots of OPE - Background Clutter (100)

[0.374] MDNet

[0.365] VITAL

[0.320] StructSiam

[0.319] ECO

[0.317] STRCF

[0.308] SiamFC

[0.305] SINT

[0.304] DSiam

[0.304] ECO_HC

[0.278] TRACA

[0.271] CFNet

[0.265] BACF

[0.259] SRDCF

[0.254] SAMF

[0.251] PTAV

[0.251] Staple

[0.246] HCFT

[0.244] Staple_CA

[0.238] MEEM

[0.233] CSRDCF

[0.231] LCT

[0.226] Struck

[0.222] fDSST

[0.219] DSST

[0.213] TLD

[0.212] SCT4

[0.198] KCF

[0.197] ASLA

[0.185] CN

[0.175] L1APG

[0.170] CSK

[0.163] CT

[0.145] STC

[0.138] MIL

[0.134] IVT

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Overlap threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Success r

ate

Success plots of OPE - Camera Motion (86)

[0.416] MDNet

[0.397] VITAL

[0.358] ECO

[0.346] ECO_HC

[0.344] STRCF

[0.337] DSiam

[0.337] StructSiam

[0.333] SiamFC

[0.315] SINT

[0.292] TRACA

[0.289] CFNet

[0.286] BACF

[0.272] SRDCF

[0.266] MEEM

[0.266] PTAV

[0.261] HCFT

[0.259] Staple_CA

[0.254] Staple

[0.251] CSRDCF

[0.248] SAMF

[0.234] LCT

[0.223] TLD

[0.206] SCT4

[0.202] fDSST

[0.200] DSST

[0.191] Struck

[0.180] KCF

[0.170] ASLA

[0.159] CN

[0.148] CT

[0.144] L1APG

[0.140] STC

[0.140] CSK

[0.139] MIL

[0.113] IVT

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Overlap threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Success r

ate

Success plots of OPE - Deformation (142)

[0.391] MDNet

[0.384] VITAL

[0.361] StructSiam

[0.351] SiamFC

[0.338] DSiam

[0.319] SINT

[0.279] ECO

[0.274] STRCF

[0.271] CFNet

[0.270] ECO_HC

[0.257] HCFT

[0.256] MEEM

[0.238] BACF

[0.235] CSRDCF

[0.230] TRACA

[0.229] Staple

[0.229] SAMF

[0.221] Staple_CA

[0.216] Struck

[0.209] LCT

[0.208] PTAV

[0.204] SRDCF

[0.183] DSST

[0.180] ASLA

[0.178] SCT4

[0.174] KCF

[0.173] TLD

[0.168] CT

[0.163] CN

[0.162] fDSST

[0.149] MIL

[0.148] CSK

[0.139] L1APG

[0.127] STC

[0.103] IVT

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Overlap threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Success r

ate

Success plots of OPE - Fast Motion (53)

[0.260] MDNet

[0.246] VITAL

[0.233] ECO

[0.215] StructSiam

[0.199] ECO_HC

[0.199] SINT

[0.198] STRCF

[0.197] DSiam

[0.195] SiamFC

[0.176] BACF

[0.168] PTAV

[0.163] MEEM

[0.161] LCT

[0.156] CFNet

[0.154] TRACA

[0.146] HCFT

[0.145] SRDCF

[0.143] TLD

[0.140] CSRDCF

[0.134] Staple

[0.129] Staple_CA

[0.125] SAMF

[0.118] fDSST

[0.114] DSST

[0.107] Struck

[0.106] SCT4

[0.104] ASLA

[0.089] KCF

[0.086] CN

[0.080] STC

[0.076] CT

[0.068] L1APG

[0.066] CSK

[0.055] MIL

[0.050] IVT

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Overlap threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Success r

ate

Success plots of OPE - Full Occlusion (118)

[0.305] MDNet

[0.301] VITAL

[0.254] ECO

[0.253] DSiam

[0.245] SiamFC

[0.243] SINT

[0.241] StructSiam

[0.234] STRCF

[0.229] ECO_HC

[0.215] MEEM

[0.201] TRACA

[0.199] CFNet

[0.189] HCFT

[0.188] BACF

[0.188] SAMF

[0.182] TLD

[0.181] SRDCF

[0.179] PTAV

[0.167] CSRDCF

[0.166] LCT

[0.162] Staple

[0.161] Struck

[0.159] Staple_CA

[0.158] SCT4

[0.142] fDSST

[0.140] DSST

[0.140] KCF

[0.138] ASLA

[0.127] CN

[0.114] CT

[0.113] L1APG

[0.105] CSK

[0.099] STC

[0.094] MIL

[0.076] IVT

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Overlap threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Success r

ate

Success plots of OPE - Illumination Variation (47)

[0.407] MDNet

[0.403] VITAL

[0.380] DSiam

[0.373] ECO

[0.366] StructSiam

[0.346] SiamFC

[0.334] STRCF

[0.328] ECO_HC

[0.318] CFNet

[0.314] SINT

[0.312] BACF

[0.298] SRDCF

[0.294] TRACA

[0.281] CSRDCF

[0.272] Staple

[0.267] SAMF

[0.267] Staple_CA

[0.266] PTAV

[0.258] HCFT

[0.253] DSST

[0.248] fDSST

[0.244] MEEM

[0.224] LCT

[0.212] TLD

[0.209] Struck

[0.205] ASLA

[0.190] L1APG

[0.187] SCT4

[0.184] KCF

[0.161] CN

[0.146] CT

[0.143] STC

[0.142] CSK

[0.115] IVT

[0.112] MIL

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Overlap threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Success r

ate

Success plots of OPE - Low Resolution (141)

[0.317] MDNet

[0.309] VITAL

[0.267] ECO

[0.258] StructSiam

[0.257] DSiam

[0.252] SiamFC

[0.246] SINT

[0.245] STRCF

[0.236] ECO_HC

[0.198] PTAV

[0.195] BACF

[0.195] CFNet

[0.194] MEEM

[0.187] SRDCF

[0.185] HCFT

[0.184] TRACA

[0.180] TLD

[0.177] CSRDCF

[0.173] Staple

[0.170] SAMF

[0.170] LCT

[0.165] Staple_CA

[0.158] Struck

[0.147] fDSST

[0.143] DSST

[0.143] SCT4

[0.137] ASLA

[0.126] KCF

[0.117] CN

[0.111] L1APG

[0.109] CT

[0.095] STC

[0.095] CSK

[0.084] MIL

[0.069] IVT

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Overlap threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Success r

ate

Success plots of OPE - Motion Blur (89)

[0.376] MDNet

[0.363] VITAL

[0.324] STRCF

[0.309] StructSiam

[0.308] SiamFC

[0.305] ECO

[0.304] ECO_HC

[0.300] DSiam

[0.297] SINT

[0.251] MEEM

[0.246] BACF

[0.245] TRACA

[0.244] PTAV

[0.238] CFNet

[0.231] Staple

[0.226] CSRDCF

[0.226] Staple_CA

[0.223] SRDCF

[0.222] HCFT

[0.220] SAMF

[0.217] LCT

[0.202] TLD

[0.193] fDSST

[0.187] DSST

[0.175] Struck

[0.173] SCT4

[0.161] KCF

[0.150] CN

[0.150] ASLA

[0.141] L1APG

[0.128] CSK

[0.121] STC

[0.119] CT

[0.114] MIL

[0.093] IVT

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Overlap threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Success r

ate

Success plots of OPE - Out-of-View (104)

[0.330] MDNet

[0.304] VITAL

[0.278] StructSiam

[0.256] SiamFC

[0.254] SINT

[0.253] DSiam

[0.239] ECO

[0.232] STRCF

[0.225] ECO_HC

[0.209] MEEM

[0.189] HCFT

[0.185] Staple

[0.183] CFNet

[0.181] BACF

[0.176] PTAV

[0.175] CSRDCF

[0.173] LCT

[0.173] TRACA

[0.171] Staple_CA

[0.168] SRDCF

[0.167] TLD

[0.159] SAMF

[0.158] Struck

[0.139] SCT4

[0.137] fDSST

[0.136] DSST

[0.130] KCF

[0.115] ASLA

[0.113] CN

[0.112] CT

[0.105] MIL

[0.096] STC

[0.094] L1APG

[0.092] CSK

[0.059] IVT

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Overlap threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Success r

ate

Success plots of OPE - Partial Occlusion (187)

[0.370] MDNet

[0.361] VITAL

[0.310] StructSiam

[0.306] SiamFC

[0.303] DSiam

[0.296] SINT

[0.290] ECO

[0.273] STRCF

[0.267] ECO_HC

[0.246] CFNet

[0.244] MEEM

[0.240] TRACA

[0.233] HCFT

[0.222] BACF

[0.217] PTAV

[0.214] SRDCF

[0.212] SAMF

[0.211] CSRDCF

[0.209] Staple

[0.208] Staple_CA

[0.198] LCT

[0.191] Struck

[0.189] TLD

[0.182] DSST

[0.177] SCT4

[0.175] fDSST

[0.174] ASLA

[0.161] KCF

[0.149] CN

[0.142] CT

[0.137] L1APG

[0.131] CSK

[0.129] MIL

[0.116] STC

[0.100] IVT

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Overlap threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Success r

ate

Success plots of OPE - Rotation (175)

[0.379] MDNet

[0.371] VITAL

[0.321] DSiam

[0.319] StructSiam

[0.310] SiamFC

[0.302] SINT

[0.285] ECO

[0.269] ECO_HC

[0.265] STRCF

[0.247] CFNet

[0.245] MEEM

[0.239] HCFT

[0.224] BACF

[0.221] TRACA

[0.221] CSRDCF

[0.219] Staple

[0.219] PTAV

[0.212] SAMF

[0.210] Staple_CA

[0.199] Struck

[0.199] SRDCF

[0.196] LCT

[0.189] TLD

[0.179] DSST

[0.174] ASLA

[0.172] SCT4

[0.172] fDSST

[0.158] KCF

[0.156] CN

[0.148] CT

[0.135] CSK

[0.131] MIL

[0.128] L1APG

[0.125] STC

[0.097] IVT

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Overlap threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Success r

ate

Success plots of OPE - Scale Variation (273)

[0.392] MDNet

[0.385] VITAL

[0.332] SiamFC

[0.331] StructSiam

[0.328] DSiam

[0.318] ECO

[0.307] SINT

[0.303] STRCF

[0.297] ECO_HC

[0.267] CFNet

[0.255] BACF

[0.252] TRACA

[0.250] MEEM

[0.244] HCFT

[0.241] PTAV

[0.239] CSRDCF

[0.239] SRDCF

[0.237] Staple

[0.231] Staple_CA

[0.226] SAMF

[0.215] LCT

[0.205] Struck

[0.204] TLD

[0.200] DSST

[0.195] fDSST

[0.187] ASLA

[0.184] SCT4

[0.171] KCF

[0.163] CN

[0.152] CT

[0.149] L1APG

[0.141] CSK

[0.136] MIL

[0.132] STC

[0.109] IVT

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Overlap threshold

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Success r

ate

Success plots of OPE - Viewpoint Change (33)

[0.358] MDNet

[0.339] VITAL

[0.317] ECO

[0.298] ECO_HC

[0.287] DSiam

[0.286] SiamFC

[0.283] STRCF

[0.254] BACF

[0.248] CFNet

[0.247] SINT

[0.246] StructSiam

[0.238] CSRDCF

[0.226] Staple

[0.225] SRDCF

[0.208] TRACA

[0.203] PTAV

[0.194] MEEM

[0.193] Staple_CA

[0.178] SAMF

[0.177] HCFT

[0.176] DSST

[0.172] TLD

[0.167] LCT

[0.160] ASLA

[0.158] Struck

[0.137] L1APG

[0.136] fDSST

[0.119] CT

[0.112] CN

[0.102] STC

[0.101] SCT4

[0.100] CSK

[0.100] MIL

[0.095] IVT

[0.093] KCF

Figure 15. Performance of trackers on each attribute using success under protocol II. Best viewed in color.

18