
ROAM: Recurrently Optimizing Tracking Model

Tianyu Yang^{1,2}  Pengfei Xu^3  Runbo Hu^3  Hua Chai^3  Antoni B. Chan^2

1 Tencent AI Lab 2 City University of Hong Kong 3 Didi Chuxing

Abstract

In this paper, we design a tracking model consisting of response generation and bounding box regression, where the first component produces a heat map to indicate the presence of the object at different positions and the second part regresses the relative bounding box shifts to anchors mounted on sliding-window locations. Thanks to the resizable convolutional filters used in both components to adapt to the shape changes of objects, our tracking model does not need to enumerate anchors of different sizes, thus saving model parameters. To effectively adapt the model to appearance variations, we propose to offline train a recurrent neural optimizer to update the tracking model in a meta-learning setting, which can converge the model in a few gradient steps. This improves the convergence speed of updating the tracking model while achieving better performance. We extensively evaluate our trackers, ROAM and ROAM++, on the OTB, VOT, LaSOT, GOT-10K and TrackingNet benchmarks, and our methods perform favorably against state-of-the-art algorithms.

1. Introduction

Generic visual object tracking is the task of estimating the bounding box of a target in a video sequence given only its initial position. Typically, the preliminary model learned from the first frame needs to be updated continuously to adapt to the target's appearance variations caused by rotation, illumination, occlusion, deformation, etc. However, it is challenging to optimize the initially learned model efficiently and effectively as tracking proceeds. Training samples for model updating are usually collected based on estimated bounding boxes, which could be inaccurate. These small errors accumulate over time, gradually resulting in model degradation.

To avoid model updating, which may introduce unreliable training samples that ruin the model, several approaches [4, 43] investigate tracking by only comparing the first frame with subsequent frames, using a similarity function based on a learned discriminant and invariant deep Siamese feature embedding. However, training such a deep representation is difficult due to the drastic appearance variations that commonly emerge in long-term tracking. Other methods either update the model via an exponential moving average of templates [16, 44], which only marginally improves performance, or optimize the model with hand-designed SGD methods [33, 41], which need numerous iterations to converge, thus preventing real-time speed. Limiting the number of SGD iterations can allow near real-time speeds, but at the expense of poor-quality model updates due to the loss function not being optimized sufficiently.

In recent years, much effort has been devoted to localizing the object using a robust online learned classifier, while little attention has been paid to designing accurate bounding box estimation. Most trackers simply resort to multi-scale search, assuming that the object aspect ratio does not change during tracking, which is often violated in the real world. Recently, SiamRPN [23] borrows the idea of region proposal networks [37] in object detection to decompose the tracking task into two branches: 1) classifying the target from the background, and 2) regressing the accurate bounding box with reference to anchor boxes mounted on different positions. As shown on the VOT benchmarks [19, 20], SiamRPN achieves higher precision on bounding box estimation but suffers from lower robustness compared with state-of-the-art methods [8, 9, 26] due to the lack of online model updating. Furthermore, SiamRPN mounts anchors with different aspect ratios on every spatial location of the feature map to handle possible shape changes, which is redundant in both computation and storage.

In this paper, we propose a tracking framework composed of two modules: response generation and bounding box regression, where the first component produces a response map to indicate the possibility of covering the object for anchor boxes mounted on sliding-window positions, and the second part predicts bounding box shifts from the anchors to get refined rectangles. Instead of enumerating different aspect ratios of anchors as in SiamRPN, we propose to use only one sized anchor for each position and adapt it to shape changes by resizing its corresponding convolutional filter using bilinear interpolation, which saves model parameters and computing time. To effectively adapt the tracking model to appearance changes during tracking, we propose a recurrent model optimization method to learn a more effective gradient descent that converges the model update in 1-2 steps and generalizes better to future frames. The key idea is to train a neural optimizer that can converge the tracking model to a good solution in a few gradient steps.


During the training phase, the tracking model is first updated using the neural optimizer, and then applied on future frames to obtain an error signal for minimization. Under this setting, the resulting optimizer converges the tracking classifier significantly faster than SGD-based optimizers, especially for learning the initial tracking model. In summary, our contributions are:

• We propose a tracking model consisting of a resizable response generator and bounding box regressor, where only one sized anchor is used at each spatial position and its corresponding convolutional filter can be adapted to shape variations by bilinear interpolation.

• We propose a recurrent neural optimizer, trained in a meta-learning setting, that recurrently updates the tracking model with faster convergence.

• We conduct comprehensive experiments on large-scale datasets including OTB, VOT, LaSOT, GOT-10k and TrackingNet, and our trackers achieve favorable performance compared with the state of the art.

2. Related Work

Visual Tracking. Predicting a heat map to indicate the position of the object is commonly used in the visual tracking community [3, 4, 9, 13, 16, 45]. Among them, SiamFC [4] is one of the most popular methods due to its fast speed and good performance. However, most response-generation-based trackers, including SiamFC, estimate the bounding box via a simple multi-scale search mechanism, which is unable to handle aspect ratio changes. To address this issue, the recent SiamRPN [23] and its extensions [11, 22, 52] propose to train a bounding box regressor as in object detection [37], showing impressive performance. Different from these algorithms, which enumerate a set of predefined anchors with different aspect ratios at each spatial position, we adopt a resizable anchor to adapt to the shape variations of the object, which saves model parameters and computing time.

Online model updating is another important module that SiamFC lacks. Recent works improve SiamFC [4] by introducing various model updating strategies, including recurrent generation of target template filters through a convolutional LSTM [48], a dynamic memory network [49, 50], where object information is written into and read from an addressable external memory, and distractor-aware incremental learning [53], which makes use of hard-negative templates around the target to suppress distractors. It should be noted that all these algorithms essentially achieve model updating by linearly interpolating old target templates with the newly generated one, where the major difference is how to control the weights when combining them. This is far from optimal compared with optimization methods using gradient descent, which minimize the tracking loss directly to adapt to new target appearances. Instead of using a Siamese network to build the convolutional filter, other methods [26, 34, 41] generate the filter by performing gradient descent on the first frame, and the filter can be continuously optimized in subsequent frames. In particular, [34] proposes to train the initial tracking model in a meta-learning setting, which shows promising results. However, it still uses traditional SGD to optimize the tracking model in subsequent frames, which is not effective in adapting to new appearances and is slow in updating the model. In contrast to these trackers, our offline learned recurrent neural optimizer applies meta-learning to both the initial model and the model updates, which allows model initialization and updating in only one or two gradient steps, resulting in much faster runtime speed and better accuracy.

Learning to Learn. Learning to learn, or meta-learning, has a long history [2, 32, 38]. With the recent successes of applying meta-learning to few-shot classification [31, 36] and reinforcement learning [12, 39], it has regained attention. The pioneering work [1] designs an offline learned optimizer using gradient descent and shows promising performance compared with traditional optimization methods. However, it does not generalize well to large numbers of descent steps. To mitigate this problem, [28] proposes several training techniques, including parameter scaling and combination with convex functions, to coordinate the learning process of the optimizer. [46] also addresses this issue by designing a hierarchical RNN architecture with dynamically adapted input and output scaling. In contrast to other works, which output an increment for each parameter update and are thus prone to overfitting due to different gradient scales, we instead associate an adaptive learning rate produced by a recurrent neural network with the computed gradient for fast convergence of the model update.

3. Proposed Algorithm

Our tracker consists of two main modules: 1) a tracking model that is resizable to adapt to shape changes; and 2) a neural optimizer that is in charge of model updating. The tracking model contains two branches, where the response generation branch determines the presence of the target by predicting a confidence score map, and the bounding box regression branch estimates the precise box of the target by regressing coordinate shifts from the box anchors mounted on the sliding-window positions. The offline learned neural optimizer is trained using a meta-learning framework to online update the tracking model in order to adapt to appearance variations. Note that both response generation and bounding box regression are built on the feature map computed from the backbone CNN. The whole framework is illustrated in Fig. 1.

3.1. Resizable Tracking Model

Trackers like correlation filters [16] and MetaTracker [34] initialize a convolutional filter based on the size of the object in the first frame and keep its size fixed in subsequent frames.


Figure 1: Pipeline of ROAM++. Given a mini-batch of training patches, which are cropped based on the predicted object boxes, deep features are extracted by the feature extractor. The fixed-size tracking model θ^{(t−1)} is warped to the current target size, yielding the warped tracking model θ̃^{(t−1)}, as in (2, 3). The response map and bounding boxes are then predicted for each sample using θ̃^{(t−1)}, from which the update loss ℓ^{(t−1)} and its gradient ∇_{θ^{(t−1)}} ℓ^{(t−1)} are computed using ground-truth labels. Next, the element-wise stack I^{(t−1)}, consisting of the previous learning rates, current parameters, current update loss and its gradient, is input into a coordinate-wise LSTM O to generate the adaptive learning rate λ^{(t−1)} as in (11). The model is then updated using one gradient descent step as in (9). Finally, we apply the updated model θ^{(t)} on a randomly selected future frame to get a meta loss for minimization as in (13).

This setting is based on the assumption that the aspect ratio of the object does not change during tracking, which is, however, often violated. Therefore, dynamically adapting the convolutional filter to the object's shape variations is desirable, which means the number of filter parameters may vary between frames in a video and among different sequences. However, this complicates the design of the neural optimizer when using separate learning rates for each filter parameter. To simplify the meta-learning framework and to better allow for per-parameter learning rates, we define fixed-shape convolutional filters, which are warped to the desired target size using bilinear interpolation before convolving with the feature map. In subsequent frames, the recurrent optimizer updates the fixed-shape tracking model. Note that MetaTracker [34] also resizes filters to the object size for model initialization. However, instead of dynamically adapting the convolutional filters to the object size in subsequent frames, MetaTracker keeps the same shape as the initial filter in the following frames.

Specifically, the tracking model θ contains two parts, i.e., the correlation filter θ_cf and the bounding box regression filter θ_reg. Both are warped to adapt to the shape variations of the target:

$\theta = [\theta_{cf}, \theta_{reg}],$  (1)

$\tilde{\theta}_{cf} = \mathcal{W}(\theta_{cf}, \phi),$  (2)

$\tilde{\theta}_{reg} = \mathcal{W}(\theta_{reg}, \phi),$  (3)

where $\mathcal{W}$ resizes the convolutional filter to size $\phi = (f_r, f_c)$ using bilinear interpolation. The filter size is computed from the width and height $(w, h)$ of the object in the previous image patch (and for symmetry the filter size is odd):

$f_r = \lceil \rho h / c \rceil - \lceil \rho h / c \rceil \bmod 2 + 1,$  (4)

$f_c = \lceil \rho w / c \rceil - \lceil \rho w / c \rceil \bmod 2 + 1,$  (5)

where ρ is a scale factor that enlarges the filter to cover some context information, c is the stride of the feature map, and ⌈·⌉ denotes the ceiling function. Because of the resizable filters, there is no need to enumerate different aspect ratios and scales of anchor boxes when performing bounding box regression. We only use one sized anchor at each spatial location, whose size corresponds to the shape of the regression filter:

$(a_w, a_h) = (f_c, f_r) / \rho.$  (6)

This saves regression filter parameters and achieves faster speed. Note that we update the filter size and its corresponding anchor box every τ frames, i.e., just before every model update, during both the offline training and testing/tracking phases. Through this modification, we are able to initialize the tracking model with θ^{(0)} and recurrently optimize it in subsequent frames without worrying about the shape changes of the tracked object.
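To make the warping $\mathcal{W}$ in (2)-(3) and the odd filter-size rule in (4)-(5) concrete, here is a minimal PyTorch sketch; the helper names (compute_filter_size, warp_filter) and tensor shapes are our own illustration, not the paper's released code.

```python
import math
import torch
import torch.nn.functional as F

def compute_filter_size(w, h, rho=1.5, stride=4):
    """Filter size (f_r, f_c) from the object size (w, h) as in (4)-(5):
    enlarge by rho, divide by the feature stride c, take the ceiling, round to odd."""
    fr = math.ceil(rho * h / stride)
    fr = fr - fr % 2 + 1
    fc = math.ceil(rho * w / stride)
    fc = fc - fc % 2 + 1
    return fr, fc

def warp_filter(theta, fr, fc):
    """W(theta, phi): resize a fixed-shape conv filter (out_c, in_c, H, W) to
    (out_c, in_c, f_r, f_c) with bilinear interpolation."""
    return F.interpolate(theta, size=(fr, fc), mode='bilinear', align_corners=False)

# toy usage: warp a fixed 21x21 correlation filter to a wide, short target
theta_cf = torch.randn(1, 64, 21, 21)            # fixed-shape response filter
fr, fc = compute_filter_size(w=120, h=60)        # -> (23, 45)
theta_cf_warped = warp_filter(theta_cf, fr, fc)
feat = torch.randn(1, 64, 70, 70)                # backbone feature map
response = F.conv2d(feat, theta_cf_warped, padding=(fr // 2, fc // 2))
print(theta_cf_warped.shape, response.shape)     # (1, 64, 23, 45), (1, 1, 70, 70)
```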


3.2. Recurrent Model Optimization

Traditional optimization methods suffer from slow convergence due to small learning rates and limited training samples, while simply increasing the learning rate risks having the training loss wander wildly. Instead, we design a recurrent neural optimizer, trained to converge the model to a good solution in a few gradient steps^1, to update the tracking model. Our key idea is based on the assumption that the best optimizer should be able to update the model to generalize well on future frames. During the offline training stage, we perform a one-step gradient update on the tracking model using our recurrent neural optimizer and then minimize its loss on future frames. Once the offline learning stage is finished, we use this learned neural optimizer to recurrently update the tracking model to adapt to appearance variations. An optimizer trained in this way is able to quickly converge the model update so that it generalizes well for future frame tracking.

We denote the response generation network as $G(F; \theta_{cf}, \phi)$ and the bounding box regression network as $R(F; \theta_{reg}, \phi)$, where F is the input feature map and θ are the parameters. The tracking loss consists of two parts, a response loss and a regression loss:

$L(F, M, B; \theta, \phi) = \|G(F; \theta_{cf}, \phi) - M\|^2 + \|R(F; \theta_{reg}, \phi) - B\|_s,$  (7)

where the first term is the L2 loss, the second term is the smooth L1 loss [37], and B represents the ground-truth box. Note that we adopt the parameterization of bounding box coordinates used in [37]. M is the corresponding label map, which is built using a 2D Gaussian function and the ground-truth object location $(x_0, y_0)$ and size $(w, h)$:

$M(x, y) = \exp\!\left(-\alpha\left(\frac{(x - x_0)^2}{\sigma_x^2} + \frac{(y - y_0)^2}{\sigma_y^2}\right)\right),$  (8)

where $(\sigma_x, \sigma_y) = (w/c, h/c)$ and α controls the shape of the response map. Note that we use past predictions as pseudo-labels when performing model updating during testing; we only use the ground truth during offline training.
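A small sketch of how the label map (8) and the combined loss (7) could be computed in PyTorch; the function names, tensor shapes and flattened box layout are assumptions made for illustration (α = 20 and c = 4 follow Sec. 5).

```python
import torch
import torch.nn.functional as F

def gaussian_label_map(size, center, obj_wh, stride=4, alpha=20.0):
    """2D Gaussian label map M as in (8), on a size x size response grid,
    with (sigma_x, sigma_y) = (w/c, h/c)."""
    w, h = obj_wh
    sx, sy = w / stride, h / stride
    x0, y0 = center
    xs = torch.arange(size, dtype=torch.float32).view(1, -1)   # column index
    ys = torch.arange(size, dtype=torch.float32).view(-1, 1)   # row index
    return torch.exp(-alpha * ((xs - x0) ** 2 / sx ** 2 + (ys - y0) ** 2 / sy ** 2))

def tracking_loss(resp, label_map, box_pred, box_gt):
    """Loss (7): L2 on the response map plus smooth L1 on the parameterized box offsets."""
    return F.mse_loss(resp, label_map) + F.smooth_l1_loss(box_pred, box_gt)

# toy usage on a 21x21 response grid with the target at the center
M = gaussian_label_map(size=21, center=(10.0, 10.0), obj_wh=(60.0, 40.0))
resp = torch.rand(21, 21)
loss = tracking_loss(resp, M, box_pred=torch.rand(21 * 21, 4), box_gt=torch.rand(21 * 21, 4))
print(loss.item())
```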

A typical tracking process updates the tracking model using historical training examples and then tests the updated model on the following frames until the next update. We simulate this scenario in a meta-learning paradigm by recurrently optimizing the tracking model and then testing it on a future frame. Specifically, the tracking network is updated by

$\theta^{(t)} = \theta^{(t-1)} - \lambda^{(t-1)} \odot \nabla_{\theta^{(t-1)}} \ell^{(t-1)},$  (9)

where $\lambda^{(t-1)}$ is a fully element-wise learning rate with the same dimension as the tracking model parameters $\theta^{(t-1)}$, and ⊙ denotes element-wise multiplication.

^1 We only use one gradient step in our experiments; considering multiple steps is straightforward.

The learning rate is recurrently generated using an LSTM whose input consists of the previous learning rate $\lambda^{(t-2)}$, the current parameters $\theta^{(t-1)}$, the current update loss $\ell^{(t-1)}$ and its gradient $\nabla_{\theta^{(t-1)}} \ell^{(t-1)}$:

$I^{(t-1)} = [\lambda^{(t-2)}, \nabla_{\theta^{(t-1)}} \ell^{(t-1)}, \theta^{(t-1)}, \ell^{(t-1)}],$  (10)

$\lambda^{(t-1)} = \sigma(O(I^{(t-1)}; \omega)),$  (11)

where $O(\cdot; \omega)$ is a coordinate-wise LSTM [1] parameterized by ω, which shares its parameters across all dimensions of the input, and σ is the sigmoid function that bounds the predicted learning rate. The LSTM input $I^{(t-1)}$ is formed by element-wise stacking the four sub-inputs along a new axis^2. The current update loss $\ell^{(t-1)}$ is computed from a mini-batch of n updating samples:

$\ell^{(t-1)} = \frac{1}{n}\sum_{j=1}^{n} L(F_j, M_j, B_j; \theta^{(t-1)}, \phi^{(t-1)}),$  (12)

where the updating samples $(F_j, M_j, B_j)$ are collected from the previous τ frames, and τ is the frame interval between model updates during online tracking. Finally, we test the newly updated model $\theta^{(t)}$ on a randomly selected future frame^3 and obtain the meta loss

$L^{(t)} = L(F^{(t+\delta)}, M^{(t+\delta)}, B^{(t+\delta)}; \theta^{(t)}, \phi^{(t-1)}),$  (13)

where δ is randomly selected from [0, τ − 1]. During the offline training stage, we perform the aforementioned procedure on a mini-batch of videos and obtain an averaged meta loss for optimizing the neural optimizer:

$L = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} L^{(t)}_{V_i},$  (14)

where N is the batch size, T is the number of model updates, and $V_i \sim p(\mathcal{V})$ is a video clip sampled from the training set. It should be noted that the initial tracking model parameters $\theta^{(0)}$ and the initial learning rate $\lambda^{(0)}$ are also trainable variables, which are jointly learned with the neural optimizer O. By minimizing the averaged meta loss L, we aim to train a neural optimizer that can update the tracking model to generalize well on subsequent frames, as well as to learn a beneficial initialization of the tracking model that is broadly applicable to different tasks (i.e., videos). The overall training process is detailed in Algorithm 1.
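The coordinate-wise optimizer of (9)-(11) can be sketched as follows: the four per-parameter scalars of (10) are stacked into a length-4 feature per coordinate, passed through an LSTM shared across all coordinates (the paper uses two stacked layers with 20 hidden units, Sec. 5), and squashed by a sigmoid to give the element-wise learning rate. The module and variable names are ours, and the flattened-parameter toy usage is only illustrative.

```python
import torch
import torch.nn as nn

class NeuralOptimizer(nn.Module):
    """Coordinate-wise LSTM O(.; omega): maps the element-wise stack I^(t-1) of shape
    (|theta|, 4) in (10) to an element-wise learning rate lambda^(t-1) in (0, 1), as in (11)."""
    def __init__(self, hidden=20):
        super().__init__()
        self.lstm = nn.LSTM(input_size=4, hidden_size=hidden, num_layers=2)
        self.head = nn.Linear(hidden, 1)

    def forward(self, lr_prev, grad, theta, loss, state=None):
        # element-wise stack of [previous lr, gradient, parameters, broadcast update loss]
        inp = torch.stack([lr_prev, grad, theta, loss.expand_as(theta)], dim=-1)
        out, state = self.lstm(inp.unsqueeze(0), state)      # seq length 1, "batch" = |theta|
        lam = torch.sigmoid(self.head(out.squeeze(0))).squeeze(-1)
        return lam, state

# one recurrent update step as in (9), on a toy flattened parameter vector
theta = torch.randn(1000, requires_grad=True)
loss = (theta ** 2).sum()                                    # stand-in for the update loss
grad, = torch.autograd.grad(loss, theta)
opt = NeuralOptimizer()
lam, state = opt(torch.full_like(theta, 1e-3), grad, theta.detach(), loss.detach())
theta_new = theta.detach() - lam * grad                      # theta^(t) = theta^(t-1) - lam (.) grad
```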

^2 We therefore get a |θ|×4 matrix, where |θ| is the number of parameters in θ. Note that the current update loss $\ell^{(t-1)}$ is broadcast to a shape compatible with the other vectors. To better understand this process, we can treat the input of the LSTM as a mini-batch of vectors, where |θ| is the batch size and 4 is the dimension of the input vector.

^3 We found that using more than one future frame does not improve performance but costs more time during the offline training phase.


Algorithm 1 Offline training of our framework

Input: p(V): distribution over training videos.
Output: θ^{(0)}, λ^{(0)}: initial tracking model and learning rates; ω: recurrent neural optimizer.

1: Initialize all network parameters.
2: while not done do
3:   Draw a mini-batch of videos: V_i ∼ p(V)
4:   for all V_i do
5:     Compute θ^{(1)} ← θ^{(0)} − λ^{(0)} ⊙ ∇_{θ^{(0)}} ℓ^{(0)} as in (9).
6:     Compute meta loss L^{(1)} using (13).
7:     for t = {1 + τ, 1 + 2τ, ..., 1 + (T − 1)τ} do
8:       Compute adaptive learning rate λ^{(t−1)} using neural optimizer O as in (11).
9:       Compute updated model θ^{(t)} using (9).
10:      Compute meta loss L^{(t)} using (13).
11:    end for
12:   end for
13:   Compute averaged meta loss L using (14).
14:   Update θ^{(0)}, λ^{(0)}, ω by computing the gradient of L.
15: end while
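A simplified PyTorch-style rendering of Algorithm 1, reusing the NeuralOptimizer sketched earlier; update_loss and meta_loss are toy placeholders for (12) and (13), and sampling, cropping and the tracking network itself are abstracted away.

```python
import torch

def update_loss(clip, t, theta):
    """Toy placeholder for the update loss (12) computed from frames around time t."""
    return ((theta - clip[t % len(clip)]) ** 2).mean()

def meta_loss(clip, t, theta):
    """Toy placeholder for the meta loss (13) evaluated on a future frame."""
    return ((theta - clip[(t + 1) % len(clip)]) ** 2).mean()

def meta_train_step(clips, theta0, lam0, opt_net, adam, tau, T):
    """One outer iteration of Algorithm 1 over a mini-batch of video clips."""
    total = 0.0
    for clip in clips:
        theta, lam, state = theta0, lam0, None
        # initial one-step update theta^(1) (line 5) and its meta loss (line 6)
        loss0 = update_loss(clip, 0, theta)
        grad0, = torch.autograd.grad(loss0, theta, create_graph=True)
        theta = theta - lam * grad0
        total = total + meta_loss(clip, 1, theta)
        # recurrent updates every tau frames (lines 7-11)
        for t in (1 + k * tau for k in range(1, T)):
            loss_t = update_loss(clip, t, theta)
            grad_t, = torch.autograd.grad(loss_t, theta, create_graph=True)
            lam, state = opt_net(lam, grad_t, theta, loss_t.detach(), state)  # eq. (11)
            theta = theta - lam * grad_t                                      # eq. (9)
            total = total + meta_loss(clip, t, theta)                         # eq. (13)
    total = total / (len(clips) * T)                                          # eq. (14)
    adam.zero_grad()
    total.backward()       # backprop through all updates to omega, theta^(0), lambda^(0)
    adam.step()

# toy driver: a few clips of 31 random "frames", each summarized by a 1000-dim vector
theta0 = torch.zeros(1000, requires_grad=True)
lam0 = torch.full((1000,), 1e-3, requires_grad=True)
opt_net = NeuralOptimizer()                                   # from the sketch above
adam = torch.optim.Adam([theta0, lam0, *opt_net.parameters()], lr=1e-3)
clips = [[torch.randn(1000) for _ in range(31)] for _ in range(4)]
meta_train_step(clips, theta0, lam0, opt_net, adam, tau=5, T=6)
```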

3.3. Random Filter Scaling

Neural optimizers have difficulty generalizing to new tasks due to overfitting, as discussed in [1, 28, 46]. By analyzing the learned behavior of the neural optimizer, we found that our preliminary trained optimizer predicts similar learning rates (see Fig. 2, left). We attribute this to overfitting to network inputs with similar magnitude scales. The following simple example illustrates the overfitting problem. Suppose the objective function^4 that the neural optimizer minimizes is $g(\theta) = \|x\theta - y\|^2$. The optimal element-wise learning rate is $\frac{1}{2x^2}$, since we can achieve the lowest loss of 0 in one gradient descent step: $\theta^{(t+1)} = \theta^{(t)} - \frac{1}{2x^2}\nabla g(\theta^{(t)}) = y/x$. Note that the optimal learning rate depends on the network input x, and thus the learned neural optimizer is prone to overfitting if it does not see enough magnitude variation in x. To address this problem, we multiply the tracking model θ by a randomly sampled vector ε with the same dimension as θ [28] during each iteration of offline training:

$\varepsilon \sim \exp(U(-\kappa, \kappa)), \quad \theta_\varepsilon = \varepsilon \odot \theta,$  (15)

where $U(-\kappa, \kappa)$ is a uniform distribution over the interval $[-\kappa, \kappa]$, and κ is a range factor that controls the scale extent. The objective function is then modified as $g_\varepsilon(\theta) = g(\frac{\theta}{\varepsilon})$. In this way, the network input x is indirectly scaled without modifying the training samples (x, y) in practice. Thus, the learned neural optimizer is forced to predict adaptive learning rates for inputs with different magnitudes (see Fig. 2, right), rather than producing similar learning rates for similarly scaled inputs, which improves its generalization ability.

^4 Our actual loss function in (7) includes an L2 loss and a smooth L1 loss; for simplicity we here consider a simple linear model with an L2 loss.

Figure 2: Histograms of predicted learning rates during offline training, without RFS (left) and with RFS (right).

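A minimal sketch of the sampling step in (15); how the scaled filter enters the modified objective g_ε follows the text above and is not reproduced here. The helper name is ours, and the κ value is the κ_cf quoted in Sec. 5.

```python
import torch

def random_filter_scaling(theta, kappa):
    """Sample eps ~ exp(U(-kappa, kappa)) element-wise and return the scaled filter
    theta_eps = eps * theta as in (15); eps is also returned so the training loss can be
    evaluated in the rescaled parameterization described in the text."""
    eps = torch.empty_like(theta).uniform_(-kappa, kappa).exp()
    return eps * theta, eps

# toy usage on the fixed-shape correlation filter, with kappa_cf = 1.6 from Sec. 5
theta_cf = torch.randn(1, 64, 21, 21)
theta_cf_scaled, eps = random_filter_scaling(theta_cf, kappa=1.6)
```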

4. Online Tracking via Proposed Framework

The offline training produces a neural optimizer O, an initial tracking model θ^{(0)} and a learning rate λ^{(0)}, which we then use to perform online tracking. The overall process is similar to offline training, except that we do not compute the meta loss or its gradient.

Model Initialization. Given the first frame, the initial image patch is cropped and centered on the provided ground-truth bounding box. We then augment the initial image patch into 9 samples by stretching the object to different aspect ratios and scales. Specifically, the target is stretched to $[swr, sh/r]$, where (w, h) are the initial object width and height, and (s, r) are scale and aspect ratio factors. We then use these examples, together with the offline learned θ^{(0)} and λ^{(0)}, to perform a one-step gradient update that builds the initial model θ^{(1)} as in (9).
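For illustration, the 9-sample stretching described above can be written as follows, using the factors {0.8, 1, 1.2} from Sec. 5; the function name is hypothetical and the actual cropping of the patches is omitted.

```python
from itertools import product

def augmented_target_sizes(w, h, factors=(0.8, 1.0, 1.2)):
    """Stretch the initial target (w, h) to [s*w*r, s*h/r] for the 9 combinations of
    scale s and aspect-ratio factor r, used to build the first update mini-batch."""
    return [(s * w * r, s * h / r) for s, r in product(factors, factors)]

print(augmented_target_sizes(120, 60))   # 9 (width, height) pairs
```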

Bounding Box Estimation. We estimate the object bounding box by first finding the presence of the target through response generation, and then predicting an accurate box via bounding box regression. We employ the penalty strategy used in [23] on the generated response to suppress estimated boxes with large changes in scale and aspect ratio. In addition, we multiply the response map by a Gaussian-like motion map to suppress large movements. The bounding box computed from the anchor corresponding to the maximum score of the response map is the final prediction. To smooth the results, we linearly interpolate this estimated object size with the previous one. We denote our neural-optimized tracking model with bounding box regression as ROAM++. We also design a baseline variant of our tracker, which uses multi-scale search to estimate the object box instead of bounding box regression, and denote it as ROAM.
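A rough sketch of this box-selection step, assuming the standard offset parameterization of [37]; the motion map, the omitted penalty of [23], and the interpolation weight are simplified stand-ins rather than the paper's exact procedure.

```python
import torch

def select_box(response, offsets, anchor_wh, prev_wh, motion_map, smooth=0.3):
    """Weight the response by a Gaussian-like motion map, take the arg-max location,
    decode the single anchor's regressed offsets there, and linearly interpolate the
    size with the previous frame's size. `smooth` is an illustrative weight only."""
    score = response * motion_map
    idx = torch.argmax(score)
    r, c = idx // score.shape[-1], idx % score.shape[-1]
    dx, dy, dw, dh = offsets[:, r, c]
    aw, ah = anchor_wh
    cx, cy = c + dx * aw, r + dy * ah                  # center, in feature-map cells
    w, h = aw * torch.exp(dw), ah * torch.exp(dh)      # regressed width / height
    w = smooth * w + (1 - smooth) * prev_wh[0]
    h = smooth * h + (1 - smooth) * prev_wh[1]
    return cx, cy, w, h

resp, offs = torch.rand(70, 70), 0.1 * torch.randn(4, 70, 70)
motion = torch.ones(70, 70)                            # stand-in for the motion map
print(select_box(resp, offs, anchor_wh=(30.0, 15.0), prev_wh=(28.0, 16.0), motion_map=motion))
```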

Model Updating. We update the model every τ frames. Although offline training uses the previous τ frames to perform a one-step gradient update of the model, in practice we find that using more than one step can further improve performance during tracking (see Sec. 6.2). Therefore, we adopt a two-step gradient update using the previous 2τ frames in our experiments.


5. Implementation Details

Patch Cropping. Given a bounding box of the object (x0, y0, w, h), the ROI of the image patch has the same center (x0, y0) and takes a larger size S = Sw = Sh = γ√(wh), where γ is the ROI scale factor. Then, the ROI is resized to a fixed size S × S for batch processing.
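For concreteness, a minimal sketch of the ROI computation with γ = 5 as in Sec. 5; the function name and the caller-side resize are illustrative assumptions.

import math

def crop_roi(x0, y0, w, h, gamma=5.0):
    # The square ROI shares the box center and has side length gamma * sqrt(w*h);
    # the cropped patch is then resized to the fixed search size (e.g. 281 x 281).
    side = gamma * math.sqrt(w * h)
    return (x0, y0), side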

Network Structure. We use the first 12 convolution layers of the pretrained VGG-16 [40] as the feature extractor. The top max-pooling layers are removed to increase the spatial resolution of the feature map. Both the response generation network C and the bounding box regression network consist of two convolutional layers, with a dimension reduction layer of 512×64×1×1 (in-channels, out-channels, height, width) as the first layer, and either a correlation layer of 64×1×21×21 or a regression layer of 64×4×21×21 as the second layer, respectively. We use two stacked LSTM layers with 20 hidden units for the neural optimizer O.
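The following is a hedged PyTorch sketch of the two heads described above, with fixed 21×21 filters for readability; the class names are our own, and the resizable-filter mechanism is omitted here.

import torch.nn as nn

class ResponseHead(nn.Module):
    # 512 -> 64 dimension reduction followed by a 64x1x21x21 correlation filter.
    def __init__(self):
        super().__init__()
        self.reduce = nn.Conv2d(512, 64, kernel_size=1)
        self.corr = nn.Conv2d(64, 1, kernel_size=21, padding=10)

    def forward(self, feat):
        return self.corr(self.reduce(feat))          # response (heat) map

class RegressionHead(nn.Module):
    # 512 -> 64 dimension reduction followed by a 64x4x21x21 box-regression filter.
    def __init__(self):
        super().__init__()
        self.reduce = nn.Conv2d(512, 64, kernel_size=1)
        self.reg = nn.Conv2d(64, 4, kernel_size=21, padding=10)

    def forward(self, feat):
        return self.reg(self.reduce(feat))           # per-anchor box offsets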

The ROI scale factor is γ = 5 and the search size is S = 281. The scale factor of the filter size is ρ = 1.5. The response generation uses α = 20 and the feature stride of the CNN feature extractor is c = 4. The scale and aspect ratio factors (s, r) used for initial image patch augmentation are selected from {0.8, 1, 1.2}, generating a combination of 9 pairs of (s, r). The range factors used in RFS are κcf = 1.6 and κreg = 1.3.⁵

Training Details. We use ADAM [18] optimization with a mini-batch of 16 video clips of length 31 on 4 GPUs (4 videos per GPU) to train our framework. We use the training splits of ImageNet VID [21], TrackingNet [30], LaSOT [10], GOT-10k [17], ImageNet DET [21] and COCO [25] for training. During training, we randomly extract a continuous sequence clip for video datasets, and repeat the same still image to form a video clip for image datasets. Note that we randomly augment all frames in a training clip by slightly stretching and scaling the images. We use a learning rate of 1e-6 for the initial regression parameters θ(0) and the initial learning rate λ(0). For the recurrent neural optimizer O, we use a learning rate of 1e-3. Both learning rates are multiplied by 0.5 every 5 epochs. We implement our tracker in Python with the PyTorch toolbox [35], and conduct the experiments on a computer with an NVIDIA RTX 2080 GPU and an Intel Core i9 CPU @ 3.6 GHz. Our trackers ROAM and ROAM++ run at 13 FPS and 20 FPS, respectively (see the supplementary material for a detailed speed comparison).
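As a rough illustration of the training configuration above, the parameter groups and schedule could be set up as follows; the function and variable names are placeholders, and the actual code may organize parameters differently.

from torch.optim import Adam
from torch.optim.lr_scheduler import StepLR

# theta0, lambda0: initial tracking-model parameters and element-wise learning rates;
# optimizer_params: parameters of the recurrent neural optimizer (two stacked LSTMs).
def build_meta_optimizer(theta0, lambda0, optimizer_params):
    opt = Adam([
        {"params": theta0, "lr": 1e-6},
        {"params": lambda0, "lr": 1e-6},
        {"params": optimizer_params, "lr": 1e-3},
    ])
    scheduler = StepLR(opt, step_size=5, gamma=0.5)   # halve learning rates every 5 epochs
    return opt, scheduler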

6. Experiments

We evaluate our trackers on six benchmarks: OTB-2015 [47], VOT-2016 [19], VOT-2017 [20], LaSOT [10], GOT-10k [17] and TrackingNet [30].

⁵ We use different range factors κ for θcf and θreg due to the different magnitudes of the parameters in the two branches.

Figure 3: Precision and success plots of OPE on OTB-2015.
Precision plot: SiamRPN+ [0.923], ECO [0.910], ROAM [0.908], DSLT [0.907], ROAM++ [0.904], CCOT [0.898], DaSiamRPN [0.881], MetaTracker [0.856], C-RPN [0.853], SiamRPN [0.851], CREST [0.838], MemTrack [0.820], Staple [0.784], SiamFC [0.771].
Success plot: ECO [0.691], ROAM [0.681], ROAM++ [0.680], CCOT [0.671], SiamRPN+ [0.665], DaSiamRPN [0.658], DSLT [0.658], C-RPN [0.639], MetaTracker [0.637], SiamRPN [0.637], MemTrack [0.626], CREST [0.623], SiamFC [0.582], Staple [0.581].

              VOT-2016                  VOT-2017
              EAO(↑)  A(↑)    R(↓)      EAO(↑)  A(↑)    R(↓)
ROAM++        0.441   0.599   0.174     0.380   0.543   0.195
ROAM          0.384   0.556   0.183     0.331   0.505   0.226
MetaTracker   0.317   0.519   -         -       -       -
DaSiamRPN     0.411   0.61    0.22      0.326   0.56    0.34
SiamRPN+      0.37    0.58    0.24      0.30    0.52    0.41
C-RPN         0.363   0.594   -         0.289   -       -
SiamRPN       0.344   0.56    0.26      0.244   0.49    0.46
ECO           0.375   0.55    0.20      0.280   0.48    0.27
DSLT          0.343   0.545   0.219     -       -       -
CCOT          0.331   0.54    0.24      0.267   0.49    0.32
Staple        0.295   0.54    0.38      0.169   0.52    0.69
CREST         0.283   0.51    0.25      -       -       -
MemTrack      0.272   0.531   0.373     0.248   0.524   0.357
SiamFC        0.235   0.532   0.461     0.188   0.502   0.585

Table 1: Results on VOT-2016/2017. The evaluation metrics are expected average overlap (EAO), accuracy value (A) and robustness value (R). The top 3 performing trackers are colored with red, green and blue respectively.

6.1. Comparison Results with State-of-the-art

We compare our ROAM and ROAM++ with recent response generation based trackers including MetaTracker [34], DSLT [26], MemTrack [49], CREST [41], SiamFC [4], CCOT [9], ECO [8] and Staple [3], as well as recent state-of-the-art trackers including SiamRPN [23], DaSiamRPN [53], SiamRPN+ [52] and C-RPN [11], on both the OTB and VOT datasets. For the methods using SGD updates, the number of SGD steps followed their implementations.

OTB. Fig. 3 presents the experimental results on the OTB-2015 dataset, which contains 100 sequences with 11 annotated video attributes. Both our ROAM and ROAM++ achieve similar AUC compared with the top performer ECO and outperform all other trackers. Specifically, our ROAM and ROAM++ surpass MetaTracker [34], which is the baseline meta-learning tracker that uses a traditional optimization method for model updating, by 6.9% and 6.7% on the success plot respectively, demonstrating the effectiveness of the proposed recurrent model optimization algorithm and resizable bounding box regression. In addition, both our ROAM and ROAM++ outperform the very recent meta-learning based tracker MLT [7] by a large margin (ROAM/ROAM++: 0.681/0.680 vs. MLT: 0.611) under the AUC metric on OTB-2015.

VOT. Table 1 shows the comparison of performance on the VOT-2016 and VOT-2017 datasets. Our ROAM++ achieves the best EAO on both VOT-2016 and VOT-2017.


Figure 4: Precision and success plots of OPE on the LaSOT test dataset.
Precision plot: ROAM++ [0.445], MDNet [0.373], ROAM [0.368], VITAL [0.360], SiamFC [0.339], StructSiam [0.333], DSiam [0.322], ECO [0.301], STRCF [0.298], SINT [0.295], ECO-HC [0.279], CFNet [0.259].
Success plot: ROAM++ [0.447], MDNet [0.397], ROAM [0.390], VITAL [0.390], SiamFC [0.336], StructSiam [0.335], DSiam [0.333], ECO [0.324], SINT [0.314], STRCF [0.308], ECO-HC [0.304], CFNet [0.275].

           MDNet  CF2    ECO    CCOT   GOTURN  SiamFC  SiamFCv2  ROAM   ROAM++
           [33]   [29]   [8]    [9]    [15]    [4]     [44]
AO(↑)      0.299  0.315  0.316  0.325  0.342   0.348   0.374     0.436  0.465
SR0.5(↑)   0.303  0.297  0.309  0.328  0.372   0.353   0.404     0.466  0.532
SR0.75(↑)  0.099  0.088  0.111  0.107  0.124   0.098   0.144     0.164  0.236

Table 2: Results on GOT-10k. The evaluation metrics include average overlap (AO), success rate at 0.5 overlap threshold (SR0.5), and success rate at 0.75 overlap threshold (SR0.75). The top three performing trackers are colored with red, green and blue respectively.

Specifically, both our ROAM++ and ROAM show superior performance on the robustness value compared with RPN-based trackers, which have no model updating, demonstrating the effectiveness of our recurrent model optimization scheme. In addition, our ROAM++ and ROAM outperform the baseline MetaTracker [34] by 39.2% and 21.1% on EAO of VOT-2016, respectively.

LaSOT. LaSOT [10] is a recently proposed large-scale tracking dataset. We evaluate our ROAM against the top-10 performing trackers of the benchmark [10], including MDNet [33], VITAL [42], SiamFC [4], StructSiam [51], DSiam [14], SINT [43], ECO [8], STRCF [24], ECO-HC [8] and CFNet [44], on the testing split, which consists of 280 videos. Fig. 4 presents the comparison results of the precision plot and success plot on the LaSOT test set. Our ROAM++ achieves the best result compared with state-of-the-art trackers on the benchmark, outperforming the second best MDNet with improvements of 19.3% and 12.6% on the precision plot and success plot respectively.

GOT-10k. GOT-10k [17] is a large, highly-diverse dataset for object tracking that was also proposed recently. There is no overlap in object categories between the training set and test set, which follows the one-shot learning setting [12]. Therefore, using external training data is strictly forbidden when testing trackers on their online server. Following their protocol, we train our ROAM using only the training split of this dataset. Table 2 shows the detailed comparison results on the GOT-10k test dataset. Both our ROAM++ and ROAM surpass the other trackers by a large margin. In particular, our ROAM++ obtains an AO of 0.465, an SR0.5 of 0.532 and an SR0.75 of 0.236, outperforming SiamFCv2 with improvements of 24.3%, 31.7% and 63.9% respectively.

TrackingNet. TrackingNet [30] provides more than 30K videos with around 14M dense bounding box annotations, obtained by filtering shorter video clips from YouTube-BB [5]. Table 3 presents the detailed comparison results on the TrackingNet test dataset.

               Staple  CSRDCF  ECOhc  ECO    SiamFC  CFNet  MDNet  ROAM   ROAM++
               [3]     [27]    [8]    [8]    [4]     [44]   [33]
AUC(↑)         0.528   0.534   0.541  0.554  0.571   0.578  0.606  0.620  0.670
Prec.(↑)       0.470   0.480   0.478  0.492  0.533   0.533  0.565  0.547  0.623
Norm. Prec.(↑) 0.603   0.622   0.608  0.618  0.663   0.654  0.705  0.695  0.754

Table 3: Results on TrackingNet. The evaluation metrics include area under the curve (AUC) of the success plot, precision, and normalized precision. The top three performing trackers are colored with red, green and blue respectively.

Figure 5: Ablation study on OTB-2015 with different variants of ROAM (success plots of OPE).
Left: ROAM-oracle [0.742], SGD-oracle [0.699], ROAM [0.675], SGD [0.636], ROAM-w/o RFS [0.636].
Right: ROAM [0.675], ROAM-GRU [0.668], ROAM-FC [0.647], ROAM-ConstLR [0.612].

Our ROAM++ surpasses the other state-of-the-art tracking algorithms on all three evaluation metrics. In detail, our ROAM++ obtains improvements of 10.6%, 10.3% and 6.9% on AUC, precision and normalized precision respectively, compared with the top performing tracker MDNet on the benchmark.

6.2. Ablation Study

For a deeper analysis, we study our trackers from various aspects. Note that all these ablations are trained only on the ImageNet VID dataset for simplicity.

Impact of Different Modules. To verify the effectiveness of different modules, we design four variants of our framework: 1) SGD: replacing the recurrent neural optimizer with traditional SGD for model updating (using the same number of gradient steps as ROAM); 2) ROAM-w/o RFS: training a recurrent neural optimizer without RFS; 3) SGD-Oracle: using the ground-truth bounding boxes to build updating samples for SGD during the testing phase (using the same number of gradient steps as ROAM); 4) ROAM-Oracle: using the ground-truth bounding boxes to build updating samples for ROAM during the testing phase. The results are presented in Fig. 5 (left). ROAM gains about 6% improvement on AUC compared with the baseline SGD, demonstrating the effectiveness of our recurrent model optimization method for model updating. Without RFS during offline training, the tracking performance drops substantially due to overfitting. ROAM-Oracle performs better than SGD-Oracle, showing that our offline learned neural optimizer is more effective than the traditional SGD method given the same updating samples. In addition, these two oracles (SGD-Oracle and ROAM-Oracle) achieve higher AUC scores compared with their normal versions, indicating that the tracking accuracy could be boosted by improving the quality of the updating samples.

Architecture of Neural Optimizer. To investigate more architectures of the neural optimizer, we present three variants of our method: 1) ROAM-GRU: using two stacked Gated Recurrent Unit (GRU) [6] layers as our neural optimizer;


Figure 6: AUC vs. gradient steps on OTB-2015, comparing ROAM, SGD(7e-6) and SGD(7e-7).

Figure 7: Update loss (top row) and meta loss (bottom row) comparison between ROAM and SGD on the Car24, Deer, Fish and Jumping sequences, plotted against frame number.

2) ROAM-FC: using two linear fully-connected layers followed by a tanh activation function as the neural optimizer; 3) ROAM-ConstLR: using a learned constant element-wise learning rate for model optimization instead of the adaptively generated one. Fig. 5 (right) presents the results. Using ROAM-GRU decreases the AUC slightly, while ROAM-FC has a significantly lower AUC compared with ROAM, showing the importance of our recurrent structure. Moreover, the performance drop of ROAM-ConstLR verifies the necessity of using an adaptive learning rate for model updating.

More Steps in Updating. During offline training, we only perform a one-step gradient descent to optimize the updating loss. We investigate the effect of using more than one gradient step on tracking performance during the test phase for both ROAM and SGD (see Fig. 6). Our method can be further improved with more than one step, but performance gradually decreases when using too many steps. This is because our framework is not trained to perform that many steps during the offline stage. We also use two fixed learning rates for SGD, where the larger one is 7e-6⁶ and the smaller one is 7e-7. Using the larger learning rate, SGD reaches its best performance much faster than with the smaller learning rate, while both have a similar best AUC. Our ROAM consistently outperforms SGD(7e-6), showing the superiority of adaptive element-wise learning rates. Furthermore, ROAM with 1-2 gradient steps outperforms SGD(7e-7) using a large number of steps, which shows the improved generalization of ROAM.

Update Loss and Meta Loss Comparison. To show the effectiveness of our neural optimizer, we compare the update loss and meta loss over time between ROAM and SGD after two gradient steps for a few videos of OTB-2015 in Fig. 7 (see the supplementary material for more examples). Under the same number of gradient updates, our neural optimizer obtains a lower loss compared with traditional SGD, demonstrating its faster convergence and better generalization ability for model optimization.

⁶ MetaTracker uses this learning rate.

Figure 8: Histogram of initial learning rates and updating learning rates onOTB-2015.

Why does ROAM work? As discussed in [34], directly using the learned initial learning rate λ(0) for model optimization in subsequent frames could lead to divergence.

This is because the learning rates for model initialization are relatively larger than the ones needed for subsequent frames, which therefore causes unstable model optimization. In particular, the initial model θ(0) is offline trained to be broadly applicable to different videos, and therefore needs a relatively larger gradient step to adapt to a specific task, leading to a relatively larger λ(0). For the subsequent frames, the appearance variations can be sometimes small and sometimes large, and thus the model optimization process needs an adaptive learning rate to handle different situations. Fig. 8 presents the histograms of initial learning rates and updating learning rates on OTB-2015. Most of the updating learning rates are relatively small because there are usually only minor appearance variations between updates. As shown in Fig. 3, our ROAM, which performs model updating for subsequent frames with adaptive learning rates, obtains a substantial performance gain compared with MetaTracker [34], which uses traditional SGD with a constant learning rate for model updating.

7. Conclusion

In this paper, we propose a novel tracking model consisting of a resizable response generator and bounding box regressor, where the first part generates a score map to indicate the presence of the target at different spatial locations and the second module regresses bounding box coordinate shifts relative to the anchors densely mounted on sliding-window positions. To effectively update the tracking model for appearance adaptation, we propose a recurrent model optimization method within a meta-learning setting that converges the model in a few gradient steps. Instead of producing an increment for parameter updating, which is prone to overfitting due to different gradient scales, we recurrently generate adaptive learning rates for tracking model optimization using an LSTM. Extensive experiments on the OTB, VOT, LaSOT, GOT-10k and TrackingNet datasets demonstrate superior performance compared with state-of-the-art tracking algorithms.

Acknowledgement: This work was supported by grants from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. [T32-101/15-R] and CityU 11212518).


References

[1] Marcin Andrychowicz, Misha Denil, Sergio Gomez, Matthew W. Hoffman, David Pfau, Tom Schaul, and Nando de Freitas. Learning to Learn by Gradient Descent by Gradient Descent. In NeurIPS, 2016. 2, 4, 5
[2] Samy Bengio, Yoshua Bengio, Jocelyn Cloutier, and Jan Gecsei. On the Optimization of a Synaptic Learning Rule. In Preprints Conf. Optimality in Artificial and Biological Neural Networks, 1992. 2
[3] Luca Bertinetto, Jack Valmadre, Stuart Golodetz, Ondrej Miksik, and Philip Torr. Staple: Complementary Learners for Real-Time Tracking. In CVPR, 2016. 2, 6, 7
[4] Luca Bertinetto, Jack Valmadre, Joao F. Henriques, Andrea Vedaldi, and Philip H. S. Torr. Fully-Convolutional Siamese Networks for Object Tracking. In ECCV Workshop on VOT, 2016. 1, 2, 6, 7

[5] Esteban Real, Jonathon Shlens, Stefano Mazzocchi, Xin Pan, and Vincent Vanhoucke. YouTube-BoundingBoxes: A Large High-Precision Human-Annotated Data Set for Object Detection in Video. In CVPR, 2017. 7

[6] Kyunghyun Cho, Bart Van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. arXiv preprint arXiv:1406.1078, 2014. 7
[7] Janghoon Choi, Junseok Kwon, and Kyoung Mu Lee. Deep Meta Learning for Real-Time Target-Aware Visual Tracking. In ICCV, 2019. 6
[8] Martin Danelljan, Goutam Bhat, Fahad Shahbaz Khan, and Michael Felsberg. ECO: Efficient Convolution Operators for Tracking. In CVPR, 2017. 1, 6, 7
[9] Martin Danelljan, Andreas Robinson, Fahad Shahbaz Khan, and Michael Felsberg. Beyond Correlation Filters: Learning Continuous Convolution Operators for Visual Tracking. In ECCV, 2016. 1, 2, 6, 7
[10] Heng Fan, Liting Lin, Fan Yang, and Peng Chu. LaSOT: A High-quality Benchmark for Large-scale Single Object Tracking. In CVPR, 2019. 6, 7
[11] Heng Fan and Haibin Ling. Siamese Cascaded Region Proposal Networks for Real-Time Visual Tracking. In CVPR, 2019. 2, 6
[12] Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. In ICML, 2017. 2, 7
[13] Hamed Kiani Galoogahi, Terence Sim, and Simon Lucey. Correlation Filters with Limited Boundaries. In CVPR, 2015. 2
[14] Qing Guo, Wei Feng, Ce Zhou, Rui Huang, Liang Wan, and Song Wang. Learning Dynamic Siamese Network for Visual Object Tracking. In ICCV, 2017. 7
[15] David Held, Sebastian Thrun, and Silvio Savarese. Learning to Track at 100 FPS with Deep Regression Networks. In ECCV, 2016. 7
[16] Joao F. Henriques, Rui Caseiro, Pedro Martins, and Jorge Batista. High-Speed Tracking with Kernelized Correlation Filters. TPAMI, 2015. 1, 2
[17] Lianghua Huang, Xin Zhao, and Kaiqi Huang. GOT-10k: A Large High-Diversity Benchmark for Generic Object Tracking in the Wild. arXiv, 2019. 6, 7
[18] Diederik P. Kingma and Jimmy Ba. Adam: A Method for Stochastic Optimization. arXiv preprint arXiv:1412.6980, 2014. 6
[19] Matej Kristan, Ales Leonardis, Jiri Matas, and Michael Felsberg. The Visual Object Tracking VOT2016 Challenge Results. In ECCV Workshop on VOT, 2016. 1, 6
[20] Matej Kristan, Ales Leonardis, Jiri Matas, Michael Felsberg, Roman Pflugfelder, Luka Cehovin Zajc, Tomas Vojir, Gustav Hager, Alan Lukezic, Abdelrahman Eldesokey, Gustavo Fernandez, and Others. The Visual Object Tracking VOT2017 Challenge Results. In ICCV Workshop on VOT, 2017. 1, 6
[21] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. In NeurIPS, 2012. 6
[22] Bo Li, Wei Wu, Qiang Wang, Fangyi Zhang, Junliang Xing, and Junjie Yan. SiamRPN++: Evolution of Siamese Visual Tracking with Very Deep Networks. In CVPR, 2019. 2
[23] Bo Li, Junjie Yan, Wei Wu, Zheng Zhu, and Xiaolin Hu. High Performance Visual Tracking with Siamese Region Proposal Network. In CVPR, 2018. 1, 2, 5, 6
[24] Feng Li, Cheng Tian, Wangmeng Zuo, Lei Zhang, and Ming-Hsuan Yang. Learning Spatial-Temporal Regularized Correlation Filters for Visual Tracking. In CVPR, pages 4904–4913, 2018. 7
[25] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollar, and C. Lawrence Zitnick. Microsoft COCO: Common Objects in Context. In ECCV, pages 740–755, 2014. 6
[26] Xiankai Lu, Chao Ma, Bingbing Ni, Xiaokang Yang, Ian Reid, and Ming-Hsuan Yang. Deep Regression Tracking with Shrinkage Loss. In ECCV, 2018. 1, 2, 6
[27] Alan Lukezic, Tomas Vojir, Luka Cehovin, Jiri Matas, and Matej Kristan. Discriminative Correlation Filter with Channel and Spatial Reliability. In CVPR, 2017. 7
[28] Kaifeng Lv, Shunhua Jiang, and Jian Li. Learning Gradient Descent: Better Generalization and Longer Horizons. In ICML, 2017. 2, 5
[29] Chao Ma, Jia-Bin Huang, Xiaokang Yang, and Ming-Hsuan Yang. Hierarchical Convolutional Features for Visual Tracking. In ICCV, 2015. 7
[30] Matthias Muller, Adel Bibi, Silvio Giancola, Salman Al-Subaihi, and Bernard Ghanem. TrackingNet: A Large-Scale Dataset and Benchmark for Object Tracking in the Wild. In ECCV, 2018. 6, 7
[31] Tsendsuren Munkhdalai and Hong Yu. Meta Networks. In ICML, 2017. 2
[32] Devang K. Naik and R. J. Mammone. Meta-Neural Networks that Learn by Learning. In IJCNN, 1992. 2
[33] Hyeonseob Nam and Bohyung Han. Learning Multi-Domain Convolutional Neural Networks for Visual Tracking. In CVPR, 2016. 1, 7
[34] Eunbyung Park and Alexander C. Berg. Meta-Tracker: Fast and Robust Online Adaptation for Visual Object Trackers. In ECCV, 2018. 2, 3, 6, 7, 8
[35] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic Differentiation in PyTorch. In NeurIPS Workshop, 2017. 6
[36] Sachin Ravi and Hugo Larochelle. Optimization as a Model for Few-Shot Learning. In ICLR, 2017. 2

[37] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In NeurIPS, 2015. 1, 2, 4

[38] Jurgen Schmidhuber. Evolutionary Principles in Self-Referential Learning: On Learning How to Learn: The Meta-Meta-Meta...-Hook. PhD thesis, 1987. 2
[39] John Schulman, Jonathan Ho, Xi Chen, and Pieter Abbeel. Meta Learning Shared Hierarchies. arXiv, 2018. 2
[40] Karen Simonyan and Andrew Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. In ICLR, 2015. 6
[41] Yibing Song, Chao Ma, Lijun Gong, Jiawei Zhang, Rynson Lau, and Ming-Hsuan Yang. CREST: Convolutional Residual Learning for Visual Tracking. In ICCV, 2017. 1, 2, 6
[42] Yibing Song, Chao Ma, Xiaohe Wu, Lijun Gong, Linchao Bao, Wangmeng Zuo, Chunhua Shen, Rynson Lau, and Ming-Hsuan Yang. VITAL: VIsual Tracking via Adversarial Learning. In CVPR, 2018. 7
[43] Ran Tao, Efstratios Gavves, and Arnold W. M. Smeulders. Siamese Instance Search for Tracking. In CVPR, 2016. 1
[44] Jack Valmadre, Luca Bertinetto, Joao F. Henriques, Andrea Vedaldi, and Philip H. S. Torr. End-to-End Representation Learning for Correlation Filter Based Tracking. In CVPR, 2017. 1, 7
[45] Ning Wang, Yibing Song, Wengang Zhou, Wei Liu, and Houqiang Li. Unsupervised Deep Tracking. In CVPR, 2019. 2
[46] Olga Wichrowska, Niru Maheswaranathan, Matthew W. Hoffman, Sergio Gomez Colmenarejo, Misha Denil, Nando de Freitas, and Jascha Sohl-Dickstein. Learned Optimizers that Scale and Generalize. In ICML, 2017. 2, 5
[47] Yi Wu, Jongwoo Lim, and Ming-Hsuan Yang. Object Tracking Benchmark. TPAMI, 2015. 6
[48] Tianyu Yang and Antoni B. Chan. Recurrent Filter Learning for Visual Tracking. In ICCV Workshop on VOT, 2017. 2
[49] Tianyu Yang and Antoni B. Chan. Learning Dynamic Memory Networks for Object Tracking. In ECCV, 2018. 2, 6
[50] Tianyu Yang and Antoni B. Chan. Visual Tracking via Dynamic Memory Networks. TPAMI, 2019. 2
[51] Yunhua Zhang, Lijun Wang, Dong Wang, and Huchuan Lu. Structured Siamese Network for Real-Time Visual Tracking. In ECCV, 2018. 7
[52] Zhipeng Zhang, Houwen Peng, and Qiang Wang. Deeper and Wider Siamese Networks for Real-Time Visual Tracking. In CVPR, 2019. 2, 6
[53] Zheng Zhu, Qiang Wang, Bo Li, Wei Wu, Junjie Yan, and Weiming Hu. Distractor-Aware Siamese Networks for Visual Object Tracking. In ECCV, 2018. 2, 6