Week 42: Fully-Convolutional Siamese Networks for Object … · Training: loss function Source:...

Week 42:

Siamese Network: Architecture and Applications in Visual Object Tracking

Yuanwei Wu

10-21-2016


Outline

• Siamese Architecture

• Siamese Applications in Computer Vision

• Paper review: Visual Object Tracking using Siamese CNN

• Future Work


What does “Siamese” mean?

Source: http://vision.ia.ac.cn/zh/senimar/reports/Siamese-Network-Architecture-and-Applications-in-Computer-Vision.pdf

Siamese Architecture

Source: Learning Hierarchies of Invariant Features. Yann LeCun. helper.ipam.ucla.edu/publications/gss2012/gss2012_10739.pdf

Siamese Architecture and loss function

Source: Learning Hierarchies of Invariant Features. Yann LeCun. helper.ipam.ucla.edu/publications/gss2012/gss2012_10739.pdf

Siamese Applications in Computer Vision: 1. Signature Verification

Siamese Applications in Computer Vision: 2. Dimensionality Reduction

Siamese Applications in Computer Vision: 3.1 Learning Image Descriptors

CNN Model

Siamese Applications in Computer Vision: 3.2 Learning Image Descriptors

Siamese Applications in Computer Vision: 4.1 Face Verification


@article{bertinetto2016fully,
  title={Fully-Convolutional Siamese Networks for Object Tracking},
  author={Bertinetto, Luca and Valmadre, Jack and Henriques, Jo{\~a}o F and Vedaldi, Andrea and Torr, Philip HS},
  journal={arXiv preprint arXiv:1606.09549},
  year={2016}
}

Paper Review: Fully-Convolutional Siamese Networks for Object Tracking

Source: Bertinetto, Luca and Valmadre, Jack and Henriques, João F and Vedaldi, Andrea and Torr, Philip HS, Fully-Convolutional Siamese Networks for Object Tracking, arXiv preprint, 2016.

Architecture of Siamese CNN



Details of the Architecture of Siamese CNN

Source:

1: Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, NIPS 2012.

2: Bertinetto, Luca and Valmadre, Jack and Henriques, João F and Vedaldi, Andrea and Torr, Philip HS, Fully-Convolutional Siamese Networks for Object Tracking, arXiv preprint, 2016.

Cross-correlation layer

Training: dataset

• ImageNet Video dataset of 2015: contains ~4000 videos with ~1 million annotated frames

Training: preprocessing on the images

• Preprocessing: 2820 videos, exemplar image: 127 x 127, search image: 255 x 255



Training: recap the steps

• ImageNet Video dataset of 2015: contains ~4000 videos with ~1 million annotated frames

• Preprocessing: 2820 videos; exemplar image: 127 x 127; search image: 255 x 255

• Training with a standard Stochastic Gradient Descent (SGD) solver using MatConvNet



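Training: loss function

• Employing a discriminative training approach using positive and negative pairs and adopting the logistic loss:

$$\ell(y, v) = \log(1 + \exp(-yv))$$

where $v$ is the real-valued score of a single exemplar-candidate pair and $y \in \{+1, -1\}$ is its ground-truth label.

• The loss of a score map is the mean of the individual losses:

$$L(y, v) = \frac{1}{|D|} \sum_{u \in D} \ell(y[u], v[u])$$

where $D$ is the set of positions $u$ in the score map.

• Applying SGD to find the conv-net parameters $\theta$ using

$$\arg\min_{\theta} \; \mathbb{E}_{(z, x, y)} \, L(y, f(z, x; \theta))$$

where $z$ is the exemplar image, $x$ is the search image, and $f$ is the network that produces the score map.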
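The logistic loss and its mean over a score map can be sketched in a few lines of plain Python. This is only an illustration, not the authors' MatConvNet code: the function names `logistic_loss` and `score_map_loss` and the toy 2 x 2 values are assumptions made for the example.

```python
import math

def logistic_loss(y, v):
    """Logistic loss for one score-map position: log(1 + exp(-y * v)).

    y is the ground-truth label (+1 or -1), v is the real-valued score.
    """
    m = -y * v
    # Numerically stable evaluation of log(1 + exp(m)).
    return max(m, 0.0) + math.log1p(math.exp(-abs(m)))

def score_map_loss(labels, scores):
    """Mean of the individual losses over all positions of a 2-D score map."""
    total = 0.0
    count = 0
    for label_row, score_row in zip(labels, scores):
        for y, v in zip(label_row, score_row):
            total += logistic_loss(y, v)
            count += 1
    return total / count

# Toy 2 x 2 example; the labels and scores are made up for illustration.
labels = [[+1, -1], [-1, -1]]
scores = [[2.0, -1.0], [0.0, -3.0]]
loss = score_map_loss(labels, scores)
```

A correctly signed, confident score contributes a loss close to zero, while a confident score with the wrong sign contributes a loss roughly equal to its magnitude.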


Tracking algorithm

• Use a search image centered at the previous position of the target.

• Only search for the object within a region of approximately four times its previous size.

• A cosine window is added to the score map to penalize large displacements.

• The position of the maximum score relative to the center of the score map, multiplied by the stride of the network, gives the displacement of the target from frame to frame.
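The last two steps above (cosine window, then displacement from the position of the maximum) can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names, the blending weight `window_weight`, and the default `stride=8` are hypothetical choices for the example, and the window is blended with the score map as a weighted sum rather than taken from the authors' implementation.

```python
import math

def hann_window(n):
    """1-D Hann (raised-cosine) window of length n, peaking at the center."""
    if n == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]

def estimate_displacement(score_map, stride=8, window_weight=0.3):
    """Return (dy, dx) in input-image pixels from a square 2-D score map.

    stride and window_weight are illustrative assumptions, not values
    taken from the slides.
    """
    n = len(score_map)
    w = hann_window(n)
    best, best_pos = -math.inf, (0, 0)
    for r in range(n):
        for c in range(n):
            # Blend the raw score with the 2-D cosine window so that
            # positions far from the center are penalized.
            s = (1.0 - window_weight) * score_map[r][c] + window_weight * w[r] * w[c]
            if s > best:
                best, best_pos = s, (r, c)
    center = (n - 1) / 2.0
    # Offset of the maximum from the map center, scaled by the network stride.
    return ((best_pos[0] - center) * stride, (best_pos[1] - center) * stride)

# Toy 5 x 5 score map with a single strong peak one cell right of center.
toy = [[0.0] * 5 for _ in range(5)]
toy[2][3] = 1.0
dy, dx = estimate_displacement(toy)
```

With the defaults assumed here, a peak one cell to the right of the center maps to a displacement of one stride along x, while a weak peak in a corner loses to a slightly lower score at the center because of the window.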




Experiments: training dataset size

• Accuracy: calculated as the average Intersection-over-Union (IoU)

• Robustness: measured as the total number of failures

• Using a larger video dataset could increase the performance even further.
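The accuracy measure can be illustrated with a short sketch. It assumes axis-aligned boxes given as (x, y, width, height); the function names `iou` and `average_iou` are hypothetical, and the benchmark's own evaluation toolkit may handle details such as rotated boxes or re-initialization differently.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two boxes given as (x, y, width, height)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlapping rectangle (zero if disjoint).
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def average_iou(predicted, ground_truth):
    """Mean IoU over a sequence of predicted and ground-truth boxes."""
    pairs = list(zip(predicted, ground_truth))
    return sum(iou(p, g) for p, g in pairs) / len(pairs)

# Two 2 x 2 boxes offset by one pixel in each direction overlap in a 1 x 1 area.
example = iou((0, 0, 2, 2), (1, 1, 2, 2))
```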



Experiments: OTB13 benchmark results



Experiments: VOT15 benchmark results




• Estimates the new position of the target object by merely cross-correlating the embeddings of two patches over three scales.

• Achieves real-time performance and state-of-the-art results.
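Cross-correlating two embeddings amounts to sliding the smaller one over the larger one and taking an inner product at every offset. The sketch below shows this for single-channel 2-D embeddings, which is a simplifying assumption: real embeddings have many channels, and the inner product would also sum over them. The function name `cross_correlate` and the toy values are made up for illustration.

```python
def cross_correlate(exemplar, search):
    """Slide a 2-D exemplar embedding over a larger 2-D search embedding.

    Returns a score map of size (H - h + 1) x (W - w + 1); each entry is the
    inner product of the exemplar with the search window at that offset.
    """
    h, w = len(exemplar), len(exemplar[0])
    H, W = len(search), len(search[0])
    score = []
    for r in range(H - h + 1):
        row = []
        for c in range(W - w + 1):
            row.append(sum(exemplar[i][j] * search[r + i][c + j]
                           for i in range(h) for j in range(w)))
        score.append(row)
    return score

# The exemplar pattern sits in the bottom-right corner of the search embedding,
# so the highest score appears at offset (1, 1).
exemplar = [[1, 2], [3, 4]]
search = [[0, 0, 0], [0, 1, 2], [0, 3, 4]]
score_map = cross_correlate(exemplar, search)  # [[4, 11], [14, 30]]
```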



Future work: How to improve the performance?

• By augmenting the online tracking pipeline:

online model updating (e.g. tracking-by-detection)

bounding-box regression (e.g. YOLO, Faster R-CNN)

fine-tuning (e.g. correlation filters + CNN features)

memory (e.g. add RNN, LSTM)

Source: Guanghan Ning, Zhi Zhang, Chen Huang, Zhihai He, Xiaobo Ren, Haohong Wang, Spatially Supervised Recurrent Convolutional Neural Networks for Visual Object Tracking, arXiv preprint, 2016.


• By introducing a new architecture within the Siamese CNN framework, which requires digging deeper into the network structure (e.g. regression network, triplet network).

Triplet Network



• By introducing a new loss function in the Siamese network.

Loss function used in face verification

Source: http://vision.ia.ac.cn/zh/senimar/reports/Siamese-Network-Architecture-and-Applications-in-Computer-Vision.pdf

Thank you!
