Week 42: Fully-Convolutional Siamese Networks for Object … · Training: loss function Source:...

Week 42: Siamese Network: Architecture and Applications in Visual Object Tracking Yuanwei Wu 10-21-2016

Page 1:

Week 42:

Siamese Network: Architecture and

Applications in Visual Object Tracking

Yuanwei Wu

10-21-2016

Page 2:

Outline

• Siamese Architecture

• Siamese Applications in Computer Vision

• Paper review: Visual Object Tracking using Siamese CNN

• Future Work

Page 3:

What does “Siamese” mean?

Source: http://vision.ia.ac.cn/zh/senimar/reports/Siamese-Network-Architecture-and-Applications-in-Computer-Vision.pdf

Page 4:

Siamese Architecture

Source: Learning Hierarchies of Invariant Features. Yann LeCun. helper.ipam.ucla.edu/publications/gss2012/gss2012_10739.pdf

Page 5:

Siamese Architecture and loss function

Source: Learning Hierarchies of Invariant Features. Yann LeCun. helper.ipam.ucla.edu/publications/gss2012/gss2012_10739.pdf
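As a rough illustration of the idea this slide names, the sketch below passes two inputs through one shared embedding and scores the pair with a contrastive loss. The slide's own formula did not survive in this transcript, so the contrastive form (Hadsell, Chopra and LeCun) is an assumption here. The linear embedding `embed`, the weights `W`, and the margin value are hypothetical and are not taken from the cited lecture.

```python
import numpy as np

def embed(x, W):
    # Shared branch: both inputs go through the SAME weights W
    # (a hypothetical linear map followed by tanh).
    return np.tanh(W @ x)

def contrastive_loss(x1, x2, same, W, margin=1.0):
    # same = 1 for a matching pair, 0 for a non-matching pair.
    d = np.linalg.norm(embed(x1, W) - embed(x2, W))
    if same:
        return 0.5 * d ** 2                   # pull matching pairs together
    return 0.5 * max(0.0, margin - d) ** 2    # push others apart, up to the margin

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))
a = rng.normal(size=8)
b = rng.normal(size=8)

loss_identical_match = contrastive_loss(a, a, 1, W)      # distance 0, so loss 0
loss_identical_mismatch = contrastive_loss(a, a, 0, W)   # distance 0, so 0.5 * margin**2
loss_mismatch = contrastive_loss(a, b, 0, W)
```

Because the two branches share `W`, the "two networks" are really one function applied twice, which is the defining property of a Siamese architecture.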

Page 6:

Siamese Applications in Computer Vision: 1. Signature Verification

Source: http://vision.ia.ac.cn/zh/senimar/reports/Siamese-Network-Architecture-and-Applications-in-Computer-Vision.pdf

Page 7:

Siamese Applications in Computer Vision: 2. Dimensionality Reduction

Source: http://vision.ia.ac.cn/zh/senimar/reports/Siamese-Network-Architecture-and-Applications-in-Computer-Vision.pdf

Page 8:

Siamese Applications in Computer Vision: 3.1 Learning Image Descriptors

CNN Model

Source: http://vision.ia.ac.cn/zh/senimar/reports/Siamese-Network-Architecture-and-Applications-in-Computer-Vision.pdf

Page 9:

Siamese Applications in Computer Vision: 3.2 Learning Image Descriptors

Source: http://vision.ia.ac.cn/zh/senimar/reports/Siamese-Network-Architecture-and-Applications-in-Computer-Vision.pdf

Page 10:

Siamese Applications in Computer Vision: 4.1 Face Verification

Source: http://vision.ia.ac.cn/zh/senimar/reports/Siamese-Network-Architecture-and-Applications-in-Computer-Vision.pdf

Page 11:

Siamese Applications in Computer Vision: 4.2 Face Verification

Source: http://vision.ia.ac.cn/zh/senimar/reports/Siamese-Network-Architecture-and-Applications-in-Computer-Vision.pdf

Page 12:

Siamese Applications in Computer Vision: 4.3 Face Verification

Source: http://vision.ia.ac.cn/zh/senimar/reports/Siamese-Network-Architecture-and-Applications-in-Computer-Vision.pdf

Page 13:

Siamese Applications in Computer Vision: 4.4 Face Verification

Source: http://vision.ia.ac.cn/zh/senimar/reports/Siamese-Network-Architecture-and-Applications-in-Computer-Vision.pdf

Page 14:

Siamese Applications in Computer Vision: 4.5 Face Verification

Source: http://vision.ia.ac.cn/zh/senimar/reports/Siamese-Network-Architecture-and-Applications-in-Computer-Vision.pdf

Page 15:

@article{bertinetto2016fully,
  title={Fully-Convolutional Siamese Networks for Object Tracking},
  author={Bertinetto, Luca and Valmadre, Jack and Henriques, Jo{\~a}o F and Vedaldi, Andrea and Torr, Philip HS},
  journal={arXiv preprint arXiv:1606.09549},
  year={2016}
}

Paper Review: Fully-Convolutional Siamese Networks for Object Tracking

Page 16:

Architecture of Siamese CNN

Source: Bertinetto, Luca and Valmadre, Jack and Henriques, João F and Vedaldi, Andrea and Torr, Philip HS, Fully-Convolutional Siamese Networks for Object Tracking, arXiv preprint, 2016.

Page 17:

Details of the Architecture of Siamese CNN

Source:

1: Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, NIPS 2012.

Page 18:

Details of the Architecture of Siamese CNN

Source:

1: Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, NIPS 2012.

2: Bertinetto, Luca and Valmadre, Jack and Henriques, João F and Vedaldi, Andrea and Torr, Philip HS, Fully-Convolutional Siamese Networks for Object Tracking, arXiv preprint, 2016.

Cross-correlation layer
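A minimal sketch of what a cross-correlation layer computes, assuming the exemplar embedding is slid over the search embedding and an inner product is taken at each offset. The 6 x 6 and 22 x 22 spatial sizes follow the embeddings reported in the cited paper. The channel count of 4, the random features, and the function name `cross_correlate` are illustrative assumptions only.

```python
import numpy as np

def cross_correlate(search_feat, exemplar_feat):
    # search_feat: (H, W, C), exemplar_feat: (h, w, C).
    # Returns a score map of shape (H - h + 1, W - w + 1).
    H, W, _ = search_feat.shape
    h, w, _ = exemplar_feat.shape
    scores = np.empty((H - h + 1, W - w + 1))
    for i in range(scores.shape[0]):
        for j in range(scores.shape[1]):
            window = search_feat[i:i + h, j:j + w, :]
            # Inner product over all spatial positions and channels.
            scores[i, j] = np.sum(window * exemplar_feat)
    return scores

rng = np.random.default_rng(0)
exemplar_feat = rng.normal(size=(6, 6, 4))
search_feat = rng.normal(size=(22, 22, 4)) * 0.1
search_feat[5:11, 9:15, :] = exemplar_feat   # plant the target at offset (5, 9)

score_map = cross_correlate(search_feat, exemplar_feat)
peak = np.unravel_index(np.argmax(score_map), score_map.shape)
```

The explicit double loop is written for readability. A real implementation would express the same operation as a convolution so that all offsets are evaluated in one pass.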

Page 19:

Training: dataset

• ImageNet Video dataset of 2015: contains ~4000 videos with ~1 million annotated frames

Source: Bertinetto, Luca and Valmadre, Jack and Henriques, João F and Vedaldi, Andrea and Torr, Philip HS, Fully-Convolutional Siamese Networks for Object Tracking, arXiv preprint, 2016.

Page 20:

Training: preprocessing on the images

• Preprocessing: 2820 videos, exemplar image: 127 x 127, search image: 255 x 255

Source: Bertinetto, Luca and Valmadre, Jack and Henriques, João F and Vedaldi, Andrea and Torr, Philip HS, Fully-Convolutional Siamese Networks for Object Tracking, arXiv preprint, 2016.

Page 21:

Training: recap the steps

• ImageNet Video dataset of 2015: contains ~4000 videos with ~1 million annotated frames

• Preprocessing:

2820 videos

exemplar image: 127 x 127

search image: 255 x 255

• Training with a standard Stochastic Gradient Descent (SGD) solver using MatConvNet

Source: Bertinetto, Luca and Valmadre, Jack and Henriques, João F and Vedaldi, Andrea and Torr, Philip HS, Fully-Convolutional Siamese Networks for Object Tracking, arXiv preprint, 2016.
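The 127 x 127 and 255 x 255 input sizes determine the size of the resulting score map. The arithmetic can be sketched as follows, assuming the unpadded AlexNet-like layer stack described in the cited paper. The `LAYERS` list is transcribed from memory of that paper's architecture table and should be treated as an assumption. The helper names are hypothetical.

```python
def out_size(n, kernel, stride=1):
    # Spatial size after an unpadded ("valid") convolution or pooling layer.
    return (n - kernel) // stride + 1

# (kernel, stride) per layer: conv1, pool1, conv2, pool2, conv3, conv4, conv5
LAYERS = [(11, 2), (3, 2), (5, 1), (3, 2), (3, 1), (3, 1), (3, 1)]

def embedding_size(n):
    for kernel, stride in LAYERS:
        n = out_size(n, kernel, stride)
    return n

exemplar = embedding_size(127)          # side length of the exemplar embedding
search = embedding_size(255)            # side length of the search embedding
score_map = search - exemplar + 1       # side length after cross-correlation

total_stride = 1
for _, stride in LAYERS:
    total_stride *= stride

print(exemplar, search, score_map, total_stride)  # 6 22 17 8
```

Under these assumptions each step of one cell in the 17 x 17 score map corresponds to 8 pixels in the search image.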

Page 24:

Training: loss function

• Employing a discriminative training approach using positive and negative pairs and adopting the logistic loss:

\ell(y, v) = \log(1 + \exp(-yv))

where y \in \{+1, -1\} is the ground-truth label and v is the real-valued score of an exemplar-candidate pair.

• The loss of a score map is the mean of the individual losses:

L(y, v) = \frac{1}{|\mathcal{D}|} \sum_{u \in \mathcal{D}} \ell(y[u], v[u])

where \mathcal{D} is the set of positions u in the score map.

• Applying SGD to find the conv-net parameters \theta using

\arg\min_{\theta} \mathbb{E}_{(z, x, y)} L(y, f(z, x; \theta))

where f(z, x; \theta) is the score map produced for exemplar z and search image x.

Source: Bertinetto, Luca and Valmadre, Jack and Henriques, João F and Vedaldi, Andrea and Torr, Philip HS, Fully-Convolutional Siamese Networks for Object Tracking, arXiv preprint, 2016.
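The logistic loss and its mean over a score map, as named on this slide, can be sketched in a few lines of NumPy. The 17 x 17 map size and the 3 x 3 block of positive labels are illustrative assumptions, as are the function names. Labels are taken to be +1 for positive and -1 for negative positions.

```python
import numpy as np

def logistic_loss(y, v):
    # Elementwise log(1 + exp(-y * v)), computed stably via logaddexp.
    return np.logaddexp(0.0, -y * v)

def score_map_loss(labels, scores):
    # Mean of the individual losses over every position of the score map.
    return float(np.mean(logistic_loss(labels, scores)))

labels = -np.ones((17, 17))
labels[7:10, 7:10] = 1.0                 # hypothetical positives near the center

loss_at_zero = score_map_loss(labels, np.zeros((17, 17)))   # log(2) everywhere
loss_confident = score_map_loss(labels, 10.0 * labels)      # scores agree with labels
```

An uninformative score map (all zeros) gives a loss of log 2, and the loss approaches zero as the scores agree more strongly with the labels.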

Page 28:

Tracking algorithm

• Use a search image centered at the previous position of the target.

• Only search for the object within a region of approximately four times its previous size.

• A cosine window is added to the score map to penalize large displacements.

• The position of the maximum score relative to the center of the score map, multiplied by the stride of the network, gives the displacement of the target from frame to frame.

Source: Bertinetto, Luca and Valmadre, Jack and Henriques, João F and Vedaldi, Andrea and Torr, Philip HS, Fully-Convolutional Siamese Networks for Object Tracking, arXiv preprint, 2016.
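The last two steps above can be sketched as follows: blend a cosine (Hanning) window into the score map, take the argmax, and convert its offset from the center into a pixel displacement. The blending weight `window_influence`, the normalization of both maps, and the function name `track_step` are assumptions made for this illustration. The slides do not specify them.

```python
import numpy as np

def track_step(score_map, stride=8, window_influence=0.3):
    n = score_map.shape[0]
    hann = np.hanning(n)
    window = np.outer(hann, hann)
    window /= window.sum()

    # Shift and normalize the scores so they are comparable to the window.
    scores = score_map - score_map.min()
    total = scores.sum()
    if total > 0:
        scores = scores / total

    # Adding the window favors positions close to the center.
    blended = (1 - window_influence) * scores + window_influence * window

    row, col = np.unravel_index(np.argmax(blended), blended.shape)
    center = (n - 1) / 2.0
    # Offset from the center in score-map cells, scaled to image pixels.
    return (row - center) * stride, (col - center) * stride

score_map = np.zeros((17, 17))
score_map[8, 10] = 5.0          # peak two cells to the right of the center
dy, dx = track_step(score_map)
```

With a stride of 8, a peak two cells to the right of the center corresponds to a displacement of 16 pixels.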

Page 30:

Experiments: training dataset size

• Accuracy: calculated as the average Intersection-over-Union (IoU)

• Robustness: measured as the total number of failures

• Using a larger video dataset could increase the performance even further.

Source: Bertinetto, Luca and Valmadre, Jack and Henriques, João F and Vedaldi, Andrea and Torr, Philip HS, Fully-Convolutional Siamese Networks for Object Tracking, arXiv preprint, 2016.
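The accuracy measure can be made concrete with a short sketch of IoU for axis-aligned boxes, averaged over frames. The `(x, y, width, height)` box format and the function names are assumptions for this example. The benchmark toolkits define their own conventions.

```python
def iou(box_a, box_b):
    # Boxes are (x, y, width, height) with (x, y) the top-left corner.
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def average_iou(predicted, ground_truth):
    # Mean IoU over frames; both lists hold one box per frame.
    return sum(iou(p, g) for p, g in zip(predicted, ground_truth)) / len(predicted)

pred = [(0, 0, 10, 10), (5, 5, 10, 10)]
truth = [(0, 0, 10, 10), (10, 5, 10, 10)]
acc = average_iou(pred, truth)   # (1 + 1/3) / 2
```

The first frame overlaps perfectly (IoU 1) and the second overlaps by half a box width (IoU 1/3), so the average is 2/3.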

Page 31:

Experiments: OTB13 benchmark results

Source: Bertinetto, Luca and Valmadre, Jack and Henriques, João F and Vedaldi, Andrea and Torr, Philip HS, Fully-Convolutional Siamese Networks for Object Tracking, arXiv preprint, 2016.

Page 32:

Experiments: VOT15 benchmark results

Source: Bertinetto, Luca and Valmadre, Jack and Henriques, João F and Vedaldi, Andrea and Torr, Philip HS, Fully-Convolutional Siamese Networks for Object Tracking, arXiv preprint, 2016.

Page 33:

Experiments: VOT15 benchmark results

Source: Bertinetto, Luca and Valmadre, Jack and Henriques, João F and Vedaldi, Andrea and Torr, Philip HS, Fully-Convolutional Siamese Networks for Object Tracking, arXiv preprint, 2016.

Page 34:

Experiments: VOT15 benchmark results

• Estimates the new position of the target object by merely cross-correlating the embeddings of two patches over three scales.

• Achieves real-time performance and state-of-the-art results.

Source: Bertinetto, Luca and Valmadre, Jack and Henriques, João F and Vedaldi, Andrea and Torr, Philip HS, Fully-Convolutional Siamese Networks for Object Tracking, arXiv preprint, 2016.

Page 35:

Future work: How to improve the performance?

• By augmenting the online tracking pipeline:

online model updating (e.g. tracking-by-detection)

bounding-box regression (e.g. YOLO, Faster R-CNN)

fine-tuning (e.g. correlation filters + CNN features)

memory (e.g. add RNN, LSTM)

Page 36:

Source: Guanghan Ning, Zhi Zhang, Chen Huang, Zhihai He, Xiaobo Ren, Haohong Wang, Spatially Supervised Recurrent Convolutional Neural Networks for Visual Object Tracking, arXiv preprint, 2016.

Page 37:

Future work: How to improve the performance?

• By augmenting the online tracking pipeline:

online model updating (e.g. tracking-by-detection)

bounding-box regression (e.g. YOLO, Faster R-CNN)

fine-tuning (e.g. correlation filters + CNN features)

memory (e.g. add RNN, LSTM)

• By introducing new architectures within the Siamese CNN framework, which requires digging deeper into the structure of the networks (e.g. regression network, triplet network).

Page 38:

Triplet Network

Source: http://vision.ia.ac.cn/zh/senimar/reports/Siamese-Network-Architecture-and-Applications-in-Computer-Vision.pdf
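For readers unfamiliar with triplet networks, the sketch below shows one common way to train them: a margin-based triplet loss on three embeddings (anchor, positive, negative). The slide's figure did not survive in this transcript, so this particular loss form, the margin value, and the toy 2-D embeddings are assumptions rather than the formulation on the slide.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Squared Euclidean distances from the anchor to each of the other two.
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    # Zero once the negative is farther than the positive by at least the margin.
    return max(0.0, float(d_pos - d_neg + margin))

anchor = np.array([0.0, 0.0])
positive = np.array([0.1, 0.0])
negative = np.array([1.0, 0.0])

easy = triplet_loss(anchor, positive, negative)   # negative already far away
hard = triplet_loss(anchor, negative, positive)   # roles swapped, constraint violated
```

Unlike a pairwise Siamese loss, the triplet loss only constrains relative distances, which is why it is often discussed as an architectural extension of the Siamese idea.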

Page 39:

Future work: How to improve the performance?

• By augmenting the online tracking pipeline:

online model updating (e.g. tracking-by-detection)

bounding-box regression (e.g. YOLO, Faster R-CNN)

fine-tuning (e.g. correlation filters + CNN features)

memory (e.g. add RNN, LSTM)

• By introducing new architectures within the Siamese CNN framework, which requires digging deeper into the structure of the networks (e.g. regression network, triplet network).

• By introducing a new loss function in the Siamese network.

Page 40:

Loss function used in face verification

Source: http://vision.ia.ac.cn/zh/senimar/reports/Siamese-Network-Architecture-and-Applications-in-Computer-Vision.pdf

Page 41:

Thank you!