Regression in Deep Learning: Siamese and Triplet Networks
Transcript of "Regression in Deep Learning: Siamese and Triplet Networks" (SIBGRAPI 2017 tutorial; slides at conteudo.icmc.usp.br/pessoas/moacir/p17sibgrapi-tutorial/2017…).
Regression in Deep Learning: Siamese and Triplet Networks

Tu Bui, John Collomosse
Centre for Vision, Speech and Signal Processing (CVSSP), University of Surrey, United Kingdom

Leonardo Ribeiro, Tiago Nazare, Moacir Ponti
Institute of Mathematics and Computer Sciences (ICMC), University of Sao Paulo, Brazil
Content

- The regression problem
- Siamese network and contrastive loss
- Triplet network and triplet loss
- Training tricks
- Regression application: sketch-based image retrieval
- Limitations and future work
Revolution of deep learning in classification

ImageNet ILSVRC winners by year: top-5 error (%); lower is better.

| Year | Winner | Top-5 error (%) |
|------|--------|-----------------|
| 2010 | shallow | 28.19 |
| 2011 | shallow | 25.7 |
| 2012 | AlexNet | 15.3 |
| 2013 | ZFNet | 11.19 |
| 2014 | GoogleNet | 6.7 |
| 2015 | ResNet | 4.86 |
| 2016 | Ensemble | 2.99 |
| 2017 | SENet | 2.25 |

Human error (~6%) is marked for reference.
Classification vs. Regression

Classification
- Discrete set of outputs
- Output: label/class/category

Regression
- "Continuous" valued output
- Output: embedding feature
Regression example: intra-domain learning

- Face identification (Schroff et al. CVPR 2015)
- Tracking (Wang & Gupta ICCV 2015)
Regression example: cross-domain learning

Multi-modality visual search: a separate model maps each modality into a common embedding space, e.g. for the query "duck":
- photo model: AlexNet
- language model: Skip-gram
- 3D model: voxnet
- sketch model: SketchANet
Conventional methods for cross-domain regression

Two-step pipeline, applied to both source and target data:
- Step 1: extract local features (SIFT, HoG, SURF) and aggregate them into global features (BoW, GMM).
- Step 2: map the global features into the embedding space with a learnable transform matrix M.

Problem: this assumes a linear transformation between the two domains.
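The Step-2 mapping above can be sketched in a few NumPy lines; the feature size, embedding size and matrix values below are illustrative assumptions (in practice M is learned):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sizes: a 500-d BoW global feature mapped to a 64-d embedding.
bow_source = rng.random(500)        # global feature of a source-domain item
M = rng.standard_normal((64, 500))  # learnable transform matrix (Step 2)

embedding = M @ bow_source          # a single linear map between domains
print(embedding.shape)              # (64,)
```

Because the whole cross-domain mapping is this one matrix multiply, it can only express a linear relation between the domains, which motivates the end-to-end networks that follow.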
End-to-end regression with deep learning

End-to-end learning with a multi-stream network: source data passes through its own stack (Layer 1 … Layer n) and target data through another (Layer 1 … Layer m), and both streams are trained jointly to map into a shared embedding space.
End-to-end regression with multi-stream networks

Open questions:
- Which network design?
- Which loss function to use?
Using output of a classification model as feature?

- Not intuitive: the objective function (softmax loss on fc6/fc7 features) differs from the embedding objective.
- Cross-domain learning: training a classification network for each domain separately does not guarantee a common embedding.
Content

- The regression problem
- Siamese network and contrastive loss
- Triplet network and triplet loss
- Training tricks
- Regression application: sketch-based image retrieval
- Limitations and future work
Siamese network and contrastive loss

- Siamese (2-branch) network: two streams $f_{W_1}$ and $g_{W_2}$ process the inputs $x_1$ and $x_2$; the loss $\mathcal{L}(a, p)$ compares their outputs.
- Given an input training pair $(x_1, x_2)$:
  - Label: $y = 0$ if $(x_1, x_2)$ is a similar pair, $y = 1$ if dissimilar.
  - Network outputs: $a = f(W_1, x_1)$, $p = g(W_2, x_2)$.
  - Euclidean distance between outputs:
    $D(W_1, W_2, x_1, x_2) = \|a - p\|_2 = \|f(W_1, x_1) - g(W_2, x_2)\|_2$
Siamese network and contrastive loss

- Contrastive loss equation:
  $\mathcal{L}(W_1, W_2, x_1, x_2) = \frac{1}{2}(1 - y)\,D^2 + \frac{1}{2}\,y\,\{\max(0,\, m - D)\}^2$
  with $D = \|a - p\|_2 = \|f(W_1, x_1) - g(W_2, x_2)\|_2$ and $y = 0$ for a similar pair, $y = 1$ for a dissimilar pair.
- The margin $m$ is the desirable distance for a dissimilar pair $(x_1, x_2)$.
- Training: $\arg\min_{W_1, W_2} \mathcal{L}$
Siamese network and contrastive loss

Contrastive loss functions:

- Standard form (Hadsell et al. CVPR 2006):
  $\mathcal{L}(a, p) = \frac{1}{2}(1 - y)\,D^2 + \frac{1}{2}\,y\,\{\max(0,\, m - D)\}^2$
- Alternative form (Chopra et al. CVPR 2005):
  $\mathcal{L}(a, p) = \frac{1}{2}(1 - y)\,D^2 + \frac{1}{2}\,y\,\max(0,\, m - D^2)$
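As a concrete illustration, here is a minimal NumPy sketch of the standard contrastive loss above; the vectors and margin are illustrative values, not data from the tutorial:

```python
import numpy as np

def contrastive_loss(a, p, y, m=1.0):
    """Standard contrastive loss (Hadsell et al. 2006) for one pair.

    a, p : embedding vectors; y : 0 = similar pair, 1 = dissimilar pair;
    m : margin, the desired distance for a dissimilar pair.
    """
    D = np.linalg.norm(a - p)                       # Euclidean distance
    return 0.5 * (1 - y) * D**2 + 0.5 * y * max(0.0, m - D)**2

a = np.array([0.0, 0.0])
p_near = np.array([0.1, 0.0])
p_far = np.array([2.0, 0.0])

# Similar pair: loss = D^2 / 2, which pulls the pair together.
print(contrastive_loss(a, p_near, y=0))   # 0.005
# Dissimilar pair already beyond the margin m = 1: zero loss.
print(contrastive_loss(a, p_far, y=1))    # 0.0
```

Note how the hinge term only penalises dissimilar pairs that are closer than the margin; dissimilar pairs beyond it contribute no gradient.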
Content

- The regression problem
- Siamese network and contrastive loss
- Triplet network and triplet loss
- Training tricks
- Regression application: sketch-based image retrieval
- Limitations and future work
Triplet network and triplet loss

- Triplet (3-branch) network: an anchor branch $f_{W_1}$ and positive/negative branches $g_{W_2}$; the loss $\mathcal{L}(a, p, n)$ compares the three outputs.
  - Given a training triplet $(x_a, x_p, x_n)$: $x_a$ is the anchor; $x_p$ is a positive (similar to $x_a$); $x_n$ is a negative (dissimilar to $x_a$).
  - The positive and negative branches always share weights.
  - The anchor branch can share weights (intra-domain learning) or not (cross-domain learning).
  - Network outputs: $a = f(W_1, x_a)$, $p = g(W_2, x_p)$, $n = g(W_2, x_n)$.
Triplet network and triplet loss

Triplet loss equation:
$\mathcal{L}(a, p, n) = \frac{1}{2}\max(0,\, m + D^2(a, p) - D^2(a, n))$

- Standard form (Schroff et al. CVPR 2015): $D(u, v) = \|u - v\|_2$
- Alternative form (Wang et al. ICCV 2015): $D(u, v) = 1 - \dfrac{u \cdot v}{\|u\|_2\,\|v\|_2}$
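A minimal NumPy sketch of the standard (Euclidean) form of the triplet loss above, with illustrative vectors and margin:

```python
import numpy as np

def triplet_loss(a, p, n, m=1.0):
    """Triplet loss with squared Euclidean distances (standard form)."""
    d_ap = np.sum((a - p)**2)   # D^2(anchor, positive)
    d_an = np.sum((a - n)**2)   # D^2(anchor, negative)
    return 0.5 * max(0.0, m + d_ap - d_an)

a = np.array([0.0, 0.0])
p = np.array([0.5, 0.0])   # close to the anchor
n = np.array([3.0, 0.0])   # far from the anchor

# Negative already further than the positive by more than the margin: zero loss.
print(triplet_loss(a, p, n))   # 0.0
# Roles swapped (negative closer than positive): the loss pushes it away.
print(triplet_loss(a, n, p))   # 0.5 * (1 + 9 - 0.25) = 4.875
```

Unlike the contrastive loss, the triplet loss only constrains the *relative* ordering of distances, not their absolute values.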
Siamese vs. Triplet

(Figure: embeddings of anchor $a$, positive $p$ and negative $n$ before training, after training with contrastive loss, and after training with triplet loss, with margin $m$.)

Contrastive loss:
$\mathcal{L}(a, p) = \frac{1}{2}(1 - y)\,\|a - p\|_2^2 + \frac{1}{2}\,y\,\{\max(0,\, m - \|a - p\|_2)\}^2$

Triplet loss:
$\mathcal{L}(a, p, n) = \frac{1}{2}\max(0,\, m + \|a - p\|_2^2 - \|a - n\|_2^2)$
Siamese or triplet?

Depending on data, training strategies, network design and more:
- Siamese superior: Radenovic et al. ECCV 2016
- Triplet superior: Hoffer & Ailon, SBPR 2015; Bui et al., arXiv 2016
Content

- The regression problem
- Siamese network and contrastive loss
- Triplet network and triplet loss
- Training tricks
- Regression application: sketch-based image retrieval
- Limitations and future work
Training trick #1: solving the gradient collapsing problem

$\mathcal{L} = \frac{1}{2N}\sum_{i=1}^{N}\max(0,\, m + \|a_i - p_i\|_2^2 - \|a_i - n_i\|_2^2)$, with margin $m = 1.0$.

- The gradient collapsing problem: the expected outcome is the positive pulled towards the anchor and the negative pushed beyond the margin, but in reality the network can map $a$, $p$ and $n$ all to the same point, where the loss saturates at $m/2$ and the gradient vanishes.
Training tricks #1

- Solutions for gradient collapsing:
  - Combine regression and classification losses for better regularisation.
  - Change the loss function, scaling the anchor by a factor $k$:
    $\mathcal{L} = \frac{1}{2N}\sum_{i=1}^{N}\max(0,\, m + \|k a_i - p_i\|_2^2 - \|k a_i - n_i\|_2^2)$
- (Figure: loss surfaces $\mathcal{L}(a, p, n)$ for $a \approx p, n$ and $a \neq p, n$; with the modified loss the collapsed solution becomes a saddle point rather than a minimum.)
Training tricks #2: dimensionality reduction

- Conventional methods: redundancy analysis on a fixed set of features, e.g. Principal Component Analysis (PCA), product quantisation, etc.
- Dimensionality reduction in a CNN is part of the training process: a fully connected layer applies 128 conv filters of size 4096×1×1 (plus bias) to the 4096×1×1 FC7 output, giving a 128×1×1 output (FC7: 4096 → out: 128).
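The FC-layer reduction above amounts to a learned linear map; a NumPy sketch with random weights standing in for the learned ones:

```python
import numpy as np

rng = np.random.default_rng(0)

# Shapes follow the slide: FC7 output 4096-d, reduced to 128-d.
fc7 = rng.standard_normal(4096)                 # feature from the FC7 layer
W = rng.standard_normal((128, 4096)) * 0.01     # 128 filters of size 4096x1x1
b = np.zeros(128)                               # bias

out = W @ fc7 + b                               # learned dimensionality reduction
print(out.shape)                                # (128,)
```

Because W and b are trained jointly with the rest of the network, the reduction is optimised for the embedding objective, unlike post-hoc PCA on fixed features.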
Training tricks #3: hard negative mining

- Random pairing: positive and negative samples are selected randomly.
- Hard negative mining: the negative example is the nearest irrelevant neighbour to the anchor.
- Hard positive mining: the positive example is the farthest relevant neighbour to the anchor.

(Figure: for a "duck 3D" anchor, "duck photo" samples are relevant (+), while "cat photo" and "swan photo" samples are irrelevant (-).)
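The mining rules above can be sketched directly; `mine_triplet` and its inputs are illustrative names, not the tutorial's code:

```python
import numpy as np

def mine_triplet(anchor, feats, labels, anchor_label):
    """Pick the hardest positive and hardest negative for one anchor.

    feats: candidate embeddings (N, d); labels: their class ids.
    Hard positive = farthest relevant (same-class) neighbour;
    hard negative = nearest irrelevant (different-class) neighbour.
    """
    d = np.linalg.norm(feats - anchor, axis=1)        # distances to the anchor
    pos = labels == anchor_label
    hard_pos = np.where(pos)[0][np.argmax(d[pos])]    # farthest relevant
    hard_neg = np.where(~pos)[0][np.argmin(d[~pos])]  # nearest irrelevant
    return int(hard_pos), int(hard_neg)

feats = np.array([[0.1, 0.0], [0.9, 0.0], [0.3, 0.0], [5.0, 0.0]])
labels = np.array([0, 0, 1, 1])
anchor = np.array([0.0, 0.0])

print(mine_triplet(anchor, feats, labels, anchor_label=0))  # (1, 2)
```

In practice mining is done per mini-batch or over a candidate pool, since scanning the whole training set per anchor is too expensive.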
Training tricks #4: layer sharing

- Consider sharing layers between the anchor branch and the pos/neg branches: no-share, full-share, or partial-share.
Other training tricks

- Data augmentation: random crop, rotation, scaling, flip, whitening…
- Dropout: randomly disable neurons.
- Regularisation: add the parameter magnitude to the loss:
  $\mathcal{L}_{total}(W, X) = \mathcal{L}_{contrastive,triplet}(W, X) + \|W\|_2$
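A sketch of the regularised total loss above; the weighting factor `lam` is an assumption for illustration (the slide simply adds the weight magnitude):

```python
import numpy as np

def total_loss(base_loss, weights, lam=1e-4):
    """Hypothetical total loss: task loss plus an L2 penalty on the weights.

    base_loss: contrastive or triplet loss value; weights: list of arrays;
    lam: assumed weighting factor for the penalty.
    """
    l2 = sum(np.sum(w**2) for w in weights)   # squared L2 magnitude of all weights
    return base_loss + lam * l2

W1 = np.ones((2, 2))
W2 = np.full((2,), 2.0)
print(total_loss(1.0, [W1, W2], lam=0.1))   # 1.0 + 0.1 * (4 + 8) = 2.2
```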
Content

- The regression problem
- Siamese network and contrastive loss
- Triplet network and triplet loss
- Training tricks
- Regression application: sketch-based image retrieval
- Limitations and future work
Regression application: sketch-based image retrieval (SBIR)

How do you search for a particular image you have in mind?
Text search?
Sketch-based Image Retrieval (SBIR)

(Figure: a query sketch is used to retrieve matching photos.)
Existing applications

- Google Emoji Search
- Detexify: LaTeX symbol search
Challenges

- Free-hand sketch is usually messy. (Example: the Horse category of the Flickr-330 dataset, Hu et al. 2013.)
Challenges

- Various levels of abstraction. (Examples: House and Crocodile sketches from the TU-Berlin dataset, Eitz et al. 2012.)
Challenges

- Domain gap: a sketch does not always describe the real-life object accurately.
  (Examples: caricature, anthropomorphism, simplification, viewpoint; a cat's whiskers, a hedgehog's spines, a "smiling spider?", and the TU-Berlin category "person walking".)
Challenges

- Limited number of sketch datasets:
  - Flickr15K [Hu et al. 2013]: 330 sketches + 15k photos @ 33 classes
  - TU-Berlin [Eitz et al. 2012]: 20k sketches @ 250 classes
  - Sketchy [Sangkloy et al. 2016]: ~75k sketches + 12.5k photos @ 125 classes
  - New Google Quickdraw: 50M sketches @ 345 classes
SBIR evaluation metrics

- Mean Average Precision (mAP)
- Precision-recall (PR) curve
- Kendall rank correlation coefficient

$P(k) = \dfrac{\#\text{ relevant in top } k \text{ results}}{k}$

$AP = \dfrac{\sum_{k=1}^{N} P(k) \times \mathrm{rel}(k)}{\#\text{ relevant images}}$

$mAP = \dfrac{1}{|Q|} \sum_{q \in Q} AP_q$
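The AP and mAP formulas above can be sketched in plain Python; the ranked relevance lists are toy values for illustration:

```python
def average_precision(ranked_rel, n_relevant):
    """AP for one query: ranked_rel[k] is 1 if the (k+1)-th result is relevant.

    n_relevant: total number of relevant images in the database.
    """
    hits, ap = 0, 0.0
    for k, rel in enumerate(ranked_rel, start=1):
        if rel:
            hits += 1
            ap += hits / k          # P(k), accumulated only at relevant ranks
    return ap / n_relevant

# Two toy queries over a database with 2 relevant images each.
ap1 = average_precision([1, 0, 1], n_relevant=2)   # (1/1 + 2/3) / 2
ap2 = average_precision([0, 1, 1], n_relevant=2)   # (1/2 + 2/3) / 2
mAP = (ap1 + ap2) / 2
print(round(mAP, 4))   # 0.7083
```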
Background

- Conventional shallow SBIR framework: photos in the database go through edge extraction to edge maps, then feature extraction into an index file (#1, #2, #3, …, #N); a query sketch is matched against the index.
Background: hand-crafted features

- Structure tensor [Eitz, 2010]: over a window $W$,
  $\frac{1}{|S_W|}\sum_{I \in W}\begin{pmatrix} \left(\frac{\partial I}{\partial x}\right)^2 & \frac{\partial I}{\partial x}\frac{\partial I}{\partial y} \\ \frac{\partial I}{\partial x}\frac{\partial I}{\partial y} & \left(\frac{\partial I}{\partial y}\right)^2 \end{pmatrix}$,
  quantised against a dictionary.

Flickr15K benchmark:

| Method | mAP (%) |
|--------|---------|
| Structure Tensor [Eitz, 2010] | 7.98 |
Background: hand-crafted features

- Shape context [Mori, 2005]

Flickr15K benchmark:

| Method | mAP (%) |
|--------|---------|
| Structure Tensor [Eitz, 2010] | 7.98 |
| Shape Context [Mori, 2005] | 8.14 |
Background: hand-crafted features

- Self-similarity (SSIM) [Shechtman, 2007]: correlation of a local patch with its surrounding region.

Flickr15K benchmark:

| Method | mAP (%) |
|--------|---------|
| Structure Tensor [Eitz, 2010] | 7.98 |
| Shape Context [Mori, 2005] | 8.14 |
| SSIM [Shechtman, 2007] | 9.57 |
Background: hand-crafted features

- SIFT [Lowe, 2004]
- HoG [Dalal, 2005]

Flickr15K benchmark:

| Method | mAP (%) |
|--------|---------|
| Structure Tensor [Eitz, 2010] | 7.98 |
| Shape Context [Mori, 2005] | 8.14 |
| SSIM [Shechtman, 2007] | 9.57 |
| SIFT [Lowe, 2004] | 9.11 |
| HoG [Dalal, 2005] | 10.93 |
Background: hand-crafted features

- GF-HoG [Hu et al. CVIU 2013]
- Color GF-HoG [Bui et al. ICCV 2015]

Flickr15K benchmark:

| Method | mAP (%) |
|--------|---------|
| Structure Tensor [Eitz, 2010] | 7.98 |
| Shape Context [Mori, 2005] | 8.14 |
| SSIM [Shechtman, 2007] | 9.57 |
| SIFT [Lowe, 2004] | 9.11 |
| HoG [Dalal, 2005] | 10.93 |
| GF-HoG [Hu, 2013] | 12.22 |
| Color GF-HoG [Bui, 2015] | 18.20 |
Background: hand-crafted features

- PerceptualEdge [Qi, 2015] (figure: gPb edge map vs. perceptual edges)

Flickr15K benchmark:

| Method | mAP (%) |
|--------|---------|
| Structure Tensor [Eitz, 2010] | 7.98 |
| Shape Context [Mori, 2005] | 8.14 |
| SSIM [Shechtman, 2007] | 9.57 |
| SIFT [Lowe, 2004] | 9.11 |
| HoG [Dalal, 2005] | 10.93 |
| GF-HoG [Hu, 2013] | 12.22 |
| Color GF-HoG [Bui, 2015] | 18.20 |
| PerceptualEdge [Qi, 2015] | 18.37 |
Background: deep features

- Siamese network with contrastive loss (Qi et al. ICIP 2016):
  - Sketch-edgemap input pair
  - Fully shared branches

Flickr15K benchmark:

| Method | mAP (%) |
|--------|---------|
| Structure Tensor [Eitz, 2010] | 7.98 |
| Shape Context [Mori, 2005] | 8.14 |
| SSIM [Shechtman, 2007] | 9.57 |
| SIFT [Lowe, 2004] | 9.11 |
| HoG [Dalal, 2005] | 10.93 |
| GF-HoG [Hu, 2013] | 12.22 |
| Color GF-HoG [Bui, 2015] | 18.20 |
| PerceptualEdge [Qi, 2015] | 18.37 |
| Siamese network [Qi, 2016] | 19.54 |
Triplet network for SBIR

- Sketch-edgemap input.
- CNN architecture: Sketch-A-Net [Yu, 2015] (layers C1-C5, fc6-fc8 in each of the a/p/n branches).
- Output dimension: 100.
- Shared layers: Conv 4-5, FC 6-8.
- Loss (with anchor scaling $k = 2.0$):
  $\mathcal{L} = \frac{1}{2N}\sum_{i=1}^{N}\max(0,\, m + \|k a_i - p_i\|_2^2 - \|k a_i - n_i\|_2^2)$
Training procedure

- Images:
  - 25k photos: 100 photos/class.
  - Edge extraction: gPb [Arbelaez, 2011].
  - Mean subtraction; random crop/rotation/scaling/flip.
- Sketches:
  - 20k sketches: 20 for training and 60 for validation per class.
  - Skeletonisation.
  - Mean subtraction; random crop/rotation/scaling/flip.
  - Random stroke removal.
- Triplet formation: random selection of pos/neg samples.
- Training: 10k epochs; multistep decreasing learning rate from $10^{-2}$ to $10^{-6}$.
Results

Flickr15K benchmark:

| Method | mAP (%) |
|--------|---------|
| Structure Tensor [Eitz, 2010] | 7.98 |
| Shape Context [Mori, 2005] | 8.14 |
| SSIM [Shechtman, 2007] | 9.57 |
| SIFT [Lowe, 2004] | 9.11 |
| HoG [Dalal, 2005] | 10.93 |
| GF-HoG [Hu, 2013] | 12.22 |
| Colour GF-HoG [Bui, 2015] | 18.20 |
| PerceptualEdge [Qi, 2015] | 18.37 |
| Single CNN | 18.76 |
| Siamese network [Qi, 2016] | 19.54 |
| Triplet full-share [Bui, 2016] | 20.29 |
| Triplet no-share [Bui, 2016] | 20.93 |
| Triplet half-share [Bui, 2016] | 24.45 |
Sketch-photo direct matching

- Matching sketches directly against photos with the triplet network above results in a training failure (figure: loss vs. epochs).
Sketch-photo direct matching

- Architecture: AlexNet branches for photos and a SketchANet branch for sketches, merging into shared "hybrid" layers.
- Special layers: dimensionality reduction and normalisation.
- Losses: a triplet loss plus softmax losses, with per-loss weights (×1.0, ×2.0).
Multi-stage training procedure

- Stage 1: train unshared layers
  - Train the sketch branch from scratch (softmax loss).
  - Finetune the image branch from AlexNet (softmax loss).
- Stage 2: train shared layers
  - Form a 2-branch network with the pretrained weights.
  - Freeze the unshared layers.
  - Train the shared layers with contrastive loss + softmax loss.
- Stage 3: regression with triplet loss
  - Form a triplet network.
  - Unfreeze all layers.
  - Train the whole network with triplet loss + softmax loss.
Training results

(Figure: training curves for phase 1 (sketch branch, image branch), phase 2 (Siamese network) and phase 3 (triplet network).)
Results

Flickr15K benchmark:

| Method | mAP (%) |
|--------|---------|
| Structure Tensor [Eitz, 2010] | 7.98 |
| Shape Context [Mori, 2005] | 8.14 |
| SSIM [Shechtman, 2007] | 9.57 |
| SIFT [Lowe, 2004] | 9.11 |
| HoG [Dalal, 2005] | 10.93 |
| GF-HoG [Hu, 2013] | 12.22 |
| Colour GF-HoG [Bui, 2015] | 18.20 |
| PerceptualEdge [Qi, 2015] | 18.37 |
| Single CNN | 18.76 |
| Siamese network [Qi, 2016] | 19.54 |
| Sketch-edgemap triplet [Bui, 2016] | 24.45 |
| Sketch-photo triplet | 31.38 |
Layer visualisation

- SketchANet: 64 filters of size 15×15 in the conv1 layer.
- AlexNet: 96 filters of size 11×11 in the conv1 layer.
SBIR example
Demo: SketchSearch

- Sketch-based Image Retrieval
- Sketch Retrieval
Content

- The regression problem
- Siamese network and contrastive loss
- Triplet network and triplet loss
- Training tricks
- Regression application: sketch-based image retrieval
- Limitations and future work
Limitations

- Hard to train a regression model.
- Needs labelled datasets.
- Real-life sketches can be very complicated (e.g. Guernica by Pablo Picasso, 1937).
Future work

- Multi-domain regression, e.g. 3D, text, photo, sketch, depth-map, cartoon… with photo, 3D, sketch and language models all mapping into one embedding space (Castrejon, 2016; Siddiquie, 2014; Radenovic, 2017).
- Toward unsupervised deep learning:
  - Labelled image set with an unlabelled (or no) sketch set.
  - Completely unsupervised: autoencoders, Generative Adversarial Networks (GANs).
Thank you for listening

https://sites.google.com/view/tubui/research