Semantic Correspondence as an Optimal Transport Problem ......Semantic Correspondence as an Optimal...

3
Semantic Correspondence as an Optimal Transport Problem Supplementary Materials Yanbin Liu 1 , Linchao Zhu 1 , Makoto Yamada 2,3 , Yi Yang 1 1 ReLER, University of Technology Sydney, 2 RIKEN AIP, 3 Kyoto University [email protected], {linchao.zhu,yi.yang}@uts.edu.au, [email protected] 1. Inference runtime analysis Components HPF SCOT Time (1) Feature extraction X X t CNN (2) Correlation map (Eq. 4) X X O(D * h s w s * h t w t ) (3) CAM (Eq. 7) X O(d L * h s w s + d L * h t w t ) (4) Algorithm 1 X O(t max * h s w s * h t w t ) Table 1: Complexity analysis for HPF [2] and the pro- posed SCOT. D denotes the feature dimension, d L de- notes the number of channels in the last convolutional layer, h s ,w s ,h t ,w t denote the height and width of source and tar- get images, and t max is the maximum number of iterations. In this section, we show the complexity comparison be- tween HPF [2] and the proposed SCOT. The four compo- nents of our method and their runtime costs are shown in Table 1. Since d L D × max(h s w s ,h t w t ) and t max D, the extra costs introduced in components (3) and (4) are marginal. We conclude that the runtime cost of our method is only slightly higher than HPF. We compared the runtime speed of the ResNet-101 back- bone 1 on the test split of SPair-71k. The average inference time is 66ms (HPF) vs. 86ms (Ours) using an NVIDIA 2080Ti GPU. 2. Keypoints matching results In this section, we show results of transferring keypoints in the source image to the corresponding points in the target image. As shown in Figure 2, the predicted keypoints are quite near to the ground truth. The proposed algorithm is robust under large intra-class variations, background clutter, view-point changes, and partial occlusion. 1 Input image pairs are resized to max(H, W ) = 300. The selected optimal layers are “2,22,24,25,27,28,29”. Figure 2: Keypoints matching results on SPair-71k. ×denotes the ground truth target keypoints. 1

Transcript of Semantic Correspondence as an Optimal Transport Problem ......Semantic Correspondence as an Optimal...

Page 1: Semantic Correspondence as an Optimal Transport Problem ......Semantic Correspondence as an Optimal Transport Problem Supplementary Materials Yanbin Liu 1, Linchao Zhu , Makoto Yamada2;3,

Semantic Correspondence as an Optimal Transport ProblemSupplementary Materials

Yanbin Liu1, Linchao Zhu1, Makoto Yamada2,3, Yi Yang1

1ReLER, University of Technology Sydney, 2RIKEN AIP, 3Kyoto [email protected], {linchao.zhu,yi.yang}@uts.edu.au, [email protected]

1. Inference runtime analysis

Components HPF SCOT Time(1) Feature extraction X X tCNN(2) Correlation map (Eq. 4) X X O(D ∗ hsws ∗ htwt)(3) CAM (Eq. 7) X O(dL ∗ hsws + dL ∗ htwt)(4) Algorithm 1 X O(tmax ∗ hsws ∗ htwt)

Table 1: Complexity analysis for HPF [2] and the pro-posed SCOT. D denotes the feature dimension, dL de-notes the number of channels in the last convolutional layer,hs, ws, ht, wt denote the height and width of source and tar-get images, and tmax is the maximum number of iterations.

In this section, we show the complexity comparison be-tween HPF [2] and the proposed SCOT. The four compo-nents of our method and their runtime costs are shown inTable 1. Since dL �D×max(hsws, htwt) and tmax � D,the extra costs introduced in components (3) and (4) aremarginal. We conclude that the runtime cost of our methodis only slightly higher than HPF.

We compared the runtime speed of the ResNet-101 back-bone 1 on the test split of SPair-71k. The average inferencetime is 66ms (HPF) vs. 86ms (Ours) using an NVIDIA2080Ti GPU.

2. Keypoints matching results

In this section, we show results of transferring keypointsin the source image to the corresponding points in the targetimage. As shown in Figure 2, the predicted keypoints arequite near to the ground truth. The proposed algorithm isrobust under large intra-class variations, background clutter,view-point changes, and partial occlusion.

1Input image pairs are resized to max(H,W ) = 300. The selectedoptimal layers are “2,22,24,25,27,28,29”.

Figure 2: Keypoints matching results on SPair-71k. “×”denotes the ground truth target keypoints.

1

Page 2: Semantic Correspondence as an Optimal Transport Problem ......Semantic Correspondence as an Optimal Transport Problem Supplementary Materials Yanbin Liu 1, Linchao Zhu , Makoto Yamada2;3,

(a) Source Image (b) Target Image (c) Ours (d) HPF [2] (e) A2Net [1] (f) WeakAlign [3] (g) NC-Net [4]

Figure 1: Qualitative results on SPari-71k. The source images are warped to align with target images using correspondences.For HPF [2], NC-net [4], and our method, we first use the source keypoints and the predicted target keypoitns to estimatethe thin-plate spline (TPS) parameters, then apply TPS transformation on the source image. For A2Net [2] and WeakAlign[3], they are global alignment methods that directly predict the global transformation parameters from the CNN models. Weshow image pairs with large intra-class, scale, and view-point changes. Our method performs better in complex conditionsdue to our global matching and background suppressing strategies.

3. More qualitative results

In this section, we show more qualitative results in Fig-ure 1. We warp the source images to align with the corre-sponding target images. For HPF [2], NC-net [4], and ourmethod, we first use the source keypoints and the predictedtarget keypoints to estimate the thin-plate spline (TPS) pa-rameters, then apply TPS transformation on the source im-age. For A2Net [2] and WeakAlign [3], they are global

alignment methods that directly predict the global trans-formation parameters from the CNN models. We showthe results of image pairs with large intra-class, scale, andview-point changes. Our method performs better in com-plex conditions due to our global matching and backgroundsuppressing strategies.

Page 3: Semantic Correspondence as an Optimal Transport Problem ......Semantic Correspondence as an Optimal Transport Problem Supplementary Materials Yanbin Liu 1, Linchao Zhu , Makoto Yamada2;3,

References[1] Paul Hongsuck Seo, Jongmin Lee, Deunsol Jung, Bo-

hyung Han, and Minsu Cho. Attentive semantic align-ment with offset-aware correlation kernels. In ECCV,2018. 2

[2] Juhong Min, Jongmin Lee, Jean Ponce, and Minsu Cho.Hyperpixel flow: Semantic correspondence with multi-layer neural features. In ICCV, 2019. 1, 2

[3] Ignacio Rocco, Relja Arandjelovic, and Josef Sivic.End-to-end weakly-supervised semantic alignment. InCVPR, 2018. 2

[4] Ignacio Rocco, Mircea Cimpoi, Relja Arandjelovic,Akihiko Torii, Tomas Pajdla, and Josef Sivic. Neigh-bourhood consensus networks. In NeurIPS, 2018. 2