Large-Scale Image Retrieval with Attentive Deep Local Features
Transcript of Large-Scale Image Retrieval with Attentive Deep Local Features
![Page 1: Large-Scale Image Retrieval with Attentive Deep Local Features](https://reader030.fdocuments.in/reader030/viewer/2022012018/61686b96d394e9041f6f77bf/html5/thumbnails/1.jpg)
Large-Scale Image Retrieval with Attentive Deep Local Features
2021.04.27Sebin Lee
ICCV 2017
![Page 2: Large-Scale Image Retrieval with Attentive Deep Local Features](https://reader030.fdocuments.in/reader030/viewer/2022012018/61686b96d394e9041f6f77bf/html5/thumbnails/2.jpg)
Contents
2
• Background & Motivation
• Our Approach
• Results
• Summary
• Quiz
![Page 3: Large-Scale Image Retrieval with Attentive Deep Local Features](https://reader030.fdocuments.in/reader030/viewer/2022012018/61686b96d394e9041f6f77bf/html5/thumbnails/3.jpg)
Recap: Keypoint
3picture from Sung-eui Yoon slide, originally made by Kristen Grauman and David Lowe
• Important point of image (e.g., corner)• The keypoint itself is not useful for image retrieval.• e.g., Harris corner detector, Local maxima of DoG
Background & Motivation
< Harris corner detector >
Scale
< Local maxima of DoG >
![Page 4: Large-Scale Image Retrieval with Attentive Deep Local Features](https://reader030.fdocuments.in/reader030/viewer/2022012018/61686b96d394e9041f6f77bf/html5/thumbnails/4.jpg)
Recap: Local Descriptor
4
• The local descriptor is used because the keypoint itself is not useful.• The local descriptor means a compact representation of the image
patch centered on the extracted keypoint.• e.g., SIFT
< Gradients of image patch > < SIFT descriptor >
Background & Motivation
![Page 5: Large-Scale Image Retrieval with Attentive Deep Local Features](https://reader030.fdocuments.in/reader030/viewer/2022012018/61686b96d394e9041f6f77bf/html5/thumbnails/5.jpg)
Classical Keypoint & Local Descriptor Properties
5picture from Sung-eui Yoon slide, originally made by Denis Simakov
• Repeatable• Distinctive• Invariant• Adequate Quantity
How about deep feature?
Sparse keypoint & descriptor
< Keypoints of Image>
Background & Motivation
![Page 6: Large-Scale Image Retrieval with Attentive Deep Local Features](https://reader030.fdocuments.in/reader030/viewer/2022012018/61686b96d394e9041f6f77bf/html5/thumbnails/6.jpg)
Deep Feature
6
• CNNs generate uniformly dense grid feature map.• We can regard the dense grid feature map as a grid of local descriptors.
localdescriptor
Background & Motivation
RGB Image(3, H, W)
Dense grid feature(C, H', W')
CNN layers
receptive field of local descriptor= local image patch
= Densely extracted local descriptors
![Page 7: Large-Scale Image Retrieval with Attentive Deep Local Features](https://reader030.fdocuments.in/reader030/viewer/2022012018/61686b96d394e9041f6f77bf/html5/thumbnails/7.jpg)
Motivation
7picture from Sung-eui Yoon slide, originally made by Denis Simakov
• Unlike classical methods, the CNNs generate uniformly dense grid local descriptors.• Therefore, local descriptors are extracted from regions that have no value as keypoints.
(e.g., texture-less region)• Many local descriptors containing unnecessary ones hinder search and making codebook.
Background & Motivation
Query DB
< Query feature contains unnecessary local descriptors (people) >
∴We needs keypoint selection that selects only helpful keypointsfor efficient and accurate image retrieval.
![Page 8: Large-Scale Image Retrieval with Attentive Deep Local Features](https://reader030.fdocuments.in/reader030/viewer/2022012018/61686b96d394e9041f6f77bf/html5/thumbnails/8.jpg)
Contents
8
• Background & Motivation
• Our Approach
• Results
• Summary
• Quiz
![Page 9: Large-Scale Image Retrieval with Attentive Deep Local Features](https://reader030.fdocuments.in/reader030/viewer/2022012018/61686b96d394e9041f6f77bf/html5/thumbnails/9.jpg)
Goal
9
• Train a local descriptor network using keypoint selection for efficient and accurate image retrieval.
Our Approach
Local descriptor network
< Overall Pipeline of Image Retrieval >
Our goal
![Page 10: Large-Scale Image Retrieval with Attentive Deep Local Features](https://reader030.fdocuments.in/reader030/viewer/2022012018/61686b96d394e9041f6f77bf/html5/thumbnails/10.jpg)
Goal
10
Our Approach
Local descriptor network
< Overall Pipeline of Image Retreival >
Our goal
• Train a local descriptor network using keypoint selection for efficient and accurate image retrieval.
![Page 11: Large-Scale Image Retrieval with Attentive Deep Local Features](https://reader030.fdocuments.in/reader030/viewer/2022012018/61686b96d394e9041f6f77bf/html5/thumbnails/11.jpg)
Goal
11
Our Approach
Local descriptor network
< Overall Pipeline of Image Retreival >
Our goal
• Train a local descriptor network using keypoint selection for efficient and accurate image retrieval.
![Page 12: Large-Scale Image Retrieval with Attentive Deep Local Features](https://reader030.fdocuments.in/reader030/viewer/2022012018/61686b96d394e9041f6f77bf/html5/thumbnails/12.jpg)
Additionallayers only for
training
Train Local Descriptor Network by using Classification task
12
Our Approach
Input image
Grid local descriptors
LabelC.ELoss
• Backbone: ResNet50 trained on ImageNet• Fine tune the local descriptor network on Google Landmarks dataset.
(Image Retrieval dataset)
• Use only cross-entropy loss (loss of classification task)• However, this method doesn’t perform keypoint selection.
Local Descriptor Network
![Page 13: Large-Scale Image Retrieval with Attentive Deep Local Features](https://reader030.fdocuments.in/reader030/viewer/2022012018/61686b96d394e9041f6f77bf/html5/thumbnails/13.jpg)
Training for Keypoint Selection
13
Our Approach
Input image
Grid local descriptors
Attention score (channel: 1)
Sparse local descriptors by keypoint selection
• Keypoint selection can be performed by attention.• Attention module is used to calculate attention score.• Attention score is close to 0: no keypoint• Attention score is close to 1: keypoint
LabelC.ELoss
Local Descriptor Network
∴We can train the local descriptor network performing keypoint selection by end-to-end manner.
Additionallayers only for
training
![Page 14: Large-Scale Image Retrieval with Attentive Deep Local Features](https://reader030.fdocuments.in/reader030/viewer/2022012018/61686b96d394e9041f6f77bf/html5/thumbnails/14.jpg)
Comparison with classical local descriptor
14
• Classical Local descriptor① keypoint selection② local descriptor extraction
• Ours① local descriptor extraction② keypoint selection
• The order of process is different, but the results are similar.
Our Approach
![Page 15: Large-Scale Image Retrieval with Attentive Deep Local Features](https://reader030.fdocuments.in/reader030/viewer/2022012018/61686b96d394e9041f6f77bf/html5/thumbnails/15.jpg)
Image retrieval pipeline with ours
15
• Overall image retrieval pipeline with our local descriptor network
Our Approach
Local descriptor network
Local descriptor network
Grid local descriptors
Sparse local descriptors by keypoint selection
![Page 16: Large-Scale Image Retrieval with Attentive Deep Local Features](https://reader030.fdocuments.in/reader030/viewer/2022012018/61686b96d394e9041f6f77bf/html5/thumbnails/16.jpg)
Contents
16
• Background & Motivation
• Our Approach
• Results
• Summary
• Quiz
![Page 17: Large-Scale Image Retrieval with Attentive Deep Local Features](https://reader030.fdocuments.in/reader030/viewer/2022012018/61686b96d394e9041f6f77bf/html5/thumbnails/17.jpg)
Precision & Recall Result
17
Results
• DELF: Backbone trained on ImageNet• FT: Fine Tune with Google Landmarks• ATT: Use Attention for keypoint selection• QE: Use Query Expansion
Absolute Recall(#true positives w.r.t all queries)
Prec
isio
n
PR curve @ Google Landmarks dataset
Our full model
Our Backbone
![Page 18: Large-Scale Image Retrieval with Attentive Deep Local Features](https://reader030.fdocuments.in/reader030/viewer/2022012018/61686b96d394e9041f6f77bf/html5/thumbnails/18.jpg)
mAP Result
18
Results
Mean average precision: mAP(%)
• DELF: Backbone trained on ImageNet• FT: Fine Tune with Google Landmarks• ATT: Use Attention• QE: Use Query Expansion• DIR: Use global descriptor
![Page 19: Large-Scale Image Retrieval with Attentive Deep Local Features](https://reader030.fdocuments.in/reader030/viewer/2022012018/61686b96d394e9041f6f77bf/html5/thumbnails/19.jpg)
Keypoint Selection Result(Attention Result)
19
Results
DELF DELF+FT DELF+FT+ATT
DELF DELF+FT+ATT
DELF+FT
• Keypoint selection effectively disregards clutter.• Attention activates on important pixels for image retrieval.
DELF: Backbone trained on ImageNetFT: Fine Tune with Google LandmarksATT: Use Attention
![Page 20: Large-Scale Image Retrieval with Attentive Deep Local Features](https://reader030.fdocuments.in/reader030/viewer/2022012018/61686b96d394e9041f6f77bf/html5/thumbnails/20.jpg)
Contents
20
• Background & Motivation
• Our Approach
• Results
• Summary
• Quiz
![Page 21: Large-Scale Image Retrieval with Attentive Deep Local Features](https://reader030.fdocuments.in/reader030/viewer/2022012018/61686b96d394e9041f6f77bf/html5/thumbnails/21.jpg)
Contributions
21
Summary
• Keypoint selection using the attention module• Unnecessary region descriptors are suppressed. (e.g., texture-less region) • Only sparse local descriptors that are useful for image retrieval are extracted.∴Search efficiency ↑, Accuracy ↑
< Result of Keypoint Selection>
![Page 22: Large-Scale Image Retrieval with Attentive Deep Local Features](https://reader030.fdocuments.in/reader030/viewer/2022012018/61686b96d394e9041f6f77bf/html5/thumbnails/22.jpg)
Strengths & Weaknesses
22
• Strengths• Proposed method can extract both local descriptors and keypoints via one forward pass.• Efficient and accurate image retrieval can be performed by keypoint selection using attention.
• Weaknesses• This paper trains the descriptor network using only classification task.
(Do not use other metric learning methods)• The image label is required to train the network.
Summary
![Page 23: Large-Scale Image Retrieval with Attentive Deep Local Features](https://reader030.fdocuments.in/reader030/viewer/2022012018/61686b96d394e9041f6f77bf/html5/thumbnails/23.jpg)
Contents
23
• Background
• Related work
• Our Approach
• Results
• Quiz
![Page 24: Large-Scale Image Retrieval with Attentive Deep Local Features](https://reader030.fdocuments.in/reader030/viewer/2022012018/61686b96d394e9041f6f77bf/html5/thumbnails/24.jpg)
Quiz
24
• Please submit this google form.
Link will be posted in the regular zoom meeting session.
Quiz
![Page 25: Large-Scale Image Retrieval with Attentive Deep Local Features](https://reader030.fdocuments.in/reader030/viewer/2022012018/61686b96d394e9041f6f77bf/html5/thumbnails/25.jpg)
25
THANK YOU