Lec07 aggregation-and-retrieval-system
Image Analysis & Retrieval
CS/EE 5590 Special Topics (Class Ids: 44873, 44874)
Fall 2016, M/W 4-5:15pm@Bloch 0012
Lec 07
Feature Aggregation and Image Retrieval System
Zhu Li
Dept of CSEE, UMKC
Office: FH560E, Email: [email protected], Ph: x 2346.
http://l.web.umkc.edu/lizhu
p.1Image Analysis & Retrieval, 2016
Outline
ReCap of Lecture 06 SIFT
Box Filter
Image Retrieval System
Why Aggregation ?
Aggregation Schemes
Summary
Scale Space Theory - Lindeberg
Scale space response via Laplacian of Gaussian (LoG); the scale is controlled by 𝜎:
$\nabla^2 g = \frac{\partial^2 g}{\partial x^2} + \frac{\partial^2 g}{\partial y^2}$, with $g = e^{-\frac{x^2+y^2}{2\sigma^2}}$
Characteristic scale:
[Figure: LoG responses of an image structure of radius r at 𝜎 = 0.8r, 𝜎 = 1.2r, 𝜎 = 2r, …; the peak of the response marks the characteristic scale]
SIFT
• Use DoG to approximate LoG
• Separable Gaussian filter
• Difference of images instead of difference of Gaussian kernels
• Scale space construction by Gaussian filtering and image differencing
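The construction above can be sketched in a few lines of Matlab. This is a minimal illustration, not the lecture's code: the synthetic image and the values of sigma0, k and nScales are assumptions chosen for the example. It blurs with a separable 1-D Gaussian and then takes differences of the blurred images.

```matlab
% Sketch: DoG scale space by separable Gaussian filtering + image differences.
I = zeros(64); I(28:36, 28:36) = 1;       % synthetic test image: a bright square
sigma0 = 1.6; k = sqrt(2); nScales = 5;   % assumed values, not from the slides
L = zeros([size(I), nScales]);            % stack of Gaussian-blurred images
for s = 1:nScales
    sigma = sigma0 * k^(s-1);
    r = ceil(3*sigma);                    % kernel half-width
    x = -r:r;
    h = exp(-x.^2 / (2*sigma^2));
    h = h / sum(h);                       % normalized 1-D Gaussian
    L(:,:,s) = conv2(h, h, I, 'same');    % separable: filter columns, then rows
end
DoG = L(:,:,2:end) - L(:,:,1:end-1);      % difference of images, not of kernels
```

Filtering with two 1-D kernels costs O(r) per pixel instead of O(r^2) for the full 2-D kernel, which is the reason for the separable form.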
Peak Strength & Edge Removal
• Peak strength: interpolate the true DoG response and pixel location by Taylor expansion
• Edge removal: re-do Harris-type detection on the much reduced pixel set to remove edge responses
Rotation Invariance through Dominant Orientation Coding
Voting for the dominant orientation, weighted by a Gaussian window to give more emphasis to the gradients closer to the center
SIFT Matching and Repeatability Prediction
SIFT distance: $\frac{d(s_1^1, s_{k^*}^2)}{d(s_1^1, s_k^2)} \le \theta$
Not all SIFT are created equal…
• Peak strength (DoG response at interpolated position)
• Combined scale/peak strength pmf
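A small sketch of the distance-ratio criterion follows. It assumes that k* indexes the nearest descriptor in image 2 and k the second nearest; the slide does not spell this out. The descriptors and the threshold theta are toy values.

```matlab
% Sketch: distance-ratio test for one query descriptor against a second image.
q  = [1 0 0 0];                           % toy query descriptor (real SIFT is 128-d)
S2 = [0.9 0.1 0 0;                        % toy descriptors of image 2, one per row
      0   1   0 0;
      0   0   1 0];
theta = 0.8;                              % assumed ratio threshold
d = sqrt(sum(bsxfun(@minus, S2, q).^2, 2));  % Euclidean distance to every row
[ds, order] = sort(d);                    % ds(1): nearest, ds(2): second nearest
isMatch  = ds(1) / ds(2) <= theta;        % accept only distinctive matches
matchIdx = order(1);                      % index of the matched descriptor
```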
Box Filter – CABOX work
Basic idea: approximate DoG with a linear combination of box filters
$\min_{\mathbf{h}} \|\mathbf{g} - B\mathbf{h}\|_{L_2}^2 + \|\mathbf{h}\|_{L_1}$
Solution by LASSO
[Figure: DoG kernel = h1 × box filter 1 + h2 × box filter 2 + …]
Image Matching/Retrieval System
SIFT is a sub-image level feature; what we actually care about is how SIFT matching translates into image-level matching/retrieval accuracy.
Say we can compute a single distance from a collection of features:
$d(I_1, I_2) = \sum_k \alpha_k \, d(F_k^1, F_k^2)$
Then for a database of n images, we can compute an n x n distance matrix:
$D_{j,k} = d(I_j, I_k)$
This gives us full information on the performance of this feature/distance system.
How do we characterize the performance of such an image matching and retrieval system ?
Thresholding for Matching
Basically, for any pair of images (documents, in IR jargon), we declare:
$I_j, I_k$ are a match if $d(I_j, I_k) < t$, and not a match otherwise.
Then for each possible image pair, or the pairs we care about, for a given threshold t there are 4 possible outcomes:
• TP pair: {Ij, Ik} is a true matching pair and is declared matching, d(Ij, Ik) < t;
• FP pair: {Ij, Ik} is a true non-matching pair but is declared matching, d(Ij, Ik) < t;
• TN pair: {Ij, Ik} is a true non-matching pair and is declared non-matching, d(Ij, Ik) >= t;
• FN pair: {Ij, Ik} is a true matching pair but is declared non-matching, d(Ij, Ik) >= t.
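As an illustration, the four outcomes can be counted from a distance matrix at one threshold. The distance matrix, the ground-truth labels (`label`, a hypothetical object id per image) and the threshold below are toy values made up for this sketch.

```matlab
% Sketch: count TP/FP/TN/FN pairs at one threshold t.
D = [0   0.2 0.9 0.8;                     % toy symmetric distance matrix D(j,k)
     0.2 0   0.7 0.6;
     0.9 0.7 0   0.3;
     0.8 0.6 0.3 0];
label = [1 1 2 2];                        % hypothetical ground truth: object id per image
t = 0.65;
[J, K] = find(triu(ones(size(D)), 1));    % each unordered pair once, j < k
d = D(sub2ind(size(D), J, K));
same = label(J) == label(K);              % true matching pairs
same = same(:);  d = d(:);
tp = nnz( same & d <  t);                 % matching, declared matching
fp = nnz(~same & d <  t);                 % non-matching, declared matching
tn = nnz(~same & d >= t);                 % non-matching, declared non-matching
fn = nnz( same & d >= t);                 % matching, declared non-matching
```

With these toy values the pair (2,4) has distance 0.6 < t although the two images show different objects, so it is the single false positive.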
Matching System Performance
True Positive Rate (Recall): out of all true matching pairs, how many are retrieved as matching pairs, i.e. have distance < t
$TPR = \frac{tp}{tp + fn}$
False Positive Rate: out of all true non-matching pairs, how many are retrieved as matching pairs, i.e. are false matches
$FPR = \frac{fp}{fp + tn}$
TPR-FPR
Definition, from the actual value point of view (each rate is normalized by the number of actual positives or actual negatives):
TP rate = TP/(TP+FN)
FP rate = FP/(FP+TN)
ROC curve (1)
ROC = receiver operating characteristic
Y: TP rate
X: FP rate
ROC curve (2)
Which method (A or B) is better? Compute the ROC area: the area under the ROC curve.
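A quick sketch of the comparison: given operating points sorted by FP rate, the area under each curve can be approximated with the trapezoidal rule. The operating points for methods A and B are invented for illustration.

```matlab
% Sketch: compare two methods by area under the ROC curve (trapezoidal rule).
fpr  = [0 0.1 0.4 1];                     % toy operating points, sorted by FP rate
tprA = [0 0.6 0.9 1];                     % method A
tprB = [0 0.3 0.7 1];                     % method B
aucA = trapz(fpr, tprA);
aucB = trapz(fpr, tprB);
aIsBetter = aucA > aucB;                  % larger area = better overall trade-off
```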
Precision, Recall, F-measure
Precision = TP/(TP + FP)
Recall = TP/(TP + FN)
F-measure = 2*(precision*recall)/(precision + recall)
Precision: the probability that a retrieved document is relevant.
Recall: the probability that a relevant document is retrieved in a search.
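A worked example of the three definitions, with toy counts chosen for this sketch:

```matlab
% Sketch: precision, recall and F-measure from toy counts.
tp = 30; fp = 10; fn = 20;
precision = tp / (tp + fp);               % 30/40
recall    = tp / (tp + fn);               % 30/50
f = 2 * precision * recall / (precision + recall);   % harmonic mean of the two
```

The F-measure is a harmonic mean, so it stays close to the smaller of precision and recall.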
Matlab Implementation
We will compute all image pair distances D(j,k).
How do we compute the TPR-FPR plot ? Note that TPR and FPR are both functions of the threshold t.
We just need to parameterize TPR(t) and FPR(t) and evaluate them at a set of meaningful thresholds (operating points) to generate the plot.
Matlab implementation: [tp, fp, tn, fn]=getPrecisionRecall()
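% d0: distances of true matching pairs, d1: distances of non-matching pairs
% (inferred from how tp/fp are counted below); npt: number of thresholds
d_min = min(min(d0), min(d1));
d_max = max(max(d0), max(d1));
delta = (d_max - d_min) / npt;
tp = zeros(1, npt); fp = zeros(1, npt);
tn = zeros(1, npt); fn = zeros(1, npt);
for k=1:npt
    thres = d_min + (k-1)*delta;
    tp(k) = nnz(d0 <= thres);   % matching pairs declared matching
    fp(k) = nnz(d1 <= thres);   % non-matching pairs declared matching
    tn(k) = nnz(d1 >  thres);   % non-matching pairs declared non-matching
    fn(k) = nnz(d0 >  thres);   % matching pairs declared non-matching
end
if dbg
    figure(22); grid on; hold on;
    plot(fp./(tn+fp), tp./(tp+fn), '.-r', 'DisplayName', 'tpr-fpr');
    legend('show');
end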
TPR-FPR
Image matching performance is characterized by the function TPR(FPR).
Retrieval set: we want high precision. Short list: we want high recall.
Why Aggregation ?
What do (local) interest point features bring us ?
• Scale and rotation invariance, in the form of an nk x d matrix:
$S_k \,|\, [x_k, y_k, \theta_k, \sigma_k, h_1, h_2, \ldots, h_{128}],\ k = 1..n$
• Uncertainty in the number of detected features nk at query time
• Permutations of the feature rows are the same representation.
Problems:
• The feature has state, so we are not able to draw decision boundaries
• Not directly indexable/hashable
• Typically very high dimensionality
Decision Boundary in Matching
Can we have a decision boundary function for interesting points based representation ?
Curse of Dimensionality in Retrieval
What do feature dimensions do to retrieval efficiency ? Consider a retrieval that covers 99% of the range in each dimension, and plot the total volume covered against the number of dimensions.
Matlab: showDimensionCurse.m
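The script showDimensionCurse.m is not reproduced in the slides. The sketch below is one possible reading of the experiment, under the assumption that "99% per dimension" means covering 99% of the range along every axis of a unit hypercube, so that the covered volume is 0.99^d.

```matlab
% Sketch: fraction of a unit hypercube covered when 99% of each axis is covered.
d = [1 2 10 100 128 1000];                % feature dimensions; 128 = SIFT descriptor
vol = 0.99 .^ d;                          % covered volume shrinks exponentially in d
for i = 1:numel(d)
    fprintf('d = %4d  volume covered = %.4f\n', d(i), vol(i));
end
```

Even at d = 128 the covered volume drops to well under a third of the space, although almost all of every single dimension is covered.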
Aggregation – 30,000ft view
• Bag of Words
– Compute k centroids in feature space, called visual words
– Compute histogram
– k x 1 feature, hard assignment
• VLAD
– Compute centroids in feature space
– Compute aggregated difference w.r.t. the centroids
– k x d feature, soft assignment
• Fisher Vector
– Compute a Gaussian Mixture Model (GMM) with 2nd order info
– Compute the aggregated feature w.r.t. the mean and covariance of GMM
– 2 x k x d feature
• AKULA
– Adaptive centroids and feature count
– Improved with covariance ?
[Figure: values 0.5, 0.4, 0.05, 0.05]
Visual Key Words: main idea
Extract some local features from a number of images …
e.g., SIFT descriptor space: each point is 128-dimensional
Slide credit: D. Nister
Visual Key Words
Each point is a local descriptor, e.g. SIFT vector.
Slide credit: D. Nister
Visual words
Example: each group of patches belongs to the same visual word
Figure from Sivic & Zisserman, ICCV 2003
Visual words
• More recently used for describing scenes and objects for the sake of indexing or classification.
Sivic & Zisserman 2003; Csurka, Bray, Dance, & Fan 2004; many others.
Source credit: K. Grauman, B. Leibe
Bag of Words
[Figure: object → bag of ‘words’. ICCV 2005 short course, L. Fei-Fei]
BoW Examples
Illustration
Bags of visual words
• Summarize the entire image based on its distribution (histogram) of word occurrences.
• Analogous to the bag of words representation commonly used for documents.
Image credit: Fei-Fei Li
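A minimal sketch of the histogram computation, assuming the visual words (centroids) have already been learned, e.g. by k-means. The 2-d centroids and descriptors are toy values; real SIFT descriptors are 128-dimensional.

```matlab
% Sketch: bag-of-words histogram by hard assignment to the nearest visual word.
C = [0 0; 10 0; 0 10];                    % k = 3 toy visual words (centroids)
X = [1 1; 9 1; 0 9; 1 0; 10 1];           % n = 5 toy local descriptors
k = size(C, 1);
D2 = zeros(size(X, 1), k);                % squared distances, n x k
for j = 1:k
    D2(:, j) = sum(bsxfun(@minus, X, C(j,:)).^2, 2);
end
[~, word] = min(D2, [], 2);               % hard assignment: nearest visual word
h = accumarray(word, 1, [k 1]);           % k x 1 histogram of word counts
h = h / sum(h);                           % normalize away the descriptor count
```

The result has a fixed length k regardless of how many descriptors the image produced, which is what makes it usable as an image-level feature.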
Texture Retrieval
Textons…
[Figure: universal texton dictionary and the resulting texton histogram]
Source: Lana Lazebnik
BoW Distance Metrics
Rank images by the normalized scalar product between their (possibly weighted) occurrence counts, i.e. a nearest neighbor search for similar images.
[Figure: example occurrence-count vectors [5 1 1 0] and [1 8 1 4], labeled d_j and q]
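The normalized scalar product can be sketched as follows. The first two database rows reuse the count vectors from the figure; which one is d_j and which is q is an assumption here, and the third row is an extra toy vector added so that there is something to rank.

```matlab
% Sketch: rank database images by normalized scalar product with the query.
DB = [5 1 1 0;                            % count vectors, one database image per row
      1 8 1 4;
      2 7 0 5];                           % extra toy image, not from the slides
q  = [1 8 1 4];                           % query count vector
DBn = bsxfun(@rdivide, DB, sqrt(sum(DB.^2, 2)));   % L2-normalize every row
qn  = q / norm(q);
s = DBn * qn';                            % cosine similarity to every image
[~, order] = sort(s, 'descend');          % best match first
```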
Inverted List
Image Retrieval via Inverted List
[Figure: inverted list mapping each visual word number to a list of image numbers. Image credit: A. Zisserman]
When will this give us a significant gain in efficiency?
Indexing local features: inverted file index
• For text documents, an efficient way to find all pages on which a word occurs is to use an index…
• We want to find all images in which a feature occurs.
• We need to index each feature by the images it appears in, and we also keep the number of occurrences.
Source credit: K. Grauman, B. Leibe
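One way to sketch such an index in Matlab is a cell array with one entry per visual word, each holding [image id, count] rows. The name `invList` and the count matrix H are hypothetical, made up for this example.

```matlab
% Sketch: inverted file index over toy BoW counts.
H = [2 0 1 0;                             % rows = images, columns = visual words
     0 3 0 0;
     1 0 0 4];
nWords = size(H, 2);
invList = cell(nWords, 1);                % invList{w}: [image id, count] rows
for w = 1:nWords
    imgs = find(H(:, w) > 0);
    invList{w} = [imgs, H(imgs, w)];
end
qWords = [1 4];                           % visual words present in the query
cand = zeros(0, 1);
for w = qWords
    cand = union(cand(:), invList{w}(:, 1));   % only images sharing a word
end
```

Image 2 is never touched for this query, since it shares no word with it. The gain is large when the histograms are sparse, i.e. when each word occurs in only a few images.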
TF-IDF Weighting
Term Frequency – Inverse Document Frequency: describe an image by the frequency of each visual word within it, and down-weight words that appear often in the database (standard weighting for text retrieval).
$t_i = \frac{n_{id}}{n_d} \log \frac{N}{n_i}$
• $n_{id}$: number of occurrences of word i in document d
• $n_d$: number of words in document d
• $n_i$: number of occurrences of word i in whole database
• $N$: total number of words in database
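A sketch of the weighting on toy counts, with the quantities defined exactly as labeled on the slide. Note that N is taken here as the total word count in the database; text-retrieval formulations often use the number of documents instead, so treat this choice as an assumption.

```matlab
% Sketch: TF-IDF weights from a toy count matrix.
H = [2 0 1 1;                             % n_id: rows = documents d, columns = words i
     0 3 0 1;
     1 0 0 3];
nd = sum(H, 2);                           % n_d: number of words in document d
ni = sum(H, 1);                           % n_i: occurrences of word i in the database
N  = sum(H(:));                           % N: total count, as labeled on the slide
tf  = bsxfun(@rdivide, H, nd);            % term frequency per document
idf = log(N ./ ni);                       % frequent words get a small weight
T = bsxfun(@times, tf, idf);              % TF-IDF vector of each document, per row
```

Word 4 occurs most often across the database, so it receives the smallest idf and contributes least to the distance.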
BoW Use Case with Spatial Localization
Collecting words within a query region
Query region: pull out only the SIFT descriptors whose positions are within the polygon
BoW Patch Search
Localizing the BoW representation
Localization with BoW
Hierarchical Assignment of Histogram
Tree construction:
[Nister & Stewenius, CVPR’06]
Vocabulary Tree
Training: Filling the tree
Slide credit: David Nister
[Nister & Stewenius, CVPR’06]
Vocabulary Tree
Recognition
RANSAC verification
Slide credit: David Nister
[Nister & Stewenius, CVPR’06]
Vocabulary Tree: Performance
• Evaluated on large databases
– Indexing with up to 1M images
• Online recognition for a database of 50,000 CD covers
– Retrieval in ~1s
• Find experimentally that large vocabularies can be beneficial for recognition
[Nister & Stewenius, CVPR’06]
Larger vocabularies can be advantageous… But what happens if it is too large?
Visual Word Vocabulary Size
Performance w.r.t. vocabulary size
Bags of words: pros and cons
Good:
+ flexible to geometry / deformations / viewpoint
+ compact summary of image content
+ provides vector representation for sets
+ inverted list implementation offers a practical solution for large repositories
Bad:
- loss of information at quantization and histogram generation
- basic model ignores geometry – must verify afterwards, or encode via features
- background and foreground mixed when bag covers whole image
- interest points or sampling: no guarantee to capture object-level parts
Source credit: K. Grauman, B. Leibe
Can we improve BoW ?
• E.g. Why isn’t our Bag of Words classifier at 90% instead of 70%?
• Training Data
– Huge issue, but not necessarily a variable you can manipulate.
• Learning method
– BoW is on top of any feature scheme
• Representation
– Are we losing too much info in the process ?
Standard Kmeans Bag of Words
BoW revisited
http://www.cs.utexas.edu/~grauman/courses/fall2009/papers/bag_of_visual_words.pdf
Motivation
Bag of Visual Words is only about counting the number of local descriptors assigned to each Voronoi region
Why not include other statistics/information ?
http://www.cs.utexas.edu/~grauman/courses/fall2009/papers/bag_of_visual_words.pdf
Spatial Pooling
We already looked at the Spatial Pyramid/Pooling
level 0: 1x1, level 1: 2x2, level 2: 4x4
Key takeaway: Multiple assignment ? Soft assignment ?
Motivation
Bag of Visual Words is only about counting the number of local descriptors assigned to each Voronoi region
Why not include other statistics? For instance:
• mean of local descriptors
• (co)variance of local descriptors
http://www.cs.utexas.edu/~grauman/courses/fall2009/papers/bag_of_visual_words.pdf
Simple case: Soft Assignment
Called “Kernel codebook encoding” by Chatfield et al. 2011. Cast a weighted vote into the most similar clusters.
This is fast and easy to implement (try it for Project 3!) but it does have some downsides for image retrieval: the inverted file index becomes less sparse.
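A minimal sketch of soft assignment, assuming a Gaussian kernel on the descriptor-to-centroid distance with a hand-picked width sigma. For simplicity it votes into all clusters rather than only the most similar ones; the centroids and descriptors are toy values.

```matlab
% Sketch: soft-assignment histogram with Gaussian kernel weights.
C = [0 0; 10 0; 0 10];                    % k = 3 toy centroids
X = [1 1; 5 0; 0 9];                      % n = 3 toy descriptors
sigma = 4;                                % assumed kernel width
k = size(C, 1);  n = size(X, 1);
D2 = zeros(n, k);                         % squared distances, n x k
for j = 1:k
    D2(:, j) = sum(bsxfun(@minus, X, C(j,:)).^2, 2);
end
W = exp(-D2 / (2*sigma^2));               % kernel weight per (descriptor, word)
W = bsxfun(@rdivide, W, sum(W, 2));       % each descriptor casts a total vote of 1
h = sum(W, 1)' / n;                       % k x 1 soft histogram
```

Every entry of W is nonzero, which illustrates the sparsity problem: each descriptor now touches every word's inverted list unless small weights are truncated.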
A first example: the VLAD
Given a codebook $\{\mu_i\}$, e.g. learned with K-means, and a set of local descriptors $\{x\}$:
• assign: each descriptor $x$ to its nearest centroid $\mu_i$
• compute: $v_i = \sum_{x \text{ in cell } i} (x - \mu_i)$
• concatenate the $v_i$'s + normalize
[Figure: a graphical representation of $v_i$ for five cells with centroids $\mu_1, \ldots, \mu_5$: ① assign descriptors, ② compute $x - \mu_i$, ③ $v_i$ = sum of $x - \mu_i$ for cell i]
Jégou, Douze, Schmid and Pérez, “Aggregating local descriptors into a compact image representation”, CVPR’10.
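The three steps can be sketched without any toolbox. This is a hard-assignment illustration on toy 2-d data, not the VLFeat implementation; the centroids and descriptors are invented for the example.

```matlab
% Sketch: VLAD with hard assignment on toy data.
C = [0 0; 10 0];                          % k = 2 toy centroids mu_i, d = 2
X = [1 1; -1 0; 9 2; 11 0];               % toy local descriptors x
[k, d] = size(C);
D2 = zeros(size(X, 1), k);
for i = 1:k
    D2(:, i) = sum(bsxfun(@minus, X, C(i,:)).^2, 2);
end
[~, nn] = min(D2, [], 2);                 % step 1: assign x to nearest centroid
V = zeros(k, d);
for i = 1:k
    R = bsxfun(@minus, X(nn == i, :), C(i,:));   % step 2: residuals x - mu_i
    V(i, :) = sum(R, 1);                  % step 3: v_i = sum of residuals in cell i
end
v = reshape(V', [], 1);                   % concatenate v_1 ... v_k (k*d x 1)
v = v / max(norm(v), eps);                % L2 normalize
```

Unlike the BoW histogram, which would be [2 2] here, the VLAD code keeps the direction in which the descriptors deviate from each centroid.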
VL_FEAT Implementation
Matlab:
function [vc]=vladSiftEncoding(sift, codebook)
% sift: n x 128 SIFT descriptors, codebook: m x 128 centroids.
% With dbg=1 both inputs are overwritten by the built-in demo below.
dbg=1;
if dbg
    if (0) % init VL_FEAT, only need to do once
        run('../../tools/vlfeat-0.9.20/toolbox/vl_setup.m');
    end
    im = imread('../pics/flarsheim-2.jpg');
    [f, sift] = vl_sift(single(rgb2gray(im)));
    sift = single(sift');
    [indx, codebook] = kmeans(sift, 16);
    % make sift # smaller
    sift = sift(1:min(800, size(sift,1)), :);
end
[n, kd]=size(sift);
[m, kd]=size(codebook);
% compute assignment
dist = pdist2(codebook, sift);   % m x n distances
mdist = mean(mean(dist));
% normalize the heat kernel s.t. mean dist is mapped to 0.5
a = -log(0.5)/mdist;
indx = exp(-a*dist);             % m x n soft assignment weights
vc=vl_vlad(sift', codebook', indx);
if dbg
    figure(41); colormap(gray);
    subplot(2,2,1); imshow(im); title('image');
    subplot(2,2,2); imagesc(dist); title('m x n distance');
    subplot(2,2,3); imagesc(indx); title('m x n assignment');
    % vc is laid out centroid by centroid: show one row per centroid
    subplot(2,2,4); imagesc(reshape(vc, [kd, m])'); title('vlad code');
end
VLAD Code
What are the tweaks ?
• Code book design
• Soft assignment options
References
• Vocabulary Tree: David Nistér, Henrik Stewénius: Scalable Recognition with a Vocabulary Tree. CVPR (2) 2006: 2161-2168
• VLAD: Herve Jegou, Matthijs Douze, Cordelia Schmid: Improving Bag-of-Features for Large Scale Image Search. International Journal of Computer Vision 87(3): 316-336 (2010)
• Fisher Vector: Florent Perronnin, Jorge Sánchez, Thomas Mensink: Improving the Fisher Kernel for Large-Scale Image Classification. ECCV (4) 2010: 143-156
• AKULA: Abhishek Nagar, Zhu Li, Gaurav Srivastava, Kyungmo Park: AKULA - Adaptive Cluster Aggregation for Visual Search. DCC 2014: 13-22
Lec 07 Summary
• Image Retrieval System Metric
– What is true positive, false positive, true negative, false negative ?
– What is precision, recall, F-score ?
• Why Aggregation ?
– Decision boundary
– Indexing/Hashing
• Bag of Words
– A histogram whose bins are visual words
– Variations: hierarchical assignment with vocabulary tree
– Implementation: Inverted List
• VLAD
– Richer encoding of aggregated info
– Soft assignment of features to codebook bins
– Vectorized representation – no need for inverted list