Compact Representation of Visual Data (BOW, Fisher Vector...
![Page 1: Compact Representation of Visual Data (BOW, Fisher Vector ...val.serc.iisc.ernet.in/DAV/CompactRepresentation.pdf · Evaluation of the Aggregation Methods Inferences − Results are](https://reader031.fdocuments.in/reader031/viewer/2022041814/5e599787c04b232ee15f5a84/html5/thumbnails/1.jpg)
Compact Representation of Visual Data
(BOW, Fisher Vector & VLAD)
![Page 2: Compact Representation of Visual Data (BOW, Fisher Vector ...val.serc.iisc.ernet.in/DAV/CompactRepresentation.pdf · Evaluation of the Aggregation Methods Inferences − Results are](https://reader031.fdocuments.in/reader031/viewer/2022041814/5e599787c04b232ee15f5a84/html5/thumbnails/2.jpg)
We try to understand...
● What is a compact code?
● Why is it needed? Its applications
● A few such codes: BOV, FV, VLAD, Classemes
● Their application in large-scale image search and classification
![Page 3: Compact Representation of Visual Data (BOW, Fisher Vector ...val.serc.iisc.ernet.in/DAV/CompactRepresentation.pdf · Evaluation of the Aggregation Methods Inferences − Results are](https://reader031.fdocuments.in/reader031/viewer/2022041814/5e599787c04b232ee15f5a84/html5/thumbnails/3.jpg)
Compact Code
● Code: the descriptor (real or binary) that represents an entity/instance
− E.g. entity: message, document, image or video
− E.g. descriptor: BoV, FV, VLAD
● Compact code: a code that is represented efficiently (uses little memory
and is easy to search)
![Page 4: Compact Representation of Visual Data (BOW, Fisher Vector ...val.serc.iisc.ernet.in/DAV/CompactRepresentation.pdf · Evaluation of the Aggregation Methods Inferences − Results are](https://reader031.fdocuments.in/reader031/viewer/2022041814/5e599787c04b232ee15f5a84/html5/thumbnails/4.jpg)
Example Descriptor
BoF
[ Figure from SE263:Video Analytics by R Venkatesh Babu ]
![Page 5: Compact Representation of Visual Data (BOW, Fisher Vector ...val.serc.iisc.ernet.in/DAV/CompactRepresentation.pdf · Evaluation of the Aggregation Methods Inferences − Results are](https://reader031.fdocuments.in/reader031/viewer/2022041814/5e599787c04b232ee15f5a84/html5/thumbnails/5.jpg)
BoF
[ Figure from Kristen Grauman's website ]
![Page 6: Compact Representation of Visual Data (BOW, Fisher Vector ...val.serc.iisc.ernet.in/DAV/CompactRepresentation.pdf · Evaluation of the Aggregation Methods Inferences − Results are](https://reader031.fdocuments.in/reader031/viewer/2022041814/5e599787c04b232ee15f5a84/html5/thumbnails/6.jpg)
Example Descriptor
HoG
[ Figure from SE263:Video Analytics by R Venkatesh Babu ]
![Page 7: Compact Representation of Visual Data (BOW, Fisher Vector ...val.serc.iisc.ernet.in/DAV/CompactRepresentation.pdf · Evaluation of the Aggregation Methods Inferences − Results are](https://reader031.fdocuments.in/reader031/viewer/2022041814/5e599787c04b232ee15f5a84/html5/thumbnails/7.jpg)
Example Descriptor
VLAD
[ Figure from Jégou et al., PAMI 2011 ]
![Page 8: Compact Representation of Visual Data (BOW, Fisher Vector ...val.serc.iisc.ernet.in/DAV/CompactRepresentation.pdf · Evaluation of the Aggregation Methods Inferences − Results are](https://reader031.fdocuments.in/reader031/viewer/2022041814/5e599787c04b232ee15f5a84/html5/thumbnails/8.jpg)
Applications (in IP/VP)
● CBIR, large scale image and Video search
● Object recognition
● Image/Video Annotation/Classification
● Event detection
● Detecting partial image duplicates on the web and
deformed copies
![Page 9: Compact Representation of Visual Data (BOW, Fisher Vector ...val.serc.iisc.ernet.in/DAV/CompactRepresentation.pdf · Evaluation of the Aggregation Methods Inferences − Results are](https://reader031.fdocuments.in/reader031/viewer/2022041814/5e599787c04b232ee15f5a84/html5/thumbnails/9.jpg)
Goal and Challenges
● Problem addressed: large-scale image search
− Finding images representing the same object/content
● Constraints:
− Search accuracy
− Efficiency (Search time)
− Memory usage
![Page 10: Compact Representation of Visual Data (BOW, Fisher Vector ...val.serc.iisc.ernet.in/DAV/CompactRepresentation.pdf · Evaluation of the Aggregation Methods Inferences − Results are](https://reader031.fdocuments.in/reader031/viewer/2022041814/5e599787c04b232ee15f5a84/html5/thumbnails/10.jpg)
BOV Model
![Page 11: Compact Representation of Visual Data (BOW, Fisher Vector ...val.serc.iisc.ernet.in/DAV/CompactRepresentation.pdf · Evaluation of the Aggregation Methods Inferences − Results are](https://reader031.fdocuments.in/reader031/viewer/2022041814/5e599787c04b232ee15f5a84/html5/thumbnails/11.jpg)
BoF/BoW
● Success of BoW model is due to
− Powerful local descriptors like SIFT
− Comparison is easy (works with standard distances)
− High dimensionality → sparse vectors → inverted lists
can be employed
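The point about sparsity and inverted lists can be illustrated with a minimal sketch. It assumes each image is stored as a dict from visual-word id to count, and it scores with a plain dot product; the function names `build_inverted_index` and `score_query` are hypothetical and not taken from the slides.

```python
from collections import defaultdict

def build_inverted_index(bow_vectors):
    """Map each visual-word id to the (image_id, count) pairs that contain it."""
    index = defaultdict(list)
    for image_id, bow in enumerate(bow_vectors):
        for word, count in bow.items():
            index[word].append((image_id, count))
    return index

def score_query(index, query_bow):
    """Accumulate dot-product scores, visiting only images that share a word with the query."""
    scores = defaultdict(float)
    for word, q_count in query_bow.items():
        for image_id, count in index.get(word, []):
            scores[image_id] += q_count * count
    return dict(scores)

# Toy database of three sparse BoW vectors (word id -> count).
database = [{3: 2, 7: 1}, {7: 4}, {1: 5}]
index = build_inverted_index(database)
scores = score_query(index, {7: 1, 3: 1})
```

Because the query only walks the lists of its own non-zero words, the third image, which shares no word with the query, is never touched.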
![Page 12: Compact Representation of Visual Data (BOW, Fisher Vector ...val.serc.iisc.ernet.in/DAV/CompactRepresentation.pdf · Evaluation of the Aggregation Methods Inferences − Results are](https://reader031.fdocuments.in/reader031/viewer/2022041814/5e599787c04b232ee15f5a84/html5/thumbnails/12.jpg)
Image representation with Fisher Vector for
Semantic Classification and Retrieval
![Page 13: Compact Representation of Visual Data (BOW, Fisher Vector ...val.serc.iisc.ernet.in/DAV/CompactRepresentation.pdf · Evaluation of the Aggregation Methods Inferences − Results are](https://reader031.fdocuments.in/reader031/viewer/2022041814/5e599787c04b232ee15f5a84/html5/thumbnails/13.jpg)
Motivation : Why ??
● Consider BOV representation
● Computing the representation is very expensive
− For each feature, the distance to every cluster center must be computed
− Runtime: O(NKd)
− N: number of features (~10^4 per image, e.g. SIFT)
− K: number of centers (~1000, e.g. for recognition)
− d: dimension of each feature (~100 for SIFT)
● In total, on the order of 10^9 multiplications per image to obtain a
histogram of 1000 bins
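The cost estimate above can be made concrete with a small sketch of hard assignment in NumPy. The array sizes used for the demo are deliberately smaller than the slide's values, the data is random, and the function name `bov_histogram` is an assumption made for illustration.

```python
import numpy as np

def bov_histogram(features, centers):
    """Hard-assign each feature to its nearest center and count features per center."""
    # ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2; the x.c product alone costs N*K*d multiplications.
    sq_dists = (
        (features ** 2).sum(axis=1, keepdims=True)
        - 2.0 * features @ centers.T
        + (centers ** 2).sum(axis=1)
    )
    nearest = sq_dists.argmin(axis=1)
    return np.bincount(nearest, minlength=centers.shape[0])

rng = np.random.default_rng(0)
N, K, d = 2000, 50, 16               # small stand-ins for ~10^4, ~1000, ~100
features = rng.normal(size=(N, d))   # random data, not real SIFT
centers = rng.normal(size=(K, d))
hist = bov_histogram(features, centers)

# With the slide's orders of magnitude: N * K * d multiplications per image.
multiplications = 10**4 * 1000 * 100
```

The histogram has one bin per center and sums to the number of features, while the multiplication count with the slide's values comes to 10^9.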
![Page 14: Compact Representation of Visual Data (BOW, Fisher Vector ...val.serc.iisc.ernet.in/DAV/CompactRepresentation.pdf · Evaluation of the Aggregation Methods Inferences − Results are](https://reader031.fdocuments.in/reader031/viewer/2022041814/5e599787c04b232ee15f5a84/html5/thumbnails/14.jpg)
BOV Model
[ Figure: example counts of features per visual word: 20, 5, 38, 10 ]
[ Figure: from http://lear.inrialpes.fr/~verbeek/MLCR.11.12.php by Jakob Verbeek]
![Page 15: Compact Representation of Visual Data (BOW, Fisher Vector ...val.serc.iisc.ernet.in/DAV/CompactRepresentation.pdf · Evaluation of the Aggregation Methods Inferences − Results are](https://reader031.fdocuments.in/reader031/viewer/2022041814/5e599787c04b232ee15f5a84/html5/thumbnails/15.jpg)
Motivation : Why ??
● For more efficient representation (using BOV)
− BOV stores only the number of features assigned to each word (0th-order statistics)
− Increasing the number of words directly increases the computation
− It also leads to many empty bins, i.e. redundancy
[ Figure: from http://lear.inrialpes.fr/~verbeek/MLCR.11.12.php by Jakob Verbeek]
![Page 18: Compact Representation of Visual Data (BOW, Fisher Vector ...val.serc.iisc.ernet.in/DAV/CompactRepresentation.pdf · Evaluation of the Aggregation Methods Inferences − Results are](https://reader031.fdocuments.in/reader031/viewer/2022041814/5e599787c04b232ee15f5a84/html5/thumbnails/18.jpg)
Motivation : Why ??
● Even when the counts are the same, the position and variance of the points in the cell can vary
[ Figure: from http://lear.inrialpes.fr/~verbeek/MLCR.11.12.php by Jakob Verbeek]
![Page 19: Compact Representation of Visual Data (BOW, Fisher Vector ...val.serc.iisc.ernet.in/DAV/CompactRepresentation.pdf · Evaluation of the Aggregation Methods Inferences − Results are](https://reader031.fdocuments.in/reader031/viewer/2022041814/5e599787c04b232ee15f5a84/html5/thumbnails/19.jpg)
A slight digression
● Pattern classification techniques can be divided into
− Generative approaches
− Discriminative approaches
● Generative: focuses on modeling the class-conditional
probability density functions p(x|y)
● Discriminative: focuses directly on the problem of
interest: classification
![Page 20: Compact Representation of Visual Data (BOW, Fisher Vector ...val.serc.iisc.ernet.in/DAV/CompactRepresentation.pdf · Evaluation of the Aggregation Methods Inferences − Results are](https://reader031.fdocuments.in/reader031/viewer/2022041814/5e599787c04b232ee15f5a84/html5/thumbnails/20.jpg)
Discriminative vs generative methods
● Generative methods
● Say x is the feature and y is the label (simple 2-class case, classes C1 and C2)
● Model the class-conditional probabilities p(x|C1) and p(x|C2)
● Estimate the prior probabilities p(y)
● Use Bayes' rule to infer the class, given the input
[ Figure: from http://lear.inrialpes.fr/~verbeek/MLCR.11.12.php by Jakob Verbeek]
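As a rough illustration of the generative recipe, the sketch below applies Bayes' rule to a one-dimensional, two-class problem. The Gaussian class-conditionals (means -1 and +1, unit standard deviation), the equal priors, and the function names are all assumptions chosen for the example, not values from the slides.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))

def posterior_c1(x, prior_c1=0.5):
    """p(C1 | x) from the class-conditionals p(x | C1), p(x | C2) and the priors."""
    joint_c1 = gaussian_pdf(x, mean=-1.0, std=1.0) * prior_c1          # p(x | C1) p(C1)
    joint_c2 = gaussian_pdf(x, mean=+1.0, std=1.0) * (1.0 - prior_c1)  # p(x | C2) p(C2)
    return joint_c1 / (joint_c1 + joint_c2)

# Assign to C1 when the posterior exceeds 0.5.
label = "C1" if posterior_c1(-2.0) > 0.5 else "C2"
```

A discriminative method would skip the two densities and fit p(y|x), or a decision function, directly.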
![Page 21: Compact Representation of Visual Data (BOW, Fisher Vector ...val.serc.iisc.ernet.in/DAV/CompactRepresentation.pdf · Evaluation of the Aggregation Methods Inferences − Results are](https://reader031.fdocuments.in/reader031/viewer/2022041814/5e599787c04b232ee15f5a84/html5/thumbnails/21.jpg)
Discriminative vs generative methods
● Discriminative
● Directly estimate the class probability given the input: p(y|x)
● Some methods do not have a probabilistic interpretation,
● e.g. fit a function f(x), and assign to class 1 if f(x)>0, and to class 2 if f(x)<0
[ Figure: from http://lear.inrialpes.fr/~verbeek/MLCR.11.12.php by Jakob Verbeek]
![Page 22: Compact Representation of Visual Data (BOW, Fisher Vector ...val.serc.iisc.ernet.in/DAV/CompactRepresentation.pdf · Evaluation of the Aggregation Methods Inferences − Results are](https://reader031.fdocuments.in/reader031/viewer/2022041814/5e599787c04b232ee15f5a84/html5/thumbnails/22.jpg)
Fisher Vector Principles
● Fisher kernels: combine the benefits of generative and discriminative approaches
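● Fit a probabilistic model to the data, p(X ; θ).
● p is a pdf whose parameters are denoted by θ.
● Characterize the samples X = { x_t ; t = 1, ..., N } with the gradient vector:
∇_θ log p(X ; θ)
● Intuition: the gradient describes the direction in which the parameters should change to better fit X
● Fixed size: its dimension depends only on the number of parameters in θ, not on N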
![Page 23: Compact Representation of Visual Data (BOW, Fisher Vector ...val.serc.iisc.ernet.in/DAV/CompactRepresentation.pdf · Evaluation of the Aggregation Methods Inferences − Results are](https://reader031.fdocuments.in/reader031/viewer/2022041814/5e599787c04b232ee15f5a84/html5/thumbnails/23.jpg)
Fisher Vector Principles
● GMM is the generally used distribution to model the SIFT features
![Page 24: Compact Representation of Visual Data (BOW, Fisher Vector ...val.serc.iisc.ernet.in/DAV/CompactRepresentation.pdf · Evaluation of the Aggregation Methods Inferences − Results are](https://reader031.fdocuments.in/reader031/viewer/2022041814/5e599787c04b232ee15f5a84/html5/thumbnails/24.jpg)
Fisher Vector Principles
● In total, a K(1+2D)-dimensional representation: gradients with respect to the K mixture weights, the K×D means and the K×D variances
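For reference, a sketch of the standard gradients for a GMM with diagonal covariances, in the form commonly given in the Fisher-kernel literature. The notation (w_k, μ_k, σ_k, γ_t(k)) is an assumption, since the slide's own equations did not survive extraction, and the samples are assumed independent so that log p(X; θ) is a sum over t.

```latex
\begin{aligned}
p(x_t;\theta) &= \sum_{k=1}^{K} w_k\,\mathcal{N}\!\left(x_t;\mu_k,\operatorname{diag}(\sigma_k^2)\right),
\qquad
\gamma_t(k) = \frac{w_k\,\mathcal{N}\!\left(x_t;\mu_k,\operatorname{diag}(\sigma_k^2)\right)}
                   {\sum_{j=1}^{K} w_j\,\mathcal{N}\!\left(x_t;\mu_j,\operatorname{diag}(\sigma_j^2)\right)} \\[4pt]
\frac{\partial \log p(X;\theta)}{\partial \mu_k^d}
  &= \sum_{t=1}^{N} \gamma_t(k)\,\frac{x_t^d-\mu_k^d}{(\sigma_k^d)^2} \\[4pt]
\frac{\partial \log p(X;\theta)}{\partial \sigma_k^d}
  &= \sum_{t=1}^{N} \gamma_t(k)\left[\frac{(x_t^d-\mu_k^d)^2}{(\sigma_k^d)^3}-\frac{1}{\sigma_k^d}\right]
\end{aligned}
```

The mean and standard-deviation gradients contribute K×D components each; the gradients with respect to the K weights account for the remaining K.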
![Page 25: Compact Representation of Visual Data (BOW, Fisher Vector ...val.serc.iisc.ernet.in/DAV/CompactRepresentation.pdf · Evaluation of the Aggregation Methods Inferences − Results are](https://reader031.fdocuments.in/reader031/viewer/2022041814/5e599787c04b232ee15f5a84/html5/thumbnails/25.jpg)
Fisher Vector Principles (optional slide)
● Generally, a mixture of Gaussians with (assumed) diagonal covariance matrices is used to model the local (SIFT) descriptors
[Figure: Garg V et al., Sparse Discriminative Fisher Vectors in Visual Classification, ICVGIP 2012]
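A minimal NumPy sketch of how such a representation could be computed for a diagonal-covariance GMM. It covers only the gradients with respect to the means and standard deviations, leaves out the weight gradients and any normalization, and uses random data with hypothetical parameter values; the function names are assumptions.

```python
import numpy as np

def gmm_posteriors(X, weights, means, sigmas):
    """Soft assignments gamma[t, k] of each sample to each Gaussian."""
    diff = (X[:, None, :] - means[None, :, :]) / sigmas[None, :, :]     # (N, K, D)
    log_prob = (
        np.log(weights)[None, :]
        - 0.5 * (diff ** 2).sum(axis=2)
        - np.log(sigmas).sum(axis=1)[None, :]
        - 0.5 * X.shape[1] * np.log(2.0 * np.pi)
    )
    log_prob -= log_prob.max(axis=1, keepdims=True)                     # numerical stability
    gamma = np.exp(log_prob)
    return gamma / gamma.sum(axis=1, keepdims=True)

def fisher_vector(X, weights, means, sigmas):
    """Unnormalized log-likelihood gradients w.r.t. means and standard deviations."""
    gamma = gmm_posteriors(X, weights, means, sigmas)[:, :, None]       # (N, K, 1)
    diff = X[:, None, :] - means[None, :, :]                            # (N, K, D)
    s = sigmas[None, :, :]
    grad_mu = (gamma * diff / s ** 2).sum(axis=0)                       # 1st-order statistics
    grad_sigma = (gamma * (diff ** 2 / s ** 3 - 1.0 / s)).sum(axis=0)   # 2nd-order statistics
    return np.concatenate([grad_mu.ravel(), grad_sigma.ravel()])        # length 2*K*D

rng = np.random.default_rng(0)
K, D, N = 4, 8, 100
weights = np.full(K, 1.0 / K)
means = rng.normal(size=(K, D))
sigmas = np.ones((K, D))
X = rng.normal(size=(N, D))            # random stand-in for local descriptors
fv = fisher_vector(X, weights, means, sigmas)
```

The output length depends only on K and D, so images with different numbers of descriptors map to vectors of the same size.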
![Page 26: Compact Representation of Visual Data (BOW, Fisher Vector ...val.serc.iisc.ernet.in/DAV/CompactRepresentation.pdf · Evaluation of the Aggregation Methods Inferences − Results are](https://reader031.fdocuments.in/reader031/viewer/2022041814/5e599787c04b232ee15f5a84/html5/thumbnails/26.jpg)
Fisher Vector Principles
[Figure: Garg V et al., Sparse Discriminative Fisher Vectors in Visual Classification, ICVGIP 2012]
![Page 27: Compact Representation of Visual Data (BOW, Fisher Vector ...val.serc.iisc.ernet.in/DAV/CompactRepresentation.pdf · Evaluation of the Aggregation Methods Inferences − Results are](https://reader031.fdocuments.in/reader031/viewer/2022041814/5e599787c04b232ee15f5a84/html5/thumbnails/27.jpg)
BOV vs FV
● BOV
− Fits K-means clustering to the data
− Represents image as histogram of words
− Considers the 0th order statistics
● FV
− Fits GMM to the local descriptors
− Represents image with derivative of log likelihood
− Considers the 1st and 2nd order statistics also
● Computation
− Both compare N descriptors to K visual words (Centers/Gaussians)
● Memory Usage
− Higher for FV: a factor of (2D+1) larger
− For K = 1000, about 1 MB per image
− However, because more information is stored per visual word, FV can obtain the same or better performance
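The memory figure can be checked with a few lines of arithmetic. The descriptor dimension D = 128 (raw SIFT) and 4-byte floats are assumptions; the slide states only K = 1000 and the approximate total.

```python
K = 1000              # visual words / Gaussians (from the slide)
D = 128               # assumed descriptor dimension (raw SIFT)
BYTES_PER_FLOAT = 4   # assumed single-precision storage

bov_dims = K                      # one count per visual word
fv_dims = K * (1 + 2 * D)         # weights + means + variances
fv_bytes = fv_dims * BYTES_PER_FLOAT
ratio = fv_dims // bov_dims       # the (2D+1) factor

print(f"FV: {fv_dims} dims, {fv_bytes / 1e6:.2f} MB, {ratio}x larger than BOV")
```

Under these assumptions the FV has 257,000 dimensions and takes roughly 1 MB, which is consistent with the slide's estimate.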
![Page 28: Compact Representation of Visual Data (BOW, Fisher Vector ...val.serc.iisc.ernet.in/DAV/CompactRepresentation.pdf · Evaluation of the Aggregation Methods Inferences − Results are](https://reader031.fdocuments.in/reader031/viewer/2022041814/5e599787c04b232ee15f5a84/html5/thumbnails/28.jpg)
BoV, FV and VLAD
● VLAD : FV :: k-means : GMM clustering
![Page 29: Compact Representation of Visual Data (BOW, Fisher Vector ...val.serc.iisc.ernet.in/DAV/CompactRepresentation.pdf · Evaluation of the Aggregation Methods Inferences − Results are](https://reader031.fdocuments.in/reader031/viewer/2022041814/5e599787c04b232ee15f5a84/html5/thumbnails/29.jpg)
References
● F. Perronnin and C. Dance, “Fisher kernels on visual vocabularies for image categorization,” in CVPR, 2007.
● http://lear.inrialpes.fr/~verbeek/MLCR.11.12.php (figures by Jakob Verbeek)
● T. Jaakkola and D. Haussler, “Exploiting generative models in discriminative classifiers,” in NIPS, 1998.
● H. Jégou, F. Perronnin, M. Douze, J. Sánchez, C. Schmid, and P. Pérez, “Aggregating local descriptors into a compact image representation,” in PAMI, 2011.
![Page 30: Compact Representation of Visual Data (BOW, Fisher Vector ...val.serc.iisc.ernet.in/DAV/CompactRepresentation.pdf · Evaluation of the Aggregation Methods Inferences − Results are](https://reader031.fdocuments.in/reader031/viewer/2022041814/5e599787c04b232ee15f5a84/html5/thumbnails/30.jpg)
VLAD: Aggregating local descriptors into a compact
image representation
Jégou et al., CVPR 2010, PAMI 2011
![Page 31: Compact Representation of Visual Data (BOW, Fisher Vector ...val.serc.iisc.ernet.in/DAV/CompactRepresentation.pdf · Evaluation of the Aggregation Methods Inferences − Results are](https://reader031.fdocuments.in/reader031/viewer/2022041814/5e599787c04b232ee15f5a84/html5/thumbnails/31.jpg)
Fisher Vector
● Perronnin et al. [3] applied the Fisher kernel to image classification
● Model visual words with a GMM, restricted to diagonal covariance
matrices (probabilistic visual vocabulary)
● Derive a d × k dimensional vector by considering only the means or
the variances
● Compared to BoW fewer visual words are required
− Varied k from 16 to 256
![Page 32: Compact Representation of Visual Data (BOW, Fisher Vector ...val.serc.iisc.ernet.in/DAV/CompactRepresentation.pdf · Evaluation of the Aggregation Methods Inferences − Results are](https://reader031.fdocuments.in/reader031/viewer/2022041814/5e599787c04b232ee15f5a84/html5/thumbnails/32.jpg)
Towards Efficiency
● Performance is achieved by optimizing
− The representation : aggregating local image
descriptors
− Dimensionality reduction of these vectors
− Indexing them
● These steps are interdependent
![Page 33: Compact Representation of Visual Data (BOW, Fisher Vector ...val.serc.iisc.ernet.in/DAV/CompactRepresentation.pdf · Evaluation of the Aggregation Methods Inferences − Results are](https://reader031.fdocuments.in/reader031/viewer/2022041814/5e599787c04b232ee15f5a84/html5/thumbnails/33.jpg)
Dimensionality
High-Dimension
● Better exhaustive search results
● Difficult to index
Low-Dimension
● Indexed efficiently
● Low discriminative power
![Page 34: Compact Representation of Visual Data (BOW, Fisher Vector ...val.serc.iisc.ernet.in/DAV/CompactRepresentation.pdf · Evaluation of the Aggregation Methods Inferences − Results are](https://reader031.fdocuments.in/reader031/viewer/2022041814/5e599787c04b232ee15f5a84/html5/thumbnails/34.jpg)
VLAD: a non-probabilistic Fisher kernel
● Proposed by Jégou et al. in the CVPR version
![Page 35: Compact Representation of Visual Data (BOW, Fisher Vector ...val.serc.iisc.ernet.in/DAV/CompactRepresentation.pdf · Evaluation of the Aggregation Methods Inferences − Results are](https://reader031.fdocuments.in/reader031/viewer/2022041814/5e599787c04b232ee15f5a84/html5/thumbnails/35.jpg)
VLAD
Images and corresponding VLAD descriptors, for K=16 centroids. The components of the descriptor are represented like SIFT, with negative components in red.
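A minimal sketch of the VLAD aggregation as it is usually described: each descriptor is assigned to its nearest centroid, the residuals are summed per centroid, and the concatenated vector is L2-normalized. The centroids and descriptors here are random stand-ins (in practice the centroids come from k-means), and the function name `vlad` is an assumption.

```python
import numpy as np

def vlad(descriptors, centroids):
    """Sum residuals x - c_i over descriptors whose nearest centroid is c_i, then L2-normalize."""
    sq_dists = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    nearest = sq_dists.argmin(axis=1)
    v = np.zeros_like(centroids, dtype=float)          # one d-dimensional block per centroid
    for i in range(centroids.shape[0]):
        assigned = descriptors[nearest == i]
        if len(assigned):
            v[i] = (assigned - centroids[i]).sum(axis=0)
    v = v.ravel()                                      # K * d components, signed
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

rng = np.random.default_rng(0)
K, d, N = 16, 64, 500
centroids = rng.normal(size=(K, d))      # stand-in for a k-means codebook
descriptors = rng.normal(size=(N, d))    # stand-in for PCA-reduced SIFT
v = vlad(descriptors, centroids)
```

Unlike BOV counts, the components are residual sums and can be negative, which is what the red components in the figure show.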
![Page 36: Compact Representation of Visual Data (BOW, Fisher Vector ...val.serc.iisc.ernet.in/DAV/CompactRepresentation.pdf · Evaluation of the Aggregation Methods Inferences − Results are](https://reader031.fdocuments.in/reader031/viewer/2022041814/5e599787c04b232ee15f5a84/html5/thumbnails/36.jpg)
Dimensionality reduction on local descriptors
● Applying the Fisher Kernel framework directly on local descriptors leads to suboptimal results
● Apply a PCA on the SIFT descriptors to reduce them from 128D to d = 64
● Two reasons may explain the positive impact of this PCA:
1. De-correlated data can be fitted more accurately by a GMM with
diagonal covariance matrices
2. The GMM estimation is noisy for the less energetic components
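The PCA step can be sketched with an SVD in NumPy. The input is random data standing in for real SIFT descriptors, and the helper names `fit_pca` and `apply_pca` are hypothetical.

```python
import numpy as np

def fit_pca(X, n_components):
    """Return the data mean and the top principal directions (as rows)."""
    mean = X.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def apply_pca(X, mean, components):
    """Center the data and project it onto the retained directions."""
    return (X - mean) @ components.T

rng = np.random.default_rng(0)
sift_like = rng.random((1000, 128))        # random stand-in for 128-D SIFT descriptors
mean, components = fit_pca(sift_like, 64)
reduced = apply_pca(sift_like, mean, components)   # shape (1000, 64)
```

On the data used to fit the PCA, the projected components are uncorrelated, which matches the diagonal-covariance assumption of the GMM (reason 1), and the discarded directions are the low-energy ones (reason 2).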
![Page 41: Compact Representation of Visual Data (BOW, Fisher Vector ...val.serc.iisc.ernet.in/DAV/CompactRepresentation.pdf · Evaluation of the Aggregation Methods Inferences − Results are](https://reader031.fdocuments.in/reader031/viewer/2022041814/5e599787c04b232ee15f5a84/html5/thumbnails/41.jpg)
Evaluation of the Aggregation Methods
● Evaluation is performed (on the Holidays dataset) without the subsequent indexing
![Page 42: Compact Representation of Visual Data (BOW, Fisher Vector ...val.serc.iisc.ernet.in/DAV/CompactRepresentation.pdf · Evaluation of the Aggregation Methods Inferences − Results are](https://reader031.fdocuments.in/reader031/viewer/2022041814/5e599787c04b232ee15f5a84/html5/thumbnails/42.jpg)
Evaluation of the Aggregation Methods
● Inferences
− Results are similar if these representations are learned and computed
on the plain SIFT descriptors
− FV+PCA outperforms VLAD by a few points of mAP
− The larger the number of centroids, the better the performance
● For K=4096 → mAP=68.9%, outperforms any result reported for standard
BOW on this dataset ([1] reports mAP=57.2% with a 200k vocabulary)
![Page 43: Compact Representation of Visual Data (BOW, Fisher Vector ...val.serc.iisc.ernet.in/DAV/CompactRepresentation.pdf · Evaluation of the Aggregation Methods Inferences − Results are](https://reader031.fdocuments.in/reader031/viewer/2022041814/5e599787c04b232ee15f5a84/html5/thumbnails/43.jpg)
Comparison of BOW/VLAD/FV
![Page 44: Compact Representation of Visual Data (BOW, Fisher Vector ...val.serc.iisc.ernet.in/DAV/CompactRepresentation.pdf · Evaluation of the Aggregation Methods Inferences − Results are](https://reader031.fdocuments.in/reader031/viewer/2022041814/5e599787c04b232ee15f5a84/html5/thumbnails/44.jpg)
Comparison of BOW/VLAD/FV