Deconstruction: Discriminative learning of local image descriptors Samantha Horvath Learning Based Methods in Vision 2/14/2012

Introduction

•Computer vision makes use of many “hand-crafted” descriptors.

•These descriptors share many common components

•This paper presents a modular framework for designing and optimizing new feature descriptors

Common Descriptors

• SIFT
  ▫ Most well-known descriptor
  ▫ Quantized gradient vectors
  ▫ Grid based spatial histogram
  ▫ Post-normalization
• PCA-SIFT
  ▫ Quantized gradient vectors
  ▫ PCA to reduce dimensionality
• GLOH
  ▫ Quantized gradient vectors
  ▫ Polar based histogram
  ▫ PCA to reduce dimensionality

Common descriptors

• SURF
  ▫ Haar wavelet responses
  ▫ Grid based histograms
• HOG
  ▫ Dense SIFT
• Shape Context
  ▫ Extract points from object contour
  ▫ Polar based histogram
• Geometric blur
  ▫ Extract sparse channels
  ▫ Apply spatially varying blur
  ▫ Subsample to create descriptor

[Figure labels: image with interest point; extracted patch; feature extraction; pooling; dimensionality reduction]

Descriptor learning framework

Generalized Framework

•Represent each portion of the descriptor algorithm as an interchangeable block

•Blocks inspired by existing descriptor algorithms

•Blocks are organized into candidate descriptors, then parameters are optimized

Algorithmic building blocks

•G-block: Gaussian smoothing
•T-block: non-linear transformation
•S-block: Spatial summation/pooling
•E-block: Embedding (dimensionality reduction)
•N-block: Normalization
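
The interchangeable-block idea can be sketched as plain function composition. This is only a toy illustration on a 1-D signal: the block functions and the `make_descriptor` helper are hypothetical names invented here, not blocks from the paper.

```python
import math
from functools import reduce

def t_abs_diff(signal):
    # Toy T-block (assumed for illustration): rectified finite differences.
    return [abs(b - a) for a, b in zip(signal, signal[1:])]

def s_sum_halves(features):
    # Toy S-block: pool the features into two regions by summation.
    mid = len(features) // 2
    return [sum(features[:mid]), sum(features[mid:])]

def n_unit(vec):
    # Toy N-block: scale to unit length (zero vectors are left unchanged).
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

def make_descriptor(*blocks):
    """Chain blocks left to right; each block consumes the previous output."""
    def descriptor(patch):
        return reduce(lambda data, block: block(data), blocks, patch)
    return descriptor

# A candidate descriptor is just an ordered choice of blocks.
candidate = make_descriptor(t_abs_diff, s_sum_halves, n_unit)
result = candidate([0, 1, 3, 3, 7])
```

Swapping one block for another (for example a different pooling function) yields a new candidate without touching the rest of the chain, which is the property the framework relies on.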

General Pipeline

Transformations

• T1 – Gradient orientation binning
  ▫ Variants: number of orientation bins
• T2 – Rectified gradient binning
  ▫ Variants: number of rectification bins
• T3 – Steerable filters
  ▫ Variants: filter order and number of orientations
• T4 – Difference of Gaussian responses (center-surround)
  ▫ Parameter: size of center
• T5 – Haar wavelet transform
• T6 – Fixed 4 x 4 classifier
• T7 – Quantized gray levels
  ▫ Variant: number of gray level bins
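
As a rough sketch of what T1 and T2 compute for a single pixel, assuming the gradient components `gx` and `gy` are already available: `orientation_bins` splits the gradient magnitude linearly between the two nearest of `k` orientation bins, and `rectified_gradient` uses one common 4-bin rectification. Both function names and the exact soft-assignment rule are assumptions made here for illustration, not details stated on the slide.

```python
import math

def orientation_bins(gx, gy, k=4):
    """T1-style sketch: soft-assign gradient magnitude to k orientation bins."""
    mag = math.hypot(gx, gy)
    theta = math.atan2(gy, gx) % (2 * math.pi)   # angle in [0, 2*pi]
    pos = theta / (2 * math.pi) * k              # fractional bin position
    lo = int(math.floor(pos)) % k                # lower neighbouring bin
    hi = (lo + 1) % k                            # upper neighbour, wraps around
    frac = pos - math.floor(pos)
    out = [0.0] * k
    out[lo] += mag * (1.0 - frac)
    out[hi] += mag * frac
    return out

def rectified_gradient(gx, gy):
    """T2-style sketch: split each component into non-negative halves."""
    return [abs(gx) - gx, abs(gx) + gx, abs(gy) - gy, abs(gy) + gy]
```

In both cases each pixel yields a short non-negative vector, so the later summation blocks can pool the channels without positive and negative responses cancelling.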

Spatial summation blocks - parametric

• S1 – SIFT style bilinear weighted grid
  ▫ Parameters: overall footprint size (continuous)
• S2 – GLOH style log-polar regions
  ▫ Variants: number/arrangement of pooling regions
  ▫ Parameters: Ring radius

Spatial summation blocks - parametric

• S3 – Gaussian weighted pooling regions on a grid
  ▫ Variants: Grid size
  ▫ Parameters: Grid sample positions, size of the Gaussians
• S4 – Gaussian weighted, polar arranged pooling regions
  ▫ Variants: number of pooling regions
  ▫ Parameters: Ring radii, Gaussian kernel size, relative angular rotation
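
A minimal sketch of S4-style pooling, assuming a single feature channel stored as a list of rows. The layout (one central region plus rings of equally spaced regions) and the names `polar_centers` and `gaussian_pool` are assumptions for illustration; the radii and sigmas used below are placeholder values, whereas in the framework they are the parameters being learned.

```python
import math

def polar_centers(ring_radii, points_per_ring, rotation=0.0):
    """Centre point plus equally spaced points on each ring (offsets in pixels)."""
    centers = [(0.0, 0.0)]
    for radius in ring_radii:
        for i in range(points_per_ring):
            angle = rotation + 2 * math.pi * i / points_per_ring
            centers.append((radius * math.cos(angle), radius * math.sin(angle)))
    return centers

def gaussian_pool(channel, centers, sigmas):
    """Sum the channel under a Gaussian window placed at each centre."""
    height, width = len(channel), len(channel[0])
    cy, cx = (height - 1) / 2, (width - 1) / 2   # patch centre
    pooled = []
    for (px, py), sigma in zip(centers, sigmas):
        total = 0.0
        for y in range(height):
            for x in range(width):
                d2 = (x - cx - px) ** 2 + (y - cy - py) ** 2
                total += channel[y][x] * math.exp(-d2 / (2 * sigma * sigma))
        pooled.append(total)
    return pooled

# Placeholder parameters: 1 centre + 3 rings of 8 regions, wider windows further out.
radii = [4.0, 8.0, 12.0]
centers = polar_centers(radii, 8)
sigmas = [1.0] + [0.5 * r for r in radii for _ in range(8)]
patch = [[1.0] * 32 for _ in range(32)]
descriptor = gaussian_pool(patch, centers, sigmas)
```

Letting the window size grow with ring radius, as the placeholder `sigmas` do, gives the kind of layout in which outer regions integrate over a larger area than the centre.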

Embedding – non-parametric

• E1 – PCA (non-discriminative technique)
• E2/E3 – Projection minimizes the ratio of in-class variance for match pairs to the variance of all match pairs (LPP)
• E4/E5 – Projection minimizes the ratio of the variance of matched pairs to the variance of non-matched pairs (LDE)
• E6/E7 – Projection minimizes the ratio of in-class variance for match pairs to the total data variance (GLDE)
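
One way to write the E4/E5 criterion is as a ratio of two quadratic forms. The notation is assumed here rather than taken from the slides: $M$ and $N$ are the sets of matched and non-matched patch pairs, $x_i$ are descriptor vectors before embedding, and $w$ is one projection direction.

```latex
J(w) = \frac{\sum_{(i,j)\in M} \bigl(w^{\top}(x_i - x_j)\bigr)^{2}}
            {\sum_{(i,j)\in N} \bigl(w^{\top}(x_i - x_j)\bigr)^{2}}
     = \frac{w^{\top} A\, w}{w^{\top} B\, w},
\qquad
A = \sum_{(i,j)\in M} (x_i - x_j)(x_i - x_j)^{\top},
\quad
B = \sum_{(i,j)\in N} (x_i - x_j)(x_i - x_j)^{\top}
```

Setting the gradient of $J$ to zero gives $A w = \lambda B w$ with $\lambda = J(w)$, so the best projection directions are the generalized eigenvectors with the smallest eigenvalues. The other discriminative variants change only what goes into the denominator matrix.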

Smoothing and Normalization

•Gaussian smoothing
  ▫Parameter: σ
•Normalization
  ▫Normalize to unit vector
  ▫Clip to threshold
  ▫Renormalize, rinse, repeat
  ▫Parameters: clipping threshold
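
The normalize, clip, renormalize loop can be sketched as follows. The function names are hypothetical, the default threshold of 0.2 is only a placeholder (it is the value used in Lowe's SIFT, while here the threshold is a parameter to be optimized), and the iteration cap is an assumption added so the loop always terminates.

```python
import math

def unit(vec):
    # Scale to unit Euclidean length; zero vectors are returned unchanged.
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

def normalize_descriptor(vec, clip=0.2, max_iters=10):
    """Normalize, clip large entries, renormalize, and repeat until stable."""
    vec = unit(vec)
    for _ in range(max_iters):
        if all(abs(v) <= clip + 1e-12 for v in vec):
            break  # nothing left to clip
        clipped = [max(-clip, min(clip, v)) for v in vec]
        vec = unit(clipped)
    return vec

# One dominant entry gets suppressed so it cannot swamp the rest.
example = normalize_descriptor([10.0] + [1.0] * 49)
```

Clipping limits the influence of a few very strong responses, and the final renormalization keeps the descriptor at unit length whichever branch ends the loop.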

New names!

•SIFT
  ▫T1b – S1-16
•GLOH
  ▫T1b – S2-17 – E1
•PCA-SIFT
  ▫T1b – E1
•SURF
  ▫T5 – S1

Data

Ground Truth dataset

Ground truth dataset

•Uses camera calibration and dense multi-view stereo data

•DoG interest points are detected
•Interest points are mapped from one image to another view of the scene
•Stereo constraints are used to help match interest points

Same to same vs. like to like

Learning/optimization

•The G, S, N blocks and one T block (T4) contain parameters for optimization
•The G, S, N, and T blocks are jointly optimized using Powell minimization
  ▫Powell minimization is a conjugate direction method that does not require derivatives
  ▫Optimization initialized with reasonable values
•E blocks are optimized separately as a generalized eigenvalue problem
  ▫Power regularization to avoid overfitting
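
To give a feel for derivative-free optimization of block parameters, here is a simplified stand-in rather than Powell's method itself: it repeats 1-D golden-section line searches along the coordinate axes and omits Powell's direction-update step. The function names, the search span, and the toy quadratic objective (standing in for a descriptor's matching error as a function of two parameters) are all assumptions for illustration.

```python
import math

def golden_section(f, lo, hi, tol=1e-6):
    """Minimize a 1-D unimodal function on [lo, hi] using only function values."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while abs(b - a) > tol:
        if f(c) < f(d):
            b = d   # minimum lies in [a, d]
        else:
            a = c   # minimum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2

def axis_search_minimize(f, x0, span=5.0, sweeps=20):
    """Repeated line minimizations along each coordinate axis in turn."""
    x = list(x0)
    for _ in range(sweeps):
        for i in range(len(x)):
            def along_axis(t, i=i):
                trial = list(x)
                trial[i] = t
                return f(trial)
            x[i] = golden_section(along_axis, x[i] - span, x[i] + span)
    return x

def toy_error(p):
    # Hypothetical smooth objective with coupled parameters.
    return (p[0] - 1.0) ** 2 + 2.0 * (p[1] + 0.5) ** 2 + p[0] * p[1]

best = axis_search_minimize(toy_error, [0.0, 0.0])
```

Because only function evaluations are needed, the same loop works when the objective is an error rate measured on training patches, where gradients are not available; the starting point matters, which is why initialization with reasonable values is called out above.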

Results…

Dimension Reduction on SIFT

•T3 blocks (rectified steerable filters) with polar summation regions (S4/S2) performed the best

•Consistently outperformed SIFT descriptors

Optimal Summation regions

Optimal summation regions are foveated!!!

Foveated summation regions

•The S4-25 spatial pooling variant is very similar to the DAISY descriptor (designed for dense matching)

•Foveated regions are similar to geometric blur (increased blurring away from interest point)

Results – Pipeline 2

•Performance more varied
•Steerable filters still perform the best
•LDE and LPP best embedding methods
•Does not consistently outperform SIFT

Results – Pipeline 3

•Dimensionality reduction after learned T/S block combination

•Greatly outperforms SIFT
•Straightforward PCA works the best

Thoughts

•Would like to see how the optimized spatial pooling blocks vary with the different training sets

•Ultimately, would like to see this framework tested on different dataset types

•Difficulty is getting “ground truth” matches for identification/classification tasks

Now what?

Synthesis

•Majority of my computer vision research involves medical imaging

•Medical images are very different from natural scene images
  ▫Images represent a planar slice through the patient
  ▫Often poor contrast between different structures

Synthesis

•Interest point detection has applications for medical imaging

•Non-rigid registrations
  ▫Warp one set of interest points to overlay the second set
•Tracking

References

1. M. Brown, G. Hua, and S. Winder, “Discriminative learning of local image descriptors,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010. (original)
2. S. Winder and M. Brown, “Learning local image descriptors,” in Proceedings of the International Conference on Computer Vision and Pattern Recognition (CVPR07), Minneapolis, June 2007. (technical)
3. K. Mikolajczyk and C. Schmid, “Scale and affine invariant interest point detectors,” International Journal of Computer Vision, vol. 60, no. 1, pp. 63–86, 2004. (overview)
4. E. Tola, V. Lepetit, and P. Fua, “A fast local descriptor for dense matching,” in Proceedings of the International Conference on Computer Vision and Pattern Recognition, Anchorage, June 2008. (daisy)
5. Y. Ke and R. Sukthankar, “PCA-SIFT: a more distinctive representation for local image descriptors,” in Proceedings of the International Conference on Computer Vision and Pattern Recognition, vol. 2, July 2004, pp. 506–513. (PCA-sift)
6. A. Berg and J. Malik, “Geometric blur and template matching,” in International Conference on Computer Vision and Pattern Recognition, 2001, pp. I:607–614.
7. D. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004. (SIFT)
8. H. Bay, T. Tuytelaars, and L. Van Gool, “SURF: Speeded up robust features,” in 9th European Conference on Computer Vision, 2006.
9. G. Sharma and F. Jurie, “Learning discriminative spatial representation for image classification,” in British Machine Vision Conference (BMVC 2011). (grid paper)
10. W. T. Freeman and E. H. Adelson, “The design and use of steerable filters,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, pp. 891–906, 1991.