Learning Local Image Descriptors

Learning Local Image Descriptors. Matthew Brown, University of British Columbia, (prev.) Microsoft Research. [ Collaborators: †Simon Winder, *Gang Hua, †Rick Szeliski; † = MS Research, * = MS Live Labs ]

Local Image Descriptors for Scalable Recognition

Learning Local Image Descriptors
Matthew Brown
University of British Columbia
(prev.) Microsoft Research
[ Collaborators: †Simon Winder, *Gang Hua, †Rick Szeliski; † = MS Research, * = MS Live Labs ]

Applications @MSFT
Panoramic Stitching: Digital Image Pro, Windows Live Photogallery, Expression, HDView
3D Modelling: Photosynth, Virtual Earth
Location Recognition
Image Search: Lincoln

[ yellow = product, white = technology preview, grey = research ]

Photosynth

[ http://labs.live.com/photosynth ]

Photo Tourism
[ Slide credit: Noah Snavely ]
Scene reconstruction -> Photo Explorer

Input photographs -> relative camera positions and orientations, point cloud, sparse correspondence
[ http://photour.cs.washington.edu ]

Photosynth is based on Photo Tourism [ Snavely, Seitz, Szeliski, SIGGRAPH 2006 ]
Photo Tourism uses SIFT for correspondence

Our system takes as input an unordered set of photos, either from an Internet search or from a large personal collection. We assume the photos are largely from the same static scene. The first step of our system is to apply computer vision techniques to reconstruct the geometry of the scene. The output of this procedure is the relative positions and orientations of the cameras used to take a connected set of the photographs, as well as a point cloud representing the geometry of the scene, and a sparse set of correspondences between the photos. This information is then loaded into our interactive photo explorer tool.

multiview stereo = training data

[ Seitz et al., CVPR 2006; Goesele et al., ICCV 2007 ]

Learning Image Features

3D point cloud
[ Photo Tourism: Snavely, Seitz, Szeliski, SIGGRAPH 2006 ]

Problem Statement

Find a function of a local image patch, descriptor = f(patch), such that a nearest neighbour classifier† is optimal*
† = for simplicity + efficiency
* = measured by ROC curve

Q: What form should the descriptor function f(.) take?
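To make the evaluation criterion concrete, here is a minimal sketch of how an ROC curve can be traced from descriptor distances by sweeping a match threshold. The function names and the toy distances are hypothetical, and the `target=0.95` default is an assumption (the slides show "95%" on the testing slide without saying which rate it fixes).

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def roc_points(match_dists, nonmatch_dists):
    """Sweep a distance threshold; return (incorrect_rate, correct_rate) points.

    A pair is declared a match when its descriptor distance is <= threshold.
    """
    thresholds = sorted(set(match_dists) | set(nonmatch_dists))
    points = []
    for t in thresholds:
        correct = sum(d <= t for d in match_dists) / len(match_dists)
        incorrect = sum(d <= t for d in nonmatch_dists) / len(nonmatch_dists)
        points.append((incorrect, correct))
    return points

def error_at_correct_rate(match_dists, nonmatch_dists, target=0.95):
    """Smallest incorrect-match rate at which the correct-match rate reaches target."""
    return min(inc for inc, cor in roc_points(match_dists, nonmatch_dists)
               if cor >= target)

# Toy distances (made up): true matches should be closer than non-matches.
match_dists = [0.1, 0.2, 0.3, 0.4]
nonmatch_dists = [0.35, 0.8, 0.9, 1.0]
err = error_at_correct_rate(match_dists, nonmatch_dists)
```

In practice the two distance lists would come from applying `euclidean` to descriptor pairs with known ground truth. A single number read off the curve makes it easy to compare descriptor variants.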

Descriptor Algorithms

SIFT [ Lowe, ICCV 1999 ]: normalized image patch -> gradients, quantized to k orientations -> summation -> normalize -> descriptor vector

GLOH [ Mikolajczyk, Schmid, PAMI 2005 ]: normalized image patch -> gradients, quantized to k orientations -> summation -> normalize (plus PCA) -> descriptor vector

Shape Context [ Belongie, Malik, Puzicha, NIPS 2000 ]: normalized image patch -> create edge map -> summation -> normalize -> descriptor vector

Geometric Blur [ Berg, Malik, CVPR 2001 ]: normalized image patch -> feature detector -> summation -> normalize -> descriptor vector
Stages labelled T, S, N

Our Contribution

Normalized image patch -> T -> S -> N -> descriptor vector, with parameters
Propose a framework for descriptor algorithms
Learn parameters to find best performance
Train on a ground truth data set based on accurate 3D matches

T-blocks

Normalized image patch (w x h) -> T -> (w x h x k) -> S -> N -> descriptor vector
Transformation block: local gradients, steerable filters, isotropic filters, Haar wavelets, local classifier, quantized intensities
Output: one length k vector per source pixel

S-Blocks

Normalized image patch (w x h) -> T -> (w x h x k) -> S -> (m x k) -> N -> descriptor vector
Spatial summation block with m regions
Output: m length k vectors
[ Figure: summation layouts S1, S2, S3, S4 ]

N-Blocks

Normalized image patch (w x h) -> T -> (w x h x k) -> S -> (m x k) -> N -> (m x k) -> descriptor vector
Normalization block: unit normalization, SIFT normalization with clipping

Learning Descriptors

[ Diagram: training pairs -> descriptor (T1a, S2, N2) with parameters -> descriptor distances -> correct match % vs incorrect match % -> update parameters (Powell) ]
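The T, S and N stages described above can be sketched end to end. This is only an illustration under stated assumptions: the T-block uses hard assignment of gradient magnitude to k orientation bins, the S-block is a plain rectangular grid (a stand-in for a rectangular layout, not the polar ones), and the clipping value of 0.2 is an illustrative default. All function names are hypothetical.

```python
import math

def t_block(patch, k=4):
    """T: per-pixel gradient magnitude, hard-assigned to one of k orientation bins.

    Input is a w x h patch (list of rows); output is w x h x k.
    """
    h, w = len(patch), len(patch[0])
    out = [[[0.0] * k for _ in range(w)] for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Central differences, clamped at the patch border.
            gx = patch[y][min(x + 1, w - 1)] - patch[y][max(x - 1, 0)]
            gy = patch[min(y + 1, h - 1)][x] - patch[max(y - 1, 0)][x]
            angle = math.atan2(gy, gx) % (2 * math.pi)
            b = int(round(angle / (2 * math.pi) * k)) % k
            out[y][x][b] = math.hypot(gx, gy)
    return out

def s_block(features, grid=2):
    """S: sum the length-k vectors over m = grid * grid rectangular regions."""
    h, w, k = len(features), len(features[0]), len(features[0][0])
    regions = [[0.0] * k for _ in range(grid * grid)]
    for y in range(h):
        for x in range(w):
            r = (y * grid // h) * grid + (x * grid // w)
            for i in range(k):
                regions[r][i] += features[y][x][i]
    return regions

def n_block(regions, clip=0.2):
    """N: unit-normalise, clip large entries, then renormalise (SIFT-style)."""
    def unit(u):
        n = math.sqrt(sum(e * e for e in u))
        return [e / n for e in u] if n > 0 else u
    v = [e for region in regions for e in region]  # flatten m x k
    return unit([min(e, clip) for e in unit(v)])

def describe(patch, k=4, grid=2):
    """Chain the three blocks: patch -> T -> S -> N -> descriptor vector."""
    return n_block(s_block(t_block(patch, k), grid))

# Toy 4 x 4 patch with a horizontal intensity ramp.
ramp = [[float(x) for x in range(4)] for _ in range(4)]
descriptor = describe(ramp)  # length m * k = 16
```

Keeping each stage a separate function mirrors the block structure: swapping the T-block or the summation layout leaves the rest of the chain unchanged.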

Powell minimisation: a variation on line search in which the latest step is added to a set of direction vectors. Do a line search along each of the direction vectors in turn, then add the latest step to the direction set, throwing away the oldest direction vector.

Testing Descriptors

[ Diagram: test pairs -> descriptor (T1a, S2, N2) with learned parameters -> descriptor distances -> correct match % vs incorrect match % -> final error rate; plot marked at 95% ]

Example of Parameter Learning
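As a rough illustration of the Powell update described in the note above, the sketch below runs a line search along each direction in a set, then replaces the oldest direction with the latest overall step. It is not the authors' implementation: the golden-section line search, the fixed search span, and the toy quadratic objective (standing in for descriptor error rate as a function of parameters) are all assumptions, and the function names are hypothetical.

```python
import math

def line_min(f, x, d, span=2.0, iters=60):
    """Golden-section search for the step a in [-span, span] minimising f(x + a*d).

    Assumes f is unimodal along d within the bracket.
    """
    g = (math.sqrt(5) - 1) / 2
    lo, hi = -span, span
    phi = lambda a: f([xi + a * di for xi, di in zip(x, d)])
    a1, a2 = hi - g * (hi - lo), lo + g * (hi - lo)
    f1, f2 = phi(a1), phi(a2)
    for _ in range(iters):
        if f1 < f2:   # minimum lies in [lo, a2]
            hi, a2, f2 = a2, a1, f1
            a1 = hi - g * (hi - lo)
            f1 = phi(a1)
        else:         # minimum lies in [a1, hi]
            lo, a1, f1 = a1, a2, f2
            a2 = lo + g * (hi - lo)
            f2 = phi(a2)
    a = (lo + hi) / 2
    return [xi + a * di for xi, di in zip(x, d)]

def powell(f, x0, sweeps=20, tol=1e-8):
    """Direction-set minimisation: line-search every direction, then swap in
    the latest overall step for the oldest direction."""
    n = len(x0)
    dirs = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    x = list(x0)
    for _ in range(sweeps):
        start = list(x)
        for d in dirs:
            x = line_min(f, x, d)
        step = [xi - si for xi, si in zip(x, start)]
        norm = math.sqrt(sum(s * s for s in step))
        if norm < tol:
            break
        latest = [s / norm for s in step]
        dirs = dirs[1:] + [latest]   # throw away the oldest direction
        x = line_min(f, x, latest)
    return x

# Toy objective (assumption): a convex quadratic with its minimum at (1, -0.5).
def toy_error(p):
    u, v = p[0] - 1.0, p[1] + 0.5
    return u * u + v * v + 0.5 * u * v

best = powell(toy_error, [0.0, 0.0])
```

The method needs only function evaluations, which suits an objective such as an error rate that has no convenient analytic gradient.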

Results: Changing T-Blocks (k=4)
Polar lattice S2 always has lower error rate than rectangular S1
Gradient and DOG with S2 beat our SIFT reference (4% vs 6% error)

Results: Changing T-Blocks (k=8)

Results: Changing T-Blocks (k=16)

Steerable filters produce great results if phase information is kept

Results: Changing S-Blocks

Results

SIFT normalization is important
Best result: 4th order steerable filters with phase information, combined with polar S4-25 Gaussian summation block (2% error vs SIFT at 6%)
Very large numbers of dimensions

Dimension Reduction: PCA
[ Figure: projection direction wPCA ]

Dimension Reduction: LDA
[ Figure: projection direction wLDA ]
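The slides give no formula for wLDA, so the following is only a sketch of one pair-based discriminant formulation, assumed here for illustration: find the direction w that maximises the spread of non-matching pair differences relative to matching pair differences, i.e. the ratio (w^T A w) / (w^T B w). The leading direction is found by power iteration on B^-1 A. All names and the toy pairs are hypothetical.

```python
import math

def outer_sum(diffs):
    """Sum of outer products d d^T over a list of difference vectors."""
    n = len(diffs[0])
    return [[sum(d[i] * d[j] for d in diffs) for j in range(n)] for i in range(n)]

def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    A = [row[:] + [b[i]] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for j in range(c, n + 1):
                A[r][j] -= f * A[c][j]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (A[r][n] - sum(A[r][j] * x[j] for j in range(r + 1, n))) / A[r][r]
    return x

def pair_lda_direction(match_pairs, nonmatch_pairs, iters=200):
    """Leading w maximising (w^T A w) / (w^T B w), where A sums non-match
    difference outer products and B sums match difference outer products."""
    A = outer_sum([[a - b for a, b in zip(p, q)] for p, q in nonmatch_pairs])
    B = outer_sum([[a - b for a, b in zip(p, q)] for p, q in match_pairs])
    n = len(A)
    w = [1.0] * n
    for _ in range(iters):
        Aw = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
        w = solve(B, Aw)              # w <- B^-1 A w
        norm = math.sqrt(sum(e * e for e in w))
        w = [e / norm for e in w]
    return w

# Toy 2-D data (made up): coordinate 0 separates non-matches,
# coordinate 1 is mostly noise within matching pairs.
match_pairs = [((0.0, 0.0), (0.1, 1.0)), ((1.0, 0.0), (0.9, -1.0)),
               ((2.0, 1.0), (2.05, 0.0))]
nonmatch_pairs = [((0.0, 0.0), (1.0, 0.2)), ((0.0, 0.0), (2.0, -0.1)),
                  ((1.0, 0.0), (2.0, 0.1))]
w = pair_lda_direction(match_pairs, nonmatch_pairs)
```

On this toy data the returned direction lies almost entirely along coordinate 0. That is the behaviour that separates a discriminant projection from PCA, which would instead favour the high-variance noise coordinate.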

Results: LDA on patches
LDA on pixels compared with SIFT (6%)
PCA gave small improvement
[ Figure panels: normalised patches, gradient patches ]

Effect of # of Training Pairs
Need ~100,000 training examples

Results: LDA on T blocks
LDA on T1-T3 < 4.5%
Optimal #dimensions ~20-30
Post-normalisation important

[ Figure panels: T1, T3 ]
T1 = gradients binned in 4 orientation bins. T3 = steerable filter magnitude response.
LDA using T blocks T1-T4

Results: LDA on descriptors
LDA using CVPR 07 descriptors

Overall best results
#dimensions reduced from 100s to 10s
Need more challenging dataset!

Discussion: Image Descriptors

Algorithm: normalized image patch -> feature detector -> summation -> normalize -> descriptor vector
[ Diagram labels: T, S, N; complex, simple ]

Conclusions
Used learning to obtain good descriptors
Achieved error rates 1/3 of SIFT
Produced useful ground truth data set

Future Work
Use multi-view stereo ground truth
Multi-level simple-complex architecture + non-parametric T blocks
Learn interest point detectors

[ refs: 1) Winder, Brown, CVPR 2007; 2) Hua, Brown, Winder, ICCV 2007 ]
[ http://research.microsoft.com/ivm/hdview.htm ]