Local Features: Detection, Description & Matching


Lecture 08 Computer Vision

• Dr George Stockman

Professor Emeritus, Michigan State University

• Dr David Lowe

Professor, University of British Columbia

• Dr Richard Szeliski

Head of Interactive Visual Media Group at Microsoft Research

• Dr Steve Seitz

Professor, University of Washington

• Dr Kristen Grauman

Associate Professor, University of Texas at Austin

• Dr Allan Jepson

Professor, University of Toronto

Material Citations

Suggested Readings

• Chapter 4

Richard Szeliski , “Computer Vision: Algorithms and Applications”, Springer; 2011.

• D. Lowe, “Distinctive image features from scale invariant key points”, International Journal of Computer Vision, 2004.

Since M is symmetric, we have

$M = X \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} X^T$

The eigenvalues of M reveal the amount of intensity change in the two principal orthogonal gradient directions in the window.

Recall: Corners as distinctive interest points

$E(u, v) \approx \begin{bmatrix} u & v \end{bmatrix} M \begin{bmatrix} u \\ v \end{bmatrix}$

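“flat” region: $\lambda_1$ and $\lambda_2$ are small

“edge”: $\lambda_1 \gg \lambda_2$ or $\lambda_2 \gg \lambda_1$

“corner”: $\lambda_1$ and $\lambda_2$ are large, $\lambda_1 \sim \lambda_2$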

One way to score the cornerness:

Recall: Corners as distinctive interest points

$R = \det(M) - k \, \mathrm{trace}(M)^2 = \lambda_1 \lambda_2 - k (\lambda_1 + \lambda_2)^2$
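A minimal Python sketch of this cornerness score, assuming M is given by its three distinct entries (the window sums of Ix·Ix, Ix·Iy and Iy·Iy). The function names and the value k = 0.04 are illustrative assumptions, not taken from the lecture.

```python
import math

def harris_response(sxx, sxy, syy, k=0.04):
    """Score cornerness from the entries of M = [[sxx, sxy], [sxy, syy]].

    k = 0.04 is an assumed, commonly used constant; the slides do not fix it.
    """
    det = sxx * syy - sxy * sxy      # equals lambda1 * lambda2
    trace = sxx + syy                # equals lambda1 + lambda2
    return det - k * trace * trace

def eigenvalues(sxx, sxy, syy):
    """Closed-form eigenvalues of the symmetric 2x2 matrix M, largest first."""
    half_trace = 0.5 * (sxx + syy)
    root = math.sqrt((0.5 * (sxx - syy)) ** 2 + sxy * sxy)
    return half_trace + root, half_trace - root

flat = harris_response(0.01, 0.0, 0.01)       # both eigenvalues small: R near 0
edge = harris_response(100.0, 0.0, 0.1)       # one eigenvalue dominates: R < 0
corner = harris_response(100.0, 0.0, 100.0)   # both eigenvalues large: R > 0
```

Working from the determinant and trace avoids an explicit eigen-decomposition at every pixel, which is the usual reason for preferring this form of the score.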

• Rotation invariant? Yes

$M = X \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} X^T$

• Scale invariant?

Properties of the Harris corner detector

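• Rotation invariant? Yes

• Scale invariant? No

[Figure: a curved contour viewed at a fine scale, where all points will be classified as edges, and the same contour at a coarser scale, where it is detected as a corner]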

Properties of the Harris corner detector

by Diva Sian

by swashford

Image matching

by Diva Sian by scgbt

Harder case

NASA Mars Rover images

Harder still?

NASA Mars Rover images

with SIFT feature matches Figure by Noah Snavely

Look for tiny colored squares..

• At an interesting point, let’s define a coordinate system (x,y axis)

• Use the coordinate system to pull out a patch at that point

Image Matching


Local features: main components

• Detection: Identify the interest points

• Description: Extract feature vector descriptor surrounding each interest point

• Matching: Determine correspondence between descriptors in two views

$X_1 = [x_1^{(1)}, \ldots, x_d^{(1)}]$

$X_2 = [x_1^{(2)}, \ldots, x_d^{(2)}]$

• D. Lowe, “Distinctive image features from scale invariant key points”, International Journal of Computer Vision, 2004

• SIFT – Scale Invariant Feature Transform - Detector and descriptor

- SIFT is computationally expensive and was patented (the patent expired in 2020)

SIFT

• A SIFT keypoint is a circular image region with an orientation

• It is described by a geometric frame of four parameters:

- Keypoint center coordinates x and y

- Its scale (the radius of the region)

- Its orientation (an angle expressed in radians)

SIFT - Feature detector

Steps for Extracting Key Points

• Scale space peak selection - Potential locations for finding features

• Key point localization - Accurately locating the feature key points

• Orientation Assignment - Assigning orientation to the key points

• Key point descriptor - Describing the key point as a high dimensional vector

• Algorithm for finding points and representing their patches should produce similar results even when conditions vary

• Buzzword is “invariance”

– geometric invariance: translation, rotation, scale

– photometric invariance: brightness, exposure, …

Feature Descriptors

Invariant local features

• Say we have 2 images of this scene we’d like to align by matching local features

• What would be the good local features (ones easy to match)?

What makes a good feature?

Uniqueness

• Illumination

• Scale

• Rotation

• Affine

• Perspective

Types of invariance

• Independently select interest points in each image, such that the detections are repeatable across different scales

Scale invariant interest points

Intuition: Find scale that gives local maxima of some function f in both position and scale.

[Figure: response f plotted against region size for Image 1 and Image 2; the maxima occur at scales s1 and s2 respectively]

Automatic scale selection


– f is a local maximum in both position and scale

– Common definition of f: the Laplacian of Gaussian, or the difference between two Gaussian-filtered images with different sigmas

[Figure: signal f containing an edge; derivative of Gaussian $\frac{d}{dx} g$; response $f * \frac{d}{dx} g$]

Edge = maximum of derivative

Recall: Edge detection

[Figure: signal f containing an edge; second derivative of Gaussian (Laplacian) $\frac{d^2}{dx^2} g$; response $f * \frac{d^2}{dx^2} g$]

Edge = zero crossing of second derivative

Recall: Edge detection

• Edge: ripple

• Blob: superposition of two ripples

Spatial selection: the magnitude of the Laplacian response will achieve a maximum at the center of the blob, provided the scale of the Laplacian is “matched” to the scale of the blob


From edges to blobs

• Laplacian of Gaussian: Circularly symmetric operator for blob detection in 2D

$\nabla^2 g = \frac{\partial^2 g}{\partial x^2} + \frac{\partial^2 g}{\partial y^2}$

Blob detection in 2D

• Laplacian-of-Gaussian = “blob” detector

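$\nabla^2 g = \frac{\partial^2 g}{\partial x^2} + \frac{\partial^2 g}{\partial y^2}$

[Figure: Laplacian-of-Gaussian responses for img1, img2 and img3 as the filter scale varies]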

Blob detection in 2D: scale selection

• We define the characteristic scale as the scale that produces the peak of the Laplacian response

characteristic scale

Blob detection in 2D

Original image at ¾ the size

Original image

Scale space blob detector- Example


Original image down sampled to ¾ the size but later stretched to match with original size

$L_{xx}(\sigma) + L_{yy}(\sigma)$, evaluated at scales σ1, σ2, σ3, σ4, σ5

Output: list of (x, y, σ)

• Interest points are local maxima in both position and scale

Squared filter response maps


• Laplacian-of-Gaussian

Scale-space blob detector: Example

• Pyramids

– Smooth the image

– Subsample the smoothed image

– Repeat until image is small

• Each cycle of this process results in a smaller image with increased smoothing, but with decreased spatial sampling density (that is, decreased image resolution)

• Scale Space (DOG method)


Pyramids
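The smooth / subsample / repeat cycle can be sketched on a 1-D signal in Python. The binomial kernel [1, 4, 6, 4, 1] / 16, the border clamping and the function names are assumptions for illustration; an image pyramid applies the same idea along both axes.

```python
def smooth(signal):
    """Blur with the binomial kernel [1, 4, 6, 4, 1] / 16, clamping at the borders."""
    kernel = (1, 4, 6, 4, 1)
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - 2, 0), n - 1)   # clamp index to the valid range
            acc += w * signal[j]
        out.append(acc / 16.0)
    return out

def pyramid(signal, min_length=2):
    """Repeat smooth-then-subsample until the next level would be too small."""
    levels = [list(signal)]
    while len(levels[-1]) // 2 >= min_length:
        levels.append(smooth(levels[-1])[::2])  # keep every second sample
    return levels

levels = pyramid([float(v) for v in range(16)])
sizes = [len(level) for level in levels]
```

Each level holds half as many samples as the one before it, which is the decreased spatial sampling density described in the bullet above.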

• Pyramids

• Scale Space (DOG method)

– Pyramid but fill gaps with blurred images

– Take features from differences of these images

– If a feature is repeatedly present across the Difference of Gaussians images, it is scale invariant and we should keep it.


Building a Scale Space

• Scale-space representation L(x,y;t) at scales t = 0, 1, 4, 16, 64, 256 respectively, with t = 0 corresponding to the original image


• All scales must be examined to identify scale invariant features

• An efficient approach is to compute the Laplacian Pyramid (Difference of Gaussians)

Approximation of LoG by Difference of Gaussians

Heat Equation

Typical values: σ = 1.6; k = √2

Rate of change of Gaussian

Gaussian filter with σ

• We can approximate the Laplacian with a difference of Gaussians; more efficient to implement.

Laplacian: $L = \sigma^2 \left( G_{xx}(x, y, \sigma) + G_{yy}(x, y, \sigma) \right)$

Difference of Gaussians: $DoG = G(x, y, k\sigma) - G(x, y, \sigma)$
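The heat equation step that connects the two operators is not legible in the transcript. As a sketch following the argument in Lowe (2004), with $\nabla^2 G = G_{xx} + G_{yy}$ and a finite difference standing in for the rate of change of the Gaussian with respect to σ:

```latex
\frac{\partial G}{\partial \sigma} = \sigma \nabla^2 G
\quad\Rightarrow\quad
\sigma \nabla^2 G \approx \frac{G(x, y, k\sigma) - G(x, y, \sigma)}{k\sigma - \sigma}
\quad\Rightarrow\quad
G(x, y, k\sigma) - G(x, y, \sigma) \approx (k - 1)\, \sigma^2 \nabla^2 G
```

The factor (k − 1) is the same at every scale, so it does not change where the extrema are located.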

Scale invariant interest points - Technical detail


[Figure: Gaussian images at scales σ, kσ, k²σ, k³σ, k⁴σ for the first octave and the next octave; adjacent images are subtracted to give Difference of Gaussian images, in which peak detection finds local maxima/minima]


http://www.vlfeat.org/api/sift.html

Scale Space Peak Detection


• Compare a pixel (X) with 26 pixels in current and adjacent scales (Green Circles): 9 + 9 + 8 = 26

• Select a pixel (X) if larger/smaller than all 26 pixels

• Large number of extrema, computationally expensive

- Detect the most stable subset with a coarse sampling of scales
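The 26-neighbour comparison can be sketched in Python as follows. The function name and the data layout (a list of equally sized 2-D lists, one Difference of Gaussian image per scale) are assumptions made for this example.

```python
def is_scale_space_extremum(dog, s, y, x):
    """Return True if dog[s][y][x] is strictly larger or strictly smaller
    than all 26 neighbours in the 3x3x3 block centred on it.

    (s, y, x) must not lie on the border of the stack.
    """
    centre = dog[s][y][x]
    neighbours = [
        dog[s + ds][y + dy][x + dx]
        for ds in (-1, 0, 1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (ds, dy, dx) != (0, 0, 0)
    ]
    # 9 in the scale below + 9 in the scale above + 8 in the same scale
    is_max = all(centre > n for n in neighbours)
    is_min = all(centre < n for n in neighbours)
    return is_max or is_min

# Tiny 3-scale stack with a single peak in the middle.
stack = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
stack[1][1][1] = 1.0
peak_found = is_scale_space_extremum(stack, 1, 1, 1)
```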

Key Point Localization

• Candidates are chosen from extrema detection

Original Image Extrema locations

Initial Outlier Rejection

• Low contrast candidates

• Poorly localized candidates along an edge

• Taylor series expansion of DOG, D

• Minima or maxima is located at

• Value of D(x) at minima/maxima must be large, |D(x)|>th
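The equations for these bullets did not survive in the transcript. In Lowe (2004) they take the following form, where D and its derivatives are evaluated at the sample point and $\mathbf{x} = (x, y, \sigma)^T$ is the offset from that point:

```latex
D(\mathbf{x}) = D + \frac{\partial D}{\partial \mathbf{x}}^{T} \mathbf{x}
  + \frac{1}{2}\, \mathbf{x}^{T} \frac{\partial^2 D}{\partial \mathbf{x}^2}\, \mathbf{x},
\qquad
\hat{\mathbf{x}} = -\left( \frac{\partial^2 D}{\partial \mathbf{x}^2} \right)^{-1} \frac{\partial D}{\partial \mathbf{x}},
\qquad
D(\hat{\mathbf{x}}) = D + \frac{1}{2}\, \frac{\partial D}{\partial \mathbf{x}}^{T} \hat{\mathbf{x}}
```

The contrast test is then applied to $|D(\hat{\mathbf{x}})|$. Lowe's threshold of 0.03 assumes pixel values scaled to the range [0, 1].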

Initial Outlier Rejection

Reduced from 832 key points to 729 key points, th=0.03

Further Outlier Rejection

• DOG has strong response along edge

• Assume DOG as a surface

- Compute principal curvatures (PC)

- Along the edge PC is very low, across the edge is high


• Analogous to Harris corner detector

• Compute Hessian of D

• Remove outliers by evaluating


• Following quantity is minimum when r=1

• It increases with r

• Eliminate key points if


Reduced from 729 key points to 536 key points
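The formulas for the edge test above are missing from the transcript. As a sketch based on Lowe (2004): with H the 2x2 Hessian of D and r the ratio of its principal curvatures, Tr(H)² / Det(H) = (r + 1)² / r, and a key point is kept only if this quantity stays below the value for a chosen r. The function name and the default r = 10 (Lowe's value) are assumptions here, not taken from the slides.

```python
def passes_edge_test(dxx, dyy, dxy, r=10.0):
    """Keep a keypoint only if the ratio of principal curvatures is below r.

    dxx, dyy, dxy are second derivatives of the DoG image at the keypoint,
    i.e. the entries of the Hessian H = [[dxx, dxy], [dxy, dyy]].
    """
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:
        # Curvatures of opposite sign (or zero): not a stable extremum.
        return False
    return trace * trace / det < (r + 1.0) ** 2 / r

blob_like = passes_edge_test(1.0, 1.0, 0.0)     # equal curvatures: ratio 4
edge_like = passes_edge_test(100.0, 1.0, 0.0)   # one curvature dominates
```

Comparing trace and determinant avoids computing the eigenvalues of H explicitly, in the same spirit as the Harris score.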




Need both of the following:

1. Make sure your detector is invariant

– Harris is invariant to translation and rotation

– Scale is trickier

• common approach is to detect features at many scales using a Gaussian pyramid

• More sophisticated methods find “the best scale” to represent each feature (e.g., SIFT)

2. Design an invariant feature descriptor

– A descriptor captures the information in a region around the detected feature point

– The simplest descriptor: a square window of pixels

– Better approach is SIFT descriptor

How to achieve invariance

SIFT descriptor: a 3-D spatial histogram of the image gradients, characterizing the appearance of a keypoint.

SIFT - Feature descriptors

• Use histograms to bin pixels within sub-patches according to their orientation.

[Figure: orientation histogram over angles from 0 to 2π]


Making descriptor rotation invariant

• To achieve rotation invariance, compute central derivatives, gradient magnitude and direction of L (smooth image) at the scale of key point (x,y)

Orientation assignment

• Create a weighted direction histogram in a neighborhood of a key point (36 bins of 10 degrees each, 36 × 10 = 360 degrees)

• Weights are

- Gradient magnitudes

- Spatial Gaussian filter with σ = 1.5 × <scale of key point>

Gradient magnitude


• Select the highest peak as direction of the key point
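A simplified Python sketch of the orientation assignment described above. The function name, the square neighbourhood given by `radius`, and returning the bin centre are assumptions for illustration; angles are measured in image coordinates, with y pointing down.

```python
import math

def dominant_orientation(L, cx, cy, radius, sigma):
    """Return (angle in degrees of the strongest of 36 bins, histogram).

    L is the smoothed image as a 2-D list, (cx, cy) is the key point, and
    sigma is the width of the spatial Gaussian weight (1.5 x key point scale
    on the slide). The neighbourhood must stay one pixel inside the image.
    """
    hist = [0.0] * 36
    for y in range(cy - radius, cy + radius + 1):
        for x in range(cx - radius, cx + radius + 1):
            dx = L[y][x + 1] - L[y][x - 1]        # central differences
            dy = L[y + 1][x] - L[y - 1][x]
            magnitude = math.hypot(dx, dy)
            angle = math.degrees(math.atan2(dy, dx)) % 360.0
            weight = math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
            hist[int(angle // 10) % 36] += weight * magnitude
    best = max(range(36), key=lambda b: hist[b])
    return best * 10 + 5, hist                    # centre of the winning bin

# Diagonal intensity ramp: the gradient points along 45 degrees everywhere.
ramp = [[float(x + y) for x in range(9)] for y in range(9)]
angle, hist = dominant_orientation(ramp, 4, 4, 2, 1.5)
```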

Gradient orientation histograms give a robust representation

• Find dominant orientation of the image patch

– This is given by x+, the eigenvector of H corresponding to λ+

• λ+ is the larger eigenvalue

– Rotate the patch according to this angle



• Rotate patch according to its dominant gradient orientation

• This puts the patches into a canonical orientation


Gradient orientation is more stable than raw intensity value

• Extraordinarily robust matching technique

• Can handle changes in viewpoint

- Up to about 60 degree out of plane rotation

• Can handle significant changes in illumination

- Sometimes even day vs. night (below)

• Fast and efficient; can run in real time

http://people.csail.mit.edu/albert/ladypack/wiki/index.php?title=Known_implementations_of_SIFT

• Take 16x16 square window around detected feature

• Compute edge orientation (angle of the gradient) for each pixel

• Throw out weak edges (threshold gradient magnitude)

• Create histogram of surviving edge orientations

[Figure: angle histogram over 0 to 2π]


Keypoint descriptor Image gradients

• Divide the 16x16 window into a 4x4 grid of cells (2x2 case shown below)

• Compute an orientation histogram for each cell

• 16 cells * 8 orientations = 128 dimensional descriptor
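The 4x4 cells times 8 orientations layout can be sketched in Python as below. This is a simplified illustration, not the full SIFT descriptor: it assumes the gradient magnitudes and angles of the 16x16 window are already computed relative to the key point orientation, and it omits the Gaussian weighting and interpolation between bins. The function name is hypothetical.

```python
import math

def sift_like_descriptor(magnitude, angle):
    """Build a 4x4x8 = 128-D vector from 16x16 gradient magnitudes and angles.

    magnitude and angle are 16x16 lists; angles are in radians.
    """
    descriptor = [0.0] * 128
    for y in range(16):
        for x in range(16):
            cell = (y // 4) * 4 + (x // 4)               # which of the 16 cells
            a = angle[y][x] % (2.0 * math.pi)
            obin = int(a / (2.0 * math.pi) * 8) % 8      # which of the 8 orientations
            descriptor[cell * 8 + obin] += magnitude[y][x]
    # Normalise to unit length to reduce the effect of contrast changes.
    norm = math.sqrt(sum(v * v for v in descriptor))
    return [v / norm for v in descriptor] if norm > 0 else descriptor

mag = [[1.0] * 16 for _ in range(16)]
ang = [[0.0] * 16 for _ in range(16)]
desc = sift_like_descriptor(mag, ang)
```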

Keypoint descriptor Image gradients





• Given a feature in I1, how to find the best match in I2?

– Define distance function that compares two descriptors

Matching local features

Image 1 Image 2

?

• To generate candidate matches, find patches that have the most similar appearance (e.g., lowest SSD)

• Simplest approach: compare them all, take the closest (or closest k, or within a thresholded distance)

Image 1 Image 2


• How to define the difference between two features f1, f2?

– Simple approach is SSD(f1, f2)

• Sum of Square Differences between entries of the two descriptors

f1 f2


SSD can give good scores to very ambiguous (bad) matches!

Image 1 Image 2

• How to define the difference between two features f1, f2?

– Better approach: ratio = SSD(f1, f2) / SSD(f1, f2’)

• f2 is best SSD match to f1 in I2

• f2’ is 2nd best SSD match to f1 in I2

• Gives large values (close to 1) for ambiguous matches


[Figure: feature f1 in Image 1, with its best match f2 and second-best match f2' in Image 2]

• To add robustness to matching, can consider ratio : Euclidean distance to best match / Euclidean distance to second best match

- If low, first match looks good

- If high, could be ambiguous match
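A brute-force Python sketch of nearest-neighbour matching with the ratio test. The function names are illustrative, and the cutoff of 0.8 is an assumption taken from Lowe (2004), where it is applied to the ratio of Euclidean distances; descriptors are plain lists of numbers, and the second image needs at least two of them.

```python
import math

def ssd(f1, f2):
    """Sum of squared differences between two descriptors."""
    return sum((a - b) ** 2 for a, b in zip(f1, f2))

def match_with_ratio_test(desc1, desc2, max_ratio=0.8):
    """Return (i, j) pairs where desc2[j] is an unambiguous best match for desc1[i]."""
    matches = []
    for i, f1 in enumerate(desc1):
        ranked = sorted(range(len(desc2)), key=lambda j: ssd(f1, desc2[j]))
        best, second = ranked[0], ranked[1]
        d_best = math.sqrt(ssd(f1, desc2[best]))
        d_second = math.sqrt(ssd(f1, desc2[second]))
        # Low ratio: the best match is clearly better than the runner-up.
        if d_second > 0 and d_best / d_second < max_ratio:
            matches.append((i, best))
    return matches

image1 = [[0.0, 0.0], [5.0, 5.0]]
image2 = [[0.1, 0.0], [5.0, 5.1], [10.0, 10.0]]
good = match_with_ratio_test(image1, image2)

# Two equally close candidates: ratio is 1, so the match is rejected.
ambiguous = match_with_ratio_test([[1.0, 0.0]], [[0.0, 0.0], [2.0, 0.0]])
```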


Ambiguous matches

• Nearest neighbor (Euclidean distance)

• Threshold ratio of nearest to 2nd nearest descriptor

Matching SIFT Descriptors

• How can we measure the performance of a feature matcher?

[Figure: candidate matches with feature distances 50, 75 and 200]

Evaluating the results

• The distance threshold affects performance

– True positives = # of detected matches that are correct

• Suppose we want to maximize these—how to choose threshold?

– False positives = # of detected matches that are incorrect

• Suppose we want to minimize these—how to choose threshold?

[Figure: candidate matches with feature distances 50, 75 and 200, each labeled as a true match or a false match]

True/false positives

[Figure: ROC plot of true positive rate (vertical axis, 0 to 1) against false positive rate (horizontal axis, 0 to 1), with an example operating point at a false positive rate of 0.1 and a true positive rate of 0.7]

true positive rate = # true positives / # matching features (positives)

false positive rate = # false positives / # unmatched features (negatives)

ROC curve (“Receiver Operator Characteristic”)

• ROC Curves

- Generated by counting # correct/incorrect matches, for different thresholds

- Want to maximize area under the curve (AUC)

- Useful for comparing different feature matching methods

- For more info: http://en.wikipedia.org/wiki/Receiver_operating_characteristic
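A small Python sketch of how the points of an ROC curve could be generated by sweeping the distance threshold. The function name is illustrative; the distances 50, 75 and 200 echo the figure earlier in the lecture, while the true/false labels and the thresholds below are hypothetical values chosen for the example.

```python
def roc_points(distances, is_correct, thresholds):
    """Return (false positive rate, true positive rate) for each threshold.

    A candidate match counts as detected when its feature distance is at or
    below the threshold. Needs at least one correct and one incorrect candidate.
    """
    positives = sum(1 for ok in is_correct if ok)
    negatives = len(is_correct) - positives
    points = []
    for t in thresholds:
        tp = sum(1 for d, ok in zip(distances, is_correct) if d <= t and ok)
        fp = sum(1 for d, ok in zip(distances, is_correct) if d <= t and not ok)
        points.append((fp / negatives, tp / positives))
    return points

distances = [50, 75, 200]
is_correct = [True, True, False]       # hypothetical ground truth
curve = roc_points(distances, is_correct, thresholds=[60, 100, 250])
```

Raising the threshold accepts more matches, so both rates can only stay the same or grow; the curve shows how much false positive rate is paid for each gain in true positive rate.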


