Local Features: Detection, Description & Matching
Lecture 08 Computer Vision
• Dr George Stockman
Professor Emeritus, Michigan State University
• Dr David Lowe
Professor, University of British Columbia
• Dr Richard Szeliski
Head of Interactive Visual Media Group at Microsoft Research
• Dr Steve Seitz
Professor, University of Washington
• Dr Kristen Grauman
Associate Professor, University of Texas at Austin
• Dr Allan Jepson
Professor, University of Toronto
Material Citations
Suggested Readings
• Chapter 4
Richard Szeliski, “Computer Vision: Algorithms and Applications”, Springer, 2011.
• D. Lowe, “Distinctive image features from scale-invariant keypoints”, International Journal of Computer Vision, 2004.
Since M is symmetric, we have
$M = X \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} X^T$
The eigenvalues of M reveal the amount of intensity change in the two principal orthogonal gradient directions in the window.
Recall: Corners as distinctive interest points
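$E(u, v) \approx \begin{bmatrix} u & v \end{bmatrix} M \begin{bmatrix} u \\ v \end{bmatrix}$
“flat” region: $\lambda_1$ and $\lambda_2$ are small
“edge”: $\lambda_1 \gg \lambda_2$ or $\lambda_2 \gg \lambda_1$
“corner”: $\lambda_1$ and $\lambda_2$ are large, $\lambda_1 \sim \lambda_2$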
One way to score the cornerness:
Recall: Corners as distinctive interest points
$R = \lambda_1 \lambda_2 - k(\lambda_1 + \lambda_2)^2$
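The cornerness score can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `harris_response`, the box window, and the value k = 0.05 are assumptions chosen for the example. It relies on the identities det(M) = λ1·λ2 and trace(M) = λ1 + λ2, so no explicit eigen-decomposition is needed.

```python
import numpy as np

def harris_response(image, k=0.05, window=3):
    """Return R = det(M) - k * trace(M)^2 for every pixel (k is an assumed value)."""
    img = image.astype(float)
    # Central-difference gradients; np.gradient returns (d/drow, d/dcol).
    Iy, Ix = np.gradient(img)

    def box_sum(a):
        # Sum each (window x window) neighbourhood, using zero padding.
        pad = window // 2
        p = np.pad(a, pad)
        out = np.zeros_like(a)
        for dy in range(window):
            for dx in range(window):
                out += p[dy:dy + a.shape[0], dx:dx + a.shape[1]]
        return out

    # Entries of the second-moment matrix M, summed over the window.
    Sxx, Syy, Sxy = box_sum(Ix * Ix), box_sum(Iy * Iy), box_sum(Ix * Iy)
    det = Sxx * Syy - Sxy ** 2      # equals lambda1 * lambda2
    trace = Sxx + Syy               # equals lambda1 + lambda2
    return det - k * trace ** 2

# Synthetic image: a bright quadrant whose corner sits at row 8, column 8.
img = np.zeros((20, 20))
img[8:, 8:] = 1.0
R = harris_response(img)
print(R[8, 8] > 0, R[15, 8] < 0, R[2, 2] == 0)  # corner, edge, flat region
```

On this synthetic image the score is positive at the corner, negative along the straight edge, and zero in the flat region, which matches the three cases listed above.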
• Rotation invariant?
• Scale invariant?
$M = X \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} X^T$
Yes
Properties of the Harris corner detector
• Rotation invariant? Yes
• Scale invariant? No: when a corner is enlarged and the window size stays fixed, all points along it will be classified as edges
Properties of the Harris corner detector
by Diva Sian
by swashford
Image matching
by Diva Sian by scgbt
Harder case
NASA Mars Rover images
Harder still?
NASA Mars Rover images
with SIFT feature matches Figure by Noah Snavely
Look for tiny colored squares..
• At an interesting point, let’s define a coordinate system (x,y axis)
• Use the coordinate system to pull out a patch at that point
Image Matching
Local features: main components
• Detection: Identify the interest points
• Description: Extract feature vector descriptor surrounding each interest point
$X_1 = [x_1^{(1)}, \ldots, x_d^{(1)}]$
$X_2 = [x_1^{(2)}, \ldots, x_d^{(2)}]$
• Matching: Determine correspondence between descriptors in two views
• D. Lowe, “Distinctive image features from scale-invariant keypoints”, International Journal of Computer Vision, 2004
• SIFT – Scale Invariant Feature Transform - Detector and descriptor
- SIFT is computationally expensive and was patented rather than copyrighted (the patent expired in 2020)
SIFT
• A SIFT keypoint is a circular image region with an orientation
• It is described by a geometric frame of four parameters:
- Keypoint center coordinates x and y
- Its scale (the radius of the region)
- Its orientation (an angle expressed in radians)
SIFT - Feature detector
Steps for Extracting Key Points
• Scale space peak selection - Potential locations for finding features
• Key point localization - Accurately locating the feature key points
• Orientation Assignment - Assigning orientation to the key points
• Key point descriptor - Describing the key point as a high dimensional vector
• Algorithm for finding points and representing their patches should produce similar results even when conditions vary
• Buzzword is “invariance”
– geometric invariance: translation, rotation, scale
– photometric invariance: brightness, exposure, …
Feature Descriptors
Invariant local features
• Say we have 2 images of this scene we’d like to align by matching local features
• What would be good local features (ones that are easy to match)?
What makes a good feature?
Uniqueness
• Illumination
• Scale
• Rotation
• Affine
• Perspective
Types of invariance
• Independently select interest points in each image, such that the detections are repeatable across different scales
Scale invariant interest points
Intuition: Find scale that gives local maxima of some function f in both position and scale.
[Figure: f plotted against region size for Image 1 and Image 2; the maxima occur at region sizes s1 and s2]
Automatic scale selection
– f is a local maximum in both position and scale
– Common definition of f: the Laplacian of Gaussian, or the difference between two Gaussian-filtered images with different sigmas
Edge signal: $f$
Derivative of Gaussian: $\frac{d}{dx} g$
Filter response: $f * \frac{d}{dx} g$
Edge = maximum of derivative
Recall: Edge detection
Edge signal: $f$
Second derivative of Gaussian (Laplacian): $\frac{d^2}{dx^2} g$
Filter response: $f * \frac{d^2}{dx^2} g$
Edge = zero crossing of second derivative
Recall: Edge detection
• Edge: ripple
• Blob: superposition of two ripples
Spatial selection: the magnitude of the Laplacian response will achieve a maximum at the center of the blob, provided the scale of the Laplacian is “matched” to the scale of the blob
From edges to blobs
• Laplacian of Gaussian: Circularly symmetric operator for blob detection in 2D
$\nabla^2 g = \frac{\partial^2 g}{\partial x^2} + \frac{\partial^2 g}{\partial y^2}$
Blob detection in 2D
• Laplacian-of-Gaussian = “blob” detector
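$\nabla^2 g = \frac{\partial^2 g}{\partial x^2} + \frac{\partial^2 g}{\partial y^2}$
[Figure: filter responses for img1, img2, img3 at increasing filter scales]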
Blob detection in 2D: scale selection
• We define the characteristic scale as the scale that produces the peak of the Laplacian response
characteristic scale
Blob detection in 2D
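The characteristic scale can be found numerically by sweeping the filter scale and keeping the one with the strongest response. The sketch below is illustrative only: the helper name `normalized_log_at_center`, the synthetic disc, and the sampled range of sigmas are assumptions. It multiplies the Laplacian of Gaussian by σ² (scale normalization) so that responses at different scales are comparable, and it evaluates the response only at the blob center.

```python
import numpy as np

def normalized_log_at_center(image, sigma):
    """Scale-normalized LoG response evaluated at the image center only."""
    h, w = image.shape
    y, x = np.mgrid[:h, :w]
    r2 = (x - w // 2) ** 2 + (y - h // 2) ** 2
    gauss = np.exp(-r2 / (2 * sigma ** 2)) / (2 * np.pi * sigma ** 2)
    # sigma^2 times the Laplacian of the Gaussian, sampled on the pixel grid.
    log_kernel = (r2 - 2 * sigma ** 2) / sigma ** 2 * gauss
    return float(np.sum(log_kernel * image))

# Synthetic blob: a bright disc of radius 8 centered in a 65x65 image.
size, radius = 65, 8
yy, xx = np.mgrid[:size, :size]
blob = ((xx - size // 2) ** 2 + (yy - size // 2) ** 2 <= radius ** 2).astype(float)

# Sweep sigma and keep the scale with the largest response magnitude.
sigmas = np.arange(2.0, 12.0, 0.25)
responses = [abs(normalized_log_at_center(blob, s)) for s in sigmas]
characteristic_scale = sigmas[int(np.argmax(responses))]
print(characteristic_scale, radius / np.sqrt(2))
```

For an ideal disc of radius r the continuous response peaks at σ = r/√2, so the printed scale should land close to that value, up to the sampling step and pixel discretization.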
Original image at ¾ the size
Original image
Scale space blob detector- Example
Original image down sampled to ¾ the size but later stretched to match with original size
Scale space blob detector- Example
$L_{xx}(\sigma) + L_{yy}(\sigma)$ computed at scales $\sigma_1, \sigma_2, \sigma_3, \sigma_4, \sigma_5$ ⇒ list of $(x, y, \sigma)$
• Interest points are local maxima in both position and scale
Squared filter response maps
Scale invariant interest points
• Laplacian-of-Gaussian
Scale-space blob detector: Example
• Pyramids
– Smooth the image
– Subsample the smoothed image
– Repeat until image is small
• Each cycle of this process results in a smaller image with increased smoothing, but with decreased spatial sampling density (that is, decreased image resolution)
• Scale Space (DOG method)
Scale invariant interest points
Pyramids
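The smooth, subsample, repeat loop of a pyramid can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the helper names `gaussian_blur` and `gaussian_pyramid`, the smoothing sigma of 1.0, the factor-of-two subsampling, and the stopping size are all choices made for the example.

```python
import numpy as np

def gaussian_blur(image, sigma):
    """Separable Gaussian smoothing with edge replication at the borders."""
    radius = int(3 * sigma + 0.5)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x ** 2 / (2 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(image, radius, mode="edge")
    # Filter along rows first, then along columns.
    rows = np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="valid"), 0, rows)

def gaussian_pyramid(image, sigma=1.0, min_size=8):
    """Smooth, subsample by 2, and repeat until the image is small."""
    levels = [image.astype(float)]
    while min(levels[-1].shape) // 2 >= min_size:
        smoothed = gaussian_blur(levels[-1], sigma)
        levels.append(smoothed[::2, ::2])   # keep every second row and column
    return levels

rng = np.random.default_rng(0)
levels = gaussian_pyramid(rng.random((64, 64)))
print([level.shape for level in levels])
```

Each level has half the spatial sampling density of the previous one, which is why smoothing must come before subsampling.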
• Pyramids
• Scale Space (DOG method)
– Pyramid but fill gaps with blurred images
– Take features from differences of these images
– If a feature is repeatably present across the Difference of Gaussians images, it is scale invariant and we should keep it
Scale invariant interest points
Building a Scale Space
• Scale-space representation L(x,y;t) at scales t = 0, 1, 4, 16, 64, 256, where t = 0 corresponds to the original image
Building a Scale Space
• All scales must be examined to identify scale invariant features
• An efficient approach is to compute the Laplacian pyramid (Difference of Gaussians)
Approximation of LoG by Difference of Gaussians
Heat Equation
Typical values: σ = 1.6; k = √2
Rate of change of Gaussian
Gaussian filter with σ
• We can approximate the Laplacian with a difference of Gaussians; more efficient to implement.
Laplacian: $L = \sigma^2 \left( G_{xx}(x, y, \sigma) + G_{yy}(x, y, \sigma) \right)$
Difference of Gaussians: $DoG = G(x, y, k\sigma) - G(x, y, \sigma)$
Scale invariant interest points - Technical detail
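The link between the two operators follows from the heat equation, as argued in Lowe (2004). The short derivation below is a sketch using the same symbols as the definitions above; the finite-difference step is an approximation that is good when k is close to 1.

```latex
\begin{align*}
\frac{\partial G}{\partial \sigma} &= \sigma \nabla^2 G
  && \text{(heat equation)} \\
\frac{\partial G}{\partial \sigma} &\approx
  \frac{G(x, y, k\sigma) - G(x, y, \sigma)}{k\sigma - \sigma}
  && \text{(finite difference in } \sigma\text{)} \\
G(x, y, k\sigma) - G(x, y, \sigma) &\approx (k - 1)\,\sigma^2 \nabla^2 G
  = (k - 1)\,\sigma^2 \left( G_{xx} + G_{yy} \right)
\end{align*}
```

The factor (k − 1) is constant across scales, so it does not change where the extrema are located.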
Building a Scale Space
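[Figure: Gaussian images at scales σ, kσ, k²σ, k³σ, k⁴σ for the first octave and the next octave; adjacent Gaussian images are subtracted to give Difference of Gaussian images, in which local maxima/minima are found by peak detection]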
Building a Scale Space
http://www.vlfeat.org/api/sift.html
Scale Space Peak Detection
http://www.vlfeat.org/api/sift.html
• Compare a pixel (X) with its 26 neighbors in the current and adjacent scales (green circles): 9 + 9 + 8 = 26
• Select a pixel (X) if it is larger or smaller than all 26 neighbors
• Large number of extrema, computationally expensive
- Detect the most stable subset with a coarse sampling of scales
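The 26-neighbor comparison can be written compactly when the Difference of Gaussian images are stacked into one array. The sketch below assumes a hypothetical array `dog` indexed as `dog[scale, row, column]` and a helper name `is_extremum` chosen for illustration; it uses strict inequalities and expects the candidate to lie away from the array borders.

```python
import numpy as np

def is_extremum(dog, s, y, x):
    """True if dog[s, y, x] is strictly above or below all 26 neighbours."""
    cube = dog[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]   # 3x3x3 neighbourhood
    center = dog[s, y, x]
    # Flattened index 13 is the center of a 3x3x3 cube; drop it to get 26 values.
    neighbours = np.delete(cube.ravel(), 13)
    return bool(np.all(center > neighbours) or np.all(center < neighbours))

# Toy stack of three DoG images with a single peak in the middle scale.
dog = np.zeros((3, 5, 5))
dog[1, 2, 2] = 1.0
print(is_extremum(dog, 1, 2, 2), is_extremum(dog, 1, 2, 3))
```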
Key Point Localization
• Candidates are chosen from extrema detection
Original Image Extrema locations
Initial Outlier Rejection
• Low contrast candidates
• Poorly localized candidates along an edge
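• Taylor series expansion of the DOG function D around the candidate: $D(\mathbf{x}) = D + \frac{\partial D}{\partial \mathbf{x}}^T \mathbf{x} + \frac{1}{2} \mathbf{x}^T \frac{\partial^2 D}{\partial \mathbf{x}^2} \mathbf{x}$, with $\mathbf{x} = (x, y, \sigma)^T$
• The minimum or maximum is located at $\hat{\mathbf{x}} = -\left( \frac{\partial^2 D}{\partial \mathbf{x}^2} \right)^{-1} \frac{\partial D}{\partial \mathbf{x}}$
• The value of D at the minimum/maximum must be large: $|D(\hat{\mathbf{x}})| > th$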
Initial Outlier Rejection
Reduced from 832 key points to 729 key points, th=0.03
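The localization and low-contrast test can be sketched with finite differences on the 3x3x3 neighborhood of a candidate. This is an illustration under assumptions: the function name `refine_extremum` and the `cube[scale, row, column]` layout are hypothetical, and the refined value is computed as D + ½ gᵀx̂ following Lowe (2004). The threshold 0.03 is the one quoted above.

```python
import numpy as np

def refine_extremum(cube):
    """cube[s, y, x] is a 3x3x3 DoG neighbourhood centred on the candidate."""
    c = cube[1, 1, 1]
    # Gradient by central differences, ordered (x, y, s).
    g = 0.5 * np.array([cube[1, 1, 2] - cube[1, 1, 0],
                        cube[1, 2, 1] - cube[1, 0, 1],
                        cube[2, 1, 1] - cube[0, 1, 1]])
    # Second derivatives for the 3x3 Hessian.
    dxx = cube[1, 1, 2] - 2 * c + cube[1, 1, 0]
    dyy = cube[1, 2, 1] - 2 * c + cube[1, 0, 1]
    dss = cube[2, 1, 1] - 2 * c + cube[0, 1, 1]
    dxy = 0.25 * (cube[1, 2, 2] - cube[1, 2, 0] - cube[1, 0, 2] + cube[1, 0, 0])
    dxs = 0.25 * (cube[2, 1, 2] - cube[2, 1, 0] - cube[0, 1, 2] + cube[0, 1, 0])
    dys = 0.25 * (cube[2, 2, 1] - cube[2, 0, 1] - cube[0, 2, 1] + cube[0, 0, 1])
    H = np.array([[dxx, dxy, dxs], [dxy, dyy, dys], [dxs, dys, dss]])
    offset = -np.linalg.solve(H, g)        # sub-pixel offset (x, y, s)
    value = c + 0.5 * g.dot(offset)        # interpolated D at the extremum
    return offset, value

# Quadratic test surface with its maximum at x=0.2, y=-0.1, s=0.3.
s, y, x = np.mgrid[-1:2, -1:2, -1:2]
cube = 1.0 - (x - 0.2) ** 2 - (y + 0.1) ** 2 - (s - 0.3) ** 2
offset, value = refine_extremum(cube)
keep = abs(value) > 0.03                   # low-contrast rejection, th = 0.03
print(offset, value, keep)
```

Because the test surface is exactly quadratic, the finite differences recover the true offset and peak value; on real DoG data the result is only an approximation.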
Further Outlier Rejection
• DOG has strong response along edge
• Assume DOG as a surface
- Compute principal curvatures (PC)
- Along the edge the PC is very low; across the edge it is high
Further Outlier Rejection
• Analogous to Harris corner detector
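• Compute the Hessian of D: $H = \begin{bmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{bmatrix}$
• Remove outliers by evaluating $\frac{\mathrm{Tr}(H)^2}{\mathrm{Det}(H)}$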
Further Outlier Rejection
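• With r the ratio of the larger to the smaller eigenvalue of the Hessian, $\frac{\mathrm{Tr}(H)^2}{\mathrm{Det}(H)} = \frac{(r+1)^2}{r}$; this quantity is minimum when r=1
• It increases with r
• Eliminate key points if $\frac{\mathrm{Tr}(H)^2}{\mathrm{Det}(H)} > \frac{(r+1)^2}{r}$ for a chosen threshold r (Lowe, 2004 uses r = 10)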
Further Outlier Rejection
Reduced from 729 key points to 536 key points
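The edge test needs only the 2x2 spatial Hessian of one DoG image at the keypoint. The sketch below is illustrative: the function name `passes_edge_test` and the synthetic surfaces are assumptions, and the default r = 10 is the value used in Lowe (2004). Candidates with a non-positive determinant are rejected, since their principal curvatures have opposite signs.

```python
import numpy as np

def passes_edge_test(d, y, x, r=10.0):
    """Keep a keypoint only if Tr(H)^2 / Det(H) < (r + 1)^2 / r."""
    dxx = d[y, x + 1] - 2 * d[y, x] + d[y, x - 1]
    dyy = d[y + 1, x] - 2 * d[y, x] + d[y - 1, x]
    dxy = 0.25 * (d[y + 1, x + 1] - d[y + 1, x - 1]
                  - d[y - 1, x + 1] + d[y - 1, x - 1])
    trace = dxx + dyy
    det = dxx * dyy - dxy ** 2
    if det <= 0:
        return False                      # curvatures differ in sign
    return bool(trace ** 2 / det < (r + 1) ** 2 / r)

yy, xx = np.mgrid[-2:3, -2:3].astype(float)
blob_like = -(xx ** 2 + yy ** 2)          # equal curvature in both directions
edge_like = -(xx ** 2 + 0.01 * yy ** 2)   # almost flat along y
print(passes_edge_test(blob_like, 2, 2), passes_edge_test(edge_like, 2, 2))
```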
Local features: main components
• Detection: Identify the interest points
• Description: Extract feature vector descriptor surrounding each interest point
$X_1 = [x_1^{(1)}, \ldots, x_d^{(1)}]$
$X_2 = [x_1^{(2)}, \ldots, x_d^{(2)}]$
• Matching: Determine correspondence between descriptors in two views
Need both of the following:
1. Make sure your detector is invariant
– Harris is invariant to translation and rotation
– Scale is trickier
• common approach is to detect features at many scales using a Gaussian pyramid
• More sophisticated methods find “the best scale” to represent each feature (e.g., SIFT)
2. Design an invariant feature descriptor
– A descriptor captures the information in a region around the detected feature point
– The simplest descriptor: a square window of pixels
– Better approach is SIFT descriptor
How to achieve invariance
SIFT descriptor: a 3-D spatial histogram of the image gradients characterizing the appearance of a keypoint.
SIFT - Feature descriptors
• Use histograms to bin pixels within sub-patches according to their orientation.
[Figure: orientation histogram over angles from 0 to 2π]
SIFT - Feature descriptors
Making descriptor rotation invariant
• To achieve rotation invariance, compute central derivatives, gradient magnitude and direction of L (smooth image) at the scale of key point (x,y)
Orientation assignment
• Create a weighted direction histogram in a neighborhood of a key point, with 36 bins of 10 degrees each (36 x 10 = 360 degrees)
• Weights are:
- Gradient magnitudes
- Spatial Gaussian filter with σ = 1.5 x <scale of key point>
Gradient magnitude
Orientation assignment
• Select the highest peak as direction of the key point
Gradient orientation histograms give a robust representation
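The orientation assignment steps above can be sketched as a weighted histogram over a patch of the smoothed image. This is a simplified illustration: the function name `dominant_orientation` is hypothetical, the patch is assumed to be already cut out around the keypoint at its scale, and only the highest peak is returned, without the peak interpolation used in full implementations.

```python
import numpy as np

def dominant_orientation(patch, scale, num_bins=36):
    """Return the 36-bin histogram and the center angle of its highest bin."""
    L = patch.astype(float)
    dy, dx = np.gradient(L)                       # central differences
    magnitude = np.hypot(dx, dy)
    angle = np.degrees(np.arctan2(dy, dx)) % 360.0
    # Spatial Gaussian weight with sigma = 1.5 x keypoint scale.
    h, w = L.shape
    y, x = np.mgrid[:h, :w]
    sigma = 1.5 * scale
    weight = np.exp(-((x - (w - 1) / 2) ** 2 + (y - (h - 1) / 2) ** 2)
                    / (2 * sigma ** 2))
    bins = (angle // (360.0 / num_bins)).astype(int) % num_bins
    hist = np.bincount(bins.ravel(), weights=(magnitude * weight).ravel(),
                       minlength=num_bins)
    peak = int(np.argmax(hist))
    return hist, (peak + 0.5) * 360.0 / num_bins

# Intensity ramps: one rising along x, one rising along the diagonal.
ramp_x = np.tile(np.arange(16.0), (16, 1))
hist_x, direction_x = dominant_orientation(ramp_x, scale=2.0)
hist_d, direction_d = dominant_orientation(ramp_x + ramp_x.T, scale=2.0)
print(direction_x, direction_d)
```

Angles here follow array coordinates (rows increase downward), so the sign convention may differ from a plot with the y axis pointing up.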
• Find dominant orientation of the image patch
– This is given by $x_+$, the eigenvector of H corresponding to $\lambda_+$
• $\lambda_+$ is the larger eigenvalue
– Rotate the patch according to this angle
Orientation assignment
• Rotate patch according to its dominant gradient orientation
• This puts the patches into a canonical orientation
Orientation assignment
Gradient orientation is more stable than raw intensity values
• Extraordinarily robust matching technique
• Can handle changes in viewpoint
- Up to about 60 degree out of plane rotation
• Can handle significant changes in illumination
- Sometimes even day vs. night (below)
• Fast and efficient: can run in real time
http://people.csail.mit.edu/albert/ladypack/wiki/index.php?title=Known_implementations_of_SIFT
SIFT - Feature descriptors
• Take 16x16 square window around detected feature
• Compute edge orientation (angle of the gradient) for each pixel
• Throw out weak edges (threshold gradient magnitude)
• Create histogram of surviving edge orientations
[Figure: angle histogram over angles from 0 to 2π]
SIFT - Feature descriptors
Keypoint descriptor Image gradients
• Divide the 16x16 window into a 4x4 grid of cells (2x2 case shown below)
• Compute an orientation histogram for each cell
• 16 cells * 8 orientations = 128 dimensional descriptor
Keypoint descriptor Image gradients
SIFT - Feature descriptors
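The 4x4 grid of 8-bin histograms can be sketched directly from a 16x16 patch. This is a deliberately simplified version, with assumptions stated plainly: the function name `sift_like_descriptor` is hypothetical, the patch is assumed to be already rotated to its canonical orientation, and the Gaussian weighting, interpolation between bins, and value clipping of the full SIFT descriptor are left out.

```python
import numpy as np

def sift_like_descriptor(patch, cells=4, num_bins=8):
    """16 cells x 8 orientation bins = 128-dimensional, unit-length vector."""
    assert patch.shape == (16, 16)
    dy, dx = np.gradient(patch.astype(float))
    magnitude = np.hypot(dx, dy)
    angle = np.arctan2(dy, dx) % (2 * np.pi)
    bins = (angle / (2 * np.pi) * num_bins).astype(int) % num_bins
    step = patch.shape[0] // cells                 # 4 pixels per cell side
    histograms = []
    for cy in range(cells):
        for cx in range(cells):
            rows = slice(cy * step, (cy + 1) * step)
            cols = slice(cx * step, (cx + 1) * step)
            # Orientation histogram of this cell, weighted by gradient magnitude.
            histograms.append(np.bincount(bins[rows, cols].ravel(),
                                          weights=magnitude[rows, cols].ravel(),
                                          minlength=num_bins))
    descriptor = np.concatenate(histograms)
    norm = np.linalg.norm(descriptor)
    return descriptor / norm if norm > 0 else descriptor

rng = np.random.default_rng(0)
descriptor = sift_like_descriptor(rng.random((16, 16)))
print(descriptor.shape)
```

Normalizing to unit length gives some invariance to contrast changes, since scaling the patch intensities scales every gradient magnitude by the same factor.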
Local features: main components
• Detection: Identify the interest points
• Description: Extract feature vector descriptor surrounding each interest point
$X_1 = [x_1^{(1)}, \ldots, x_d^{(1)}]$
$X_2 = [x_1^{(2)}, \ldots, x_d^{(2)}]$
• Matching: Determine correspondence between descriptors in two views
• Given a feature in I1, how to find the best match in I2?
– Define distance function that compares two descriptors
Matching local features
• To generate candidate matches, find patches that have the most similar appearance (e.g., lowest SSD)
• Simplest approach: compare them all, take the closest (or closest k, or within a thresholded distance)
Image 1 Image 2
Matching local features
• How to define the difference between two features f1, f2?
– Simple approach is SSD(f1, f2)
• Sum of Square Differences between entries of the two descriptors
f1 f2
Matching local features
SSD can give good scores to very ambiguous (bad) matches !!!
Image 1 Image 2
• How to define the difference between two features f1, f2?
– Better approach: ratio = SSD(f1, f2) / SSD(f1, f2’)
• f2 is best SSD match to f1 in I2
• f2’ is 2nd best SSD match to f1 in I2
• Gives values close to 1 for ambiguous matches and small values for distinctive ones
Matching local features
f1 f2
Image 1 Image 2
f2'
• To add robustness to matching, can consider ratio : Euclidean distance to best match / Euclidean distance to second best match
- If low, first match looks good
- If high, could be ambiguous match
Ambiguous matches
• Nearest neighbor (Euclidean distance)
• Threshold ratio of nearest to 2nd nearest descriptor
Matching SIFT Descriptors
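Nearest-neighbor matching with the ratio threshold can be sketched by brute force. The function name `ratio_test_matches` and the toy descriptors are assumptions for illustration, and the threshold 0.8 is the value suggested in Lowe (2004) rather than something fixed by these slides. Brute-force search is quadratic in the number of features, so real systems usually replace it with an approximate nearest-neighbor index.

```python
import numpy as np

def ratio_test_matches(desc1, desc2, max_ratio=0.8):
    """Match each row of desc1 to desc2, keeping only unambiguous matches."""
    matches = []
    for i, d in enumerate(desc1):
        dist = np.linalg.norm(desc2 - d, axis=1)   # Euclidean distances
        order = np.argsort(dist)
        best, second = order[0], order[1]
        # Accept only if the best match is clearly closer than the runner-up.
        if dist[best] < max_ratio * dist[second]:
            matches.append((i, int(best)))
    return matches

desc1 = np.array([[0.0, 0.0], [10.0, 10.0]])
desc2 = np.array([[0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
matches = ratio_test_matches(desc1, desc2)
print(matches)
```

In the toy data the first descriptor has one clearly closest candidate and is kept, while the second has two nearly equidistant candidates and is rejected as ambiguous.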
• How can we measure the performance of a feature matcher?
[Figure: candidate matches labeled with feature distances 50, 75, and 200]
Evaluating the results
• The distance threshold affects performance
– True positives = # of detected matches that are correct
• Suppose we want to maximize these—how to choose threshold?
– False positives = # of detected matches that are incorrect
• Suppose we want to minimize these—how to choose threshold?
[Figure: candidate matches with feature distances 50, 75, and 200, each marked as a true match or a false match]
True/false positives
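• How can we measure the performance of a feature matcher?
[Figure: true positive rate plotted against false positive rate, both axes from 0 to 1, with an example point at false positive rate 0.1 and true positive rate 0.7]
true positive rate = # true positives / # matching features (positives)
false positive rate = # false positives / # unmatched features (negatives)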
Evaluating the results
• How can we measure the performance of a feature matcher?
[Figure: curve of true positive rate (# true positives / # matching features) against false positive rate (# false positives / # unmatched features), both axes from 0 to 1, passing through the point (0.1, 0.7)]
ROC curve (“Receiver Operating Characteristic”)
• ROC Curves
- Generated by counting the # of correct/incorrect matches for different thresholds
- Want to maximize area under the curve (AUC)
- Useful for comparing different feature matching methods
- For more info: http://en.wikipedia.org/wiki/Receiver_operating_characteristic
Evaluating the results
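An ROC curve can be traced by sorting candidate matches by feature distance and sweeping the threshold through them. The sketch below is illustrative: the function names `roc_curve` and `auc`, the ground-truth labels, and the fourth distance value are assumptions, and ties between distances are not treated specially.

```python
import numpy as np

def roc_curve(distances, is_correct):
    """Return (fpr, tpr) as the distance threshold sweeps from low to high."""
    distances = np.asarray(distances, dtype=float)
    labels = np.asarray(is_correct, dtype=bool)[np.argsort(distances)]
    # After accepting the i closest candidates: counts of true/false positives.
    tp = np.concatenate([[0], np.cumsum(labels)])
    fp = np.concatenate([[0], np.cumsum(~labels)])
    return fp / (~labels).sum(), tp / labels.sum()

def auc(fpr, tpr):
    """Area under the curve by the trapezoid rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))

# Toy data: the two closest candidates are correct, the two farthest are not.
distances = [50, 75, 200, 300]
is_correct = [True, True, False, False]
fpr, tpr = roc_curve(distances, is_correct)
area = auc(fpr, tpr)
print(area)
```

A matcher whose correct matches all have smaller distances than its incorrect ones reaches an area of 1, and a matcher that ranks them in the opposite order reaches 0.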
More on feature detection/description
Object recognition results – David Lowe