Post on 30-Dec-2015
Scale Invariant Feature Transform (SIFT)
Outline
What is SIFT
Algorithm overview
Object Detection
Summary
Overview
1999
Generates image features, “keypoints”
– invariant to image scaling and rotation
– partially invariant to change in illumination and 3D camera viewpoint
– many can be extracted from typical images
– highly distinctive
Algorithm overview
Scale-space extrema detection
– Uses difference-of-Gaussian function
Keypoint localization
– Sub-pixel location and scale fit to a model
Orientation assignment
– 1 or more for each keypoint
Keypoint descriptor
– Created from local image gradients
Scale space
Definition: $L(x, y, \sigma) = G(x, y, \sigma) * I(x, y)$
where $G(x, y, \sigma) = \frac{1}{2\pi\sigma^2} e^{-(x^2 + y^2)/2\sigma^2}$
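The definition above can be tried out on a small image. The following is a minimal sketch, not the presenter's implementation: the function names, the truncation of the kernel at three standard deviations, and the clamped border handling are all assumptions made for illustration.

```python
import math

def gaussian_2d(x, y, sigma):
    """Evaluate G(x, y, sigma) = exp(-(x^2 + y^2) / (2 sigma^2)) / (2 pi sigma^2)."""
    return math.exp(-(x * x + y * y) / (2.0 * sigma * sigma)) / (2.0 * math.pi * sigma * sigma)

def gaussian_kernel(sigma, radius=None):
    """Sample G on a (2r+1) x (2r+1) grid and renormalize so the weights sum to 1."""
    if radius is None:
        radius = max(1, int(math.ceil(3.0 * sigma)))  # assumed 3-sigma truncation
    kernel = [[gaussian_2d(x, y, sigma) for x in range(-radius, radius + 1)]
              for y in range(-radius, radius + 1)]
    total = sum(sum(row) for row in kernel)
    return [[w / total for w in row] for row in kernel]

def convolve(image, kernel):
    """Compute L = G * I for an image given as a list of rows.

    Borders are handled by clamping indices (an assumption). The kernel is
    symmetric, so convolution and correlation give the same result.
    """
    h, w = len(image), len(image[0])
    r = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    ii = min(max(i + di, 0), h - 1)
                    jj = min(max(j + dj, 0), w - 1)
                    acc += kernel[di + r][dj + r] * image[ii][jj]
            out[i][j] = acc
    return out
```

Calling `convolve(image, gaussian_kernel(1.6))` would give one level of the scale space. A constant image is left unchanged because the kernel weights sum to 1.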
Scale space
Keypoints are detected using scale-space extrema in the difference-of-Gaussian function D
D definition:
$D(x, y, \sigma) = (G(x, y, k\sigma) - G(x, y, \sigma)) * I(x, y) = L(x, y, k\sigma) - L(x, y, \sigma)$
Efficient to compute
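Because D is the difference of two smoothed images, it costs one subtraction per pixel once the smoothed images exist. A minimal sketch of that idea follows, assuming a separable Gaussian blur with a 3-sigma kernel and clamped borders; the helper names are hypothetical.

```python
import math

def kernel_1d(sigma):
    """Normalized 1D Gaussian weights, truncated at 3 sigma (an assumption)."""
    r = max(1, int(math.ceil(3.0 * sigma)))
    w = [math.exp(-(t * t) / (2.0 * sigma * sigma)) for t in range(-r, r + 1)]
    s = sum(w)
    return [v / s for v in w]

def blur_rows(image, kernel):
    """Filter every row with the 1D kernel, clamping indices at the borders."""
    r = len(kernel) // 2
    n = len(image[0])
    return [[sum(kernel[d + r] * row[min(max(j + d, 0), n - 1)]
                 for d in range(-r, r + 1))
             for j in range(n)]
            for row in image]

def transpose(image):
    return [list(col) for col in zip(*image)]

def blur(image, sigma):
    """L(x, y, sigma): separable Gaussian blur (rows first, then columns)."""
    k = kernel_1d(sigma)
    return transpose(blur_rows(transpose(blur_rows(image, k)), k))

def difference_of_gaussian(image, sigma, k):
    """D(x, y, sigma) = L(x, y, k * sigma) - L(x, y, sigma)."""
    coarse = blur(image, k * sigma)
    fine = blur(image, sigma)
    return [[c - f for c, f in zip(rc, rf)] for rc, rf in zip(coarse, fine)]
```

For a single bright pixel, the coarser blur lowers the peak more than the finer one, so D is negative at that pixel. A constant image gives D equal to zero everywhere.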
Relationship of D to $\sigma^2 \nabla^2 G$
Close approximation to scale-normalized Laplacian of Gaussian, $\sigma^2 \nabla^2 G$
Diffusion equation: $\frac{\partial G}{\partial \sigma} = \sigma \nabla^2 G$
Approximate ∂G/∂σ: $\frac{\partial G}{\partial \sigma} \approx \frac{G(x, y, k\sigma) - G(x, y, \sigma)}{k\sigma - \sigma}$
– giving, $G(x, y, k\sigma) - G(x, y, \sigma) \approx (k - 1)\sigma^2 \nabla^2 G$
When D has scales differing by a constant factor it already incorporates the σ² scale normalization required for scale-invariance
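The approximation can be checked numerically. The sketch below compares the difference of two Gaussians with $(k - 1)\sigma^2 \nabla^2 G$, using the closed-form Laplacian of a 2D Gaussian, $\nabla^2 G = \frac{x^2 + y^2 - 2\sigma^2}{\sigma^4} G$. That identity is a standard result brought in for the check rather than taken from the slides, and the function names are hypothetical.

```python
import math

def gaussian(x, y, sigma):
    """G(x, y, sigma) as defined for the scale space."""
    return math.exp(-(x * x + y * y) / (2.0 * sigma ** 2)) / (2.0 * math.pi * sigma ** 2)

def laplacian_of_gaussian(x, y, sigma):
    """Closed-form Laplacian of G: ((x^2 + y^2 - 2 sigma^2) / sigma^4) * G."""
    return (x * x + y * y - 2.0 * sigma ** 2) / sigma ** 4 * gaussian(x, y, sigma)

def dog(x, y, sigma, k):
    """Left-hand side: G(x, y, k sigma) - G(x, y, sigma)."""
    return gaussian(x, y, k * sigma) - gaussian(x, y, sigma)

def scaled_log(x, y, sigma, k):
    """Right-hand side: (k - 1) sigma^2 Laplacian(G)."""
    return (k - 1.0) * sigma ** 2 * laplacian_of_gaussian(x, y, sigma)

if __name__ == "__main__":
    for point in [(0.0, 0.0), (3.0, 0.0)]:
        print(point, dog(*point, 1.6, 1.01), scaled_log(*point, 1.6, 1.01))
```

The two sides agree more closely as k approaches 1. Near the zero crossing of the Laplacian (where $x^2 + y^2 = 2\sigma^2$) both are close to zero, so a relative comparison is not meaningful there.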
Scale space construction
[Figure: one octave of Gaussian images at scales σ, kσ, 2σ, 2kσ, 2k²σ, and the difference-of-Gaussian images between them at scales σ, kσ, 2σ, 2kσ]
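The scale labels in this figure (σ, kσ, 2σ, 2kσ, 2k²σ) are consistent with k² = 2, that is, two intervals per octave. A minimal sketch of the scale ladder follows. It assumes the convention from Lowe's paper that an octave with s intervals uses k = 2^(1/s) and s + 3 Gaussian images, which the slides do not state; the function names are hypothetical.

```python
def octave_sigmas(sigma0=1.6, intervals=2):
    """Scales of the Gaussian images in one octave.

    Assumes k = 2 ** (1 / intervals) and intervals + 3 images per octave.
    """
    k = 2.0 ** (1.0 / intervals)
    return [sigma0 * k ** i for i in range(intervals + 3)]

def dog_scale_pairs(sigmas):
    """Each difference-of-Gaussian image is built from two adjacent scales."""
    return list(zip(sigmas[:-1], sigmas[1:]))

if __name__ == "__main__":
    sigmas = octave_sigmas()
    print([round(s, 3) for s in sigmas])
    print(len(dog_scale_pairs(sigmas)), "difference-of-Gaussian images")
```

With two intervals this gives five Gaussian images and four difference-of-Gaussian images, matching the number of labels in the figure.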
Scale space images
[Figure: scale space images for the first, second, third, and fourth octaves]
Difference-of-Gaussian images
[Figure: difference-of-Gaussian images for the first, second, third, and fourth octaves]
Frequency of sampling
There is no minimum
Best frequency determined experimentally
Prior smoothing for each octave
Increasing σ increases robustness, but costs efficiency
σ = 1.6 a good tradeoff
Doubling the image initially increases number of keypoints
Finding extrema
Sample point is selected only if it is a minimum or a maximum of these points (its neighbors in the DoG scale space)
[Figure: DoG scale space; extrema in this image]
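The extremum test can be sketched as a comparison against the surrounding samples. The version below assumes the neighborhood from Lowe's paper: the 8 neighbors at the same scale plus the 9 at each adjacent scale, 26 in total. The strict inequality and the `dog[scale][row][column]` layout are also assumptions.

```python
def is_extremum(dog, k, i, j):
    """Return True if dog[k][i][j] is larger or smaller than all 26 neighbors.

    dog is indexed as dog[scale][row][column]. The caller must keep
    (k, i, j) at least one sample away from every border.
    """
    center = dog[k][i][j]
    neighbors = [dog[k + dk][i + di][j + dj]
                 for dk in (-1, 0, 1)
                 for di in (-1, 0, 1)
                 for dj in (-1, 0, 1)
                 if (dk, di, dj) != (0, 0, 0)]
    return center > max(neighbors) or center < min(neighbors)
```

A point on a flat plateau is rejected under the strict comparison, since it is neither larger nor smaller than its neighbors.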
Localization
3D quadratic function is fit to the local sample points
Start with Taylor expansion with sample point as the origin
$D(X) = D + \frac{\partial D^T}{\partial X} X + \frac{1}{2} X^T \frac{\partial^2 D}{\partial X^2} X$
– where $X = (x, y, \sigma)^T$
Take the derivative with respect to X, and set it to 0, giving
$0 = \frac{\partial D}{\partial X} + \frac{\partial^2 D}{\partial X^2} \hat{X}$
$\hat{X} = -\left(\frac{\partial^2 D}{\partial X^2}\right)^{-1} \frac{\partial D}{\partial X}$ is the location of the keypoint
This is a 3x3 linear system
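Localization
$\begin{bmatrix} \frac{\partial^2 D}{\partial x^2} & \frac{\partial^2 D}{\partial x \partial y} & \frac{\partial^2 D}{\partial x \partial \sigma} \\ \frac{\partial^2 D}{\partial x \partial y} & \frac{\partial^2 D}{\partial y^2} & \frac{\partial^2 D}{\partial y \partial \sigma} \\ \frac{\partial^2 D}{\partial x \partial \sigma} & \frac{\partial^2 D}{\partial y \partial \sigma} & \frac{\partial^2 D}{\partial \sigma^2} \end{bmatrix} \begin{bmatrix} x \\ y \\ \sigma \end{bmatrix} = -\begin{bmatrix} \frac{\partial D}{\partial x} \\ \frac{\partial D}{\partial y} \\ \frac{\partial D}{\partial \sigma} \end{bmatrix}$
Derivatives approximated by finite differences,
– example:
$\frac{\partial D}{\partial \sigma} = \frac{D_{k+1}^{i,j} - D_{k-1}^{i,j}}{2}$
$\frac{\partial^2 D}{\partial \sigma^2} = \frac{D_{k-1}^{i,j} - 2 D_{k}^{i,j} + D_{k+1}^{i,j}}{1}$
$\frac{\partial^2 D}{\partial \sigma \partial y} = \frac{(D_{k+1}^{i+1,j} - D_{k-1}^{i+1,j}) - (D_{k+1}^{i-1,j} - D_{k-1}^{i-1,j})}{4}$
If $\hat{X}$ is > 0.5 in any dimension, process repeated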
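The localization step can be sketched end to end: estimate the gradient and Hessian of D by central finite differences, then solve the 3x3 system for the offset. This is an illustration under stated assumptions, not the presenter's code: D is stored as `D[scale][row][column]` with rows along y and columns along x, the system is solved with Cramer's rule for brevity, and all function names are hypothetical.

```python
def gradient_and_hessian(D, k, i, j):
    """Central finite differences at D[k][i][j], ordered as (x, y, sigma)."""
    c = D[k][i][j]
    dx = (D[k][i][j + 1] - D[k][i][j - 1]) / 2.0
    dy = (D[k][i + 1][j] - D[k][i - 1][j]) / 2.0
    ds = (D[k + 1][i][j] - D[k - 1][i][j]) / 2.0
    dxx = D[k][i][j + 1] - 2.0 * c + D[k][i][j - 1]
    dyy = D[k][i + 1][j] - 2.0 * c + D[k][i - 1][j]
    dss = D[k + 1][i][j] - 2.0 * c + D[k - 1][i][j]
    dxy = (D[k][i + 1][j + 1] - D[k][i + 1][j - 1]
           - D[k][i - 1][j + 1] + D[k][i - 1][j - 1]) / 4.0
    dxs = (D[k + 1][i][j + 1] - D[k + 1][i][j - 1]
           - D[k - 1][i][j + 1] + D[k - 1][i][j - 1]) / 4.0
    dys = (D[k + 1][i + 1][j] - D[k + 1][i - 1][j]
           - D[k - 1][i + 1][j] + D[k - 1][i - 1][j]) / 4.0
    grad = [dx, dy, ds]
    hess = [[dxx, dxy, dxs],
            [dxy, dyy, dys],
            [dxs, dys, dss]]
    return grad, hess

def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(a, b):
    """Solve a 3x3 linear system a * x = b with Cramer's rule."""
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("Hessian is singular or nearly singular")
    solution = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = b[r]
        solution.append(det3(m) / d)
    return solution

def refine_offset(D, k, i, j):
    """Return the offset (x, y, sigma) that solves Hessian * offset = -gradient."""
    grad, hess = gradient_and_hessian(D, k, i, j)
    return solve3(hess, [-g for g in grad])
```

If any component of the returned offset is larger than 0.5 in magnitude, the extremum lies closer to a neighboring sample, and the fit would be repeated around that sample.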
Filtering
Contrast (use prev. equation):
$D(\hat{X}) = D + \frac{1}{2} \frac{\partial D^T}{\partial X} \hat{X}$
– If $|D(\hat{X})|$ < 0.03, throw it out
Edge-iness:
– Use ratio of principal curvatures to throw out poorly defined peaks
– Curvatures come from Hessian: $H = \begin{bmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{bmatrix}$
– Ratio of Trace(H)² and Determinant(H):
$Tr(H) = D_{xx} + D_{yy}$
$Det(H) = D_{xx} D_{yy} - (D_{xy})^2$
– If ratio > (r+1)²/r, throw it out (SIFT uses r=10)
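Both filters reduce to a few arithmetic tests. The sketch below assumes pixel values scaled to [0, 1] for the 0.03 threshold, and it also rejects points with a non-positive determinant. Both details follow Lowe's paper rather than the slides, and the function names are hypothetical.

```python
def contrast_at_extremum(d, grad, offset):
    """D(X_hat) = D + 0.5 * (gradient . X_hat)."""
    return d + 0.5 * sum(g * o for g, o in zip(grad, offset))

def passes_contrast(value, threshold=0.03):
    """Keep the keypoint only if |D(X_hat)| reaches the threshold."""
    return abs(value) >= threshold

def passes_edge_test(dxx, dyy, dxy, r=10.0):
    """Keep the keypoint only if Tr(H)^2 / Det(H) <= (r + 1)^2 / r."""
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0.0:
        # Curvatures of opposite sign (or a flat direction): rejected here
        # by assumption, not because the slides say so.
        return False
    return trace * trace / det <= (r + 1.0) ** 2 / r
```

For example, a blob-like point with `dxx = dyy = -1` gives a ratio of 4 and is kept, while an edge-like point with `dxx = -20, dyy = -1` gives a ratio of about 22 and is thrown out, since the limit for r = 10 is 12.1.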
Orientation assignment
Descriptor computed relative to keypoint’s orientation achieves rotation invariance
Gradient orientation precomputed along with magnitude for all levels (useful in descriptor computation)
$m(x, y) = \sqrt{(L(x+1, y) - L(x-1, y))^2 + (L(x, y+1) - L(x, y-1))^2}$
$\theta(x, y) = \operatorname{atan2}(L(x, y+1) - L(x, y-1),\ L(x+1, y) - L(x-1, y))$
Multiple orientations assigned to keypoints from an orientation histogram
– Significantly improve stability of matching
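The gradient magnitude and orientation come directly from pixel differences in the smoothed image L, and the histogram accumulates magnitudes by orientation. The sketch below is a simplification: the 36-bin default follows Lowe's paper rather than the slides, the Gaussian weighting window used there is omitted, and the function names are hypothetical.

```python
import math

def gradient(L, x, y):
    """Return (m, theta) at column x, row y of the smoothed image L."""
    dx = L[y][x + 1] - L[y][x - 1]
    dy = L[y + 1][x] - L[y - 1][x]
    return math.hypot(dx, dy), math.atan2(dy, dx)

def orientation_histogram(L, cx, cy, radius, num_bins=36):
    """Sum gradient magnitudes into orientation bins around (cx, cy).

    The window plus a one-pixel margin must lie inside the image.
    """
    hist = [0.0] * num_bins
    for y in range(cy - radius, cy + radius + 1):
        for x in range(cx - radius, cx + radius + 1):
            m, theta = gradient(L, x, y)
            fraction = (theta % (2.0 * math.pi)) / (2.0 * math.pi)
            hist[int(fraction * num_bins) % num_bins] += m
    return hist
```

The highest bin would give the dominant orientation. Other strong bins could produce additional orientations for the same keypoint, which is how one keypoint can receive more than one.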
Keypoint images
Descriptor
Descriptor has 3 dimensions (x, y, θ)
Orientation histogram of gradient magnitudes
Position and orientation of each gradient sample rotated relative to keypoint orientation
Descriptor
Weight magnitude of each sample point by Gaussian weighting function
Distribute each sample to adjacent bins by trilinear interpolation (avoids boundary effects)
Descriptor
Best results achieved with 4x4x8 = 128 descriptor size
Normalize to unit length
– Reduces effect of illumination change
Cap each element to 0.2, normalize again
– Reduces non-linear illumination changes
– 0.2 determined experimentally
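The normalization steps can be sketched in a few lines. The function name is hypothetical, and the descriptor is assumed to hold non-negative histogram values.

```python
import math

def normalize_descriptor(vec, cap=0.2):
    """Normalize to unit length, cap each element at `cap`, normalize again."""
    def unit(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v] if norm > 0.0 else list(v)

    capped = [min(x, cap) for x in unit(vec)]
    return unit(capped)
```

Multiplying every element of the input by a constant leaves the result unchanged, which is the sense in which the first normalization reduces the effect of illumination change. After the second normalization individual elements can again exceed 0.2; the cap only limits how much a few large gradient magnitudes dominate the vector.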
Object Detection
Create a database of keypoints from training images
Match keypoints to a database
– Nearest neighbor search
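Matching against the database can be sketched as a brute-force nearest neighbor search over descriptors. The Euclidean distance, the linear scan, and the `(label, descriptor)` database layout are assumptions made for illustration; a large database would likely need an indexed or approximate search instead.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two descriptors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_neighbor(query, database):
    """Return (label, distance) of the closest descriptor in the database.

    database is a list of (label, descriptor) pairs. An empty database
    gives (None, inf).
    """
    best_label, best_d2 = None, float("inf")
    for label, descriptor in database:
        d2 = squared_distance(query, descriptor)
        if d2 < best_d2:
            best_label, best_d2 = label, d2
    return best_label, best_d2 ** 0.5
```

In practice the query and the stored descriptors would each have 128 elements. The short vectors in a quick experiment such as `nearest_neighbor([3.0, 3.0], [("a", [0.0, 0.0]), ("b", [3.0, 4.0])])` are only for readability.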
PCA-SIFT
Different descriptor (same keypoints)
Apply PCA to the gradient patch
Descriptor size is 20 (instead of 128)
More robust, faster
Summary
Scale space
Difference-of-Gaussian
Localization
Filtering
Orientation assignment
Descriptor, 128 elements