Structure from Motion - University of California, San...
CSE152, Spring 2019 Intro Computer Vision
Structure from Motion
Introduction to Computer Vision CSE 152 Lecture 9
Announcements
• HW2 assigned
• Midterm moved to Monday 5/13, the next class after the HW2 due date.
Rectification
[Figure: original images and rectified images]
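(uL, vL) and (uR, vR) are pixel coordinates in the original left and right images; (xL, yL) and (xR, yR) are the corresponding coordinates in the rectified images.
$$
\begin{bmatrix} x_L \\ y_L \\ w_L \end{bmatrix} = H_L \begin{bmatrix} u_L \\ v_L \\ 1 \end{bmatrix},
\qquad
\begin{bmatrix} x_R \\ y_R \\ w_R \end{bmatrix} = H_R \begin{bmatrix} u_R \\ v_R \\ 1 \end{bmatrix}
$$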
Under perspective projection, the mapping from a plane to a plane is given by a linear transformation of homogeneous coordinates (called a projective transformation or homography).
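A minimal sketch of applying a homography to pixel coordinates, assuming NumPy. The function name `apply_homography` and the matrix `H_example` are hypothetical and chosen only for illustration; they are not the rectifying homographies of the lecture.

```python
import numpy as np

def apply_homography(H, uv):
    """Map an (N, 2) array of pixel coordinates through the 3x3 homography H."""
    uv = np.asarray(uv, dtype=float)
    ones = np.ones((uv.shape[0], 1))
    xyw = np.hstack([uv, ones]) @ H.T    # each row is (x, y, w)
    return xyw[:, :2] / xyw[:, 2:3]      # divide by w to get image coordinates

# Hypothetical homography: translation by (10, 5) plus a mild projective term.
H_example = np.array([[1.0, 0.0, 10.0],
                      [0.0, 1.0, 5.0],
                      [0.001, 0.0, 1.0]])
corners = np.array([[0.0, 0.0], [100.0, 0.0], [100.0, 50.0], [0.0, 50.0]])
warped = apply_homography(H_example, corners)
```

The division by w is what makes the map projective rather than affine: points with different u get different scale factors here.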
Two images, two homographies: HL, HR
[Figure: the two images with epipoles e1 and e2]
Question: Where are the epipoles in the rectified image?
How is warping done? Forward Method
• Input: source image I and rectification matrix H
• For each corner cs of the source image in homogeneous coordinates, compute ct = H cs
• Compute the smallest and largest x and y of the ct's, determine the bounding box on the target image, and create target image T with the size of the bounding box.
• For each pixel coordinate ps (homogeneous) in the source image, compute its location in the target image as pt = H ps. Copy I(ps) to T(pt).
[Figure: Source image with corner cs mapped to corner ct of the Target image]
Problem with Forward Method
• There's no guarantee that every pixel in the target image will be written to.
• If the target image is larger than the source, or the target is highly stretched, there may be missing points that appear as speckles or lines.
[Figure: Source and Target images]
How is warping done? Backward Method
• Input: source image I and rectification matrix H
• For each corner cs of the source image in homogeneous coordinates, compute ct = H cs
• Compute the smallest and largest x and y of the ct's, determine the bounding box on the target image, and create target image T with the size of the bounding box.
• For each pixel coordinate pt (homogeneous) in the target, compute its location in the source as ps = H^-1 pt.
– If ps is within the source image, copy I(ps) to T(pt)
[Figure: Source and Target images]
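The backward method can be sketched as follows, assuming NumPy, a grayscale image stored as a 2-D array (rows are y, columns are x), and nearest-neighbor lookup in the source. The name `backward_warp` and the returned bounding-box offset are illustrative choices, not part of the lecture.

```python
import numpy as np

def backward_warp(I, H):
    """Warp grayscale image I by homography H using the backward method."""
    h, w = I.shape
    # Map the four source corners (x, y, 1) to find the target bounding box.
    corners = np.array([[0, 0, 1], [w - 1, 0, 1],
                        [w - 1, h - 1, 1], [0, h - 1, 1]], dtype=float)
    ct = corners @ H.T
    ct = ct[:, :2] / ct[:, 2:3]
    x_min, y_min = np.floor(ct.min(axis=0)).astype(int)
    x_max, y_max = np.ceil(ct.max(axis=0)).astype(int)

    # Every target pixel pt, in homogeneous coordinates.
    xs, ys = np.meshgrid(np.arange(x_min, x_max + 1),
                         np.arange(y_min, y_max + 1))
    pt = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    ps = np.linalg.inv(H) @ pt                  # ps = H^-1 pt
    sx = np.rint(ps[0] / ps[2]).astype(int)     # nearest source column
    sy = np.rint(ps[1] / ps[2]).astype(int)     # nearest source row
    inside = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)

    out = np.zeros(xs.size, dtype=I.dtype)
    out[inside] = I[sy[inside], sx[inside]]     # copy I(ps) to T(pt)
    return out.reshape(xs.shape), (x_min, y_min)

# Hypothetical example: enlarge a 2x2 image by a factor of 2.
I = np.array([[1, 2], [3, 4]])
T, offset = backward_warp(I, np.diag([2.0, 2.0, 1.0]))
```

Because the loop runs over target pixels, every target pixel whose preimage falls inside the source receives a value, which avoids the holes of the forward method.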
Estimate 3D structure from images
How many views and how many points are needed to solve this?
Consider M images of N points. How many unknowns?
1. Affix the world coordinate system to the location of the first camera frame: (M-1)*6 unknowns for cameras
2. 3-D structure: 3*N unknowns for points
3. Can only recover structure and motion up to a scale factor (one fewer unknown)
Total number of unknowns: (M-1)*6+3*N-1
Total number of measurements: 2*M*N
A solution is possible when there are at least as many measurements as unknowns: (M-1)*6+3*N-1 ≤ 2*M*N
Some values of M and N satisfying this:
M = 2, N = 5
M = 3, N = 4
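The counting argument can be checked with a few lines of Python. The helper name `min_points` is hypothetical; it simply searches for the smallest N that satisfies the inequality for a given number of views M.

```python
def min_points(M):
    """Smallest N with (M-1)*6 + 3*N - 1 <= 2*M*N, for M >= 2 views."""
    N = 1
    while (M - 1) * 6 + 3 * N - 1 > 2 * M * N:
        N += 1
    return N

two_views = min_points(2)    # 5 + 3N <= 4N  requires N >= 5
three_views = min_points(3)  # 11 + 3N <= 6N requires N >= 4
```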
Two view structure from motion
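Epipolar Constraint: Calibrated Case
Essential Matrix (Longuet-Higgins, 1981)
The vectors $\vec{Op}$, $\vec{O'p'}$ and $\vec{OO'}$ are coplanar:
$$ {}^{1}p \cdot \left[ {}^{1}t_{2} \times \left( {}^{1}_{2}R \; {}^{2}p' \right) \right] = 0 $$
$$ {}^{1}p^{T} E \; {}^{2}p' = 0 \quad \text{with} \quad E = [{}^{1}t_{2}]_{\times} \; {}^{1}_{2}R $$
Here $[{}^{1}t_{2}]_{\times}$ is the skew-symmetric matrix of ${}^{1}t_{2}$, and {1}, {2} denote the two camera frames.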
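A small numerical check of the epipolar constraint, assuming NumPy. The pose (R, t) and the 3D point are hypothetical values; the convention assumed here is that a point with frame-2 coordinates X2 has frame-1 coordinates X1 = R X2 + t, matching E = [t]x R above.

```python
import numpy as np

def skew(t):
    """Skew-symmetric matrix [t]x such that skew(t) @ v equals np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Hypothetical relative pose: 30 degree rotation about the x axis plus a translation.
theta = np.deg2rad(30.0)
R = np.array([[1.0, 0.0, 0.0],
              [0.0, np.cos(theta), -np.sin(theta)],
              [0.0, np.sin(theta), np.cos(theta)]])
t = np.array([1.0, 0.3, 0.2])
E = skew(t) @ R

X2 = np.array([0.5, -0.4, 6.0])   # a 3D point in camera-2 coordinates
X1 = R @ X2 + t                   # the same point in camera-1 coordinates
p1 = X1 / X1[2]                   # normalized image point in view 1
p2 = X2 / X2[2]                   # normalized image point in view 2
residual = p1 @ E @ p2            # vanishes up to rounding error
```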
The Eight-Point Algorithm (Longuet-Higgins, 1981)
Input: 8 corresponding points in two images, in normalized coordinates ${}^{1}p_i = K_1^{-1} q_i$, ${}^{2}p_i' = K_2^{-1} q_i'$, with ${}^{1}p^{T} E \; {}^{2}p' = 0$ and $E = [{}^{1}t_{2}]_{\times} \; {}^{1}_{2}R$.
For one correspondence (u, v) and (u', v'):
$$
\begin{bmatrix} u & v & 1 \end{bmatrix}
\begin{bmatrix} E_{11} & E_{12} & E_{13} \\ E_{21} & E_{22} & E_{23} \\ E_{31} & E_{32} & E_{33} \end{bmatrix}
\begin{bmatrix} u' \\ v' \\ 1 \end{bmatrix} = 0
$$
which is linear in the entries of E:
$$
\begin{bmatrix} uu' & uv' & u & vu' & vv' & v & u' & v' & 1 \end{bmatrix}
\begin{bmatrix} E_{11} & E_{12} & E_{13} & E_{21} & E_{22} & E_{23} & E_{31} & E_{32} & E_{33} \end{bmatrix}^{T} = 0
$$
• Set E33 to 1
• Use 8 points (ui, vi), i = 1..8
$$
\begin{bmatrix}
u_1u'_1 & u_1v'_1 & u_1 & v_1u'_1 & v_1v'_1 & v_1 & u'_1 & v'_1 \\
u_2u'_2 & u_2v'_2 & u_2 & v_2u'_2 & v_2v'_2 & v_2 & u'_2 & v'_2 \\
\vdots & & & & & & & \vdots \\
u_8u'_8 & u_8v'_8 & u_8 & v_8u'_8 & v_8v'_8 & v_8 & u'_8 & v'_8
\end{bmatrix}
\begin{bmatrix} E_{11} \\ E_{12} \\ E_{13} \\ E_{21} \\ E_{22} \\ E_{23} \\ E_{31} \\ E_{32} \end{bmatrix}
= -\begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}
$$
• Solve for E11 to E32. These are elements of the Essential Matrix
• Then solve for R, t
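A minimal sketch of the E33 = 1 formulation, assuming NumPy and noise-free correspondences in normalized coordinates. The function name `eight_point_e33` and the synthetic pose and points are hypothetical. This normalization breaks down when the true E33 is zero or close to it, so the example uses a pose for which E33 is well away from zero.

```python
import numpy as np

def eight_point_e33(p, p_prime):
    """Estimate E from 8 correspondences using the E33 = 1 normalization.

    p, p_prime: (8, 2) arrays of normalized coordinates (u, v) and (u', v').
    """
    u, v = p[:, 0], p[:, 1]
    up, vp = p_prime[:, 0], p_prime[:, 1]
    A = np.column_stack([u * up, u * vp, u, v * up, v * vp, v, up, vp])
    e = np.linalg.solve(A, -np.ones(8))       # unknowns E11 .. E32
    return np.append(e, 1.0).reshape(3, 3)    # row-major layout, E33 = 1

# Synthetic test data (hypothetical pose, X1 = R X2 + t).
theta = np.deg2rad(30.0)
R = np.array([[1.0, 0.0, 0.0],
              [0.0, np.cos(theta), -np.sin(theta)],
              [0.0, np.sin(theta), np.cos(theta)]])
t = np.array([1.0, 0.3, 0.2])
t_cross = np.array([[0.0, -t[2], t[1]],
                    [t[2], 0.0, -t[0]],
                    [-t[1], t[0], 0.0]])
E_true = t_cross @ R

rng = np.random.default_rng(0)
X2 = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 8.0], size=(8, 3))
X1 = X2 @ R.T + t
p = X1[:, :2] / X1[:, 2:3]          # view-1 points (u, v)
p_prime = X2[:, :2] / X2[:, 2:3]    # view-2 points (u', v')
E_est = eight_point_e33(p, p_prime)
```

E is only defined up to scale, so `E_est` should be compared with `E_true / E_true[2, 2]` rather than with `E_true` itself.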
Sketch of Two View SFM Algorithm
Input: Two images
1. Detect feature points in each image
2. Using intrinsic parameters from calibration, compute pi,j = Kj^-1 qi,j
3. Find 8 matching feature points (easier said than done)
4. Compute the Essential Matrix E using the 8-point algorithm
5. Compute R and t (recall that E = [t]x R, where [t]x is a skew-symmetric matrix). t can only be recovered up to a scale factor
6. Perform stereo matching using the recovered epipolar geometry expressed via E
7. Reconstruct 3-D positions of corresponding points using R, t and pi,j
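The step that computes R and t from E is not spelled out on the slide. One common approach, shown here as a sketch and not necessarily the method used in the course, factors E with the SVD and yields four candidate (R, t) pairs. The name `decompose_essential` is hypothetical, and the sketch omits the usual final check that picks the candidate placing points in front of both cameras.

```python
import numpy as np

def decompose_essential(E):
    """Return four (R, t) candidates with E proportional to [t]x R; t has unit length."""
    U, _, Vt = np.linalg.svd(E)
    # Make U and Vt proper rotations; this can only flip the sign of E.
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0]])
    t = U[:, 2]                               # left null vector of E, sign unknown
    rotations = [U @ W @ Vt, U @ W.T @ Vt]
    return [(R, s * t) for R in rotations for s in (1.0, -1.0)]

# Hypothetical ground-truth pose used to build E.
theta = np.deg2rad(30.0)
R_true = np.array([[1.0, 0.0, 0.0],
                   [0.0, np.cos(theta), -np.sin(theta)],
                   [0.0, np.sin(theta), np.cos(theta)]])
t_true = np.array([1.0, 0.3, 0.2])
t_cross = np.array([[0.0, -t_true[2], t_true[1]],
                    [t_true[2], 0.0, -t_true[0]],
                    [-t_true[1], t_true[0], 0.0]])
candidates = decompose_essential(t_cross @ R_true)
```

The translation comes back with unit length, which reflects the fact that t can only be recovered up to a scale factor.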
N-view structure from motion
Feature detection
Several images observe a scene from different viewpoints
Detect features using, for example, SIFT [Lowe, IJCV 2004]
Feature matching
Match features between each pair of images
Structure from motion
[Figure: Camera 1, Camera 2 and Camera 3 with poses (R1, t1), (R2, t2), (R3, t3) observing 3D points X1 to X7; q1,1, q1,2, q1,3 label image observations]
minimize g(R, T, X) (non-linear least squares)
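Bundle adjustment
• Minimize sum of squared reprojection errors:
$$ g(R, T, X) = \sum_{i} \sum_{j} w_{i,j} \left\| P(x_i, R_j, t_j) - q_{i,j} \right\|^2 $$
where qi,j = (ui,j, vi,j) are the image coordinates and P(xi, Rj, tj) is the projection of 3D point xi for a camera located at tj with orientation Rj. P(xi, Rj, tj) is the predicted image location, qi,j is the observed image location, and wi,j is an indicator variable: whether point i is visible in image j.
– Optimized with non-linear least squares
– Levenberg-Marquardt is a popular choice
• Practical challenges?
– Initialization
– Outliers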
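A sketch of evaluating the bundle adjustment cost, assuming NumPy. The projection convention (camera coordinates are R (x - t), followed by division by depth, with normalized image coordinates) and the names `project` and `reprojection_cost` are assumptions made for this illustration. Visibility is represented by which (i, j) keys are present in the observation dictionary, which plays the role of the indicator variable.

```python
import numpy as np

def project(x, R, t):
    """Project 3D point x into a camera at position t with orientation R (assumed convention)."""
    x_cam = R @ (x - t)
    return x_cam[:2] / x_cam[2]

def reprojection_cost(points, cameras, observations):
    """Sum of squared reprojection errors over all visible (point, camera) pairs.

    points: (N, 3) array; cameras: list of (R, t); observations: dict mapping
    (i, j) to the observed 2-vector. A missing key means point i is not
    visible in image j.
    """
    total = 0.0
    for (i, j), q in observations.items():
        R, t = cameras[j]
        residual = project(points[i], R, t) - q   # predicted minus observed
        total += residual @ residual
    return total

# Hypothetical scene: two points, two cameras, one observation perturbed by 0.1.
points = np.array([[0.0, 0.0, 5.0], [1.0, -1.0, 4.0]])
cameras = [(np.eye(3), np.zeros(3)), (np.eye(3), np.array([1.0, 0.0, 0.0]))]
observations = {(i, j): project(points[i], *cameras[j])
                for i in range(2) for j in range(2)}
observations[(0, 1)] = observations[(0, 1)] + np.array([0.1, 0.0])
cost = reprojection_cost(points, cameras, observations)
```

A non-linear least-squares solver would adjust the points and cameras to reduce this cost; here only the cost evaluation is shown.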
Inliers vs. Outliers
• As you saw in HW1, not every match was correct when using SIFT descriptors.
• Squared-error metrics like the sum of squared reprojection errors heavily penalize mismatches because the errors get squared.
• Inliers: Given a model with some assumed distribution, inliers are data points that fit the model.
• Outliers are points that do not fit the model.
• Example: line fitting
[Figure: example points labeled Inliers and Outliers]
Line Fitting
Given n points (xi, yi), estimate the parameters of the line
a xi + b yi - d = 0
subject to the constraint that
a^2 + b^2 = 1
Note: |a xi + b yi - d| is the distance from (xi, yi) to the line.
Cost Function: Sum of squared distances between each point and the line
$$ E = \sum_{i=1}^{n} (a x_i + b y_i - d)^2 $$
Problem: minimize E with respect to (a, b, d).
1. Minimize E with respect to d:
$$ \frac{\partial E}{\partial d} = 0 \;\Rightarrow\; d = \frac{1}{n} \sum_{i=1}^{n} (a x_i + b y_i) = a\bar{x} + b\bar{y} $$
where $(\bar{x}, \bar{y})$ is the mean of the data points.
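Line fitting cont.
2. Substitute d back into E:
$$ E = \sum_{i=1}^{n} \left[ a(x_i - \bar{x}) + b(y_i - \bar{y}) \right]^2 = |Un|^2 $$
where n = (a b)^T and
$$ U = \begin{bmatrix} x_1 - \bar{x} & y_1 - \bar{y} \\ \vdots & \vdots \\ x_n - \bar{x} & y_n - \bar{y} \end{bmatrix} $$
3. Minimize E = |Un|^2 = n^T U^T U n = n^T S n with respect to a, b subject to the constraint n^T n = 1. Note that S is given by
$$ S = U^T U = \begin{bmatrix} \sum_i (x_i - \bar{x})^2 & \sum_i (x_i - \bar{x})(y_i - \bar{y}) \\ \sum_i (x_i - \bar{x})(y_i - \bar{y}) & \sum_i (y_i - \bar{y})^2 \end{bmatrix} $$
and it is real, symmetric, and positive semidefinite (positive definite unless the points are exactly collinear).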
Line Fitting: Finished
4. This is a constrained optimization problem in n. Solve with a Lagrange multiplier:
L(n) = n^T S n - λ(n^T n - 1)
Take the partial derivative (gradient) w.r.t. n and set it to 0:
∇L = 2Sn - 2λn = 0, or Sn = λn
n = (a, b) is an eigenvector of the symmetric matrix S. At such an n the cost is E = n^T S n = λ, so the minimum is the eigenvector corresponding to the smallest eigenvalue.
5. d is computed from Step 1.
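The five steps can be sketched in a few lines, assuming NumPy. The function name `fit_line` and the sample points are hypothetical; `np.linalg.eigh` is used because S is symmetric and it returns eigenvalues in ascending order.

```python
import numpy as np

def fit_line(points):
    """Least-squares line a*x + b*y - d = 0 with a^2 + b^2 = 1."""
    points = np.asarray(points, dtype=float)
    mean = points.mean(axis=0)             # (x_bar, y_bar)
    U = points - mean                      # rows (x_i - x_bar, y_i - y_bar)
    S = U.T @ U                            # 2x2 symmetric matrix
    eigvals, eigvecs = np.linalg.eigh(S)   # ascending eigenvalues
    a, b = eigvecs[:, 0]                   # eigenvector of the smallest eigenvalue
    d = a * mean[0] + b * mean[1]          # step 1: d = a*x_bar + b*y_bar
    return a, b, d

# Hypothetical data: five points lying exactly on y = 2x + 1.
xs = np.arange(5.0)
pts = np.column_stack([xs, 2.0 * xs + 1.0])
a, b, d = fit_line(pts)
residuals = pts @ np.array([a, b]) - d     # signed distances to the fitted line
```

Unlike ordinary regression of y on x, this minimizes perpendicular distances, so it also handles vertical lines.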
RANdom SAmple Consensus
RANSAC
Slides adapted from Frank Dellaert and Marc Pollefeys
Motivation
• Estimating models in the presence of outliers
– Lines
– Transformations for mosaicing
– Essential matrix
– And other models
• Typically: keypoints in two images
Simpler Example
• Fitting a straight line (model has line parameters)
[Figure: points marked as Inliers and Outliers]
RANSAC Idea applied to line fitting
Problem: Given s points and threshold τ, determine the best-fit line in the presence of outliers
Repeat N times
– Select two points at random
– Determine the line equation from the two points
– Count the number of points that are within distance τ from the line. This is called the “support” of the line, and it's the number of inliers
– Line with the greatest support wins
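The loop above can be sketched as follows, assuming NumPy. The function name `ransac_line`, the synthetic data, and the choice of 100 iterations are hypothetical. The line is returned in the form a x + b y - d = 0 with a^2 + b^2 = 1.

```python
import numpy as np

def ransac_line(points, tau, n_iters, rng):
    """Return ((a, b, d), inlier_mask) for the sampled line with the greatest support."""
    best_line, best_mask = None, None
    for _ in range(n_iters):
        i, j = rng.choice(len(points), size=2, replace=False)
        (x1, y1), (x2, y2) = points[i], points[j]
        normal = np.array([y2 - y1, x1 - x2])    # perpendicular to the segment
        norm = np.linalg.norm(normal)
        if norm == 0.0:
            continue                              # coincident points define no line
        a, b = normal / norm
        d = a * x1 + b * y1
        mask = np.abs(points @ np.array([a, b]) - d) <= tau
        if best_mask is None or mask.sum() > best_mask.sum():
            best_line, best_mask = (a, b, d), mask
    return best_line, best_mask

# Hypothetical data: 20 inliers on y = 0.5x + 1 and 5 gross outliers.
xs = np.arange(20.0)
inliers = np.column_stack([xs, 0.5 * xs + 1.0])
outliers = np.array([[1.0, 10.0], [3.0, -7.0], [5.0, 12.0], [7.0, -3.0], [9.0, 15.0]])
points = np.vstack([inliers, outliers])
line, mask = ransac_line(points, tau=0.1, n_iters=100, rng=np.random.default_rng(0))
```

In practice the winning line would then be refit to all of its inliers, for example with a least-squares fit.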
Why will this work?
[Figure: Iter 1, line through two sampled points with threshold band τ; # of inliers: 2, # of outliers: 5]
[Figure: Iter 2, threshold band τ; # of inliers: 6, # of outliers: 1]
RANSAC More Generally
• What do we need to apply RANSAC?
1. A parameterized model
2. A way to estimate the model parameters from s data points {x1, …, xs}
3. Given the parameters of the model, a way to estimate the distance from a data point xi to the model
RANSAC More Generally
Objective
Robust fit of model to data set S which contains outliers
Algorithm
REPEAT
(i) Randomly select a sample of s data points from S
(ii) Instantiate the model from this sample.
(iii) Determine the set of data points Si which are within a distance threshold t of the model. The set Si is the consensus set of samples and defines the inliers of S.
(iv) Slargest = Si if Si is larger than Slargest
UNTIL (The size of Si is greater than some threshold T) OR (There have been N trials)
The model is re-estimated using all the points in Slargest
How many samples? (What is N?)
Choose N (number of samples) so that, with probability p, at least one random sample is free from outliers, e.g. p = 0.99
e: proportion of outliers
s: number of points needed for the model
$$ \left( 1 - (1 - e)^{s} \right)^{N} = 1 - p $$
$$ N = \frac{\log(1 - p)}{\log\left( 1 - (1 - e)^{s} \right)} $$
Values of N for p = 0.99:
        proportion of outliers e
s    5%   10%   20%   25%   30%   40%   50%
2     2     3     5     6     7    11    17
3     3     4     7     9    11    19    35
4     3     5     9    13    17    34    72
5     4     6    12    17    26    57   146
6     4     7    16    24    37    97   293
7     4     8    20    33    54   163   588
8     5     9    26    44    78   272  1177
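The table entries follow from the formula for N rounded up to the next integer. A short sketch using only the Python standard library; the function name `ransac_trials` is hypothetical.

```python
import math

def ransac_trials(p, e, s):
    """Samples N so that, with probability p, at least one sample of s points is outlier-free."""
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - (1.0 - e) ** s))

n_line_half_outliers = ransac_trials(0.99, 0.5, 2)       # s = 2, e = 50%
n_essential_half_outliers = ransac_trials(0.99, 0.5, 8)  # s = 8, e = 50%
```

The required number of samples grows quickly with s, which is why models that need fewer points per sample are cheaper to fit with RANSAC.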
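Acceptable consensus set?
• Typically, terminate when the inlier ratio reaches the expected ratio of inliers
$$ T = (1 - e) N $$
• Where
– N: number of samples (here the data points in the set, not the number of RANSAC trials)
– e: proportion of outliers
– T: size of the consensus set at which to stop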
Using RANSAC to estimate the Essential Matrix
• What is the model? Essential Matrix (8 parameters)
• How many “points” are needed, and where do they come from? 8 points in each image or 8 matched pairs (usually use this)
• What distance do we use to compute the consensus set? L2 distance of points to epipolar line
• How often do outliers occur? Usually not known.
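A sketch of the point-to-epipolar-line distance used to build the consensus set, assuming NumPy, normalized homogeneous points with last entry 1, and the convention p^T E p' = 0. The function name `epipolar_distance` and the pose are hypothetical.

```python
import numpy as np

def epipolar_distance(E, p, p_prime):
    """Distance from p (view 1) to the epipolar line E @ p_prime of its match in view 2."""
    line = E @ p_prime                            # line (l1, l2, l3) in view 1
    return abs(p @ line) / np.hypot(line[0], line[1])

# Hypothetical pose with X1 = R X2 + t, and one 3D point seen in both views.
theta = np.deg2rad(30.0)
R = np.array([[1.0, 0.0, 0.0],
              [0.0, np.cos(theta), -np.sin(theta)],
              [0.0, np.sin(theta), np.cos(theta)]])
t = np.array([1.0, 0.3, 0.2])
t_cross = np.array([[0.0, -t[2], t[1]],
                    [t[2], 0.0, -t[0]],
                    [-t[1], t[0], 0.0]])
E = t_cross @ R

X2 = np.array([0.5, -0.4, 6.0])
X1 = R @ X2 + t
p, p_prime = X1 / X1[2], X2 / X2[2]
d_correct = epipolar_distance(E, p, p_prime)                       # a true match
d_wrong = epipolar_distance(E, p + np.array([0.05, 0.0, 0.0]), p_prime)  # a shifted point
```

A match would count toward the consensus set when this distance is below the chosen threshold; a symmetric version would also measure the distance in the second image.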
Feature points extracted by a corner detector
Matched points by RANSAC
Putative matches of the feature points in both images are computed by using a correlation measure for points in one image with features in the other image. Only features within a small window are considered to limit computation time. Mutually best matches are retained. RANSAC is used to robustly determine F from these putative matches.
Epipolar Geometry from Matched Points