Structure from Motion - University of California, San...

19
1 CSE152, Spring 2019 Intro Computer Vision Structure from Motion Introduction to Computer Vision CSE 152 Lecture 9 CSE152, Spring 2019 Intro Computer Vision Announcements HW2 assigned Midterm moved to Monday 5/13 next class after HW2 due date.

Transcript of Structure from Motion - University of California, San...

Page 1: Structure from Motion - University of California, San Diegocseweb.ucsd.edu/classes/sp19/cse152-a/lec9.pdf · 2019-05-01 · Structure from Motion Introduction to Computer Vision CSE

1

CSE152, Spring 2019 Intro Computer Vision

Structure from Motion

Introduction to Computer Vision CSE 152 Lecture 9

CSE152, Spring 2019 Intro Computer Vision

Announcements •  HW2 assigned •  Midterm moved to Monday 5/13 – next class after

HW2 due date.

Page 2: Structure from Motion - University of California, San Diegocseweb.ucsd.edu/classes/sp19/cse152-a/lec9.pdf · 2019-05-01 · Structure from Motion Introduction to Computer Vision CSE

2

CSE152, Spring 2019

Rectification

Original images

Rectified images

CSE152, Spring 2019

Rectification

(uL,vL)

(xL, yL) (uR,vR)

(xR, yR) €

xRyRwR

"

#

$ $ $

%

&

' ' '

= HR

uRvR1

"

#

$ $ $

%

&

' ' '

xLyLwL

"

#

$ $ $

%

&

' ' '

= HL

uLvL1

"

#

$ $ $

%

&

' ' '

Under perspective projection, the mapping from a plane to a plane is given by a linear transformation of homogeneous coordinates (called a projective transformation or homography).

.

Two images Two homographies

HL, HR

e1

e2

Question: Where are the epipoles in the rectified image?

Page 3: Structure from Motion - University of California, San Diegocseweb.ucsd.edu/classes/sp19/cse152-a/lec9.pdf · 2019-05-01 · Structure from Motion Introduction to Computer Vision CSE

3

CSE152, Spring 2019

How is warping done? Forward Method •  Input: Source image: I and

Rectification matrix H •  For each corner cs of Source image in

homogenous coordinates, compute ct=Hcs

•  Compute smallest and largest x and y of ct’s, determine bounding box on target image, create target image T with size of bounding box.

•  For each pixel coordinate ps (homogenous) in the Source image, compute location in the Target image as pt=Hps. Copy I(ps) to T(pt)

Source

Target

cs

ct

CSE152, Spring 2019

Problem with Forward Method •  There’s no guarantee that every pixel

in Target Image will be written to.

•  If Target Image is larger than Source or Target is highly stretched, there may be missing points that appear as speckles or lines. Source

Target

Page 4: Structure from Motion - University of California, San Diegocseweb.ucsd.edu/classes/sp19/cse152-a/lec9.pdf · 2019-05-01 · Structure from Motion Introduction to Computer Vision CSE

4

CSE152, Spring 2019

How is warping done? Backward method •  Input: Source image: I and

rectification matrix H •  For each corner cs of Source image in

homogenous coordinates, compute ct=Hcs

•  Compute smallest and largest x and y of ct’s, determine bounding box on target image, create target image T with size of bounding box.

•  For each pixel coordinate pt (homogenous) in the Target, compute location in the Source as ps=H-1pt. –  If ps is within source image, copy I(ps) to

T(pt)

Source

Target

CSE152, Spring 2019

Estimate3Dstructurefromimages

Page 5: Structure from Motion - University of California, San Diegocseweb.ucsd.edu/classes/sp19/cse152-a/lec9.pdf · 2019-05-01 · Structure from Motion Introduction to Computer Vision CSE

5

CSE152, Spring 2019

How many views and how many points are needed to solve this?

Consider M images of N points, how many unknowns 1. Affix world coordinate system to location of first

camera frame: (M-1)*6 unknowns for cameras 2. 3-D Structure: 3*N unknowns for points 3. Can only recover structure and motion up to scale

factor (one fewer unknown

Total number of unknowns: (M-1)*6+3*N-1 Total number of measurements: 2*M*N Solution is possible when more measurements than

unknowns: (M-1)*6+3*N-1 ≤ 2*M*N Some values of N and M satisfying this:

M = 2, M = 3,

N = 5 N = 4

CSE152, Spring 2019

Two view structure from motion

Page 6: Structure from Motion - University of California, San Diegocseweb.ucsd.edu/classes/sp19/cse152-a/lec9.pdf · 2019-05-01 · Structure from Motion Introduction to Computer Vision CSE

6

CSE152, Spring 2019

Epipolar Constraint: Calibrated Case

Essential Matrix (Longuet-Higgins, 1981)

1p ⋅ 1t2 × ( 21R 2p ')⎡

⎣⎤⎦= 0

The vectors , Op! "!

and are coplanar O ' p! "!!!

'OO '! "!!!

1pTE 2p ' = 0 with E = [(1t2 )×]21R

skew

{1} {2}

CSE152, Spring 2019

The Eight-Point Algorithm (Longuet-Higgins, 1981)

uu ' uv ' u vu ' vv ' v u ' v ' 1⎡⎣⎢

⎤⎦⎥

E11E12E13E21E22E23E31E32E33

⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢

⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥

= 0

•  Set E33 to 1 •  Use 8 points (ui,vi), i=1..8

u,v,1[ ]E11 E12 E13E21 E22 E23

E31 E32 E33

"

#

$ $ $

%

&

' ' '

u'v '1

"

#

$ $ $

%

&

' ' '

= 0

u1u'1 u1v'1 u1 v1u'1 v1v'1 v1 u'1 v'1u2u'2 u2v'2 u2 v2u'2 v2v'2 v2 u'2 v'2u3u'3 u3v'3 u3 v3u'3 v3v'3 v3 u'3 v'3u4u'4 u4v'4 u4 v4u'4 v4v'4 v4 u'4 v'4u5u'5 u5v'5 u5 v5u'5 v5v'5 v5 u'5 v'5u6u'6 u6v'6 u6 v6u'6 v6v'6 v6 u'6 v'6u7u'7 u7v'7 u7 v7u'7 v7v'7 v7 u'7 v'7u8u'8 u8v'8 u8 v8u'8 v8v'8 v8 u'8 v'8

"

#

$ $ $ $ $ $ $ $ $ $

%

&

' ' ' ' ' ' ' ' ' '

E11E12E13E21

E22

E23

E31

E32

"

#

$ $ $ $ $ $ $ $ $ $

%

&

' ' ' ' ' ' ' ' ' '

= −

11111111

"

#

$ $ $ $ $ $ $ $ $ $

%

&

' ' ' ' ' ' ' ' ' '

•  Solve for E11 to E32 These are elements of the Essential Matrix

•  Then solve for R, t

Input: 8 corresponding points in two images 1pi=K1-1qi, 2pi’=K2

-1qi’ 1pTE 2p ' = 0 with E = [(1t2 )×]2

1R

E = [(1t2 )×]21R

Page 7: Structure from Motion - University of California, San Diegocseweb.ucsd.edu/classes/sp19/cse152-a/lec9.pdf · 2019-05-01 · Structure from Motion Introduction to Computer Vision CSE

7

CSE152, Spring 2019

Sketch of Two View SFM Algorithm Input: Two images 1.  Detect feature points in each image 2.  Using intrinsic parameters from calibration, compute

pi,j = Kj-1 qi,j

1.  Find 8 matching feature points (easier said than done) 2.  Compute the Essential Matrix E using 8-point Algorithm 3.  Compute R and t (recall that E=[tx]R where [tx] is a skew

symmetric matrix). t can only be recovered up to a scale factor

4.  Perform stereo matching using recovered epipolar geometry expressed via E

5.  Reconstruct 3-D positions of corresponding points using R,T and pi,j

CSE152, Spring 2019

N-view structure from motion

Page 8: Structure from Motion - University of California, San Diegocseweb.ucsd.edu/classes/sp19/cse152-a/lec9.pdf · 2019-05-01 · Structure from Motion Introduction to Computer Vision CSE

8

CSE152, Spring 2019

Featuredetection

Several images observe a scene from different viewpoints

CSE152, Spring 2019

Featuredetection

Detect features using, for example, SIFT [Lowe, IJCV 2004]

Page 9: Structure from Motion - University of California, San Diegocseweb.ucsd.edu/classes/sp19/cse152-a/lec9.pdf · 2019-05-01 · Structure from Motion Introduction to Computer Vision CSE

9

CSE152, Spring 2019

FeaturematchingMatchfeaturesbetweeneachpairofimages

CSE152, Spring 2019

Structurefrommotion

Camera1Camera2

Camera3R1,t1

R2,t2

R3,t3

X1

X4

X3

X2

X5

X6

X7

minimize g(R, T, X)

q1,1

q1,2

q1,3

non-linearleastsquares

Page 10: Structure from Motion - University of California, San Diegocseweb.ucsd.edu/classes/sp19/cse152-a/lec9.pdf · 2019-05-01 · Structure from Motion Introduction to Computer Vision CSE

10

Bundleadjustment•  Minimizesumofsquaredreprojectionerrors:

whereqi,j=(ui,j,vi,j)aretheimagecoordinatesandP(xi,Rj,tj)istheprojectionof3DpointxiforacameralocatedattjwithorientationRj.

–  Optimizedwithnon-linearleastsquares–  Levenberg-Marquardtisapopularchoice

•  Practicalchallenges?–  Initialization–  Outliers

predictedimagelocation

observedimagelocationindicatorvariable:

whetherpointIvisibleinimagej

Inliersvs.Outliers•  AsyousawinHW1,noteverymatchwascorrectwhenusing

SIFTdescriptors.•  Squarederrorsmetricslike

highlypenalizemismatchesbecausetheygetsquared.•  Inliers:Givenamodelwithsomeassumeddistribution,inliers

aredatapointsthatfitthemodel.•  Outliersarepointsthatdonotfitthemodel.•  Example:linefitting

Outliers:

Inliers:

Page 11: Structure from Motion - University of California, San Diegocseweb.ucsd.edu/classes/sp19/cse152-a/lec9.pdf · 2019-05-01 · Structure from Motion Introduction to Computer Vision CSE

11

Line Fitting

Line Fitting Given n points (xi, yi), estimate parameters of line

axi + byi - d = 0 subject to the constraint that

a2 + b2 = 1 Note: axi + byi - d is distance from (xi, yi) to line.

) ,( yx1. Minimize E with respect to d:

Where is the mean of the data points ybxabyax

nd

dE n

iii +=+=⇒=

∂∑=1

10

Problem: minimize with respect to (a,b,d).

Cost Function: Sum of squared distances

between each point and the line

(xi,yi)

Page 12: Structure from Motion - University of California, San Diegocseweb.ucsd.edu/classes/sp19/cse152-a/lec9.pdf · 2019-05-01 · Structure from Motion Introduction to Computer Vision CSE

12

Line fitting cont. 2. Substitute d back into E where n=(a b)T. 3. Minimize E=|Un|2=nTUTUn=nTSn with respect to a, b subject to the constraint nTn = 1. Note that S is given by And it’s a real, symmetric, positive definite

where

S =

Line Fitting – Finished 4. This is a constrained optimization problem in n. Solve

with Lagrange multiplier L(n) = nTSn – λ(nTn – 1)

Take partial derivative (gradient) w.r.t. n and set to 0. ∇L = 2Sn – 2λn = 0

or Sn = λn

n=(a,b) is an Eigenvector of the symmetric matrix S (the one corresponding to the smallest Eigenvalue).

5. d is computed from Step 1.

Page 13: Structure from Motion - University of California, San Diegocseweb.ucsd.edu/classes/sp19/cse152-a/lec9.pdf · 2019-05-01 · Structure from Motion Introduction to Computer Vision CSE

13

RANdomSampleConsensus

RANSAC

SlidesadaptedfromFrankDellaertandMarcPollefeys

Motivation•  Estimatingmodelsinthepresenceofoutliers

– Lines– Transformationsformossaicing– Essentialmatrix– Andothermodels

•  Typically:keypointsintwoimages

Page 14: Structure from Motion - University of California, San Diegocseweb.ucsd.edu/classes/sp19/cse152-a/lec9.pdf · 2019-05-01 · Structure from Motion Introduction to Computer Vision CSE

14

SimplerExample

•  Fittingastraightline(modelhaslineparameters)

•  Inliers •  Outliers

RANSACIdeaappliedtolinefitting

Problem:Givenspointsandthresholdτ,determinebestfitlineinpresenceofoutliesRepeatNtimes

– Selecttwopointsatrandom– Determinelineequationfromthetwopoints– Countnumberofpointsthatarewithindistanceτfromtheline.Thisiscalledthe“support”ofthelineandit’sthenumberofinliers

– Linewiththegreatestsupportwins

Page 15: Structure from Motion - University of California, San Diegocseweb.ucsd.edu/classes/sp19/cse152-a/lec9.pdf · 2019-05-01 · Structure from Motion Introduction to Computer Vision CSE

15

Whywillthiswork?

Iter 1 # of inliers: 2 # of outliers: 5

τ

Whywillthiswork?

Iter 1 # of inliers: 2 # of outliers: 5 Iter 2 # of inliers: 6 # of outliers: 1 τ

Page 16: Structure from Motion - University of California, San Diegocseweb.ucsd.edu/classes/sp19/cse152-a/lec9.pdf · 2019-05-01 · Structure from Motion Introduction to Computer Vision CSE

16

RANSACMoreGenerally

•  WhatdoweneedtoapplyRANSAC1.  Aparameterizedmodel2.  Awaytoestimatethemodelparameters

fromsdatapoints{x1,…,xs}3.  Giventheparametersofthemodel,awayto

estimatethedistancefromadatapointxitothemodel

RANSACMoreGenerallyObjective

Robust fit of model to data set S which contains outliers Algorithm REPEAT

(i)  Randomly select a sample of s data points from S (ii)  Instantiate the model from this sample. (iii)  Determine the set of data points Si which are within a

distance threshold t of the model. The set Si is the consensus set of samples and defines the inliers of S.

(iv)  Slargest = Si if Si is larger than Slargest

UNTIL (The size of Si is greater than some threshold T) OR (There have been N trials)

The model is re-estimated using all the points in Slargest

Page 17: Structure from Motion - University of California, San Diegocseweb.ucsd.edu/classes/sp19/cse152-a/lec9.pdf · 2019-05-01 · Structure from Motion Introduction to Computer Vision CSE

17

Howmanysamples?(WhatisN?)

ChooseN(numberofsamples)sothat,withprobabilityp,atleastonerandomsampleisfreefromoutliers.e.g.p=0.99

e:proportionofoutlierss:Numberofpointsneededforthemodel

N = log 1− p( ) /log 1− 1− e( )s( )€

1− 1− e( )s( )N

=1− p

proportion of outliers e s 5% 10% 20% 25% 30% 40% 50% 2 2 3 5 6 7 11 17 3 3 4 7 9 11 19 35 4 3 5 9 13 17 34 72 5 4 6 12 17 26 57 146 6 4 7 16 24 37 97 293 7 4 8 20 33 54 163 588 8 5 9 26 44 78 272 1177

Acceptableconsensusset?•  Typically,terminatewheninlierratioreachesexpectedratio

ofinliers

•  Where–  N:NumberofSamples–  E:proportionofoutliers–  T:Sizeofconsensussetwhentostop

T = 1− e( )N

Page 18: Structure from Motion - University of California, San Diegocseweb.ucsd.edu/classes/sp19/cse152-a/lec9.pdf · 2019-05-01 · Structure from Motion Introduction to Computer Vision CSE

18

UsingRANSACtoestimatetheEssentialMatrix

•  Whatisthemodel?

•  Howmany“points”areneeded,andwheredotheycomefrom?

•  Whatdistancedoweusetocomputetheconsensusset?

•  Howoftendooutliersoccur

Essential Matrix (8 parameters) 8 points in each image or 8 matched pairs (usually use this) L2 distance of points to epipolar line Usually not known.

Featurepointsextractedbyacornerdetector

Page 19: Structure from Motion - University of California, San Diegocseweb.ucsd.edu/classes/sp19/cse152-a/lec9.pdf · 2019-05-01 · Structure from Motion Introduction to Computer Vision CSE

19

MatchedpointsbyRANSAC

Putativematchesofthefeaturepointsinbothimagesarecomputedbyusingacorrelationmeasureforpointsinoneimagewithafeaturesintheotherimage.Onlyfeatureswithinasmallwindowareconsideredtolimitcomputationtime.Mutuallybestmatchesareretained.RANSACisusedtorobustlydetermineFfromtheseputativematches.

EpipolarGeometryfromMatchedPoints