Post on 20-Dec-2015
Stanford CS223B Computer Vision, Winter 2007
Lecture 8 Structure From Motion
Professors Sebastian Thrun and Jana Košecká
CAs: Vaibhav Vaish and David Stavens
Slide credit: Gary Bradski, Stanford SAIL
Sebastian Thrun and Jana Košecká CS223B Computer Vision, Winter 2007
Summary SFM
Problem
– Determine feature locations (= structure)
– Determine camera extrinsics (= motion)
Two Principal Solutions
– Bundle adjustment (nonlinear least squares, local minima)
– SVD (through orthographic approximation, affine geometry)
Correspondence
– (RANSAC)
– Expectation Maximization
Structure From Motion
camera
features
Recover: structure (feature locations), motion (camera extrinsics)
SFM = Holy Grail of 3D Reconstruction
– Take movie of object
– Reconstruct 3D model
Would be commercially highly viable
live.com
Structure From Motion (1)
[Tomasi & Kanade 92]
Structure From Motion (2)
[Tomasi & Kanade 92]
Structure From Motion (3)
[Tomasi & Kanade 92]
Structure From Motion (4a): Images
Marc Pollefeys
Structure From Motion (4b)
Marc Pollefeys
Structure From Motion (5)
http://www.cs.unc.edu/Research/urbanscape
Structure From Motion
Problem 1:
– Given n points p_ij = (x_ij, y_ij) in m images
– Reconstruct structure: 3-D locations P_j = (x_j, y_j, z_j)
– Reconstruct camera positions (extrinsics) M_i = (A_i, b_i)
Problem 2:
– Establish correspondence: c(p_ij)
Structure From Motion
camera
features
Recover: structure (feature locations), motion (camera extrinsics)
Recovery Problems
                    1 image        2+ images
Location known      calibration    stereo
Location unknown                   SFM, stitching
SFM: General Formulation
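\[
p_{ij} =
\begin{pmatrix} p_{ij,x} \\ p_{ij,y} \end{pmatrix}
= \frac{f}{\begin{pmatrix} 0 & 0 & 1 \end{pmatrix} R_i P_j + b_{i,z}}
\left[
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} R_i P_j
+ \begin{pmatrix} b_{i,x} \\ b_{i,y} \end{pmatrix}
\right],
\qquad
P_j = \begin{pmatrix} P_{j,x} \\ P_{j,y} \\ P_{j,z} \end{pmatrix}
\]
where the rotation of camera i is a product of three elementary rotations (angles written here as \alpha_i, \beta_i, \gamma_i):
\[
R_i =
\begin{pmatrix} \cos\gamma_i & -\sin\gamma_i & 0 \\ \sin\gamma_i & \cos\gamma_i & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} \cos\beta_i & 0 & \sin\beta_i \\ 0 & 1 & 0 \\ -\sin\beta_i & 0 & \cos\beta_i \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha_i & -\sin\alpha_i \\ 0 & \sin\alpha_i & \cos\alpha_i \end{pmatrix}
\]
[Figure: pinhole geometry with optical center O, scene coordinate X at depth Z, focal length f and image coordinate -x, so that x = fX/Z]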
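The perspective model of this slide can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function names `rotation` and `project`, the order Rz·Ry·Rx of the elementary rotations, and the right-handed sign convention are choices made here, not taken from the slide.

```python
import numpy as np

def rotation(alpha, beta, gamma):
    """Rz(gamma) @ Ry(beta) @ Rx(alpha); order and signs are assumptions."""
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    Rx = np.array([[1, 0, 0], [0, ca, -sa], [0, sa, ca]])
    Ry = np.array([[cb, 0, sb], [0, 1, 0], [-sb, 0, cb]])
    Rz = np.array([[cg, -sg, 0], [sg, cg, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def project(P, angles, b, f=1.0):
    """Project 3-D points P (n x 3) into one camera; returns n x 2 image points."""
    X = P @ rotation(*angles).T + b      # camera-frame coordinates R P + b
    return f * X[:, :2] / X[:, 2:3]      # perspective division by depth

# Two points seen by an unrotated camera placed 4 units away, focal length 2.
P = np.array([[0.0, 0.0, 0.0], [1.0, 2.0, 0.0]])
p = project(P, angles=(0.0, 0.0, 0.0), b=np.array([0.0, 0.0, 4.0]), f=2.0)
print(p)   # [[0.  0. ] [0.5 1. ]]
```

The second point lands at f·(1, 2)/4 = (0.5, 1.0), which is the x = fX/Z relation of the pinhole figure.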
SFM: Bundle Adjustment
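\[
\min_{\{R_i, b_i\},\,\{P_j\}} \; \sum_{i,j}
\left\|
\begin{pmatrix} p_{ij,x} \\ p_{ij,y} \end{pmatrix}
- \frac{f}{\begin{pmatrix} 0 & 0 & 1 \end{pmatrix} R_i P_j + b_{i,z}}
\left[
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} R_i P_j
+ \begin{pmatrix} b_{i,x} \\ b_{i,y} \end{pmatrix}
\right]
\right\|^2
\]
where P_j = (P_{j,x}, P_{j,y}, P_{j,z})^T and the rotation of camera i is a product of three elementary rotations (angles written here as \alpha_i, \beta_i, \gamma_i):
\[
R_i =
\begin{pmatrix} \cos\gamma_i & -\sin\gamma_i & 0 \\ \sin\gamma_i & \cos\gamma_i & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} \cos\beta_i & 0 & \sin\beta_i \\ 0 & 1 & 0 \\ -\sin\beta_i & 0 & \cos\beta_i \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha_i & -\sin\alpha_i \\ 0 & \sin\alpha_i & \cos\alpha_i \end{pmatrix}
\]
[Figure: pinhole geometry with optical center O, scene coordinate X at depth Z, focal length f and image coordinate -x, so that x = fX/Z]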
Bundle Adjustment
SFM = Nonlinear Least Squares problem
Minimize through
– Gradient Descent
– Conjugate Gradient
– Gauss-Newton
– Levenberg-Marquardt (common method)
Prone to local minima
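A toy version of bundle adjustment with Levenberg-Marquardt might look like the sketch below. Everything in it is an assumption made for illustration: the parameter packing (three angles and three offsets per camera, then the points), the Rz·Ry·Rx rotation convention, the finite-difference Jacobian, the damping schedule, and the synthetic data. Real systems use analytic, sparse Jacobians.

```python
import numpy as np

def rotation(alpha, beta, gamma):
    """Rz(gamma) @ Ry(beta) @ Rx(alpha); the convention is an assumption."""
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    Rx = np.array([[1, 0, 0], [0, ca, -sa], [0, sa, ca]])
    Ry = np.array([[cb, 0, sb], [0, 1, 0], [-sb, 0, cb]])
    Rz = np.array([[cg, -sg, 0], [sg, cg, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def residuals(x, obs, m, n, f=1.0):
    """Stacked reprojection errors; x packs m cameras (6 values each) and n points."""
    cams = x[:6 * m].reshape(m, 6)
    pts = x[6 * m:].reshape(n, 3)
    r = []
    for i in range(m):
        X = pts @ rotation(*cams[i, :3]).T + cams[i, 3:]
        r.append(f * X[:, :2] / X[:, 2:3] - obs[i])
    return np.concatenate(r).ravel()

def numeric_jacobian(fun, x, eps=1e-6):
    """Forward-difference Jacobian (slow, but keeps the sketch short)."""
    r0 = fun(x)
    J = np.empty((r0.size, x.size))
    for k in range(x.size):
        dx = np.zeros_like(x)
        dx[k] = eps
        J[:, k] = (fun(x + dx) - r0) / eps
    return J

def levenberg_marquardt(fun, x, iters=50, lam=1e-3):
    """Damped Gauss-Newton: accept a step only if it lowers the cost."""
    cost = np.sum(fun(x) ** 2)
    for _ in range(iters):
        r, J = fun(x), numeric_jacobian(fun, x)
        step = np.linalg.solve(J.T @ J + lam * np.eye(x.size), -J.T @ r)
        new_cost = np.sum(fun(x + step) ** 2)
        if new_cost < cost:
            x, cost, lam = x + step, new_cost, max(lam * 0.3, 1e-9)
        else:
            lam *= 10.0                  # more damping, closer to gradient descent
    return x, cost

# Synthetic scene: 3 cameras, 8 points, noise-free observations.
rng = np.random.default_rng(0)
m, n = 3, 8
true_cams = np.column_stack([rng.uniform(-0.2, 0.2, (m, 3)),
                             rng.uniform(-0.5, 0.5, (m, 2)),
                             np.full(m, 5.0)])
true_pts = rng.uniform(-1, 1, (n, 3))
x_true = np.concatenate([true_cams.ravel(), true_pts.ravel()])
obs = residuals(x_true, np.zeros((m, n, 2)), m, n).reshape(m, n, 2)

fun = lambda x: residuals(x, obs, m, n)
x0 = x_true + 0.05 * rng.standard_normal(x_true.size)   # perturbed starting guess
initial_cost = np.sum(fun(x0) ** 2)
x_hat, final_cost = levenberg_marquardt(fun, x0)
print(initial_cost, final_cost)
```

The start is deliberately close to the truth. From a poor start the same loop can settle in a local minimum, which is the weakness the slide points out.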
Count # Constraints vs #Unknowns
– m camera poses
– n points
– 2mn point constraints
– 6m + 3n unknowns
Suggests: need 2mn ≥ 6m + 3n
But: Can we really recover all parameters???
How Many Parameters Can’t We Recover?
We can recover all but…
0   3   6   7   8   10   12   n   m   nm
Place Your Bet!
m = # camera poses, n = # feature points
Count # Constraints vs #Unknowns
– m camera poses
– n points
– 2mn point constraints
– 6m + 3n unknowns
Suggests: need 2mn ≥ 6m + 3n
But: Can we really recover all parameters???
– Can’t recover origin, orientation (6 params)
– Can’t recover scale (1 param)
Thus, we need 2mn ≥ 6m + 3n - 7
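The counting argument is easy to check numerically. The helper names `enough_constraints` and `smallest_n` below are hypothetical, and the result is only a necessary condition from counting equations, not a guarantee that a given configuration is solvable.

```python
def enough_constraints(m, n, per_camera=6, unrecoverable=7):
    """True if 2mn point constraints cover the recoverable unknowns."""
    return 2 * m * n >= per_camera * m + 3 * n - unrecoverable

def smallest_n(m, **kwargs):
    """Smallest number of points that passes the count for m cameras (m >= 2)."""
    return next(n for n in range(1, 1000) if enough_constraints(m, n, **kwargs))

for m in (2, 3, 4):
    print(m, smallest_n(m))   # 2 5, then 3 4, then 4 4
```

For two views the count asks for at least five points, and four points become enough from three views on.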
Are we done?
No, bundle adjustment has many local minima.
The “Trick Of The Day”
Replace Perspective by Orthographic Geometry
Replace Euclidean Geometry by Affine Geometry
Solve SFM linearly via PCA (“closed” form, globally optimal)
Post-Process to make solution Euclidean
Post-Process to make solution perspective
By Tomasi and Kanade, 1992
Orthographic Camera Model
Orthographic = Limit of Pinhole Model:
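Extrinsic Parameters (the 3 x 3 matrix is the rotation):
\[
\begin{pmatrix} p_x \\ p_y \\ p_z \end{pmatrix}
=
\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}
\begin{pmatrix} P_x \\ P_y \\ P_z \end{pmatrix}
+
\begin{pmatrix} b_x \\ b_y \\ b_z \end{pmatrix}
\]
Orthographic Projection:
\[
\begin{pmatrix} p_x \\ p_y \end{pmatrix}
=
\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix}
\begin{pmatrix} P_X \\ P_Y \\ P_Z \end{pmatrix}
+
\begin{pmatrix} b_x \\ b_y \end{pmatrix}
= A P + b
\]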
Orthographic Projection
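Limit of Pinhole Model:
\[
\begin{pmatrix} p_x \\ p_y \end{pmatrix}
=
\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix}
\begin{pmatrix} P_X \\ P_Y \\ P_Z \end{pmatrix}
+
\begin{pmatrix} b_x \\ b_y \end{pmatrix}
= A P + b
\]
For camera i and feature j:
\[
p_{ij} = A_i P_j + b_i
\]
Because
\[
\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}
\]
is a rotation, the rows a_1, a_2 of A satisfy
\[
\|a_1\|^2 = 1, \qquad \|a_2\|^2 = 1, \qquad a_1 \cdot a_2 = 0
\]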
The Orthographic SFM Problem
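Recover \{A_i, b_i\} and \{P_j\} from
\[
p_{ij} = A_i P_j + b_i \qquad \text{(camera } i \text{, feature } j\text{)}
\]
subject to
\[
\|a_1\|^2 = 1, \qquad \|a_2\|^2 = 1, \qquad a_1 \cdot a_2 = 0
\]
where a_1, a_2 are the rows of A_i.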
The Affine SFM Problem
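Recover \{A_i, b_i\} and \{P_j\} from
\[
p_{ij} = A_i P_j + b_i \qquad \text{(camera } i \text{, feature } j\text{)}
\]
Drop the constraints
\[
\|a_1\|^2 = 1, \qquad \|a_2\|^2 = 1, \qquad a_1 \cdot a_2 = 0
\]
that the orthographic problem imposes on the rows a_1, a_2 of A_i.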
Count # Constraints vs #Unknowns
– m camera poses
– n points
– 2mn point constraints
– 8m + 3n unknowns
\[
p_{ij} = A_i P_j + b_i \qquad \text{(camera } i \text{, feature } j\text{)}
\]
Suggests: need 2mn ≥ 8m + 3n
But: Can we really recover all parameters???
How Many Parameters Can’t We Recover?
We can recover all but…
0   3   6   7   8   10   12   n   m   nm
Place Your Bet!
The Answer is (at least): 12
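\[
p_{ij} = A_i P_j + b_i
\]
Proof: for any vector d and any non-singular matrix C, define
\[
A_i' = A_i C, \qquad b_i' = A_i d + b_i, \qquad P_j' = C^{-1} P_j - C^{-1} d
\]
Then
\[
\begin{aligned}
A_i' P_j' + b_i' &= (A_i C)(C^{-1} P_j - C^{-1} d) + A_i d + b_i \\
&= A_i P_j - A_i d + A_i d + b_i \\
&= A_i P_j + b_i = p_{ij}
\end{aligned}
\]
C contributes 9 free parameters and d contributes 3, which gives 12.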
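The ambiguity in this proof can be confirmed numerically. The sketch below uses random affine cameras and points (all synthetic, chosen here for illustration) and an arbitrary C and d. The helper name `image_points` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 5
A = rng.standard_normal((m, 2, 3))   # affine cameras A_i (2 x 3 each)
b = rng.standard_normal((m, 2))      # offsets b_i
P = rng.standard_normal((n, 3))      # points P_j

C = rng.standard_normal((3, 3)) + 3 * np.eye(3)   # generically non-singular
d = rng.standard_normal(3)

A2 = A @ C                           # A_i' = A_i C
b2 = b + A @ d                       # b_i' = A_i d + b_i
P2 = (P - d) @ np.linalg.inv(C).T    # P_j' = C^{-1} P_j - C^{-1} d

def image_points(A, b, P):
    """p_ij = A_i P_j + b_i for all cameras i and points j (m x n x 2)."""
    return np.einsum('ikl,jl->ijk', A, P) + b[:, None, :]

same = np.allclose(image_points(A, b, P), image_points(A2, b2, P2))
print(same)   # True
```

Both parameter sets produce the same image points, so the measurements alone cannot distinguish them.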
Points for Solving Affine SFM Problem
– m camera poses
– n points
Need to have: 2mn ≥ 8m + 3n - 12
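As a worked check of this inequality (a counting argument only, so a necessary condition rather than a guarantee), collect the terms in n on the left:

\[
2mn - 3n \ge 8m - 12
\;\Longleftrightarrow\;
n\,(2m - 3) \ge 4\,(2m - 3)
\;\Longleftrightarrow\;
n \ge 4 \quad (m \ge 2)
\]

So by this count, four points suffice for any number of affine cameras m ≥ 2. For example, m = 2 and n = 4 gives 16 constraints against 16 + 12 - 12 = 16 recoverable unknowns.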
Affine SFM
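\[
p_{ij} = A_i P_j + b_i
\]
Fix coordinate system by making p_{i0} = P_0 = origin:
\[
p_{ij} = A_i P_j
\]
m cameras:
\[
q_j = \begin{pmatrix} p_{1j} \\ \vdots \\ p_{mj} \end{pmatrix}
= \begin{pmatrix} A_1 \\ \vdots \\ A_m \end{pmatrix} P_j
= A P_j
\]
n points:
\[
Q = \begin{pmatrix} p_{11} & \cdots & p_{1n} \\ \vdots & & \vdots \\ p_{m1} & \cdots & p_{mn} \end{pmatrix}
= A D,
\qquad
D = \begin{pmatrix} P_1 & \cdots & P_n \end{pmatrix}
\]
Rank Theorem: Q has rank 3
Proof: A has size 2m \times 3, D has size 3 \times n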
The Rank Theorem
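\[
\begin{pmatrix}
p_{11,x} & \cdots & p_{1n,x} \\
p_{11,y} & \cdots & p_{1n,y} \\
\vdots & & \vdots \\
p_{m1,x} & \cdots & p_{mn,x} \\
p_{m1,y} & \cdots & p_{mn,y}
\end{pmatrix}
\text{ has rank 3}
\]
(2m elements per column, n elements per row)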
Singular Value Decomposition
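\[
\begin{pmatrix}
p_{11,x} & \cdots & p_{1n,x} \\
p_{11,y} & \cdots & p_{1n,y} \\
\vdots & & \vdots \\
p_{m1,x} & \cdots & p_{mn,x} \\
p_{m1,y} & \cdots & p_{mn,y}
\end{pmatrix}
= U \, W \, V^T
\]
with U of size 2m \times 3, W of size 3 \times 3, and V^T of size 3 \times n.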
Affine Solution to Orthographic SFM
W V^T: affine structure
U: affine camera positions
Gives also the optimal affine reconstruction under noise
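The factorization can be tried on synthetic data with NumPy. This is a sketch under stated assumptions: the data are random and noise-free, and the coordinate system is fixed by subtracting the centroid of each row (a common alternative to choosing one reference point as the origin).

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 4, 10
A_true = rng.standard_normal((m, 2, 3))   # synthetic affine cameras
b_true = rng.standard_normal((m, 2))
P_true = rng.standard_normal((n, 3))      # synthetic 3-D points

# Measurement matrix Q: rows 2i and 2i+1 hold the x and y image
# coordinates of all n points as seen by camera i.
obs = np.einsum('ikl,jl->ijk', A_true, P_true) + b_true[:, None, :]
Q = obs.transpose(0, 2, 1).reshape(2 * m, n)

# Remove the translations b_i by centering each row.
Qc = Q - Q.mean(axis=1, keepdims=True)

U, w, Vt = np.linalg.svd(Qc, full_matrices=False)
print(np.round(w, 6))                     # only three non-zero singular values

cameras = U[:, :3]                        # 2m x 3: affine camera matrices
structure = np.diag(w[:3]) @ Vt[:3]       # 3 x n: affine structure W V^T
print(np.allclose(cameras @ structure, Qc))   # True for noise-free data
```

With noisy measurements the trailing singular values are no longer exactly zero, and keeping the top three gives the best rank-3 approximation in the least-squares sense.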
Back To Orthographic Projection
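\[
p_{ij} = A_i' P_j' + b_i'
\]
with
\[
A_i' = A_i C, \qquad b_i' = A_i d + b_i, \qquad P_j' = C^{-1} P_j - C^{-1} d
\]
(d vector, C non-singular matrix)
Constraints, on the rows a_1, a_2 of each A_i':
\[
\|a_1\|^2 = 1, \qquad \|a_2\|^2 = 1, \qquad a_1 \cdot a_2 = 0
\]
Find C for which constraints are met
Search in 9-dim space (instead of 8m + 3n - 12)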
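One way to find such a C, offered here as an assumption rather than as the method of the lecture, is to note that the constraints are linear in the symmetric matrix L = C Cᵀ. The sketch solves for L by least squares and recovers C by Cholesky factorization. The names `sym_row` and `metric_upgrade` and the synthetic cameras are hypothetical.

```python
import numpy as np

def sym_row(u, v):
    """Coefficients of u^T L v in (l00, l01, l02, l11, l12, l22) of symmetric L."""
    return np.array([u[0] * v[0],
                     u[0] * v[1] + u[1] * v[0],
                     u[0] * v[2] + u[2] * v[0],
                     u[1] * v[1],
                     u[1] * v[2] + u[2] * v[1],
                     u[2] * v[2]])

def metric_upgrade(A_hat):
    """A_hat: 2m x 3 stacked affine cameras. Returns C with L = C C^T."""
    rows, rhs = [], []
    for a1, a2 in zip(A_hat[0::2], A_hat[1::2]):
        rows += [sym_row(a1, a1), sym_row(a2, a2), sym_row(a1, a2)]
        rhs += [1.0, 1.0, 0.0]            # |a1'|^2 = 1, |a2'|^2 = 1, a1'.a2' = 0
    l = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)[0]
    L = np.array([[l[0], l[1], l[2]],
                  [l[1], l[3], l[4]],
                  [l[2], l[4], l[5]]])
    return np.linalg.cholesky(L)          # raises if L is not positive definite

# Synthetic test: orthographic cameras distorted by an unknown affine map H.
rng = np.random.default_rng(3)
m = 4
A_true = np.vstack([np.linalg.qr(rng.standard_normal((3, 3)))[0][:2]
                    for _ in range(m)])   # each 2 x 3 block has orthonormal rows
H = rng.standard_normal((3, 3)) + 3 * np.eye(3)
A_hat = A_true @ H                        # what an affine reconstruction returns

C = metric_upgrade(A_hat)
upgraded = (A_hat @ C).reshape(m, 2, 3)
print(all(np.allclose(B @ B.T, np.eye(2)) for B in upgraded))   # True
```

C is determined only up to a rotation, since any C R with R orthogonal gives the same L. With noisy data L may fail to be positive definite, in which case the Cholesky step raises an error and a nonlinear search over C is needed instead.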
Back To Projective Geometry
Orthographic (in the limit)
Projective
Back To Projective Geometry
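\[
\min_{\{R_i, b_i\},\,\{P_j\}} \; \sum_{i,j}
\left\|
\begin{pmatrix} p_{ij,x} \\ p_{ij,y} \end{pmatrix}
- \frac{f}{\begin{pmatrix} 0 & 0 & 1 \end{pmatrix} R_i P_j + b_{i,z}}
\left[
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} R_i P_j
+ \begin{pmatrix} b_{i,x} \\ b_{i,y} \end{pmatrix}
\right]
\right\|^2
\]
where P_j = (P_{j,x}, P_{j,y}, P_{j,z})^T and the rotation of camera i is a product of three elementary rotations (angles written here as \alpha_i, \beta_i, \gamma_i):
\[
R_i =
\begin{pmatrix} \cos\gamma_i & -\sin\gamma_i & 0 \\ \sin\gamma_i & \cos\gamma_i & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} \cos\beta_i & 0 & \sin\beta_i \\ 0 & 1 & 0 \\ -\sin\beta_i & 0 & \cos\beta_i \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha_i & -\sin\alpha_i \\ 0 & \sin\alpha_i & \cos\alpha_i \end{pmatrix}
\]
[Figure: pinhole geometry with optical center O, scene coordinate X at depth Z, focal length f and image coordinate -x, so that x = fX/Z]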
Optimize
Using orthographic solution as starting point
The “Trick Of The Day”
Replace Perspective by Orthographic Geometry
Replace Euclidean Geometry by Affine Geometry
Solve SFM linearly via PCA (“closed” form, globally optimal)
Post-Process to make solution Euclidean
Post-Process to make solution perspective
By Tomasi and Kanade, 1992
Structure From Motion
Problem 1:
– Given n points p_ij = (x_ij, y_ij) in m images
– Reconstruct structure: 3-D locations P_j = (x_j, y_j, z_j)
– Reconstruct camera positions (extrinsics) M_i = (A_i, b_i)
Problem 2:
– Establish correspondence: c(p_ij)
The Correspondence Problem
[Figure: View 1, View 2, View 3]
Correspondence: Solution 1
Track features (e.g., optical flow)
…but fails when images taken from widely different poses
Correspondence: Solution 2
– Start with random solution A, b, P
– Compute soft correspondence: p(c|A,b,P)
– Plug soft correspondence into SFM
– Reiterate
See Dellaert/Seitz/Thorpe/Thrun, Machine Learning Journal, 2003
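The "soft correspondence" step can be illustrated with a very simplified E-step. The sketch assumes isotropic Gaussian image noise with a hand-picked `sigma` and normalizes independently per measurement. The function name `soft_correspondence` is hypothetical, and this is not the algorithm of the cited paper.

```python
import numpy as np

def soft_correspondence(measured, predicted, sigma=1.0):
    """Row k holds p(measurement k belongs to point j | current A, b, P)."""
    diff = measured[:, None, :] - predicted[None, :, :]
    d2 = (diff ** 2).sum(axis=2)                 # squared image distances
    logw = -d2 / (2.0 * sigma ** 2)              # Gaussian log-likelihoods
    logw -= logw.max(axis=1, keepdims=True)      # stabilise the exponentials
    w = np.exp(logw)
    return w / w.sum(axis=1, keepdims=True)      # normalise each row to 1

# Predicted projections of three points under the current estimate,
# and three measured features whose identities are unknown.
predicted = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
measured = np.array([[0.9, 0.1], [0.1, 0.9], [0.05, -0.05]])

W = soft_correspondence(measured, predicted, sigma=0.2)
print(W.argmax(axis=1))   # [1 2 0]
```

In the loop described on the slide, these weights would then enter the SFM step as soft assignments, and the two steps alternate until the estimate stops changing.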
Correspondence: Alternative Approach
RANSAC [Fischler/Bolles]
= Random sample consensus
Will be discussed Wednesday
Summary SFM
Problem
– Determine feature locations (= structure)
– Determine camera extrinsics (= motion)
Two Principal Solutions
– Bundle adjustment (nonlinear least squares, local minima)
– SVD (through orthographic approximation, affine geometry)
Correspondence
– (RANSAC)
– Expectation Maximization