Lecture 9 - Fundamental Matrix & Structure from...
Transcript of Lecture 9 - Fundamental Matrix & Structure from...
Fundamental Matrix & Structure from
Motion Instructor - Simon Lucey
16-423 - Designing Computer Vision Apps
Today
• Transformations between images
• Structure from Motion
• The Essential Matrix
• The Fundamental Matrix
Transformations between images
• So far we have considered transformations between the image and a plane in the world
• Now consider two cameras viewing the same plane
• There is a homography between camera 1 and the plane and a second homography between camera 2 and the plane
• It follows that the relation between the two images is also a homography
Camera under pure rotation
Special case is camera under pure rotation.Homography can be showed to be
Camera under pure rotation
Special case is camera under pure rotation.Homography can be showed to be
Why is this?
Panorama Example418 15 Models for transformations
a) b) c)
d)
Figure 15.22 Computing visual panoramas. a-c) Three images of the samescene where the camera has rotated but not translated. Five matching pointshave been identified by hand between each pair. d) A panorama can becreated by mapping the first and third images into the frame of reference ofthe second image.
(middle) photo into the center of a much larger empty image. Then 5 matches wereidentified by hand between this expanded second image and the first image andthe homography from the expanded second image to the first image was computed.For each empty pixel in the expanded second image we compute the position inthe first image using this homography. If this falls within the image boundaries,we copy the pixel value.
This process can be completely automated by finding features and fitting ahomography to each pair of images using a robust technique such as RANSAC. Ina real system, the final result is typically projected onto a cylinder and unrolled togive a more visually pleasing result.
Discussion
This chapter has presented a number of important ideas. First, we discussed afamily of transformations and how each can be related to a camera viewing the sceneunder special conditions. These transformations are used widely within machinevision and we will see them exploited in chapters 17, 18 and 19. Second, we
Copyright c�2011,2012 by Simon Prince; published by Cambridge University Press 2012.For personal use only, not for distribution.
Review - Homography
• Start with basic projection equation:
• Combining these two matrices we get:
Adapted from: Computer vision: models, learning and inference. Simon J.D. Prince
Review - Homography
Homogeneous:
Cartesian:
For short:
Review - Homography
Homogeneous:
Cartesian:
For short:
How many unknowns?
Homography Estimation
• Re-arrange cartesian equations,
15.3 Inference in transformation models 401
�
2
4xi
yi
1
3
5 =
2
4�11 �12 �13
�21 �22 �23
�31 �32 �33
3
5
2
4ui
vi
1
3
5 . (15.34)
Each homogeneous coordinate can be considered as a direction in 3D space(figure 14.11). So, this equation states that the left-hand side x̃
i
represents thesame direction in space as the right-hand side, �w̃
i
. If this is the case, their crossproduct must be zero, so that
x̃⇥�w̃ = 0. (15.35)
Writing this constraint in full gives the relationsProblem 15.9
2
4y(�31u+ �32v + �33)� (�21u+ �22v + �23)(�11u+ �12v + �13)� x(�31u+ �32v + �33)x(�21u+ �22v + �23)� y(�11u+ �12v + �13)
3
5 = 0. (15.36)
This appears to provide three linear constraints on the elements of �. However,only two of these three equations are independent, so we discard the third. Wenow stack the first two constraints from each of the I pairs of points {x
i
,wi
} toform the system
2
6666666664
0 0 0 �u1 �v1 �1 y1u1 y1v1 y1u1 v1 1 0 0 0 �x1u1 �x1v1 �x1
0 0 0 �u2 �v2 �1 y2u2 y2v2 y2u2 v2 1 0 0 0 �x2u2 �x2v2 �x2...
......
......
......
......
0 0 0 �uI
�vI
�1 yI
uI
yI
vI
yI
uI
vI
1 0 0 0 �xI
uI
�xI
vI
�xI
3
7777777775
2
6666666666664
�11
�12
�13
�21
�22
�23
�31
�32
�33
3
7777777777775
= 0, (15.37)
which has the form A� = 0.We solve this system of equations in a least squares sense with the constraint
�T� = 1 to prevent the trivial solution � = 0. This is a standard problem (seeappendix C.7.2). To find the solution, we compute the SVDA = ULVT and choose� to be the last column of V. This is reshaped into a 3⇥3 matrix � and used as astarting point for the nonlinear optimization of the true criterion (equation 15.33).
15.3 Inference in transformation models
We have introduced four transformations (Euclidean, similarity, a�ne, projective)Algorithm 15.5
that relate positions w on a real-world plane to their projected positions x in the
Copyright c�2011,2012 by Simon Prince; published by Cambridge University Press 2012.For personal use only, not for distribution.
• Form linear system
Adapted from: Computer vision: models, learning and inference. Simon J.D. Prince
Homography Estimation
• In MATLAB this becomes, >> [U,S,V] = svd(A); >> Phi = reshape(V(:,end),[3,3])’;
• Both sides are 3x1 vectors; should be parallel, so cross product will be zero
• For you to try MATLAB, >> x = [randn(2,1);1]; cross(x,4*x)
Caution…
• Approach only minimizes algebraic error NOT the re-
projection error!!!
• Need to employ non-linear optimization.
Estimating Extrinsics
Estimating Extrinsics
• Writing out the camera equations in full
• Estimate the homography from matched points• Factor out the intrinsic parameters
Adapted from: Computer vision: models, learning and inference. Simon J.D. Prince
Estimating Extrinsics
• Find the last column using the cross product of first two columns
• Make sure the determinant is 1. If it is -1, then multiply last column by -1.
• Find translation scaling factor between old and new values
• Finally, set
Adapted from: Computer vision: models, learning and inference. Simon J.D. Prince
Augmented Reality
Today
• Transformations between images
• Structure from Motion
• The Essential Matrix
• The Fundamental Matrix
Taken from Agarwal et al. “Building Rome in a Day”, ICCV 2009.
Taken from Agarwal et al. “Building Rome in a Day”, ICCV 2009.
Structure from Motion
For simplicity, we’ll start with simpler problem
• Just J=2 images• Known intrinsic matrix
Adapted from: Computer vision: models, learning and inference. Simon J.D. Prince
Epipolar lines
Adapted from: Computer vision: models, learning and inference. Simon J.D. Prince
Epipole
Adapted from: Computer vision: models, learning and inference. Simon J.D. Prince
Special configurations
Adapted from: Computer vision: models, learning and inference. Simon J.D. Prince
Today
• Transformations between images
• Structure from Motion
• The Essential Matrix
• The Fundamental Matrix
Review - Cross Product
• Cross product between two 3D vectors is defined as,
• Operation is equivalent to the,
• Can be written in short as,
614 C Linear algebra
C.1.3 Cross product
The cross product or vector product is specialized to three dimensions. The opera-tion c = a⇥ b is equivalent to the matrix multiplication (section C.2.1):
2
4c1c2c3
3
5 =
2
40 �a3 a2a3 0 �a1�a2 a1 0
3
5
2
4b1b2b3
3
5 , (C.3)
or for short
c = A⇥b, (C.4)
where A⇥ is the 3⇥3 matrix from equation C.3 that implements the cross product.
It is easily shown that the result c of the cross product is orthogonal to both aand b. In other words,
aT (a⇥ b) = bT (a⇥ b) = 0. (C.5)
C.2 Matrices
Matrices are used extensively throughout the book and are written as bold, capital,Roman or Greek letters (e.g., A,�). We categorize matrices as landscape (morecolumns than rows), square (the same number of columns and rows) or portrait(more rows than columns). They are always indexed by row first and then column,so a
ij
denotes the element of matrix A at the ith row and the jth column.
A diagonal matrix is a square matrix with zeros everywhere except on thediagonal (i.e., elements a
ii
) where the elements may take any value. An importantspecial case of a diagonal matrix is the identitymatrix I. This has zeros everywhereexcept for the diagonal, where all the elements are ones.
C.2.1 Matrix multiplication
To take the matrix product C = AB, we compute the elements of C as
cij
=KX
k=1
aik
bkj
. (C.6)
This can only be done when the number of columns in A equals the number ofrows in B. Matrix multiplication is associative so that A(BC) = (AB)C = ABC.However it is not commutative so that in general AB 6= BA.
Copyright c�2011,2012 by Simon Prince; published by Cambridge University Press 2012.For personal use only, not for distribution.
614 C Linear algebra
C.1.3 Cross product
The cross product or vector product is specialized to three dimensions. The opera-tion c = a⇥ b is equivalent to the matrix multiplication (section C.2.1):
2
4c1c2c3
3
5 =
2
40 �a3 a2a3 0 �a1�a2 a1 0
3
5
2
4b1b2b3
3
5 , (C.3)
or for short
c = A⇥b, (C.4)
where A⇥ is the 3⇥3 matrix from equation C.3 that implements the cross product.
It is easily shown that the result c of the cross product is orthogonal to both aand b. In other words,
aT (a⇥ b) = bT (a⇥ b) = 0. (C.5)
C.2 Matrices
Matrices are used extensively throughout the book and are written as bold, capital,Roman or Greek letters (e.g., A,�). We categorize matrices as landscape (morecolumns than rows), square (the same number of columns and rows) or portrait(more rows than columns). They are always indexed by row first and then column,so a
ij
denotes the element of matrix A at the ith row and the jth column.
A diagonal matrix is a square matrix with zeros everywhere except on thediagonal (i.e., elements a
ii
) where the elements may take any value. An importantspecial case of a diagonal matrix is the identitymatrix I. This has zeros everywhereexcept for the diagonal, where all the elements are ones.
C.2.1 Matrix multiplication
To take the matrix product C = AB, we compute the elements of C as
cij
=KX
k=1
aik
bkj
. (C.6)
This can only be done when the number of columns in A equals the number ofrows in B. Matrix multiplication is associative so that A(BC) = (AB)C = ABC.However it is not commutative so that in general AB 6= BA.
Copyright c�2011,2012 by Simon Prince; published by Cambridge University Press 2012.For personal use only, not for distribution.
c = [a]⇥b
Review - Cross Product
• In other words,
614 C Linear algebra
C.1.3 Cross product
The cross product or vector product is specialized to three dimensions. The opera-tion c = a⇥ b is equivalent to the matrix multiplication (section C.2.1):
2
4c1c2c3
3
5 =
2
40 �a3 a2a3 0 �a1�a2 a1 0
3
5
2
4b1b2b3
3
5 , (C.3)
or for short
c = A⇥b, (C.4)
where A⇥ is the 3⇥3 matrix from equation C.3 that implements the cross product.
It is easily shown that the result c of the cross product is orthogonal to both aand b. In other words,
aT (a⇥ b) = bT (a⇥ b) = 0. (C.5)
C.2 Matrices
Matrices are used extensively throughout the book and are written as bold, capital,Roman or Greek letters (e.g., A,�). We categorize matrices as landscape (morecolumns than rows), square (the same number of columns and rows) or portrait(more rows than columns). They are always indexed by row first and then column,so a
ij
denotes the element of matrix A at the ith row and the jth column.
A diagonal matrix is a square matrix with zeros everywhere except on thediagonal (i.e., elements a
ii
) where the elements may take any value. An importantspecial case of a diagonal matrix is the identitymatrix I. This has zeros everywhereexcept for the diagonal, where all the elements are ones.
C.2.1 Matrix multiplication
To take the matrix product C = AB, we compute the elements of C as
cij
=KX
k=1
aik
bkj
. (C.6)
This can only be done when the number of columns in A equals the number ofrows in B. Matrix multiplication is associative so that A(BC) = (AB)C = ABC.However it is not commutative so that in general AB 6= BA.
Copyright c�2011,2012 by Simon Prince; published by Cambridge University Press 2012.For personal use only, not for distribution.
Right-hand rule
• Given three orthonormal 3D vectors (x,y, z)
z = x⇥ y
• Then,
First camera:
Second camera:
Substituting:
This is a mathematical relationship between the points in the two images, but it’s not in the most convenient form.
The Essential Matrix
Adapted from: Computer vision: models, learning and inference. Simon J.D. Prince
The Essential Matrix
Adapted from: Computer vision: models, learning and inference. Simon J.D. Prince
The cross product term can be expressed as a matrix
Defining:
We now have the essential matrix relation
The Essential Matrix
Adapted from: Computer vision: models, learning and inference. Simon J.D. Prince
• Rank 2:
• 5 degrees of freedom
• Non-linear constraints between elements
Properties of the Essential Matrix
Adapted from: Computer vision: models, learning and inference. Simon J.D. Prince
Recovering Epipolar Lines
Equation of a line:
or
or
Adapted from: Computer vision: models, learning and inference. Simon J.D. Prince
T
Recovering Epipolar Lines
Equation of a line:
Now consider
This has the form where
So the epipolar lines are
Adapted from: Computer vision: models, learning and inference. Simon J.D. Prince
T
T
Every epipolar line in image 1 passes through the epipole e1.
In other words for ALL
This can only be true if e1 is in the nullspace of E.
Similarly:
We find the null spaces by computing , and taking the last column of and the last row of .
Recovering Epipolar Lines
Adapted from: Computer vision: models, learning and inference. Simon J.D. Prince
Essential matrix:
To recover translation and rotation use the matrix:
We take the SVD and then we set
Decomposition of E
Adapted from: Computer vision: models, learning and inference. Simon J.D. Prince
Four Interpretations
To get the different
solutions, we mutliply τ by -1 and substitute
Today
• Transformations between images
• Structure from Motion
• The Essential Matrix
• The Fundamental Matrix
The Fundamental Matrix Song
Taken from Daniel Wedge’s home page - http://danielwedge.com/fmatrix/
The Fundamental Matrix Song
Taken from Daniel Wedge’s home page - http://danielwedge.com/fmatrix/
The Fundamental MatrixNow consider two cameras that are not normalised
By a similar procedure to before, we get the relation
orwhere
Relation between essential and fundamental
Adapted from: Computer vision: models, learning and inference. Simon J.D. Prince
Estimation of the Fundamental Matrix
Estimation of Fundamental Matrix
When the fundamental matrix is correct, the epipolar line induced by a point in the first image should pass through the matching point in the second image and vice-versa.
This suggests the criterion
If and then
Unfortunately, there is no closed form solution for this quantity.
Adapted from: Computer vision: models, learning and inference. Simon J.D. Prince
The 8 Point AlgorithmApproach:
• solve for fundamental matrix using homogeneous coordinates
• closed form solution (but to wrong problem!)• Known as the 8 point algorithm
Start with fundamental matrix relation
Writing out in full:
or
Adapted from: Computer vision: models, learning and inference. Simon J.D. Prince
The 8 Point Algorithm
Adapted from: Computer vision: models, learning and inference. Simon J.D. Prince
Can be written as:
where
Stacking together constraints from at least 8 pairs of points, we get the system of equations
Minimum direction problem of the form Find minimum of subject to .
To solve, compute the SVD and then set to the last column of .
The 8 Point Algorithm
Adapted from: Computer vision: models, learning and inference. Simon J.D. Prince
Fitting Concerns•This procedure does not ensure that solution is rank 2. Solution: set last singular value to zero.
•Can be unreliable because of numerical problems to do with the data scaling – better to re-scale the data first
•Needs 8 points in general positions (cannot all be planar).
•Fails if there is not sufficient translation between the views
•Use this solution to start non-linear optimisation of true criterion (must ensure non-linear constraints obeyed).
•There is also a 7 point algorithm.
More to read…
• Prince et al. • Chapter 16, Sections 1-3.