Advanced Computer Vision


Page 1: Advanced Computer Vision

Advanced Computer Vision

Structure from Motion 1

Chapter 7

STRUCTURE FROM MOTION

Page 2: Advanced Computer Vision

What Is Structure from Motion?

1. Study of visual perception.
2. Process of finding the three-dimensional structure of an object by analyzing local motion signals over time.
3. A method for creating 3D models from 2D pictures of an object.


Page 3: Advanced Computer Vision

Example


Picture 1 Picture 2

Page 4: Advanced Computer Vision

Example (cont).


3D model created from the two images

Page 5: Advanced Computer Vision

7.1 Triangulation

• The problem of estimating a point’s 3D location when it is seen from multiple cameras is known as triangulation.


Page 6: Advanced Computer Vision

• Find the 3D point p that lies closest to all of the 3D rays corresponding to the 2D matching feature locations {xj}

Triangulation (cont).


Page 7: Advanced Computer Vision

Triangulation (cont).

• Find the 3D point p that lies closest to all of the 3D rays corresponding to the 2D matching feature locations $\{x_j\}$ observed by cameras $\{P_j = K_j [R_j \mid t_j]\}$, where $t_j = -R_j c_j$ and $c_j$ is the $j$th camera center.


Page 8: Advanced Computer Vision

Triangulation (cont).

• It is the converse of the pose estimation problem.

• Given the projection matrices, 3D points can be computed from their measured image positions in two or more views.


Page 9: Advanced Computer Vision

Triangulation (cont).


Page 10: Advanced Computer Vision

Triangulation (cont).


Page 11: Advanced Computer Vision

Triangulation (cont).


Page 12: Advanced Computer Vision

Triangulation (cont).


• $x = PX$, with $P = K[R \mid t]$

Page 13: Advanced Computer Vision

Triangulation (cont).


Figure 7.7: 3D point triangulation by finding the point p that lies nearest to all of the optical rays

Page 14: Advanced Computer Vision

Triangulation (cont).

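• The rays originate at $c_j$ in a direction $\hat{v}_j = \mathcal{N}(R_j^{-1} K_j^{-1} x_j)$, where $\mathcal{N}(\cdot)$ normalizes a vector to unit length.

• The nearest point to p on this ray, which is denoted as $q_j$, minimizes the distance
$\| c_j + d_j \hat{v}_j - p \|^2$,
which has a minimum at $d_j = \hat{v}_j \cdot (p - c_j)$. Hence,
$q_j = c_j + (\hat{v}_j \hat{v}_j^T)(p - c_j) = c_j + (p - c_j)_\parallel$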

Page 15: Advanced Computer Vision

Triangulation (cont).

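• The squared distance between p and $q_j$ is
$r_j^2 = \| (I - \hat{v}_j \hat{v}_j^T)(p - c_j) \|^2 = \| (p - c_j)_\perp \|^2$

• The optimal value for p, which lies closest to all of the rays, can be computed as a regular least squares problem by summing over all the $r_j^2$ and finding the optimal value of p,
$p = \left[ \sum_j (I - \hat{v}_j \hat{v}_j^T) \right]^{-1} \left[ \sum_j (I - \hat{v}_j \hat{v}_j^T) c_j \right]$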
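The closed-form solution for p can be sketched in Python with NumPy. This is a minimal illustration, assuming ray directions are formed as $\hat{v}_j = \mathcal{N}(R_j^{-1} K_j^{-1} x_j)$; the names `ray_direction` and `triangulate_midpoint` are hypothetical, not from any library. The 3×3 system becomes singular if all rays are parallel.

```python
import numpy as np

def ray_direction(K, R, x):
    """Unit ray direction v_hat = N(R^-1 K^-1 x) for pixel x = (u, v)."""
    d = R.T @ np.linalg.solve(K, np.array([x[0], x[1], 1.0]))  # R^-1 = R^T for a rotation
    return d / np.linalg.norm(d)

def triangulate_midpoint(centers, directions):
    """Point p closest (in the least-squares sense) to all rays c_j + d_j * v_j."""
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for c, v in zip(np.asarray(centers, float), np.asarray(directions, float)):
        v = v / np.linalg.norm(v)
        P_perp = np.eye(3) - np.outer(v, v)   # I - v v^T removes the component along the ray
        A += P_perp
        b += P_perp @ c
    return np.linalg.solve(A, b)              # p = [sum (I - v v^T)]^-1 [sum (I - v v^T) c_j]

# Usage: three rays aimed at the same 3D point recover it exactly.
p_true = np.array([1.0, 2.0, 3.0])
centers = np.array([[0.0, 0, 0], [4.0, 0, 0], [0, 5.0, 1]])
p_est = triangulate_midpoint(centers, p_true - centers)
```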

Page 16: Advanced Computer Vision

Triangulation (cont).


Page 17: Advanced Computer Vision

Triangulation (cont).


Page 18: Advanced Computer Vision

Triangulation (cont).

• If we use homogeneous coordinates p = (X, Y, Z, W), the resulting set of equations is homogeneous and is best solved with a singular value decomposition (SVD), looking for the smallest singular vector.

• If we set W = 1, we can use regular linear least squares, but the resulting system may be singular or poorly conditioned (e.g., when all of the viewing rays are parallel, as happens for points far away from the cameras).
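The homogeneous formulation can be sketched as a linear (DLT-style) triangulation: each view contributes two rows derived from $x = PX$, and the solution is the right singular vector of the stacked matrix with the smallest singular value. This is a minimal sketch, assuming noise-free pixel coordinates and known 3×4 matrices $P_j$; `triangulate_dlt` and `project` are hypothetical names.

```python
import numpy as np

def triangulate_dlt(Ps, xs):
    """Linear triangulation of one point from >= 2 views.

    Ps: sequence of 3x4 projection matrices P_j = K_j [R_j | t_j]
    xs: sequence of pixel coordinates (u_j, v_j)
    """
    rows = []
    for P, (u, v) in zip(Ps, xs):
        # From u = (P[0] . X) / (P[2] . X) and v = (P[1] . X) / (P[2] . X)
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    _, _, Vt = np.linalg.svd(np.array(rows))
    X = Vt[-1]              # homogeneous (X, Y, Z, W): smallest singular vector
    return X[:3] / X[3]     # dehomogenize; ill-defined if W is close to 0

def project(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Usage: two cameras with a 1-unit baseline along x.
K = np.array([[800.0, 0, 320], [0, 800, 240], [0, 0, 1]])
P0 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P1 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])   # t = -R c, c = (1, 0, 0)
X_true = np.array([0.5, -0.2, 4.0])
X_est = triangulate_dlt([P0, P1], [project(P0, X_true), project(P1, X_true)])
```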


Page 19: Advanced Computer Vision

Singular Value Decomposition (SVD).


Page 20: Advanced Computer Vision

Singular Value Decomposition (SVD).


Page 21: Advanced Computer Vision

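Singular Value Decomposition (SVD).

• The solution of $\min \|Ax\|$ subject to $\|x\| = 1$ is the eigenvector corresponding to the minimum eigenvalue of $A^T A$:

• $A^T A = V \Sigma^T U^T U \Sigma V^T = V (\Sigma^T \Sigma) V^T$

• It is therefore also the right singular vector of A (the last column of V) corresponding to the minimum singular value of A.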
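The equivalence between the two routes to the homogeneous least-squares solution (minimize $\|Ax\|$ subject to $\|x\| = 1$) can be checked numerically. This is a minimal sketch on a random test matrix; the function names are hypothetical.

```python
import numpy as np

def smallest_right_singular_vector(A):
    """argmin ||A x|| subject to ||x|| = 1, read off the SVD A = U Σ V^T."""
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1]

def smallest_eigenvector_AtA(A):
    """The same vector from the eigen-decomposition A^T A = V (Σ^T Σ) V^T."""
    w, V = np.linalg.eigh(A.T @ A)   # eigh returns eigenvalues in ascending order
    return V[:, 0]

rng = np.random.default_rng(0)
A = rng.normal(size=(10, 4))
x_svd = smallest_right_singular_vector(A)
x_eig = smallest_eigenvector_AtA(A)
```

The two vectors agree up to sign. Forming $A^T A$ squares the condition number, which is why the SVD route is usually preferred numerically.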


Page 22: Advanced Computer Vision

Least Square


Page 23: Advanced Computer Vision

Linear Least Square Problem


Page 24: Advanced Computer Vision

Linear Least Square Problem

• Minimize F(X):

• Take the partial derivatives of F with respect to X0 and X1 and set them to zero:

• Solve for X0 and X1 by combining the two equations
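The slide's exact F(X) is not recoverable from this transcript. As an assumption, take the common line-fitting case $F(X) = \sum_i (y_i - X_0 - X_1 x_i)^2$: setting the two partial derivatives to zero gives two linear (normal) equations, which the sketch below solves in closed form and cross-checks against NumPy's `np.linalg.lstsq`. `fit_line` is a hypothetical name.

```python
import numpy as np

def fit_line(x, y):
    """Minimize F(X) = sum_i (y_i - X0 - X1 * x_i)^2 (assumed model) in closed form."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    n = len(x)
    Sx, Sy, Sxx, Sxy = x.sum(), y.sum(), (x * x).sum(), (x * y).sum()
    # dF/dX0 = 0  ->  n  * X0 + Sx  * X1 = Sy
    # dF/dX1 = 0  ->  Sx * X0 + Sxx * X1 = Sxy
    X1 = (n * Sxy - Sx * Sy) / (n * Sxx - Sx * Sx)   # eliminate X0 between the two equations
    X0 = (Sy - X1 * Sx) / n
    return X0, X1

# Usage: compare with NumPy's general least-squares solver.
x = np.array([0.0, 1, 2, 3, 4])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
X0, X1 = fit_line(x, y)
X_ref, *_ = np.linalg.lstsq(np.column_stack([np.ones_like(x), x]), y, rcond=None)
```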


Page 25: Advanced Computer Vision

7.2 Two-Frame Structure from Motion

• So far in our treatment of 3D reconstruction, we have assumed that either the 3D point positions or the 3D camera poses are known in advance.


Page 26: Advanced Computer Vision

Two-Frame Structure from Motion (cont).


Figure 7.8: Epipolar geometry: The vectors t = c1 – c0, p – c0 and p – c1 are co-planar and define the basic epipolar constraint expressed in terms of the pixel measurements x0 and x1.

Page 27: Advanced Computer Vision

Two-Frame Structure from Motion (cont).

• The figure shows a 3D point p being viewed from two cameras whose relative position can be encoded by a rotation R and a translation t.

• Since we do not know anything about the camera positions, without loss of generality we can set the first camera at the origin c0 = 0 and at a canonical orientation R0 = I.


Page 28: Advanced Computer Vision

Two-Frame Structure from Motion (cont).

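• The observed location of point p in the first image, $p_0 = d_0 \hat{x}_0$, is mapped into the second image by the transformation
$d_1 \hat{x}_1 = p_1 = R p_0 + t = R(d_0 \hat{x}_0) + t$,
where $\hat{x}_j = K_j^{-1} x_j$ are the ray direction vectors.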


Page 29: Advanced Computer Vision

Two-Frame Structure from Motion (cont).

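• Taking the cross product of both sides with t in order to annihilate it on the right-hand side yields
$d_1 [t]_\times \hat{x}_1 = d_0 [t]_\times R \hat{x}_0$

• Taking the dot product of both sides with $\hat{x}_1$ yields
$d_0 \hat{x}_1^T ([t]_\times R) \hat{x}_0 = d_1 \hat{x}_1^T [t]_\times \hat{x}_1 = 0$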

Page 30: Advanced Computer Vision

Two-Frame Structure from Motion (cont).

• The right-hand side is a triple product with two identical entries, so it vanishes.

• We therefore arrive at the basic epipolar constraint
$\hat{x}_1^T E \hat{x}_0 = 0$,
where $E = [t]_\times R$ is the essential matrix.
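A quick numerical check of the derivation: build $E = [t]_\times R$ for an assumed relative pose, transform synthetic points with $p_1 = R p_0 + t$, and confirm that $\hat{x}_1^T E \hat{x}_0$ vanishes for every correspondence. The pose values and the helper name `skew` are illustrative only.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Assumed relative pose of camera 1 with respect to camera 0 (c0 = 0, R0 = I).
c, s = np.cos(0.1), np.sin(0.1)
R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
t = np.array([1.0, 0.2, 0.0])
E = skew(t) @ R                                        # essential matrix E = [t]_x R

rng = np.random.default_rng(0)
p0 = rng.uniform([-1, -1, 4], [1, 1, 8], size=(5, 3))  # 3D points in camera-0 coordinates
p1 = p0 @ R.T + t                                      # p1 = R p0 + t, row by row
x0_hat = p0 / p0[:, 2:3]                               # ray directions, scaled so z = 1
x1_hat = p1 / p1[:, 2:3]
residuals = np.einsum('ij,jk,ik->i', x1_hat, E, x0_hat)   # x1_hat^T E x0_hat per point
```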


Page 31: Advanced Computer Vision

Two-Frame Structure from Motion (cont).

• The essential matrix E maps a point $\hat{x}_0$ in image 0 into a line $l_1 = E \hat{x}_0 = [t]_\times R \hat{x}_0$ in image 1, since $\hat{x}_1^T l_1 = 0$.


Page 32: Advanced Computer Vision

Two-Frame Structure from Motion (cont).

• All such lines must pass through the second epipole $e_1$, which is therefore defined as the left singular vector of E with a 0 singular value or, equivalently, the projection of the vector t into image 1.

• The transpose of these relationships gives us the epipolar line in the first image as $l_0 = E^T \hat{x}_1$, and $e_0$ as the right singular vector of E with a 0 singular value.
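The epipoles can be read off the SVD of E, as stated above. This is a minimal sketch with an assumed pose; `epipoles` is a hypothetical helper. For $E = [t]_\times R$, the left null vector is parallel to t and the right null vector is parallel to $R^T t$, and every epipolar line $E\hat{x}_0$ passes through $e_1$.

```python
import numpy as np

def skew(t):
    return np.array([[0.0, -t[2], t[1]], [t[2], 0.0, -t[0]], [-t[1], t[0], 0.0]])

def epipoles(E):
    """e0: right null vector (E e0 = 0); e1: left null vector (e1^T E = 0); both unit length."""
    U, _, Vt = np.linalg.svd(E)
    return Vt[-1], U[:, -1]

c, s = np.cos(0.3), np.sin(0.3)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])   # rotation about the y-axis
t = np.array([1.0, 0.2, 0.1])
E = skew(t) @ R
e0, e1 = epipoles(E)

x0 = np.array([0.2, -0.1, 1.0])   # any point in image 0
l1 = E @ x0                       # its epipolar line in image 1
```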


Page 33: Advanced Computer Vision

Two-Frame Structure from Motion (cont).


Page 34: Advanced Computer Vision

Two-Frame Structure from Motion (cont).

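Given the relationship $\hat{x}_1^T E \hat{x}_0 = 0$, if we have n corresponding measurements $\{(\hat{x}_{i0}, \hat{x}_{i1})\}$, with $\hat{x}_{i0} = (x_{i0}, y_{i0}, 1)$ and $\hat{x}_{i1} = (x_{i1}, y_{i1}, 1)$, we can form n homogeneous equations in the nine elements of $E = \{e_{00} \ldots e_{22}\}$:

$x_{i0} x_{i1} e_{00} + y_{i0} x_{i1} e_{01} + x_{i1} e_{02} + x_{i0} y_{i1} e_{10} + y_{i0} y_{i1} e_{11} + y_{i1} e_{12} + x_{i0} e_{20} + y_{i0} e_{21} + e_{22} = 0$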


Page 35: Advanced Computer Vision

Two-Frame Structure from Motion (cont).

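Stacking these equations as $A e = 0$, where e holds the nine entries of E, find $\min \|A e\|$ subject to $\|e\| = 1$: e is the eigenvector of $A^T A$ with the smallest eigenvalue.

Variant E′: enforce the rank-two constraint on E.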
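The linear estimate can be sketched as the classic eight-point procedure: each correspondence gives one row of coefficients of $e_{00} \ldots e_{22}$ in $\hat{x}_1^T E \hat{x}_0 = 0$, the smallest right singular vector of the stacked matrix is reshaped into E, and E is then projected onto the constraint set. As an assumption, this sketch also equalizes the two nonzero singular values, as appropriate for an essential matrix (for a fundamental matrix one would only zero the smallest). `estimate_essential` is a hypothetical name and the test pose is illustrative.

```python
import numpy as np

def estimate_essential(x0, x1):
    """Linear estimate of E from n >= 8 correspondences.

    x0, x1: (n, 2) normalized image coordinates (K^-1 already applied).
    """
    X0, Y0 = x0[:, 0], x0[:, 1]
    X1, Y1 = x1[:, 0], x1[:, 1]
    A = np.column_stack([X0 * X1, Y0 * X1, X1,      # coefficients of e00, e01, e02
                         X0 * Y1, Y0 * Y1, Y1,      # coefficients of e10, e11, e12
                         X0, Y0, np.ones_like(X0)]) # coefficients of e20, e21, e22
    _, _, Vt = np.linalg.svd(A)
    E = Vt[-1].reshape(3, 3)                        # least eigenvector of A^T A, row-major
    U, S, Vt = np.linalg.svd(E)                     # variant E': enforce rank two
    sigma = (S[0] + S[1]) / 2
    return U @ np.diag([sigma, sigma, 0.0]) @ Vt

# Usage: synthetic, noise-free correspondences.
c, s = np.cos(0.2), np.sin(0.2)
R = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1.0]]) @ np.array([[1, 0, 0], [0, c, -s], [0, s, c]])
t = np.array([1.0, 0.3, -0.2])
E_true = np.array([[0, -t[2], t[1]], [t[2], 0, -t[0]], [-t[1], t[0], 0]]) @ R
rng = np.random.default_rng(1)
p0 = rng.uniform([-1, -1, 4], [1, 1, 8], size=(12, 3))
p1 = p0 @ R.T + t
E_est = estimate_essential(p0[:, :2] / p0[:, 2:], p1[:, :2] / p1[:, 2:])
```

Since E is only defined up to scale and sign, comparisons should normalize both matrices first.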

Page 36: Advanced Computer Vision

Two-Frame Structure from Motion (cont).

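• Under noise-free conditions, E is singular ($\hat{t}^T E = 0$), so $\hat{t}$ is the eigenvector of $E E^T$ with the minimum (zero) eigenvalue, i.e., the left singular vector of E with a zero singular value:
$E = [\hat{t}]_\times R = U \Sigma V^T = [u_0\; u_1\; \hat{t}] \,\mathrm{diag}(1, 1, 0)\, [v_0\; v_1\; v_2]^T$

• Estimate R from $\hat{t}$, using
$[\hat{t}]_\times = S Z R_{90^\circ} S^T = [s_0\; s_1\; \hat{t}] \,\mathrm{diag}(1, 1, 0)\, R_{90^\circ} [s_0\; s_1\; \hat{t}]^T$, with $\hat{t} = s_0 \times s_1$ and $R_{90^\circ}$ a 90° rotation about the z-axis.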


Page 37: Advanced Computer Vision

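Two-Frame Structure from Motion (cont).

• With $[\hat{t}]_\times = S Z R_{90^\circ} S^T$, we get $E = [\hat{t}]_\times R = S Z R_{90^\circ} S^T R = U \Sigma V^T$, so $S = U$.

• Under no noise ($\Sigma = Z$): $R_{90^\circ} U^T R = V^T \;\Rightarrow\; R = U R_{90^\circ}^T V^T$

• However, you can flip the signs of both U and V and still get a valid SVD, so four candidates must be considered:
$R = \pm U R_{\pm 90^\circ}^T V^T$, keeping the two with $\det R = 1$.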
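The recovery of R and $\hat{t}$ from E can be sketched as follows. This is a minimal illustration, not a complete pose solver: it returns the four (R, t) combinations and omits the final step of choosing the pair that places triangulated points in front of both cameras. `decompose_essential` and the test pose are hypothetical.

```python
import numpy as np

R90 = np.array([[0.0, -1, 0], [1, 0, 0], [0, 0, 1]])   # rotation by +90° about the z-axis

def decompose_essential(E):
    """Four candidate (R, t_hat) pairs from an essential matrix E = [t]_x R.

    t_hat: singular vector of E with the smallest singular value (sign unknown).
    R = ±U R_{±90}^T V^T, with the overall sign chosen so that det(R) = +1.
    """
    U, _, Vt = np.linalg.svd(E)
    t_hat = U[:, 2]
    rotations = []
    for W in (R90.T, R90):        # R_{+90}^T and R_{-90}^T (= R_{+90})
        R = U @ W @ Vt
        if np.linalg.det(R) < 0:  # U and V with flipped signs are still a valid SVD
            R = -R
        rotations.append(R)
    # The physically correct pair puts triangulated points in front of both
    # cameras (cheirality test, not implemented in this sketch).
    return [(R, sign * t_hat) for R in rotations for sign in (1.0, -1.0)]

def skew(t):
    return np.array([[0.0, -t[2], t[1]], [t[2], 0.0, -t[0]], [-t[1], t[0], 0.0]])

# Usage: the true rotation and translation direction appear among the candidates.
c, s = np.cos(0.25), np.sin(0.25)
R_true = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]]) @ np.array([[1, 0, 0], [0, c, -s], [0, s, c]])
t_true = np.array([0.8, -0.3, 0.5])
candidates = decompose_essential(skew(t_true) @ R_true)
```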

Page 38: Advanced Computer Vision

Two-Frame Structure from Motion (cont).

• If the measurements have noise, the terms that are products of measurements have their noise amplified by the other element in the product, which leads to poor scaling (e.g., an inordinate influence of points with large coordinates).

• To counteract this, a common suggestion is that the point coordinates should be translated and scaled so that their centroid lies at the origin and their variance is unity, i.e.,
$\tilde{x}_i = s(x_i - \mu_x)$, $\tilde{y}_i = s(y_i - \mu_y)$


Page 39: Advanced Computer Vision

Two-Frame Structure from Motion (cont).

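such that $\sum_i \tilde{x}_i = \sum_i \tilde{y}_i = 0$ and $\sum_i \tilde{x}_i^2 + \sum_i \tilde{y}_i^2 = 2n$, where n is the number of points.

• Once the essential matrix $\tilde{E}$ has been computed from the transformed coordinates $\tilde{x}_{ij} = T_j \hat{x}_{ij}$, the original essential matrix E can be recovered as
$E = T_1^T \tilde{E} T_0$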
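The normalization (commonly attributed to Hartley) can be sketched as a 3×3 similarity $T$ per image; after estimating $\tilde{E}$ on transformed coordinates, $E = T_1^T \tilde{E} T_0$ undoes it. This is a minimal sketch applied to raw 2D coordinates; the helper names are hypothetical.

```python
import numpy as np

def normalizing_transform(pts):
    """3x3 T with T @ (x, y, 1) = (s(x - mu_x), s(y - mu_y), 1): centroid at the
    origin and sum(x~^2) + sum(y~^2) = 2n."""
    pts = np.asarray(pts, float)
    n = len(pts)
    mu = pts.mean(axis=0)
    s = np.sqrt(2 * n / np.sum((pts - mu) ** 2))
    return np.array([[s, 0.0, -s * mu[0]],
                     [0.0, s, -s * mu[1]],
                     [0.0, 0.0, 1.0]])

def transform_points(T, pts):
    """Apply T to an (n, 2) array of inhomogeneous points."""
    h = np.column_stack([pts, np.ones(len(pts))]) @ T.T
    return h[:, :2] / h[:, 2:]

def denormalize_essential(E_tilde, T0, T1):
    """Undo the normalization: E = T1^T E~ T0."""
    return T1.T @ E_tilde @ T0

rng = np.random.default_rng(2)
pts = rng.uniform(0, 640, size=(20, 2))   # raw coordinates far from the origin
T = normalizing_transform(pts)
pts_n = transform_points(T, pts)
```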

Page 40: Advanced Computer Vision

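Projective Reconstruction

• In the uncalibrated case, we do not know the calibration matrices $K_j$, so we cannot use the normalized ray directions $\hat{x}_j = K_j^{-1} x_j$.

• We only have access to the image coordinates $x_j$, so the essential matrix constraint becomes
$\hat{x}_1^T E \hat{x}_0 = x_1^T K_1^{-T} E K_0^{-1} x_0 = x_1^T F x_0 = 0$

• Fundamental matrix:
$F = K_1^{-T} E K_0^{-1} = [e]_\times \tilde{H}$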
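The relation $F = K_1^{-T} E K_0^{-1}$ can be checked numerically: pixel coordinates satisfy $x_1^T F x_0 = 0$, and F inherits rank 2 from E. The intrinsics, pose, and function name below are illustrative assumptions.

```python
import numpy as np

def skew(t):
    return np.array([[0.0, -t[2], t[1]], [t[2], 0.0, -t[0]], [-t[1], t[0], 0.0]])

def fundamental_from_essential(E, K0, K1):
    """F = K1^-T E K0^-1, so that x1^T F x0 = 0 holds for pixel coordinates."""
    return np.linalg.inv(K1).T @ E @ np.linalg.inv(K0)

K0 = np.array([[700.0, 0, 320], [0, 700, 240], [0, 0, 1]])
K1 = np.array([[650.0, 0, 300], [0, 650, 250], [0, 0, 1]])
c, s = np.cos(0.1), np.sin(0.1)
R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
t = np.array([1.0, 0.0, 0.1])
F = fundamental_from_essential(skew(t) @ R, K0, K1)

p0 = np.array([0.3, -0.4, 5.0])          # 3D point in camera-0 coordinates
x0 = K0 @ p0
x0 /= x0[2]                              # pixel coordinates (u, v, 1) in image 0
x1 = K1 @ (R @ p0 + t)
x1 /= x1[2]                              # pixel coordinates (u, v, 1) in image 1
```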

Page 41: Advanced Computer Vision

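Projective Reconstruction (cont).

• Just like the essential matrix, the fundamental matrix has rank 2 and can be written as
$F = U \Sigma V^T = [u_0\; u_1\; e_1] \,\mathrm{diag}(\sigma_0, \sigma_1, 0)\, [v_0\; v_1\; e_0]^T$

• In principle $\tilde{H} = K_1 R K_0^{-1}$, but $\tilde{H}$ cannot be recovered uniquely from F: any $\tilde{H}' = \tilde{H} + e v^T$ gives the same F, since $[e]_\times e = 0$.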

Page 42: Advanced Computer Vision

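Projective Reconstruction (cont).

• As in the equations on Page 37, F can be written as
$F = [e]_\times \tilde{H} = S Z R_{90^\circ} S^T \tilde{H} = U \Sigma V^T$

• Therefore, $\tilde{H} = U R_{90^\circ}^T \hat{\Sigma} V^T$, where $\hat{\Sigma}$ is the singular value matrix with the smallest value replaced by the middle value.

• We can form a pair of projection matrices as follows and reconstruct the scene by triangulation:
$P_0 = [I \mid 0]$ and $P_1 = [\tilde{H} \mid e]$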
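The construction of a projective camera pair from F can be sketched as follows. This is a minimal illustration; `projective_cameras` and the test values are hypothetical. One detail the slide leaves implicit: $[e]_\times = U Z R_{90^\circ} U^T$ only holds when $\det U = +1$, so the sketch negates U and V together when needed, which leaves $F = U\Sigma V^T$ unchanged.

```python
import numpy as np

R90 = np.array([[0.0, -1, 0], [1, 0, 0], [0, 0, 1]])   # 90° rotation about the z-axis

def skew(e):
    return np.array([[0.0, -e[2], e[1]], [e[2], 0.0, -e[0]], [-e[1], e[0], 0.0]])

def projective_cameras(F):
    """P0 = [I | 0], P1 = [H | e] with F = [e]_x H, using H = U R90^T Σ̂ V^T."""
    U, S, Vt = np.linalg.svd(F)
    if np.linalg.det(U) < 0:          # negate U and V^T together: F = U Σ V^T is unchanged,
        U, Vt = -U, -Vt               # and det(U) = +1 makes [e]_x = U Z R90 U^T hold
    e = U[:, 2]                       # epipole e1: left singular vector with zero singular value
    S_hat = np.diag([S[0], S[1], S[1]])   # smallest value replaced by the middle one
    H = U @ R90.T @ S_hat @ Vt
    P0 = np.hstack([np.eye(3), np.zeros((3, 1))])
    P1 = np.hstack([H, e[:, None]])
    return P0, P1, H, e

# Usage: a rank-2 F built from assumed intrinsics and pose.
K = np.array([[700.0, 0, 320], [0, 700, 240], [0, 0, 1]])
c, s = np.cos(0.2), np.sin(0.2)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
t = np.array([1.0, 0.1, 0.2])
Kinv = np.linalg.inv(K)
F = Kinv.T @ skew(t) @ R @ Kinv
P0, P1, H, e = projective_cameras(F)
```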

Page 43: Advanced Computer Vision

View Morphing

• An application of basic two-frame structure from motion.
• Also known as view interpolation.
• Used to generate a smooth 3D animation from one view of a 3D scene to another.
• To create such a transition, smoothly interpolate the camera matrices, i.e., the camera positions, orientations, and focal lengths. A more pleasing effect is obtained by easing in and easing out the camera parameters.

• To generate in-between frames, establish either a full set of 3D correspondences or 3D models for each reference view.
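The camera interpolation with easing can be sketched as follows. Assumptions not taken from the slides: rotations are stored as unit quaternions (w, x, y, z) and interpolated by spherical linear interpolation, the easing curve is a raised cosine, and a camera is a hypothetical `(center, quaternion, focal_length)` tuple.

```python
import numpy as np

def ease(s):
    """Raised-cosine ease-in/ease-out: maps 0 -> 0 and 1 -> 1 with zero slope at both ends."""
    return 0.5 - 0.5 * np.cos(np.pi * s)

def slerp(q0, q1, u):
    """Spherical linear interpolation between unit quaternions (w, x, y, z)."""
    q0, q1 = np.asarray(q0, float), np.asarray(q1, float)
    d = np.dot(q0, q1)
    if d < 0.0:                     # q and -q are the same rotation: take the shorter arc
        q1, d = -q1, -d
    if d > 0.9995:                  # nearly identical: normalized linear interpolation
        q = q0 + u * (q1 - q0)
        return q / np.linalg.norm(q)
    theta = np.arccos(d)
    return (np.sin((1 - u) * theta) * q0 + np.sin(u * theta) * q1) / np.sin(theta)

def interpolate_camera(cam0, cam1, s):
    """cam = (center, unit quaternion, focal length); s in [0, 1] is the animation time."""
    (c0, q0, f0), (c1, q1, f1) = cam0, cam1
    u = ease(s)
    c = (1 - u) * np.asarray(c0, float) + u * np.asarray(c1, float)
    return c, slerp(q0, q1, u), (1 - u) * f0 + u * f1

# Usage: halfway between two cameras 2 units apart, rotated 90° about z.
q_id = np.array([1.0, 0, 0, 0])
q_z90 = np.array([np.cos(np.pi / 4), 0, 0, np.sin(np.pi / 4)])
c, q, f = interpolate_camera((np.zeros(3), q_id, 500.0), (np.array([2.0, 0, 0]), q_z90, 700.0), 0.5)
```

Plain linear interpolation would use `u = s`; the raised cosine makes the camera start and stop gently.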


Page 44: Advanced Computer Vision

View Morphing

• Triangulate the set of matched feature points in each image.
• As the 3D points are re-projected into their intermediate views, pixels can be mapped from their original source images to their new views using affine or projective mapping.

• The final image is then composited using a linear blend of the two reference images, as with usual morphing.
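For the per-triangle mapping, the affine variant can be sketched as follows: the three vertices of a triangle in a reference image and their re-projected positions in the intermediate view determine a 2×3 affine map, and the two warped reference images are then linearly blended. The helper names are hypothetical; a projective variant would fit a homography instead, and resampling of actual pixel grids is omitted.

```python
import numpy as np

def affine_from_triangles(src, dst):
    """2x3 affine matrix A with dst_k = A @ (x_k, y_k, 1) for the three triangle vertices."""
    S = np.column_stack([np.asarray(src, float), np.ones(3)])   # rows (x_k, y_k, 1)
    return np.linalg.solve(S, np.asarray(dst, float)).T         # solves S @ A^T = dst

def warp_points(A, pts):
    """Apply the affine map to an (N, 2) array of pixel positions."""
    pts = np.asarray(pts, float)
    return pts @ A[:, :2].T + A[:, 2]

def blend(img0, img1, alpha):
    """Linear blend of the two warped reference images (alpha = 0 gives img0)."""
    return (1.0 - alpha) * np.asarray(img0, float) + alpha * np.asarray(img1, float)

# Usage: a triangle in a reference image and its re-projected position.
src = [[0, 0], [1, 0], [0, 1]]
dst = [[2, 3], [4, 3], [2, 6]]
A = affine_from_triangles(src, dst)
```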


Page 45: Advanced Computer Vision

7.3 Factorization

• n 3D points are seen in m views
• q = (u, v, 1): 2D image point
• p = (x, y, z, 1): 3D scene point
• Π: projection matrix
• π: projection function
• qij is the projection of the i-th point on image j
• λij: projective depth of qij


Page 46: Advanced Computer Vision

Projection Models


Page 47: Advanced Computer Vision

Projection Models


Page 48: Advanced Computer Vision

Orthographic Projection


Page 49: Advanced Computer Vision

Orthographic Projection


Page 50: Advanced Computer Vision

Perspective Projection


Page 51: Advanced Computer Vision

SFM under Orthographic Projection

• In general, p: 4×1 matrix (x y z 1), q: 3×1 matrix (u v 1)

• Assuming no translation: Π: 3×3, p: 3×1, q: 3×1

• Under orthographic projection: Π: 2×3, p: 3×1, q: 2×1

Page 52: Advanced Computer Vision

SFM under Orthographic Projection

• Choose the scene origin to be the centroid of the 3D points
• Choose the image origins to be the centroids of the 2D points
• This allows us to drop the camera translation:


Page 53: Advanced Computer Vision

Factorization (cont).

• Original input:
• Centroid:
• Translation:

Page 54: Advanced Computer Vision

Factorization (cont).

• Rank(W) <= 3


Page 55: Advanced Computer Vision

Factorization (cont).

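• Apply singular value decomposition to W: $W = U \Sigma V^T$

• Eliminate noise by keeping only the three largest singular values: $\Sigma_{n \times n} \to \Sigma'_{3 \times 3}$ (rank(Σ′) ≤ 3), $U_{2m \times n} \to U'_{2m \times 3}$, $V^T_{n \times n} \to V'^T_{3 \times n}$

• $W \approx U' \Sigma' V'^T = M' S'$, with $M' = U' \Sigma'^{1/2}$ and $S' = \Sigma'^{1/2} V'^T$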
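The centering and rank-3 truncation can be sketched as follows. This is a minimal illustration on synthetic orthographic data, assuming rows 2j and 2j+1 of W belong to camera j and splitting $\Sigma'$ evenly between $M'$ and $S'$ (other splits are equally valid, since the factors are only defined up to an invertible A). `factorize_orthographic` is a hypothetical name.

```python
import numpy as np

def factorize_orthographic(W):
    """Rank-3 factorization of a 2m x n measurement matrix W.

    Returns the registered matrix W~ and factors M' (2m x 3), S' (3 x n) with
    W~ ≈ M' S'; both factors are defined only up to an invertible 3x3 matrix A.
    """
    W = np.asarray(W, float)
    W_reg = W - W.mean(axis=1, keepdims=True)     # image origins moved to the 2D centroids
    U, s, Vt = np.linalg.svd(W_reg, full_matrices=False)
    root = np.sqrt(s[:3])                          # keep the three largest singular values
    M_prime = U[:, :3] * root                      # U' Σ'^(1/2)
    S_prime = root[:, None] * Vt[:3]               # Σ'^(1/2) V'^T
    return W_reg, M_prime, S_prime

# Usage: five synthetic orthographic views of 15 points.
rng = np.random.default_rng(4)
S_true = rng.normal(size=(3, 15))
S_true -= S_true.mean(axis=1, keepdims=True)      # scene origin at the 3D centroid
rows = []
for j in range(5):
    Q = np.linalg.qr(rng.normal(size=(3, 3)))[0]  # random orthonormal basis
    rows.append(Q[:2] @ S_true + rng.normal(size=(2, 1)))   # Π_j S + image translation
W = np.vstack(rows)
W_reg, M_prime, S_prime = factorize_orthographic(W)
```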

Page 56: Advanced Computer Vision

Factorization (cont).

• S′ differs from S by a linear transformation A: $M = M'A$, $S = A^{-1} S'$

• Solve for A by enforcing metric constraints on M:

• Orthographic camera: the rows of Π are orthonormal; therefore, the rows of M are orthonormal

→ Solve A → Solve M (= M′A)

Page 57: Advanced Computer Vision

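Factorization (cont).

• Assume Π = Π′A, and let $G = AA^T$. For each camera, the rows $i'^T$ and $j'^T$ of $\Pi'$ must satisfy $i'^T G i' = 1$, $j'^T G j' = 1$ and $i'^T G j' = 0$

• Solve for G first by writing these equations for every Πi in M

• Then recover A from $G = AA^T$ by SVD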
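The metric upgrade can be sketched as a linear least-squares problem in the six unknowns of the symmetric matrix G, followed by a symmetric eigen-decomposition to recover A with $G = AA^T$. This is a minimal sketch, assuming rows 2i and 2i+1 of $M'$ belong to camera i and noise-free data; the helper names are hypothetical. A is only determined up to a rotation, since any $AQ$ with orthogonal Q gives the same G.

```python
import numpy as np

def _coeffs(a, b):
    """Coefficients of a^T G b in the unknowns (g11, g12, g13, g22, g23, g33) of symmetric G."""
    return np.array([a[0] * b[0], a[0] * b[1] + a[1] * b[0], a[0] * b[2] + a[2] * b[0],
                     a[1] * b[1], a[1] * b[2] + a[2] * b[1], a[2] * b[2]])

def metric_upgrade(M_prime):
    """Find A such that the per-camera row pairs of M = M'A are orthonormal."""
    rows, rhs = [], []
    for i in range(0, M_prime.shape[0], 2):
        a, b = M_prime[i], M_prime[i + 1]
        rows += [_coeffs(a, a), _coeffs(b, b), _coeffs(a, b)]
        rhs += [1.0, 1.0, 0.0]
    g, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    G = np.array([[g[0], g[1], g[2]],
                  [g[1], g[3], g[4]],
                  [g[2], g[4], g[5]]])
    w, V = np.linalg.eigh(G)                              # G = V diag(w) V^T
    return V @ np.diag(np.sqrt(np.clip(w, 0.0, None)))    # A with A A^T = G

# Usage: hide a known A0 inside M' and recover orthonormal camera rows.
rng = np.random.default_rng(5)
M_true = np.vstack([np.linalg.qr(rng.normal(size=(3, 3)))[0][:2] for _ in range(4)])
A0 = rng.normal(size=(3, 3)) + 3 * np.eye(3)
M_prime = M_true @ np.linalg.inv(A0)
A = metric_upgrade(M_prime)
M = M_prime @ A
```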

Page 58: Advanced Computer Vision

Factorization with Noisy Data

• SVD provides the optimal rank-3 approximation W′ of W:

• Estimate W’, then use noise-free factorization of W’ as before

• Result minimizes the SSD between positions of image features and projection of the reconstruction


Page 59: Advanced Computer Vision

Factorization with Missing Data


Page 60: Advanced Computer Vision

Factorization with Missing Data (cont).

• Apply factorization on $W_{6 \times 4}$:


Page 61: Advanced Computer Vision

Factorization with Missing Data (cont).

• Solve for i4 and j4:


Page 62: Advanced Computer Vision

Factorization with Missing Data (cont).

• Disadvantages:

• Finding the largest full submatrix of a matrix with missing elements is NP-hard.

• The data is not used symmetrically, and inaccuracies propagate into the computation of additional missing elements.

Page 63: Advanced Computer Vision

Projective Factorization

• W has rank at most 4


Page 64: Advanced Computer Vision

Projective Factorization

• For the p-th point, its projective depths for the i-th and j-th images are related by


Page 65: Advanced Computer Vision

Projective Factorization

1. Normalize the coordinates of image i by applying transformations Ti.
2. Estimate the fundamental matrices and epipoles.
3. Determine the scale factors λip.
4. Build the rescaled matrix W.
5. Compute the SVD of W.
6. From the SVD, recover the projective motion and shape.
7. Adapt the projective motion to account for the normalization transformations Ti of step 1.
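The "build rescaled W, SVD, recover motion and shape" steps can be sketched as a rank-4 factorization. This is a minimal sketch, assuming the projective depths are already known (here taken from synthetic data rather than estimated from fundamental matrices and epipoles) and arrays indexed [image, point]; `projective_factorize` is a hypothetical name. The recovered cameras and points are only defined up to a 4×4 projective transformation.

```python
import numpy as np

def projective_factorize(q, lam):
    """Rank-4 factorization of the rescaled measurement matrix W = [lambda * q].

    q:   (m, n, 3) homogeneous image points (u, v, 1), indexed [image, point]
    lam: (m, n) projective depths, assumed already estimated
    Returns cameras P (m, 3, 4), points X (4, n) and the singular values of W.
    """
    m, n, _ = q.shape
    W = (lam[:, :, None] * q).transpose(0, 2, 1).reshape(3 * m, n)   # 3m x n
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    root = np.sqrt(s[:4])                       # W has rank <= 4 when the depths are right
    P = (U[:, :4] * root).reshape(m, 3, 4)      # projective motion
    X = root[:, None] * Vt[:4]                  # projective shape (homogeneous)
    return P, X, s

# Usage with synthetic cameras and exact depths.
rng = np.random.default_rng(6)
m, n = 4, 12
P_true = np.stack([np.hstack([np.eye(3) + 0.1 * rng.normal(size=(3, 3)),
                              rng.normal(size=(3, 1))]) for _ in range(m)])
X_true = np.vstack([rng.uniform([-1, -1, 4], [1, 1, 8], size=(n, 3)).T, np.ones(n)])
x = np.einsum('jrk,kn->jrn', P_true, X_true)      # (m, 3, n): lambda * q
lam = x[:, 2, :]                                  # depths when q is scaled to (u, v, 1)
q = (x / lam[:, None, :]).transpose(0, 2, 1)      # (m, n, 3)
P, X, s = projective_factorize(q, lam)
```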

Page 66: Advanced Computer Vision

Projective Factorization


Page 67: Advanced Computer Vision

7.4 Bundle Adjustment

• Minimize the squared reprojection errors of the 2D points

• Solve the resulting nonlinear least squares problem with the Levenberg-Marquardt method
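The Levenberg-Marquardt iteration can be illustrated on a deliberately tiny problem: refining one 3D point with the cameras held fixed (a structure-only adjustment). Full bundle adjustment would add the camera parameters to the state and exploit the sparse Jacobian and Hessian structure of Figure 7.14. This is a minimal sketch under those assumptions; the function names, damping schedule, and test values are hypothetical.

```python
import numpy as np

def project(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def residuals_and_jacobian(Ps, obs, X):
    """Stacked reprojection errors r and their Jacobian J with respect to the 3D point X."""
    r, J = [], []
    for P, z in zip(Ps, obs):
        x = P @ np.append(X, 1.0)
        r.extend(x[:2] / x[2] - z)
        for k in (0, 1):   # d(x_k / x_2)/dX = (P_k x_2 - x_k P_2) / x_2^2 (first 3 columns of P)
            J.append((P[k, :3] * x[2] - x[k] * P[2, :3]) / x[2] ** 2)
    return np.array(r), np.array(J)

def refine_point_lm(Ps, obs, X0, iters=50, lam=1e-3):
    """Levenberg-Marquardt on the sum of squared reprojection errors, cameras held fixed."""
    X = np.asarray(X0, float)
    r, J = residuals_and_jacobian(Ps, obs, X)
    cost = r @ r
    for _ in range(iters):
        H = J.T @ J                                   # Gauss-Newton approximation of the Hessian
        step = np.linalg.solve(H + lam * np.diag(np.diag(H)), -J.T @ r)
        r_new, J_new = residuals_and_jacobian(Ps, obs, X + step)
        if r_new @ r_new < cost:                      # accept: trust the quadratic model more
            X, r, J, cost = X + step, r_new, J_new, r_new @ r_new
            lam *= 0.1
        else:                                         # reject: smaller, more gradient-like steps
            lam *= 10.0
    return X

# Usage: three cameras (R = I, t = -c) and noise-free observations of one point.
K = np.array([[800.0, 0, 320], [0, 800, 240], [0, 0, 1]])
centers = [np.zeros(3), np.array([1.0, 0, 0]), np.array([0, 1.0, 0])]
Ps = [K @ np.hstack([np.eye(3), -c[:, None]]) for c in centers]
X_true = np.array([0.3, -0.1, 5.0])
obs = [project(P, X_true) for P in Ps]
X_est = refine_point_lm(Ps, obs, X_true + np.array([0.3, -0.2, 0.8]))
```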

Page 68: Advanced Computer Vision

Bundle Adjustment (cont).


Figure 7.14: (a) Bipartite graph for a toy structure from motion problem and (b) its associated Jacobian J and (c) Hessian A. Numbers indicate cameras. The dashed arcs and light blue squares indicate the fill-in that occurs when the structure (point) variables are eliminated.

Page 69: Advanced Computer Vision

Constrained Structure and Motion

Line-based technique:

• Pairwise epipolar geometry cannot be recovered from line matches alone, even if the cameras are calibrated.

• To see why, consider projecting the set of lines in each image into a set of 3D planes in space. You can move the two cameras around into any configuration and still obtain a valid reconstruction of the 3D lines.

Page 70: Advanced Computer Vision

Constrained Structure and Motion

• When lines are visible in three or more views, the trifocal tensor can be used to transfer lines from one pair of images to another.

• The trifocal tensor can also be computed on the basis of line matches alone.

• For triples of images, the trifocal tensor is used to verify that the lines are in geometric correspondence before evaluating the correlations between line segments.

Page 71: Advanced Computer Vision

Constrained Structure and Motion


Page 72: Advanced Computer Vision

Constrained Structure and Motion

• Camera matrices (3×4) for the three views: P = [I|0], P′ = [A|a4], P″ = [B|b4]

• a4 = e′ and b4 = e″ are the epipoles in views 2 and 3 arising from the first camera center C = (0, 0, 0, 1)^T; thus e′ = P′C and e″ = P″C.

Page 73: Advanced Computer Vision

Constrained Structure and Motion

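• The lines l ↔ l′ ↔ l″ back-project to the planes:
$\pi = P^T l = \begin{pmatrix} l \\ 0 \end{pmatrix}$, $\pi' = P'^T l' = \begin{pmatrix} A^T l' \\ a_4^T l' \end{pmatrix}$, $\pi'' = P''^T l'' = \begin{pmatrix} B^T l'' \\ b_4^T l'' \end{pmatrix}$

• The planes π, π′ and π″ coincide in the line L. This can be expressed algebraically by requiring that the 4×3 matrix M = [π, π′, π″] has rank 2 (its columns are linearly dependent).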

Page 74: Advanced Computer Vision

Constrained Structure and Motion

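• Since the columns of M are linearly dependent, $\pi = \alpha \pi' + \beta \pi''$; the fourth row gives $\alpha = k(b_4^T l'')$ and $\beta = -k(a_4^T l')$. For the top three rows of M, this gives (up to scale)
$l = (b_4^T l'') A^T l' - (a_4^T l') B^T l'' = (l''^T b_4) A^T l' - (l'^T a_4) B^T l''$

• For the i-th element of l we have:
$l_i = l'^T (a_i b_4^T) l'' - l'^T (a_4 b_i^T) l'' = l'^T (a_i b_4^T - a_4 b_i^T) l'' = l'^T T_i l''$, where $a_i$ and $b_i$ are the i-th columns of A and B.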

Page 75: Advanced Computer Vision

Constrained Structure and Motion


• The set of the three matrices T1, T2, T3 constitute the trifocal tensor in matrix notation.
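Line transfer with the trifocal tensor can be sketched directly from the camera matrices $P = [I|0]$, $P' = [A|a_4]$, $P'' = [B|b_4]$ using $T_i = a_i b_4^T - a_4 b_i^T$ and $l_i = l'^T T_i l''$. This is a minimal sketch on random synthetic cameras; the function names are hypothetical, and the transferred line is only defined up to scale, so the check compares directions.

```python
import numpy as np

def trifocal_from_cameras(P2, P3):
    """T_i = a_i b4^T - a4 b_i^T for cameras P = [I|0], P' = [A|a4], P'' = [B|b4]."""
    A, a4 = P2[:, :3], P2[:, 3]
    B, b4 = P3[:, :3], P3[:, 3]
    return np.stack([np.outer(A[:, i], b4) - np.outer(a4, B[:, i]) for i in range(3)])

def transfer_line(T, l2, l3):
    """Line in view 1 from matched lines l' (view 2) and l'' (view 3): l_i = l'^T T_i l''."""
    return np.array([l2 @ T[i] @ l3 for i in range(3)])

# Usage: project one 3D line into three views and transfer it back into view 1.
rng = np.random.default_rng(3)
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2, P3 = rng.normal(size=(3, 4)), rng.normal(size=(3, 4))
X1, X2 = np.append(rng.normal(size=3), 1.0), np.append(rng.normal(size=3), 1.0)
l1, l2, l3 = (np.cross(P @ X1, P @ X2) for P in (P1, P2, P3))   # image line through two projected points
l = transfer_line(trifocal_from_cameras(P2, P3), l2, l3)
```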
