
The Geometry Behind the Numerical Reconstruction of Two Photos

Hellmuth Stachel

http://www.geometrie.tuwien.ac.at/stachel

ICEGD 2007, The 2nd Internat. Conf. on Eng’g Graphics and Design, Galati/Romania, June 7–10, 2007

Table of contents

1. Remarks on linear images

2. Geometry of two images

3. Numerical reconstruction of two images


1. Remarks on linear images

[Figure: a linear image next to a nonlinear (curved) image.]


Central projection

The central projection (according to A. Dürer) can be generalized by a central axonometry.


Central axonometric principle

In space $\mathbb{E}^3$: Cartesian basis $O; E_1, E_2, E_3$ and points at infinity $U_1, U_2, U_3$.

In the image plane $\mathbb{E}^2$: central axonometric reference system $O^c; E_1^c, E_2^c, E_3^c; U_1^c, U_2^c, U_3^c$.


Definition of linear images

There is a unique collinear transformation

$$\kappa\colon \mathbb{E}^3 \to \mathbb{E}^2 \quad\text{with}\quad O \mapsto O^c,\; E_i \mapsto E_i^c,\; U_i \mapsto U_i^c,\; i = 1, 2, 3.$$

Any two-dimensional image of $\mathbb{E}^3$ under a collinear transformation is called linear.

$\Longrightarrow$

• collinear points have collinear or coincident images,

• cross-ratios of any four collinear points are preserved.
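The invariance of cross-ratios can be checked numerically. The following is a minimal sketch, assuming that a linear image is given by some 3×4 matrix acting on homogeneous coordinates $(1 : x_1 : x_2 : x_3)$; the matrix `A`, the sample line, and the helper names are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical 3x4 matrix of a linear image; its rows produce (xi'_0, xi'_1, xi'_2).
A = np.array([[1.0, 0.1, 0.2, 0.3],
              [0.5, 2.0, 0.0, 1.0],
              [-0.3, 0.0, 2.0, 0.5]])

def linear_image(A, p):
    """Map a space point p = (x1, x2, x3) to inhomogeneous image coordinates."""
    xi = A @ np.concatenate(([1.0], p))   # homogeneous image coordinates
    return xi[1:] / xi[0]

def line_parameters(points):
    """Signed positions of collinear points along their common line."""
    points = np.asarray(points)
    u = points[1] - points[0]
    u = u / np.linalg.norm(u)
    return (points - points[0]) @ u

def cross_ratio(a, b, c, d):
    """Cross-ratio of four collinear points given by their line parameters."""
    return ((c - a) * (d - b)) / ((c - b) * (d - a))

# Four collinear space points start + t * direction.
t = np.array([0.0, 1.0, 2.0, 4.0])
start, direction = np.array([1.0, 2.0, 3.0]), np.array([1.0, -1.0, 0.5])
space_points = start + t[:, None] * direction
image_points = np.array([linear_image(A, p) for p in space_points])

cr_space = cross_ratio(*line_parameters(space_points))
cr_image = cross_ratio(*line_parameters(image_points))
print(cr_space, cr_image)   # the two values agree up to rounding
```

The sketch assumes that the chosen line does not pass through the center, so that the four images are distinct.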


Central projection in coordinates

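Notation:

$Z$ . . . center

$H$ . . . principal point

$d$ . . . focal length

$x_1, x_2, x_3$ . . . camera frame

$x'_1, x'_2$ . . . image coordinate frame

[Figure: the space point $X$ and its image $X^c$, with center $Z$, principal point $H$, focal length $d$, image plane $\Pi$ and vanishing plane $\Pi_v$, shown in the camera frame $(x_1, x_2, x_3)$ and the image coordinate frame $(x'_1, x'_2)$.]

$$\begin{pmatrix} x'_1 \\ x'_2 \end{pmatrix} = \frac{d}{x_3} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \quad\text{or homogeneous}\quad \begin{pmatrix} \xi'_0 \\ \xi'_1 \\ \xi'_2 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & d & 0 & 0 \\ 0 & 0 & d & 0 \end{pmatrix} \begin{pmatrix} \xi_0 \\ \vdots \\ \xi_3 \end{pmatrix}.$$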
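As a small illustration of the homogeneous form, the sketch below applies the 3×4 matrix to a point given in the camera frame and compares the result with $\frac{d}{x_3}(x_1, x_2)$. The function names and the sample values are assumptions made for this example only.

```python
import numpy as np

def central_projection_matrix(d):
    """3x4 matrix of the central projection with focal length d."""
    return np.array([[0.0, 0.0, 0.0, 1.0],
                     [0.0, d,   0.0, 0.0],
                     [0.0, 0.0, d,   0.0]])

def project(P, x):
    """Apply P to the camera-frame point x = (x1, x2, x3)."""
    xi = np.concatenate(([1.0], x))       # homogeneous (xi_0 : xi_1 : xi_2 : xi_3)
    xi_img = P @ xi                       # homogeneous (xi'_0 : xi'_1 : xi'_2)
    return xi_img[1:] / xi_img[0]         # inhomogeneous (x'_1, x'_2)

d = 2.0
x = np.array([3.0, -1.5, 6.0])
image = project(central_projection_matrix(d), x)
direct = d / x[2] * x[:2]                 # the inhomogeneous formula
print(image, direct)
```

Points in the vanishing plane ($x_3 = 0$) have $\xi'_0 = 0$ and therefore no finite image; the sketch does not handle that case.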

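Transforming the camera frame $(x_1, x_2, x_3)$ into arbitrary world coordinates and translating the particular image frame $(x'_1, x'_2)$ into an arbitrary image frame gives, in homogeneous form,

$$\begin{pmatrix} \xi'_0 \\ \xi'_1 \\ \xi'_2 \end{pmatrix} = \underbrace{\begin{pmatrix} 1 & 0 & 0 \\ h'_1 & d f_1 & 0 \\ h'_2 & 0 & d f_2 \end{pmatrix} \begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 \\ o_1 & & & \\ \vdots & & R & \\ o_3 & & & \end{pmatrix}}_{\text{matrix } A} \begin{pmatrix} \xi_0 \\ \vdots \\ \xi_3 \end{pmatrix}.$$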



Left hand matrix: $(h'_1, h'_2)$ are the image coordinates of the principal point $H$, $(f_1, f_2)$ are possible scaling factors, and $d$ is the focal length.

These parameters are called the intrinsic calibration parameters.

Right hand matrix: $R$ is an orthogonal matrix.

The position of the camera frame with respect to the world coordinates defines the extrinsic calibration parameters.

Photos with known interior orientation are called calibrated images, others (like central axonometries) are uncalibrated.
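The product of the three matrices can be assembled as follows. This is only a sketch: the function name `camera_matrix`, the argument layout, and the sample parameter values are assumptions, and the vector `o` is read here as the first column of the right hand matrix below its leading 1.

```python
import numpy as np

def camera_matrix(d, h, f, R, o):
    """Assemble the 3x4 matrix from intrinsic (d, h, f) and extrinsic (R, o) parameters."""
    intrinsic = np.array([[1.0,  0.0,      0.0],
                          [h[0], d * f[0], 0.0],
                          [h[1], 0.0,      d * f[1]]])
    projection = np.array([[0.0, 0.0, 0.0, 1.0],
                           [0.0, 1.0, 0.0, 0.0],
                           [0.0, 0.0, 1.0, 0.0]])
    extrinsic = np.eye(4)
    extrinsic[1:, 0] = o                  # column (o1, o2, o3)
    extrinsic[1:, 1:] = R                 # orthogonal 3x3 block
    return intrinsic @ projection @ extrinsic

# Sample values (hypothetical): quarter turn about the x3-axis, shift along x3.
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
A = camera_matrix(d=2.0, h=(0.5, -0.25), f=(1.0, 1.0), R=R, o=(0.0, 0.0, 4.0))

world_point = np.array([1.0, 2.0, 0.0])
xi_img = A @ np.concatenate(([1.0], world_point))
image = xi_img[1:] / xi_img[0]
print(image)
```

With $h' = (0, 0)$, $f_1 = f_2 = 1$, $R$ the identity and $o = 0$, the product reduces to the matrix of the plain central projection.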


Positive and negative central perspective

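[Figure: the image plane $\Pi$, the vanishing plane $\Pi_v$ and the negative plane, each at distance $d$ from the center $Z$; the space point $X$ has an image $X^c$ and a principal point $H$ in both the image plane and the negative plane, each with its own image frame $(x'_1, x'_2)$.]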


Photo versus linear image

Photo (= central perspective) or photo of a photo (= linear image)?

[Figure: a camera with unknown interior calibration parameters; the two bundles of rays of sight are related by a collinear bundle transformation.]

The two bundles of the rays of sight are collinear.



Given: Two linear images or two photographs.

Wanted: Dimensions of the depicted 3D-object.

Historical ‘Stadtbahn’ station Karlsplatz in Vienna (Otto Wagner, 1897)



The geometry of two images is a classical subject of Descriptive Geometry. Its results have become standard (Finsterwalder, Kruppa, Krames, Wunderlich, Hohenberg, Tschupik, Brauner, Havlicek, H.S., . . . ).

Why now? Advantages of digital images:

• less distortion, because no paper prints are needed,

• exact boundary is available, and

• precise coordinate measurements are possible using standard software.


Geometry of two images (epipolar geometry)

[Figure: the viewing situation with the centers $Z_1$, $Z_2$, the baseline $z$, a space point $X$ and its epipolar plane $\delta_X$; collinear transformations map it onto the two images $\pi'$, $\pi''$ with the views $X'$, $X''$, the epipolar lines $l'$, $l''$ and the epipoles $Z'_2$, $Z''_1$.]

Notations:

line $z = Z_1 Z_2$ . . . baseline,

$Z'_2, Z''_1$ . . . epipoles (German: Kernpunkte),

$\delta_X$ . . . epipolar plane (it is twice projecting),

$l', l''$ . . . pair of epipolar lines (German: Kernstrahlen),

$(X', X'')$ . . . corresponding views.


Epipolar constraint

Theorem (synthetic version): For any two linear images of a scene, there is a projectivity between two line pencils

$$Z'_2(\delta'_X) \;\barwedge\; Z''_1(\delta''_X)$$

such that the points $X'$, $X''$ are corresponding $\iff$ they are located on corresponding (= epipolar) lines.

Theorem (analytic version): Using homogeneous coordinates for both images, there is a bilinear form $\beta$ of rank 2 such that two points $X' = \mathbf{x}'\mathbb{R} = (\xi'_0 : \xi'_1 : \xi'_2)$ and $X'' = \mathbf{x}''\mathbb{R} = (\xi''_0 : \xi''_1 : \xi''_2)$ are corresponding

$$\iff\quad \beta(\mathbf{x}', \mathbf{x}'') = \sum_{i,j=0}^{2} b_{ij}\, \xi'_i\, \xi''_j = (\xi'_0\ \xi'_1\ \xi'_2) \cdot (b_{ij}) \cdot \begin{pmatrix} \xi''_0 \\ \xi''_1 \\ \xi''_2 \end{pmatrix} = \mathbf{x}'^T \cdot B \cdot \mathbf{x}'' = 0.$$


Epipolar constraint in the calibrated case

Theorem: In the calibrated case the essential matrix $B = (b_{ij})$ is the product of a skew-symmetric matrix and an orthogonal one, i.e.,

$$B = S \cdot R.$$



[Figure: the centers $Z_1$, $Z_2$ joined by the vector $\mathbf{z}'$, the space point $X$ with its epipolar plane $\delta_X$, the image points $X_1$, $X_2$, and the ray vectors $\mathbf{x}'$, $\mathbf{x}''$.]

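Proof: We use both camera frames and the homogeneous coordinates

$$\mathbf{x}' = \overrightarrow{Z_1 X'}, \qquad \mathbf{x}'' = \overrightarrow{Z_2 X''}.$$

For transforming the coordinates from the second camera frame into the first one, there is an orthogonal matrix $R$ such that

$$\mathbf{x}''_1 = \mathbf{z}' + R \cdot \mathbf{x}'' \quad\text{with}\quad R^T = R^{-1} \quad\text{and}\quad \mathbf{z}' = (z'_1, z'_2, z'_3)^T = \overrightarrow{Z_1 Z_2}.$$

The points $X_1$, $X_2$, $Z_1$, $Z_2$ are coplanar $\iff$ the triple product of the vectors $\mathbf{x}'$, $\mathbf{z}'$ and $\mathbf{x}''_1 = \overrightarrow{Z_1 X_2}$ vanishes, i.e.,

$$\det(\mathbf{x}', \mathbf{z}', \mathbf{x}''_1) = \mathbf{x}' \cdot (\mathbf{z}' \times \mathbf{x}''_1) = 0.$$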







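We replace the vector product $(\mathbf{z}' \times \mathbf{x}''_1)$ by

$$\mathbf{z}' \times (\mathbf{z}' + R \cdot \mathbf{x}'') = \mathbf{z}' \times R \cdot \mathbf{x}'' = S \cdot R \cdot \mathbf{x}'' \quad\text{with}\quad S = \begin{pmatrix} 0 & -z'_3 & z'_2 \\ z'_3 & 0 & -z'_1 \\ -z'_2 & z'_1 & 0 \end{pmatrix}.$$

Matrix $S$ is skew-symmetric and $R$ is orthogonal.

Hence, the coplanarity of $\mathbf{x}'$, $\mathbf{x}''$ and $\mathbf{z}'$ is equivalent to

$$0 = \mathbf{x}' \cdot (\mathbf{z}' \times \mathbf{x}''_1) = \mathbf{x}'^T \cdot \underbrace{S \cdot R}_{B} \cdot \mathbf{x}'', \quad\text{hence}\quad B = S \cdot R.$$

The decomposition of the essential matrix $B$ into these two factors defines the relative position of the second camera frame with respect to the first one!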
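The statement of the proof can be verified on synthetic data. The sketch below assumes a hypothetical baseline vector `z` and rotation `R`, builds $B = S \cdot R$, and evaluates $\mathbf{x}'^T B\, \mathbf{x}''$ for a few space points whose ray vectors are expressed in the two camera frames. All names and numbers are illustrative assumptions.

```python
import numpy as np

def skew(z):
    """Skew-symmetric matrix S with S @ x == np.cross(z, x)."""
    return np.array([[0.0,  -z[2],  z[1]],
                     [z[2],  0.0,  -z[0]],
                     [-z[1], z[0],  0.0]])

def rotation_about_x2(angle):
    """Rotation about the second coordinate axis (an orthogonal matrix)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c,   0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s,  0.0, c]])

z = np.array([1.0, 0.2, -0.1])            # vector Z1 -> Z2 in the first frame
R = rotation_about_x2(0.3)                # second frame -> first frame
B = skew(z) @ R

# Space points given in the first camera frame, one per row.
X = np.array([[0.5, -0.3, 4.0],
              [-1.0, 0.8, 5.0],
              [2.0,  1.0, 6.5]])
x1 = X                                    # x'  : ray vectors from Z1, first frame
x2 = (X - z) @ R                          # x'' : rows are R^T (X - z), second frame

residuals = np.einsum('ni,ij,nj->n', x1, B, x2)
mismatch = x1[0] @ B @ x2[1]              # views of two different points
print(residuals, mismatch)
```

The residuals of corresponding views vanish up to rounding, whereas the pair taken from two different space points gives a clearly nonzero value.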


Singular value decomposition (SVD)

[Figure: the affine map $\alpha$ with matrix $A = U \cdot D \cdot V^T$ sends $a_0, a_1, a_2$ and a point $x$ to $\alpha(a_0), \alpha(a_1), \alpha(a_2), \alpha(x)$; it is composed of the rotation $V^T$, the scaling $D$, and the rotation $U$.]

Theorem: [Singular value decomposition]

Any matrix $A \in M(m, n; \mathbb{R})$ can be decomposed into a product

$$A = U \cdot D \cdot V^T \quad\text{with orthogonal } U, V \text{ and } D = \mathrm{diag}(\sigma_1, \dots, \sigma_p)$$

with $D \in M(m, n; \mathbb{R})$, $\sigma_i \ge 0$, and $p = \min\{m, n\}$.

The positive entries in the main diagonal of $D$ are called singular values of $A$.

The singular values of $A$ can be seen as principal distortion factors of the affine transformation represented by $A$, i.e., the semiaxes of the affine image of the unit sphere.

E.g., the singular values of an orthogonal projection are $(0, 1, 1)$ as the unit sphere is mapped onto a unit disk.
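A short numerical illustration, using NumPy's `numpy.linalg.svd` (which returns the singular values in descending order): the orthogonal projection onto a plane has singular values 1, 1, 0, and a general matrix is recovered from its three factors. The sample normal vector and the sample matrix are assumptions for this sketch.

```python
import numpy as np

# Orthogonal projection onto the plane with unit normal n.
n = np.array([1.0, 2.0, 2.0]) / 3.0
P = np.eye(3) - np.outer(n, n)
s_projection = np.linalg.svd(P, compute_uv=False)

# Decomposition A = U D V^T of a sample 2x2 matrix.
A = np.array([[3.0, 0.0],
              [4.0, 5.0]])
U, s, Vt = np.linalg.svd(A)
reconstructed = U @ np.diag(s) @ Vt

print(s_projection)       # semiaxes of the image of the unit sphere
print(s, reconstructed)
```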


Singular values of the essential matrix

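Theorem: The essential matrix $B$ has two equal singular values $\sigma := \sigma_1 = \sigma_2$.

Proof: We have $B = S \cdot R$ with orthogonal $R$. The vector

$$S \cdot \mathbf{x} = \mathbf{z}' \times \mathbf{x}$$

is orthogonal to the orthogonal view $\mathbf{x}^n$ of $\mathbf{x}$ in the plane $\Pi \perp \mathbf{z}'$, where

$$\|\mathbf{z}' \times \mathbf{x}\| = |\sin\varphi|\, \|\mathbf{x}\|\, \|\mathbf{z}'\| = \|\mathbf{x}^n\|\, \|\mathbf{z}'\| = \sigma\, \|\mathbf{x}^n\|.$$

Hence $S$ acts as the orthogonal projection onto $\Pi$ followed by a quarter turn in $\Pi$ and a scaling by $\sigma = \|\mathbf{z}'\|$, so the singular values of $S$, and of $B = S \cdot R$, are $(\sigma, \sigma, 0)$.

[Figure: the vector $\mathbf{x}$, its orthogonal view $\mathbf{x}^n$ in the plane $\Pi \perp \mathbf{z}'$, the angle $\varphi$ between $\mathbf{x}$ and $\mathbf{z}'$, and the product $\mathbf{z}' \times \mathbf{x}$.]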
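The theorem above can be confirmed numerically. In this sketch the baseline vector and the rotation are hypothetical sample values; the singular values of $B = S \cdot R$ come out as $(\|\mathbf{z}'\|, \|\mathbf{z}'\|, 0)$.

```python
import numpy as np

def skew(z):
    """Skew-symmetric matrix S with S @ x == np.cross(z, x)."""
    return np.array([[0.0,  -z[2],  z[1]],
                     [z[2],  0.0,  -z[0]],
                     [-z[1], z[0],  0.0]])

z = np.array([1.0, 2.0, 2.0])             # sample baseline vector, norm 3
angle = 0.7
R = np.array([[np.cos(angle), -np.sin(angle), 0.0],
              [np.sin(angle),  np.cos(angle), 0.0],
              [0.0,            0.0,           1.0]])

B = skew(z) @ R
singular_values = np.linalg.svd(B, compute_uv=False)
print(singular_values)    # two equal values and a zero, up to rounding
```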


What does ‘reconstruction’ mean?

Given: Two images, either calibrated or uncalibrated.

[Figure: the two images $\pi'$, $\pi''$ with corresponding points $(X'_1, X''_1)$ and $(X'_2, X''_2)$, and the baseline $z$.]

Wanted: The ‘viewing situation’, i.e., determine

• the relative position of the two camera frames, and

• the location of any space point $X$ from its images $(X', X'')$.





The fundamental theorems

Theorem 1: From two uncalibrated images with given projectivity between epipolar lines the depicted object can be reconstructed up to a collinear transformation.

Theorem 2 (S. Finsterwalder, 1899): From two calibrated images with given projectivity between epipolar lines the depicted object can be reconstructed up to a similarity.


Determination of epipoles — geometric meaning

Problem of Projectivity:

Given: 7 pairs of corresponding points $(X'_1, X''_1), \dots, (X'_7, X''_7)$.

Wanted: A pair of points $(S', S'')$ (= epipoles) such that there is a projectivity

$$S'([S'X'_1], \dots, [S'X'_7]) \;\barwedge\; S''([S''X''_1], \dots, [S''X''_7]).$$

[Figure: the points $X'_1, \dots, X'_7$ and the epipole $S'$ in the image $\pi'$, and the points $X''_1, \dots, X''_7$ and the epipole $S''$ in the image $\pi''$.]




Determination of epipoles — analytic solution

Theorem: If 7 pairs of corresponding points $(X'_1, X''_1), \dots, (X'_7, X''_7)$ are given, the determination of the epipoles is a cubic problem.

Proof: 7 pairs of corresponding points give 7 linear homogeneous equations

$$\beta(\mathbf{x}'_i, \mathbf{x}''_i) = \mathbf{x}'^T_i \cdot B \cdot \mathbf{x}''_i = 0, \quad i = 1, \dots, 7,$$

for the 9 entries in the $(3 \times 3)$-matrix $B = (b_{ij})$, called the essential matrix.

$\det(b_{ij}) = 0$ gives an additional cubic equation which fixes all $b_{ij}$ up to a common factor.

For noisy image points it is recommended to use more than 7 points and methods of least-squares approximation for obtaining the ‘best fitting matrix’ $B$:

1) Let $A$ denote the coefficient matrix in the linear system for the entries of $B$. Then the ‘least-squares fit’ for this overdetermined system is an eigenvector for the smallest eigenvalue of the symmetric matrix $A^T \cdot A$.

2) As an essential matrix needs to have rank 2, we use the ‘projection into the essential space’. This means, the singular value decomposition of $B$ gives a representation

$$B = U \cdot \mathrm{diag}(\sigma_1, \sigma_2, \sigma_3) \cdot V^T \quad\text{with orthogonal } U, V \text{ and } \sigma_1 \ge \sigma_2 \ge \sigma_3.$$

Then in the uncalibrated case $B = U \cdot \mathrm{diag}(\sigma_1, \sigma_2, 0) \cdot V^T$ is optimal (with respect to the Frobenius norm) and in the calibrated case

$$B = U \cdot \mathrm{diag}(\sigma, \sigma, 0) \cdot V^T \quad\text{with}\quad \sigma = (\sigma_1 + \sigma_2)/2.$$
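The two steps above can be sketched as follows. The function names `estimate_B` and `project_to_essential_space`, the synthetic camera pose, and the random sample points are assumptions for illustration; the sketch uses at least 8 point pairs, so that the linear system alone determines $B$ up to a factor, and it omits any normalization of the image coordinates.

```python
import numpy as np

def estimate_B(x1, x2):
    """Least-squares fit of B from n >= 8 pairs of homogeneous image vectors.

    x1, x2 are (n, 3) arrays; row i of the coefficient matrix holds the
    products xi'_i * xi''_j, matching the row-major entries b_ij of B.
    """
    A = np.einsum('ni,nj->nij', x1, x2).reshape(len(x1), 9)
    eigenvalues, eigenvectors = np.linalg.eigh(A.T @ A)   # ascending order
    return eigenvectors[:, 0].reshape(3, 3)               # smallest eigenvalue

def project_to_essential_space(B, calibrated):
    """Closest rank-2 matrix; in the calibrated case with two equal singular values."""
    U, s, Vt = np.linalg.svd(B)
    if calibrated:
        sigma = 0.5 * (s[0] + s[1])
        d = np.array([sigma, sigma, 0.0])
    else:
        d = np.array([s[0], s[1], 0.0])
    return U @ np.diag(d) @ Vt

# Synthetic, noise-free test data (hypothetical pose and points).
z = np.array([1.0, 0.2, -0.1])
c, s = np.cos(0.3), np.sin(0.3)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
S = np.array([[0.0, -z[2], z[1]], [z[2], 0.0, -z[0]], [-z[1], z[0], 0.0]])
B_true = S @ R

rng = np.random.default_rng(0)
X = rng.uniform([-3.0, -3.0, 2.0], [3.0, 3.0, 6.0], size=(12, 3))
x1 = X / np.linalg.norm(X, axis=1, keepdims=True)         # rays in frame 1
x2 = (X - z) @ R                                          # rays in frame 2
x2 = x2 / np.linalg.norm(x2, axis=1, keepdims=True)

B_est = project_to_essential_space(estimate_B(x1, x2), calibrated=True)
B_est /= np.linalg.norm(B_est)
B_ref = B_true / np.linalg.norm(B_true)
error = min(np.linalg.norm(B_est - B_ref), np.linalg.norm(B_est + B_ref))
print(error)
```

Since $B$ is only determined up to a common factor, the comparison is made after normalizing both matrices and allowing for a change of sign.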





3. Numerical reconstruction of two images

Step 1: Specify at least 7 reference points

[Figure: the two photos, each with the reference points numbered 1 to 20.]

. . . manually, or automatically by methods of pattern recognition


Step 2: Compute the essential matrix $B$, including the pairs of epipolar lines


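Step 3: Factorize $B = S \cdot R$

Theorem: There are exactly two ways of decomposing $B = U \cdot D \cdot V^T$ with $D = \mathrm{diag}(\sigma, \sigma, 0)$ into a product $S \cdot R$ with skew-symmetric $S$ and orthogonal $R$:

$$S = \pm\, U \cdot R_+ \cdot D \cdot U^T \quad\text{and}\quad R = \pm\, U \cdot R_+^T \cdot V^T \quad\text{with}\quad R_+ = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$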
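A minimal sketch of this factorization, taking the upper signs in both formulas; the function name `factorize` and the sample pose are assumptions. The sketch only checks that the factors have the stated properties and does not decide which of the possible solutions corresponds to the actual viewing situation.

```python
import numpy as np

R_PLUS = np.array([[0.0, -1.0, 0.0],
                   [1.0,  0.0, 0.0],
                   [0.0,  0.0, 1.0]])

def factorize(B):
    """Split B = U diag(sigma, sigma, 0) V^T into skew-symmetric S and orthogonal R."""
    U, s, Vt = np.linalg.svd(B)
    sigma = 0.5 * (s[0] + s[1])            # the two leading singular values agree
    D = np.diag([sigma, sigma, 0.0])
    S = U @ R_PLUS @ D @ U.T
    R = U @ R_PLUS.T @ Vt
    return S, R

# Sample essential matrix built from a hypothetical baseline and rotation.
z = np.array([1.0, 0.2, -0.1])
c, s = np.cos(0.3), np.sin(0.3)
R_true = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
S_true = np.array([[0.0, -z[2], z[1]], [z[2], 0.0, -z[0]], [-z[1], z[0], 0.0]])
B = S_true @ R_true

S, R = factorize(B)
print(np.allclose(S @ R, B), np.allclose(S, -S.T), np.allclose(R @ R.T, np.eye(3)))
```

The computed factors need not coincide with `S_true` and `R_true`; the orthogonal factor may also have determinant $-1$, in which case the pair with the opposite signs is the one to consider.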


Step 4: Intersecting corresponding rays

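In one of the frames compute the approximate point of intersection between corresponding rays.

[Figure: the rays through $\mathbf{x}'$ (photo 1, center $\mathbf{z}_1$) and $\mathbf{x}''$ (photo 2, center $\mathbf{z}_2$), their common perpendicular segment $s$, and the reconstructed point $X$.]

For the center of the common perpendicular line segment the sum of squared distances is minimal.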
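One way to compute this center is a small least-squares problem for the two ray parameters. In the sketch below both rays are assumed to be given in the same frame by a center and a direction vector (in the first camera frame, the second ray has center $\mathbf{z}'$ and direction $R \cdot \mathbf{x}''$); the function name and the sample rays are hypothetical.

```python
import numpy as np

def midpoint_of_common_perpendicular(z1, x1, z2, x2):
    """Approximate intersection of the rays z1 + s*x1 and z2 + t*x2.

    Returns the center of the common perpendicular segment and its length.
    The rays are assumed to be non-parallel.
    """
    M = np.column_stack((x1, -x2))
    (s, t), *_ = np.linalg.lstsq(M, z2 - z1, rcond=None)   # minimizes |p1 - p2|
    p1 = z1 + s * x1
    p2 = z2 + t * x2
    return 0.5 * (p1 + p2), np.linalg.norm(p1 - p2)

# Two skew sample rays: the x1-axis and a line parallel to the x2-axis at height 2.
center, gap = midpoint_of_common_perpendicular(
    np.array([0.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0]),
    np.array([3.0, -1.0, 2.0]), np.array([0.0, 1.0, 0.0]))
print(center, gap)
```

The returned length of the segment can serve as a rough indicator of how well the two rays fit together.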


Summary of algorithm

1) Specify $n > 7$ pairs $(X'_i, X''_i)$, $i = 1, \dots, n$.

2) Set up the linear system of equations for the essential matrix $B$ and seek the best fitting matrix (eigenvector of the smallest eigenvalue).

3) Compute the closest rank 2 matrix $B$ with two equal singular values.

4) Factorize $B = S \cdot R$; this reveals the relative position of the two camera frames.

5) In one of the frames compute the approximate point of intersection between corresponding rays.

6) Transform the recovered coordinates into world coordinates.


Remaining problems

• Analysis of precision,

• automated calibration (autofocus and zooming change the focal distance d),

• critical configurations.


The solution

[Figure: the original image with the reference points numbered 1 to 20, and the reconstruction (M ∼ 1 : 100) in front view and top view, showing the position of the centers $Z_1$ (Photo 1) and $Z_2$ (Photo 2) relative to the depicted object.]


References

• H. Brauner: Lineare Abbildungen aus euklidischen Räumen. Beitr. Algebra Geom. 21, 5–26 (1986).

• O. Faugeras: Three-Dimensional Computer Vision. A Geometric Viewpoint. MIT Press, Cambridge, Mass., 1993.

• O. Faugeras, Q.-T. Luong: The Geometry of Multiple Images. MIT Press, Cambridge, Mass., 2001.

• R. Hartley, A. Zisserman: Multiple View Geometry in Computer Vision. Cambridge University Press 2000.

• H. Havlicek: On the Matrices of Central Linear Mappings. Math. Bohem. 121, 151–156 (1996).

• E. Kruppa: Zur achsonometrischen Methode der darstellenden Geometrie. Sitzungsber., Abt. II, österr. Akad. Wiss., Math.-Naturw. Kl. 119, 487–506 (1910).

• Yi Ma, St. Soatto, J. Kosecka, S. Sh. Sastry: An Invitation to 3-D Vision. Springer-Verlag, New York 2004.

• H. Stachel: Zur Kennzeichnung der Zentralprojektionen nach H. Havlicek. Sitzungsber., Abt. II, österr. Akad. Wiss., Math.-Naturw. Kl. 204, 33–46 (1995).

• H. Stachel: Descriptive Geometry Meets Computer Vision – The Geometry of Two Images. J. Geometry Graphics 10, 137–153 (2006).

• J. Szabo, H. Stachel, H. Vogel: Ein Satz über die Zentralaxonometrie. Sitzungsber., Abt. II, österr. Akad. Wiss., Math.-Naturw. Kl. 203, 3–11 (1994).

• J. Tschupik, F. Hohenberg: Die geometrischen Grundlagen der Photogrammetrie. In Jordan, Eggert, Kneissl (eds.): Handbuch der Vermessungskunde III a/3. 10. Aufl., Metzlersche Verlagsbuchhandlung, Stuttgart 1972, 2235–2295.