3D Vision – Real Objects, Real Cameras · 2017. 2. 21. · Without a 3-D model of the world,...

Post on 04-Sep-2020


3D Vision – Real Objects, Real Cameras Chapter 11 (parts of) , 12 (parts of) Computerized Image Analysis 2 Anders Brun, anders@cb.uu.se

3D Vision

!  Philosophy !  Image formation

"  The pinhole camera "  Projective geometry "  Artefacts and challenges

!  Camera calibration !  Stereo vision !  Structured Light

Philosophy: Why 3-D?

!  Why do we model things in 3-D?
!  Without a 3-D model of the world, events are more difficult to predict! Movement, grasping, collision estimation, real size estimation, …

!  Example: 2-D: A car on the highway looks bigger and drives faster when it approaches

!  3-D: A car on the highway has constant size and speed when it approaches

x z

x y

Philosophy: 3-D cues …

Photo: Greg Keene

• Shape from: • Focus • Lighting • Stereo • Structured light • …


Philosophy: Marr and 2.5-D

!  Primal sketch: Edges and areas !  2.5-D sketch: Texture and depth !  3-D model: A hierarchical 3-D model of the world

Teddy dataset, from http://cat.middlebury.edu/

Philosophies

!  Build accurate 3-D world representation
1.  Build a complete 3-D model of the scene
2.  Plan the task using the 3-D model
3.  Example: Build a model of the scene, then find the teddy bear and send a robot arm to grab it.

!  Plan as you go, act and react
1.  Collect features from the scene
2.  Use the features to guide your actions
3.  Example: Find the teddy bear using template matching in the image, then send the robot hand in that direction. Possibly take more images when halfway.

Passive, Active and Dynamic Vision

!  Passive vision: "  The camera has a fixed location

!  Dynamic vision: "  The camera is moving but cannot be steered

!  Active vision: "  The camera can be steered

The pinhole camera

!  The Pinhole camera is an idealized model

!  A real aperture is not a point.
!  A real aperture has a non-vanishing area and typically also a lens…

The pinhole camera model

!  Where is the point P projected on the image plane inside the camera?

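[Figure: pinhole camera seen from the side. The point P = (X, Y, Z) is projected through the focal point or origin (the “pinhole”) onto the image plane at distance f behind it, at image coordinate x.]

\[ x = -f\,\frac{X}{Z} \]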

The pinhole camera model (alternative)

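!  Imagine an observer is located at the focal point
!  A screen is placed at distance f from the observer.
!  Where on this screen is P projected?

[Figure: the observer sits at the focal point and the screen is placed in front of it at distance f (the focal length). The point P = (X, Y, Z) is projected onto the screen at coordinate x.]

\[ x = +f\,\frac{X}{Z}, \qquad y = +f\,\frac{Y}{Z} \]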

The pinhole camera model

!  In the pinhole camera, the world appears to be upside down (or 180° rotated).

!  The alternative interpretation is useful in computer graphics. It tells you exactly where to draw P on a screen, in front of the observer, in order to make it appear real for the observer. (Note the change of sign.)

!  The alternative interpretation leads directly to “projective geometry”.
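The two sign conventions can be compared with a minimal sketch. The function name `project_pinhole` and the flag `screen_in_front` are made up for this illustration; the formulas are the ones given on the slides.

```python
def project_pinhole(point, f, screen_in_front=True):
    """Project a 3-D point (X, Y, Z) with an ideal pinhole camera.

    screen_in_front=True  -> alternative model, x = +f X / Z, y = +f Y / Z
    screen_in_front=False -> image plane behind the pinhole, x = -f X / Z
    (both names are illustrative, not taken from the slides)
    """
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("the point must lie in front of the camera (Z > 0)")
    sign = 1.0 if screen_in_front else -1.0
    return (sign * f * X / Z, sign * f * Y / Z)


P = (2.0, 1.0, 4.0)
print(project_pinhole(P, f=2.0))                         # (1.0, 0.5)
print(project_pinhole(P, f=2.0, screen_in_front=False))  # (-1.0, -0.5), rotated 180 degrees
```

Doubling Z halves both image coordinates, which is the "car looks bigger when it approaches" effect from the introduction.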

Projective Geometry (Very Briefly)

!  Points in 2-D are represented by lines in 3-D
!  The 3-D space is called the embedding space
!  All points along a line are equivalent
!  This is analogous to a photograph: every point (position) in a photograph (2-D) corresponds to a line or ray in reality (3-D)

[Figure: the equivalence class of x is the line through the origin containing x and α x.]

Projective Geometry (Very Briefly)

!  We can convert points in the ordinary plane to the projective plane

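!  2-D (x,y) → 3-D (x,y,1)
!  In general: D-dimensional → (D+1)-dimensional
!  Points x and α x are equivalent, α ≠ 0

[Figure: the ordinary plane is embedded at height 1 in the embedding space; the equivalence class of x is the line through x and α x.]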

Projective Geometry (Very Briefly)

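[Figure: a (linear) transformation H maps x to x' in the embedding space; the ordinary plane sits at height 1.]

\[
\alpha \begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix}
=
\begin{pmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{pmatrix}
\begin{pmatrix} x \\ y \\ 1 \end{pmatrix}
\]

\[
x' = \frac{h_{11}x + h_{12}y + h_{13}}{h_{31}x + h_{32}y + h_{33}}, \qquad
y' = \frac{h_{21}x + h_{22}y + h_{23}}{h_{31}x + h_{32}y + h_{33}}
\]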
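As a small illustration of the equations above, the following sketch maps 2-D points through a homography using NumPy. The helper name `apply_homography` and the example matrix are assumptions chosen for the demonstration.

```python
import numpy as np


def apply_homography(H, points):
    """Map an (N, 2) array of points (x, y) through a 3x3 homography H."""
    pts = np.asarray(points, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])   # (x, y) -> (x, y, 1)
    mapped = homog @ np.asarray(H, dtype=float).T      # each row is alpha * (x', y', 1)
    return mapped[:, :2] / mapped[:, 2:3]              # divide out alpha


# Illustrative homography with a non-trivial last row (h31 = 1).
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]])

pts = [[1.0, 2.0], [3.0, 3.0]]
print(apply_homography(H, pts))
print(apply_homography(5.0 * H, pts))   # H and 5H are equivalent, so the result is the same
```

The final division is what turns the linear map in the embedding space into the two fractions for x' and y'.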

Projective Geometry (Very Briefly)

!  Homography, a map from (D+1)-dim to (D+1)-dim !  Linear in the (D+1)-dim embedding space !  x’ = H x !  Represents a perspective transformation in D-dim space !  This is very nice!

[Figure: a (linear) transformation H maps x to x’ in the embedding space.]

Projective Geometry (Very Briefly)

!  Using homographies, we can express a rich class of transformations using linear mappings

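Identity: \( H = I \)
Similarity: \( H = \begin{pmatrix} sR & -Rt \\ \mathbf{0}^{T} & 1 \end{pmatrix} \)
Isometric: \( H = \begin{pmatrix} R & -Rt \\ \mathbf{0}^{T} & 1 \end{pmatrix} \)
Affine: \( H = \begin{pmatrix} A & t \\ \mathbf{0}^{T} & 1 \end{pmatrix} \)
Perspective: any \( H \) with \( \det(H) \neq 0 \)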

Perspective Transformations

!  Remember this example? We wanted to compute the perspective transformation parameters.

From Feature based methods for structure and motion estimation by P. H. S. Torr and A. Zisserman

Perspective Transformations

\[
x' = \frac{h_{11}x + h_{12}y + h_{13}}{h_{31}x + h_{32}y + h_{33}}, \qquad
y' = \frac{h_{21}x + h_{22}y + h_{23}}{h_{31}x + h_{32}y + h_{33}}
\]

!  Estimating H from point correspondences (simplified version, check the book for a more advanced version)

!  Each point correspondence translates to 2 linear equations (in the coefficients of H)

!  Assuming h33 =1, we need 4 corresponding 2-D point pairs (x,y,x’,y’) to solve this equation system (8 unknowns).

!  This way of solving for the parameters has severe practical disadvantages, but it shows that it is at least possible...

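\[
\begin{aligned}
h_{31}xx' + h_{32}yx' + h_{33}x' - h_{11}x - h_{12}y - h_{13} &= 0 \\
h_{31}xy' + h_{32}yy' + h_{33}y' - h_{21}x - h_{22}y - h_{23} &= 0
\end{aligned}
\]

\[
\begin{pmatrix} P_x(x,y,x',y') \\ P_y(x,y,x',y') \end{pmatrix} h
=
\begin{pmatrix} 0 \\ 0 \end{pmatrix}
\]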
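The simplified scheme can be sketched as follows: with h33 fixed to 1, each correspondence contributes two rows to an 8×8 linear system in the remaining coefficients. The function name `homography_from_4_points` and the test data are assumptions for this illustration, and the sketch inherits the practical weaknesses mentioned on the slide (for instance, it fails when the true h33 is zero).

```python
import numpy as np


def homography_from_4_points(src, dst):
    """Solve for H (with h33 = 1) from exactly four pairs (x, y) -> (x', y')."""
    A, b = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        # h11 x + h12 y + h13 - h31 x x' - h32 y x' = x'
        A.append([x, y, 1.0, 0.0, 0.0, 0.0, -x * xp, -y * xp])
        b.append(xp)
        # h21 x + h22 y + h23 - h31 x y' - h32 y y' = y'
        A.append([0.0, 0.0, 0.0, x, y, 1.0, -x * yp, -y * yp])
        b.append(yp)
    h = np.linalg.solve(np.array(A), np.array(b))   # the 8 unknown coefficients
    return np.append(h, 1.0).reshape(3, 3)          # append h33 = 1


# Synthetic check: generate correspondences from a known H and recover it.
H_true = np.array([[1.0, 0.2, 3.0],
                   [0.1, 1.5, -2.0],
                   [0.001, 0.002, 1.0]])
src = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
dst = []
for x, y in src:
    u = H_true @ np.array([x, y, 1.0])
    dst.append((u[0] / u[2], u[1] / u[2]))

H_est = homography_from_4_points(src, dst)
print(np.round(H_est, 6))
```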

Perspective Transformations

!  A cleaner and more stable solution !  Multiply both sides with the “cross product matrix”

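\[
\alpha
\begin{pmatrix} 0 & -1 & y' \\ 1 & 0 & -x' \\ -y' & x' & 0 \end{pmatrix}
\begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix}
=
\begin{pmatrix} 0 & -1 & y' \\ 1 & 0 & -x' \\ -y' & x' & 0 \end{pmatrix}
\begin{pmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{pmatrix}
\begin{pmatrix} x \\ y \\ 1 \end{pmatrix}
\]

\[
0 =
\begin{pmatrix} 0 & -1 & y' \\ 1 & 0 & -x' \\ -y' & x' & 0 \end{pmatrix}
\begin{pmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{pmatrix}
\begin{pmatrix} x \\ y \\ 1 \end{pmatrix}
\]

\[
0 = Q(x,y,x',y')\,h
\]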

“Now three equations killing two unknowns”
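One common way to use the homogeneous system 0 = Q h is to stack the Q blocks of all correspondences and take the right singular vector with the smallest singular value. The sketch below does this with NumPy; the function names are illustrative, coordinate normalization is omitted for brevity, and the final rescaling assumes h33 is not zero.

```python
import numpy as np


def cross_matrix(xp, yp):
    """Cross product matrix of the homogeneous point (x', y', 1)."""
    return np.array([[0.0, -1.0, yp],
                     [1.0, 0.0, -xp],
                     [-yp, xp, 0.0]])


def estimate_homography(src, dst):
    """Estimate H (up to scale) from four or more pairs (x, y) -> (x', y')."""
    blocks = []
    for (x, y), (xp, yp) in zip(src, dst):
        # Q is 3x9 and acts on h = (h11, h12, ..., h33), i.e. H stored row by row.
        blocks.append(np.kron(cross_matrix(xp, yp), np.array([[x, y, 1.0]])))
    Q = np.vstack(blocks)
    _, _, Vt = np.linalg.svd(Q)
    H = Vt[-1].reshape(3, 3)      # singular vector of the smallest singular value
    return H / H[2, 2]            # fix the free scale (assumes h33 != 0)


# Synthetic check with five correspondences generated from a known H.
H_true = np.array([[1.0, 0.2, 3.0],
                   [0.1, 1.5, -2.0],
                   [0.001, 0.002, 1.0]])
src = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0), (4.0, 7.0)]
dst = []
for x, y in src:
    u = H_true @ np.array([x, y, 1.0])
    dst.append((u[0] / u[2], u[1] / u[2]))

H_est = estimate_homography(src, dst)
print(np.round(H_est, 6))
```

Unlike the h33 = 1 version, this formulation treats all nine coefficients symmetrically and extends directly to more than four correspondences, where the singular vector gives a least-squares solution.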

Single perspective camera

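[Figure: a single perspective camera with projection centre C, image coordinate origin O_i and focal length f; the scene point X is projected to the image point u.]

\[
\alpha u =
\begin{pmatrix} f & s & -w_0 \\ 0 & g & -v_0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}
\begin{pmatrix} R & -Rt \\ \mathbf{0}^{T} & 1 \end{pmatrix}
X
\]

\[
\alpha u = M X
\]

M: Projection matrix

Internal parameters: the first matrix (f, g, s, w0, v0)

External parameters: the last matrix (R, t)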
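The factorization of M can be written out directly. In this sketch the function name `projection_matrix` is an assumption, and the parameters are read in the usual way (f and g as focal lengths along the two image axes, s as skew, (w0, v0) as principal point offset, R and t as camera rotation and position); the slides themselves only give the symbols.

```python
import numpy as np


def projection_matrix(f, g, s, w0, v0, R, t):
    """Build M = K [I | 0] [[R, -R t], [0^T, 1]] from the slide's symbols."""
    K = np.array([[f, s, -w0],
                  [0.0, g, -v0],
                  [0.0, 0.0, 1.0]])                  # internal parameters
    P0 = np.hstack([np.eye(3), np.zeros((3, 1))])    # drops the fourth coordinate
    R = np.asarray(R, dtype=float)
    E = np.eye(4)                                    # external parameters
    E[:3, :3] = R
    E[:3, 3] = -R @ np.asarray(t, dtype=float)
    return K @ P0 @ E


# Camera placed at t = (0, 0, -4), looking along +Z, no rotation.
M = projection_matrix(f=2.0, g=2.0, s=0.0, w0=0.0, v0=0.0,
                      R=np.eye(3), t=[0.0, 0.0, -4.0])
X = np.array([2.0, 1.0, 0.0, 1.0])      # homogeneous scene point
alpha_u = M @ X
u = alpha_u[:2] / alpha_u[2]            # divide out alpha
print(u)
```

With these values the point lies 4 units in front of the camera, so the result agrees with the pinhole formula x = fX/Z from earlier in the lecture.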

Single perspective camera

!  Estimation of M from known 3-D coordinates (X,Y,Z,1) and their projections (x',y',1) in a camera
!  This is analogous to the homographic projection
!  Algorithms exist to solve this with 6 correspondences

\[
\alpha \begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix}
=
\begin{pmatrix} m_{11} & m_{12} & m_{13} & m_{14} \\ m_{21} & m_{22} & m_{23} & m_{24} \\ m_{31} & m_{32} & m_{33} & m_{34} \end{pmatrix}
\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}
\]
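A minimal sketch of such an algorithm, assuming a direct linear solve in the same spirit as the homography case: eliminating α gives two linear equations per correspondence in the 12 entries of M, and the stacked system is solved with an SVD. The function names and the synthetic data are illustrative, and the scene points are assumed to be in general position (not all in one plane).

```python
import numpy as np


def estimate_projection_matrix(world_pts, image_pts):
    """Estimate the 3x4 matrix M (up to scale) from six or more correspondences."""
    rows = []
    for (X, Y, Z), (x, y) in zip(world_pts, image_pts):
        P = [X, Y, Z, 1.0]
        rows.append(P + [0.0] * 4 + [-x * p for p in P])   # m1.P - x' (m3.P) = 0
        rows.append([0.0] * 4 + P + [-y * p for p in P])   # m2.P - y' (m3.P) = 0
    _, _, Vt = np.linalg.svd(np.array(rows))
    return Vt[-1].reshape(3, 4)


def project(M, world_pts):
    """Project (N, 3) scene points with M and divide out alpha."""
    pts = np.asarray(world_pts, dtype=float)
    Xh = np.hstack([pts, np.ones((len(pts), 1))])
    u = Xh @ np.asarray(M, dtype=float).T
    return u[:, :2] / u[:, 2:3]


# Synthetic check: project known points with a known M, then re-estimate it.
M_true = np.array([[800.0, 0.0, 320.0, 10.0],
                   [0.0, 800.0, 240.0, 20.0],
                   [0.0, 0.0, 1.0, 5.0]])
world = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1),
         (1, 1, 0), (1, 0, 1), (0, 1, 1), (1, 1, 2)]
image = project(M_true, world)
M_est = estimate_projection_matrix(world, image)
print(np.abs(project(M_est, world) - image).max())   # reprojection error, close to zero
```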

Single perspective camera

!  This enables calibration from 6 known points
!  M can be factored: You can estimate camera focal length, image coordinate systems, camera position and rotation.
!  Triangulation: If you know several M_i, then you can also estimate a position X (3-D) using several camera projections u_i (2-D).
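Triangulation can be sketched with the same linear machinery: each camera contributes two equations in the homogeneous coordinates of X. The function name `triangulate` and the two example cameras are assumptions for this illustration.

```python
import numpy as np


def triangulate(Ms, us):
    """Estimate a 3-D point from projections u_i = (x_i, y_i) in cameras M_i."""
    rows = []
    for M, (x, y) in zip(Ms, us):
        M = np.asarray(M, dtype=float)
        rows.append(x * M[2] - M[0])    # x (m3.X) - m1.X = 0
        rows.append(y * M[2] - M[1])    # y (m3.X) - m2.X = 0
    _, _, Vt = np.linalg.svd(np.array(rows))
    Xh = Vt[-1]                         # homogeneous solution (X, Y, Z, W)
    return Xh[:3] / Xh[3]


# Two cameras with f = 1; the second is shifted one unit along the x axis.
M1 = np.array([[1.0, 0.0, 0.0, 0.0],
               [0.0, 1.0, 0.0, 0.0],
               [0.0, 0.0, 1.0, 0.0]])
M2 = np.array([[1.0, 0.0, 0.0, -1.0],
               [0.0, 1.0, 0.0, 0.0],
               [0.0, 0.0, 1.0, 0.0]])
u1 = (0.5, 0.25)     # projection of (2, 1, 4) in camera 1
u2 = (0.25, 0.25)    # projection of (2, 1, 4) in camera 2
X_est = triangulate([M1, M2], [u1, u2])
print(X_est)
```

With noisy measurements the rays no longer intersect exactly, and the SVD then returns a least-squares compromise.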

Marker based motion capture

Images: courtesy of Lennart Svensson

Mocap

Images: courtesy of Lennart Svensson

External calibration

!  Rotation + position, 6 DoF, ”calibration”

Images: courtesy of Lennart Svensson

Motion capture applications

!  Animation !  Biomechanical analysis !  Industrial analysis

Images: courtesy of Lennart Svensson

Image formation – Lenses

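!  Thin lens
!  \( z z' = f^{2} \)

[Figure: thin lens with the object plane at distance z from the object focal point and the image plane at distance z' from the image focal point; each focal point lies at distance f from the lens.]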

Image formation – Lenses

!  Magnification, m = x/X !  From similarity, x/z’ = X/f

[Figure: the same thin-lens geometry; an object of height X in the object plane is imaged with height x in the image plane.]

\[
m = \frac{x}{X} = \frac{f}{z} = \frac{z'}{f}
\]
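A short numeric check of the two lens relations, assuming z and z' are measured from the object and image focal points as in the formula z z' = f². The function names and the example numbers are illustrative.

```python
def image_distance(z, f):
    """Solve z * z' = f**2 for z' (distances measured from the focal points)."""
    if z <= 0:
        raise ValueError("z must be positive")
    return f * f / z


def magnification(z, f):
    """Magnification m = f / z."""
    return f / z


f, z = 50.0, 200.0                 # e.g. millimetres
z_prime = image_distance(z, f)
print(z_prime)                     # 12.5
print(magnification(z, f))         # 0.25
print(z_prime / f)                 # 0.25, the same value via m = z' / f
```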

Image formation – Lenses

!  Depth of field

!  Thus, light from objects within the depth of field is spread over an area smaller than a pixel, i.e. they are depicted sharply

[Figure: thin lens with object plane, image plane and both focal points; the depth of field extends Δz on either side of the object plane; ε = size of a pixel.]

Image formation – Lenses

!  Depth of field

!  Aperture size and focal length both affect the depth of field. A larger aperture will yield a smaller depth of field.

[Figure: thin lens with object plane, image plane and both focal points; the depth of field extends Δz on either side of the object plane; ε = size of a pixel.]

Image formation – Lenses

!  Depth of focus

!  “Depth of focus” is analogous: how far the image plane can be shifted without spreading the light from a point in focus over more than a pixel

[Figure: thin lens with object plane, image plane and both focal points; the depth of focus extends Δz' on either side of the image plane; ε = size of a pixel.]

AACAM – @ Matlab File Exchange

!  Matlab code for non-perfect pinhole camera
"  Set aperture radius and focal length
"  Set depth of field
"  Set object distance and aperture radius

Image formation – Lenses

!  (systems of) lenses → distortions:
!  Spherical aberration
!  Shorter focal length close to edges of lens

(Image from wikipedia)

Image formation – Lenses

!  (systems of) lenses → distortions:
!  Coma

(Image from wikipedia)

Image formation – Lenses

!  (systems of) lenses → distortions:
!  Chromatic aberration

(Image from wikipedia)

Image formation – Lenses

!  (systems of) lenses → distortions:
!  Astigmatism

(Image from wikipedia)

Image formation – Lenses

!  (systems of) lenses → distortions:
!  Geometric distortion

(Image from wikipedia)

Barrel distortion Pincushion distortion

Is this really a problem?

!  In old and cheap cameras, yes !  Uppsala 1999-01-01

From http://www.uu.se/carpediem/1999/

Is this really a problem?

!  But also for e.g. modern GoPro cameras!

Camera Calibration Toolbox

!  A Matlab toolbox for camera calibration: !  http://www.vision.caltech.edu/bouguetj/calib_doc/ !  Freely available

Camera Calibration Toolbox

!  Focal length: The focal length in pixels is stored in the 2x1 vector fc.
!  Principal point: The principal point coordinates are stored in the 2x1 vector cc.
!  Skew coefficient: The skew coefficient defining the angle between the x and y pixel axes is stored in the scalar alpha_c.
!  Distortions: The image distortion coefficients (radial and tangential distortions) are stored in the 5x1 vector kc.
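To show how the four parameters fit together, here is a Python sketch of a projection with radial and tangential distortion. It follows the model as described in the toolbox documentation to the best of my reading (kc[0], kc[1], kc[4] radial; kc[2], kc[3] tangential), so treat the exact coefficient ordering as an assumption and check it against the toolbox before relying on it. The function name is made up for this example.

```python
def project_with_distortion(point, fc, cc, alpha_c, kc):
    """Project a camera-frame point (X, Y, Z) to pixel coordinates.

    fc: (fx, fy) focal length in pixels, cc: principal point,
    alpha_c: skew coefficient, kc: five distortion coefficients
    (ordering assumed: kc[0], kc[1], kc[4] radial; kc[2], kc[3] tangential).
    """
    X, Y, Z = point
    x, y = X / Z, Y / Z                        # ideal pinhole projection, f = 1
    r2 = x * x + y * y
    radial = 1.0 + kc[0] * r2 + kc[1] * r2 ** 2 + kc[4] * r2 ** 3
    dx = 2.0 * kc[2] * x * y + kc[3] * (r2 + 2.0 * x * x)   # tangential terms
    dy = kc[2] * (r2 + 2.0 * y * y) + 2.0 * kc[3] * x * y
    xd, yd = radial * x + dx, radial * y + dy
    return (fc[0] * (xd + alpha_c * yd) + cc[0], fc[1] * yd + cc[1])


fc, cc = (1000.0, 1000.0), (320.0, 240.0)
no_dist = [0.0, 0.0, 0.0, 0.0, 0.0]
print(project_with_distortion((1.0, 0.5, 2.0), fc, cc, 0.0, no_dist))   # (820.0, 490.0)

radial_only = [-0.2, 0.0, 0.0, 0.0, 0.0]
print(project_with_distortion((0.5, 0.0, 1.0), fc, cc, 0.0, radial_only))
```

With a negative first radial coefficient the second point lands closer to the principal point than the undistorted projection would put it.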

Stereo – Basic equations

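[Figure: top view (x, z) of two pinhole cameras with focal length f, separated by the baseline B along the x axis, both observing the point P = (X, Y, Z); its image coordinates are x1 and x2.]

\[
x_1 = -f\,\frac{X}{Z}, \qquad x_2 = -f\,\frac{X - B}{Z}
\]

\[
\Rightarrow\quad Z = \frac{fB}{x_2 - x_1} = \frac{fB}{d}
\]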

Stereo – the general case

!  It may happen that the relation between the two cameras is not a pure parallax translation
!  Then the “epipolar constraint” applies
!  By “rectification” epipolar lines are aligned with scanlines

From: Epipolar Rectification by Fusiello et al.

Stereo – Disparity Estimation

!  Search horizontally for patch disparity, use e.g. sum of squared differences (SSD)

Teddy dataset, from http://cat.middlebury.edu/
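A minimal sketch of the horizontal SSD search for a single pixel, assuming a rectified pair in which a point at column c of the left image appears at column c − d of the right image. The function name, patch size and search range are illustrative choices, not values from the lecture.

```python
import numpy as np


def ssd_disparity(left, right, row, col, half=2, max_disp=16):
    """Return the disparity d in [0, max_disp] with the smallest SSD patch cost."""
    patch = left[row - half:row + half + 1, col - half:col + half + 1].astype(float)
    best_d, best_cost = 0, np.inf
    for d in range(max_disp + 1):
        c = col - d                          # candidate column in the right image
        if c - half < 0:
            break                            # patch would leave the image
        cand = right[row - half:row + half + 1, c - half:c + half + 1].astype(float)
        cost = np.sum((patch - cand) ** 2)   # sum of squared differences
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


# Synthetic pair: the right image is the left image shifted 3 pixels to the left.
rng = np.random.default_rng(0)
left = rng.random((20, 40))
right = np.roll(left, -3, axis=1)
print(ssd_disparity(left, right, row=10, col=20))
```

Running this for every pixel gives a dense disparity map; the brute-force loop is only meant to make the search explicit and is far slower than practical implementations.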

Stereo – Depth estimation

!  A simple formula converting disparity d to distance Z when the inter-camera distance is B:

\[
Z = \frac{fB}{d}
\]
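As a quick numeric illustration of the disparity-to-depth formula, the sketch below uses a focal length in pixels and a baseline in metres; the function name and the numbers are made up for the example.

```python
def depth_from_disparity(d, f, B):
    """Depth Z = f * B / d for focal length f, baseline B and disparity d."""
    if d <= 0:
        raise ValueError("disparity must be positive")
    return f * B / d


f_pixels, baseline_m = 700.0, 0.1
for d in (70.0, 35.0, 7.0):
    print(d, depth_from_disparity(d, f_pixels, baseline_m))
```

Because Z is inversely proportional to d, a fixed disparity error of one pixel matters much more for distant points (small d) than for near ones.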

Patch based estimate Ground truth

Stereo – Constraints

!  Constraints (Marr and Poggio):
"  Each point in each image is assigned at most one disparity value
"  The disparity varies smoothly at most locations in the images
!  However…
!  Different regularization may be applied to the depth function

[Figure: top view (x, z) of a stereo pair with image coordinates x1 and x2.]

Stereo from Segmentation

!  Alternative approach: "  Make a segmentation of the image first "  Apply a linear model in each segmented region "  Refine the models in the regions …

From Segment-based Stereo Matching Using Graph Cuts by Hong and Chen

Large Scale 3D Maps (C3/SAAB)

Courtesy of Petter Torle C3 Technologies

Structured Light

!  A light source helps the stereo algorithm to find matching points.

!  Often used in industrial applications

From: http://mesh.brown.edu/3DPGP-2009/homework/hw2/hw2.html

More Structured Light

!  Microsoft Kinect, using infrared light

• http://www.youtube.com/watch?v=nvvQJxgykcU

Other Computer Vision Code

!  OpenCV
"  Free to use
"  Supports IPP speedups
"  http://en.wikipedia.org/wiki/OpenCV
"  http://sourceforge.net/projects/opencvlibrary/
"  http://opencv.willowgarage.com/wiki/

!  Intel® Integrated Performance Primitives 6.0
"  http://www.intel.com/cd/software/products/asmo-na/eng/302910.htm
"  Commercial (but cheap)
"  Includes Computer Vision, Signal Processing, Data Compression, …

Typical Exam Questions …

!  Project this object (points) using a pinhole camera

!  Can geometric transformation compensate for lens distortions in general?

!  Explain the parameters building up the projection matrix M

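\[
\alpha u =
\begin{pmatrix} f & s & -w_0 \\ 0 & g & -v_0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}
\begin{pmatrix} R & -Rt \\ \mathbf{0}^{T} & 1 \end{pmatrix}
X
\]

\[
\alpha u = M X
\]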

Thank You!

!  Email questions to: anders.brun@it.uu.se