Post on 05-Jan-2016
3D Computer Vision and Video Computing
3D Vision
Topic 3 of Part II: Stereo Vision
CSc I6716, Fall 2010
Zhigang Zhu, City College of New York, zhu@cs.ccny.cuny.edu

Stereo Vision

- Problem
  - Infer the 3D structure of a scene from two or more images taken from different viewpoints
- Two primary sub-problems
  - Correspondence problem (stereo match) -> disparity map
    - "Similar" instead of "same"
    - Occlusion problem: some parts of the scene are visible in only one eye
  - Reconstruction problem -> 3D
    - What we need to know about the cameras' parameters
    - Often a stereo calibration problem
- Lectures on Stereo Vision
  - Stereo Geometry: Epipolar Geometry (*)
  - Correspondence Problem (*): two classes of approaches
  - 3D Reconstruction Problems: three approaches

A Stereo Pair

- Problems
  - Correspondence problem (stereo match) -> disparity map
  - Reconstruction problem -> 3D

[Figure: CMU CIL Stereo Dataset, Castle sequence, http://www-2.cs.cmu.edu/afs/cs/project/cil/ftp/html/cil-ster.html ; the open question on the slide is how to recover 3D from the pair.]

More Images…

- Problems
  - Correspondence problem (stereo match) -> disparity map
  - Reconstruction problem -> 3D

Part I. Stereo Geometry

- A Simple Stereo Vision System
  - Disparity Equation
  - Depth Resolution
  - Fixated Stereo System
    - Zero-disparity Horopter
- Epipolar Geometry
  - Epipolar lines: where to search for correspondences
    - Epipolar Plane, Epipolar Lines and Epipoles
    - http://www.ai.sri.com/~luong/research/Meta3DViewer/EpipolarGeo.html
  - Essential Matrix and Fundamental Matrix
    - Computing E & F by the Eight-Point Algorithm
    - Computing the Epipoles
- Stereo Rectification

Stereo Geometry

- Converging axes: the usual setup of human eyes
- Depth obtained by triangulation
- Correspondence problem: pl and pr correspond to the left and right projections of P, respectively

[Figure labels: object point P(X,Y,Z), central projection rays, vergence angle, image points pl and pr.]

A Simple Stereo System

[Figure labels: left camera (left image: reference), right camera (right image: target), baseline, disparity, depth Z, elevation Zw, Zw = 0.]

Disparity Equation

Stereo system with parallel optical axes.

[Figure labels: P(X,Y,Z); left camera with optical center Ol, image plane and image point pl(xl,yl); right camera with optical center Or, image plane and image point pr(xr,yr); f = focal length; B = baseline; D = depth.]

- Disparity: $d_x = x_r - x_l$
- Depth: $D = Z = f \frac{B}{d_x}$
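
The disparity equation can be tried out numerically. The following is a minimal sketch, assuming a parallel-axis pair and using the magnitude of the disparity so that the sign convention of dx does not matter; the function name and the numbers (f = 1024 pixels, B = 0.5 m) are illustrative choices, not values prescribed by this slide.

```python
def depth_from_disparity(f_pixels, baseline_m, disparity_pixels):
    """Depth Z = f * B / |dx| for a stereo pair with parallel optical axes."""
    d = abs(disparity_pixels)
    if d == 0:
        return float("inf")  # zero disparity corresponds to a point at infinity
    return f_pixels * baseline_m / d

# Illustrative values (assumptions): f = 1024 pixels, B = 0.5 m
depths = [depth_from_disparity(1024, 0.5, d) for d in (64, 32, 16)]
```

Halving the disparity doubles the estimated depth, which is the inverse relation the slide states.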

Disparity vs. Baseline

Stereo system with parallel optical axes.

[Figure labels: P(X,Y,Z); left camera with optical center Ol and image point pl(xl,yl); right camera with optical center Or and image point pr(xr,yr); f = focal length; B = baseline; D = depth.]

- Disparity: $d_x = x_r - x_l$
- Depth: $D = Z = f \frac{B}{d_x}$

Depth Accuracy

- Given the same image localization error
  - Angle of cones in the figure
- Depth accuracy (depth resolution) vs. baseline
  - Depth error $\propto 1/B$ (baseline length)
  - PROS of a longer baseline
    - Better depth estimation
  - CONS
    - Smaller common FOV
    - Correspondence harder due to occlusion
- Depth accuracy (depth resolution) vs. depth
  - Disparity (> 0) $\propto 1/\text{depth}$
  - Depth error $\propto \text{depth}^2$
  - The nearer the point, the better the depth estimation
- An example
  - f = 16 x 512/8 pixels, B = 0.5 m
  - Depth error vs. depth
- Absolute error (in magnitude): $\partial Z = \frac{Z^2}{fB}\,\partial(d_x)$
- Relative error (in magnitude): $\frac{\partial Z}{Z} = \frac{Z}{fB}\,\partial(d_x)$

[Figure labels: two viewpoints Ol and Or; depths Z1 and Z2 with Z2 > Z1.]
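
To make the example concrete, the sketch below tabulates the absolute depth error for the slide's values f = 16 x 512/8 = 1024 pixels and B = 0.5 m. The disparity error of one pixel and the list of depths are assumptions chosen for illustration.

```python
def depth_error(Z, f_pixels, baseline_m, disparity_error_pixels=1.0):
    """Absolute depth error: Z^2 / (f * B) times the error in disparity."""
    return Z * Z / (f_pixels * baseline_m) * disparity_error_pixels

f = 16 * 512 / 8   # focal length in pixels (1024.0)
B = 0.5            # baseline in metres

# (depth in metres, depth error in metres) for an assumed 1-pixel disparity error
table = [(Z, depth_error(Z, f, B)) for Z in (1, 2, 4, 8, 16)]
```

Each doubling of the depth quadruples the error, which is the quadratic growth stated in the bullets.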

Stereo with Converging Cameras

- Stereo with parallel axes
  - Short baseline
    - Large common FOV
    - Large depth error
  - Long baseline
    - Small depth error
    - Small common FOV
    - More occlusion problems
- Two optical axes intersect at the fixation point
  - Converging angle $\theta$
  - The common FOV increases

[Figure labels: left and right cameras, common FOV.]

Stereo with Converging Cameras

- Two optical axes intersect at the fixation point
  - Converging angle $\theta$
  - The common FOV increases
- Disparity properties
  - Disparity uses angle instead of distance
  - Zero disparity at the fixation point
    - and on the zero-disparity horopter
  - Disparity increases with the distance of objects from the fixation point
    - > 0: outside the horopter
    - < 0: inside the horopter
- Depth accuracy vs. depth
  - Depth error $\propto \text{depth}^2$
  - The nearer the point, the better the depth estimation

[Figure labels: left and right cameras, common FOV, converging angle $\theta$, fixation point.]

[Figure (Stereo with Converging Cameras, continued): fixation point and horopter; for a point on the horopter, $\alpha_r = \alpha_l$ and $d\alpha = 0$.]

[Figure (Stereo with Converging Cameras, continued): fixation point and horopter; case $\alpha_r > \alpha_l$, $d\alpha > 0$.]

[Figure (Stereo with Converging Cameras, continued): fixation point and horopter; case $\alpha_r < \alpha_l$, $d\alpha < 0$.]

[Figure (Stereo with Converging Cameras, continued): fixation point and horopter; general point with angles $\alpha_l$ and $\alpha_r$; question posed on the slide: D(d$\alpha$)?]

Break

- Homework #4 online, due on November 29 before class

Parameters of a Stereo System

- Intrinsic parameters
  - Characterize the transformation from camera to pixel coordinate systems of each camera
  - Focal length, image center, aspect ratio
- Extrinsic parameters
  - Describe the relative position and orientation of the two cameras
  - Rotation matrix R and translation vector T

[Figure labels: point P with vectors Pl and Pr; image points pl and pr; optical centers Ol and Or; focal lengths fl and fr; camera axes (Xl, Yl, Zl) and (Xr, Yr, Zr); the two frames are related by R, T.]

Epipolar Geometry

- Notations
  - $P_l = (X_l, Y_l, Z_l)$, $P_r = (X_r, Y_r, Z_r)$
    - Vectors of the same 3-D point P, in the left and right camera coordinate systems respectively
  - Extrinsic parameters
    - Translation vector $T = (O_r - O_l)$
    - Rotation matrix $R$
    - $P_r = R(P_l - T)$
  - $p_l = (x_l, y_l, z_l)$, $p_r = (x_r, y_r, z_r)$
    - Projections of P on the left and right image planes respectively
    - For all image points, we have $z_l = f_l$, $z_r = f_r$
    - $p_l = \frac{f_l}{Z_l} P_l$, $p_r = \frac{f_r}{Z_r} P_r$

[Figure labels: P, Pl, Pr, pl, pr, Ol, Or, fl, fr, camera axes (Xl, Yl, Zl) and (Xr, Yr, Zr), R, T.]

Epipolar Geometry

- Motivation: where to search for correspondences?
  - Epipolar plane
    - A plane going through point P and the centers of projection (COPs) of the two cameras
  - Conjugated epipolar lines
    - Lines where the epipolar plane intersects the image planes
  - Epipoles
    - The image of the COP of one camera in the other
- Epipolar constraint
  - Corresponding points must lie on conjugated epipolar lines

[Figure labels: P, Pl, Pr, pl, pr, Ol, Or, epipoles el and er, epipolar plane, epipolar lines.]
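
Essential Matrix

- Equation of the epipolar plane
  - Co-planarity condition of vectors $P_l$, $T$ and $P_l - T$:
    $$(P_l - T)^T \, (T \times P_l) = 0$$
  - With $T \times P_l = S P_l$ and $P_r = R(P_l - T)$:
    $$S = \begin{bmatrix} 0 & -T_z & T_y \\ T_z & 0 & -T_x \\ -T_y & T_x & 0 \end{bmatrix}, \qquad R = \begin{bmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{bmatrix}$$
    - Rank(R) = 3, Rank(S) = 2
- Essential matrix $E = RS$
  - 3x3 matrix constructed from R and T (extrinsic only)
    - Rank(E) = 2, two equal nonzero singular values
  - $P_r^T E P_l = 0$
  - Since $p_l = \frac{f_l}{Z_l} P_l$ and $p_r = \frac{f_r}{Z_r} P_r$: $p_r^T E p_l = 0$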
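
The construction E = RS and its two properties (the epipolar constraint and the pair of equal singular values) can be checked numerically. This is a minimal sketch using NumPy; the rotation, translation and 3D point are hypothetical values chosen only for illustration.

```python
import numpy as np

def skew(T):
    """Matrix S such that S @ P equals np.cross(T, P)."""
    Tx, Ty, Tz = T
    return np.array([[0.0, -Tz, Ty],
                     [Tz, 0.0, -Tx],
                     [-Ty, Tx, 0.0]])

def essential_matrix(R, T):
    """E = R S, following the convention P_r = R (P_l - T)."""
    return R @ skew(T)

# Hypothetical extrinsics: 5 degree rotation about the Y axis, baseline mostly along X
angle = np.deg2rad(5.0)
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
T = np.array([0.5, 0.0, 0.1])

P_l = np.array([0.3, -0.2, 4.0])   # a 3D point in the left camera frame
P_r = R @ (P_l - T)                # the same point in the right camera frame

E = essential_matrix(R, T)
residual = P_r @ E @ P_l           # epipolar constraint, zero up to rounding
singular_values = np.linalg.svd(E, compute_uv=False)
```

The two nonzero singular values both equal the baseline length |T|, and the third is zero, in line with Rank(E) = 2.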

Essential Matrix

- Essential matrix $E = RS$
  - A natural link between the stereo point pair and the extrinsic parameters of the stereo system
    - One correspondence -> a linear equation in the 9 entries
    - Given 8 pairs of (pl, pr) -> E
  - Mapping between points and epipolar lines we are looking for
    - Given pl, E -> pr on the projective line in the right plane
    - The equation represents the epipolar line of pr (or pl) in the right (or left) image:
      $$p_r^T E p_l = 0$$
- Note
  - pl, pr are in the camera coordinate system, not the pixel coordinates that we can measure

Fundamental Matrix

- Mapping between points and epipolar lines in the pixel coordinate systems
  - With no prior knowledge of the stereo system
- From camera to pixels: matrices of intrinsic parameters
  $$M_{int} = \begin{bmatrix} f_x & 0 & o_x \\ 0 & f_y & o_y \\ 0 & 0 & 1 \end{bmatrix}, \qquad \mathrm{Rank}(M_{int}) = 3$$
  $$p_l = M_l^{-1} \bar{p}_l, \qquad p_r = M_r^{-1} \bar{p}_r$$
- Substituting into $p_r^T E p_l = 0$:
  $$\bar{p}_r^T F \bar{p}_l = 0, \qquad F = M_r^{-T} E M_l^{-1}$$
- Questions
  - What are fx, fy, ox, oy?
  - How to measure pl in images?
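
As a sanity check on F = M_r^{-T} E M_l^{-1}, the sketch below builds F from a hypothetical calibration and verifies the constraint in pixel coordinates. The intrinsic matrix, the extrinsics and the 3D point are assumptions for illustration; M is taken to map normalized camera coordinates P/Z to homogeneous pixel coordinates, which is one common way of folding the focal length into M.

```python
import numpy as np

def fundamental_from_essential(E, M_l, M_r):
    """F = M_r^{-T} E M_l^{-1}."""
    return np.linalg.inv(M_r).T @ E @ np.linalg.inv(M_l)

# Hypothetical calibration and extrinsics, for illustration only
M = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                      # parallel optical axes
T = np.array([0.5, 0.0, 0.0])      # baseline along X
S = np.array([[0.0, -T[2], T[1]],
              [T[2], 0.0, -T[0]],
              [-T[1], T[0], 0.0]])
E = R @ S
F = fundamental_from_essential(E, M, M)

P_l = np.array([0.3, -0.2, 4.0])
P_r = R @ (P_l - T)
pix_l = M @ (P_l / P_l[2])         # homogeneous pixel coordinates (x, y, 1)
pix_r = M @ (P_r / P_r[2])

residual = pix_r @ F @ pix_l       # zero up to rounding
epipolar_line_r = F @ pix_l        # (a, b, c) with a*x + b*y + c = 0 in the right image
```

With parallel axes and a baseline along X, the epipolar line has a = 0, so it is horizontal and the two pixels share the same row.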

Fundamental Matrix

- Fundamental matrix
  - Rank(F) = 2
  - Encodes information on both intrinsic and extrinsic parameters
  - Enables full reconstruction of the epipolar geometry
  - In pixel coordinate systems, without any knowledge of the intrinsic and extrinsic parameters
  - Linear equation in the 9 entries of F

$$\bar{p}_r^T F \bar{p}_l = 0, \qquad F = M_r^{-T} E M_l^{-1}$$

$$\begin{pmatrix} x_{im}^{(r)} & y_{im}^{(r)} & 1 \end{pmatrix} \begin{bmatrix} f_{11} & f_{12} & f_{13} \\ f_{21} & f_{22} & f_{23} \\ f_{31} & f_{32} & f_{33} \end{bmatrix} \begin{pmatrix} x_{im}^{(l)} \\ y_{im}^{(l)} \\ 1 \end{pmatrix} = 0$$

Computing F: The Eight-point Algorithm

- Input: n point correspondences (n >= 8)
  - Construct the homogeneous system $Ax = 0$ from $\bar{p}_r^T F \bar{p}_l = 0$
    - $x = (f_{11}, f_{12}, f_{13}, f_{21}, f_{22}, f_{23}, f_{31}, f_{32}, f_{33})$: entries of F
    - Each correspondence gives one equation
    - A is an n x 9 matrix
  - Obtain the estimate $\hat{F}$ by SVD of A: $A = U D V^T$
    - x (up to a scale) is the column of V corresponding to the least singular value
  - Enforce the singularity constraint, since Rank(F) = 2
    - Compute the SVD of $\hat{F}$: $\hat{F} = U D V^T$
    - Set the smallest singular value to 0: D -> D'
    - Corrected estimate of F: $F' = U D' V^T$
- Output: the estimate of the fundamental matrix, F'
- Similarly, we can compute E given the intrinsic parameters
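
The steps of the algorithm can be sketched in NumPy as follows. This is an illustrative implementation under simplifying assumptions: it omits the coordinate normalization that is often added for numerical stability with real pixel data, and the synthetic test geometry (rotation, translation, random points, focal length 1) is hypothetical.

```python
import numpy as np

def eight_point(pts_l, pts_r):
    """Estimate F from n >= 8 correspondences; each row of pts_* is (x, y)."""
    pts_l = np.asarray(pts_l, dtype=float)
    pts_r = np.asarray(pts_r, dtype=float)
    if len(pts_l) < 8 or len(pts_l) != len(pts_r):
        raise ValueError("need at least 8 matched points")
    xl, yl = pts_l[:, 0], pts_l[:, 1]
    xr, yr = pts_r[:, 0], pts_r[:, 1]
    # Each row encodes p_r^T F p_l = 0 for x = (f11, f12, ..., f33)
    A = np.column_stack([xr * xl, xr * yl, xr,
                         yr * xl, yr * yl, yr,
                         xl, yl, np.ones_like(xl)])
    # x is the right singular vector of A with the smallest singular value
    _, _, Vt = np.linalg.svd(A)
    F_hat = Vt[-1].reshape(3, 3)
    # Enforce Rank(F) = 2 by zeroing the smallest singular value
    U, D, Vt_f = np.linalg.svd(F_hat)
    D[2] = 0.0
    return U @ np.diag(D) @ Vt_f

# Synthetic check with hypothetical geometry (image coordinates with f = 1)
rng = np.random.default_rng(0)
P_l = np.column_stack([rng.uniform(-1, 1, 12),
                       rng.uniform(-1, 1, 12),
                       rng.uniform(4, 8, 12)])
angle = np.deg2rad(10.0)
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
T = np.array([1.0, 0.2, 0.1])
P_r = (P_l - T) @ R.T              # row-wise P_r = R (P_l - T)
pts_l = P_l[:, :2] / P_l[:, 2:]
pts_r = P_r[:, :2] / P_r[:, 2:]

F = eight_point(pts_l, pts_r)
hom_l = np.column_stack([pts_l, np.ones(len(pts_l))])
hom_r = np.column_stack([pts_r, np.ones(len(pts_r))])
residuals = np.einsum("ni,ij,nj->n", hom_r, F, hom_l)   # p_r^T F p_l per pair
```

On noise-free data the residuals vanish up to rounding; with measured correspondences they give a rough indication of how well the estimate fits.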

Locating the Epipoles from F

- Input: fundamental matrix F
  - Find the SVD of F: $F = U D V^T$
  - The epipole el is the column of V corresponding to the null singular value (see the argument on this slide)
  - The epipole er is the column of U corresponding to the null singular value
- Output: epipoles el and er
- Why el is a null vector of F
  - el lies on all the epipolar lines of the left image
  - Starting from $\bar{p}_r^T F \bar{p}_l = 0$, for every pr: $\bar{p}_r^T F e_l = 0$
  - F is not identically zero, so $F e_l = 0$

[Figure labels: P, Pl, Pr, pl, pr, Ol, Or, epipoles el and er, epipolar plane, epipolar lines.]
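
A short sketch of the SVD recipe above, using NumPy. For the check, F is taken equal to E = RS, which amounts to assuming identity intrinsic matrices; the rotation and translation are hypothetical values. The epipoles come out as homogeneous 3-vectors defined up to scale; dividing by the third component, when it is nonzero, gives image coordinates.

```python
import numpy as np

def epipoles(F):
    """Return (e_l, e_r): the right and left null vectors of F, as unit 3-vectors."""
    U, _, Vt = np.linalg.svd(F)
    e_l = Vt[-1]       # F @ e_l = 0
    e_r = U[:, -1]     # e_r @ F = 0
    return e_l, e_r

# Hypothetical geometry; identity intrinsics assumed so that F = E = R S
angle = np.deg2rad(5.0)
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
T = np.array([0.5, 0.0, 0.1])
S = np.array([[0.0, -T[2], T[1]],
              [T[2], 0.0, -T[0]],
              [-T[1], T[0], 0.0]])
F = R @ S

e_l, e_r = epipoles(F)
```

In this setup e_l is parallel to T (the direction from Ol to Or seen from the left camera) and e_r is parallel to R T, which matches the definition of an epipole as the image of the other camera's center of projection.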

Stereo Rectification

- Rectification
  - Given a stereo pair and the intrinsic and extrinsic parameters, find the image transformation that yields a stereo system with horizontal epipolar lines
  - A simple algorithm: assuming calibrated stereo cameras
- Stereo system with parallel optical axes
  - Epipoles are at infinity
  - Horizontal epipolar lines

[Figure labels: P, Pl, Pr, p'l, p'r, Ol, Or, T, rectified axes (X'l, Y'l, Z'l) and (X'r, Y'r, Z'r).]

Stereo Rectification

- Algorithm
  - Rotate both the left and right cameras so that they share the same X axis: Or - Ol = T
  - Define a rotation matrix $R_{rect}$ for the left camera
  - The rotation matrix for the right camera is $R_{rect} R^T$
  - The rotation can be implemented by an image transformation
- New left camera axes (each normalized to unit length): $X'_l = T$, $Y'_l = Z_l \times X'_l$, $Z'_l = X'_l \times Y'_l$

[Figure labels: P, Pl, Pr, pl, pr, Ol, Or, original axes (Xl, Yl, Zl) and (Xr, Yr, Zr), R, T, new axis X'l along T.]

Stereo Rectification

[Figure: after rectification, with axes (X'l, Y'l, Z'l) and (X'r, Y'r, Z'r) and image points p'l and p'r, the two camera frames differ only by a translation along X': $T' = (B, 0, 0)$, $P'_r = P'_l - T'$.]
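
The rectifying rotation can be sketched as below. This is an illustration under stated assumptions: the new Y axis is taken as Z_l x X' (normalized), so that the new optical axis stays close to the old one; the function name and the sample R, T and point are hypothetical; and the construction breaks down if T is parallel to the old optical axis.

```python
import numpy as np

def rectifying_rotation(T, z_axis=(0.0, 0.0, 1.0)):
    """Rows are the new X', Y', Z' axes expressed in the old left camera frame."""
    x_new = np.asarray(T, dtype=float)
    x_new = x_new / np.linalg.norm(x_new)    # X' along the baseline
    y_new = np.cross(z_axis, x_new)          # orthogonal to baseline and old optical axis
    y_new = y_new / np.linalg.norm(y_new)    # undefined if T is parallel to z_axis
    z_new = np.cross(x_new, y_new)
    return np.vstack([x_new, y_new, z_new])

# Hypothetical extrinsics, convention P_r = R (P_l - T)
angle = np.deg2rad(5.0)
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
T = np.array([0.5, 0.05, 0.1])

R_rect = rectifying_rotation(T)   # rotation applied to the left camera
R_right = R_rect @ R.T            # rotation applied to the right camera

P_l = np.array([0.3, -0.2, 4.0])
P_r = R @ (P_l - T)
P_l_rect = R_rect @ P_l
P_r_rect = R_right @ P_r
offset = P_l_rect - P_r_rect      # expected to be (B, 0, 0) with B = |T|
```

Because the rectified coordinates differ only in X', a point projects to the same image row in both rectified views.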

Epipolar Geometry: Summary

- Purpose
  - Where to search for correspondences
- Epipolar plane, epipolar lines, and epipoles
  - Known intrinsic (f) and extrinsic (R, T)
    - Co-planarity equation: $P_r^T R \,(T \times P_l) = 0$
  - Known intrinsic but unknown extrinsic
    - Essential matrix: $p_r^T E p_l = 0$
  - Unknown intrinsic and extrinsic
    - Fundamental matrix: $\bar{p}_r^T F \bar{p}_l = 0$
- Rectification
  - Generate a stereo pair (by software) with parallel optical axes and thus horizontal epipolar lines

Part II. Correspondence Problem

- Three questions
  - What to match?
    - Features: point, line, area, structure?
  - Where to search for correspondences?
    - Epipolar line?
  - How to measure similarity?
    - Depends on features
- Approaches
  - Correlation-based approach
  - Feature-based approach
- Advanced topics
  - Image filtering to handle illumination changes
  - Adaptive windows to deal with multiple disparities
  - Local warping to account for perspective distortion
  - Sub-pixel matching to improve accuracy
  - Self-consistency to reduce false matches
  - Multi-baseline stereo

Correlation Approach

- For each point (xl, yl) in the left image, define a window centered at the point
- ... search for its corresponding point within a search region in the right image
- ... the disparity (dx, dy) is the displacement at which the correlation is maximum

[Figure labels: left image with a window at (xl, yl); right image with the search region around (xl, yl) and the match (xr, yr) at displacement dx.]

Correlation Approach

- Elements to be matched
  - Image window of fixed size centered at each pixel in the left image
- Similarity criterion
  - A measure of similarity between windows in the two images
  - The corresponding element is given by the window that maximizes the similarity criterion within a search region
- Search regions
  - Theoretically, the search region can be reduced to a 1-D segment, along the epipolar line and within the disparity range
  - In practice, search a slightly larger region because of errors in calibration
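
Correlation Approach

- Equations
  $$c(d_x, d_y) = \sum_{k=-W}^{W} \sum_{l=-W}^{W} \psi\big(I_l(x_l + k,\, y_l + l),\; I_r(x_l + d_x + k,\, y_l + d_y + l)\big)$$
- Disparity
  $$\bar{d} = (\bar{d}_x, \bar{d}_y) = \arg\max_{d \in R} \{ c(d_x, d_y) \}$$
- Similarity criterion (the difference-based measures are negated so that the best match maximizes c)
  - Cross-correlation: $\psi(u, v) = uv$
  - Sum of Squared Differences (SSD): $\psi(u, v) = -(u - v)^2$
  - Sum of Absolute Differences (SAD): $\psi(u, v) = -|u - v|$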
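
The window search can be sketched for a single pixel as follows. This is a minimal illustration assuming a rectified pair, so that only horizontal displacements are searched (dy = 0), and using the negated SSD as the score to maximize; the function name, window size, search range and synthetic images are all assumptions.

```python
import numpy as np

def match_ssd(left, right, x, y, half_window, max_disparity):
    """Return (dx, score) maximizing c(dx, 0) = -SSD over dx in [-max, max]."""
    W = half_window
    ref = left[y - W:y + W + 1, x - W:x + W + 1].astype(float)
    best_dx, best_score = 0, -np.inf
    for dx in range(-max_disparity, max_disparity + 1):
        xr = x + dx
        if xr - W < 0 or xr + W + 1 > right.shape[1]:
            continue  # candidate window would leave the image
        cand = right[y - W:y + W + 1, xr - W:xr + W + 1].astype(float)
        score = -np.sum((ref - cand) ** 2)   # psi(u, v) = -(u - v)^2 summed over the window
        if score > best_score:
            best_dx, best_score = dx, score
    return best_dx, best_score

# Synthetic pair: the right image is the left image shifted 3 pixels to the left
rng = np.random.default_rng(1)
left = rng.random((20, 40))
right = np.roll(left, -3, axis=1)

dx, score = match_ssd(left, right, x=20, y=10, half_window=3, max_disparity=8)
```

Repeating the search at every pixel gives a dense disparity map; on textureless regions many candidates score almost equally, which is one reason the approach needs textured images.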

Correlation Approach

- PROS
  - Easy to implement
  - Produces a dense disparity map
- CONS
  - May be slow
  - Needs textured images to work well
  - Inadequate for matching image pairs from very different viewpoints due to illumination changes
  - Window may cover points with quite different disparities
  - Inaccurate disparities on the occluding boundaries

Correlation Approach

[Figure: a stereo pair of the UMass campus, illustrating texture, boundaries and occlusion.]

Feature-based Approach

- Features
  - Edge points
  - Lines (length, orientation, average contrast)
  - Corners
- Matching algorithm
  - Extract features in the stereo pair
  - Define a similarity measure
  - Search for correspondences using the similarity measure and the epipolar geometry

Feature-based Approach

- For each feature in the left image...
- Search in the right image... the disparity (dx, dy) is the displacement at which the similarity measure is maximum

[Figure labels: left image and right image, each showing corner, line and structure features.]

Feature-based Approach

- PROS
  - Relatively insensitive to illumination changes
  - Good for man-made scenes with strong lines but weak texture or textureless surfaces
  - Works well on the occluding boundaries (edges)
  - Could be faster than the correlation approach
- CONS
  - Only a sparse depth map
  - Feature extraction may be tricky
    - Lines (edges) might be only partially extracted in one image
    - How to measure the similarity between two lines?

Advanced Topics

- Mainly used in the correlation-based approach, but can also be applied to feature-based matching
- Image filtering to handle illumination changes
  - Image equalization
    - To make the two images more similar in illumination
  - Laplacian filtering (2nd-order derivative)
    - Use the derivative rather than the intensity (or original color)

Advanced Topics

- Adaptive windows to deal with multiple disparities
  - Adaptive window approach (Kanade and Okutomi)
    - A statistically adaptive technique that selects, at each pixel, the window size that minimizes the uncertainty in the disparity estimates
    - T. Kanade and M. Okutomi, A Stereo Matching Algorithm with an Adaptive Window: Theory and Experiment, Proc. 1991 IEEE International Conference on Robotics and Automation, Vol. 2, April 1991, pp. 1088-1095
  - Multiple window algorithm (Fusiello et al.)
    - Use 9 windows instead of just one to compute the SSD measure
    - The point with the smallest SSD error among the 9 windows and the various search locations is chosen as the best estimate for the given point
    - A. Fusiello, V. Roberto and E. Trucco, Efficient stereo with multiple windowing, IEEE CVPR, pp. 858-863, 1997

Advanced Topics

- Multiple windows to deal with multiple disparities

[Figure labels: smooth regions, corners, edges; near, far.]

Advanced Topics

- Sub-pixel matching to improve accuracy
  - Find the peak in the correlation curves
- Self-consistency to reduce false matches, especially for occlusions
  - Check the consistency of matches from L to R and from R to L
- Multiple resolution approach
  - From coarse to fine, for efficiency in searching for correspondences
- Local warping to account for perspective distortion
  - Warp from one view to the other for a small patch, given an initial estimate of the (planar) surface normal
- Multi-baseline stereo
  - Improves both correspondences and 3D estimation by using more than two cameras (images)
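
Two of these refinements are small enough to sketch directly. The first function fits a parabola through the scores at dx-1, dx and dx+1 to locate the peak with sub-pixel precision, which is one common way to "find the peak"; the second checks left-to-right against right-to-left matches on one scanline. Both function names, the parabola model and the one-pixel tolerance are assumptions made for illustration.

```python
def subpixel_peak(c_minus, c_zero, c_plus):
    """Offset of the parabola vertex through scores at dx-1, dx, dx+1."""
    denom = c_minus - 2.0 * c_zero + c_plus
    if denom == 0:
        return 0.0  # flat curve: no refinement possible
    return 0.5 * (c_minus - c_plus) / denom

def left_right_consistent(disp_lr, disp_rl, x, tolerance=1):
    """True if the match of left pixel x maps back to (about) x from the right image.

    disp_lr[x] is the disparity found for left pixel x (x_r = x + disp_lr[x]);
    disp_rl[x_r] is the disparity found for right pixel x_r (x_l = x_r + disp_rl[x_r]).
    """
    xr = x + disp_lr[x]
    if xr < 0 or xr >= len(disp_rl):
        return False
    return abs(disp_lr[x] + disp_rl[xr]) <= tolerance

# Scores sampled from -(t - 0.25)^2 at t = -1, 0, 1: the true peak is at +0.25
offset = subpixel_peak(-1.5625, -0.0625, -0.5625)

# One scanline of integer disparities in both directions (made-up values)
disp_lr = [0, 0, -2, -2, -2]
disp_rl = [2, 2, 2, 0, 0]
flags = [left_right_consistent(disp_lr, disp_rl, x) for x in range(5)]
```

Pixels that fail the check are typically discarded or marked as occluded rather than assigned a disparity.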

3D Reconstruction Problem

- What we have done
  - Correspondences using either correlation or feature-based approaches
  - Epipolar geometry from at least 8 point correspondences
- Three cases of 3D reconstruction, depending on the amount of a priori knowledge of the stereo system
  - Both intrinsic and extrinsic known -> can solve the reconstruction problem unambiguously by triangulation
  - Only intrinsic known -> recover structure and extrinsic parameters up to an unknown scaling factor
  - Only correspondences -> reconstruction only up to an unknown, global projective transformation (*)

Reconstruction by Triangulation

- Assumption and problem
  - Under the assumption that both intrinsic and extrinsic parameters are known
  - Compute the 3-D location of a point from its projections, pl and pr
- Solution
  - Triangulation: two rays are known and the intersection can be computed
  - Problem: the two rays will not actually intersect in space, due to errors in calibration and correspondences, and to pixelization
  - Solution: find the point in space with minimum distance from both rays

[Figure labels: P, pl, pr, Ol, Or.]
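
One way to realize "minimum distance from both rays" is to take the midpoint of the shortest segment joining them. The sketch below does this in the left camera frame, assuming the convention P_r = R (P_l - T) and image points given as ray directions in each camera frame; the function name and test values are hypothetical, and the linear system becomes singular when the two rays are parallel.

```python
import numpy as np

def triangulate_midpoint(p_l, p_r, R, T):
    """Midpoint of the shortest segment between the two viewing rays (left frame)."""
    d_l = np.asarray(p_l, dtype=float)          # left ray:  a * d_l
    d_r = R.T @ np.asarray(p_r, dtype=float)    # right ray: T + b * d_r
    w = np.cross(d_l, d_r)                      # direction orthogonal to both rays
    # Solve a*d_l - b*d_r + c*w = T; c*w is the segment joining the rays
    a, b, c = np.linalg.solve(np.column_stack([d_l, -d_r, w]), T)
    return a * d_l + 0.5 * c * w

# Hypothetical calibrated setup
angle = np.deg2rad(5.0)
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
T = np.array([0.5, 0.0, 0.1])

P_true = np.array([0.3, -0.2, 4.0])
P_r = R @ (P_true - T)
p_l = P_true / P_true[2]          # noise-free image points (f = 1)
p_r = P_r / P_r[2]

P_est = triangulate_midpoint(p_l, p_r, R, T)
```

With noise-free inputs the rays intersect and the estimate coincides with the true point; with noisy inputs the length of the segment c*w indicates how far the rays miss each other.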

Reconstruction up to a Scale Factor

- Assumption and problem statement
  - Under the assumption that only the intrinsic parameters and 8 or more point correspondences are given
  - Compute the 3-D location of each point from its projections, pl and pr, as well as the extrinsic parameters
- Solution
  - Compute the essential matrix E from at least 8 correspondences
  - Estimate T (up to a scale and a sign) from E (= RS) using the orthogonality constraint of R, and then estimate R
    - This ends up with four different estimates of the pair (T, R)
  - Reconstruct the depth of each point, and pick the correct sign of R and T
  - Results: reconstructed 3D points (up to a common scale)
  - The scale can be determined if the distance between two points (in space) is known
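
The four candidate pairs can also be enumerated with an SVD-based factorization of E. Note that this is a commonly used alternative, not necessarily the derivation via the orthogonality constraint that the slide refers to. The sketch assumes E = RS with P_r = R (P_l - T); the function name and the test geometry are hypothetical, and the selection of the physically valid pair (positive depths in both cameras) is left out.

```python
import numpy as np

def decompose_essential(E):
    """Four candidate (R, T) pairs, with |T| = 1, for E = R S known up to scale and sign."""
    U, _, Vt = np.linalg.svd(E)
    W = np.array([[0.0, -1.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0]])
    candidates = []
    for Wk in (W, W.T):
        R = U @ Wk @ Vt
        if np.linalg.det(R) < 0:   # E is only known up to sign; keep a proper rotation
            R = -R
        T = R.T @ U[:, 2]          # baseline direction in the left frame, sign unknown
        candidates.append((R, T))
        candidates.append((R, -T))
    return candidates

# Hypothetical ground truth used to check the candidates
angle = np.deg2rad(5.0)
R_true = np.array([[np.cos(angle), 0.0, np.sin(angle)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(angle), 0.0, np.cos(angle)]])
T_true = np.array([0.5, 0.0, 0.1])
S = np.array([[0.0, -T_true[2], T_true[1]],
              [T_true[2], 0.0, -T_true[0]],
              [-T_true[1], T_true[0], 0.0]])
E = 3.0 * (R_true @ S)             # arbitrary scale: E is only defined up to scale

candidates = decompose_essential(E)
T_unit = T_true / np.linalg.norm(T_true)
found = any(np.allclose(Rc, R_true, atol=1e-9) and np.allclose(Tc, T_unit, atol=1e-9)
            for Rc, Tc in candidates)
```

Only the direction of T is recovered, which is why the reconstruction is defined up to a common scale until a known distance in the scene fixes it.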

Reconstruction up to a Projective Transformation

- Assumption and problem statement
  - Under the assumption that only n (>= 8) point correspondences are given
  - Compute the 3-D location of each point from its projections, pl and pr
- Solution
  - Compute the fundamental matrix F from at least 8 correspondences, and the two epipoles
  - Determine the projection matrices
    - Select five points (from the correspondence pairs) as the projective basis
  - Compute the projective reconstruction
    - Unique up to the unknown projective transformation fixed by the choice of the five points

(* not required for this course; needs advanced knowledge of projective geometry)

Summary

- Fundamental concepts and problems of stereo
- Epipolar geometry and stereo rectification
- Estimation of the fundamental matrix from 8 point pairs
- Correspondence problem and two techniques: correlation and feature-based matching
- Reconstruct 3-D structure from image correspondences, given
  - Fully calibrated stereo cameras
  - Partially calibrated stereo cameras
  - Uncalibrated stereo cameras (*)

Next

- Motion: understanding 3D structure and events from motion
- Homework #4 online, due on November 29 before class