M. Harville 1, A. Rahimi 2, T. Darrell 2, G. Gordon 3, J. Woodfill 3 3D Head Pose Tracking with...
![Page 1: M. Harville 1, A. Rahimi 2, T. Darrell 2, G. Gordon 3, J. Woodfill 3 3D Head Pose Tracking with Linear Depth and Brightness Constraints 1: Hewlett-Packard.](https://reader035.fdocuments.in/reader035/viewer/2022062805/5697bfe41a28abf838cb58e1/html5/thumbnails/1.jpg)
M. Harville1, A. Rahimi2, T. Darrell2,
G. Gordon3, J. Woodfill3
3D Head Pose Tracking with Linear Depth and Brightness
Constraints
1: Hewlett-Packard Labs; 2: MIT AI Lab; 3: Tyzx Inc. Part of this work was done while all authors were employed by Interval Research.
![Page 2: M. Harville 1, A. Rahimi 2, T. Darrell 2, G. Gordon 3, J. Woodfill 3 3D Head Pose Tracking with Linear Depth and Brightness Constraints 1: Hewlett-Packard.](https://reader035.fdocuments.in/reader035/viewer/2022062805/5697bfe41a28abf838cb58e1/html5/thumbnails/2.jpg)
The Basic Problem to be Solved
We want to know the rotation (3 DOF) and translation (3 DOF) that a rigid object undergoes from one frame in a video to the next.
In this case, the inter-frame motion can be expressed as a rotation about a vertical axis, followed by a rightward translation.
[Figure: the object at time t and in the following frame]
![Page 3: M. Harville 1, A. Rahimi 2, T. Darrell 2, G. Gordon 3, J. Woodfill 3 3D Head Pose Tracking with Linear Depth and Brightness Constraints 1: Hewlett-Packard.](https://reader035.fdocuments.in/reader035/viewer/2022062805/5697bfe41a28abf838cb58e1/html5/thumbnails/3.jpg)
The Basic Problem to be Solved (cont.)
• Add up these incremental motions to get cumulative motion since start of video
• Motion estimation is equivalent to the tracking of object “pose”: position and orientation in some reference coordinate system.
• One way to visualize pose estimate: render axes in image as if they were rigidly affixed to object.
[Figure: the object at time t and in the following frame]
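A minimal sketch of how incremental motions could be added up into a cumulative pose. It assumes each inter-frame estimate is a rotation vector and a translation acting as P' = R P + T; the function names are illustrative and do not come from the talk.

```python
import numpy as np

def rotation_from_vector(omega):
    """Rotation matrix for rotation vector omega (Rodrigues' formula)."""
    omega = np.asarray(omega, dtype=float)
    theta = np.linalg.norm(omega)
    K = np.array([[0.0, -omega[2], omega[1]],
                  [omega[2], 0.0, -omega[0]],
                  [-omega[1], omega[0], 0.0]])
    if theta < 1e-12:
        return np.eye(3) + K          # first-order (small-angle) limit
    return (np.eye(3)
            + np.sin(theta) / theta * K
            + (1.0 - np.cos(theta)) / theta**2 * (K @ K))

def accumulate_pose(increments):
    """Compose per-frame (omega, T) increments into a cumulative (R, t)."""
    R_total, t_total = np.eye(3), np.zeros(3)
    for omega, T in increments:
        R = rotation_from_vector(omega)
        # Each new motion is applied after everything accumulated so far.
        R_total = R @ R_total
        t_total = R @ t_total + np.asarray(T, dtype=float)
    return R_total, t_total

# Nine 10-degree turns about the vertical (Y) axis, each with a small rightward shift.
steps = [((0.0, np.deg2rad(10.0), 0.0), (0.01, 0.0, 0.0))] * 9
R_cum, t_cum = accumulate_pose(steps)
```

Because every step carries some estimation error, a cumulative pose built this way drifts over time.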
![Page 4: M. Harville 1, A. Rahimi 2, T. Darrell 2, G. Gordon 3, J. Woodfill 3 3D Head Pose Tracking with Linear Depth and Brightness Constraints 1: Hewlett-Packard.](https://reader035.fdocuments.in/reader035/viewer/2022062805/5697bfe41a28abf838cb58e1/html5/thumbnails/4.jpg)
Applications - Lots!
• Perceptual user interface: understanding of head gaze, gestures
• Virtual reality: avatars; prosthetic input devices
• Camera ego-motion: robot or mobile vehicle self-localization; panoramic scene-reconstruction from video
• Augmented reality: make rendered object in a scene move with scene even as camera turns
• Object-tracking: pick-and-place assembly machines; surveillance; automobile collision avoidance
![Page 5: M. Harville 1, A. Rahimi 2, T. Darrell 2, G. Gordon 3, J. Woodfill 3 3D Head Pose Tracking with Linear Depth and Brightness Constraints 1: Hewlett-Packard.](https://reader035.fdocuments.in/reader035/viewer/2022062805/5697bfe41a28abf838cb58e1/html5/thumbnails/5.jpg)
Example: Head pose estimation
• Approximate head as a rigid body.
• Want to know which way head is turned, and where it is in space.
![Page 6: M. Harville 1, A. Rahimi 2, T. Darrell 2, G. Gordon 3, J. Woodfill 3 3D Head Pose Tracking with Linear Depth and Brightness Constraints 1: Hewlett-Packard.](https://reader035.fdocuments.in/reader035/viewer/2022062805/5697bfe41a28abf838cb58e1/html5/thumbnails/6.jpg)
The Inspiration
In most situations, all you have is color or grayscale video from a single camera, and most prior methods have focused on how to solve the problem under these conditions => very difficult!
Suppose you had a little more information: a registered, companion video of dense (per-pixel) depth.
Now what would be the best thing to do, and how good is it?
![Page 7: M. Harville 1, A. Rahimi 2, T. Darrell 2, G. Gordon 3, J. Woodfill 3 3D Head Pose Tracking with Linear Depth and Brightness Constraints 1: Hewlett-Packard.](https://reader035.fdocuments.in/reader035/viewer/2022062805/5697bfe41a28abf838cb58e1/html5/thumbnails/7.jpg)
Registered Intensity and Depth
![Page 8: M. Harville 1, A. Rahimi 2, T. Darrell 2, G. Gordon 3, J. Woodfill 3 3D Head Pose Tracking with Linear Depth and Brightness Constraints 1: Hewlett-Packard.](https://reader035.fdocuments.in/reader035/viewer/2022062805/5697bfe41a28abf838cb58e1/html5/thumbnails/8.jpg)
The Sales Pitch for Our Solution
Under the assumption that, in addition to intensity and/or color information, you have dense depth from some source (e.g. stereo, laser, structured light), here is a method that...
• Is designed for speed (single linear system of equations) => good for real-time applications
• Does not require approximation of shape model or prior knowledge of object shape
• Provides superior or comparable accuracy to other methods
![Page 9: M. Harville 1, A. Rahimi 2, T. Darrell 2, G. Gordon 3, J. Woodfill 3 3D Head Pose Tracking with Linear Depth and Brightness Constraints 1: Hewlett-Packard.](https://reader035.fdocuments.in/reader035/viewer/2022062805/5697bfe41a28abf838cb58e1/html5/thumbnails/9.jpg)
Prior Work: Feature-Based Methods
• Common approaches
  • General feature-tracking + Structure-from-Motion
  • Eye / Nose / Mouth tracking + Rigid Head model
  • State-of-the-art: Zelinsky et al. (Australia)
• Common problems
  • Features disappear
  • Rotation appears as Translation
  • Depth change must be inferred from scale change
  • Data are noisy: need to integrate information optimally over entire observation
![Page 10: M. Harville 1, A. Rahimi 2, T. Darrell 2, G. Gordon 3, J. Woodfill 3 3D Head Pose Tracking with Linear Depth and Brightness Constraints 1: Hewlett-Packard.](https://reader035.fdocuments.in/reader035/viewer/2022062805/5697bfe41a28abf838cb58e1/html5/thumbnails/10.jpg)
An Alternative: Direct Motion Estimation
• Use measurements based on change in image values rather than tracked features
  -> More robust -- doesn’t discard uncertainty information
• Express constraints directly on image values
• Pool information with least squares estimate over all pixels
  -> Not dependent on small set of key features
• Lots of prior work: Horn and Weldon ‘88, Bergen et al. ‘92, Black and Yacoob ‘95, Bregler and Malik ‘98, Stein and Shashua ‘98, ...
![Page 11: M. Harville 1, A. Rahimi 2, T. Darrell 2, G. Gordon 3, J. Woodfill 3 3D Head Pose Tracking with Linear Depth and Brightness Constraints 1: Hewlett-Packard.](https://reader035.fdocuments.in/reader035/viewer/2022062805/5697bfe41a28abf838cb58e1/html5/thumbnails/11.jpg)
Some Variable Definitions
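3D Coordinate System and Motion Parameters

$$
T = \begin{bmatrix} T_x \\ T_y \\ T_z \end{bmatrix}, \qquad
\Omega = \begin{bmatrix} \omega_x \\ \omega_y \\ \omega_z \end{bmatrix}
$$

Points in Space and Points in Image

$$
P = \begin{bmatrix} X \\ Y \\ Z \end{bmatrix}, \qquad
p = \begin{bmatrix} x \\ y \end{bmatrix}
$$

[Figure: coordinate axes X, Y, Z with origin O at the camera center of projection; a point P in space projects to image point p = (x, y)]

System Input: I(x,y) and Z(x,y) at times t, t+1
System Output: inter-frame motion T and Ω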
![Page 12: M. Harville 1, A. Rahimi 2, T. Darrell 2, G. Gordon 3, J. Woodfill 3 3D Head Pose Tracking with Linear Depth and Brightness Constraints 1: Hewlett-Packard.](https://reader035.fdocuments.in/reader035/viewer/2022062805/5697bfe41a28abf838cb58e1/html5/thumbnails/12.jpg)
Direct Motion Estimation Using BCCE
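• Brightness Change Constraint Equation (BCCE):

$$ I(x, y, t) = I(x + v_x,\; y + v_y,\; t + 1) $$

• First-order Taylor series expansion:

$$ \frac{dI}{dx}\, v_x + \frac{dI}{dy}\, v_y + \frac{dI}{dt} = 0 $$

• Matrix formulation:

$$ -\frac{dI}{dt} = \begin{bmatrix} \dfrac{dI}{dx} & \dfrac{dI}{dy} \end{bmatrix} \begin{bmatrix} v_x \\ v_y \end{bmatrix} $$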
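As a rough illustration of the linearized BCCE, the sketch below recovers a single 2D image velocity from two synthetic frames by pooling the constraint over all pixels. The test pattern, the shift, and the function name are assumptions made for this example only.

```python
import numpy as np

def estimate_flow_bcce(I0, I1):
    """Least-squares (v_x, v_y) from [dI/dx dI/dy] v = -dI/dt over all pixels."""
    Iy, Ix = np.gradient(I0)      # np.gradient returns the row (y) derivative first
    It = I1 - I0                  # temporal derivative for a unit time step
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    v, *_ = np.linalg.lstsq(A, -It.ravel(), rcond=None)
    return v

def pattern(x, y):
    """Smooth synthetic intensity pattern (illustrative only)."""
    return np.sin(0.1 * x) + np.cos(0.07 * y)

y, x = np.mgrid[0:64, 0:64].astype(float)
true_v = np.array([0.3, -0.2])    # sub-pixel motion, so the linearization holds

I0 = pattern(x, y)
# BCCE: I(x, y, t) = I(x + v_x, y + v_y, t + 1), so frame t+1 is the shifted pattern.
I1 = pattern(x - true_v[0], y - true_v[1])

v_est = estimate_flow_bcce(I0, I1)
print(v_est)
```

The estimate is only approximate, because the first-order expansion ignores curvature of the intensity surface; larger motions would need a coarse-to-fine scheme or iteration.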
![Page 13: M. Harville 1, A. Rahimi 2, T. Darrell 2, G. Gordon 3, J. Woodfill 3 3D Head Pose Tracking with Linear Depth and Brightness Constraints 1: Hewlett-Packard.](https://reader035.fdocuments.in/reader035/viewer/2022062805/5697bfe41a28abf838cb58e1/html5/thumbnails/13.jpg)
Direct Motion Estimation Using BCCE
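Relate 2D velocities to 3D velocities via a camera projection model:

Orthographic:

$$ x = X, \quad y = Y \qquad\Rightarrow\qquad v_x = V_x, \quad v_y = V_y $$

or

$$ \begin{bmatrix} v_x \\ v_y \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} V_x \\ V_y \\ V_z \end{bmatrix} $$

Perspective:

$$ x = \frac{fX}{Z}, \quad y = \frac{fY}{Z} \qquad\Rightarrow\qquad v_x = \frac{f V_x}{Z} - \frac{f X V_z}{Z^2}, \quad v_y = \frac{f V_y}{Z} - \frac{f Y V_z}{Z^2} $$

or

$$ \begin{bmatrix} v_x \\ v_y \end{bmatrix} = \begin{bmatrix} \dfrac{f}{Z} & 0 & -\dfrac{x}{Z} \\ 0 & \dfrac{f}{Z} & -\dfrac{y}{Z} \end{bmatrix} \begin{bmatrix} V_x \\ V_y \\ V_z \end{bmatrix} $$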
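The projection models above can be checked numerically. The sketch below implements both velocity mappings and compares the perspective one against a finite-difference derivative of the projection; the function names and numeric values are assumptions for illustration.

```python
import numpy as np

def project(P, f):
    """Perspective projection x = fX/Z, y = fY/Z."""
    X, Y, Z = P
    return np.array([f * X / Z, f * Y / Z])

def image_velocity_perspective(P, V, f):
    """2D velocity from 3D velocity: rows [f/Z, 0, -x/Z] and [0, f/Z, -y/Z]."""
    X, Y, Z = P
    x, y = f * X / Z, f * Y / Z
    J = np.array([[f, 0.0, -x],
                  [0.0, f, -y]]) / Z
    return J @ V

def image_velocity_orthographic(V):
    """Orthographic case: v_x = V_x, v_y = V_y."""
    return np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0]]) @ V

P = np.array([0.1, -0.2, 2.0])     # point in space (made-up values)
V = np.array([0.3, 0.1, -0.4])     # its 3D velocity
f = 500.0                          # focal length in pixels (assumed)

eps = 1e-6
finite_diff = (project(P + eps * V, f) - project(P, f)) / eps
analytic = image_velocity_perspective(P, V, f)
```

The two results should agree to within the finite-difference error, which confirms that the matrix is the Jacobian of the projection with respect to P.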
![Page 14: M. Harville 1, A. Rahimi 2, T. Darrell 2, G. Gordon 3, J. Woodfill 3 3D Head Pose Tracking with Linear Depth and Brightness Constraints 1: Hewlett-Packard.](https://reader035.fdocuments.in/reader035/viewer/2022062805/5697bfe41a28abf838cb58e1/html5/thumbnails/14.jpg)
Direct Motion Estimation Using BCCE
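Constrain 3D velocities to be consistent with rotation and translation of a single rigid body:

For small angle rotations,

$$
V = \begin{bmatrix} V_x \\ V_y \\ V_z \end{bmatrix} \approx T + \Omega \times P = T - \hat{P}\,\Omega
  = \begin{bmatrix} I & -\hat{P} \end{bmatrix} \begin{bmatrix} T \\ \Omega \end{bmatrix},
\qquad \text{where } \hat{P} = \begin{bmatrix} 0 & -Z & Y \\ Z & 0 & -X \\ -Y & X & 0 \end{bmatrix}
$$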
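A short numerical check of the rigid-motion constraint, under the assumption that P̂ is the cross-product (skew-symmetric) matrix of P, so that Ω × P = −P̂ Ω. The helper names are made up for this sketch.

```python
import numpy as np

def skew(P):
    """Cross-product matrix P_hat, so that skew(P) @ w == np.cross(P, w)."""
    X, Y, Z = P
    return np.array([[0.0, -Z, Y],
                     [Z, 0.0, -X],
                     [-Y, X, 0.0]])

def rigid_velocity_matrix(P):
    """3x6 matrix [I | -P_hat] mapping [T; Omega] to the 3D velocity of P."""
    return np.hstack([np.eye(3), -skew(P)])

P = np.array([0.3, -0.5, 2.0])          # made-up point
T = np.array([0.01, 0.02, -0.03])       # made-up translation
omega = np.array([0.02, -0.01, 0.03])   # made-up small rotation

V_matrix = rigid_velocity_matrix(P) @ np.concatenate([T, omega])
V_cross = T + np.cross(omega, P)
```

Writing the velocity as a matrix times the six-vector [T; Ω] is what makes the final per-pixel constraint linear in the unknowns.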
![Page 15: M. Harville 1, A. Rahimi 2, T. Darrell 2, G. Gordon 3, J. Woodfill 3 3D Head Pose Tracking with Linear Depth and Brightness Constraints 1: Hewlett-Packard.](https://reader035.fdocuments.in/reader035/viewer/2022062805/5697bfe41a28abf838cb58e1/html5/thumbnails/15.jpg)
Direct Motion Estimation Using BCCE
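Chain these relations together to get one constraint equation per pixel:

• Orthographic

$$
-\frac{dI}{dt} = \begin{bmatrix} \dfrac{dI}{dx} & \dfrac{dI}{dy} & 0 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 & 0 & Z & -Y \\ 0 & 1 & 0 & -Z & 0 & X \\ 0 & 0 & 1 & Y & -X & 0 \end{bmatrix}
\begin{bmatrix} T \\ \Omega \end{bmatrix}
$$

• Perspective

$$
-\frac{dI}{dt} = \frac{1}{Z} \begin{bmatrix} f\dfrac{dI}{dx} & f\dfrac{dI}{dy} & -\left(x\dfrac{dI}{dx} + y\dfrac{dI}{dy}\right) \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 & 0 & Z & -Y \\ 0 & 1 & 0 & -Z & 0 & X \\ 0 & 0 & 1 & Y & -X & 0 \end{bmatrix}
\begin{bmatrix} T \\ \Omega \end{bmatrix}
$$

Combine across pixels into one linear system and solve for [T, Ω] via QR or SVD.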
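The following sketch stacks one perspective BCCE row per pixel and solves for [T; Ω] with an SVD-based least-squares routine. It is not the authors' implementation: the per-pixel gradients and depths are random synthetic values, and the temporal derivative is simulated from the flow model so that the recovered motion can be compared with a known one.

```python
import numpy as np

def bcce_system(Ix, Iy, It, Z, x, y, f):
    """Stack one perspective BCCE row per pixel; all inputs are 1-D arrays."""
    # g = (1/Z) [f Ix, f Iy, -(x Ix + y Iy)], the brightness-gradient row vector.
    g = np.stack([f * Ix, f * Iy, -(x * Ix + y * Iy)], axis=1) / Z[:, None]
    # Back-project each pixel to P = (X, Y, Z) using x = fX/Z, y = fY/Z.
    P = np.stack([x * Z / f, y * Z / f, Z], axis=1)
    # g [I | -P_hat] = [g, P x g], because g . (Omega x P) = Omega . (P x g).
    H = np.hstack([g, np.cross(P, g)])
    return H, -It

# Synthetic data (all values are made up for illustration).
rng = np.random.default_rng(0)
n, f = 500, 500.0
x, y = rng.uniform(-100, 100, n), rng.uniform(-100, 100, n)
Z = rng.uniform(0.8, 1.2, n)
Ix, Iy = rng.normal(size=n), rng.normal(size=n)
T_true = np.array([0.01, -0.02, 0.03])
omega_true = np.array([0.02, -0.01, 0.015])

# Simulate dI/dt from the linearized BCCE and the perspective flow model.
P = np.stack([x * Z / f, y * Z / f, Z], axis=1)
V = T_true + np.cross(omega_true, P)
vx = (f * V[:, 0] - x * V[:, 2]) / Z
vy = (f * V[:, 1] - y * V[:, 2]) / Z
It = -(Ix * vx + Iy * vy)

H, b = bcce_system(Ix, Iy, It, Z, x, y, f)
phi, *_ = np.linalg.lstsq(H, b, rcond=None)   # SVD-based least squares
T_est, omega_est = phi[:3], phi[3:]
```

With noise-free synthetic data the six parameters are recovered essentially exactly; with real images the gradients are noisy and the solution is only a least-squares fit.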
![Page 16: M. Harville 1, A. Rahimi 2, T. Darrell 2, G. Gordon 3, J. Woodfill 3 3D Head Pose Tracking with Linear Depth and Brightness Constraints 1: Hewlett-Packard.](https://reader035.fdocuments.in/reader035/viewer/2022062805/5697bfe41a28abf838cb58e1/html5/thumbnails/16.jpg)
Direct Motion Estimation Using BCCE
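$$
-\frac{dI}{dt} = \frac{1}{Z} \begin{bmatrix} f\dfrac{dI}{dx} & f\dfrac{dI}{dy} & -\left(x\dfrac{dI}{dx} + y\dfrac{dI}{dy}\right) \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 & 0 & Z & -Y \\ 0 & 1 & 0 & -Z & 0 & X \\ 0 & 0 & 1 & Y & -X & 0 \end{bmatrix}
\begin{bmatrix} T \\ \Omega \end{bmatrix}
$$

• Z unknown!
• Past solutions:
  • Assume approximate shape: planar (Black and Yacoob), ellipsoidal (Basu and Pentland; Bregler and Malik), polygonal (Essa et al.), hyperquadrics, etc.
  • Laser-scanned 3D model of object to be tracked
  • Estimate depth and motion successively via linear or non-linear methods, or together with non-linear optimization => “open loop” issues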
![Page 17: M. Harville 1, A. Rahimi 2, T. Darrell 2, G. Gordon 3, J. Woodfill 3 3D Head Pose Tracking with Linear Depth and Brightness Constraints 1: Hewlett-Packard.](https://reader035.fdocuments.in/reader035/viewer/2022062805/5697bfe41a28abf838cb58e1/html5/thumbnails/17.jpg)
“Direct Depth”: two new ideas
1. Use (independently measured) Z directly in BCCE
  • Believe it or not, this appears to be novel.
  • Frees us from shape model that is either approximate (e.g. planar, ellipsoidal, etc.) or which is known a priori.
  • Shape model can change (slowly) over time: allows for 360 degree rotations, better handles non-rigidity.
  • Related to Direct Motion Stereo of [Shieh et al.] and [Stein and Shashua], but their methods assume infinitesimal camera baselines and require coarse-to-fine solution if disparities >1 pixel are generated. Also, they compute motion before depth; we use depth directly.
![Page 18: M. Harville 1, A. Rahimi 2, T. Darrell 2, G. Gordon 3, J. Woodfill 3 3D Head Pose Tracking with Linear Depth and Brightness Constraints 1: Hewlett-Packard.](https://reader035.fdocuments.in/reader035/viewer/2022062805/5697bfe41a28abf838cb58e1/html5/thumbnails/18.jpg)
“Direct Depth”: two new ideas
2. Express a direct constraint on the depth gradient.
  • It operates on depth image very similarly to how the classic Brightness Change Constraint Equation (BCCE) applies to the intensity image.
  • We call this the “Depth Change Constraint Equation”, or “DCCE”.

BCCE:

$$ I(x, y, t) = I(x + v_x,\; y + v_y,\; t + 1) $$

DCCE:

$$ Z(x, y, t) = Z(x + v_x,\; y + v_y,\; t + 1) - V_z $$

First-order expansion of the DCCE:

$$ \frac{dZ}{dx}\, v_x + \frac{dZ}{dy}\, v_y + \frac{dZ}{dt} - V_z = 0 $$
![Page 19: M. Harville 1, A. Rahimi 2, T. Darrell 2, G. Gordon 3, J. Woodfill 3 3D Head Pose Tracking with Linear Depth and Brightness Constraints 1: Hewlett-Packard.](https://reader035.fdocuments.in/reader035/viewer/2022062805/5697bfe41a28abf838cb58e1/html5/thumbnails/19.jpg)
The DCCE
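Add in perspective projection and constrain to a single rigid motion:

$$
-\frac{dZ}{dt} = \frac{1}{Z} \begin{bmatrix} f\dfrac{dZ}{dx} & f\dfrac{dZ}{dy} & -\left(Z + x\dfrac{dZ}{dx} + y\dfrac{dZ}{dy}\right) \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 & 0 & Z & -Y \\ 0 & 1 & 0 & -Z & 0 & X \\ 0 & 0 & 1 & Y & -X & 0 \end{bmatrix}
\begin{bmatrix} T \\ \Omega \end{bmatrix}
$$

Very similar to our result for BCCE:

$$
-\frac{dI}{dt} = \frac{1}{Z} \begin{bmatrix} f\dfrac{dI}{dx} & f\dfrac{dI}{dy} & -\left(x\dfrac{dI}{dx} + y\dfrac{dI}{dy}\right) \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 & 0 & Z & -Y \\ 0 & 1 & 0 & -Z & 0 & X \\ 0 & 0 & 1 & Y & -X & 0 \end{bmatrix}
\begin{bmatrix} T \\ \Omega \end{bmatrix}
$$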
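The step from the first-order DCCE to its matrix form can be spelled out as follows. This is a reconstruction from the definitions on the earlier slides (perspective flow and the rigid-motion velocity V), not a derivation quoted from the talk.

```latex
\begin{align*}
0 &= \frac{dZ}{dx}\,v_x + \frac{dZ}{dy}\,v_y + \frac{dZ}{dt} - V_z
  && \text{(first-order DCCE)} \\
-\frac{dZ}{dt} &= \frac{dZ}{dx}\,\frac{f V_x - x V_z}{Z}
  + \frac{dZ}{dy}\,\frac{f V_y - y V_z}{Z} - V_z
  && \text{(perspective flow for } v_x, v_y\text{)} \\
&= \frac{1}{Z}
  \begin{bmatrix} f\dfrac{dZ}{dx} & f\dfrac{dZ}{dy} &
    -\left(Z + x\dfrac{dZ}{dx} + y\dfrac{dZ}{dy}\right) \end{bmatrix}
  \begin{bmatrix} V_x \\ V_y \\ V_z \end{bmatrix}
\end{align*}
```

Substituting V = [I | −P̂][T; Ω] then gives the per-pixel DCCE row. The only structural difference from the BCCE row is the extra Z in the third entry, which comes from the −V_z term.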
![Page 20: M. Harville 1, A. Rahimi 2, T. Darrell 2, G. Gordon 3, J. Woodfill 3 3D Head Pose Tracking with Linear Depth and Brightness Constraints 1: Hewlett-Packard.](https://reader035.fdocuments.in/reader035/viewer/2022062805/5697bfe41a28abf838cb58e1/html5/thumbnails/20.jpg)
DCCE vs. BCCE
• Advantages of DCCE over BCCE
  • Depth information is more robust to lighting changes in space and time.
  • The BCCE is an assumption that is true only for perfectly uniform illumination and Lambertian surfaces, whereas the DCCE is just a linearization of a generic description of motion in 3D.
• But…real-time depth data tends to be very noisy and full of holes!
  • Smoothing seems to help.
![Page 21: M. Harville 1, A. Rahimi 2, T. Darrell 2, G. Gordon 3, J. Woodfill 3 3D Head Pose Tracking with Linear Depth and Brightness Constraints 1: Hewlett-Packard.](https://reader035.fdocuments.in/reader035/viewer/2022062805/5697bfe41a28abf838cb58e1/html5/thumbnails/21.jpg)
Joint Constraint on Rigid Motion
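Our proposal: combine the BCCE and DCCE constraint equations into a single linear system:

$$
-\begin{bmatrix} \dfrac{dI}{dt} \\ \dfrac{dZ}{dt} \end{bmatrix}
= \frac{1}{Z} \begin{bmatrix}
f\dfrac{dI}{dx} & f\dfrac{dI}{dy} & -\left(x\dfrac{dI}{dx} + y\dfrac{dI}{dy}\right) \\
f\dfrac{dZ}{dx} & f\dfrac{dZ}{dy} & -\left(Z + x\dfrac{dZ}{dx} + y\dfrac{dZ}{dy}\right)
\end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 & 0 & Z & -Y \\ 0 & 1 & 0 & -Z & 0 & X \\ 0 & 0 & 1 & Y & -X & 0 \end{bmatrix}
\begin{bmatrix} T \\ \Omega \end{bmatrix}
$$

Stacking these rows over all pixels:

$$
b = H \begin{bmatrix} T \\ \Omega \end{bmatrix}, \qquad
\begin{bmatrix} T \\ \Omega \end{bmatrix} = (H^T H)^{-1} H^T b
$$

Least squares problem, solve for six-parameter vector via QR or SVD.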
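A hedged sketch of the joint system, solved here with a QR factorization. Brightness and depth rows are weighted equally, which is a simplification chosen for this example; all data are synthetic and the function names are hypothetical.

```python
import numpy as np

def constraint_rows(gx, gy, extra, Z, x, y, f):
    """Rows (1/Z) [f gx, f gy, -(extra + x gx + y gy)] [I | -P_hat], one per pixel."""
    g = np.stack([f * gx, f * gy, -(extra + x * gx + y * gy)], axis=1) / Z[:, None]
    P = np.stack([x * Z / f, y * Z / f, Z], axis=1)
    return np.hstack([g, np.cross(P, g)])     # g [I | -P_hat] = [g, P x g]

def solve_joint(Ix, Iy, It, Zx, Zy, Zt, Z, x, y, f):
    """Stack BCCE rows (extra = 0) and DCCE rows (extra = Z), then solve by QR."""
    H = np.vstack([constraint_rows(Ix, Iy, 0.0, Z, x, y, f),
                   constraint_rows(Zx, Zy, Z, Z, x, y, f)])
    b = -np.concatenate([It, Zt])
    Q, R = np.linalg.qr(H)                    # reduced QR: Q is (2n x 6), R is (6 x 6)
    return np.linalg.solve(R, Q.T @ b)

# Synthetic data (made-up values) with a known motion.
rng = np.random.default_rng(2)
n, f = 400, 500.0
x, y = rng.uniform(-100, 100, n), rng.uniform(-100, 100, n)
Z = rng.uniform(0.8, 1.2, n)
Ix, Iy, Zx, Zy = rng.normal(size=(4, n))
phi_true = np.array([0.01, -0.02, 0.03, 0.02, -0.01, 0.015])   # [T; Omega]

P = np.stack([x * Z / f, y * Z / f, Z], axis=1)
V = phi_true[:3] + np.cross(phi_true[3:], P)
vx = (f * V[:, 0] - x * V[:, 2]) / Z
vy = (f * V[:, 1] - y * V[:, 2]) / Z
It = -(Ix * vx + Iy * vy)                 # linearized BCCE
Zt = V[:, 2] - (Zx * vx + Zy * vy)        # linearized DCCE

phi_est = solve_joint(Ix, Iy, It, Zx, Zy, Zt, Z, x, y, f)
```

QR avoids forming H^T H explicitly, which would square the condition number; an SVD-based solver such as `np.linalg.lstsq` is an alternative when H may be rank deficient.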
![Page 22: M. Harville 1, A. Rahimi 2, T. Darrell 2, G. Gordon 3, J. Woodfill 3 3D Head Pose Tracking with Linear Depth and Brightness Constraints 1: Hewlett-Packard.](https://reader035.fdocuments.in/reader035/viewer/2022062805/5697bfe41a28abf838cb58e1/html5/thumbnails/22.jpg)
Some Important Practical Details
• Support maps
  • Only use constraint equations where depth and all depth derivatives are valid.
  • Ignore locations of very high depth gradient (due to self-occlusion/disocclusion)
• Coordinate shift
  • If center of coordinate system is far from object, it is easy to confuse translation with rotation about a distant axis, and vice versa -> numerical instability.
  • Solution: At each time step, find object centroid, compute motion in coordinate system centered there, then transform motion parameters back to world coordinate system.
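The two practical details above might look roughly like this in code. The sketch assumes that holes in the depth image are marked with NaN and that a fixed gradient threshold is used; both choices, and the function names, are assumptions for illustration.

```python
import numpy as np

def support_map(Z, max_grad):
    """True where depth and its derivatives are finite and the gradient is moderate."""
    Zy, Zx = np.gradient(Z)            # NaN holes also invalidate neighbouring derivatives
    grad_mag = np.hypot(Zx, Zy)
    with np.errstate(invalid="ignore"):
        return np.isfinite(Z) & np.isfinite(grad_mag) & (grad_mag < max_grad)

def translation_in_world(T_centered, omega, centroid):
    """Convert a translation estimated about the centroid C back to world coordinates.

    V = T' + omega x (P - C) = (T' - omega x C) + omega x P, so T = T' - omega x C.
    The rotation omega is the same in both coordinate systems.
    """
    return T_centered - np.cross(omega, centroid)

# Tiny depth image: a flat surface with one hole and a depth step at the right edge.
Z = np.ones((6, 6))
Z[2, 2] = np.nan
Z[:, 5] = 3.0
valid = support_map(Z, max_grad=0.5)
```

Only pixels where `valid` is true would contribute rows to the linear system, and the centroid would be computed from the 3D points of those same pixels.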
![Page 23: M. Harville 1, A. Rahimi 2, T. Darrell 2, G. Gordon 3, J. Woodfill 3 3D Head Pose Tracking with Linear Depth and Brightness Constraints 1: Hewlett-Packard.](https://reader035.fdocuments.in/reader035/viewer/2022062805/5697bfe41a28abf838cb58e1/html5/thumbnails/23.jpg)
Experiments
• Synthetic and real sequences of moving heads
  • Synthetic sequences provide us with ground truth for quantitative analysis
  • Real sequences show it’s not just theory.
• Hard cases: translation in Z, rotation out-of-plane
• Compare four motion estimation methods
  • BCCE only with planar depth -> representative of standard methods
  • BCCE only with measured depth
  • DCCE only
  • BCCE + DCCE
![Page 24: M. Harville 1, A. Rahimi 2, T. Darrell 2, G. Gordon 3, J. Woodfill 3 3D Head Pose Tracking with Linear Depth and Brightness Constraints 1: Hewlett-Packard.](https://reader035.fdocuments.in/reader035/viewer/2022062805/5697bfe41a28abf838cb58e1/html5/thumbnails/24.jpg)
Synthetic Image Sequences
Generated color and depth image sequences by rendering a laser-scanned model of a human face with a standard graphics package.
[Figure panels: Rotation sequence | Z-translation sequence]
![Page 25: M. Harville 1, A. Rahimi 2, T. Darrell 2, G. Gordon 3, J. Woodfill 3 3D Head Pose Tracking with Linear Depth and Brightness Constraints 1: Hewlett-Packard.](https://reader035.fdocuments.in/reader035/viewer/2022062805/5697bfe41a28abf838cb58e1/html5/thumbnails/25.jpg)
Synthetic Results - Rotation Sequence
![Page 26: M. Harville 1, A. Rahimi 2, T. Darrell 2, G. Gordon 3, J. Woodfill 3 3D Head Pose Tracking with Linear Depth and Brightness Constraints 1: Hewlett-Packard.](https://reader035.fdocuments.in/reader035/viewer/2022062805/5697bfe41a28abf838cb58e1/html5/thumbnails/26.jpg)
Synthetic Results - Z-Trans Sequence
![Page 28: M. Harville 1, A. Rahimi 2, T. Darrell 2, G. Gordon 3, J. Woodfill 3 3D Head Pose Tracking with Linear Depth and Brightness Constraints 1: Hewlett-Packard.](https://reader035.fdocuments.in/reader035/viewer/2022062805/5697bfe41a28abf838cb58e1/html5/thumbnails/28.jpg)
Real Results: Still-Frame Comparison
[Figure: select frames (68, 111, 162) from BCCE + planar depth compared with the same frames from BCCE + DCCE]
![Page 29: M. Harville 1, A. Rahimi 2, T. Darrell 2, G. Gordon 3, J. Woodfill 3 3D Head Pose Tracking with Linear Depth and Brightness Constraints 1: Hewlett-Packard.](https://reader035.fdocuments.in/reader035/viewer/2022062805/5697bfe41a28abf838cb58e1/html5/thumbnails/29.jpg)
Real Results: Still-Frame Comparison
[Figure: select frames (211, 293) from BCCE + planar depth compared with the same frames from BCCE + DCCE]
![Page 30: M. Harville 1, A. Rahimi 2, T. Darrell 2, G. Gordon 3, J. Woodfill 3 3D Head Pose Tracking with Linear Depth and Brightness Constraints 1: Hewlett-Packard.](https://reader035.fdocuments.in/reader035/viewer/2022062805/5697bfe41a28abf838cb58e1/html5/thumbnails/30.jpg)
Real Results: BCCE with Planar Depth
![Page 31: M. Harville 1, A. Rahimi 2, T. Darrell 2, G. Gordon 3, J. Woodfill 3 3D Head Pose Tracking with Linear Depth and Brightness Constraints 1: Hewlett-Packard.](https://reader035.fdocuments.in/reader035/viewer/2022062805/5697bfe41a28abf838cb58e1/html5/thumbnails/31.jpg)
Real Results: BCCE + DCCE
![Page 32: M. Harville 1, A. Rahimi 2, T. Darrell 2, G. Gordon 3, J. Woodfill 3 3D Head Pose Tracking with Linear Depth and Brightness Constraints 1: Hewlett-Packard.](https://reader035.fdocuments.in/reader035/viewer/2022062805/5697bfe41a28abf838cb58e1/html5/thumbnails/32.jpg)
Extensions and Future Work
• Complement it with a slower, non-differential approach that helps detect and remove gross errors
• Real-time implementation!
• Experiment with some mathematical tweaks:
  • Constrained or weighted least squares
  • Use a second iteration per frame
  • Add coarse-to-fine to handle large motions, if needed
• More ambitious tests: 360 degree rotation, slow non-rigidity, etc. => things few or no other methods can do
![Page 33: M. Harville 1, A. Rahimi 2, T. Darrell 2, G. Gordon 3, J. Woodfill 3 3D Head Pose Tracking with Linear Depth and Brightness Constraints 1: Hewlett-Packard.](https://reader035.fdocuments.in/reader035/viewer/2022062805/5697bfe41a28abf838cb58e1/html5/thumbnails/33.jpg)
Extensions & Future Work
• Apply direct depth and brightness constraint without rigid model: 3-D direct optic flow.
• Ego-motion: use joint depth and brightness constraint to recover camera motion.
• Articulated bodies: extend to use exponential twist formalism, a la Bregler and Malik.
M. Covell, A. Rahimi, M. Harville, T. Darrell. “Articulated-pose estimation using brightness- and depth-constancy constraints.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Hilton Head S.C., June 2000.
![Page 34: M. Harville 1, A. Rahimi 2, T. Darrell 2, G. Gordon 3, J. Woodfill 3 3D Head Pose Tracking with Linear Depth and Brightness Constraints 1: Hewlett-Packard.](https://reader035.fdocuments.in/reader035/viewer/2022062805/5697bfe41a28abf838cb58e1/html5/thumbnails/34.jpg)
The End