

Copyright is held by the author / owner(s). SIGGRAPH 2008, Los Angeles, California, August 11–15, 2008. ISBN 978-1-60558-466-9/08/0008

A High-Resolution Geometry Capture System for Facial Performance

Wan-Chun Ma Andrew Jones Tim Hawkins Jen-Yuan Chiang Paul Debevec

University of Southern California, Institute for Creative Technologies

Figure 1: Top: images taken under spherical gradient illumination (X-gradient, Y-gradient, Z-gradient, full-on). Middle: a subset of the structured light pattern images at three different frequencies (Stripe #1, #2, #3), plus a full white pattern; the other three stripe patterns have the same frequencies but with a 90-degree phase shift. Bottom: results for each scan: albedo texture, world-space normal map, displacement map, and a high-resolution mesh.

Introduction We developed a high-resolution, real-time facial performance capture system based on a spherical gradient photometric stereo technique [Ma et al. 2007] and multi-view stereo. We use four spherical gradient illumination patterns to estimate normal maps of subjects. A structured-light-assisted two-view stereo system acquires the 3D positions of the subject. The captured stereo geometry is then enhanced using the gradient normals, which allows details such as dynamic wrinkles and fine-scale stretching and compression of skin pores to be captured in real time.

Capture System Our real-time 3D capture system uses a combination of structured light and photometric stereo to obtain high-resolution face scans. The grayscale structured light patterns are output by a high-speed MULE (Multi Use Light Engine) DLP video projector from Fakespace Lab, running at 288 frames per second. To capture fine details of human skin, we use a stereo pair of Vision Research Phantom high-speed digital cameras, synchronized to the projector, together with a spherical gradient illumination device similar to that of [Ma et al. 2007], to capture 24 full measurements per second, each comprising 12 images.
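
The frame budget follows directly from the figures above: 288 projector-locked frames per second divided into 12-image measurements yields the stated 24 scans per second. A quick sanity check (the per-frame budget figure is our own back-of-the-envelope estimate, not stated in the text):

```python
# Timing budget for the capture system described above.
projector_fps = 288    # MULE projector / camera frame rate
images_per_scan = 12   # 6 sinusoidal + 1 full-on + 4 gradients + 1 tracking

scans_per_second = projector_fps / images_per_scan
frame_budget_ms = 1000.0 / projector_fps  # upper bound on exposure + readout

print(scans_per_second)            # 24.0 full measurements per second
print(round(frame_budget_ms, 2))   # ~3.47 ms per pattern
```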

We use six sinusoidal structured light patterns at varying scales and a full-on projector pattern. After each structured light sequence we generate four gradient illumination patterns and an additional diffuse tracking pattern with a spherical lighting apparatus. Because our gradient illumination patterns are captured at different points in time, we correct for subject motion as in [Wenger et al. 2005] using an optical flow algorithm. We compute this flow between the first gradient pattern and the tracking pattern, and then use this flow to warp the four gradient-lit images to the same point in time. This allows for accurate calculation of surface normals using ratios of the gradient-lit images.

Figure 2: The acquired high-resolution geometry (top) and shadings using the normal map and albedo texture with a simple Phong BRDF shader, from three different viewpoints.
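
The ratio computation can be sketched per pixel. Following [Ma et al. 2007], under a Lambertian assumption the ratio of each gradient-lit image to the full-on image encodes one normal component as (n + 1)/2. A minimal NumPy sketch (function and variable names are ours; the actual system also handles the specular component, which is omitted here):

```python
import numpy as np

def normals_from_gradients(ix, iy, iz, ifull, eps=1e-6):
    """Estimate world-space normals from spherical gradient images.

    ix, iy, iz : images under X/Y/Z gradient illumination (H x W)
    ifull      : image under full-on (constant) illumination (H x W)
    Assumes each gradient pattern ramps from 0 to 1 across the sphere,
    so each ratio encodes (n + 1) / 2 for the corresponding axis.
    """
    n = np.stack([2.0 * ix / (ifull + eps) - 1.0,
                  2.0 * iy / (ifull + eps) - 1.0,
                  2.0 * iz / (ifull + eps) - 1.0], axis=-1)
    # Renormalize to unit length to absorb noise and calibration error.
    n /= np.linalg.norm(n, axis=-1, keepdims=True) + eps
    return n

# Example: a pixel with ix = iy = 0.5 * ifull and iz = ifull
# yields a normal pointing straight along +Z.
```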

Geometry Processing A dynamic programming algorithm finds camera-to-camera correspondences from the ratios of the sinusoidal structured light patterns to the full-on pattern. The stereo geometry is created by triangulating camera rays based on these correspondences. We compute photometric surface normals from the spherical gradient patterns, and then use them to correct the stereo geometry by minimizing the difference between the geometry normals and the corrected photometric normals. This step not only adds fine-scale detail to the stereo geometry but also corrects low-frequency geometry noise that is difficult to remove with mesh smoothing algorithms.
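
The normal-based correction can be illustrated on a 1D heightfield: iteratively nudge the stereo depth so its slopes agree with the slopes implied by the photometric normals, while a weak data term keeps the result anchored to the stereo measurement. This is a simplified sketch of the idea, not the authors' actual mesh solver:

```python
import numpy as np

def refine_depth_1d(z_stereo, target_slope, data_weight=0.01,
                    iters=500, step=0.1):
    """Gradient-descent sketch of normal-driven depth refinement (1D).

    z_stereo     : noisy/over-smoothed depth profile from stereo (length N)
    target_slope : dz/dx implied by the photometric normals (length N-1)
    Minimizes  sum_i (z[i+1] - z[i] - target_slope[i])^2
             + data_weight * sum_i (z[i] - z_stereo[i])^2
    """
    z = z_stereo.copy()
    for _ in range(iters):
        r = np.diff(z) - target_slope     # slope residual
        grad = np.zeros_like(z)
        grad[:-1] -= r                    # d/dz[i]   of the slope term
        grad[1:] += r                     # d/dz[i+1] of the slope term
        grad += data_weight * (z - z_stereo)
        z -= step * grad
    return z

# A fine wrinkle present in the normals but lost in the smooth stereo
# depth is re-introduced by the refinement.
x = np.linspace(0, 1, 64)
z_true = 0.05 * np.exp(-((x - 0.5) / 0.05) ** 2)  # a fine wrinkle
z_stereo = np.zeros_like(x)                       # over-smoothed stereo depth
target_slope = np.diff(z_true)                    # from photometric normals
z_refined = refine_depth_1d(z_stereo, target_slope)
```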

Results The two cameras capture data at a resolution of 2400 × 1800 (Bayer pattern). With 12 GB of internal RAM storage, the maximum recording time is around 5 seconds. Each scan yields a high-resolution mesh that usually consists of 1M triangles, a smoothed medium-resolution mesh, a color texture, a world-space normal map, and a displacement map that represents the difference between the high-resolution mesh and the smoothed mesh.
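
The ~5-second figure is consistent with a simple bandwidth estimate, if we assume roughly 2 bytes per Bayer pixel (a 16-bit raw container; the actual bit depth is not stated in the text):

```python
# Back-of-the-envelope recording-time estimate for one camera.
width, height = 2400, 1800   # sensor resolution (Bayer mosaic)
fps = 288                    # frame rate, locked to the projector
bytes_per_pixel = 2          # assumption: 10-14 bit raw stored in 16 bits
ram_bytes = 12 * 1024**3     # 12 GB of in-camera RAM

bytes_per_second = width * height * bytes_per_pixel * fps
seconds = ram_bytes / bytes_per_second
print(round(seconds, 1))  # ~5.2 s, matching the ~5 s reported above
```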

References

MA, W.-C., HAWKINS, T., PEERS, P., CHABERT, C.-F., WEISS, M., AND DEBEVEC, P. 2007. Rapid acquisition of specular and diffuse normal maps from polarized spherical gradient illumination. In EGSR 2007.

WENGER, A., GARDNER, A., TCHOU, C., UNGER, J., HAWKINS, T., AND DEBEVEC, P. 2005. Performance relighting and reflectance transformation with time-multiplexed illumination. ACM Trans. Graph. 24, 3, 756–764.