
High-resolution, Real-time 3D Shape Acquisition

Song Zhang and Peisen Huang
Department of Mechanical Engineering, State University of New York at Stony Brook

Stony Brook, New York, 11794-2300, USA
Email: {song.zhang, peisen.huang}@sunysb.edu

Abstract

In this paper we describe a high-resolution, real-time 3D shape acquisition system based on structured light techniques. This system uses a color pattern whose RGB channels are coded with either sinusoidal or trapezoidal fringe patterns. When projected by a modified DLP projector (color filters removed), this color pattern results in three grayscale patterns projected sequentially at a frequency of 240 Hz. A high-speed B/W CCD camera synchronized with the projector captures the three images, from which the 3D shape of the object is reconstructed. A color CCD camera is also used to capture images for texture mapping. The maximum 3D shape acquisition speed is 120 Hz (532 × 500 pixels), which is high enough for capturing the 3D shapes of moving objects. Two coding methods, the sinusoidal phase-shifting method and the trapezoidal phase-shifting method, were tested and results with good accuracy were obtained. The trapezoidal phase-shifting algorithm also makes real-time 3D reconstruction possible.

1. Introduction

3D shape acquisition has applications in such diverse areas as computer graphics, virtual reality, medical diagnostics, robotic vision, industrial inspection, reverse engineering, etc. [1, 2, 3, 4, 5, 6]. With the recent technological advances in digital imaging, digital projection display, and personal computers (PCs), it is becoming increasingly possible for 3D shape acquisition to be done in real time.

Among all the existing ranging techniques [7], stereovision is probably the most studied method. Traditional stereovision methods estimate shape by establishing spatial correspondence of pixels in a pair of stereo images. Recently, Zhang et al. [8] and Davis et al. [9] developed a new concept called spacetime stereo, which extends the matching of stereo images into the time domain. By using both spatial and temporal appearance variations, it was shown that matching ambiguity could be reduced and

Figure 1. 3D models of a human face acquired by our real-time 3D shape acquisition system. The acquisition frame rate was 120 Hz and the resolution was 532 × 500 pixels. Two separate frames selected from a sequence of over 60 frames are displayed here. Top row: models with color texture mapping. Middle row: models in a shaded mode. Bottom row: zoom-in views of the models.



Figure 2. Schematic diagram of our real-time 3D shape acquisition system. A color fringe pattern is generated by a PC and is projected onto the object by a DLP video projector (Kodak DP900). A high-speed B/W CCD camera (Dalsa CA-D6-0512W) synchronized with the projector is used to capture the images of each color channel. Then image processing algorithms are used to reconstruct the 3D shape of the object. A color CCD camera (Uniq Vision UC-930), also synchronized with the projector and aligned with the B/W camera, is used to capture color images of the object for texture mapping.

accuracy could be increased. As an application, Zhang et al. demonstrated the feasibility of using spacetime stereo to reconstruct the shapes of dynamically changing objects [8]. The shortcoming of spacetime stereo or any other stereo vision method is that matching of stereo images is usually time-consuming. Therefore, it is difficult to reconstruct 3D shapes from stereo images in real time.

Another major group of ranging techniques is structured light, which includes various coding methods and employs a varying number of coded patterns. Unlike stereo vision methods, structured light methods usually use processing algorithms that are much simpler. Therefore, they are more likely to achieve real-time performance (acquisition and reconstruction). For real-time shape acquisition, there are basically two approaches. The first approach is to use a single pattern, typically a color pattern. Harding proposed a color-encoded Moire technique for high-speed 3D surface contour retrieval [10]. Geng developed a rainbow 3D camera for high-speed 3D vision [11]. Wust and Capson [12] proposed a color fringe projection method for surface topography measurement with the color fringe pattern printed on a color transparency film. Huang et al. [13] implemented a similar concept but with the color fringe pattern produced digitally by a digital-light-processing (DLP) projector. Zhang et al. developed a color structured light

technique for high-speed scans of moving objects [14]. Since the above methods all use color to code the patterns, the shape acquisition result is affected to varying degrees by the variations of the object surface color. In general, the more patterns used in a structured light system, the better the accuracy that can be achieved. Therefore, the above methods sacrifice accuracy for improved acquisition speeds.

The other structured light approach for real-time shape acquisition is to use multiple coded patterns but switch them rapidly so that they can be captured in a short period of time. Raskar et al. proposed an imperceptible structured light method for real-time depth extraction, which was based on rapidly switching binary-coded patterns [15]. Rusinkiewicz et al. [16] and Hall-Holt and Rusinkiewicz [17] developed a real-time 3D model acquisition system that uses four patterns coded with stripe boundary codes. The acquisition speed achieved was 60 Hz, which is good enough for scanning slowly moving objects. However, like any other binary-coding method, the spatial resolution of these methods is relatively low because the stripe width must be larger than one pixel. Moreover, switching the patterns by repeatedly loading patterns to the projector limits the switching speed of the patterns and therefore the speed of shape acquisition. Huang et al. recently proposed a high-speed 3D shape measurement technique based on a rapid


phase-shifting technique [18]. This technique uses three phase-shifted, sinusoidal grayscale fringe patterns to provide pixel-level resolution. The patterns are projected to the object with a switching speed of 240 Hz. However, limited by the frame rate of the camera used, the acquisition speed achieved was only 16 Hz.

In this paper, we describe an improved version of the system developed by Huang et al. [18] for real-time 3D shape acquisition. This system takes full advantage of the single-chip DLP technology for rapid switching of three coded fringe patterns (Perlin et al. [19, 20] and McDowall et al. [21] used similar techniques for 3D display). A color fringe pattern with its red, green, and blue channels coded with three different patterns is created by a PC. When this pattern is sent to a single-chip DLP projector, the projector projects the three color channels in sequence repeatedly and rapidly. To eliminate the effect of color, the color filters on the color wheel of the projector are removed. As a result, the projected fringe patterns are all in grayscale. A properly synchronized high-speed black-and-white (B/W) CCD camera is used to capture the images of each color channel, from which 3D information of the object surface is retrieved. A color CCD camera, which is synchronized with the projector and aligned with the B/W camera, is also used to take 2D color pictures of the object at a frame rate of 26.8 Hz for texture mapping. With this prototype system, 3D shape acquisition can be done at a speed of up to 120 Hz in a continuous measurement mode.

So long as the number of patterns needed for 3D reconstruction is not more than three, any coding method can be used to achieve the same 3D acquisition speed. In this research, we tried two coding methods, the sinusoidal phase-shifting method and the trapezoidal phase-shifting method. For the sinusoidal phase-shifting method, we use three phase-shifted sinusoidal fringe patterns with a phase shift of 120◦. Since in a phase-shifting method the calculation of an arctangent function, which is relatively slow, is necessary to determine the phase, it is difficult to realize real-time 3D reconstruction with an ordinary PC. To increase the calculation speed, we developed a novel coding method, namely, the trapezoidal phase-shifting method. With this new coding method, it was demonstrated experimentally that high-resolution, real-time 3D shape acquisition and reconstruction were possible.

In Section 2, we discuss the working principle of the structured light system for real-time 3D shape acquisition. Section 3 describes the two coding methods, the sinusoidal phase-shifting method and the trapezoidal phase-shifting method, implemented in this research. Section 4 presents some experimental results and Section 5 discusses the advantages of the proposed system. Section 6 summarizes the results obtained in this research.


Figure 3. Photograph of our real-time 3D shape acquisition system (Box size: 24”×14”×10”).

2. Real-time structured light system

Figure 2 shows the layout of the proposed structured light system for 3D shape acquisition and Figure 3 shows a picture of the developed hardware system. For the projection of computer-generated patterns, a single-chip DLP projector is used, which produces images based on a digital light switching technique [22]. The color image is produced by projecting the red, green, and blue channels sequentially and repeatedly at a high speed. Our eyes then integrate the three color channels into a full color image. To take advantage of this unique projection mechanism of a single-chip DLP projector, we create a color pattern which is a combination of three patterns in the red, green, and blue channels respectively. In the meantime, we remove the color filters on the color wheel of the projector to make the projector operate in a monochrome mode. As a result, when the color pattern is sent to the projector, it is projected as three grayscale patterns, switching rapidly from channel to channel (240 Hz). A high-speed B/W camera, which is synchronized with the projector, is used to capture the three patterns rapidly for real-time 3D shape acquisition. An additional color camera is used to capture images for texture mapping.
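
The packing step described above is simple enough to state in code. Below is a minimal sketch in Python with NumPy (an illustration, not the authors' implementation); the pattern generator, projector resolution, and fringe pitch used in the example are hypothetical.

import numpy as np

def pack_rgb_pattern(pattern_r, pattern_g, pattern_b):
    # Combine three 8-bit grayscale fringe patterns, each of shape (H, W),
    # into one (H, W, 3) color image. With the color filters removed, the
    # projector displays the R, G, and B channels sequentially as three
    # grayscale patterns.
    return np.stack([pattern_r, pattern_g, pattern_b], axis=-1).astype(np.uint8)

# Example: three 120-degree phase-shifted sinusoidal fringes (assumed sizes).
H, W, period = 768, 1024, 32      # projector resolution and fringe pitch in pixels
x = np.arange(W)

def fringe(shift):
    row = 127.5 + 127.5 * np.cos(2 * np.pi * x / period + shift)
    return np.tile(row, (H, 1))

color_pattern = pack_rgb_pattern(fringe(-2 * np.pi / 3), fringe(0.0), fringe(2 * np.pi / 3))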

2.1. Projector and camera synchronization

Figure 4 shows the timing signals of our system. The projector trigger signal is generated externally by a microcontroller-based circuit. The internal timing signal of the projector is disabled. The projection timing chart indicates the sequence and timing of the projection of the color channels. The projection cycle starts with the red channel at the falling edge of the projector trigger signal. The camera


[Figure 4 timing chart: projector trigger with a cycle time of 12.5 ms; R, G, and B exposures of 2.8 ms, 2.6 ms, and 2.5 ms; color camera exposure of 12.5 ms; cycle time of 25.0 ms for 3D shape acquisition and 37.5 ms for the color camera.]

Figure 4. System timing chart. The waveform at the top is the trigger signal to the projector generated by a microcontroller-based timing signal circuit. The second waveform from the top is the projection timing chart, where R, G, B, and C represent the red, green, blue, and clear channels of the projector. The clear channel is designed to enhance the brightness of the projected image. For the patterns used in this research, nothing is projected in this channel. The next two waveforms are the trigger signals to the B/W camera and the color camera respectively.

trigger signal is generated by the same circuit, which guarantees its synchronization with the projector trigger signal. Because of the limitation in the frame rate of our B/W camera (maximum 262 fps), two projection cycles are needed for the camera to capture the three phase-shifted fringe patterns, which results in a frame rate of 40 Hz for 3D shape acquisition. If a higher speed camera is used, a maximum frame rate of 80 Hz can be achieved. However, since the phase relationship between any two neighboring patterns is the same, any newly captured pattern can be combined with its preceding two patterns to produce a 3D shape image. Therefore, the real frame rate can be three times higher, or 120 Hz for the current system setup and 240 Hz if a higher speed camera is used. The color picture taken by the color CCD camera (Uniq Vision UC-930) can be mapped directly onto the 3D shape once the correspondence of the two cameras is determined. To obtain 3D and color information of the object simultaneously, multi-threaded programming was used to guarantee that the two cameras work independently and that the timing of image grabbing is determined only by the external trigger signal.
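
The tripling of the frame rate follows from a sliding window over the captured patterns: each new fringe image is combined with its two predecessors, so a 3D frame is produced at the camera frame rate rather than at one third of it. A minimal sketch of this idea (illustrative Python; capture_next_frame and reconstruct are placeholders for the camera interface and for either coding method of Section 3):

from collections import deque

def stream_reconstruct(capture_next_frame, reconstruct, num_frames):
    # Keep the three most recent fringe images; once the window is full,
    # every newly captured image yields one reconstructed 3D frame.
    window = deque(maxlen=3)
    shapes = []
    for _ in range(num_frames):
        window.append(capture_next_frame())
        if len(window) == 3:
            # The roles of I1, I2, I3 rotate, but the relative phase shift
            # between neighboring patterns is always the same.
            shapes.append(reconstruct(*window))
    return shapes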

2.2. Texture mapping

For more realistic rendering of the object surface, a color texture mapping method is used. In the sinusoidal phase-shifting method, the three fringe patterns have a phase shift of 120◦ between neighboring patterns. Since averaging the three fringe patterns washes out the fringes, we can obtain a color image without fringes by setting the exposure time of

the color camera to one projection cycle, or 12.5 ms. If the sinusoidal patterns are not truly sinusoidal due to nonlinear effects of the projector, residual fringes will exist. However, our experimental results show that such residual fringes are negligible and the image is in general good enough for texture mapping purposes.
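
That the fringes cancel can be verified directly from the intensity equations of Section 3.1: since cos(φ − 2π/3) + cos φ + cos(φ + 2π/3) = 0 for any φ, the average of the three patterns is

(1/3)[Ir(x, y) + Ig(x, y) + Ib(x, y)] = I′(x, y),

i.e., the fringe-free average intensity.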

Since aligning the two cameras perfectly is difficult, it is necessary to perform coordinate transformation in order to match the pixels of the two cameras. For this purpose, we use projective transformation, which is good enough for texture mapping in this system. That is:

Ibw(x, y) = P Ic(x, y), (1)

where Ibw is the intensity of the B/W image, Ic is the intensity of the color image, and P is a 3×3 2D planar projective transformation matrix. The parameters of the matrix P depend on the system setup and only need to be determined once through calibration. Once the coordinate relationship between the two cameras is determined, we can identify, for any pixel of the 3D image, the corresponding pixel in the color image for texture mapping.
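
As a sketch of Eq. (1) in code (illustrative Python; the matrix values shown are placeholders, not the calibrated parameters of this system), the planar projective transformation maps a pixel of one camera to the corresponding pixel of the other through homogeneous coordinates:

import numpy as np

def map_pixel(P, x, y):
    # Apply a 3x3 planar projective transformation (homography) to a pixel.
    # (x, y) is a pixel in one camera; the return value is the corresponding
    # location in the other camera.
    u, v, w = P @ np.array([x, y, 1.0])
    return u / w, v / w                  # back from homogeneous coordinates

# Hypothetical calibration matrix; in practice P is estimated once from a set
# of corresponding points between the two cameras.
P = np.array([[1.02, 0.01, -3.5],
              [0.00, 0.99,  2.1],
              [1e-6, 2e-6,  1.0]])

u, v = map_pixel(P, 256.0, 200.0)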

3. Coding methods

The proposed system provides us with the capability of projecting and capturing three coded patterns rapidly. As long as three patterns are sufficient for 3D reconstruction, any coding method can be used to achieve real-time acquisition speed. In this research, two coding methods, the sinusoidal phase-shifting and trapezoidal phase-shifting methods, were implemented to demonstrate the capability of the system.

3.1. Sinusoidal phase-shifting method

The sinusoidal phase-shifting method has been used extensively in optical metrology to measure 3D shapes of objects at various scales. In this method, a series of phase-shifted sinusoidal fringe patterns are recorded, from which the phase information at every pixel is obtained. This phase information helps determine the correspondence between the image field and the projection field. Once this correspondence is determined, the 3D coordinate information of the object can be retrieved based on triangulation.

Many different sinusoidal phase-shifting algorithms have been developed. In this research, the simple three-step algorithm is used [23], which requires three phase-shifted images. The intensities of the three images with a phase shift of 120◦ are as follows (see Figure 5 for the plots):

Ir(x, y) = I′(x, y) + I″(x, y) cos[φ(x, y) − 2π/3], (2)

Ig(x, y) = I′(x, y) + I″(x, y) cos[φ(x, y)], (3)

Ib(x, y) = I′(x, y) + I″(x, y) cos[φ(x, y) + 2π/3], (4)


[Figure 5 plots: top, cross section of the fringe patterns Ir(x, y), Ig(x, y), and Ib(x, y) with average intensity I′(x, y) and intensity modulation I″(x, y); bottom, the calculated phase φ(x, y) ranging from 0 to 2π along x.]

Figure 5. Sinusoidal phase-shifting method. The plot at the top is the cross section of the fringe pattern and the plot at the bottom is the calculated phase.

where I′(x, y) is the average intensity, I″(x, y) is the intensity modulation, and φ(x, y) is the phase to be determined. Solving these three equations simultaneously, we obtain φ(x, y):

φ(x, y) = arctan[√3 (Ir − Ib) / (2Ig − Ir − Ib)]. (5)

This equation provides the so-called modulo 2π phase at each pixel, whose values range from 0 to 2π. By removing the 2π discontinuities of the phase map using a phase unwrapping algorithm, a continuous 3D phase map is obtained. This phase map can be converted to a depth map by a phase-to-height conversion algorithm based on triangulation. A simple phase-to-height conversion algorithm, which is used in this research, is described in Refs. [18, 24]. In this algorithm, surface height is considered proportional to the difference between the phase maps of the object and a flat reference plane, with the scale factor determined through calibration. More complex but more accurate calibration methods are discussed in Refs. [25, 26, 27].
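
A minimal sketch of the three-step computation (illustrative Python with NumPy, not the authors' code): the wrapped phase of Eq. (5) is evaluated with a quadrant-aware arctangent so that it covers the full 0 to 2π range, and the phase-to-height step is shown only in its simplest proportional form; phase unwrapping and the calibrated scale factor are assumed to be provided.

import numpy as np

def wrapped_phase(I_r, I_g, I_b):
    # Three-step phase shifting, Eq. (5): modulo-2*pi phase at each pixel.
    # arctan2 keeps the correct quadrant over the full range.
    phi = np.arctan2(np.sqrt(3.0) * (I_r - I_b), 2.0 * I_g - I_r - I_b)
    return np.mod(phi, 2.0 * np.pi)      # map (-pi, pi] to [0, 2*pi)

def height_map(phi_object, phi_reference, scale):
    # Simple phase-to-height conversion: height proportional to the difference
    # between the unwrapped phase maps of the object and a flat reference plane.
    return scale * (phi_object - phi_reference)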

3.2. Trapezoidal phase-shifting method

Our ultimate goal is to build a true real-time 3D shape acquisition system in which not only are the images captured in real time, but the reconstruction of the 3D shape is also done in real time. This requires that the processing of the images be performed rapidly. The traditional sinusoidal phase-shifting algorithm described in the previous section works well in terms of measurement accuracy. However, due to the need of calculating an arctangent function in order to obtain the phase, the reconstruction of 3D shape is relatively slow. Therefore it is difficult to capture and process the images simultaneously in real time with an ordinary PC. Currently, we can only record the images in real time; the processing of the images has to be done off-line.

In this section, we describe a newly proposed coding method, namely the trapezoidal phase-shifting method, which combines the concept of phase shifting with the intensity ratio methods for improved processing speed.

Several intensity-ratio based methods have been developed in the past, which use a very simple calculation for 3D shape reconstruction [28, 29, 30]. In these methods, two patterns are projected onto the object, a gray-level ramp and a constant illumination or flat pattern. To eliminate the effect of nonuniform spatial illumination, Savarese et al. used three patterns instead of two [31]. These methods had the advantage of high processing speed, but their spatial resolution was limited. To alleviate the problem, Chazan and Kiryati used a sawtooth-like pattern to replace the ramp pattern [32]. The spatial resolution was improved, but as a trade-off, ambiguity was introduced, which could be a problem when measuring objects with discontinuous surface shapes. The discontinuities in the sawtooth pattern may also introduce large errors when the pattern is defocused.

To improve the spatial resolution without introducing the ambiguity problem, we developed an improved method called the trapezoidal phase-shifting method. This method uses three phase-shifted trapezoidal patterns, which can be projected rapidly using our real-time structured light system. The concept is similar to the sinusoidal method. However, instead of the phase, the intensity ratio is calculated, which is much faster in terms of processing speed. The following are the intensity equations for the three color channels (see Figure 6 for the plots):

Ir(x, y) = I0 + I″,                x ∈ [0, T/6) ∪ [5T/6, T]
         = I″(2 − 6x/T) + I0,      x ∈ [T/6, T/3)
         = I0,                     x ∈ [T/3, 2T/3)
         = I″(6x/T − 4) + I0,      x ∈ [2T/3, 5T/6)          (6)

Ig(x, y) = I″(6x/T) + I0,          x ∈ [0, T/6)
         = I0 + I″,                x ∈ [T/6, T/2)
         = I″(4 − 6x/T) + I0,      x ∈ [T/2, 2T/3)
         = I0,                     x ∈ [2T/3, T]             (7)

Ib(x, y) = I0,                     x ∈ [0, T/3)
         = I″(6x/T − 2) + I0,      x ∈ [T/3, T/2)
         = I0 + I″,                x ∈ [T/2, 5T/6)
         = I″(6 − 6x/T) + I0,      x ∈ [5T/6, T]             (8)

where T is the stripe width for each color channel, I0 is the minimum intensity level, and I″ is the intensity modulation. The stripe is divided into six regions, each of which can be uniquely identified by the intensities of the red, green, and blue channels. For each region, we calculate the intensity ratio similar to that in the traditional intensity ratio method.
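
As an illustration of Eqs. (6)-(8), the sketch below generates one period of the three trapezoidal patterns (Python; the stripe width T and the intensity levels are assumed values, and the red and blue channels are obtained by shifting the green channel by −T/3 and +T/3 respectively):

import numpy as np

def trapezoid(x, T, I0, Imod, shift):
    # One trapezoidal channel: rising ramp over T/6, flat top over T/3,
    # falling ramp over T/6, flat bottom over T/3, shifted along x.
    u = np.mod(x - shift, T) / T                   # normalized position in [0, 1)
    y = np.empty_like(u)
    y[u < 1/6] = 6 * u[u < 1/6]                                      # rising ramp
    y[(u >= 1/6) & (u < 1/2)] = 1.0                                  # flat top
    y[(u >= 1/2) & (u < 2/3)] = 4 - 6 * u[(u >= 1/2) & (u < 2/3)]    # falling ramp
    y[u >= 2/3] = 0.0                                                # flat bottom
    return I0 + Imod * y

T, I0, Imod = 60, 10.0, 200.0          # assumed stripe width and intensity levels
x = np.arange(T, dtype=float)
I_g = trapezoid(x, T, I0, Imod, 0.0)           # Eq. (7)
I_r = trapezoid(x, T, I0, Imod, -T / 3.0)      # Eq. (6)
I_b = trapezoid(x, T, I0, Imod, +T / 3.0)      # Eq. (8)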



Figure 6. Trapezoidal phase-shifting method. (a) Cross section of the fringe pattern. (b) Intensity ratio in a triangular shape. (c) Intensity-ratio ramp after the removal of the triangular shape.

That is:

r(x, y) = [Imed(x, y) − Imin(x, y)] / [Imax(x, y) − Imin(x, y)], (9)

where r(x, y) is the intensity ratio and Imin(x, y), Imed(x, y), and Imax(x, y) are the minimum, median, and maximum intensity values at point (x, y) respectively. r(x, y) has a triangular shape whose value ranges from 0 to 1. This triangular shape can be converted to a ramp by identifying the region to which the pixel belongs using the following equation:

r(x, y) = 2 × round[(N − 1)/2] + (−1)^(N+1) [Imed(x, y) − Imin(x, y)] / [Imax(x, y) − Imin(x, y)], (10)

where N is the region number. The value of r(x, y) ranges from 0 to 6. The 3D shape can be reconstructed by using the triangulation method as in the sinusoidal phase-shifting method.
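
A minimal sketch of Eqs. (9) and (10) (illustrative Python; recovering the region number N from which channel is the maximum and which is the minimum is one straightforward choice made for this sketch, not a quotation of the authors' code):

import numpy as np

def intensity_ratio_ramp(I_r, I_g, I_b):
    # Eq. (9): per-pixel intensity ratio from the median, minimum, and maximum
    # of the three channels; Eq. (10): conversion of the triangular ratio into
    # a monotonic ramp over the six regions of one stripe period.
    stack = np.stack([I_r, I_g, I_b], axis=0).astype(float)
    Imax = stack.max(axis=0)
    Imin = stack.min(axis=0)
    Imed = np.median(stack, axis=0)
    ratio = (Imed - Imin) / np.maximum(Imax - Imin, 1e-9)     # Eq. (9), in [0, 1]

    # Identify region N = 1..6 from (argmax, argmin) over (R, G, B) = (0, 1, 2).
    amax = stack.argmax(axis=0)
    amin = stack.argmin(axis=0)
    N = np.zeros_like(amax)
    N[(amax == 0) & (amin == 2)] = 1
    N[(amax == 1) & (amin == 2)] = 2
    N[(amax == 1) & (amin == 0)] = 3
    N[(amax == 2) & (amin == 0)] = 4
    N[(amax == 2) & (amin == 1)] = 5
    N[(amax == 0) & (amin == 1)] = 6

    # Eq. (10): 2*round((N-1)/2) with round-half-up equals 2*(N // 2).
    return 2 * (N // 2) + np.where(N % 2 == 1, ratio, -ratio)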

To obtain a higher spatial resolution, the pattern can be repeated. In this case, the value of the intensity ratio is periodic with a range of [0, 6]. The discontinuity can be

removed by an algorithm similar to the phase unwrapping algorithm used in the sinusoidal method. The trade-off of using a repeated pattern is that potential height ambiguity is introduced.

Compared to the sinusoidal phase-shifting method, this method can process images much faster because it only needs to calculate the intensity ratio at each pixel, which is much faster than the calculation of the arctangent function (4.6 ms versus 20.8 ms on a P4 2.8 GHz PC for an image size of 532 × 500). Compared to the previous intensity-ratio based methods, this method has a resolution that is six times higher. Even more important is the fact that the new method is much less sensitive to the blurring of the projected fringe pattern, which could occur when measuring objects with a large depth. Our simulation results show that even if the pattern becomes sinusoidal, which is the extreme of blurring, the maximum error is only 3.7%.

One disadvantage of this method, however, is in texture mapping. In the sinusoidal method, averaging the three phase-shifted fringe images cancels out the fringes and generates a grayscale image of the object. This grayscale image can be used for texture mapping, thus eliminating the need for capturing an additional image. For the trapezoidal pattern, theoretically, images for grayscale texture mapping can be obtained by choosing the maximum intensity value of the three patterns at each point. However, this approach is practically difficult because of image defocus, which results in residual lines in the image. Even with an additional color camera for texture mapping, as in the system described in this paper, the new method poses some problems because of the difficulty in eliminating the fringes in the image. In the sinusoidal method, we set the exposure time of the color camera to be equal to the projection cycle time. By doing so, we guarantee that the image captured by the color camera is the average of the three sinusoidal patterns and therefore no fringes will be visible in the image. With the trapezoidal pattern, it is obvious that the same technique cannot be used. One potential solution is to capture the image when none of the color channels is projected, such as when the clear channel is projected. However, this would require a camera with a much faster sampling speed and possibly also an additional flash light. We are currently working on finding a better solution to this problem.

4. Experiments

After calibration, we first tested the measurement noise of the system. Figure 7 shows the measured results of a flat board using the sinusoidal phase-shifting algorithm. Since the surface is very smooth, the variations shown in the results are largely due to the noise of the system, which is approximately RMS 0.05 mm over the whole measured area (260 × 244 mm). We also measured the same



Figure 7. Measured result of a flat board with a smooth surface using the sinusoidal phase-shifting method. The measured area is 260 × 244 mm and the noise is RMS 0.05 mm. (a) 3D plot. (b) One example cross section.

board with the trapezoidal phase-shifting method, which showed a noise level of RMS 0.055 mm for the same measured area.

We then measured human faces with both methods to test their capability in capturing dynamic 3D facial changes. Figure 8 shows the results obtained by using the sinusoidal phase-shifting method. During the acquisition, the subject was asked to smile so that facial changes were introduced. The experimental results demonstrate that the system is able to measure moving objects with good accuracy. Figure 1 shows two of the frames of the reconstructed 3D models. It can be seen that the detailed facial expression changes have been successfully captured. As an application, this type of high-quality model is being used for the tracking, learning, and transferring of facial expressions [33].

We also tried using the trapezoidal phase-shifting method to measure human faces. Figure 9 shows the measured results. The quality of the 3D data is similar to that of the sinusoidal phase-shifting method. However, texture mapping could not be easily implemented.

It should be noted that in both coding methods, we used multiple fringes or stripes in the patterns to improve resolution. Therefore, the potential ambiguity problem exists [34]. However, since the human face is smooth, we did not encounter serious difficulties in 3D reconstruction.

5. Discussion

The proposed real-time 3D shape acquisition system has several advantages over systems based on other methods:

1. High resolution. Both of the proposed coding methods provide a pixel-level resolution, which is much higher than that of the stereo vision and binary coding based systems. The depth resolution of the proposed trapezoidal phase-shifting method is also six times better than

that of the previously developed intensity ratio methods when the same number of stripes is used.

2. High acquisition speed. Our system takes advantage of the unique operating principle of a single-chip DLP projector to project three coded patterns repeatedly with a cycle time of 12.5 ms. If the camera can keep up with the speed, the maximum achievable 3D shape acquisition speed is 240 Hz (360 Hz if a latest-generation DLP projector, which has a shorter cycle time, is used). With the camera currently used in our system, we were able to achieve a speed of 120 Hz with grayscale texture mapping. With color texture mapping, the speed is reduced to 26.8 Hz due to the limited frame rate of the color camera used. This acquisition speed is faster than that achieved by most of the other methods proposed.

3. High processing speed. With the new trapezoidal phase-shifting method, the total processing time for 3D shape reconstruction was reduced to around 24.2 ms per frame (532 × 500 pixels) with a Pentium 4, 2.8 GHz PC. Therefore, if parallel processing is used, it is possible to realize real-time 3D shape acquisition and reconstruction with an ordinary PC.

4. Large dynamic range. Since the sinusoidal phase-shifting method is insensitive to the defocusing of the projected pattern, a relatively large depth can be measured, provided that the fringe visibility is good enough. The trapezoidal phase-shifting method is also much less sensitive to defocusing when compared to previously introduced intensity ratio methods.

5. Simultaneous 2D and 3D acquisition. This is one of the unique features of the sinusoidal phase-shifting method. From the three phase-shifted fringe images, we can capture not only the 3D information of the object, but also the 2D information, which is obtained by averaging the three fringe images.

6. Diverse coding options. Our system provides the basic capability of rapidly projecting and capturing three coded patterns for 3D shape acquisition. In this research, we successfully demonstrated the feasibility of two coding methods, the sinusoidal phase-shifting method and the trapezoidal phase-shifting method. However, other coding methods are also possible as long as three patterns are sufficient for 3D shape reconstruction.

6. Conclusions

A high-resolution, real-time 3D shape acquisition system has been developed. The acquisition speed can be up to 120 Hz for the current setup and 240 Hz if a higher speed



Figure 8. Shape acquisition of a human face using the sinusoidal phase-shifting method. (a)-(c) Phase-shifted fringe images. (d) Wrapped phase map. (e) 2D color image for texture mapping. (f) 3D wire-frame model. (g) 3D shaded model. (h) 3D model with color texture mapping.

camera is used. Two coding methods, the sinusoidal phase-shifting method and the trapezoidal phase-shifting method, were proposed. With the sinusoidal phase-shifting method, we were able to acquire dynamic 3D models with color texture mapping, but the processing speed was not fast enough for real-time 3D reconstruction. In contrast, the new trapezoidal phase-shifting method allowed for real-time 3D reconstruction with a similar resolution and noise level, but unfortunately texture mapping was more difficult. Both methods were tested experimentally, which demonstrated their feasibility for high-resolution, real-time 3D shape acquisition. The noise level was found to be RMS 0.05 mm in an area of 260 × 244 mm for the sinusoidal phase-shifting method and RMS 0.055 mm for the trapezoidal phase-shifting method. This real-time 3D shape acquisition system has potential applications in many different fields, such as computer graphics, medical diagnostics, plastic surgery, industrial inspection, reverse engineering, robotic vision, etc. If infrared light is used, this system can also be used for security checks in homeland security applications.

Acknowledgments

The authors thank Susan Brennan, Xiaomei Hao, Alexander Mohr, Dimitris Samaras, Yang Wang, Lei Zhang, and others for their willingness to serve as models for our experiments, and Yang Wang and Liu Yang for their help in programming. This work was supported by the National Science Foundation under grant No. CMS-9900337 and the National Institute of Health under grant No. RR13995.

References

[1] D. Beymer and K. Konolige, “Real-time tracking of multiple people using continuous detection”, in IEEE Frame Rate Workshop, 1999.

[2] S. B. Gokturk, J.-Y. Bouguet, C. Tomasi, and B. Girod, “Model based face tracking for view-independent



Figure 9. Shape acquisition of a human face using the trapezoidal phase-shifting method. (a)-(c) Captured fringe images. (d)-(e) Reconstructed 3D model. Texture mapping is difficult in this case.

facial expression recognition”, in The 5th International Conference on Automatic Face and Gesture Recognition, 2002, pp. 20–21.

[3] C. Tomasi, J. Zhang, and D. Redkey, “Experiments with a real-time structure-from-motion system”, in 4th International Symposium on Experimental Robotics, 1995, pp. 123–128.

[4] C. Tomasi, A. Rafii, and I. Torunoglu, “Full-size projection keyboard for handheld device”, in Commun. ACM, 2003, vol. 46, pp. 70–75.

[5] J. Neumann, C. Fermuller, Y. Aloimonos, and V. Brajovic, “Compound eye sensor for 3D ego motion estimation”, in IEEE International Conference on Robotics and Automation, May 2004.

[6] H. Jin, P. Favaro, and S. Soatto, “Real-time feature tracking and outlier rejection with changes in illumination”, in The 8th IEEE International Conference on Computer Vision, 2001, pp. I: 684–689.

[7] J. Salvi, J. Pages, and J. Batlle, “Pattern codification strategies in structured light systems”, Tech. Rep., Institut d'Informatica i Aplicacions, Girona, Spain, 2002.

[8] L. Zhang, B. Curless, and S. M. Seitz, “Spacetime stereo: Shape recovery for dynamic scenes”, in IEEE Conference on Computer Vision and Pattern Recognition, 2003, pp. II: 367–374.

[9] J. Davis, R. Ramamoorthi, and S. Rusinkiewicz, “Spacetime stereo: a unifying framework for depth from triangulation”, in IEEE Conference on Computer Vision and Pattern Recognition, 2003, pp. II: 359–366.

[10] K. G. Harding, “Phase grating use for slope discrimination in moire contouring”, in Proc. SPIE, 1991, vol. 1614 of Optics, Illumination, and Image Sensing for Machine Vision VI, pp. 265–270.

[11] Z. J. Geng, “Rainbow 3-D camera: New concept of high-speed three-dimensional vision system”, Opt. Eng., vol. 35, pp. 376–383, 1996.

[12] C. Wust and D. W. Capson, “Surface profile measurement using color fringe projection”, Machine Vision and Applications, vol. 4, pp. 193–203, 1991.

[13] P. S. Huang, Q. Hu, F. Jin, and F. P. Chiang, “Color-encoded digital fringe projection technique for high-speed three-dimensional surface contouring”, Opt. Eng., vol. 38, pp. 1065–1071, 1999.

[14] L. Zhang, B. Curless, and S. M. Seitz, “Rapid shape acquisition using color structured light and multi-pass dynamic programming”, in The 1st IEEE International Symposium on 3D Data Processing, Visualization, and Transmission, June 2002, pp. 24–36.

[15] R. Raskar, G. Welch, M. Cutts, A. Lake, L. Stesin, and H. Fuchs, “The office of the future: A unified approach to image-based modeling and spatially immersive displays”, Computer Graphics, vol. 32, no. Annual Conference Series, pp. 179–188, 1998.

[16] S. Rusinkiewicz, O. Hall-Holt, and M. Levoy, “Real-time 3D model acquisition”, in SIGGRAPH, 2002, vol. 1281 of 3D Acquisition and Image Based Rendering, pp. 438–446, ACM Press.

[17] O. Hall-Holt and S. Rusinkiewicz, “Stripe boundary codes for real-time structured-light range scanning of moving objects”, in The 8th IEEE International Conference on Computer Vision, 2001, pp. II: 359–366.

[18] P. S. Huang, C. Zhang, and F. P. Chiang, “High-speed 3-D shape measurement based on digital fringe projection”, Opt. Eng., vol. 42, no. 1, pp. 163–168, 2003.


[19] K. Perlin, S. Paxia, and J. Kollin, “An autostereoscopic display”, in SIGGRAPH, 2000, pp. 319–326, ACM Press.

[20] K. Perlin, C. Poultney, J. Kollin, D. Kristjansson, and S. Paxia, “Recent advances in the NYU autostereoscopic display”, in Proc. of the SPIE, Jan. 2001, pp. 196–203.

[21] I. McDowall, M. Bolas, D. Corr, and T. Schmidt, “Single and multiple viewer stereo with DLP projectors”, in Proc. of the SPIE, Jan. 2001, pp. 418–425.

[22] L. J. Hornbeck, “Digital light processing and MEMS: Timely convergence for a bright future (invited plenary paper)”, in Proc. SPIE, 1995, vol. 2639, pp. 3–21.

[23] D. Malacara, Ed., Optical Shop Testing, John Wiley and Sons, NY, 1992.

[24] C. Zhang, P. S. Huang, and F. P. Chiang, “Microscopic phase-shifting profilometry based on digital micromirror device technology”, Appl. Opt., vol. 48, no. 8, pp. 5896–5904, 2002.

[25] J.-Y. Bouguet, Visual Methods for Three-Dimensional Modeling, PhD thesis, California Institute of Technology, Pasadena, CA, 1999.

[26] Q. Hu, P. S. Huang, Q. Fu, and F. P. Chiang, “Calibration of a 3-D shape measurement system”, Opt. Eng., vol. 42, no. 2, pp. 487–493, 2003.

[27] Q. Hu, 3-D Shape Measurement Based on Digital Fringe Projection and Phase-Shifting Techniques, PhD thesis, State University of New York at Stony Brook, Stony Brook, NY, 2001.

[28] B. Carrihill and R. Hummel, “Experiments with the intensity ratio depth sensor”, Computer Vision, Graphics and Image Processing, vol. 32, pp. 337–358, 1985.

[29] T. Miyasaka, K. Kuroda, M. Hirose, and K. Araki, “Reconstruction of realistic 3D surface model and 3D animation from range images obtained by real time 3D measurement system”, in IEEE 15th International Conference on Pattern Recognition, Sep. 2000, pp. 594–598.

[30] T. Miyasaka, K. Kuroda, M. Hirose, and K. Araki, “High speed 3-D measurement system using incoherent light source for human performance analysis”, in XIXth Congress of the International Society for Photogrammetry and Remote Sensing, July 2000, pp. 16–23.

[31] S. Savarese, J.-Y. Bouguet, and P. Perona, “3D depth recovery with grayscale structured lighting”, Tech. Rep., Computer Vision, California Institute of Technology, 1998.

[32] G. Chazan and N. Kiryati, “Pyramidal intensity-ratio depth sensor”, Tech. Rep. No. 121, Israel Institute of Technology, Technion, Haifa, Israel, Oct. 1995.

[33] Y. Wang, X. Huang, C.-S. Lee, S. Zhang, Z. Li, D. Samaras, D. Metaxas, A. Elgammal, and P. Huang, “High resolution acquisition, learning and transfer of dynamic 3D facial expressions”, in Eurographics 2004 (to appear), 2004.

[34] D. Scharstein and R. Szeliski, “High-accuracy stereo depth maps using structured light”, in IEEE Conference on Computer Vision and Pattern Recognition, June 2003, pp. 195–202.
