Transcript of: Robust One-Shot 3D Scanning Using Loopy Belief Propagation (mesh.brown.edu/taubin/pdfs/Ulusoy-cvprw2010.pdf)

Robust One-Shot 3D Scanning Using Loopy Belief Propagation

Ali Osman Ulusoy∗ Fatih Calakli∗ Gabriel Taubin

Brown University, Division of Engineering, Providence, RI 02912, USA

{ali ulusoy, fatih calakli, taubin}@brown.edu

Abstract

A structured-light technique can greatly simplify the problem of shape recovery from images. There are currently two main research challenges in the design of such techniques. One is handling complicated scenes involving texture, occlusions, shadows, sharp discontinuities, and in some cases even dynamic change; the other is speeding up the acquisition process by requiring a small number of images and computationally less demanding algorithms. This paper presents a "one-shot" variant of such techniques to tackle these challenges. It works by projecting a static grid pattern onto the scene and identifying the correspondence between grid stripes and the camera image. The correspondence problem is formulated using a novel graphical model and solved efficiently using loopy belief propagation. Unlike prior approaches, the proposed approach uses non-deterministic geometric constraints and can thereby handle spurious connections of stripe images. The effectiveness of the proposed approach is verified on a variety of complicated real scenes.

1. Introduction

Three-dimensional (3D) shape acquisition from images has been one of the most important topics in computer vision due to its wide range of applications in engineering, entertainment, visual inspection, and medicine. It has especially been instrumental in digitizing and preserving archaeological artifacts. Although numerous 3D reconstruction methods have been proposed for this purpose, no single solution is suitable for all kinds of archaeological objects. Challenges include shadowy areas, immobility of the object, complex geometry that prevents reaching certain areas, the fragile nature of old artifacts, etc. Some of these challenges can be alleviated by a hand-held scanner, which contains one or more cameras and possibly a laser emitter or

∗First two authors contributed equally to this work, and the order of authorship was decided by flipping a coin.



Figure 1: (a) A bust, (b) the intersection network overlaid onto the image, (c,d) reconstructions from two views.

a projector. It is capable of scanning objects when manually swept over them.

Typically, a hand-held scanner works by continuously registering a sequence of 3D views as they are acquired. Obviously, the 3D acquisition has to be done very quickly (ideally within a single frame) as the scanner is constantly moving. Most commercial hand-held scanners use a single laser beam for this purpose, which can reconstruct only a single slit per frame. In such systems, the scanner has to be swept very slowly to capture all the intricate details. Also, since the reconstruction is sparse, the registration step can easily fail if the scanner moves too fast or makes an abrupt motion.

In this paper, we propose using a grid pattern to handle some of these challenges. This pattern allows for building a "one-shot" system while still maintaining a dense reconstruction. We argue that previous approaches to grid-based systems are not adequate for dealing with scenes of complex geometry. To alleviate this, we introduce a novel probabilistic graphical model and show that it can deal with such scenes. Combined with an online registration method, we think this method shows great potential for use in a hand-held scanner.

978-1-4244-7030-3/10/$26.00 ©2010 IEEE

2. Related Work

There exist several 3D reconstruction techniques, such as structured light, multi-view stereo vision, and photometric stereo. Structured-light techniques have been very popular in applications to archaeology due to their accuracy, density, and ease of use [9, 3]. Since a wide variety of approaches have been proposed to date, we will only discuss some of the most relevant ones here. A comprehensive assessment is presented by Salvi et al. in [14, 1].

Most of the early work in structured lighting follows a temporal approach: multiple patterns are projected consecutively onto the object, and a camera captures an image for each projected pattern. These methods require the object to be perfectly still while the patterns are being projected. Obviously, this is not acceptable for a hand-held device, as the scanner is constantly moving. If, however, the number of patterns to be projected is reduced to a single pattern, then such methods become feasible. Although there are also techniques that use a single but adaptive pattern [7], they require a video projector. In this study, we focus on and review methods that project a single static pattern, in which case a slide or laser projector suffices.

Numerous one-shot patterns have been proposed so far. One of the simplest approaches is to project a single slit, typically using a laser beam. Although simple and easy to implement, these approaches suffer from sparsity, i.e., only a single slit is reconstructed at each frame. A natural extension is to project multiple slits, or to have both horizontal and vertical lines, which form a grid pattern. In [12], Hu and Stockman suggest using such a grid pattern. In their solution, the observed grid in the camera is matched to the projected pattern by exploiting a number of geometric and topological constraints. A major drawback is that the detected grid points are numbered with respect to a reference point, i.e., relative numbering. This necessitates that at least a patch of the pattern be extracted perfectly, since the algorithm fails in the case of undetected or spurious grid points. Although Proesmans et al. [11] present a more efficient solution for grid-based systems, they also suffer from relative numbering.

Another popular approach to obtaining a dense reconstruction is the use of colors [18, 4]. These approaches typically use complex illumination patterns and are highly sensitive to noise and object texture. Therefore, they require sophisticated image processing and possibly color calibration [4].

Recently, Kawasaki et al. [5] proposed a two-color grid where the vertical and horizontal lines have different colors. Unlike most previous approaches, their method does not rely heavily on image processing and is robust against undetected grid points, as it does not assume relative ordering. Their solution involves a singular value decomposition (SVD) of a large and very sparse matrix, which is numerically unstable. In [16], Ulusoy et al. propose a more stable solution with the help of a special grid. Both methods heavily exploit the coplanarity constraints that arise from the links between grid points. However, such constraints are often violated, because spurious links between grid points are observed in complex scenes. It is very important to note that even a single spurious link typically causes the entire 3D solution to be incorrect, rendering both of these methods unsuitable for such scenes.

The primary contributions of this paper are: (1) the formulation of the problem using a novel probabilistic graphical model, (2) the handling of spurious connections resulting from complicated scenes, and (3) an efficient solution using loopy belief propagation.

3. Overview of the Approach

In this paper, we present a structured-light approach consisting of a projector and a single color camera. Both are calibrated with respect to a world coordinate system. The projector projects a known grid pattern onto a 3D object. The camera captures an image of the object illuminated by the projected grid. This single image is used to determine the depth of the points illuminated by the grid stripes observed by the camera. Producing the correct result depends on being able to solve the so-called correspondence problem, i.e., matching pairs of grid points and their images need to be identified.

At first, finding correspondences might seem to require an exhaustive search throughout the whole grid. However, geometric constraints greatly simplify the problem. First, the projector-camera system adds epipolar constraints that reduce the search space of each grid point to a single line, i.e., the correspondence of a point can only come from one of the grid crossings along its epipolar line. This is explained in Section 4.1.1. Second, the grid pattern discloses spatial neighborhood information for all captured grid points. This adds coplanarity constraints and topological constraints on their correspondences, i.e., if two grid points are linked horizontally in the captured image, then their correspondences must abide by certain constraints. These are explained in detail in Sections 4.1.2 and 4.1.3.

The problem is still challenging, as difficulties arise: establishing correspondences can easily be tricked by a complex scene that involves texture, occlusion, shadows, sharp discontinuities, and dynamic change [7]. As a result,




Figure 2: (a) A grid pattern, (b) two separate intersection networks.

observed geometric constraints can be misleading, which is explained in Section 4.1.4. We present a novel probabilistic graphical model in Section 4.2 that aims to formulate these geometric constraints in a stochastic fashion. After building the graphical model, the solution is obtained efficiently using loopy belief propagation.

4. Reconstruction using Grid Pattern

We assume that the camera and the projector are calibrated. The projector projects a grid pattern, composed of horizontal blue stripes and vertical red stripes as shown in Figure 2a, onto the measured object, and the camera is placed to observe it. A simple image segmentation process in the captured image, e.g., thresholding the color channels, differentiates between the horizontal and vertical stripes. The stripes are then combined to form the input to the system: a 2D intersection network. In general, however, they may not form a single intersection network, due to shadows, occlusions, and sharp depth discontinuities. An example depicting this is given in Figure 2b. Our approach solves the correspondence problem for each intersection network independently. This section explains the steps for solving the correspondence of a single intersection network; without loss of generality, the same steps can be applied to all intersection networks independently.
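As an illustration of this segmentation step, a minimal sketch that separates the blue horizontal stripes and red vertical stripes by thresholding color channels. The thresholding rule and value are illustrative assumptions; the paper does not specify them.

```python
import numpy as np

def segment_stripes(img, thresh=128):
    """Separate horizontal (blue) and vertical (red) stripe pixels by
    thresholding the color channels of an RGB image (H x W x 3, uint8).
    Returns two boolean masks.  The threshold is illustrative; in practice
    it depends on projector brightness and ambient light."""
    r, b = img[..., 0].astype(int), img[..., 2].astype(int)
    vertical = (r > thresh) & (r > b)     # red-dominant pixels
    horizontal = (b > thresh) & (b > r)   # blue-dominant pixels
    return horizontal, vertical
```

The two masks would then be thinned and intersected to obtain the intersection network.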

Regarding the notation used in this section, all image measurements refer to homogeneous normalized coordinates, since the calibration is known for both the camera and the projector.

4.1. Labeling Problem

The grid pattern is a set of horizontal and vertical stripes. Supposing those stripes have no width, we can refer to a stripe as a line π. The intersection of two perpendicular lines is a grid crossing s = (s1, s2, 1)^t in the projector image. Under projection, every grid crossing forms a light line and every grid line forms a light plane in 3-space. These are reflected by the object surface and identified in the camera image. The detection of such a light line is referred to as an intersection point u = (u1, u2, 1)^t in the camera image. The reflection of a light plane is referred to as a set of links that connect neighboring intersection points. We treat the union of links and intersection points as an intersection network.

Figure 3: A point u and its epipolar line L(u).

A grid crossing s parameterizes both the horizontal line π^h and the vertical line π^v it lies on. Therefore, a horizontal and a vertical triangulation plane are defined by the grid crossing s. Given an intersection u and its correspondence s, all the points in the intersection network that are horizontally and vertically linked to it can be triangulated using optical ray-plane triangulation. Thus, in order to triangulate all the points in an intersection network, we only need the correspondences s of the intersection points u. Formally, we are looking for a labeling function

f : u_i → S,  i = 1 … N   (1)

where S = {s_1, s_2, …, s_M}, and N and M are the numbers of intersection points and grid crossings, respectively.

4.1.1 Epipolar Constraint

Assume the world coordinate system is placed at the optical center of the camera, and ignore intrinsic parameters; then the projection of a 3D point p = (p1, p2, p3)^t onto the normalized camera image point u satisfies λu = p. For convenience, we define another coordinate system at the optical center of the projector and denote the projector image point as s. The relation between u and s is

µs = λRu + T   (2)

where λ and µ are unknown scalars, and R and T are the rotation matrix and translation vector that define the coordinate transformation between the two coordinate systems.
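The ray-plane triangulation mentioned above can be sketched directly from the convention of Equation (2). The sketch below assumes a vertical projector line s1 = a, whose light plane through the projector center has normal (1, 0, −a) in projector coordinates; the function name is ours.

```python
import numpy as np

def triangulate_ray_plane(u, n_p, R, T):
    """Intersect the camera ray p = lambda * u with the light plane of a
    projector grid line.  The plane passes through the projector center
    and has normal n_p in projector coordinates (e.g. n_p = (1, 0, -a)
    for the vertical projector line s1 = a).  Following Eq. (2), a point
    p in camera coordinates maps to R @ p + T in projector coordinates."""
    n_c = R.T @ n_p          # plane normal in camera coordinates
    d = -n_p @ T             # plane offset, so the plane is n_c . p = d
    lam = d / (n_c @ u)      # solve n_c . (lam * u) = d for lam
    return lam * u           # 3D point in camera coordinates
```

Given the correspondence of an intersection point, the same routine triangulates all points linked to it along the corresponding grid line.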



Figure 4: A typical scenario revealing coplanarity constraints and topological constraints.

By eliminating the unknown scalars λ and µ from Equation (2), we retrieve the epipolar line L of the camera point u in the projector image as

L(u) = {s : l^t s = 0}   (3)

where l = T̂Ru, and T̂ is the matrix representation of the cross product with T. With this representation, the Euclidean distance from a grid crossing s to the epipolar line L(u) is

dist(s, L(u)) = |s^t l|.   (4)

Equation (3) says that a captured intersection point u in the camera image may correspond only to grid crossings on the epipolar line L(u) in the projector image, as shown in Figure 3. However, a correspondence typically does not lie exactly on the epipolar line, due to imperfections in the calibration procedure. Therefore, we define a threshold τ on the distance dist(s, L(u)), and consider only the set of grid crossings S′ ⊆ S whose distance is smaller than τ. Note that the size of S′ is dramatically smaller than that of S.
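The pruning of candidate correspondences by Equations (3) and (4) can be sketched as follows. The normalization of l, needed so that |s^t l| is a Euclidean point-line distance, is an implementation detail not spelled out in the text.

```python
import numpy as np

def skew(t):
    """Matrix representation of the cross product: skew(t) @ v == cross(t, v)."""
    return np.array([[0, -t[2], t[1]],
                     [t[2], 0, -t[0]],
                     [-t[1], t[0], 0]])

def candidate_crossings(u, S, R, T, tau):
    """Prune the label set of camera intersection u to the grid crossings
    within distance tau of its epipolar line (Eqs. 3-4).  u is (3,) in
    homogeneous normalized coordinates; S is (M, 3) of grid crossings."""
    l = skew(T) @ R @ u                # epipolar line l = T^ R u
    l = l / np.linalg.norm(l[:2])      # normalize so |s . l| is a distance
    d = np.abs(S @ l)                  # distance of every crossing to L(u)
    return np.flatnonzero(d <= tau), d
```

The returned index set plays the role of S′ when building the unary factors of Section 4.2.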

4.1.2 Coplanarity Constraint

A grid pattern consists of a set of discrete horizontal lines π^h_j, j = 1, 2, …, H, and a set of discrete vertical lines π^v_k, k = 1, 2, …, V. The projection of a horizontal line π^h forms a horizontal plane Π^h in 3-space. Similarly, a vertical line π^v forms a vertical plane Π^v. The reflections of those planes are observed by the camera. In the example of Figure 4, an intersection network of four camera points u1 to u4 is given. These points correspond to 3D points p1 to p4, and also to grid crossings s1 to s4. Consider the horizontal link between u1 and u2. Notice that the 3D points p1 and p2, which generated u1 and u2, lie on the same plane Π^h_1. This plane originates from the horizontal line π^h_1 in the projector image. Therefore, the correspondences of u1 and u2 must lie on the horizontal line π^h_1. More generally, if two detected intersections are linked in the camera image, then their correspondences must be collinear in the projector image. If the link is horizontal, then their correspondences must have the same j value, where the integer j is the horizontal line identifier within the set of H horizontal grid lines. Similar constraints can be derived for the vertical case.

Figure 5: Challenges in finding correspondences, (I) an occluded stripe, (II) a spurious link.

4.1.3 Topological Constraint

Again, consider the detected intersections u1 and u2 in the example of Figure 4. These points are linked horizontally in the camera image, and we know that their correspondences must be collinear. There is more to it than that. In the physical layout of the camera image points, u1 is to the left of u2. This is due to the fact that the 3D point p1 is also to the left of p2. Therefore, the correspondence of u1 must also be to the left of the correspondence of u2. More generally, if two intersections are linked in the camera image, then their topological structure is preserved for their correspondences in the projector image. If the link is horizontal, then the correspondence of the intersection on the left must have a lower k value than that of the intersection on the right, where the integer k is the vertical line identifier within the set of V vertical grid lines. Similar constraints can be derived for the vertical case.

4.1.4 Spurious Connection

Difficulties in finding the correct correspondences may arise even when exploiting the geometric and topological constraints. Typical complications are shown in Figure 5: a depth discontinuity that is parallel to the grid lines of the pattern may cause them to disappear in the camera image, whereas a depth discontinuity that is not parallel to the grid lines can make two different grid lines appear as a single link in the camera image, namely a spurious link. Therefore, observed links in the camera image may be misleading when establishing correspondences.

A probabilistic graphical model is ideal for combining the aforementioned geometric and topological constraints while also accounting for the non-deterministic nature of the problem.

4.2. Probabilistic Graphical Model Formulation

We start by interpreting the intersection network in the camera image as a graph G = (V, E). Each intersection is a node v ∈ V, and links between intersections are edges e ∈ E. We distinguish between the sets of horizontal and vertical edges, E_hor and E_ver, where E = E_hor ∪ E_ver. Next, we define X = {X1, X2, …, XN} to be N discrete-valued random variables, where each random variable is associated with a node in V. Random variable Xi represents the correspondence of an intersection ui to a grid crossing s in the projector image. The labeling problem is to assign a state (correspondence) to each random variable.

In this section, we show that we can easily exploit the geometric constraints of our system by modeling them with a probabilistic graphical model. Furthermore, solving the correspondence problem becomes equivalent to finding the maximum a posteriori (MAP) configuration, for which we propose a solution based on loopy belief propagation [10].

We use the factor graph notation presented in [8], where each random variable is depicted as a variable node and functions of these variables are depicted as factor nodes, illustrated as black squares. Edges connect variables to factor nodes if and only if that variable is an argument of the function. We use a factor node and the function it represents interchangeably.

For our graph, we have two types of factor nodes. Unary factors connect to one variable node, as they represent a function of a single random variable. This function models the epipolar constraint and is denoted ψ_epipole. Pairwise factors connect to two variable nodes, as they are functions of a pair of random variables. These functions model both the coplanarity constraint and the topological constraint due to vertical and horizontal links, and are denoted ψ_ver and ψ_hor respectively. The joint probability can be written as a product of the unary and pairwise factors,

p(X) ∝ ∏_{i=1}^{N} ψ_epipole(X_i) ∏_{(i,j)∈E_hor} ψ_hor(X_i, X_j) ∏_{(i,j)∈E_ver} ψ_ver(X_i, X_j)   (5)

Note that we have introduced factor nodes ψ_ver and ψ_hor for each e ∈ E_ver and e ∈ E_hor, respectively. An example factor graph construction is given in Figure 6.


Figure 6: (a) An example intersection network, (b) its corresponding factor graph. Note that some of the factor labels were omitted for clarity.

The unary factor ψ_epipole(X_i = s) models the epipolar constraint as described in Section 4.1.1. Note that associated with each random variable X_i we have a position u_i in the camera image. Correspondences sufficiently distant from the epipolar line get probability 0, i.e., they are discarded entirely, and the closer ones get a score based on their distance to the line. More formally, we have:

ψ_epipole(X_i = s) = { 1/dist(s, L(u_i))   if dist(s, L(u_i)) ≤ τ
                     { 0                   otherwise              (6)

The pairwise factors ψ_hor and ψ_ver model both the coplanarity constraint and the topological constraints due to the horizontal and vertical links in the camera image. The coplanarity constraint, as described in Section 4.1.2, states that if two intersections are linked with a horizontal edge, then the respective two random variables must be assigned correspondences with the same horizontal line identifier; otherwise the compatibility is 0. If this constraint holds, we then check the topological constraint described in Section 4.1.3 using the function φ. ψ_hor is formally defined in Equation 7; ψ_ver can be defined similarly.

ψ_hor(X_i = s, X_j = s′) = { φ( sgn(u_i1 − u_j1) · (id_v(s) − id_v(s′)) )   if id_h(s) = id_h(s′)
                           { 0                                             otherwise            (7)

where id_h(s) and id_v(s) are functions that take a grid crossing s and return the associated horizontal and vertical line identifiers (described in Section 4.1.2), respectively. Note that if u_i is to the right of u_j, the sgn function returns 1 and the argument (id_v(s) − id_v(s′)) is passed to φ unchanged (see Equation 8). As seen in Figure 7, φ is nonzero only when there is a positive difference between the vertical line identifiers, i.e., s is to the right of s′. If u_i is to the left of u_j, the sgn function corrects the argument to φ by reversing its sign.



φ(α) = { exp(1 − α)   if α ≥ 1
       { 0            otherwise     (8)

As seen in Figure 7, φ achieves its maximum when α = 1. Notice that if φ were 0 for α > 1, our model would not be able to handle occlusions, i.e., undetected grid points. We use an exponentially decaying function after α = 1. This means that we do allow large differences in vertical line identifiers, but we penalize them so as to keep the correspondences close to each other. Note that a more forgiving function (less sharp than the exponential) could also be used.

Figure 7: The φ function.
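Equations (7) and (8) translate into a few lines of code. In this sketch, a grid crossing is represented as a (j, k) identifier pair, which is an assumption of the sketch rather than the paper's data structure.

```python
import math

def phi(alpha):
    """Topological penalty of Eq. (8): zero when the ordering is violated,
    maximal for adjacent identifiers, decaying for skipped lines."""
    return math.exp(1 - alpha) if alpha >= 1 else 0.0

def psi_hor(s, s_prime, ui_x, uj_x, id_h, id_v):
    """Horizontal pairwise compatibility of Eq. (7).  id_h / id_v map a
    grid crossing to its horizontal / vertical line identifier; ui_x and
    uj_x are the image x-coordinates of the two linked intersections."""
    if id_h(s) != id_h(s_prime):       # coplanarity: same horizontal line
        return 0.0
    sign = 1 if ui_x > uj_x else -1    # sgn(u_i1 - u_j1)
    return phi(sign * (id_v(s) - id_v(s_prime)))
```

For instance, with crossings stored as (j, k) tuples and `id_h`, `id_v` reading the first and second entry, an ordering violation or a mismatched horizontal identifier yields compatibility 0.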

4.2.1 Dealing with spurious connections

To deal with the spurious connections described in Section 4.1.4, we introduce binary nodes b_k, k = 1, …, |E|, one for each edge; these are connected to the factor nodes as depicted in Figure 8a. We also define a prior λ_k for each b_k. When b_k = 1, we assume the edge between X_i and X_j actually exists, so we use the old compatibility measure ψ_ver or ψ_hor. However, when b_k = 0, we assume the edge does not exist, in which case X_i and X_j become independent, as there is no function binding them; so we assign uniform compatibility to all possible pairs of values. This is formalized in Equation 9.

ψ′_hor(X_i, X_j, b_k) = { ψ_hor(X_i, X_j)   if b_k = 1
                        { c                 otherwise     (9)

where c is a constant. Note that for our purposes we do not need to estimate b_k; thus, we can simply sum it out, as shown in Figure 8b. We end up with the factor ψ̄_hor defined in Equation 10. One can derive ψ̄_ver similarly.

ψ̄_hor(X_i, X_j) = λ_k ψ_hor(X_i, X_j) + (1 − λ_k) c   (10)

Figure 8: We sum over the binary edges as we do not need to estimate them. Unary factors have been omitted for clarity.
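A one-line sketch of Equation (10) makes the design point concrete: after summing out b_k, a violated constraint (compatibility 0, e.g. across a spurious link) no longer annihilates the joint probability. The default values match those reported in Section 6.

```python
def psi_bar(psi_value, lam=0.95, c=0.01):
    """Eq. (10): marginalize the binary edge-existence variable b_k.
    Even when the deterministic compatibility psi_value is 0 (a violated
    coplanarity/topology constraint), the factor stays strictly positive,
    so a spurious link cannot force the whole assignment to probability 0."""
    return lam * psi_value + (1 - lam) * c
```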

4.2.2 Loopy Belief Propagation: Max-Product

The correspondence problem for our model is finding the most likely assignment to each random variable, X̂ = {X̂1, X̂2, …, X̂N} = arg max_X p(X), i.e., the MAP estimation problem. If our graph did not contain cycles (loops), we could use the max-product algorithm [17] to get an exact answer. This algorithm is a variant of the sum-product algorithm [8] and belief propagation [10]. Unfortunately, our graph contains loops, in which case the MAP estimation problem becomes NP-hard and max-product is no longer exact. However, approximate solutions can still be obtained by running it in an iterative fashion. Remarkably, max-product has been shown to be very effective in practice and is widely used [15]. There is also theoretical work explaining the empirical success of max-product [17].

Max-product is an iterative algorithm that works by passing messages in a graphical model [17]. Each message encodes the likelihood of a random variable using all the information that has propagated through the graph. Although both the sum-product and max-product algorithms can be defined for factor graphs [8], the equations simplify when the graph has only pairwise factors, i.e., functions of up to two random variables. This is indeed the case for our graph, so we present the algorithm in this form.

A message at iteration t is defined as follows:

m^t_{i→j}(X_j) ∝ max_{X_i} [ ψ_epipole(X_i) ψ_comp(X_i, X_j) ∏_{k ∈ N(i)\j} m^{t−1}_{k→i}(X_i) ]   (11)

where ψ_comp denotes either horizontal (ψ̄_hor) or vertical (ψ̄_ver) compatibility, depending on how X_i and X_j are connected, and N(i)\j denotes the neighbors of i except j. After passing messages around for t iterations, the MAP configuration can be computed using the equation below.

X̂_i = arg max_{X_i} ψ_epipole(X_i) ∏_{k ∈ N(i)} m^{t−1}_{k→i}(X_i)   (12)
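A generic loopy max-product pass over a pairwise model (Equations 11 and 12) can be sketched as follows. This is an illustrative implementation on dense label arrays, not the authors' code; message normalization is added for numerical stability.

```python
import numpy as np

def max_product(unary, edges, pairwise, iters=10):
    """Loopy max-product on a pairwise model.

    unary:    list of N arrays, unary[i][s] = psi_epipole(X_i = s)
    edges:    list of (i, j) pairs
    pairwise: dict mapping (i, j) -> array P with
              P[s, s'] = compatibility(X_i = s, X_j = s')
    Returns an approximate MAP label for every variable."""
    nbrs = {i: set() for i in range(len(unary))}
    msgs = {}                                   # msgs[(i, j)] is m_{i->j}(X_j)
    for i, j in edges:
        nbrs[i].add(j); nbrs[j].add(i)
        msgs[(i, j)] = np.ones(len(unary[j]))
        msgs[(j, i)] = np.ones(len(unary[i]))
    for _ in range(iters):
        new = {}
        for (i, j) in msgs:
            P = pairwise[(i, j)] if (i, j) in pairwise else pairwise[(j, i)].T
            prod = unary[i].copy()              # psi_epipole(X_i) ...
            for k in nbrs[i] - {j}:             # ... times incoming messages
                prod = prod * msgs[(k, i)]
            m = (P * prod[:, None]).max(axis=0) # max over X_i  (Eq. 11)
            s = m.sum()
            new[(i, j)] = m / s if s > 0 else m # normalize for stability
        msgs = new
    labels = []
    for i in range(len(unary)):                 # belief and argmax  (Eq. 12)
        belief = unary[i].copy()
        for k in nbrs[i]:
            belief = belief * msgs[(k, i)]
        labels.append(int(belief.argmax()))
    return labels
```

On a two-node toy model with an attractive pairwise table, a strong unary belief on one variable pulls its neighbor to the matching label, as expected.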

5. De Bruijn Spaced Grids

It is known that when solving for correspondences using uniformly spaced grid patterns, one usually cannot




Figure 9: (a) A comfortable frog, (b) the intersection network overlaid onto the image, (c,d) spurious connections.

obtain a confident solution and is left with ambiguities [5, 16, 13]. These patterns are especially ill-suited for complicated scenes with shadows, occlusions, and sharp depth discontinuities. In such scenes, the grid pattern usually gets broken into multiple connected components, as seen in Figure 2b. Most components contain a small number of intersections and links, in which case there is very little geometric information available to solve for correspondences. This information is usually not enough to obtain the correct solution.

A similar argument is made in [5], where irregular spacings are suggested to disturb the uniformity. The authors use a pattern with uniformly spaced vertical lines and randomly spaced horizontal lines. This indeed increases the robustness of the search; however, it does not guarantee correct convergence in their case. Recently, De Bruijn spaced grids were proposed in [16]. These are grid patterns whose spacings follow a De Bruijn sequence. A k-ary De Bruijn sequence of order n is a cyclic sequence over an alphabet of size k in which every subsequence of length n appears exactly once.

We generate grid patterns in which both the vertical and horizontal spacings between the stripes follow a De Bruijn sequence. Thus, a 2D patch consisting of n vertical spacings and n horizontal spacings, and containing (n + 1)² grid crossings, is unique in the whole grid pattern. This ensures that there is a unique configuration for which the joint probability is maximal, i.e., the correct MAP estimate is unique.
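A k-ary De Bruijn sequence of order n can be generated with the standard Lyndon-word (FKM) construction; this sketch is illustrative and not necessarily the construction used by the authors.

```python
def de_bruijn(k, n):
    """Generate a k-ary De Bruijn sequence of order n: a cyclic sequence
    over {0, ..., k-1} of length k**n in which every length-n word appears
    exactly once.  Standard Lyndon-word (FKM) construction."""
    a = [0] * (k * n)
    seq = []
    def db(t, p):
        if t > n:
            if n % p == 0:            # emit each Lyndon word whose length
                seq.extend(a[1:p + 1])  # divides n, in lexicographic order
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)
    db(1, 1)
    return seq
```

With the parameters reported in Section 6 (k = 5, n = 3), the sequence has 125 symbols, each mapped to one of five stripe spacings, and every window of 3 consecutive spacings is unique.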

6. Experiments and Results

We have implemented the proposed system using a 1024 x 768 DLP projector and a 1600 x 1200 CCD camera. Note that since we project a static pattern, we could have used a slide projector as well. We calibrated both the camera and the projector using the Camera Calibration Toolbox for Matlab [2]. The experiments were carried out under weak ambient light.

For the graphical model parameters, we chose the binary edge prior λ_k = 0.95 ∀k, since we expect spurious connections to occur rarely. As for the constant c in Equation 9, we found that a small nonzero value such as c = 0.01 suffices.

For the De Bruijn sequence parameters, we found empirically that k = 5 and n = 3 gave good results: almost all detected patches were bigger than the size needed for unique identification, and since k is not very large, we did not sacrifice much reconstruction density. An example pattern is given in Figure 2a.

To assess the correctness of our results, we used a well-established temporal structured-lighting method [6] as ground truth. We ran both our method and the temporal method on static scenes and checked whether all the intersections we found were assigned their true correspondences.

First, we present results for a complicated object with sharp depth discontinuities. The object and the intersection network we capture are given in Figures 9a and 9b. The network (including the intersections on the background) is composed of 6 connected components. Four of these components contain fewer than 40 points. Figures 9c and 9d depict the spurious connections outlined in bold red. The first one is due to the sharp depth discontinuity between the head and the arm. Similarly, the second one is caused by the discontinuity between the head and the belly. The third one is more interesting: the foot gets linked to the background.

We triangulated all the intersections and curves, and then used interpolation to fill in between curve segments for visualization purposes. Our reconstruction results are given in Figure 10. Notice that the method was able to handle the first two spurious connections: the arm, head, and belly are separated even though they were connected to begin with. However, it had problems with the last one. This is expected, since there is not enough geometry around the spurious link to force it to its correct position. In fact, there are only 2 intersections on one side of the connection, as seen in Figure 9d. Comparing with the ground truth, we found that only these 2 intersections (out of 1015) were assigned wrongly. This corresponds to 99.8% correct assignment.

The next object is a bust of Michelangelo's David, shown in Figure 1a. The captured intersection network is given in Figure 1b. For this object, there are only two connected components. The first component contains only 23 points and spans the right arm, whereas the second component contains 1227 points and spans the body, the head, the left arm, and the background. There are 10 spurious connections inside the second component alone. These spurious connections link the head, the neck, and the background. Since there is sufficient geometry around the spurious connections, all of them are resolved correctly. In fact, for this object we achieve 100% correct assignment. The reconstructions are shown in Figures 1c and 1d.

Figure 10: Two views of the reconstruction using the proposed method.

As advocated, loopy belief propagation worked excellently in practice. Typically, it took about 10 iterations to converge, and we have not experienced any cases where it did not converge. In unoptimized MATLAB code, the running time for a thousand grid intersections was about three minutes.
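For readers unfamiliar with the algorithm, a minimal log-domain max-product (loopy) belief propagation routine on a pairwise model can be sketched as follows. This is a generic illustration in Python rather than the paper's MATLAB implementation; the potentials, label sets, and the small test chain at the bottom are all assumptions made for the example.

```python
import numpy as np

def max_product_bp(unary, edges, pairwise, n_iters=10):
    """Loopy max-product BP in the log domain on a pairwise MRF.

    unary:    dict node -> (L,) array of log unary potentials
    edges:    list of (i, j) node pairs
    pairwise: dict (i, j) -> (L, L) array of log pairwise potentials
    Returns a dict node -> MAP label estimate.
    """
    # Initialize all directed messages to zero (uniform in the log domain).
    msgs = {}
    for (i, j) in edges:
        msgs[(i, j)] = np.zeros(len(unary[j]))
        msgs[(j, i)] = np.zeros(len(unary[i]))

    for _ in range(n_iters):
        new = {}
        for (s, t) in msgs:
            # Unary potential of s plus messages into s, excluding t's.
            incoming = unary[s].copy()
            for (a, b) in msgs:
                if b == s and a != t:
                    incoming += msgs[(a, b)]
            psi = pairwise[(s, t)] if (s, t) in pairwise else pairwise[(t, s)].T
            m = np.max(psi + incoming[:, None], axis=0)
            new[(s, t)] = m - m.max()   # normalize for numerical stability
        msgs = new

    # Max-marginal beliefs: unary potential plus all incoming messages.
    labels = {}
    for i in unary:
        belief = unary[i].copy()
        for (a, b) in msgs:
            if b == i:
                belief += msgs[(a, b)]
        labels[i] = int(np.argmax(belief))
    return labels

# Tiny 3-node chain: node 0 strongly prefers label 0, edges favor agreement.
u = {i: np.log(p) for i, p in
     {0: [0.9, 0.1], 1: [0.5, 0.5], 2: [0.4, 0.6]}.items()}
smooth = np.log([[0.95, 0.05], [0.05, 0.95]])
labels = max_product_bp(u, [(0, 1), (1, 2)], {(0, 1): smooth, (1, 2): smooth})
```

On this chain the agreement-favoring edges pull node 2 away from its weak unary preference, so all three nodes take label 0; on loopy graphs like the grid network, the same update is simply iterated until the messages stop changing.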

7. Conclusion and Future Work

In this paper, we have demonstrated a novel "one-shot" method to reconstruct 3D objects and scenes using a grid pattern. Unlike previous grid-based approaches, our formulation, which uses a powerful probabilistic graphical model, can deal with scenes with complicated geometry. We have shown its performance on two such examples. The next step is to combine this system with an online registration method, and to replace the DLP projector we currently use with a slide or laser projector, in order to build a hand-held scanner.

8. Acknowledgments

This material is based upon work supported by the National Science Foundation under Grant Nos. IIS-0808718 and CCF-0729126. The authors would like to thank Erik B. Sudderth for many helpful discussions.

References

[1] J. Batlle, E. Mouaddib, and J. Salvi. Recent progress in coded structured light as a technique to solve the correspondence problem: A survey. Pattern Recognition, 1998.

[2] J. Y. Bouguet. Camera calibration toolbox for Matlab. http://www.vision.caltech.edu/bouguetj/calibdoc/index.html, 2001.

[3] B. J. Brown, C. Toler-Franklin, D. Nehab, M. Burns, D. Dobkin, A. Vlachopoulos, C. Doumas, S. Rusinkiewicz, and T. Weyrich. A system for high-volume acquisition and matching of fresco fragments: Reassembling Theran wall paintings. ACM Transactions on Graphics (SIGGRAPH), 27(3), Aug. 2008.

[4] D. Caspi, N. Kiryati, and J. Shamir. Range imaging with adaptive color structured light. IEEE Trans. Pattern Anal. Mach. Intell., 20:470–480, 1998.

[5] H. Kawasaki, R. Furukawa, R. Sagawa, and Y. Yagi. Dynamic scene shape reconstruction using a single structured light pattern. IEEE CVPR, June 2008.

[6] S. Inokuchi, K. Sato, and F. Matsuda. Range imaging system for 3-D object recognition. In IEEE ICPR, 1984.

[7] T. Koninckx and L. Van Gool. Real-time range acquisition by adaptive structured light. IEEE Trans. Pattern Anal. Mach. Intell., 28(3):432–445, March 2006.

[8] F. Kschischang, B. J. Frey, and H.-A. Loeliger. Factor graphs and the sum-product algorithm. IEEE Trans. on Information Theory, 47:498–519, 2001.

[9] M. Levoy, K. Pulli, B. Curless, S. Rusinkiewicz, D. Koller, L. Pereira, M. Ginzton, S. Anderson, J. Davis, J. Ginsberg, J. Shade, and D. Fulk. The digital Michelangelo project: 3D scanning of large statues. In ACM Transactions on Graphics (SIGGRAPH), pages 131–144, July 2000.

[10] J. Pearl. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, 1997.

[11] M. Proesmans, L. Van Gool, and A. Oosterlinck. Active acquisition of 3D shape for moving objects. IEEE ICIP, pages 647–650, 1996.

[12] G. Hu and G. Stockman. 3-D surface solution using structured light and constraint propagation. IEEE Trans. Pattern Anal. Mach. Intell., 11(4):390–402, 1989.

[13] J. Salvi, J. Batlle, and E. Mouaddib. A robust-coded pattern projection for dynamic 3D scene measurement. Pattern Recognition Letters, 19:1055–1065, 1998.

[14] J. Salvi, J. Pages, and J. Batlle. Pattern codification strategies in structured light systems. Pattern Recognition, 37(4):827–849, April 2004.

[15] E. B. Sudderth and W. T. Freeman. Signal and image processing with belief propagation. IEEE Signal Processing Magazine, 25:114–141, March 2008.

[16] A. O. Ulusoy, F. Calakli, and G. Taubin. One-shot scanning using De Bruijn spaced grids. In Proceedings of 3-D Digital Imaging and Modeling (3DIM), 2009.

[17] Y. Weiss and W. T. Freeman. On the optimality of solutions of the max-product belief-propagation algorithm in arbitrary graphs. IEEE Trans. on Information Theory, 47(2):736–744, 2001.

[18] L. Zhang, B. Curless, and S. Seitz. Rapid shape acquisition using color structured light and multi-pass dynamic programming. IEEE 3DPVT, 2002.
