
16th IEEE International Conference on Robot & Human Interactive Communication, August 26-29, 2007 / Jeju, Korea

Planar Patch based 3D Environment Modeling with Stereo Camera

Eunyoung Kim1, Gerard Medioni2, Fellow, IEEE, and Sukhan Lee3, Fellow, IEEE

1,2 Eunyoung Kim and Gerard Medioni, Computer Science Department, University of Southern California, Los Angeles, USA, e-mail: (eunyoung.kim, medioni)@usc.edu

3 Sukhan Lee, School of Information and Communication Engineering, Sungkyunkwan University, Suwon, South Korea, e-mail: lsh@ece.skku.ac.kr

Abstract- We present two robust and novel algorithms to model a 3D environment using both intensity and range data provided by an off-the-shelf stereo camera.

The main issue we need to address is that the output of the stereo system is both sparse and noisy. To overcome this limitation, we detect planar patches in the environment by region segmentation in 2D and plane extraction in 3D. The extracted planar patches are used not only to represent the workspace, but also to fill holes in the range data.

We also suggest a new planar patch based scan matching algorithm to register multiple views, and to incrementally augment the description of the 3D workspace over a sequence of scenes.

Experimental results on real data show that planar patch segmentation and 3D scene registration for environment modeling can be robustly achieved by the proposed approaches.

I. INTRODUCTION

3D environment modeling techniques have been investigated to interpret a workspace in the context of SLAM (Simultaneous Localization And Map building) and robotic manipulation. The accurate inference and representation of the 3D environment is critical, as it provides the geometric information needed for motion planning, collision detection, object recognition and visual servoing. 3D environment modeling is also an essential process for SLAM, which incrementally builds a map of an unknown environment (map building) while calculating the location of the robot relative to this map (localization). A large number of methods for SLAM have been proposed for indoor navigation of a mobile robot [1][3][4][5][14][16]. For SLAM, the motions required to build a complete 3D environment from a sequence of images serve as visual odometry, one source of motion evidence for localization. We also use the 3D modeling results for map matching to calculate where the robot is.

In this paper, we propose two effective key methods for 3D environment modeling from a sequence of images provided by a stereo camera: (i) planar patch segmentation, and (ii) 3D scene registration.

Passive stereo works well in highly textured environments, but fails in textureless ones, such as the door of a refrigerator. See for example the bottom of the bookshelf in Fig 1. Although there is a door on this bookshelf, it is not captured by the stereo system. Due to the sparse nature of the information about the environment, a robot planner may generate a path going through the bookshelf, which would cause an unanticipated collision.

Fig 1. Missing range data (red dotted circle) and sparse 3D points in a low-texture region.

To solve this problem, we propose a novel method to segment vertical planar patches by integrating both the intensity image and sparse range data. These extracted patches are also used to reconstruct holes in the range data. Note that vertical planar patches are very common in man-made environments.

We then present a novel 3D scene matching algorithm to register multiple 3D representations of a scene seen from different viewpoints, which exploits the properties of the descriptions generated above. Both intensity and range data are used to find corresponding planar patches between two scenes efficiently. We use the projection matrix of the camera to calculate a set of matched planar patches without a time-consuming search process.

To enhance the performance of the algorithms, we assume the following: the position of the ground is known (this information can easily be obtained from the robot configuration), and many objects in the environment contain planar faces orthogonal to the ground.

In Section 2, we review previous work related to this topic. An overview of the proposed algorithms is given in Section 3. Section 4 explains the planar patch segmentation approach. The planar patch based scene registration is presented in Section 5, followed by experimental results in Section 6. The paper concludes with a discussion of future work.

II. RELATED WORK

A. Planar patch segmentation

A typical method for plane extraction is region growing [1], in which an initial mesh is iteratively grown by adding neighboring triangles. This approach requires many computational steps and is sensitive to noisy data because the region grows locally. Another widely used algorithm is parametric model fitting, generally combined with robust statistics such as RANSAC (Random Sample Consensus) [2]. Nüchter et al. [3][4] suggested environment modeling algorithms based on planar features for approximating/refining the environment and compensating for the noise of 3D points provided by a 3D laser range finder. The main drawback of the RANSAC-based


approach is that it is difficult to determine an appropriate threshold for various environments and sensors.

EM (Expectation Maximization) based plane extraction was proposed by Liu et al. [5], who reduced the plane extraction problem to a computation of eigenvalues by introducing Lagrange multipliers. Lakaemper [6] also introduced a method for segmenting planar structures using extended EM.

Unfortunately, all of the above-mentioned algorithms are most appropriate for data acquired from a 3D laser range finder, so scanning time becomes an issue. They also assume a dense distribution of the 3D point cloud.

Recently, some researchers have proposed planar patch segmentation algorithms for range data provided by a stereo camera. Lee et al. [7] found planes in the workspace by combining SIFT features [26] and RANSAC. Murray [8] presented an algorithm for generating environment models by segmenting the scene viewed from a stereo camera into rectangular planar surfaces through the use of patchlets, a surface-element data structure. These algorithms, however, will fail if many patches have limited texture.

To extract planar patches from sparse and noisy depth data, Cobzas [9] suggested an algorithm that finds planar patches using both intensity and range data. Region segmentation and edge-based Delaunay triangulation [27] in 2D are applied to find rectangular regions in the environment, and some of the extracted regions are then eliminated using depth data.

B. Scene Registration

Many techniques have been introduced for 3D scene matching. The most general algorithm is ICP (Iterative Closest Point), originally suggested by Besl and McKay [10] and by Chen [11] and Zhang [12]. Since then, many variants of ICP have been proposed; a recent survey and analysis can be found in [13]. The main problem of ICP is that the results are poor when the initial relative pose is unknown and a large number of outliers are present.

Another popular approach for scene matching is to use sparse features. Surmann et al. [14] showed a scan matching algorithm based on edge features, but the matching results are not satisfactory despite the fast computational speed. SIFT (Scale-Invariant Feature Transform) [7][15][16][17] is a very widely used feature for scene registration, and many researchers have used SIFT-based registration for SLAM and manipulation due to its good performance. However, it may fail in textureless environments such as corridors. He [18] introduced a plane-based registration algorithm using an interpretation tree for range images scanned by 3D laser scanners.

C. Hole Filling

Most hole filling approaches reconstruct holes in depth data by looking at neighboring 3D points and interpolating them [19][20][21]. In [20], existing surfaces are diffused to fill holes in the 3D point cloud. Liepa [21] filled holes by interpolating the shape and density of the surrounding mesh.

III. OVERVIEW OF THE PROPOSED METHOD

The overall process of the proposed methods is illustrated in the flowchart of Fig 2. The first stage generates the input data: a stereo camera rig provides both range data and two images.


Fig 2. Flowchart of our system

If there is enough texture in the environment, a dense depth map can be obtained; otherwise, only a sparse set of 3D points is provided. The second stage serves to extract vertical planar patches in the workspace. Planar patches are found by utilizing both intensity and range data. The initial region segmentation in 2D is performed by a well-known region segmentation algorithm. Then, the 3D points in each region are used to estimate whether a planar patch fits that region. If the fit is good, all 3D points on the planar patch can be generated from the plane equation and the projection matrix of the camera.

Finally, the relative motion between two consecutive views must be calculated to map the entire 3D workspace. To do this, we first estimate a set of possible motions by planar patch matching and count the number of matched planar patches that are consistent with each transformation. Each motion candidate gets as many votes as the number of matched pairs, and the motion that obtains the largest number of votes is chosen as the final motion for scene registration.

IV. PLANAR PATCH SEGMENTATION

It is difficult to recover dense 3D data from sparse 3D data, since we have few specific clues to estimate the geometry of holes. Therefore, to simplify the problem, we only infer dense 3D maps for planar patches. In other words, if a plane fits the depth data in a region well, holes in the region can be filled with appropriate data in 3D. Fig 3 shows a flowchart of the method for planar patch segmentation. First of all, mean shift [22] is used to segment the image into regions. For every region, we check whether the segmented region has planar geometry, and then reconstruct holes in the data by the following process.
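For concreteness, the initial region segmentation step might look like the following minimal Python sketch. It assumes OpenCV's built-in mean shift filtering; the file name, radii, and color-quantization scheme are illustrative placeholders rather than the paper's exact implementation of [22].

    import cv2
    import numpy as np

    # Hypothetical input: the rectified reference image from the stereo rig.
    img = cv2.imread("reference.png")

    # Mean shift filtering flattens texture so regions become near-uniform
    # patches (sp: spatial radius, sr: color radius; both are tuning knobs).
    shifted = cv2.pyrMeanShiftFiltering(img, sp=16, sr=24)

    # Group pixels of near-identical color into labeled regions by coarsely
    # quantizing the filtered colors and running connected components.
    quant = (shifted // 32).astype(np.int32)            # 8 bins per channel
    key = quant[:, :, 0] * 64 + quant[:, :, 1] * 8 + quant[:, :, 2]
    regions = np.zeros(key.shape, np.int32)
    next_label = 0
    for v in np.unique(key):
        mask = (key == v).astype(np.uint8)
        n, lab = cv2.connectedComponents(mask)
        regions[lab > 0] = lab[lab > 0] + next_label    # offset labels per color
        next_label += n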

Because we assume that all planes to be used for filling holes are orthogonal to the ground plane, each planar patch is mapped to a 2D line segment when projected onto the ground plane.



Fig 3. Flowchart of the proposed algorithm for planar patch segmentation

Based on this fact, the parameters representing the plane can be estimated from the 3D points belonging to the region. First, all 3D points in the region are projected onto the ground plane (see Fig 4(c)), and then a Hough Transform (HT) [23] is used to extract the 2D lines that represent possible planes (see Fig 4(d)). The candidate with the largest number of points is considered the correct plane for that region, since a plane is required to fit the 3D points in the region well.
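A minimal sketch of this step in Python, assuming numpy/OpenCV, a ground-aligned frame in which Y is the height coordinate, and illustrative values for the cell size, vote threshold, and function name:

    import numpy as np
    import cv2

    def dominant_ground_line(points_3d, cell=0.01, votes=20):
        """Project a region's 3D points onto the ground plane and extract
        the strongest 2D line with a standard Hough transform."""
        xz = points_3d[:, [0, 2]]                    # drop the height (Y) coordinate
        lo = xz.min(axis=0)
        idx = np.floor((xz - lo) / cell).astype(int) # rasterize to an occupancy image
        img = np.zeros((idx[:, 1].max() + 1, idx[:, 0].max() + 1), np.uint8)
        img[idx[:, 1], idx[:, 0]] = 255
        lines = cv2.HoughLines(img, 1, np.pi / 180, votes)
        if lines is None:
            return None
        return lines[0][0]                           # (rho, theta) of the top peak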

Additionally, some redundant and spurious planes can be eliminated by checking whether the 3D points in the region lie around the boundary of the estimated plane (see Fig 5).

Given the planes and the projection matrix of the camera, the 3D points (X, Y, Z) inside the holes can be recovered by the following equations, with projection matrix

$$M = \begin{bmatrix} a & 0 & b & 0 \\ 0 & a & c & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}$$

where x, y are pixel coordinates in the image, a, b, c are elements of the projection matrix, and n = (n1, n2, n3), d are the parameters representing the plane.

Fig 6 illustrates the result of hole filling: the holes in the bookshelf and partition are filled with dense 3D points.

The complexity of this algorithm is O(RNM), where R is the number of segmented regions and O(NM) is the complexity of the HT for an N×M image [24].

In this paper, we have used the HT to extract line segments in order to quickly prove the feasibility of our approach. We intend to replace the HT module with tensor voting, a robust methodology for extracting features such as lines, curves and surfaces in N-D data [25], in order to overcome the drawbacks of the HT. The HT often extracts spurious and inaccurate line segments because it ignores the locality of the input data, as shown in Fig 7: spurious line segments are found by the HT for the bottom points (Fig 7, right).

Fig 4. Line segment extraction: (a) tested area (red dotted circle) in the scene; (b) 3D points in the region; (c) points projected onto the ground; (d) lines extracted by the HT.

Fig 5. Examples of correct (left) and incorrect (right) planes.

In detail, writing the projection as p = MP (in homogeneous coordinates, up to scale) with P = (X, Y, Z, 1)^T and p = (x, y, 1)^T gives

$$x = \frac{aX + bZ}{Z}, \qquad y = \frac{aY + cZ}{Z}$$

and the plane constraint

$$Z = \frac{-n_1 X - n_2 Y - d}{n_3}.$$

Substituting Z into the projection equations yields the linear system

$$\begin{bmatrix} n_3 a - n_1(b - x) & -n_2(b - x) \\ -n_1(c - y) & n_3 a - n_2(c - y) \end{bmatrix} \begin{bmatrix} X \\ Y \end{bmatrix} = \begin{bmatrix} d(b - x) \\ d(c - y) \end{bmatrix},$$

which is solved for X and Y; Z then follows from the plane constraint above.
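The system above is straightforward to transcribe into code. A minimal numpy sketch (the function name is ours; the plane is written as n1*X + n2*Y + n3*Z + d = 0):

    import numpy as np

    def recover_plane_point(x, y, a, b, c, n, d):
        """Recover the 3D point (X, Y, Z) that projects to pixel (x, y)
        and lies on the plane n[0]*X + n[1]*Y + n[2]*Z + d = 0."""
        n1, n2, n3 = n
        A = np.array([[n3 * a - n1 * (b - x), -n2 * (b - x)],
                      [-n1 * (c - y), n3 * a - n2 * (c - y)]])
        rhs = np.array([d * (b - x), d * (c - y)])
        X, Y = np.linalg.solve(A, rhs)       # the 2x2 system above
        Z = (-n1 * X - n2 * Y - d) / n3      # back-substitute the plane equation
        return X, Y, Z

Applying this to every hole pixel of a region whose plane fit was accepted densifies the patch, which is the hole-filling step shown in Fig 6.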

Fig 6. The result of hole filling: (a) original 3D data; (b) recovered 3D data.

Fig 7. Result of the Hough Transform: (a) test image; (b) line segments extracted by the HT.


V. SCENE REGISTRATION

In this section, we introduce an algorithm for 3D scene registration that estimates the motion between two consecutive views in order to construct a more complete, merged 3D environment.

We assume that all objects in the environment are static and that the ground plane is known. In this case, we can decompose the 6 degree-of-freedom transformation into a 3D rotation bringing the ground plane into correspondence in the two images, followed by a 3-DOF transform: once the ground plane is identified and matched, the remaining rigid motion is a rotation about an axis normal to the ground plane and a translation on the ground plane.

$$T_{C_t}^{C_{t+1}} = T_{C_t}^{P_t}\, T_{P_t}^{P_{t+1}}\, T_{P_{t+1}}^{C_{t+1}}$$

where $T_{C_t}^{P_t}$ is the transformation matrix from the camera frame to the ground frame at time t, $T_{P_{t+1}}^{C_{t+1}}$ is the transformation matrix from the ground frame to the camera frame at time t+1, and $T_{P_t}^{P_{t+1}}$ is the 2D motion between the two scenes in the ground frame. Therefore, the motion to be estimated ($T_{P_t}^{P_{t+1}}$) consists of only three unknown components (two for a 2D translation and one for a 2D rotation), because the pose of the ground is already known ($T_{C_t}^{P_t}$, $T_{P_{t+1}}^{C_{t+1}}$), even though a general camera motion has six unknown components (three translations and three rotations).
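A sketch of this decomposition in Python, under the assumptions that points are column vectors, that y is the ground normal, and that the two ground-pose transforms are available from the robot configuration; all names and the yaw convention are ours:

    import numpy as np

    def planar_motion(theta, tx, tz):
        """The 3-DOF ground-frame motion: a rotation about the ground
        normal (y axis) plus an in-plane translation, as a 4x4 matrix."""
        c, s = np.cos(theta), np.sin(theta)
        T = np.eye(4)
        T[:3, :3] = [[c, 0, s], [0, 1, 0], [-s, 0, c]]
        T[0, 3], T[2, 3] = tx, tz
        return T

    # Composing the chain: points in the camera frame at t are taken to the
    # ground frame, moved by the planar motion, then taken to the camera at t+1.
    # T_cam_t_to_cam_t1 = T_ground_to_cam_t1 @ planar_motion(th, tx, tz) @ T_cam_to_ground_t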

The most crucial part of estimating the motion parameters is finding the plane correspondences between t and t+1. Once those are computed, the relative motion between views is easily calculated. Fig 8 shows an overview of our algorithm.

Based on the fact that the colors of matched planar patches at t and t+1 are similar as long as the motion of the camera is fairly small, each patch at t is given a set of possible corresponding patches at t+1 that have a similar color distribution.
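One plausible realization of this candidate-matching step is sketched below. It is not the paper's exact measure: we compare HSV histograms with OpenCV's correlation metric, and the bin counts and threshold are illustrative.

    import cv2

    def candidate_matches(patches_t, patches_t1, thresh=0.7):
        """For each patch image at t, list the patches at t+1 whose color
        distribution is similar (histogram correlation above a threshold)."""
        def hist(bgr):
            hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
            h = cv2.calcHist([hsv], [0, 1], None, [30, 32], [0, 180, 0, 256])
            return cv2.normalize(h, h).flatten()
        hists_t = [hist(p) for p in patches_t]
        hists_t1 = [hist(p) for p in patches_t1]
        return {i: [j for j, hq in enumerate(hists_t1)
                    if cv2.compareHist(hp, hq, cv2.HISTCMP_CORREL) > thresh]
                for i, hp in enumerate(hists_t)}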

Given a set of matched planes between two images, we are able to calculate a set of possible motions. In other words, whenever at least one pair of matched planes at t and t+1 exists, the relative rotation and translation can be inferred from geometric information such as the plane normals.

After extracting the set of possible motions, the following process is applied to every motion candidate to determine its number of votes; all planes extracted from the scenes at t and t+1 are used to find the best motion.

Every patch at t is transformed to the camera coordinates at t+1 by applying one of the candidate motions, as shown in Fig 9(b) and (c). Then, all patches are projected onto the image plane of the camera at t+1 (camera 2) to find the corresponding plane at t+1. If the parameters of the corresponding patch and the transformed patch (see Fig 9(b) and (c)) are similar, the correspondence is considered valid. If the number of accepted pairs is larger than two, the candidate is assigned the number of supporting pairs as its vote count. The motion with the largest number of votes is chosen as the one describing the relative motion between consecutive images.
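The voting stage can be summarized by the following sketch. The assumptions are ours: planes are stored as (n, d) numpy arrays with n.p + d = 0, T is a 4x4 numpy matrix, and `similar` compares plane parameters.

    def transform_plane(T, plane):
        """Map a plane (n, d) through the rigid motion p' = R p + t."""
        n, d = plane
        R, t = T[:3, :3], T[:3, 3]
        n2 = R @ n
        return n2, d - n2 @ t

    def best_motion(candidates, planes_t, planes_t1, similar):
        """Pick the candidate motion supported by the most matched plane pairs
        (a candidate needs more than two accepted pairs to be considered)."""
        best, best_votes = None, 2
        for T in candidates:
            votes = sum(any(similar(transform_plane(T, p), q) for q in planes_t1)
                        for p in planes_t)
            if votes > best_votes:
                best, best_votes = T, votes
        return best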

Figs 10 and 11 show experimental results of 3D scene registration for a sequence of images.


Fig 8. Flowchart of the proposed algorithm for 3D scene registration

Fig 9. Finding correspondences between patches: (a) experimental environment and extracted planes at cameras c1 and c2 (c1: camera at t, c2: camera at t+1); (b) in case of correct motion (solid lines: transformed planes, dotted lines: planes at t+1); (c) in case of incorrect motion (solid lines: transformed planes, dotted lines: planes at t+1).


Fig 10. Experimental result of registration (cereal box environment): (a) reference image and range data at t; (b) reference image and range data at t+1; (c) planes extracted and matched; (d) registration result.

Fig 11. Experimental result of registration (bookshelf environment): (a) reference image and range data at t; (b) reference image and range data at t+1; (c) planes extracted and matched; (d) registration result.

The complexity of this approach is O(N), where N is the number of regions segmented at t.

We measure the accuracy of our approach as follows. After transforming the scene at time t to the scene at t+1 by the selected motion, we compute the distance between each transformed planar patch and the 3D points at t+1 in the space occupied by that patch. If the motion is correctly estimated, this distance is small, because the patch and the 3D points represent the same surface. We define this distance as the scene registration error. The average scene matching errors in the bookshelf and cereal environments are 2.99 cm and 1.66 cm, respectively. The absolute error in the cereal environment is lower because the scene is taken at a closer distance from the camera [8].
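This error is essentially a mean point-to-plane distance. A minimal, self-contained sketch (the function name and plane representation n.p + d = 0 are our assumptions):

    import numpy as np

    def scene_registration_error(T, plane_t, points_t1):
        """Mean distance between a patch plane transformed from t to t+1 and
        the (M, 3) array of 3D points at t+1 lying in the patch's footprint."""
        n, d = plane_t                      # plane at t: n.p + d = 0
        R, t = T[:3, :3], T[:3, 3]
        n, d = R @ n, d - (R @ n) @ t       # the plane after the rigid motion
        s = np.linalg.norm(n)
        n, d = n / s, d / s                 # normalize so n.p + d is a distance
        return np.mean(np.abs(points_t1 @ n + d))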

We intend to improve our matching module using tensor voting. We can readily integrate into our method a 3D scene registration algorithm that combines the expressive power of geometric algebra with the robustness of tensor voting to find the correspondences between two 3D point sets undergoing a rigid transformation [25].

VI. EXPERIMENTAL RESULTS

We have implemented the proposed algorithms using a stereo camera mounted on the end-effector of an arm in an eye-on-hand configuration. The 3D point cloud and the 2D reference image of the workspace can be acquired from the stereo camera on the fly. Fig 12 shows the "Bumblebee" stereo camera used for the experiments and the eye-on-hand configuration.

The algorithms consist of two major tasks: (i) planar patch segmentation and (ii) 3D scene registration. Figs 13 and 14 show the experimental results for each of these two tasks. As shown in Figs 13 and 14, both the reference image and the range data are used to find vertical planar patches. The middle row of both figures illustrates the matching of planar patches between consecutive views. The relative motion is then calculated (see the bottom of Figs 13 and 14). Our method was tested on thirteen consecutive scenes in both high-texture (Fig 13) and low-texture (Fig 14) environments.

The computational time mainly depends on the number of regions, because all of the suggested approaches are applied iteratively to every region.

Table 1 shows the average computational time of the proposed modules. The computational time of the algorithms increases linearly with the number of regions. The performance of the segmentation module is also affected by the size of the segmented regions, as the region size determines the resolution of the accumulator image in the HT.

The performance of the proposed algorithms could be further improved by utilizing GPUs (Graphics Processing Units), which support parallel processing.

Because of the assumptions we make, our approach can fail in environments with very few vertical planar faces. The method also relies on a good ground registration transformation, and may fail if this is not available.

Fig 12. Bumblebee stereo camera (left) and eye-on-hand configuration (right)

Fig 13. A result of scene registration (high-texture environment)


Fig 14. A result of scene registration (low-texture environment). Bottom: registered scene views.

Table 1. Computational time

                                              High texture    Low texture
    Region segmentation (mean shift)          780 ms          855 ms
    Planar patch extraction + hole filling    627 ms          650 ms
    3D scene registration                     348 ms          48 ms

VII. CONCLUSION AND FUTURE WORK

We have presented two effective and novel modules using both intensity and range data captured by a stereo camera: (i) planar patch segmentation and (ii) scene registration. Planar patches are extracted using a well-known region segmentation algorithm together with range data. After planar patches are found, they are used to recover a dense 3D point cloud on the planar patches. We also introduced a new method for 3D scene registration using planar patches.

The performance and robustness of the proposed approach were evaluated on real data, and the experimental results demonstrate that the proposed methods efficiently provide planar patches and the relative motion between views.

Future work addresses the following issues. The current implementation is not real-time; we expect a performance improvement from implementing the proposed algorithms on GPUs. Another research plan is to integrate tensor voting into the proposed methods to enhance their stability and robustness.

VIII. ACKNOWLEDGMENT

This research was performed for the Intelligent Robotics Development Program, one of the 21st Century Frontier R&D Programs funded by the Ministry of Commerce, Industry and Energy of Korea.

IX. REFERENCES

[1] D. Hähnel, W. Burgard, and S. Thrun, "Learning Compact 3D Models of Indoor and Outdoor Environments with a Mobile Robot," Robotics and Autonomous Systems, vol. 44, no. 1, 2003, pp. 15-27.
[2] M. A. Fischler and R. C. Bolles, "Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography," Communications of the ACM, vol. 24, 1981, pp. 381-395.
[3] A. Nüchter, H. Surmann, and J. Hertzberg, "Automatic model refinement for 3D reconstruction with mobile robots," in Proceedings of the 4th IEEE International Conference on Recent Advances in 3D Digital Imaging and Modeling (3DIM'03), 2003, pp. 394-401.
[4] A. Nüchter, H. Surmann, K. Lingemann, and J. Hertzberg, "Semantic Scene Analysis of Scanned 3D Indoor Environments," in Proceedings of the 8th International Fall Workshop on Vision, Modeling, and Visualization (VMV'03), 2003, pp. 19-21.
[5] Y. Liu, R. Emery, D. Chakrabarti, W. Burgard, and S. Thrun, "Using EM to Learn 3D Models of Indoor Environments with Mobile Robots," in Proceedings of the International Conference on Machine Learning (ICML'01), 2001.
[6] R. Lakaemper and L. J. Latecki, "Using Extended EM to Segment Planar Structures in 3D," in Proceedings of the 18th International Conference on Pattern Recognition (ICPR'06), 2006, pp. 1077-1082.
[7] S. Lee, D. Jang, E. Kim, S. Hong, and J. Han, "A Real-Time 3D Workspace Modeling with Stereo Camera," in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS'05), 2005, pp. 2140-2147.
[8] D. Murray and J. J. Little, "Environment modeling with stereo vision," in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS'04), 2004, pp. 3116-3122.
[9] D. Cobzas and H. Zhang, "Planar Patch Extraction with Noisy Depth Data," in Proceedings of the 3rd International Conference on 3-D Digital Imaging and Modeling (3DIM'01), 2001, pp. 240-245.
[10] P. Besl and N. McKay, "A method for registration of 3D shapes," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 14, 1992, pp. 239-256.
[11] Y. Chen and G. Medioni, "Object modeling by registration of multiple range images," Image and Vision Computing, vol. 10, no. 3, 1992, pp. 145-155.
[12] Z. Zhang, "Iterative point matching for registration of free-form curves and surfaces," International Journal of Computer Vision, vol. 13, no. 2, 1994, pp. 119-152.
[13] S. Rusinkiewicz and M. Levoy, "Efficient variants of the ICP algorithm," in Proceedings of the 3rd International Conference on 3-D Digital Imaging and Modeling (3DIM'01), 2001, pp. 145-152.
[14] H. Surmann, A. Nüchter, and J. Hertzberg, "An autonomous mobile robot with a 3D laser range finder for 3D exploration and digitalization of indoor environments," Robotics and Autonomous Systems, vol. 45, no. 3-4, December 2003, pp. 181-198.
[15] D. Huber, O. Carmichael, and M. Hebert, "3-D Map reconstruction from range data," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA'00), 2000, pp. 891-897.
[16] M. A. Garcia and A. Solanas, "3D Simultaneous localization and modeling from stereo vision," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA'04), 2004, pp. 847-853.
[17] S. Se, D. Lowe, and J. Little, "Vision-based mapping with backward correction," in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS'02), 2002, pp. 153-158.
[18] W. He, W. Ma, and H. Zha, "Automatic Registration of Range Images Based on Correspondence of Complete Plane Patches," in Proceedings of the 5th International Conference on 3-D Digital Imaging and Modeling (3DIM'05), 2005, pp. 470-475.
[19] S. Kim and W. Woo, "Projection-based Registration Using a Multi-view Camera for Indoor Scene Reconstruction," in Proceedings of the 5th International Conference on 3-D Digital Imaging and Modeling (3DIM'05), 2005, pp. 484-491.
[20] J. Davis, S. R. Marschner, M. Garr, and M. Levoy, "Filling Holes in Complex Surfaces using Volumetric Diffusion," in Proceedings of the First International Symposium on 3D Data Processing, Visualization, and Transmission, 2002, pp. 428-861.
[21] P. Liepa, "Filling Holes in Meshes," in Proceedings of the Eurographics/ACM SIGGRAPH Symposium on Geometry Processing, 2003, pp. 200-205.
[22] D. Comaniciu and P. Meer, "Mean shift: A robust approach toward feature space analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, 2002, pp. 603-619.
[23] R. O. Duda and P. E. Hart, "Use of the Hough Transformation to Detect Lines and Curves in Pictures," Communications of the ACM, vol. 15, January 1972, pp. 11-15.
[24] B. S. Morse, Lecture 15: Segmentation (Edge based, Hough Transform), Brigham Young University, 1998-2000.
[25] L. Reyes, G. Medioni, and E. Bayro, "Registration of 3D Points Using Geometric Algebra and Tensor Voting," International Journal of Computer Vision, to appear.
[26] D. G. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints," International Journal of Computer Vision, vol. 60, no. 2, 2004, pp. 91-110.
[27] B. Delaunay, "Sur la sphère vide," Izvestia Akademii Nauk SSSR, Otdelenie Matematicheskikh i Estestvennykh Nauk, vol. 7, 1934, pp. 793-800.
