
An autonomous six-DOF eye-in-hand system for in situ 3D object modeling

The International Journal of Robotics Research 31(1) 82–100
© The Author(s) 2011
Reprints and permission: sagepub.co.uk/journalsPermissions.nav
DOI: 10.1177/0278364911425836
ijr.sagepub.com

Liila Torabi and Kamal Gupta

Abstract
We present an integrated and fully autonomous eye-in-hand system for 3D object modeling. The system hardware consists of a laser range scanner mounted on a six-DOF manipulator arm, and the task is to autonomously build a 3D model of an object in situ, where the object may not be moved and must be scanned in its original location. Our system assumes no knowledge of object shape or geometry other than that it is within a bounding box whose location and size are known a priori; furthermore, the environment is unknown. The overall planner integrates the three main algorithms in the system: one that finds the next best view (NBV) for modeling the object; one that finds the NBV for exploration, i.e. exploring the environment, so the arm can move to the modeling view pose; and finally a sensor-based path planner that is able to find a collision-free path to the view configuration determined by either of the two view planners. Our modeling NBV algorithm efficiently searches the five-dimensional view space to determine the best modeling viewpoint, while considering key constraints such as field of view (FOV), overlap, and occlusion. If the determined viewpoint is reachable, the sensor-based path planner determines a collision-free path to move the manipulator to the desired view configuration, and a scan of the object is taken. Since the workspace is initially unknown, in some phases the exploration view planner is used to increase information about the reachability and the status of the modeling view configurations, since a view configuration may lie in unknown workspace. This is repeated until the object modeling is complete or the planner deems that no further progress can be made, and the system stops. We have implemented the system with a six-DOF powercube arm and a wrist-mounted Hokuyo URG-04LX laser scanner. Our results show that the system is able to autonomously build a 3D model of an object in situ in an unknown environment.

Keywords
Path planning for manipulators, range sensing

1. Introduction

Automated model acquisition of 3D objects is an important capability in a variety of application domains, including automatic inspection, manipulation, reverse engineering, and service robotics. Such an automated system, in our opinion, comprises three broad modules: (i) 3D model building, which primarily concerns itself with the scanning and modeling of the object, and essentially deals with machine vision/graphics aspects; (ii) view planning, i.e. determining the scanner pose from which to scan the object, which has both vision and robotics aspects; and (iii) moving the scanner positioning mechanism (a robot or a turntable) to desired scanning poses, which brings in the path planning aspect.

The pure vision/graphics aspects are encapsulated in the 3D model building cycle, shown in the left block in Figure 1, consisting of three main phases: (a) scan, (b) register, and (c) integrate. A scan is taken from a camera pose, and acquired range images are registered using standard techniques such as the Iterative Closest Point (ICP) method (Besl and McKay 1992; Rusinkiewicz and Levoy 2001). Finally, registered images are combined into a single object model, a process commonly known as integration. The cycle is iterated until a complete model is built or some termination criteria are satisfied. An example of this would be a person using a handheld scanner, such as a Polhemus FastSCAN (Polhemus 2011), to take multiple scans, with the 3D model building cycle being carried out by the system. Note that steps (ii) and (iii) are carried out by the user and there is no autonomy here.
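The registration step can be made concrete with a minimal sketch of point-to-point ICP. This is a generic illustration using NumPy and SciPy, not the authors' implementation; the function names icp_step and register are hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree

def icp_step(src, dst_tree, dst):
    """One point-to-point ICP iteration: match each source point to its
    nearest model point, then solve for the rigid transform (R, t)."""
    _, idx = dst_tree.query(src)              # nearest neighbor in dst for each src point
    matched = dst[idx]
    src_c, dst_c = src.mean(axis=0), matched.mean(axis=0)
    H = (src - src_c).T @ (matched - dst_c)   # cross-covariance of centered point sets
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                  # guard against a reflection
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = dst_c - R @ src_c
    return R, t

def register(src, dst, iters=30):
    """Align a new scan (src) to the current object model (dst)."""
    dst_tree = cKDTree(dst)
    src = src.copy()
    for _ in range(iters):
        R, t = icp_step(src, dst_tree, dst)
        src = src @ R.T + t
    return src
```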

School of Engineering Science, Simon Fraser University, Burnaby, Canada

Corresponding author:
Liila Torabi, Robotic Algorithms & Motion Planning (RAMP) Lab, School of Engineering Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada.
Email: [email protected]


[Figure 1 block diagram: the 3D modeling cycle (Scan, Register, Integrate) coupled with Plan the Next Best View (NBV), Get view configuration via Inverse Kinematics, Plan a Path for the Robotic Positioning System, Move to the viewpoint, and Update the W-space & C-space.]

Fig. 1. Components of a fully autonomous 3D object acquisition system (* represents path planning aspects).

To automate this process, step (ii), i.e. view planning (also called next best view or NBV), and step (iii), i.e. path planning, are added as shown in Figure 1. The next best view problem is to determine the best scanner pose from which to scan the object. The primary purpose of an NBV algorithm is to ensure that the entire surface of an object will be scanned. The preference is to minimize the number of scans taken to build the model, since scanning, processing the scans, and integrating them into one object model is often quite time consuming. This step, however, obviously depends on the positioning mechanism on which the scanner is mounted. Most published works in computer vision plan the next best view based on large and fixed setups that possess limited mobility, e.g. a turntable.¹ The viewpoint space is mostly limited to one or two degrees of freedom (DOFs), and at best provides limited quality/coverage models for a restricted class of objects. They also all assume that the desired viewpoint is reachable. For a truly automated system, particularly for in situ objects, where the object must be scanned in its place (e.g. imagine an objet d'art in a museum that cannot be moved from its display location), we argue that there is a need for at least a six-DOF positioning mechanism, e.g. a manipulator arm, so that the object can be viewed from an arbitrary viewpoint, significantly increasing the likelihood that a complete model of the object can be obtained. The challenge, of course, is to keep the NBV computationally feasible, since the space of possible viewpoints increases tremendously. We show in this work that such a goal is indeed achievable with efficient NBV algorithms coupled with the great increase in computing power of standard PCs in recent years. Furthermore, using a six-DOF (or even higher-DOF) manipulator as a scanner positioning device necessitates path planning to move the robot while avoiding collisions with obstacles, which must also be sensed by the robot, since the environment is initially unknown. While such coupling has been explored in site modeling works (Sequeira et al. 1996; Blaer and Allen 2007), the "positioning mechanism" there is a mobile robot and, in addition, the NBV problem has a different flavor in those works. Along another line, several works have proposed and implemented sensor-based path planners for eye-in-hand systems in unknown environments, but these works did not consider object modeling and the associated constraints such as overlap between images (Kruse et al. 1996; Renton et al. 1999; Yu and Gupta 2004). Such sensor-based path planners, as in Yu and Gupta (2004), use a different type of view planner, an exploration view planner, which plans views to explore the environment so that the robot can move around. For instance, in our context, the best view configuration determined for scanning the object may itself be unreachable because much of the environment is unknown to the robot, or a part of the configuration itself may lie in unknown space. In such cases, another view must be planned to make more of the workspace known. (This exploration view planner is embedded within the path planner in Figure 1 and is not explicitly shown so as not to clutter the diagram.)

More recently, motivated by the need for autonomy, several works in the humanoid and service robotics area have addressed integrating object recognition, path planning, and grasping, such as the DESIRE project of the European Union (Kuehnle et al. 2009; DESIRE 2011). However, in these systems the main task usually involves grasping or recognition, and thus they exploit pre-built object models to find a suitable viewpoint for either grasping or recognizing (Eidenberger et al. 2009). Furthermore, there is no tight coupling between moving the sensor and moving the arm, as is the case in our system, where the sensor is mounted on the arm. As a result there is no view planning for exploration involved, whereas it is an integral component of our overall system; in fact, this additional maneuverability is a key advantage of our system. Foissotte et al. (2010) present an object modeling system via a head-mounted stereo range camera on a humanoid. Feasible postures of the humanoid are generated via a two-step NBV algorithm (see Section 2 for more on it). A key difference from our work is that the environment (other than the object to be modeled) is considered known and hence there is no view planning for exploration; furthermore, no path planning is integrated in their system.

In this paper we present integrated view planning and motion planning for a six-DOF eye-in-hand object modeling system. Two key contributions of our work are (i) a modeling NBV algorithm that efficiently searches the five-dimensional view space (because of the symmetric conical shape of the scanner field of view, the scanner pose needs only 2 DOFs in orientation, pitch and yaw; roll is immaterial) to determine the best modeling viewpoint, and (ii) integrating the modeling NBV with an exploration NBV (Torabi et al. 2007), along with a sensor-based motion planner (SBIC-PRM: Yu and Gupta 2004), to make the desired modeling view pose reachable, if required, and to plan a collision-free path to it. In our modeling NBV, the viewpoint space representation allows us to map the viewing directions for the target areas simply by defining a projection function. Our resulting system is a fully autonomous system capable of modeling in situ objects without any human intervention. In addition, we develop and prove an automatic termination criterion that can determine whether the object model is complete. We show that the system is able to autonomously scan an object while avoiding collisions with unknown obstacles. Each iteration, including all view and path planning and the workspace update, takes about 4 minutes with the current system on a relatively old machine. However, with several efficiencies incorporated in the algorithms, we are confident that these runtimes can be brought down significantly. To the best of our knowledge this is the first work that considers both view and path planning aspects for a six-DOF fully autonomous 3D object modeling system.

2. The NBV problem for object modeling: background and related work

The view planning problem for object modeling has been well studied, and this section is a brief overview of the problem, summarized from existing literature, e.g. Scott et al. (2003). The object surface is analyzed to determine where to scan next, then a viewpoint which can scan that area is computed, and finally the positioning system should move the scanner to that pose, if possible. Thus, the problem involves reasoning about the state of knowledge of three spaces: object surface S, viewpoint space V, and scanner/imaging workspace W. The object surface, S, is a 2D parameterized space, analyzed to determine the target areas for the next scan. The viewpoint space, V, defines the set of possible viewpoints, which in the general case is six-dimensional. The scanner/imaging workspace, W, is the 3D region in which the positioning device moves and the laser camera takes scans, and in which visibility and collision constraints are addressed.

The modeling NBV is essentially composed of the following two main subproblems (Pito 1999):

1) What to scan, i.e. determining the target areas to be scanned, and their corresponding viewing directions. The goal is to identify and scan previously unseen portions of the scanner/imaging workspace, which possibly contain parts of the object surface. A common approach is to use range discontinuities, which indicate target surface areas not yet scanned and thus provide cues for the viewing direction. The target area could be either a volumetric element (octree node or voxel) or a surface element. If the target area is a surface element, the viewing direction is usually calculated by some analysis of the target area, such as normal estimation, but when the target area is a volumetric element, the viewing direction essentially points to a scanner position in V.

2) How to scan, i.e. specifying which target areas are visible from each viewpoint in V, given the scanner model (often called the scanner constraint). A brute-force method for such visibility checks would be ray-tracing. Note that occlusion (the visibility constraint) by other objects (or by parts of the same object for non-convex surfaces) also needs to be taken into account. Additional constraints, such as overlap with already scanned parts between successive views (for registration), also need to be considered for viewpoint selection. These constraints involve some form of ray-tracing, and can be computationally intensive. Hence a key challenge, particularly for a high-dimensional viewpoint space, as in our case, is to keep the computation fast and efficient to retain the online nature of the system.

The existing view planning methods can be classified mainly as volumetric (Connolly 1985; Reed et al. 1997; Papadopoulos-Orfanos and Schmitt 1997; Massios and Fisher 1998; Banta et al. 2000) or surface-based (Maver and Bajcsy 1993; Whaite and Ferrie 1997; Pito 1999; Chen and Li 2005; He and Li 2006; Chen et al. 2008). Volumetric methods select viewpoints by reasoning about the knowledge of W, whereas surface-based methods reason about knowledge of S. We also classify these methods based on the degrees of freedom of the viewpoint space V, and on whether the viewpoint is calculated by searching V based on the entire set of target areas (global), or is computed for one target area at a time (local).

One of the earliest papers, Connolly (1985), used an octree representation of the imaging workspace with empty, occupied, and unseen nodes. The viewpoint space V is 2-DOF, the surface of a sphere with the view direction normal to the sphere. Two algorithms were developed: the first uses a global search of V in which the viewpoint with the largest unseen volume is selected, determined via ray-tracing for all viewpoints on the sphere. Direct extension of this method to high-dimensional view spaces would be computationally expensive. The second algorithm is incremental, and carries out visibility analysis in a local manner by examining faces in the octree common to both unseen and empty voxels. Banta et al. (2000) choose the viewpoint with the greatest number of unknown voxels. Massios and Fisher (1998) add a quality factor (the angle between the estimated local surface normal and the viewing direction) to enhance the previous voxel occupancy methods.

Maver and Bajcsy (1993) presented an NBV algorithm for a 1-DOF viewpoint space (a turntable). Unoccluded view angles for target regions were determined via ray-tracing and accumulated in a histogram, and the next best view was determined from the histogram maxima. Pito's work (Pito 1999) was a significant contribution to the NBV literature. He chose small rectangular patches attached to occluding edges of the seen surface as a cuing mechanism for a global search of the viewpoint space V. He used an intermediate "positional space" (PS), placed between the object and the scanner workspace. It facilitates a succinct representation of what must be scanned (via the use of observation rays) and what can be scanned (via the use of ranging rays) in a single data structure. For the definitions of observation ray and ranging ray, please refer to Section 3.3. He used a turntable (1-DOF viewpoint space) as a positioning system, with 72 possible viewpoints. In his work, the memory usage and processing time are both proportional to the product of the size of the intermediate positional space and the size of the viewpoint space. Thus, clearly, directly extending his method to a high-DOF viewpoint space would be computationally expensive in both memory usage and processing time.

In our NBV algorithm, the observation rays of target areas are directly projected onto the viewpoint space V, without any use of the intermediate positional space. The visibility of the target areas over the whole viewpoint space is then efficiently computed through a two-level (position followed by orientation) projection computation. This technique allows us to efficiently determine the global maximum over the densely sampled five-DOF viewpoint space.

Several other works search the viewpoint space locally; as a result, the number of scans required, and hence the time taken to build the model, could be quite large. For instance, Whaite and Ferrie (1997) used the uncertainty in the object model (built using superellipsoids) as a guide for taking the next view. Chen and Li (2005) predict the curvature of the unknown area of the object and use it to drive the exploration direction. Their method works mostly for smooth surfaces, and has problems with objects that have sharp boundaries. Although a 6-DOF viewpoint space is used, the algorithm is local, so no search is involved, and no path planning is considered. He and Li (2006) select as the next best view position the viewpoint that gives the maximum boundary integral value of the vector fields over the known partial model. In their work the viewpoints are located on the surface of a sphere pointing to the center.

More recently, as mentioned earlier, in Foissotte et al. (2010) the object is modeled via a head-mounted stereo range camera on a humanoid. The NBV algorithm is a two-step process. The first step takes into account several constraints due to the camera and the humanoid head, and minimizes a composite objective function that incorporates these constraints. A local quadratic approximation through deterministic iterative sampling is carried out at a pre-selected sparse set of view positions distributed around the object, and the minimum over all these runs is selected as the best camera pose. The second step takes into account posture constraints to generate a suitable posture of the humanoid. Such local minimization techniques are prone to getting stuck in local minima, particularly for complex objective functions, and fixed sampling may outperform them (as the authors themselves point out).

3. Our modeling NBV algorithm

3.1. Notation

A brief explanation of our notation is as follows. Capital roman letters such as V, W, S represent sets or spaces, and lower case roman letters represent their corresponding elements. Generally, subscript o denotes orientation and subscript p denotes position. So, for example, a viewpoint is denoted by v, which consists of vp, the viewpoint position, and vo, the viewpoint orientation.

3.2. Representation of spaces W and S

We represent the workspace, W, as an octree, primarily because it facilitates occlusion checks in view planning and collision checks in path planning. The octree model is constructed from the range image, as in Yu and Gupta (1998). Briefly, the voxel hit by a ranging ray is categorized as obstacle, the intermediate voxels along the ranging ray are marked as free, and the region behind the hit voxel keeps its previous status.
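A minimal sketch of this update rule is given below, using a dense voxel grid in place of the octree for brevity; the ray-stepping scheme and the function name update_from_ray are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

UNKNOWN, FREE, OBSTACLE = 0, 1, 2
RES = 0.02                       # voxel edge length in meters (2 cm, as in Section 6.1)

def update_from_ray(grid, origin, hit, max_range=3.0):
    """Update the workspace grid along one ranging ray: voxels between the
    scanner origin and the returned range become FREE, the voxel containing
    the hit point becomes OBSTACLE, and voxels behind the hit are untouched.
    Assumes workspace coordinates are non-negative and grid-aligned."""
    direction = hit - origin
    dist = np.linalg.norm(direction)
    if dist == 0 or dist > max_range:
        return
    direction = direction / dist
    # step through the ray at half-voxel resolution and mark free space
    for s in np.arange(0.0, dist, RES / 2):
        i, j, k = ((origin + s * direction) / RES).astype(int)
        if grid[i, j, k] != OBSTACLE:
            grid[i, j, k] = FREE
    i, j, k = (hit / RES).astype(int)
    grid[i, j, k] = OBSTACLE
```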

The seen part of the object's surface, S, is represented with a point cloud, which facilitates registration and also subsequent mesh generation to display the constructed object model. It also aids in determining surface normals, needed for determining valid viewing directions for target areas. Please note that since the object (whose model is to be constructed) is also part of the workspace, it is also represented in the workspace octree. This second representation is used for self-occlusion checks and to verify whether a target point has been scanned before.

3.3. Target patches, laser scanner model, viewpoint space

Each target point, tpi, is a point from the unknown region bordering the seen part of the object surface. The small surface patch associated with each target point tpi, together with its normal vector ni, is called the target patch, and is a primary estimate of an unseen part of the object which we wish to view in the next scan. Although the best viewing direction for each target patch is the normal vector ni, the patch is also scannable from directions that form an angle less than or equal to the breakdown angle (θb) with the normal vector of that surface patch.² A ray or viewing direction, or, which emanates from tpi and makes an angle of at most θb with ni, is an observation ray for tpi. The set of all observation rays for each target patch will be referred to as its observation cone, and it represents the set of rays that can scan tpi.

We can look at the range sensing process either from the "sensed surface" perspective or from the laser scanner perspective. From the scanner perspective, the range image obtained in each scan consists of the readings of a set of ranging rays³ emanating from the origin of the scanner frame, within the field of view (FOV) of the scanner.


Fig. 2. Target points are a set of points on the occlusion surface Socc within a distance dres from the scanned surface, such as tp1near and tp1far.

The scanner field of view is a square pyramid, but we approximate it as a cone (with origin at OFOV and apex angle θFOV); hence the roll angle around the cone axis does not change the view, and the viewpoint space V is therefore five-dimensional: 3 DOFs for the position of the cone apex, and 2 for the orientation (pitch and yaw angles).

V is discretized as follows: concentric spherical surfaces, denoted by VPSi, i = 1 . . . m (in our case m = 4), are used to represent the viewpoint position space, with each spherical surface discretized into 32 × 32 patches. At each viewpoint position, vp = (rp, θp, φp), a unit spherical surface defines the set of viewpoint orientations, and is discretized to a 32 × 32 resolution. Each viewpoint orientation (corresponding to the axis of the FOV cone) is represented by a 2-tuple vo = (θo, φo). As a result, V has 32 × 32 × 32 × 32 × 4 = 4.19 × 10⁶ distinct viewpoints, and every viewpoint is represented as a five-tuple v = (vp, vo) = (rp, θp, φp, θo, φo).
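A small sketch of this discretization is shown below; the spherical-coordinate convention and array layout are illustrative assumptions, but the sizes match the 32 × 32 × 32 × 32 × 4 ≈ 4.19 × 10⁶ viewpoints described above.

```python
import numpy as np

M = 4                                  # number of concentric viewpoint-position spheres
RADII = [0.30, 0.40, 0.50, 0.60]       # sphere radii in meters (as in Section 6.1)
N = 32                                 # angular resolution per axis

# Discretized position space: for each sphere, a 32 x 32 grid of (theta_p, phi_p) patches.
thetas = np.linspace(0.0, np.pi, N, endpoint=False)
phis = np.linspace(0.0, 2.0 * np.pi, N, endpoint=False)

# Discretized orientation space: a 32 x 32 grid of FOV-cone axis directions (pitch, yaw).
pitches = np.linspace(0.0, np.pi, N, endpoint=False)
yaws = np.linspace(0.0, 2.0 * np.pi, N, endpoint=False)

num_viewpoints = M * N * N * N * N
print(num_viewpoints)                  # 4194304, i.e. about 4.19e6 distinct viewpoints

def viewpoint(sphere, i, j, k, l):
    """Return the five-tuple v = (r_p, theta_p, phi_p, theta_o, phi_o)."""
    return (RADII[sphere], thetas[i], phis[j], pitches[k], yaws[l])
```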

3.4. What to scan

In a range image, the set of range discontinuities are key events, because the surfaces defined by these discontinuous ranges, known as occlusion surfaces, are the border between seen and unseen areas. These discontinuities could be caused by (i) non-convex areas, (ii) a surface edge in a non-smooth object, or (iii) a ranging ray tangential to the object surface. In Figure 2a, we illustrate a discontinuity as ray Ri hits the object at p1near and a neighboring ray Ri+1 hits the object at p1far. The line connecting p1near and p1far is an occlusion line, and a contiguous set of occlusion lines is called an occlusion surface, Socc. Any point on Socc could be considered as a target point, but we only keep k points, at uniform resolution, within distance dres of the scanned surface, such as tp1near and tp1far, as shown in Figure 2b. Key reasons for this choice are: (i) since the object's surface must continue into the unseen volume (from the boundary of the seen surface), this is a good place to look for more of the object's surface; (ii) choosing target points close to a seen surface facilitates estimation of surface normals, needed for choosing viewing directions (explained later); and (iii) it automatically satisfies the overlap constraint, essential for registration. Variations on this theme could use more points on the occlusion surface, Socc; in that case the viewing direction will be normal to the occlusion surface. This would require additional explicit checks for the overlap constraint.

After the list of target points is extracted, we need to estimate their corresponding surface normals, which are needed to determine their valid viewing directions. To estimate the surface normal for each target point, tpi, first the neighboring points for tpi on the object point cloud and/or occlusion surface are computed by building a kd-tree, and then the normal vector for the surface estimated from these neighboring points is calculated using a principal component analysis (PCA) method (Hoppe et al. 1992; Dey et al. 2005).

The neighborhood points for surface normal estimation are chosen differently for tpnear and tpfar, as follows. For tpfar, where a part of the object surface is not scanned because of occlusion, the unseen surface of the object close to the seen border can be estimated by continuing the seen surface. Therefore, the points from the object point cloud that lie within the neighborhood radius of a target point form a local estimate of its target patch. To do this efficiently, we build a kd-tree for the point cloud data, and search it to identify neighboring points for tpfar (Atramentov and LaValle 2002).

For tpnear, as shown in Figure 2b, it is less likely that we can accurately estimate the surface based on its neighbors in the point cloud model, since the seen points in the point cloud are not sufficient to calculate a good estimate of the unseen surface. Hence, for calculating the normal vector of the surface patch, the kd-tree is built for the union of the points in the point cloud model and the points on Socc (where the unknown occluded surface lies).
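The kd-tree plus PCA normal estimation can be sketched as follows; this is a generic implementation of the standard technique (Hoppe et al. 1992), with the neighborhood size k and the function name estimate_normal chosen for illustration.

```python
import numpy as np
from scipy.spatial import cKDTree

def estimate_normal(target, cloud, tree, k=20):
    """Estimate the surface normal at 'target' from its k nearest neighbors
    in 'cloud' via PCA: the eigenvector of the neighborhood covariance with
    the smallest eigenvalue approximates the normal direction."""
    _, idx = tree.query(target, k=k)
    nbrs = cloud[idx]
    cov = np.cov((nbrs - nbrs.mean(axis=0)).T)
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
    return eigvecs[:, 0]                      # direction of least variance

# Usage (variable names are placeholders): tp_far uses the object point cloud
# alone; tp_near uses the union of the point cloud and the occlusion-surface
# points, as described above.
#   cloud = np.vstack([object_points, occlusion_points])
#   tree = cKDTree(cloud)
#   n = estimate_normal(tp_near, cloud, tree)
```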


At the end of this step we will have a list of target patches, obtained by storing each tpi and its corresponding normal direction ni in a list (see Algorithm 1).

Algorithm 1: Target patch list generator

Input: Last scanned range image, list of target patches
Output: The updated list of target patches
Extract the range discontinuities in the scanned image;
for every ith discontinuity do
    Determine pinear and pifar;
    if there is no pifar then
        Compute the occlusion line from OFOV to pinear;
        Calculate tpinear;
    else
        Compute the occlusion line from pinear to pifar;
        Calculate tpinear and tpifar;
    end
end
Store all tpi in a list;
Build a kd-tree for the object point cloud as Kdo;
for every tpinear in the list do
    Compute the closest neighbors for tpinear on Kdo;
    Estimate the patch normal ninear using the PCA method;
end
Build a kd-tree for the union of the object point cloud and the target points as Kdot;
for every tpifar in the list do
    Compute the closest neighbors for tpifar on Kdot;
    Estimate the patch normal nifar using the PCA method;
end
Add all the target patches (tpi, ni) to the existing list;
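To make the first loop of Algorithm 1 concrete, the following sketch detects range discontinuities along one scan line and generates the near/far target points; the discontinuity threshold, the scanner-at-origin convention, and the helper name extract_targets are assumptions made for illustration.

```python
import numpy as np

def extract_targets(points, d_jump=0.05, d_res=0.02):
    """Scan one row of a range image (an array of 3D points in the scanner
    frame, ordered by ray index, scanner at the origin) and return target
    points near each range discontinuity, offset by d_res along the
    occlusion line."""
    targets = []
    for i in range(len(points) - 1):
        p_a, p_b = points[i], points[i + 1]
        jump = np.linalg.norm(p_b - p_a)
        if jump > d_jump:                       # discontinuity between rays i and i+1
            # the nearer point (smaller range) plays the role of p_near
            p_near, p_far = (p_a, p_b) if np.linalg.norm(p_a) < np.linalg.norm(p_b) else (p_b, p_a)
            u = (p_far - p_near) / jump         # unit vector along the occlusion line
            targets.append(p_near + d_res * u)  # tp_near: just beyond the near surface
            targets.append(p_far - d_res * u)   # tp_far: just in front of the far surface
    return targets
```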

3.5. How to scan

Having computed the set of target patches, for each patch (tpi, ni) in this set the valid viewpoints from which tpi is visible are computed. For a patch centered at tpi with normal ni, the corresponding observation cone (recall that it is a cone with apex located at tpi and oriented along ni, with an apex angle of θb) defines the set of possible viewpoint positions Vpi from which tpi may be observed. Then, for each valid viewpoint position, the set of valid orientations, Vopi, is limited by the scanner field of view.

Note that the following calculations are for one viewpoint position sphere, VPS, and are repeated for all VPSi, i = 1 . . . m.

First, a base planar surface patch is defined at the origin of the viewpoint space as {z = 0, −ε < x < ε, −ε < y < ε}, along with the corresponding base observation cone with apex located at the origin and oriented along the z-axis.

Fig. 3. Calculating the set of viewpoints that sample tpi: (b) illustrates the observation cone which determines the set of valid viewpoint positions for tpi, and (c) shows the orientation cone for one of the valid viewpoint positions, vpi, which determines the set of valid orientations for this viewpoint position.

As shown in Figure 3a, the intersection of this cone with VPS defines a spherical cap, every point on which is a possible viewpoint position for the base surface patch. The silhouette of this spherical cap is a circle, and is stored as a list of points, Vbsil.

The set of possible viewpoint positions for any target patch (tpi, ni) is given by the intersection of its observation cone and VPS, again a spherical cap, as shown in Figure 3b. The silhouette of this cap, Visil, is easily calculated by applying a transformation to Vbsil: a translation by tpi and a rotation by the Euler rotation matrix that takes the z-axis to the ni vector. To calculate the points inside the cap, all visil = (r, θ, φ) ∈ Visil (r being the radius of VPS) are first sorted according to their φ angle. For every φ, the corresponding θ interval is discretized, thus creating a list of viewpoint positions Vpi to scan target patch tpi.

These potential viewpoint positions are now checked for occlusion, and only the non-occluded ones are kept. Note that this leads to significant computational efficiency: as the occluded view positions are discarded, all associated view directions are also automatically discarded and need not be calculated.

For each non-occluded candidate viewpoint position vpi = (rpi, θpi, φpi) associated with the target patch tpi, the ranging ray rvi is the line connecting vpi to tpi. All viewpoints positioned at vpi and oriented such that rvi lies inside the scanner FOV cone are viewpoint candidates for scanning tpi. As shown in Figure 3c, the set of valid orientations for each viewpoint position vpi is a cone with apex located at vpi, axis along rvi, and apex angle θFOV, and is called the orientation cone.

To determine this set efficiently, we define a base gaze cone with apex angle θFOV, centered at the origin (similar to the base observation cone), and the points on the silhouette of the corresponding spherical cap are stored as Vbgsil. The set of valid orientations is then easily determined by applying a transformation to Vbgsil: a rotation followed by a translation by vpi. The rotation matrix is the one that rotates the z-axis to rvi. In this manner the set of viewpoints which can scan tpi is determined.
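The key geometric step above, rotating a precomputed base silhouette so that the z-axis maps onto a given direction and then translating it, can be sketched as follows; rotation_to and transform_silhouette are illustrative helper names, not the paper's code.

```python
import numpy as np

def rotation_to(n):
    """Rotation matrix that maps the z-axis onto the unit vector n
    (Rodrigues' formula about the axis z x n)."""
    z = np.array([0.0, 0.0, 1.0])
    n = n / np.linalg.norm(n)
    v = np.cross(z, n)
    c = float(np.dot(z, n))
    if np.isclose(c, 1.0):            # already aligned with z
        return np.eye(3)
    if np.isclose(c, -1.0):           # antiparallel: rotate pi about the x-axis
        return np.diag([1.0, -1.0, -1.0])
    K = np.array([[0, -v[2], v[1]],
                  [v[2], 0, -v[0]],
                  [-v[1], v[0], 0]])
    return np.eye(3) + K + K @ K / (1.0 + c)

def transform_silhouette(base_silhouette, direction, offset):
    """Map the base cap silhouette (points of the cone aligned with z at the
    origin) to the cap of a cone with axis 'direction' and apex at 'offset'."""
    R = rotation_to(direction)
    return base_silhouette @ R.T + offset
```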

After the set of candidate viewpoints for every target point has been identified, the viewpoints are sorted by the number of visible target points and stored in a list. The viewpoint which scans the maximum number of target points is chosen as the next viewpoint for modeling.

The overall modeling NBV algorithm at each iteration is summarized in Algorithm 2. A key aspect of our algorithm is that by projecting the observation cone of each target point onto the viewpoint space, a large number of invalid viewpoints are automatically excluded from further analysis; hence we avoid ray-tracing for every viewpoint. This leads to significant computational efficiency.

4. Path planning: moving the sensor to the desired viewpoint

4.1. SBIC-PRM: sensor-based path planning for the 6-DOF manipulator

Having obtained the desired modeling viewpoint, inverse kinematics can be used to obtain the corresponding view configurations (see the overall algorithm in Section 4.3).

Algorithm 2: Modeling NBV algorithm

Input: List of target patches
Output: Sorted list of the viewpoints for the next scan
for every target patch (tpi, ni) in the list do
    for each VPSj do
        Compute Visil for tpi;
        Extract candidate viewpoint positions, Vpi;
        for every vpi ∈ Vpi do
            if CheckVisibility((tpi, vpi), octree) then
                Compute Vgisil for tpi and vpi;
                Extract valid orientations, Vopi;
                for every valid orientation voi ∈ Vopi do
                    Increment the number of visible target points for vi = (vpi, voi);
                end
            end
        end
    end
end
Sort viewpoints by the number of visible target points;
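The scoring and sorting at the heart of Algorithm 2 amounts to a counting pass over the discretized viewpoints. A minimal sketch is given below, assuming a function visible_viewpoints(tp, n) that returns the discretized (position, orientation) indices produced by the projection steps of Section 3.5; this helper is hypothetical.

```python
from collections import defaultdict

def rank_viewpoints(target_patches, visible_viewpoints):
    """Count, for every discretized viewpoint, how many target patches it can
    scan, and return the viewpoints sorted best-first."""
    counts = defaultdict(int)
    for tp, n in target_patches:
        for v in visible_viewpoints(tp, n):   # v = (sphere, i, j, k, l) index tuple
            counts[v] += 1
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)

# The first element of the returned list is the modeling NBV; the rest form the
# sorted list handed to the path planner if the best viewpoint turns out to be
# unreachable (see Section 4.3).
```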

Here we assume that the view configuration has been computed, and explain the path planning technique used. We have chosen SBIC-PRM as the path planning technique for moving the 6-DOF manipulator to the modeling configurations. A brief description of SBIC-PRM is given here; for further details, please see Yu and Gupta (2004).

The planner places a set of random samples in the configuration space (C-space) of the manipulator, and determines their collision status as free, in-collision, or unknown with respect to the workspace octree, W. The status of a sample configuration is: (i) free if the robot body does not intersect any obstacle or unknown nodes of the workspace octree; (ii) in-collision if the robot body intersects an obstacle node of the workspace octree; and (iii) unknown if the robot body intersects an unknown node but no obstacle node of the workspace octree. The free samples are added as nodes to the roadmap, the unknown samples are added to a list Lu, and the in-collision samples are discarded. Whenever a new scan is taken, the workspace octree model is updated, and the C-space roadmap is also updated by re-checking the collision status of all the samples in the list Lu: the in-collision ones are removed from the list, and the free ones become nodes in the roadmap and are checked for connectivity with their neighboring nodes (using a discretized straight-line local planner in C-space). In this way the roadmap is incrementally expanded. A key computation is the collision check of the robot body at a given configuration against the workspace octree, which is a fairly standard geometric computation (Yu and Gupta 1998).
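A compact sketch of this classification and incremental roadmap update is shown below; the helpers robot_voxels (the set of workspace cells swept by the arm at a configuration), roadmap.add_node, and try_connect are assumptions standing in for the collision checker and local planner of SBIC-PRM.

```python
FREE, UNKNOWN, OBSTACLE = "free", "unknown", "obstacle"

def classify(q, grid, robot_voxels):
    """Collision status of configuration q against the workspace model."""
    cells = [grid[v] for v in robot_voxels(q)]
    if OBSTACLE in cells:
        return OBSTACLE            # in-collision: discard
    if UNKNOWN in cells:
        return UNKNOWN             # overlaps unseen space: keep for later
    return FREE                    # add to the roadmap

def update_roadmap(roadmap, L_u, grid, robot_voxels, try_connect):
    """After each scan, re-check the unknown samples in L_u against the
    updated workspace model and grow the roadmap incrementally."""
    still_unknown = []
    for q in L_u:
        status = classify(q, grid, robot_voxels)
        if status == FREE:
            roadmap.add_node(q)
            try_connect(roadmap, q)    # straight-line local planner to neighbors
        elif status == UNKNOWN:
            still_unknown.append(q)
        # in-collision samples are simply dropped
    return still_unknown
```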


Fig. 4. A schematic illustration of NBV for exploration. In (a), two modeling view configurations qvm1 and qvm2 are shown in workspace W, where the white region denotes known free space, the gray region shows unknown space, and black regions denote known obstacles. Both view configurations overlap some unknown region in W and hence they are unreachable. The robot needs to first scan this region to determine whether it is free or obstacle, so it can decide on the reachability of these two configurations. It needs to determine which exploration view configuration (two are shown here, qve1 and qve2) is better, i.e. gives more information. In this case, clearly qve2 gives more information. Note that exploration view configurations lie in free space. The corresponding situation in C-space is shown in (b). The roadmap (free nodes and edges) is shown in white, unknown configurations are gray (including qvm1 and qvm2), and the obstacle ones are black. The collision-free path in the roadmap from the current robot configuration qcur to the chosen exploration configuration qve2 is shown in dashed white. The robot moves along this path and takes the exploration scan at qve2. The robot may need to plan and execute several such exploration scans to make a modeling view configuration reachable.

4.2. Exploration NBV planner: configuration space-based exploration with occupancy grids

Since the workspace is initially unknown, the collision status of the best modeling view configuration(s) may not always be known, i.e. the view configuration may lie (partly or wholly) in the currently unknown workspace. Clearly the workspace should be explored (i.e. sensed) such that the status of the unknown workspace region occupied by these configurations is made known. This leads to another (embedded) and different type of NBV problem, NBV for exploration, which has been addressed mostly in the context of mobile robots, with criteria such as the frontier method (Yamauchi 1997; Freda and Oriolo 2005). A few works, including our own, have addressed it in the context of sensor-based planning for articulated arms.

Unlike these previous works, which used workspace-based criteria such as Maximize Physical space Volume (MPV) (Kruse et al. 1996; Renton et al. 1999), our own work proposed a configuration space-based criterion, the Maximal Entropy Reduction (MER) criterion. It applies to the general class of robot-sensor systems for sensor-based planning and exploration (Yu and Gupta 2004; Wang and Gupta 2006), and formally poses the problem of view planning for exploration as that of maximizing the information (hence minimizing the entropy) gained about the status of C-space. The information is the collision status of the unknown configurations. Using probabilistic models of obstacle distribution, the concept of C-space entropy is formally defined, and the next best view for exploration is the one that yields Maximal expected C-space Entropy Reduction (MER).

Given a viewpoint, in Torabi et al. (2007) we derived an expression for the expected C-space entropy reduction with an occupancy grid representation of the workspace. It assumes a certain a priori probability assigned to unknown grid cells in the workspace and incorporates occlusion constraints. We omit the details of this computation; it suffices to say here that the key underlying geometric computation is that of determining the expected volume of the visible unknown region (or equivalently, the number of unknown nodes in the workspace octree) occupied by the robot at a given configuration, and this is easily computed. In general, the unknown robot configurations are selected from a random set of configurations in C-space, since the goal is to explore the whole C-space. However, here in the modeling task, the unknown configurations are the set of unknown modeling view configurations output by the modeling NBV planner.

An efficient way to determine the best view configuration for exploration is to treat each node in the roadmap as a potential view configuration. We can easily determine the expected information gain (or entropy reduction) about the set of modeling view configurations for each node, and choose the one that yields the maximum information gain. Thus, the exploration view planner gives priority to those areas that increase information about the modeling view configurations, taking into account the robot's physical size and shape, thereby facilitating reachability for further views.

More specifically, the list of unknown modeling view configurations, Qum, is passed to the exploration view planner. The exploration view planner selects the node (among all nodes in the reachable connected component of the roadmap) that yields the maximal information about Qum as the view configuration qve (see Algorithm 3), and passes it on to SBIC-PRM, which determines a collision-free path to it. The arm then moves along this path to reach qve, and scans the workspace. Having taken the scan, the workspace octree and the C-space roadmap are both updated.

Algorithm 3: Exploration NBV planner

for every configuration q on the roadmap do
    Iq(Qum) = 0;
    for every modeling configuration qvm ∈ Qum do
        Determine the information gain Iq(qvm);
        Iq(Qum) = Iq(Qum) + Iq(qvm);
    end
end
qve = arg max q∈roadmap {Iq(Qum)};
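Algorithm 3 is a straightforward argmax over roadmap nodes; a minimal sketch follows, where info_gain(q, q_vm) stands in for the expected C-space entropy reduction of Torabi et al. (2007), whose computation is not reproduced here.

```python
def exploration_nbv(roadmap_nodes, Q_um, info_gain):
    """Pick the reachable roadmap node that yields the most expected
    information about the unknown modeling view configurations Q_um."""
    best_q, best_gain = None, float("-inf")
    for q in roadmap_nodes:
        gain = sum(info_gain(q, q_vm) for q_vm in Q_um)
        if gain > best_gain:
            best_q, best_gain = q, gain
    return best_q, best_gain

# If best_gain falls below a threshold, the overall planner concludes that no
# further progress can be made and exits with the current partial model
# (see Section 4.3).
```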

4.3. Overall algorithm for 3D modeling by a 6-DOF manipulator

The control flow of our overall algorithm for autonomous object modeling is shown in Figure 5. The robot starts in its initial home configuration with an assumed small initial free area around it, and the workspace octree is initialized correspondingly. We assume that the object to be scanned is located within a given bounding box, and that the very first scan, taken in the initial configuration, will see at least some part of the object. The new scan (range image) is used to update the octree model of the workspace, and the object point cloud model is initialized with the points (extracted from the range image) that lie inside the bounding box.

The modeling NBV is then invoked to determine a sorted list (ranked by the number of visible target patches) of viewpoints for modeling and to select the best viewpoint, vm, which is passed to the path planner SBIC-PRM only if the number of target points visible from it is greater than a percentage of the total number of target points (15% in our experiments). SBIC-PRM invokes inverse kinematics to determine a finite set of manipulator configurations (modeling view configurations) Qvm = {qivm, i = 1 . . . n} (n = 8 for our robot) corresponding to vm.⁴ Each qivm is checked for collision with the workspace octree (against unknown and obstacle nodes in the octree). If its status is unknown, it is saved in a list Qum, and the planner goes on to the next one. The first one found collision-free is deemed the goal configuration qvm. SBIC-PRM then searches the roadmap for a collision-free path to qvm; if it finds one, qvm is added to the roadmap, and the robot uses this path to move to the desired view configuration and proceeds with a new scan there. In this case, the octree model of the workspace is updated, and the new points within the bounding box in the new range image are registered with the current object point cloud model using ICP (Besl and McKay 1992; Rusinkiewicz and Levoy 2001) and then integrated into the object model.⁵ The list of target points is also updated by deleting the ones that no longer belong to the unknown area. If none of the qivm are collision-free, SBIC-PRM gets the next view pose in the sorted list from the modeling NBV planner, and the process repeats until there is no good modeling viewpoint left in the list. At this point, the view planning switches from the modeling phase to the exploration phase, and SBIC-PRM calls the MER-based exploration view planner to explore free space so that it can make the status of the configurations in Qum known. The exploration view planner uses the MER criterion to determine a reachable view configuration (a roadmap node) qve that gives maximum information about Qum. The robot then moves to this configuration, proceeds with a scan, and the process repeats. If the exploration view planner is not able to find a view configuration that yields an expected information gain greater than a threshold, the system deems that no further progress can be made toward modeling and exits with the current partial object model. If at any iteration the set of target points is empty, the object model is complete (see the next section for a proof), and the modeling process is terminated. The overall flowchart for our autonomous 3D modeling system is shown in Figure 5.

Fig. 5. Overall algorithm for autonomous 3D modeling by a 6-DOF eye-in-hand system.
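The overall control flow just described (and shown in Figure 5) can be summarized in a high-level sketch; every function named here (modeling_nbv, inverse_kinematics, plan_and_move, exploration_nbv, and so on) is a placeholder for the corresponding module described in the text, and the thresholds mirror the ones quoted above.

```python
def autonomous_modeling(system, min_visible_frac=0.15, min_gain=1e-3):
    """High-level loop: alternate modeling scans and exploration scans until
    the target list is empty or no further progress can be made."""
    system.scan_and_update()                       # first scan at the home configuration
    while True:
        targets = system.target_patches()
        if not targets:
            return system.object_model()           # termination: model complete (Section 5)

        Q_um, moved = [], False
        for v, n_visible in system.modeling_nbv(targets):   # sorted list, best first
            if n_visible < min_visible_frac * len(targets):
                break                              # no good modeling viewpoint left
            for q in system.inverse_kinematics(v):           # up to n = 8 solutions
                status = system.collision_status(q)
                if status == "unknown":
                    Q_um.append(q)
                elif status == "free" and system.plan_and_move(q):
                    system.scan_and_update()       # modeling scan: register + integrate
                    moved = True
                    break
            if moved:
                break

        if moved:
            continue                               # stay in the modeling phase
        q_ve, gain = system.exploration_nbv(Q_um)  # MER-based exploration view
        if gain < min_gain:
            return system.object_model()           # no further progress possible
        system.plan_and_move(q_ve)
        system.scan_and_update()                   # exploration scan
```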

5. Termination criterion for model completion

A complete autonomous 3D object modeling system requires a self-termination criterion; i.e. the modeling algorithm should autonomously recognize when the goal has been achieved and the object model is complete, or when no more progress can be made.


5.1. Background

In previous autonomous 3D modeling works, various termination criteria were used, but none are related to explicit requirements to be met by the reconstructed object model (Scott et al. 2003). The existing methods range from using a fixed number of predefined viewpoints to criteria based on the number and/or status of viewpoints. Pito calculates the number of unknown patches visible from each viewpoint in the viewpoint space (Pito 1999). If this number is less than a threshold value for all viewpoints, the process is terminated; in other words, he terminates when most of the remaining target points are not visible from the viewpoints. Banta et al. (2000) examined several termination criteria, e.g. terminating when the size of either the surface model or the occluded model ceases to change by a significant amount, or when the ratio of the size of the surface model to that of the occluded model is large. None of these methods is related to explicit requirements to be met by the reconstructed object model; they mostly reason either over the visibility of the viewpoints or over the rate of progress. These methods can thus be somewhat ad hoc, and they are not robust with respect to the completeness of the object shape. Li calculates the volume encompassed by the data from the available views and analyzes the variation in this volume due to each successive viewpoint (Li et al. 2005; He and Li 2006). Since during the intermediate stages the scanned surface model is not complete and hence does not form a closed shape, it is difficult to directly calculate the volume encompassed by the surface data. Therefore he computes the volume of the data cloud via the surface integral of the triangular meshes, using Stokes' theorem to derive the enclosed volume (Li et al. 2005; He and Li 2006). The main problem is that Stokes' theorem applies only to a closed surface, so, at least in the way it is applied in that paper, the argument is questionable to us. Also, mesh computation is required for this termination technique, which could be time consuming.

5.2. Proposed termination criterion for model completion

As scans are taken from a set of viewpoints, the seen parts of the object surface are modeled. Note that in every scan, due to occlusion or the limited FOV, parts of the object will not be visible, and this causes a boundary between the seen surface and the unknown space. Intuitively, one can imagine that if there is no such boundary between the object surface and the unknown space, the constructed model is complete.

Termination criterion (continuous case): if the boundary set (points in 2D, curves in 3D) is empty, the constructed model is complete.

To formally prove this termination criterion, we need a formal geometric definition of an object and a formal model of the sensing action. The formal concept of a closed surface captures our intuitive idea of a geometric object. Along with various formal definitions, we formally show in Appendix A that whenever the set of boundary curves is empty, the 3D model is complete.

Note that in practice the laser scanner does not scan continuously; instead, the scan is discretized at a certain resolution, and the boundary curve is saved as a set of boundary points. Note that these are precisely the unknown target points in our NBV algorithm.

Termination criterion (discrete case): if the set of boundary points is empty, the object model is complete within the given resolution.

We show the experimental behavior of the number of boundary (unknown target) points versus scan iterations in Section 6.

6. Experimental results

6.1. Experimental setup

The system hardware consists of a time-of-flight Hokuyo URG-04LX (Hokuyo 2011) line-scan range scanner with a maximum scanning range of about 3 m and an angular resolution of 0.3 degrees. The range scanner is mounted on the last joint of a 6-DOF manipulator composed of powercube modules (Schunk 2011). The scanner is mounted so that the last joint of the robot moves in a direction perpendicular to the scan line. This allows us to obtain a 256 × 256 range image by moving the last joint, and hence to treat our scanner as a full range image scanner with a conical FOV of about π/4 radians view angle and 3 m range. We have considered a cube of side 512 cm as the workspace, W, and its octree model is constructed from scanned images as in Yu and Gupta (1998) with a finest resolution of 2 cm. In our system, the viewpoint position space consists of 4 concentric spheres with radii 30, 40, 50, and 60 cm, centered at the center of the bounding box. These radii are essentially selected based on the maximum reach of the arm, the size of the bounding box, and the distance between the bounding box and the base of the manipulator, so that the spheres cover the reachable region around the object bounding box in a uniform manner.

6.2. Acquired models

We carried out a set of experiments to illustrate the performance of our automated 3D modeling system. Note that we impose no constraints on the size and shape of the object to be modeled, and the only information required by our system is a bounding box within which the object is located. Also, the workspace is unknown, and in practice the experiments could be done in any environment with static obstacles. For the first test, we chose a chair to be modeled, a fairly complex object which could not be modeled by simply moving the scanner around it, because scans from hard-to-reach areas, such as underneath the chair, are needed to model it fully. Figures 6 and 7 show how our system builds the object model, switching between modeling scans and exploration scans as needed.


Fig. 6. The six-DOF eye-in-hand system autonomously builds a model of a chair. The snapshots show the modeling viewpoints from which the successive scans were taken; the corresponding point cloud model of the chair is shown as well. While modeling the chair, the workspace also needs to be explored for reachability, and several exploration views are taken as well. The corresponding octree model is built as the exploration scans are taken and is shown next to the exploration scans. Please note that we show only obstacle voxels for ease of visualization. Note that with our system the scanner is able to scan from under the chair, which requires full six-DOF maneuverability.


Fig. 7. The six-DOF eye-in-hand system autonomously builds a model of the table, watering can, and box. The snapshots show the modeling viewpoints from which the successive scans were taken; the corresponding point cloud model is shown as well. While modeling, the workspace also needs to be explored for reachability, and several exploration views are taken as well. The corresponding octree model is built as the exploration scans are taken and is shown next to the exploration scans. Please note that we show only obstacle voxels for ease of visualization. The power of the collision-free motion planning is starkly illustrated, particularly in the frame where the robot gets under the table to take a scan.

Fig. 8. (a) Scanning a small mug. (b) The number of target points versus scan iterations. As more scans are taken, the model becomes more complete and the number of target points approaches zero (or a small number in practice).

the back and the seat. The object point cloud corresponding to this scan is shown in Figure 6b. The system then plans the next best view for modeling; however, since most of the workspace is currently unknown, these configurations are not reachable, and the system therefore switches to the exploration phase: as shown in Figure 6c, the robot moves to an exploration view configuration. One can see that the scanner is quite far from the chair and that the robot is scanning the workspace to gain information about the workspace and, subsequently, about the status of the desired modeling configurations. Figure 6d shows the updated octree model of the workspace after the scan is taken and integrated into the workspace octree (only obstacles are shown to avoid clutter). Some modeling view configurations become reachable, and the robot moves to the best modeling configuration to capture the front part and to complete the seat area (Figures 6e, 6f). This is particularly notable for scan 7, in which the robot scans the area near the lower parts of the legs of the chair. Having explored this region, the planner is then able to reach a view configuration (scan 8) that lies exactly in this region. Further scans are taken to build the model of the chair; Figure 6 shows snapshots of the scans, along with the evolving point cloud model of the chair, as more and more scans are taken. Because of the overlap constraint for registration, each modeling scan should cover a portion of the already seen area of the object; thus the 3D model is constructed gradually. Also, since the environment is unknown, the manipulator cannot maneuver aggressively around the object to take scans and, as shown in Figure 6, it must scan to explore the unknown workspace (exploration phase) so that it can then scan the object (modeling phase). The octree model of the workspace is also gradually completed in the exploration phase.
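The overlap requirement can be checked numerically before a candidate scan is committed. The snippet below is a minimal sketch, not our exact criterion: it estimates overlap as the fraction of newly scanned points lying within a small distance of the current point cloud model, and the 1 cm threshold and 20% acceptance level are illustrative assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree

def overlap_fraction(new_scan, model_points, dist_thresh=0.01):
    """Fraction of points in a new scan that lie within dist_thresh (metres)
    of the already-built model; a crude proxy for registration overlap."""
    tree = cKDTree(np.asarray(model_points))
    dists, _ = tree.query(np.asarray(new_scan), k=1)
    return float(np.mean(dists < dist_thresh))

# Example: accept a candidate view only if its predicted visible surface
# overlaps the current model by at least 20% (threshold is illustrative).
# if overlap_fraction(predicted_points, model_points) >= 0.2: ...
```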

We also tested our system on a more complicated object, a table with a watering can on top and a box underneath. Figure 7 shows how the robot builds the model through several iterations. In this experiment a rather different scenario unfolds: after the first scan, which is always a modeling scan, the system continues with several exploration scans, and once enough information has been gained about the workspace it switches back to modeling. As illustrated in Figure 7, the robot reaches different modeling viewpoints to capture the watering can, all around the table, and even the box underneath the table, as far as its reach allows. The attached video of this experiment illustrates how our autonomous modeling system plans a spot-on modeling configuration, performs collision-free motion planning, explores the workspace, and seamlessly integrates view planning for modeling, view planning for exploration, and collision-free motion planning to build a complete model of the object of interest. Note in particular, near the end of the video, where the robot reaches well under the table to take a scan; this modeling scan (No. 14) is also shown in Figure 7q. The rendered video of this experiment is available in Extension 1.

Table 1. Run times

Average processing time per iteration (seconds)

Modeling                   Scanning                    25
                           Registration/integration    62
Modeling view planning     NBV algorithm                4
                           Update targets               2
Exploration view planning  NBV algorithm               57
Path planning              Update octree model         50
                           Update Roadmap              43
                           Path searching               2
Complete iteration                                    255

In order to evaluate our termination criterion, we tracked the number of target points versus scan iterations in a simple experiment, as shown in Figure 8. We used a small object to ensure that the arm is able to reach around it. In the first scan, only a small part of the object is visible, hence the number of target points is relatively small. For the next few scans the number of target points increases: although some target points become visible and are removed, many more new ones are added. The plot therefore has several peaks and valleys, but as the model becomes more complete the number of target points converges to zero (a small number in practice).
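The behaviour in Figure 8 corresponds to maintaining a set of target (boundary) points across iterations and stopping once it is empty, or nearly so. A minimal sketch of such a loop follows; take_scan and integrate_scan are hypothetical placeholders for the scanning and registration/integration steps, and min_targets stands in for the "small number in practice".

```python
def run_until_complete(take_scan, integrate_scan, min_targets=10, max_iters=50):
    """Schematic self-terminating modeling loop.

    take_scan(targets)   -> point cloud acquired from the next best view
    integrate_scan(scan) -> (covered_targets, new_targets) as sets of points
    Both callables are placeholders for the system's scanning and integration steps.
    """
    targets = set()                    # unscanned target/boundary points of the model
    for k in range(1, max_iters + 1):
        scan = take_scan(targets)
        covered, fresh = integrate_scan(scan)
        targets = (targets - covered) | fresh   # produces the peaks and valleys of Figure 8
        if k > 1 and len(targets) <= min_targets:
            return k                   # model deemed complete after k scans
    return max_iters
```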

Our overall planner was implemented on a 64-bit Dell Studio XPS 8100 (processor: Intel Core i5 760 @ 2.8 GHz, 8 GB RAM). The overall run time is about 4 minutes per iteration and comprises the run times of a set of sub-modules: modeling (scanning/registration), view planning for modeling, view planning for exploration, and path planning (including updating the octree and the Roadmap). As one can see in Table 1, our NBV (modeling phase) planner takes only about 2% of the total time in each iteration, while path planning takes a significant part (more than 50%) of the total run time.

Note that the efficiency of updating the octree and Roadmap, and of registration, could be improved. For example, Roadmap update times could be reduced significantly via more efficient techniques for collision checking; similarly, registration could be significantly faster with fewer sample points, fewer iterations, and more overlap. The octree model could also use two resolutions, a finer one for modeling the object and a coarser one for modeling the environment, improving both memory and runtime efficiency.
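A dual-resolution occupancy map of the kind suggested above could, for instance, keep a fine voxel set inside the object bounding box and a coarse one elsewhere. The following is one possible sketch; the 0.5 cm and 2 cm cell sizes are illustrative assumptions, not the values used in our implementation.

```python
import numpy as np

class DualResolutionMap:
    """Sketch of a two-resolution occupancy map: fine voxels inside the object
    bounding box, coarse voxels for the rest of the workspace."""

    def __init__(self, bbox_min, bbox_max, fine=0.005, coarse=0.02):
        self.bbox_min = np.asarray(bbox_min, dtype=float)
        self.bbox_max = np.asarray(bbox_max, dtype=float)
        self.fine, self.coarse = fine, coarse
        self.fine_voxels = set()     # occupied cells inside the bounding box
        self.coarse_voxels = set()   # occupied cells in the surrounding workspace

    def insert(self, point):
        p = np.asarray(point, dtype=float)
        inside = bool(np.all(p >= self.bbox_min) and np.all(p <= self.bbox_max))
        res = self.fine if inside else self.coarse
        key = tuple(np.floor(p / res).astype(int))
        (self.fine_voxels if inside else self.coarse_voxels).add(key)

# Usage: m = DualResolutionMap([0, 0, 0], [0.5, 0.5, 0.5]); m.insert([0.1, 0.2, 0.3])
```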

Table 2. Various parameters in our implementation

Average number of target points in each iteration        2000
Average no. of visible target points by modeling NBV      382
Kd-tree radius for surface patch estimation               4 cm
Number of nodes in the Roadmap (SBIC-PRM) planner         6000
Resolution of the workspace octree                         2 cm

7. Conclusion and future work

We have implemented a fully autonomous 3D object modeling system built around a six-DOF eye-in-hand manipulator. The system is able to build a complete 3D model of an unknown object in situ while avoiding collisions in an unknown workspace. Our overall planner integrates three main algorithms: a novel next best view (NBV) planner for modeling the object; an NBV planner for exploration, i.e. exploring the environment so that the arm can move to the modeling view pose; and finally a sensor-based path planner that is able to find a collision-free path to the view configuration determined by either of the two view planners. Our modeling NBV algorithm efficiently searches the five-dimensional view space (4 × 10^6 viewpoints) to determine the best modeling viewpoint, while considering key constraints such as FOV, overlap, and occlusion. This allows the system to construct an accurate model with a small number of scans. If the determined viewpoint is reachable, the sensor-based path planner (SBIC-PRM) determines a collision-free path to move the manipulator to the desired view configuration. Since the workspace is initially unknown, in some phases the exploration view planner is used to increase information about the reachability and the status of the modeling view configurations, since a view configuration may lie in an unknown part of the workspace. We also introduced and proved a termination criterion which determines whether the object model is complete. We have shown, via real experiments, the ability of our system to autonomously scan objects in situ while avoiding collisions with unknown obstacles.
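The integration just described can be summarized as a control loop. The sketch below is schematic only; all arguments (model, workspace, arm, and the three planner callables) are hypothetical placeholders rather than our actual interfaces.

```python
def overall_planner(model, workspace, arm,
                    plan_modeling_nbv, plan_exploration_nbv, plan_path,
                    max_iters=30):
    """Schematic sequencing of the three planners described above; every argument
    is a hypothetical placeholder object or callable, not our actual interface."""
    for _ in range(max_iters):
        if model.is_complete():                       # termination criterion (Appendix A)
            return "model complete"
        view = plan_modeling_nbv(model, workspace)    # best reachable modeling viewpoint, or None
        if view is None:
            view = plan_exploration_nbv(workspace)    # gain reachability information instead
            if view is None:
                return "no further progress possible"
        path = plan_path(arm, view, workspace)        # sensor-based collision-free path, or None
        if path is None:
            continue                                  # re-plan views on the next iteration
        arm.execute(path)
        scan = arm.take_scan()
        model.integrate(scan)                         # register and merge the new scan
        workspace.update(scan)                        # update the workspace occupancy model
    return "iteration limit reached"
```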

In this work, it is assumed that the bounding box encloses completely, and only, the object(s) to be modeled. If this assumption is not satisfied, either the bounding box does not enclose the entire object, or additional objects are located inside the bounding box. In the former case, only the portion of the object inside the bounding box is modeled; in the latter case, all objects enclosed by the bounding box will be modeled, and further image processing (e.g. segmentation) would be needed to derive the model of the desired object.
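For illustration, restricting the model to the given bounding box amounts to a simple point filter, as in the hypothetical helper below (assuming an axis-aligned box specified by its corners in the world frame).

```python
import numpy as np

def crop_to_bounding_box(points, bbox_min, bbox_max):
    """Keep only the scanned points that fall inside the a-priori bounding box;
    everything inside the box becomes part of the object model."""
    pts = np.asarray(points, dtype=float)
    lo = np.asarray(bbox_min, dtype=float)
    hi = np.asarray(bbox_max, dtype=float)
    mask = np.all((pts >= lo) & (pts <= hi), axis=1)
    return pts[mask]
```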

A fixed-base manipulator, although it affords great maneuverability to the sensor, nevertheless has limited reachability, e.g. it is unable to reach across a relatively large object such as the chair in our test example. Hence, we are currently extending our autonomous 3D modeling system and the corresponding planning framework to a mobile manipulator with nine degrees of freedom: three for the mobile base and six for the arm mounted on it.

Notes

1. While some works have considered a six-DOF positioning mechanism (Chen and Li 2005), these do not search the viewpoint space globally, and furthermore do not consider the path planning aspect at all.


2. The breakdown angle depends on the surface properties and the laser scanner. For more details of our laser scanner (Hokuyo URG-04LX), refer to Okubo et al. (2009).

3. A ranging ray emanates from the scanner, and an observation ray emanates from the surface.

4. If this were a redundant manipulator, one could directly plan a path for the desired pose without explicit inverse kinematics (Yao and Gupta 2005).

5. The integration could be done as volume intersection (Curless and Levoy 1996; Rocchini et al. 2004) or mesh integration (Turk and Levoy 1994; Zhou et al. 2006).

6. Jordan curve: a plane curve which is topologically equivalent to (a homeomorphic image of) the unit circle, i.e. it is simple and closed.

Funding

This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.

Conflict of interest

None declared.

References

Atramentov A and LaValle SM (2002) Efficient nearest neighbor searching for motion planning. In: Proceedings of the International Conference on Robotics and Automation, vol. 1, pp. 632–637.

Ayres WL (1929) On continuous curves having certain properties. Proceedings of the National Academy of Sciences of the United States of America 15: 91–94.

Banta JE et al. (2000) A next-best-view system for autonomous 3-D object reconstruction. IEEE Transactions on Systems, Man and Cybernetics, Part A 30: 589–598.

Besl PJ and McKay ND (1992) A method for registration of 3-D shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 14: 239–256.

Blaer PS and Allen PK (2007) Data acquisition and view planning for 3-D modeling tasks. In: Proceedings of the IEEE Conference on Intelligent Robots and Systems, pp. 417–422.

Chen S, Li Y, Zhang J and Wang W (2008) Active Sensor Planning for Multiview Vision Tasks. New York: Springer.

Chen SY and Li YF (2005) Vision sensor planning for 3-D model acquisition. IEEE Transactions on Systems, Man, and Cybernetics 35: 894–904.

Connolly CI (1985) The determination of next best views. In: Proceedings of the International Conference on Robotics and Automation, vol. 2, pp. 432–435.

Curless B and Levoy M (1996) A volumetric method for building complex models from range images. In: SIGGRAPH-96, pp. 303–312.

Deutsche Servicerobotik Initiative (2011) http://www.service-robotik-initiative.de/uebersicht/uebersicht/ (last accessed 16 June 2011).

Dey TK et al. (2005) Normal estimation for point clouds: a comparison study for a Voronoi based method. In: Eurographics/IEEE VGTC Symposium Proceedings on Point-Based Graphics, pp. 39–46.

Eidenberger R, Grundmann T and Zoellner R (2009) Probabilistic action planning for active scene modeling in continuous high-dimensional domains. In: Proceedings of the International Conference on Robotics and Automation, pp. 2412–2417.

Foissotte T, Stasse O, Wieber PB, Escande A and Kheddar A (2010) Autonomous 3D object modeling by a humanoid using an optimization-driven next-best-view formulation. International Journal of Humanoid Robotics 7: 407–428.

Freda L and Oriolo G (2005) Frontier-based probabilistic strategies for sensor-based exploration. In: Proceedings of the International Conference on Robotics and Automation, pp. 3881–3887.

He BW and Li YF (2006) A next-best-view method with self-termination in active modeling of 3D objects. In: Proceedings of the IEEE Conference on Intelligent Robots and Systems, pp. 5345–5350.

Hokuyo Automatic Co. (2011) http://www.hokuyo-aut.jp/02sensor/ (last accessed 16 June 2011).

Hoppe H et al. (1992) Surface reconstruction from unorganized points. Computer Graphics 26: 71–78.

Kruse E, Gutsche R and Wahl F (1996) Efficient, iterative, sensor based 3-D map building using rating functions in configuration space. In: Proceedings of the International Conference on Robotics and Automation, pp. 1067–1072.

Kuehnle J, Verl A, Xue Z, Ruehl S, Zoellner J, Dillmann R et al. (2009) 6D object localization and obstacle detection for collision-free manipulation with a mobile service robot. In: Proceedings of the International Conference on Robotics and Automation, pp. 1–6.

Li Y, He B and Bao P (2005) Automatic view planning with self-termination in 3D object reconstructions. Sensors and Actuators A: Physical 122: 335–344.

Massios NA and Fisher RB (1998) A best next view selection algorithm incorporating a quality criterion. In: Proceedings of the British Machine Vision Conference BMVC98, Southampton, pp. 780–789.

Maver J and Bajcsy R (1993) Occlusions as a guide for planning the next view. IEEE Transactions on Pattern Analysis and Machine Intelligence 15: 417–433.

Okubo Y, Ye C and Borenstein J (2009) Characterization of the Hokuyo URG-04LX laser rangefinder for mobile robot obstacle negotiation. Unmanned Systems Technology XI 7332.

Papadopoulos-Orfanos D and Schmitt F (1997) Automatic 3-D digitization using a laser rangefinder with a small field of view. In: Proceedings of the International Conference on Recent Advances in 3-D Digital Imaging and Modeling, pp. 60–67.

Pito R (1999) A solution to the next best view problem for automated surface acquisition. IEEE Transactions on Pattern Analysis and Machine Intelligence 21: 1016–1030.

Polhemus (2011) http://polhemus.com/ (last accessed 16 June 2011).

Reed MK, Allen PK and Stamos I (1997) Automated model acquisition from range images with view planning. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 72–77.

Renton P, Greenspan M, Elmaraghy H and Zghal H (1999) Plan-N-Scan: A robotic system for collision-free autonomous exploration and workspace mapping. Journal of Intelligent and Robotic Systems 24: 207–234.


Rocchini C, Cignoni P, Ganovelli F, Montani C, Pingi P and Scopigno R (2004) The marching intersections algorithm for merging range images. The Visual Computer 20: 149–164.

Rusinkiewicz S and Levoy M (2001) Efficient variants of the ICP algorithm. In: Third International Conference on 3-D Digital Imaging and Modeling, pp. 145–152.

Schunk GmbH (2011) http://www.schunk-modular-robotics.com/ (last accessed 16 June 2011).

Scott WR, Roth G and Rivest JF (2003) View planning for automated three-dimensional object reconstruction and inspection. ACM Computing Surveys 35: 64–96.

Sequeira V, Goncalves J and Ribeiro MI (1996) Active view selection for efficient 3D scene reconstruction. In: Proceedings of the International Conference on Pattern Recognition '96, pp. 815–819.

Torabi L, Kazemi M and Gupta K (2007) Configuration space based efficient view planning and exploration with occupancy grids. In: Proceedings of the IEEE Conference on Intelligent Robots and Systems, pp. 2827–2832.

Turk G and Levoy M (1994) Zippered polygon meshes from range images. In: SIGGRAPH: Proceedings of the 21st Annual Conference on Computer Graphics and Interactive Techniques, pp. 311–318.

Wang P and Gupta K (2006) View planning for exploration via maximal C-space entropy reduction for robot mounted range sensors. Advanced Robotics 21: 771–792.

Whaite P and Ferrie FP (1997) Autonomous exploration: Driven by uncertainty. IEEE Transactions on Pattern Analysis and Machine Intelligence 19: 193–205.

Yamauchi B (1997) A frontier-based approach for autonomous exploration. In: Proceedings of the IEEE International Symposium on Computational Intelligence in Robotics and Automation, pp. 146–151.

Yao Z and Gupta K (2005) Path planning with general end-effector constraints: using task space to guide configuration space search. In: Proceedings of the IEEE Conference on Intelligent Robots and Systems, pp. 1875–1880.

Yu Y and Gupta K (1998) An efficient online algorithm for direct octree construction from range images. In: Proceedings of the International Conference on Robotics and Automation, vol. 4, pp. 3079–3084.

Yu Y and Gupta K (2004) C-space entropy: A measure for view planning and exploration for general robot-sensor systems in unknown environments. International Journal of Robotics Research 23: 1197–1223.

Zhou H, Liu YH and Li LZ (2006) Incremental mesh-based integration of registered range images: Robust to registration error and scanning noise. In: Proceedings of the Asian Conference on Computer Vision '06, pp. 958–968.

Appendix

A. Proof for termination criterion

We now present a formal proof of the termination criterion (continuous case) stated in Section 5. For simplicity and ease of understanding, we first show it for the 2D case and then for the 3D case. In both scenarios, we first give formal definitions for the objects; our sensing model is continuous and is detailed further in each case.

Fig. A1. In each scan, a set of open curves from the border of the object is scanned. Note that we assume that the boundary points are scanned.

A.1. Some definitions

An n-manifold (without boundary) will mean a topological manifold such that every point has a neighborhood homeomorphic to R^n. Let M be a manifold with boundary. Then the interior of M, denoted Int(M), is the set of points in M which have neighborhoods homeomorphic to an open subset of R^n, and the boundary of M, denoted ∂M, is an (n−1)-manifold.

Intuitively, in the plane a closed curve is a curve with no boundary points which completely encloses an area, and a simple curve is a curve that does not cross itself. In the following we give a more mathematical definition. Let I = [a, b] ⊂ R be a closed interval of the real line. A curve (also called a 1-manifold) is a continuous mapping γ: I → X taking values in a topological space X. We say that γ is a closed curve whenever γ(a) = γ(b); otherwise γ is an open curve and a, b are its endpoints. Also, γ is a simple curve if the mapping γ is injective.

Let A ⊂ R^n be any subset, let p ∈ A be a point, and let r > 0 be a number. The open ball in A of radius r centered at p is the set O_r(p, A) defined by O_r(p, A) = {x ∈ A | ‖x − p‖ < r}.

A.2. Termination criterion for modeling a 2D object

We assume that our 2D object, M, is a simple and closed curve (Jordan curve).6 Now consider an ideal laser scanner moving around the object, taking scans; in the kth scan the border of the object, M, is sensed as a set of open curves C_k such that C_k ⊂ M. See Figure A1 for an illustration. Note that although point ∂C_{22} is not scanned since it is occluded, every open neighborhood of ∂C_{22} contains points which are scanned. Hence, in our sensing model we assume that ∂C_{22} is also sensed. In special cases, an open curve may degenerate into a single point, in which case it is denoted by p_k. We use the word segment to denote a single scanned (maximal) open curve (or point). Multiple segments in the same scan are caused by a limited field of view, occlusion, or the ranging ray being tangent to the object M. For each scanned segment C_i there are two boundary points, (∂C_{i1}, ∂C_{i2}). Let C = ∪_k C_k, and let B_p be the set of all boundary points of C (isolated sensed points p are also included in this list). In each iteration, after the scan is performed, C is updated with the set of newly scanned segments, and correspondingly the set of new boundary points and possible single points are added to B_p. A merge/glue process is used so that B_p contains only those points which have not been sensed in any of the scans, i.e. points that are sensed in other scans are eliminated from B_p. Formally, p ∉ B_p if ∃r > 0 such that O_r(p, R) ⊂ C.
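In a discrete implementation, the merge/glue step might be approximated by removing any candidate boundary point that has since been sensed within a small radius by another scan. The sketch below uses a k-d tree for the neighbourhood test; the radius r and the point-set representation are assumptions, not the exact scheme used here.

```python
import numpy as np
from scipy.spatial import cKDTree

def prune_boundary_points(candidates, points_from_other_scans, r=0.005):
    """Approximate merge/glue step for B_p: drop a candidate boundary point once
    it has been sensed (within radius r) by some other scan."""
    if len(points_from_other_scans) == 0:
        return list(candidates)
    tree = cKDTree(np.asarray(points_from_other_scans))
    return [p for p in candidates
            if not tree.query_ball_point(np.asarray(p), r)]

# An empty result at some iteration signals, per Theorem 1, that the model is complete.
```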

Theorem 1. The set of endpoints B_p is empty at any iteration k > 1 if and only if the object model is complete; formally, B_p = ∅ if and only if C = M.

Proof. Part 1: If B_p = ∅ then C = M.

The proof is in two steps: (a) if B_p = ∅ then C is a closed and simple curve; (b) if C is a closed and simple curve, then C = M.

(a) Assume B_p = ∅. If C is not closed, then C has at least one boundary point p ∈ ∂C, and therefore ∄r > 0 such that O_r(p, R) ⊂ C. Since the C_i are continuous, there exists a C_j ⊂ C such that p is a boundary point of C_j, i.e. p ∈ ∂C_j. Since ∄r > 0 with O_r(p, R) ⊂ C and p ∈ ∂C_j, by the definition of B_p we have p ∈ B_p. But B_p = ∅, which is a contradiction; hence C is closed.

(b) C ⊂ M, and M is simple, thus C is simple. Since M is a simple closed curve, i.e. it consists of a single closed curve, and C is closed with C ⊂ M, the set M − C is empty, and hence M = C (Ayres 1929).

Part 2: If C = M then B_p = ∅.

Suppose B_p ≠ ∅. Then there exists p ∈ B_p, so p is a boundary point of some C_i and ∄r > 0 such that O_r(p, R) ⊂ C. Since p ∈ C_i, we have p ∈ C, which means C is not closed. But M is a closed curve and M = C, so C is a closed curve and has no boundary points. Thus p is not a boundary point of C, which is a contradiction; hence B_p = ∅.

Thus, in the modeling process, whenever the set of endpoints is empty the model is complete. Note that this proof could be extended to a set of non-intersecting closed curves. In such scenarios, if at any iteration the set of boundary points is empty, every object/curve in the scene which has been scanned at least once will have a completed model.

A.3. Termination criterion for modeling a 3D object

A surface is a 2-manifold, a closed surface is a 2-manifold (without boundary), an open surface is a 2-manifold with boundary (open manifold), and the boundaries are sets of closed curves (1-manifolds).

We assume that the 3D object, M, is a closed and simple 2-manifold. As the laser scanner moves around the object and takes scans of its surface, it generates a set of open 2-manifolds. More precisely, at the kth iteration, a set of open 2-manifolds S_k, such that S_k ⊂ M, and/or a set of disjoint open curves C (in special degenerate cases due to alignment with respect to the view direction) are generated. Each S_i ∈ S_k is bounded by a closed curve ∂S_i. Also at the kth iteration, the union of the set of scanned areas is denoted by S, i.e. S = ∪_{j=1...k} S_j. In each iteration, after the scan is performed, S is updated with the set of newly scanned surfaces, and correspondingly the set of new boundary curves and sensed open curves (if any) are added to B_c. A merge/glue process is used so that B_c contains only those curves which have not been sensed in any of the scans, i.e. portions of curves that are sensed in other scans are eliminated from B_c. Formally,

B_c = ( ∪_{j=1...k} ∂S_j ) ∪ ( ∪_{j=1...k} C_j ),   with ∂S_i ⊂ Int(S) ⇒ ∂S_i ∉ B_c.

Theorem 2. The set of boundary curves B_c is empty at any iteration k > 1 if and only if the 3D object model is complete; formally, B_c = ∅ if and only if S = M.

Proof. Part 1: If B_c = ∅ then S = M.

The proof is in two steps: (a) if B_c = ∅ then S is a closed surface; (b) if S is a closed surface, then S = M.

(a) If S is not a closed surface, then there exists at least one closed curve C ∈ ∂S. Since the S_i are continuous except at the boundary curves, C ⊂ ( ∪ ∂S_i ). By definition, for every point p ∈ C there is no open neighborhood in S homeomorphic to R^2, i.e. ∄r > 0 such that O_r(p, R^2) ⊂ S, for all p ∈ C. Hence, by the definition of B_c, C must belong to B_c. Since B_c = ∅, this is a contradiction, and S must be closed.

(b) S is closed and we know that S ⊂ M, where M is a simple closed surface; hence S = M.

Part 2: If S = M then B_c = ∅.

Suppose B_c ≠ ∅. Then there exists a curve C ∈ B_c. By the definition of B_c, C ⊂ ∂S. But S = M, thus S is a closed manifold and ∂S = ∅. This is a contradiction; hence B_c = ∅.

B. Index to Multimedia Extensions

The multimedia extension page is found at http://www.ijrr.org.

Table of Multimedia Extensions

Extension    Type     Description

1            Video    Integrated object modeling and motion planning
