
SPARTAN System: Towards a Low-Cost and High-Performance Vision Architecture for Space Exploratory Rovers

Ioannis Kostavelis, Evangelos Boukas, Lazaros Nalpantidis, Antonios Gasteratos
Laboratory of Robotics and Automation

Department of Production and Management Engineering, Democritus University of Thrace, Xanthi, Greece

{gkostave,evanbouk,lanalpa,agaster}@pme.duth.gr

Marcos Aviles Rodrigalvarez
Advanced Space Systems and Technologies

GMV Innovating Solutions, Tres Cantos, Madrid
[email protected]

http://robotics.pme.duth.gr

Abstract

The “SPAring Robotics Technologies for Autonomous Navigation” (SPARTAN) activity of the European Space Agency (ESA) aims to develop an efficient, low-cost and accurate vision system for future Martian exploratory rovers. Interest in vision systems for space robots has been growing steadily in recent years. The SPARTAN system considers an optimal implementation of computer vision algorithms for space rover navigation and is designated for application to a space exploratory robotic rover, such as the ExoMars. The goal of the present work is the development of an appropriate architecture for the vision system. Thus, the arrangement and characteristics of the rover’s vision sensors will be defined and the required computer vision modules will be presented. The analysis will be performed taking into consideration the constraints defined by ESA for the SPARTAN system.

1. Introduction

The exploration of Mars is one of the main goals for both

NASA and ESA, as confirmed by past and recent activities. In the last decades many exploration missions, both on-orbit and surface, were launched to Mars with substantial findings. The on-orbit missions (NASA’s Mars Global Surveyor, Mars Odyssey, Mars Reconnaissance Orbiter and ESA’s Mars Express) yielded crucial information about Mars’ environment, whereas the surface missions (NASA’s Mars Exploration Rovers, Phoenix) provided evidence about the planet’s evolution. The aforementioned expeditions initiated

international discussion about joint Mars exploration efforts, such as the ExoMars rover shown in Fig. 1.

Vision-based autonomous exploratory rovers comprise a significant part of such efforts. In this paper we present the initial considerations towards reducing as much as possible the overall budgets required for rover navigation while improving its performance (i.e. accuracy of terrain reconstruction, localization, etc.). The discussed system is expected to operate on a robotic platform capable of long-range traversing and, as a consequence, it should comply with the corresponding requirements. The first step in this direction is to identify suitable computer vision algorithms for robotic navigation. These algorithms should possess the potential to be implemented into parallel processing chains while offering high performance (i.e. accuracy and frame rate) [15]. Secondly, they should allow realization in space-related devices. Concerning the algorithms’ selection, the most important criteria to be analyzed are the complexity reduction potential in conjunction with the performance efficiency. The third consideration has to do with the implementation of the algorithms onboard the host

Figure 1. The ExoMars rover (ESA).


robotic rover. The platform considered for the initial assessment of the SPARTAN system’s hardware and software modules is the ERA-Mobi. Overall, the system’s architectural design should ensure a major reduction of all navigation-related budgets with respect to the present state of the art, while improving the performance.

The rest of the paper is organized as follows: Section 2 covers the system description. It includes the SPARTAN vision system’s requirements as defined by ESA and presents in depth a camera setup that meets those requirements. Section 3 discusses the system’s design. The proposed vision system and its computer vision algorithms are presented and analyzed into subordinate modules, while the Unified Modelling Language (UML) is used for their graphical description. Finally, Section 4 provides concluding remarks and further discussion.

2. System description

The basic goal of SPARTAN’s processing algorithms is

to convert visual information from the rover’s cameras into 3D local maps and location estimates useful for the navigation process. Image processing algorithms suitable for localization and 3D map reconstruction have been selected and will be implemented into a parallel processing chain to achieve high performance, while maintaining efficiency in terms of energy use, computing power and memory footprint. The vision system of the SPARTAN project, as shown in Figure 2, consists of the following modules, performing the respective tasks:

• The imaging module acquires stereovision images.

• The 3D reconstruction module reconstructs the 3D shape of the terrain, covering a region of 120° field of view (FoV) in front of the rover.

• Visual Odometry (VO) provides an estimation of the displacement of the rover.

• Visual SLAM (VSLAM) determines the current location of the rover within its surrounding environment.

• The localization module fuses the previously estimated rover position with possibly available input from other rover sensors (mechanical odometry, Inertial Measurement Unit (IMU), compass, etc.).

2.1. Vision system requirements

According to the specifications posed by ESA, the robotic rover and its vision system should comply with the following requirements:

• The system should be able to produce 3D local maps having a range resolution of 2 cm at the farthest observable distance, i.e. 4 m.

Figure 2. Overview of the SPARTAN vision system modules: Imaging, 3D Map Reconstruction, Visual Odometry, Visual SLAM, and Localisation.

• The system should be able to produce local 3D maps with 120° FoV.

• The robotic platform should be able to carry a stereoscopic camera placed on a mast at least 1 m above the ground surface.

• In total, two stereo cameras should be onboard: one stereo pair with a wide baseline at the top of the mast and one stereo pair with a narrow baseline at the bottom.

• The wide stereoscopic cameras shall be separated by a baseline of at least 0.1 m.

These constraints are graphically illustrated in Figure 3.

The range resolution is the minimal change in range that

the stereo vision system can differentiate. In general, resolution deteriorates with distance. The function that calculates the range l within which the resolution is better than or equal to a desired value r is the following:

l = \sqrt{\dfrac{0.5 \cdot r \cdot b \cdot w}{c \cdot \tan\left(0.5 \cdot \frac{\pi}{180} \cdot F\right)}} \qquad (1)

where b is the baseline, w is the horizontal image resolution, F is the cameras’ FoV expressed in degrees, and c is the disparity accuracy expressed in pixels. As shown by Eq. 1, increasing the baseline and/or decreasing the FoV grows the range for a given value of resolution. Besides, more accurate stereo results are typically obtained by keeping the stereo angle (the angle between the lines of sight of the two cameras to the minimum-distance object) as low as possible, and in all cases below 15°, due to the smaller correspondence search range [4].

Moreover, the function that relates the focal length f of the cameras with the field of view F is of great interest for the determination of a stereo system’s parameters, and can be expressed as:

f = \dfrac{0.5 \cdot s \cdot 0.001 \cdot w}{\tan\left(0.5 \cdot \frac{\pi}{180} \cdot F\right)} \qquad (2)

where s is the physical width of the sensor’s pixels (in µm; the factor 0.001 converts it to mm, so that f is obtained in mm).
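For reference, Eqs. (1) and (2) translate directly into code. The sketch below assumes the units used in the text (metres for r, b and l, micrometres for s, degrees for F, pixels for w and c, millimetres for f):

```python
import math

def range_for_resolution(r, b, w, F, c):
    """Eq. (1): range l (m) within which the range resolution is better than r (m)."""
    return math.sqrt(0.5 * r * b * w / (c * math.tan(0.5 * math.pi / 180.0 * F)))

def focal_length(s, w, F):
    """Eq. (2): focal length f (mm) from pixel size s (um), resolution w (px) and FoV F (deg)."""
    return 0.5 * s * 0.001 * w / math.tan(0.5 * math.pi / 180.0 * F)
```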


Figure 3. The complete stereo setup.

2.2. Wide stereo configuration

The wide baseline stereo camera’s configuration is crucial for the system’s 3D reconstruction and mapping operations. The extraction of clear disparity maps, which is the first step towards the acquisition of reliable 3D reconstruction results, requires the examination and definition of various parameters. These parameters are presented in Table 1. The optimal combination of all these parameters should be defined, describing a wide stereo setup that meets the SPARTAN system’s requirements.

The system’s accuracy, which is tightly connected with the computation of the disparity maps, is considered first. In stereo correspondence computation there are two important constraints, having to do with the stereo angle and the search range. A stereo angle φ lower than 15° and a search range lower than 25% of the horizontal resolution facilitate the correspondence search and generally result in more accurate disparity maps. The stereo angle is calculated with respect to the setup baseline b and the minimum object distance dmin. We have considered values of b ranging from 5 cm to 30 cm and values of dmin ranging from 0.6 m to 1.5 m. Consequently, the value intervals of dmin and b, and as a result the corresponding possible value pairs, were restricted due to the constraint of a 15° stereo angle.

In order to select the value of the FoV we have considered the constraint that the search range should be kept lower than 25% of the horizontal resolution. The relationship between the search range, the FoV and the stereo angle is: d = 100 · φ/F. We have considered values of F between 40° and 70° in order to isolate the values of b and dmin that fulfil both of the aforementioned constraints.

The next step is to determine the value of the focal length f with respect to the selected FoV F, the pixel size s and the horizontal resolution w. There are restrictions both for the commercially available horizontal resolution options (available nominal values: 780, 1040, 1120 and 1280 pixels) and the pixel’s physical size (4.54, 5.5, 7.4 and 9 µm). Eq. 2 relates f, w, s and F. Applying this equation, the sets of values that comprise possible solutions for the wide camera arrangement problem can be limited.

Figure 4. Overview of the observable areas in front of the rover.

more, considering the constraint of 2cm accuracy within thespecified range the set of possible solutions can be uniquelyidentified. Thus, we concluded to the following set of valesas the optimal solution: b = 0.2m, dmin = 1m, F = 50o,s = 5.5µm, w = 1120 pixels, c = 0.3 pixels, f = 6.6mm.

The wide stereo camera and the pan-tilt unit (PTU) will be placed on the mast, as shown in Figure 3. The vertical FoV of the wide stereo camera setup is 50° and the respective tilt angle is 39°, resulting in a view range between 0.48 m and 4 m, in accordance with the specifications. However, given that the horizontal FoV of the camera is considerably lower than the required coverage of 120°, multiple pictures have to be taken by panning the wide stereo camera. More specifically, for a camera with a horizontal FoV of 50°, 3 pictures should be taken with panning angles of −35°, 0°

and 35°, respectively. This results in an overlap of at least 25% between each successive pair of images, which is sufficient for a correct alignment and merging of the images into a wider one covering the total 120°, as shown in Figure 4.

2.3. Narrow stereo configuration

The second stereo camera, i.e. the narrow baseline one, which is placed at the base of the mast, 30 cm above the ground, covers the area just in front of the robot, shown as the yellow area in Figure 4. Considering that the previously discussed wide stereo camera is not able to detect objects very close to the rover, the need for a second one, able to cover this blind space, becomes apparent. A Bumblebee2 is used as the narrow stereo camera. The characteristics of this camera are: w = 640 pixels, b = 0.12 m, f = 3.8 mm, Fhor = 66° and Fver = 49.5°. These characteristics can be considered as initial constraints for this camera setup. Applying the aforementioned functions to these constraints, we need to cover 2.5 m in front of the rover with accuracy better than 2 cm and with dmin of 0.5 m. The closest distance to the robot within the view of the Bumblebee2 is 0.2 m, ensuring that there will be enough overlap between the FoVs of the wide and narrow stereo cameras. Likewise, the maximum distance is 2.52 m. Finally, the narrow stereo camera tilt angle is 31.55°, as shown in Figure 3.


Pixel Size (s): The pixel size on the sensor.
Horizontal Resolution (w): The horizontal resolution of the sensor.
Minimum Object Distance (dmin): The minimum distance that should be observable in the disparity image.
Baseline (b): The horizontal distance between the two cameras.
Field of View (F): The angular extent of the observable world that is seen at any given moment.
Focal Length (f): The distance from the pin-hole to the center of the sensor, as defined under the pin-hole camera model.
Stereo Angle (φ): The angle formed by the focal points at the minimum object distance. It should be kept lower than 15°.
Search Range (d): The disparity levels that should be examined, expressed as a percentage of the horizontal resolution.
Disparity Accuracy (c): The sub-pixel resolution of the acquired disparity map.
Range (l): The distance within which the accuracy is achieved.
Accuracy (r): The accuracy in meters that is achieved with a given range resolution.

Table 1. The specified parameters for each stereo setup.

2.4. Complete camera system setup

In the complete camera system setup, the three stereo pairs coming from the wide stereo camera and the one stereo pair coming from the narrow stereo camera have to be merged. The distance overlap between the observable areas of the two stereo cameras is 2.04 m. The areas covered by the stereo pairs and their overlap are shown in Figure 4. A wide circular sector (120°) up to 4 m in front of the robot will be covered by the wide stereo setup. On the other hand, a narrower (66°) circular sector closer to the robot will be covered by the narrow stereo setup. The fusion of the outputs of both cameras results in a “T”-shaped observable region, which will be used as input for the various modules of the discussed system.

3. System design

An object-oriented methodology has been used for the SPARTAN architectural design and the Unified Modelling Language (UML) has been selected as the standard notation to describe it graphically [14]. The SPARTAN system processes the information collected by the rover’s stereo cameras and other non-visual sensors such as the compass, IMU, etc. Visual data are used for map reconstruction and localization, whereas the non-visual data are only used to improve the localization estimate in a subsequent step. The output of the discussed system is the estimated location of the rover and the computed 3D map of the explored environment.

The actual SPARTAN system comprises six modules. Figure 5 depicts the system’s modules along with their internal dependencies.

Figure 6. The Imaging component and its subordinate modules.

3.1. Camera driver and Imaging

The SPARTAN system is expected to operate on natural terrains of Martian appearance. Such environments typically involve diffuse lighting, low visual contrast, and light conditions above 200 lux with 20% surface albedo. As a result, the modules controlling the cameras and the quality of their images are crucial for the operation of the whole system. The Camera Driver module is responsible for controlling the cameras and acquiring images when requested, isolating the dependencies of the actual camera mounted on the rover from the rest of the system. The Imaging module processes the raw images as captured from the camera and performs local processing on them, providing appropriate products for the subsequent processing steps. The Imaging component relies on its subordinate modules, i.e. Debayering, Contrast, and Rectification, as shown in Fig. 6.

Debayering is the process of digitally filtering a raw frame of video to reconstruct an image with red, green and blue values at each pixel. To produce a color image, there should be at least three color samples at each pixel location. One approach is to use beam-splitters along the optical path to project the image onto three separate sensors. Using a color filter in front of each sensor, three full-channel color images are obtained. This is a costly approach as it requires


Figure 5. SPARTAN’s components and their dependencies.

three charge-coupled device (CCD) sensors; moreover, these sensors have to be aligned precisely (a nontrivial challenge for the mechanical design). A more cost-effective solution is to put a color filter array (CFA) in front of the sensor to capture one color component at each pixel and then interpolate the missing two color components [8].
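As an illustration of the CFA approach, a single OpenCV call performs the interpolation; the Bayer pattern, frame size and raw-frame source below are assumptions, not SPARTAN specifics:

```python
import cv2
import numpy as np

# Hypothetical raw CFA frame (single channel); the BG Bayer layout and size are assumptions.
raw = np.fromfile("frame.raw", dtype=np.uint8).reshape(1032, 1120)
rgb = cv2.cvtColor(raw, cv2.COLOR_BayerBG2BGR)   # interpolate the two missing components per pixel
```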

The Contrast submodule expands the intensity value range of its input images to a desired range. Cameras and image sensors must usually deal not only with the contrast in a scene but also with the image sensors’ exposure to the resulting light in that scene. In a standard camera, the shutter and lens aperture settings juggle between exposing the sensors to too much or too little light. Often the range of contrasts is too much for the sensors to deal with; hence there is a trade-off between capturing the dark areas (e.g., shadows), which require a longer exposure time, and the bright areas, which require a shorter exposure to avoid saturating “whiteouts”. Contrast stretching involves modifying the brightness (intensity) of the pixels in the image so that the range of their intensity values spans the desired range [5].
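A minimal sketch of such a linear contrast stretch (the target range [0, 255] is an assumption; the module’s actual transfer function may differ):

```python
import numpy as np

def stretch_contrast(img, lo=0, hi=255):
    """Linearly map the image's intensity range onto [lo, hi]."""
    img = img.astype(np.float32)
    imin, imax = img.min(), img.max()
    if imax == imin:                       # flat image: nothing to stretch
        return np.full(img.shape, lo, dtype=np.uint8)
    out = (img - imin) * (hi - lo) / (imax - imin) + lo
    return out.astype(np.uint8)
```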

Finally, the Rectification submodule produces an undistorted and rectified image from the input image obtained from the camera. Both transformations are combined into one single lookup table to allow faster computation [3].
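In OpenCV terms this corresponds to precomputing the remap tables once from calibration data and applying them to every frame; K, dist, R, P and the image size below are assumed calibration outputs, not values from the paper:

```python
import cv2

# Precompute the combined undistortion + rectification lookup table once.
map_x, map_y = cv2.initUndistortRectifyMap(K, dist, R, P, (width, height), cv2.CV_32FC1)

def rectify(image):
    # A single remap applies both transformations per captured frame.
    return cv2.remap(image, map_x, map_y, interpolation=cv2.INTER_LINEAR)
```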

3.2. 3D Reconstruction

The 3D Reconstruction module is responsible for extracting the depth, or equivalently the disparity, information from the stereo camera arrangements [10] and providing 3D representations of the scene. If multiple complementary stereo pairs are available simultaneously, the depth maps from each of them are merged into a single larger depth map. The 3D Reconstruction component is composed of the following subordinate modules: Disparity Computation, Disparity Merging, and Map Generation. Figure 7 depicts the dataflow and the relations between these modules within their parent component.

Figure 7. The 3D Reconstruction component and its subordinate modules.

3.2.1 Disparity Computation

The Disparity Computation component processes the images coming from the left and right sensors of the stereo camera, captured at time t, and computes the disparity map. The disparity value is the horizontal displacement of the same feature observed in the two images of the stereo pair [11]. As a result, the disparity map is equivalent to a depth map for a known geometry of the stereo setup.
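For illustration, a dense disparity map can be obtained on a rectified pair with OpenCV’s semi-global matcher; the SPARTAN correspondence algorithm [11] is different, and the parameter values here are placeholders:

```python
import cv2

stereo = cv2.StereoSGBM_create(minDisparity=0,
                               numDisparities=128,   # search range, must be a multiple of 16
                               blockSize=7)
# left_rect / right_rect: rectified grayscale images captured at time t (assumed available)
disparity = stereo.compute(left_rect, right_rect).astype("float32") / 16.0  # fixed-point -> pixels
```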

3.2.2 Disparity Merging

The different disparity maps captured at each position of the rover are merged into one, making use of the cameras’ relative geometry. More specifically, the left camera of each stereo pair (i.e. the one capturing the reference image) onboard the system’s platform has a fixed and known position and orientation with respect to the others. As a result, a re-projection matrix for each of the other cameras can be computed beforehand and used for re-projecting the corresponding disparity maps towards the central one. Thus, the multiple disparity maps can be merged into a single larger one.
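A sketch of the re-projection step, assuming each camera’s reprojection matrix Q and its fixed rigid transform to the central (reference) camera are known from calibration:

```python
import cv2
import numpy as np

def to_reference_frame(disparity, Q, T_ref_from_cam):
    """Lift a disparity map to 3D and express the points in the reference camera frame."""
    pts = cv2.reprojectImageTo3D(disparity, Q).reshape(-1, 3)   # points in this camera's frame
    valid = np.isfinite(pts).all(axis=1) & (disparity.reshape(-1) > 0)
    pts_h = np.c_[pts[valid], np.ones(valid.sum())]             # homogeneous coordinates
    return (T_ref_from_cam @ pts_h.T).T[:, :3]                  # 4x4 transform, known beforehand
```

The point sets expressed in the reference frame can then be rendered into a single, wider depth map.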

3.2.3 Map Generation

Considering as a starting point an accurate disparity image, the v-disparity image can be calculated. Using the v-disparity image it is easy to obtain a 3D reconstructed map with low computational complexity [6]. The v-disparity image provides a robust representation of the geometric


content of scenes. It is constructed by calculating a horizontal histogram of the disparity stereo image. Utilizing the two stereo images, the v-disparity image can be constructed by accumulating the points with the same disparity that occur on a horizontal line of the image. In this image the points belonging to the ground plane tend to form a diagonal line. The two parameters describing this line in the v-disparity image are essential in order to estimate the ground plane, as well as the other points’ altitudes. The ground plane can be modeled using orientation information, thus avoiding the otherwise demanded, computationally intensive Hough transformations. A tolerance region on either side of the terrain’s linear segment is considered and any point outside this region is treated either as an obstacle or as a cavity, depending on its position with respect to the terrain’s line. For each pixel corresponding to an obstacle the local coordinates are computed. The local map is an occupancy grid of the environment consisting of all the calculated points corresponding to the existing obstacles or cavities.
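A minimal sketch of the v-disparity construction described above, assuming an integer-valued disparity map:

```python
import numpy as np

def v_disparity(disparity, max_disp=128):
    """Per-row histogram of disparity values: one histogram row per image row."""
    rows = disparity.shape[0]
    vdisp = np.zeros((rows, max_disp), dtype=np.int32)
    for v in range(rows):
        d = disparity[v]
        d = d[(d >= 0) & (d < max_disp)].astype(np.int32)
        vdisp[v] = np.bincount(d, minlength=max_disp)
    return vdisp   # the ground plane appears as a dominant slanted line in this image
```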

3.3. Visual Odometry

The Visual Odometry component estimates the rover’s displacement relative to a starting location [12, 13]. It is obtained by examining and correlating consecutive stereo images captured by the localization cameras, i.e. the narrow stereo camera of the rover. More precisely, the Visual Odometry component selects point features and matches them in stereo image pairs to establish their 3D coordinates and obtain the resulting 3D point clouds. The rover’s motion between successive stereo pairs is then estimated as an optimal fitting problem of two 3D point clouds.

The Visual Odometry component is composed of subordinate modules, i.e. Landmark Detection, Landmark 3D Reconstruction, Landmark Matching, and Motion Estimation. Figure 8 illustrates the relations of these submodules both between themselves and with respect to their parent module. The block named “1/z” represents a buffer holding the landmarks obtained from the previous instant of time.

3.3.1 Landmark Detection

The Landmark Detection module deals with the detection of interest points in images obtained by the localization stereo camera and their description, so as to make them traceable in subsequent images [7, 1, 2]. To perform reliable matching, it is important that the features extracted from an image are detectable in another image even under differences in image scale, noise level and/or illumination. The descriptor is an, often detector-specific, method of characterizing and identifying each detected feature. Different feature detection methods have been tested during preliminary assessment tests, i.e. the Harris corner detector, SIFT, and SURF, in order to evaluate their performance and

suitability when considering a space exploration scenario. Results of each of the aforementioned three methods on a series of tested Mars-like real environment images, such as the ones shown in Fig. 9, have been used for assessment purposes. SURF, providing an integrated feature detection and matching framework, has been found to be a fair compromise between the number of detected features, accuracy, and computation time.
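For reference, SURF detection and description with OpenCV (the xfeatures2d contrib module is required; the Hessian threshold is an arbitrary illustration value):

```python
import cv2

surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
# gray_left: a grayscale frame from the localization stereo camera (assumed available)
keypoints, descriptors = surf.detectAndCompute(gray_left, None)
```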

3.3.2 Landmark 3D Reconstruction

The Landmark 3D Reconstruction module creates a point cloud from the features detected in the stereo pair [9]. Once the 2D features are extracted from the images, the stereoscopic configuration allows computing their 3D coordinates. After the stereo matching procedure, the coordinates of the matched features of the left image are combined with the depth information of the disparity map and, as a result, the corresponding 3D point cloud is obtained.
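A pinhole-model sketch of that step, assuming the focal length in pixels, the baseline and the principal point are known from calibration:

```python
import numpy as np

def landmarks_to_3d(keypoints, disparity, f_px, baseline, cx, cy):
    """Combine 2D feature locations with disparity values to obtain 3D points."""
    points = []
    for kp in keypoints:
        u, v = int(round(kp.pt[0])), int(round(kp.pt[1]))
        d = disparity[v, u]
        if d <= 0:                       # no valid correspondence at this pixel
            continue
        Z = f_px * baseline / d          # depth from disparity
        points.append(((u - cx) * Z / f_px, (v - cy) * Z / f_px, Z))
    return np.array(points)
```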

3.3.3 Landmark Matching

The Landmark Matching process starts after the 3D landmark coordinates for a given rover position are computed. The rover moves a short distance and a second pair of stereo images is acquired. The features selected from the previous position are projected into the second pair using the approximate motion computed in the last frame. Then a correlation-based search re-establishes the 2D positions precisely in the second image pair. Stereo matching of these tracked features determines their new 3D positions, providing a new point cloud. Because the 3D positions of those tracked features are already known from the previous step, the stereo matching search range can be greatly reduced. Features whose initial and final 3D positions differ too much are filtered out. Finally, the two point clouds are fitted using an error-minimizing procedure and the transformation that achieves the minimum error reveals the rover’s translation and rotation between the two positions.
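The final fitting step is commonly solved in closed form via SVD (the Kabsch/Procrustes method); a sketch, assuming P and Q are matched N×3 point clouds from the previous and current positions:

```python
import numpy as np

def fit_rigid_transform(P, Q):
    """Least-squares R, t such that R @ P[i] + t ~ Q[i] for matched 3D points."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)            # cross-covariance of the centred clouds
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:             # guard against a reflection solution
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = cq - R @ cp
    return R, t                          # rover rotation and translation between the two poses
```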

3.4. Visual SLAM

The Visual SLAM component takes an estimation of the pose of the rover and the map obtained from the 3D reconstruction, and outputs both a refined pose estimate and a global map that combines all the maps obtained up to that moment. The SLAM problem is that of estimating the robot’s position while progressively building a map of its environment. Typically, it can be considered an extension of VO, since it makes use of its results but also takes into consideration the aforementioned 3D reconstruction results. The most recently obtained map of the environment has to be combined with the global map accumulated up to that point. The two maps overlap either partially (when the robot moves in new areas) or fully (when it moves in previously


Figure 8. The Visual Odometry component and its subordinate modules.


Figure 9. Left stereo images from a tested Mars-like real environment: (a) Features detected with Harris corner detector, (b) Features detected with SURF, (c) Features detected with SIFT.

Figure 10. The Visual SLAM component and its subordinate modules.

explored areas). The result is an updated and, possibly, expanded global map, together with a further fine-tuned pose that optimizes the merging step of the two maps. The Visual SLAM component relies on the Map Merging subordinate module. Figure 10 illustrates the relations of the submodule with its parent module. The block named “1/z” once again represents a buffer holding the global map obtained in the previous instant of time.

3.4.1 Map Merging

The Map Merging module combines the local map estimated at time t with the global map obtained up to time step t−1. The result is the new global map, together with a refined rover pose that optimizes the matching of both maps. More precisely, the pose of the rover estimated by VO is used to superimpose the local map on top of the global map. However, the situation of perfectly matched features that results in an exactly precise pose is hardly ever encountered.

Figure 11. The Localization component.

Thus, one further step is required to address this issue. The proposed solution is to accept the input pose only as a good starting point and proceed with an iterative jittering procedure that optimizes the combination of the maps. At the same time, this procedure refines the rover’s initially estimated pose.
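The paper does not detail the jittering procedure; the sketch below only illustrates the idea, randomly perturbing the VO pose and keeping the perturbation whose map overlap scores best (the pose parameterisation and the scoring function are assumptions):

```python
import numpy as np

def refine_pose(pose, local_map, global_map, score, n_iter=200, sigma_xy=0.02, sigma_th=0.5):
    """Local random search ('jittering') around the VO pose (x, y, theta)."""
    best_pose, best_score = pose, score(pose, local_map, global_map)
    for _ in range(n_iter):
        cand = (pose[0] + np.random.normal(0.0, sigma_xy),
                pose[1] + np.random.normal(0.0, sigma_xy),
                pose[2] + np.random.normal(0.0, np.deg2rad(sigma_th)))
        s = score(cand, local_map, global_map)
        if s > best_score:
            best_pose, best_score = cand, s
    return best_pose                     # refined pose that best aligns the two maps
```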

3.5. Localization

The Localization module fuses the information from several input sensors and outputs a robust estimate of the location of the rover. This module integrates the different sensors present in the system and, taking advantage of their complementary nature (long-term and short-term accuracies, low and high sampling rates, etc.), provides a robust estimate of the location of the rover. Visual sensors provide the mandatory inputs that the component requires, whereas the information coming from non-visual sensors is optional (Figure 11). The module is thus able to operate under the different conditions resulting from the presence or absence of input from these non-visual sensors.
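The fusion scheme itself is not specified in this paper; one standard option, shown only as a sketch, is an inverse-covariance weighted combination of whichever position estimates happen to be available:

```python
import numpy as np

def fuse_estimates(estimates):
    """Fuse (position, covariance) pairs; absent non-visual sensors simply contribute nothing."""
    info = sum(np.linalg.inv(C) for _, C in estimates)                 # information matrices add
    mean = np.linalg.solve(info, sum(np.linalg.inv(C) @ x for x, C in estimates))
    return mean, np.linalg.inv(info)                                   # fused position and covariance
```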


4. Conclusions and discussion

This paper discusses the architecture of the SPARTAN

system. A complete vision system arrangement, consisting of two stereo setups different in characteristics and purpose, has been presented. The considerations and the architecture of the SPARTAN vision system have taken into account ESA’s requirements and the particular demands that a space exploratory rover has to cope with. The various parameters of the system have been defined so as to cover the required accuracy and performance aspects. Furthermore, the basic algorithmic building blocks of the system have been identified and presented using the UML object-oriented methodology. This analysis involved the Camera Driver and Imaging, 3D Reconstruction, Visual Odometry, Visual SLAM, and Localization modules. For each module its subordinate parts have also been presented and the system’s interconnections have been discussed.

The final implementation of the described architecture will follow a co-design methodology involving a Virtex-6 FPGA device [15] and a PC. The most crucial and demanding parts of the involved computer vision algorithms will be mapped onto the FPGA, so as to make the SPARTAN system suitable for real-time space rover navigation.

5. Acknowledgement

The authors would like to thank Prof. D. Soudris, Dr. K.

Siozios and Mr. D. Diamantopoulos from NTUA, Greece for their remarks and expert opinion on the hardware implementability aspects of the discussed algorithms.

This work has been supported by the European Space Agency (ESA) project “SPAring Robotics Technologies for Autonomous Navigation (SPARTAN)”.

References

[1] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool. Speeded-up robust features (SURF). Computer Vision and Image Understanding, 110(3):346–359, 2008.

[2] C. Harris and M. Stephens. A combined corner and edge detector. In Alvey Vision Conference, volume 15, page 50, Manchester, UK, 1988.

[3] S. Kang, J. Webb, C. Zitnick, and T. Kanade. A multibaseline stereo system with active illumination and real-time image acquisition. In IEEE International Conference on Computer Vision, pages 88–93, 1995.

[4] K. Konolige. Small vision systems: Hardware and implementation. In International Symposium on Robotics Research, pages 111–116, 1997.

[5] F. A. Kruse and G. L. Raines. A technique for enhancing digital color images by contrast stretching in Munsell color space. In International Symposium on Remote Sensing of Environment, Third Thematic Conference: Remote Sensing for Exploration Geology, 1985.

[6] R. Labayrade, D. Aubert, and J. Tarel. Real time obstacle detection in stereovision on non flat road geometry through v-disparity representation. In IEEE Intelligent Vehicle Symposium, volume 2, pages 646–651, 2002.

[7] D. G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110, 2004.

[8] D. Menon, S. Andriani, G. Calvagno, and T. Erseghe. On the dependency between compression and demosaicing in digital cinema. In European Conference on Visual Media Production, 2005.

[9] L. Nalpantidis, D. Chrysostomou, and A. Gasteratos. Obtaining reliable depth maps for robotic applications with a quad-camera system. In International Conference on Intelligent Robotics and Applications, volume 5928 of Lecture Notes in Computer Science, pages 906–916. Springer-Verlag, December 2009.

[10] L. Nalpantidis, G. Sirakoulis, and A. Gasteratos. Review of stereo vision algorithms: from software to hardware. International Journal of Optomechatronics, 2(4):435–462, 2008.

[11] L. Nalpantidis, G. C. Sirakoulis, and A. Gasteratos. A dense stereo correspondence algorithm for hardware implementation with enhanced disparity selection. In 5th Hellenic Conference on Artificial Intelligence, volume 5138 of Lecture Notes in Computer Science, pages 365–370. Springer-Verlag, 2008.

[12] L. Nalpantidis, G. C. Sirakoulis, and A. Gasteratos. Non-probabilistic cellular automata-enhanced stereo vision simultaneous localisation and mapping (SLAM). Measurement Science and Technology, in press.

[13] D. Nister, O. Naroditsky, and J. Bergen. Visual odometry for ground vehicle applications. Journal of Field Robotics, 23(1):3–20, 2006.

[14] J. Rumbaugh, I. Jacobson, and G. Booch. The Unified Modelling Language Reference Manual. Addison-Wesley, 1999.

[15] K. Siozios, D. Diamantopoulos, I. Kostavelis, E. Boukas, L. Nalpantidis, D. Soudris, A. Gasteratos, M. Aviles, and I. Anagnostopoulos. Invited paper: SPARTAN project: Efficient implementation of computer vision algorithms onto reconfigurable platform targeting to space applications. In 6th International Workshop on Reconfigurable Communication-centric Systems-on-Chip, Montpellier, France, June 2011.
