

Computer Vision and Image Understanding 114 (2010) 475–490

Contents lists available at ScienceDirect

Computer Vision and Image Understanding

journal homepage: www.elsevier.com/locate/cviu

Range segmentation of large building exteriors: A hierarchical robust approach

Reyhaneh Hesami *, Alireza BabHadiashar, Reza HosseinNezhad

Faculty of Engineering and Industrial Sciences, Swinburne University of Technology, John Street, Hawthorn, VIC 3122, Australia

Article info

Article history: Received 11 March 2009; Accepted 17 December 2009; Available online 23 December 2009

Keywords: Large-scale range data; Range segmentation; Robust estimation; Historical building exteriors

1077-3142/$ - see front matter © 2009 Elsevier Inc. All rights reserved.
doi:10.1016/j.cviu.2009.12.004

* Corresponding author. Fax: +61 3 9214 8264.
E-mail addresses: [email protected] (R. Hesami), [email protected] (A. BabHadiashar), [email protected] (R. HosseinNezhad).

Abstract

There are three main challenging issues associated with processing range data of large-scale outdoor scenes: (a) significant disparity in the size of features; (b) existence of complex and multiple structures; and (c) high uncertainty in data due to construction errors or moving objects. Existing range segmentation methods in the computer vision literature have generally been developed for laboratory-sized objects or shapes with simple geometric features and do not address these issues. This paper studies the main problems involved in segmenting the range data of large building exteriors and presents a robust hierarchical segmentation strategy to extract fine as well as large details from such data. The proposed method employs a high breakdown robust estimator in a coarse-to-fine approach to deal with the existing discrepancies in size and sampling rates of various features of large outdoor objects. The segmentation algorithm is tested on several outdoor range datasets obtained by different laser rangescanners. The results show that the proposed method is an accurate and computationally cost-effective tool that facilitates automatic generation of 3D models of large-scale objects in general and building exteriors in particular.

© 2009 Elsevier Inc. All rights reserved.

1. Introduction

Earlier development of three-dimensional imaging technologies was mainly inspired by robotics applications involving laboratory-size polyhedral objects. In the last decade, large-scale range measurement technology has advanced significantly [1,2], to the point that accurate dense range data of outdoor objects, up to a few hundred meters in size, can be produced in minutes. As a result, it is now feasible to generate accurate geometric models of urban environments. Such models are important in a variety of applications including augmented reality (e.g. [3]), archeology [4], and production of automated urban models of whole buildings and streetscapes (e.g. [5–7]). In particular, extraction of fine architectural details embedded in the façade of important buildings has found new significance for 3D modeling and preservation of historical and cultural sites (e.g. [2,8–11]).

Segmentation of two-dimensional data (such as color images and video sequences) of urban structures has been studied for many years (e.g. [12–15]). However, 3D range data segmentation of large buildings, especially of data obtained by laser rangescanners, is a relatively new topic. The existing range data segmentation methods for man-made buildings in the computer vision literature can be classified into two main categories. The first class of approaches, which has been proposed particularly for building exteriors, is based on architectural features. Stamos and Allen [16] employed attributes such as vanishing points, and Cantzler et al. [17] used parallelism of walls and orthogonality of edges to extract linear features of buildings.

Another approach is to consider the 3D dataset as a collection of pre-defined classes of segments. For instance, Zhao and Shibasaki [18] used a hierarchical scheme to partition images into instances of vertical, horizontal and non-vertical lines, vegetation and outlier classes. Han et al. [19,20] used a jump-diffusion method for segmenting a range image obtained by a 3D laser rangescanner and its associated reflectance image. This method was implemented in the Bayesian framework to allow the integration of both geometric and freeform models. To reduce the computational cost arising from the Markov chain search scheme, a data-driven coarse-to-fine approach is employed. Although the experimental results show that the algorithm is accurate, it is computationally expensive (it takes one hour to process a 300 × 300 pixel dataset [20]) and the quality of segmentation depends on the availability of a priori knowledge of the models as an input to the algorithm.

Anguelov et al. [21] developed a learning-based approach for segmentation of complex scenes. In the learning phase (pre-segmentation) of this method, the scan points are labeled and weighted according to an appropriate object class (e.g. ground, building, tree and shrub) using a maximum-margin learning approach. In the segmentation phase, a Markov Random Field (MRF) segmentation algorithm is applied to the classified data. The learning phase of this method is computationally expensive and, because it does not model parts of the objects and their spatial relations, it is not able to effectively segment objects that have many parts with local similarities. For instance, it is able to separate trees from buildings, but it is not usually able to segment different types of cars in a scene. Triebel et al. [22] modified this method to reduce the computation of the training process by using an adaptive technique based on the kd-tree algorithm. In the segmentation phase, this technique uses Associative Markov Networks to extract the instances of various classes including ground, vegetation, buildings and shrubs. Wolf et al. [23] had earlier employed Hidden Markov Models to segment 3D terrain maps for autonomous navigation purposes. Their method consists of two steps: learning and classification. It divides the outdoor scene into coarse segments of navigable (e.g. flat terrain) and non-navigable (e.g. grass and gravel) areas. Asai et al. [24] partitioned range images of outdoor environments into planar surfaces using a renormalization method for localized plane fitting. Since this method is based on integration of omnidirectional range and color images, small parts of data may be excluded due to the occlusion problem. Changes in the illumination conditions also strongly affect the accuracy of the final segmentation.

Osorio et al. [25,26] presented a hierarchical range segmentation with contour constraints. In this technique, the first level of the hierarchy starts with an initial small partition of the first order regions using the Least Median of Squares (LMS) fitting algorithm. The algorithm then groups the first order regions into larger region(s) using a parametric compatibility function until an approximation limit is reached. In the next step, the method generalizes the first order regions to the second order regions using Bayesian decision theory. Unlike parametric robust range segmentation methods, this method needs to re-adjust the boundaries of each region by performing a geometric intersection technique.

Although the above techniques are generally able to extract large segments of various buildings, they are not designed to detect the fine details seen particularly on the façades of important buildings. Moreover, these techniques either rely on the existence of distinguishable features embedded in the scene, whose availability is often application dependent, or employ edge detection or region growing techniques, which require significant post-processing to overcome the occlusion problem.

Most recently, Yu et al. [27] presented a combined segmentation method to extract planar surfaces from large-scale range data. This method consists of two parts: a clustering strategy and a refining strategy. In the first part, a clustering method based on local surface fitting produces the connected components (e.g. co-planar surfaces). Then, in the second part, the algorithm negotiates the hierarchy from global to local. This part refines the sub-segment at each internal node by fitting global planar surfaces to the individual components. The experimental results of this work show that, although this algorithm is able to extract fine planar surfaces from complex man-made objects, it suffers from the under-segmentation problem.

There are also segmentation techniques (using robust statistics methods) that are developed for partitioning 3D images of small objects into instances of simple geometric entities including lines, planes and quadratics (e.g. MSSE [28,29], RESC [30], ASSC [31], PbM [32] and most recently HBM [33]). Although these methods are able to overcome the problems associated with occlusion and complexity in 3D images of laboratory-size objects, computationally feasible applications of these techniques to complex large-scale images are yet to be developed.

Our aim here is to develop a computationally cost-effective technique capable of extracting all possible geometric details embedded in the 3D data of large building exteriors. Such a technique facilitates automatic generation of fine 3D models of outdoor environments demanded by the emerging multimedia applications. To achieve this goal, we have made the following contributions to the existing knowledge in this area. Firstly, we have conducted an in-depth investigation, analysis and formulation of the problems associated with uncertainties and structural complexities of outdoor scenes. Secondly, we have developed a new hierarchical model-based (parametric) robust range segmentation technique capable of detecting fine details embedded in the façades of large building exteriors. The proposed segmentation strategy has been built upon the aforementioned robust techniques [28,29] using a hierarchical approach involving sequential application of a robust estimator. The new technique significantly reduces the logarithmic computational cost associated with random sampling (see following two sections) and achieves the level of accuracy required for the segmentation of fine details.

In the next section, we present the main characteristics of outdoor scenes (building exteriors, in particular) that complicate geometric segmentation of such scenes. Section 3 explains the proposed robust hierarchical scheme and shows how the technique is designed to address the problems associated with the segmentation of outdoor range data. The results of our experiments with real data are presented in Section 4, where we demonstrate that the proposed method is able to effectively segment range data of historical buildings containing small parts, substantial noise (of different types) and large disparities in size. Section 5 concludes the paper.

2. Characteristics of range data of building exteriors

The data sets of outdoor man-made scenes obtained by modern 3D measurement devices, compared to conventional range data produced in laboratories, pose serious challenges to the existing range data processing techniques [34–36]. Production of range data of outdoor scenes is unavoidably affected by moving objects such as birds, pedestrians and vehicles (e.g. Fig. 1a). Therefore, the processing techniques required for such data have to be robust to the random effect of moving objects, which leads to issues broadly referred to as the occlusion problem. In addition, man-made objects in outdoor scenes appear at different sizes and geometric complexities. For example, the Shrine of Remembrance building in Fig. 1b contains large surfaces such as walls and ceilings; medium size objects such as staircases and statues; and small size objects such as detailed decorative artifacts embedded in the façade of the building. This is in contrast with range data of similar size indoor objects such as the ones depicted in the ABW range image database [37]. The large difference in the size of the objects of interest highly complicates the extraction of small parts, as the required focus often leads to over-segmentation of larger sections.

In this section, we show the effects of these problems on simple simulated data and introduce the ways by which they can be resolved.

2.1. Disparity in size

In range images of large structures, the difference between the shape and size of the objects of interest can be quite significant. For instance, the building of the Shrine of Remembrance, shown in Fig. 1b, contains large surfaces such as walls and ceilings; medium size objects such as staircases and statues; and small size objects such as detailed decorative carving in the façade of the building. In this example, while the walls of the building may contain as many as 30% of all data points, surfaces associated with roof decoration may contain only 1% of all data. Moreover, an object of interest that is located close to the laser rangescanner includes more data points than a similar object located further away. Since distant and small structures have a small number of data samples, a range segmentation algorithm that relies on a minimum size for structures (as most techniques do) may not be able to extract all possible structures of interest. The significant difference in the size of the objects in the scene highly complicates the extraction of small structures as it often leads to over-segmentation of larger sections.

Fig. 1. (a) Intensity and range images of the Royal Exhibition Building, Melbourne, Australia. Cars, passengers and vegetation are obstacles that could not have been avoided at the time of data collection. Moving objects appear as straight lines in the range image. (b) Large disparity in size of various features is highlighted (Shrine of Remembrance, Melbourne, Australia).

To show the effect of disparity in size on the segmentation process, we have designed and conducted an experiment using synthetically generated data. One example of the simulation is illustrated in Fig. 2a. The scene represented by the 3D synthetic data contains two parallel planar structures named P1 and P2, both contaminated by outliers. P1 is created as a 4 × 4 m planar surface (0 < x1 < 4 m, 0 < y1 < 4 m), located at a distance of 20 m (Distance = 20 ± ε m, ε ∈ N(0, 0.1)), and contains 1600 data points, while the size of P2 (and hence its number of data points) is varied from 0.4 × 0.4 m (16 data points) to 4 × 4 m (1600 data points) in 10 regular steps. The variable plane is located at a distance of 40 m (Distance = 40 ± ε m, ε ∈ N(0, 0.1)). Data of both structures are generated using square regular grids of 0.1 m (the same grid size in both x and y directions) and corrupted by additive Gaussian noise N(0, 0.1) to represent measurement error. Around 80 uniformly distributed wrong measurements (representing 2.44–4.72% of the whole population), which emulate the effect of gross outliers, have also been added to the mix. Table 1 shows in more detail how the 3D synthetic data is generated for this experiment. A parametric range segmentation algorithm based on robust estimation (MSSE [28]) is then applied to segment this data.
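The synthetic scene described above can be sketched in a few lines of Python. This is only an illustrative reconstruction from the ranges listed in Table 1, not the authors' generator: the function names, the fixed seed, the placement of grid points at cell centres, and the reading of N(0, 0.1) as a standard deviation of 0.1 m are all assumptions.

```python
import random


def make_scene(size_ratio, d1=4.0, z1=20.0, z2=40.0, z_max=60.0,
               grid=0.1, noise_std=0.1, n_outliers=80, seed=0):
    """Return (p1, p2, outliers) as lists of (x, y, z) tuples.

    size_ratio plays the role of the ratio called Delta-D in Table 1
    (0.1, 0.2, ..., 1). noise_std treats N(0, 0.1) as a standard
    deviation, which is an assumption.
    """
    rng = random.Random(seed)

    def plane(origin, side, depth):
        n = int(round(side / grid))  # number of grid cells per side
        # Assumed layout: one point at the centre of every grid cell.
        ticks = [origin + (i + 0.5) * grid for i in range(n)]
        return [(x, y, depth + rng.gauss(0.0, noise_std))
                for x in ticks for y in ticks]

    d2 = (1.0 - size_ratio) * d1 / 2.0   # D2 in Table 1: centres P2 behind P1
    p1 = plane(0.0, d1, z1)              # fixed 4 x 4 m plane at 20 m
    p2 = plane(d2, size_ratio * d1, z2)  # variable plane at 40 m

    # Gross outliers: uniform in x and y over (-D1/2, -D1/2 + 2*D1), in z over (0, Zo).
    lo, hi = -d1 / 2.0, -d1 / 2.0 + 2.0 * d1
    outliers = [(rng.uniform(lo, hi), rng.uniform(lo, hi), rng.uniform(0.0, z_max))
                for _ in range(n_outliers)]
    return p1, p2, outliers


def outlier_percentage(p1, p2, outliers):
    """Share of gross outliers in the whole population, in percent."""
    return 100.0 * len(outliers) / (len(p1) + len(p2) + len(outliers))
```

Under these assumptions the smallest and largest settings of P2 give 16 and 1600 points, and 80 outliers then make up about 4.72% and 2.44% of the population respectively, which matches the range quoted in the text.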

The above experiment was repeated 100 times for every value of the parameter K in MSSE (the proportion of the size of the smallest data group that would be considered a structure by the estimator; here, varied from 2% to 12% of the whole population) and the success rates of the robust estimator in separating both planes were recorded as shown in Fig. 2b. This figure indicates that successful segmentation of all possible structures greatly depends on the size of the embedded structure. Structures containing less than 20% of the whole data population are less likely to be segmented as a separate structure.

The proposed hierarchical robust segmentation technique aims to overcome this issue by performing segmentation at different levels and hence recovering small structures without the interference of the larger ones.

Fig. 2. (a) A sample of the synthetic data used to demonstrate the effect of disparity in size for large-scale range data segmentation. (b) A plot representing the percentage of success in segmenting both small and large structures with a robust estimator versus the ratio of the size of small structure to the size of whole population for different values of K.

Table 1. Detail of 3D synthetic data created (Fig. 2a) to analyze disparity in the size of data.

| | P1 (fixed planar surface) | P2 (variable planar surface) | Outliers |
|---|---|---|---|
| x | 0 < x1 < D1 | D2 < x2 < D2 + (D1 × ΔD) | −D1/2 < x0 < −D1/2 + 2 × D1 |
| y | 0 < y1 < D1 | D2 < y2 < D2 + (D1 × ΔD) | −D1/2 < y0 < −D1/2 + 2 × D1 |
| z (distance) | Z1 ± ε | Z2 ± ε | 0 < Z0 < Zo |
| ε | ε ∈ N(0, 0.1) | ε ∈ N(0, 0.1) | |

Note: 3D data of Fig. 2a is created based on the following values: D1 = 4 m, D2 = (1 − ΔD) × D1/2, ΔD = {0.1, 0.2, 0.3, ..., 1}, Z1 = 20 m, Z2 = 40 m, Zo = 60 m.

2.2. Existence of very fine details

Modern 3D laser rangescanners are able to capture high resolution dense geometric data points of building exteriors in a short period of time. As a result, outdoor range datasets are rich in detail. Moreover, most of the buildings of interest (e.g. historical buildings) contain architectural details such as columns, statues and staircases. Generation of a simplified model of such buildings is feasible with existing techniques (for instance, see [38]). However, extraction of fine details of ornamental buildings has remained a challenging task that can only be performed either manually [8] or with a huge amount of computation. The computation cost associated with either of those methods increases rapidly with modest increases in the desired level of detail. In order to analyze this phenomenon, we consider a RANSAC [12] type robust estimation approach.

Fig. 3. Plot representing the number of required random samples versus the minimum relative size of the smallest desired detail for different probabilities of success in 3D. The number of required samples increases enormously when the size of the desired structure goes below 10%.

Most robust estimators use a search method such as random sampling to solve the optimization problem. As shown by Fischler and Bolles [39], if ε is the fraction of data points not belonging to the segment one tries to find by random sampling (so K = 1 − ε), the probability of having at least one "good sample" (a sample belonging to the segment of interest) among m random samples in a p-dimensional parameter space is:

$$P = 1 - \left(1 - (1 - \varepsilon)^{p}\right)^{m} \tag{1}$$

Therefore, the minimum number of random samples required for having at least one good sample with probability P (which by itself is far from satisfying the sufficiency condition [40]) is calculated by:

$$m = \frac{\log(1 - P)}{\log\left[1 - (1 - \varepsilon)^{p}\right]} \tag{2}$$
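Eq. (2) is easy to evaluate numerically. The sketch below is a minimal illustration in which the function name and its default arguments (p = 3, P = 0.9) are assumptions chosen for this example. It takes the inlier fraction K = 1 − ε directly and uses `math.log1p` so that very small values of K do not lose precision.

```python
import math


def required_samples(inlier_fraction, p=3, success_prob=0.9):
    """Eq. (2): m = log(1 - P) / log(1 - (1 - eps)^p), with K = 1 - eps.

    Returns m as a float; round up if a guaranteed probability of at
    least `success_prob` is needed.
    """
    if not 0.0 < inlier_fraction < 1.0:
        raise ValueError("inlier_fraction must lie strictly between 0 and 1")
    good = inlier_fraction ** p  # chance that one random p-tuple is all inliers
    return math.log(1.0 - success_prob) / math.log1p(-good)


if __name__ == "__main__":
    for k in (0.3, 0.2, 0.1, 0.08, 0.05, 0.02, 0.01):
        print(f"K = {k:<4}  m = {required_samples(k):,.1f}")
```

With p = 3 and P = 0.9, the truncated values for K = 0.3 down to K = 0.01 agree with the "No. of samples" column of Table 2, which suggests (but does not prove) that those settings were used there.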

Eq. (2) shows that the required number of random samples increases rapidly for the small inlier ratios (where the value of ε is very close to 1) encountered when fine details are to be segmented. For instance, if the size of the smallest structure of interest in a multi-structure three-dimensional scenario (p = 3) is 1% of the whole data population (K = 0.01, so ε = 0.99), then more than two million random samples are required to find the structure of interest 90% of the time. Fig. 3 shows the rapid change in the number of required random samples when a high level of detail is desired.

It is important to note that in practice, the outlier ratio ε is not known a priori and has to be assumed to be fairly large to guarantee that small structures are not overlooked. Therefore, for successful segmentation of small patches, ε values very close to one (that is, K values close to zero) must be chosen, which results in prohibitive computation as exemplified above. As a result, the direct use of random sampling to optimize the objective function of a robust estimator is not a practical option. Instead, applying random sampling in a hierarchical approach (as we propose in this paper) dramatically reduces the computational cost of the estimation. Random sampling can also be replaced by guided sampling (e.g. [41]), where guidance of the search is based on the information of the local regions.

2.3. High uncertainty due to construction errors

In our experiments with range data of building exteriors, we have found that segmentation of building exteriors is highly affected by the construction accuracy of the building. Errors in the construction of large buildings are generally unavoidable and their scale is significant compared to the accuracy of state-of-the-art 3D measurement systems (a few centimeters versus a few millimeters, respectively). In particular, the effect of construction error becomes a significant issue when different parts of one structure are located far apart. In such circumstances, and depending on the level of construction error, model-based range segmentation algorithms may no longer be able to detect co-planar surfaces as single structures.

To investigate the effect of construction error on the segmentation process, we have carried out a simulation experiment as shown in Fig. 4a. The scene represents three-dimensional synthetic data containing co-planar surfaces, named P1 and P2. Each plane contains a total of 500 data points and the two are similar in size (5 m wide (W) by 10 m long (L)). The distance between the planes (mW) is varied through the value of m from 0 to 60 m at 5 m intervals. Data of both planar surfaces are generated using square regular grids of size 0.25 m and corrupted by additive Gaussian noise N(0, 0.1) to represent measurement error. The construction errors are then modeled by moving one surface parallel to the other in depth (l times the scale of noise). A number of randomly distributed gross outliers (around 30% of the population, representing wrong measurements or miscellaneous building parts) have also been added to the set.

Fig. 4. (a) Sample of the simulation data used for segmentation analysis of distant co-planar surfaces. (b) Likelihood of detecting co-planar surfaces as one segment vs. distance of structures for cases where K = 0.1.

To analyze the effect of construction error on the segmentation result, a robust estimator (MSSE) is applied to this data. This experiment was repeated 100 times for different values of l (the construction error in depth) ranging from 0.5 to 3 times the measurement error (here, 10 mm) at 0.5 intervals. The number of times that the robust estimator successfully labeled both patches as a single plane was recorded and is shown in Fig. 4b. The plot shows that successful segmentation of co-planar surfaces depends directly on the distance separating those co-planar structures and the amount of construction error. Co-planar structures separated by more than twice their dimension are not likely to be segmented as co-planar, and the situation worsens as the construction error increases.

The approach we have chosen to address the issues associated with large-scale range data (explained in Sections 2.1 and 2.2) is to employ a high breakdown robust estimator. However, the minimum presumed size of a segment for such an estimator has to be set to very small values, requiring a huge number of random samples to solve its optimization problem. Therefore, such a scheme would be computationally impractical both in terms of memory and computation time (see Table 2 for detail). Furthermore, the existence of construction errors (explained in Section 2.3) leads to over-segmentation when a high breakdown estimator is used.

To overcome these issues, in the following section, we propose to use a high breakdown robust estimator in a novel hierarchical framework. The scheme segments range data at three levels (coarse, medium and fine), simultaneously avoiding the over-segmentation issue and significantly reducing the total computation time and memory requirements.
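As a rough illustration of where the saving comes from, Eq. (2) can be evaluated with P = 0.9 and p = 3 (assumed settings) for a direct search at K = 0.01 and for the three K values listed for the hierarchical run in Table 3. The figures below are per-call sample counts only. The total cost of a hierarchical run also depends on how many times the estimator is invoked and on how many points each call processes, neither of which is modelled here.

```latex
\begin{align*}
K = 0.01 &: \quad m = \frac{\log(0.1)}{\log\left(1 - 0.01^{3}\right)} \approx 2.3 \times 10^{6} \\
K = 0.2  &: \quad m = \frac{\log(0.1)}{\log\left(1 - 0.2^{3}\right)} \approx 287 \\
K = 0.15 &: \quad m = \frac{\log(0.1)}{\log\left(1 - 0.15^{3}\right)} \approx 681 \\
K = 0.4  &: \quad m = \frac{\log(0.1)}{\log\left(1 - 0.4^{3}\right)} \approx 35
\end{align*}
```

Each hierarchical call therefore needs several orders of magnitude fewer random samples than a single direct search for a 1% structure, because K at a given level is measured relative to the segment being refined rather than to the whole data set.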

Table 2. Outcomes of direct implementation of the robust range segmentation algorithm for the Shrine of Remembrance with different values of K (size of the smallest structure).

| K | No. of samples | Segmentation time (s) | No. of segments | Segmentation quality |
|---|---|---|---|---|
| 0.3 | 84 | 28 | 2 | No fine details detected |
| 0.2 | 286 | 36 | 3 | No fine details detected |
| 0.1 | 2301 | 150 | 8 | No fine details detected |
| 0.08 | 4496 | 398 | 10 | Moderate number of fine details |
| 0.05 | 18,419 | 1661 | 15 | Moderate number of fine details |
| 0.02 | 287,821 | 44,822 (≈13 h) | 29 | Most details are detected |
| 0.01 | 2,302,583 | Stopped due to computational limitations | | |

Table 3. Outcomes of hierarchical implementation of robust segmentation algorithm for the Shrine of Remembrance.

| Hierarchy | K | Number and quality of generated segments | | | |
|---|---|---|---|---|---|
| First | 0.2 | Segment I | Segment II | Segment III | Remainder (outliers) |
| Second | 0.15 | Segments I-1 and I-2, $\hat{\sigma} > 0.01$ | Segments II-1 to II-7 | Segments III-1 to III-7 | Segments O-1 to O-9, $\hat{\sigma} < 0.01$; Segment O-10, $\hat{\sigma} > 0.01$ |
| Third | 0.4 | Segments I-1.1 to I-1.4, $\hat{\sigma} < 0.01$ | $\hat{\sigma} < 0.01$ | $\hat{\sigma} < 0.01$ | Segments O-10.1 to O-10.3, $\hat{\sigma} < 0.01$ |

For the first-level segments, $\hat{\sigma} > 0.01$. Segmentation time = 191 (s).

3. Hierarchical robust segmentation scheme (HRS)

As mentioned previously, robust estimation is the tool we have chosen to address the aforementioned complexity and uncertainty issues associated with segmentation of range images. Robust estimation techniques have been broadly used in many computer vision tasks as they have been successfully demonstrated to tolerate outliers of various types, such as impulsive noise generated by sensors, neighboring structures (called pseudo-outliers [42]) and environmental noise (e.g. [28,40]). Robust estimators are either adopted from the statistics community (e.g. Least Median of Squares (LMedS) [43]), or innate to the computer vision field (e.g. Hough Transform (HT) [44], RANdom SAmple Consensus (RANSAC) [39]). Most of these techniques, especially those introduced by the computer vision community, have a breakdown point of more than 50% (the breakdown point of an estimator is the proportion of outliers or pseudo-outliers an estimator can handle before giving an arbitrarily large result). Examples are Minimize the Probability of Randomness (MINPRAN) [45], Minimum Unbiased Scale Estimator (MUSE) [46], Adaptive Least Kth Order Squares (ALKS) [47], Modified Selective Statistical Estimator (MSSE) [28], Maximum Density Power Estimator (MDPE) [48], projection-based M-estimator (pbM) [32] and most recently High Breakdown M-estimator (HBM) [33]. These methods, however, have only been used for applications involving segmentation of laboratory-sized objects.

In general, robust segmentation algorithms are a class of techniques that are designed to extract geometric primitives from raw data with multiple structures by repeatedly using a robust estimator. Segmentation of the structures is usually performed in a sequential manner. In each iteration, first, the inliers (the data samples belonging to one structure) are determined by fitting a surface model to the range data and simultaneously estimating the model parameters such as surface normal and curvature. The resulting inliers are then masked out so that they are not processed in the next iterations. The algorithms also eliminate the outliers produced by false measurements due to range sensor errors or malfunctioning, or environment changes. The extraction of geometric primitives, using robust estimators, often entails an optimization problem. To solve the cost function of the RANSAC based robust estimators, a random search is employed involving a number of minimal subsets of randomly selected points. Each minimal subset (also called a random sample or sample for short) is a p-tuple of measurement data points where p is the dimension of parameter space. The number of samples has to be adequate to ensure, with a probability close to one, that at least one of the samples is a good sample.
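The requirement on the number of samples is commonly turned into a count with the standard random-sampling bound m = log(1 − P) / log(1 − K^p), where P is the desired probability of drawing at least one good sample and K is the inlier ratio. The sketch below assumes this standard form; the function name `required_samples` and the default P = 0.9 are illustrative assumptions, not values stated in the text. With P = 0.9 and p = 3, rounding down appears to match the sample counts listed in Table 2.

```python
import math

def required_samples(K, P=0.9, p=3):
    """Number of random p-tuples needed so that, with probability P,
    at least one sample contains only inliers (inlier ratio K)."""
    if not (0.0 < K < 1.0 and 0.0 < P < 1.0):
        raise ValueError("K and P must lie strictly between 0 and 1")
    # log1p keeps precision when K**p is tiny
    return math.log(1.0 - P) / math.log1p(-(K ** p))

# P = 0.9 is an assumed value, chosen only for illustration.
for K in (0.3, 0.2, 0.1, 0.05):
    print(K, math.floor(required_samples(K)))
```

The count grows roughly as K^-3 for planes, which is why very small structures are expensive to find in a single global pass.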

As explained in Section 2, to deal with the issues associated with range segmentation of building exteriors, a single global robust segmentation approach is unlikely to be sufficient. The hierarchical robust segmentation (HRS) technique presented here is designed to overcome the mentioned problems. In order to process range data, it has been assumed that most of the structural parts of large building exteriors are either planar (due to the ease of their construction) or can be approximated by small planar patches [49,50]. This assumption allows us to use the highly effective model-based approach and take advantage of existing robust segmentation techniques [28]. However, for applications where nonlinear forms are of importance, the proposed hierarchical framework can be extended to include model selection strategies similar to those introduced in [29]. In this approach, a robust estimator is used to segment the range data at different levels. This would significantly reduce the time and computational cost of segmentation to a level achievable by ordinary computers while issues with scale and size are also resolved (see Tables 2 and 3 for a comparison).

Fig. 5. Hierarchical Robust Range Segmentation (HRS). K is the relative size of smallest structure to be segmented. $\sigma$ is range accuracy of measurement system and is varied by instrument. The flowchart consists of the following stages:

3D range data acquisition using laser rangescanner.

Pre-segmentation. Data pre-processing (e.g. deleting invalid data, median filtering, etc.).

First level of hierarchy. Coarse segmentation. Inputs: all range data (from previous stage), K and $\sigma$ (here, K is 0.3 and sigma is 10 times the accuracy of the data acquisition system ($\sigma$)). Method: robust estimation (here, MSSE). Condition: no condition. Output(s): geometric data of each large/coarse structure and remaining data (known as outliers).

Estimation of scale of noise. Input: range data of each coarse segment/outliers (resulting from previous stage). Method: least-square regression. Condition: no condition. Output: estimated scale of noise ($\hat{\sigma}$) and parameters of each segment.

Second level of hierarchy. Intermediate segmentation. Inputs: range data of each coarse segment/outliers, K (here, 0.25). Method: robust estimation (here, MSSE). Condition: $\hat{\sigma}$ is greater than 5 times the accuracy of the data acquisition system ($\sigma$). Output(s): geometric data of medium size structures (if there are any).

Estimation of scale of noise. Input: range data of each intermediate segment (resulting from previous stage). Method: least-square regression. Condition: no condition. Output: estimated scale of noise ($\hat{\sigma}$) and parameters of each segment.

Third level of hierarchy. Fine segmentation. Inputs: range data of each intermediate segment, K (here, 0.15). Method: robust estimation (here, MSSE). Condition: $\hat{\sigma}$ is greater than the accuracy of the data acquisition system ($\sigma$). Output(s): small size structure(s) (if there are any).

Estimation of scale of noise. Input: range data of each fine segment (resulting from previous stage). Method: least-square regression. Condition: no condition. Output: estimated scale of noise ($\hat{\sigma}$) and parameters of each segment.

End segmentation.

The proposed algorithm starts by specifying the number of hierarchical levels for segmentation and a user-defined input to the robust estimator for each level. The user-defined threshold, called K, works as a fine-tuning parameter and indicates the ratio of the population of the smallest region that can be regarded as a separate region to the size of the entire population. The number of segmentation levels depends on the proportion of detail one is interested in and the disparity in the size of the features. For example, to extract structures as small as 1% of the entire data from the data of a historical building exterior (we found such structures by close observation of the data), three levels can be used with $K_1 = 0.2$, $K_2 = 0.15$ and $K_3 = 0.4$ ($K_1 K_2 K_3 = 0.012$). It is also assumed that the scale of the typical measurement error of the rangefinder is either readily available or can be measured from data.

The implementation details of every stage of the proposed range data segmentation algorithm, shown in Figs. 5 and 6, are as follows.

3.1. Range data pre-processing

In this stage, those data points whose associated depths are not valid (due to the limitations of the laser rangefinder used for measuring the depth) are eliminated. These points are usually marked by the rangescanner software with an out-of-range number. Data of outdoor man-made objects captured by laser technology are contaminated by noise due to the ‘mixed-pixel’ effect and moving objects. To reduce these effects, a median filter of 5 × 5 pixels is applied to the entire valid range data.
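The pre-processing stage can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the function name `preprocess_range_image`, the way the invalid marker is passed in, and the handling of image borders and of invalid neighbours inside the window are all assumptions.

```python
from statistics import median

def preprocess_range_image(depth, invalid_value, window=5):
    """Drop invalid depths and median-filter the valid ones.

    depth: list of rows (lists of floats); invalid_value: the scanner's
    out-of-range marker. Returns a new image where invalid pixels are None.
    """
    half = window // 2
    rows, cols = len(depth), len(depth[0])
    filtered = [[None] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            if depth[i][j] == invalid_value:
                continue  # invalid point: eliminated, not filtered
            # Collect valid neighbours inside the window (clipped at borders).
            neighbours = [
                depth[u][v]
                for u in range(max(0, i - half), min(rows, i + half + 1))
                for v in range(max(0, j - half), min(cols, j + half + 1))
                if depth[u][v] != invalid_value
            ]
            filtered[i][j] = median(neighbours)
    return filtered
```

On a flat patch with one isolated spike (for example a mixed pixel), the spike is replaced by the surrounding depth while invalid pixels stay excluded.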

3.2. Robust range segmentation

A robust segmentation algorithm is applied to the entire data. This algorithm is initially tuned to extract a preliminary collection of coarse/large segments by appropriately setting K (the value of K is application dependent). The remaining data (which contain pseudo-outliers and outliers) are marked as outliers and stored for further processing. In this work we have chosen to use the Modified Selective Statistical Estimator (MSSE) [28], because it is straightforward and has the least finite sample and asymptotic bias in comparison with other popular robust estimators [51,52]. In addition, this estimator simultaneously calculates the scale of the noise of each separated structure, which is used as the hierarchy criterion in the HRS algorithm. It is important to note that other highly robust estimators such as pbM and HBM could also be used in this step and would be expected to produce similar results. The details of this estimator are explained in Section 3.6.

Fig. 6. Cascade of hierarchical segmentation.

3.3. Surface fit

A surface (here, a planar surface described by ax + by + cz = 1) is then fitted to the data of each coarse/large segment (of the previous stage) using least-squares fitting, and the scale of noise is calculated.

3.4. Hierarchy criterion

If the calculated value of scale is more than the scale of noise of the measurement unit, we consider this segment as a coarse segment and once again apply the robust segmentation algorithm. Otherwise, it is labeled as a large segment. Where applicable, this step is repeated to extract all possible details embedded in the data.
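The surface fit and the hierarchy criterion can be illustrated together. The sketch below is written under stated assumptions: it uses the plane form z = a + bx + cy of Eq. (9) rather than ax + by + cz = 1, solves the normal equations with Cramer's rule, and the names `fit_plane`, `needs_further_segmentation` and `sensor_sigma` are hypothetical.

```python
import math

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def fit_plane(points):
    """Least-squares fit of z = a + b*x + c*y to (x, y, z) points.

    Returns ((a, b, c), sigma_hat), where sigma_hat is the residual
    scale sqrt(sum(r^2) / (n - 3)).
    """
    n = len(points)
    if n <= 3:
        raise ValueError("need more than 3 points")
    sx = sum(x for x, y, z in points)
    sy = sum(y for x, y, z in points)
    sz = sum(z for x, y, z in points)
    sxx = sum(x * x for x, y, z in points)
    sxy = sum(x * y for x, y, z in points)
    syy = sum(y * y for x, y, z in points)
    sxz = sum(x * z for x, y, z in points)
    syz = sum(y * z for x, y, z in points)
    # Normal equations A * theta = rhs, solved with Cramer's rule.
    A = [[n, sx, sy], [sx, sxx, sxy], [sy, sxy, syy]]
    rhs = [sz, sxz, syz]
    d = det3(A)
    if abs(d) < 1e-12:
        raise ValueError("degenerate point configuration")
    theta = []
    for k in range(3):
        Ak = [row[:] for row in A]
        for i in range(3):
            Ak[i][k] = rhs[i]
        theta.append(det3(Ak) / d)
    a, b, c = theta
    ssr = sum((z - (a + b * x + c * y)) ** 2 for x, y, z in points)
    return (a, b, c), math.sqrt(ssr / (n - 3))

def needs_further_segmentation(sigma_hat, sensor_sigma):
    """Hierarchy criterion: refine when the fit is noisier than the sensor."""
    return sigma_hat > sensor_sigma
```

A segment that is really two planes (for example a roof-like shape) yields a large estimated scale and is sent back to the robust segmentation step, whereas a single clean plane is accepted as final.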

3.5. Sequential segmentation

Data marked as outliers in the first segmentation stage are not discarded, since the majority of such points may belong to some small structures. The robust segmentation algorithm is again applied to these data points. Smaller structures are normally detected at this stage.

3.6. Model-based robust segmentation using MSSE

Fig. 7. Robust range segmentation using MSSE. The flowchart has two parts. Estimation: perform random sampling; calculate residuals to the plane fitted to the sample; sort the squared residuals $r_i^2$; starting from n = K, calculate the scale of noise $\sigma_n^2 = \sum_{j=1}^{n} r_j^2 / (n - p)$; while $r_{n+1}^2 < T^2 \sigma_n^2$, set n = n + 1 and recalculate $\sigma_n^2$. Segmentation: separate inliers from outliers ($r_{K+m}^2$ is classified as the first outlier residual); if the number of outliers is greater than K, start again on the outliers, otherwise end.

This section briefly explains model-based robust estimation and describes the implementation of MSSE [28] for segmentation applications. The raw range data captured by a range measurement system can be divided into two main parts: the data structure (inliers) and noise (outliers). The purpose of model-based robust estimation is to separate noise from data by fitting parametric equations to the observed data. The classic linear regression model can be defined as [53]:

$$y_i = x_{i1}\theta_1 + \cdots + x_{ip}\theta_p + e_i, \qquad i = 1, \ldots, n \tag{3}$$

where n is the population size and p is the number of parameters in the model. The independent variables $x_{i1}, \ldots, x_{ip}$ are called carriers, the dependent variable $y_i$ is called the response, $\theta$ is the vector of unknown parameters, and $e_i$ is the error term. In most computer vision applications, measurement error is considered as additive Gaussian noise and is described as $e_i \sim G(0, \sigma^2)$. Given the model as Eq. (3), one tries to estimate the vector of unknown parameters from the data:

$$\theta = \begin{bmatrix} \theta_1 \\ \vdots \\ \theta_p \end{bmatrix} \tag{4}$$

The outcome of robust estimation of such a data set is the model parameters defined as:

$$\hat{\theta} = \begin{bmatrix} \hat{\theta}_1 \\ \vdots \\ \hat{\theta}_p \end{bmatrix}, \tag{5}$$

where the estimates $\hat{\theta}_i$ are called parameters of fit. Even though the actual model parameters $\theta_i$ are unknown, the estimated value of $y_i$ can be calculated from:

$$\hat{y}_i = x_{i1}\hat{\theta}_1 + \cdots + x_{ip}\hat{\theta}_p \tag{6}$$

The residual $r_i$ is then calculated from the difference between the estimated value and the actual observed value:

$$r_i = y_i - \hat{y}_i \tag{7}$$

The core idea of estimation is to find the variables that have the smallest residuals to the fit. This fit is then considered as a potential ‘good’ segment.

For robust estimation of 3D objects into planar surfaces, Eq. (3) can be simplified as:

$$y_i = x_{i1}\theta_1 + x_{i2}\theta_2 + x_{i3}\theta_3 + e_i, \qquad i = 1, \ldots, n \tag{8}$$

Eq. (8) can be rewritten in the general notation of normalized planar surfaces (for simplicity and readability) as:

$$z_i = a + b x_i + c y_i + e_i, \qquad i = 1, \ldots, n \tag{9}$$

So the fitted surface can be represented as:

$$\hat{z} = f(x, y, \hat{a}, \hat{b}, \hat{c}) \tag{10}$$

and the residuals can be calculated from:

$$r_i = z_i - \hat{z}_i \tag{11}$$

This robust estimation using MSSE starts with random sampling to determine a number of candidate fits, then ranks these candidate fits by least $N_{Kth}$ order residuals and estimates the scale from the best preliminary fit ($N_{Kth} = K \times N$, where K is the fraction of the population one is interested in and N is the entire population to be segmented). The algorithm classifies inliers and outliers by using the scale estimate $\sigma$. Fig. 7 shows the flowchart of range segmentation using MSSE.

Fig. 8. Example of synthetic data with multiple structures. In this example, the total number of data points including gross outliers is 2270; the number of gross outliers is 30 points and the size of the smallest structure is 40 points (1.7% of the entire data).

Fig. 9. Segmentation result for the dataset presented in Fig. 8, where P is set to 0.85 and K (size of smallest structure) is set to 0.05.

1 In a normal (Gaussian) distribution, the large majority of the samples (about 98.8%) lie within 2.5 standard deviations of the mean.

As shown in the flowchart, a value of K is set (by the user) as the lower limit of the size of populations one is interested in. A localized data group inside the data space in which all the pixels appear on a flat plane is found using random sampling. A planar model with the least $N_{Kth}$ order squared residual is selected from the planar models fitted to those samples (for a planar model, p is 3 in Eq. (3)). Residuals are then ranked for the selected fit. For the accepted model, starting from n = K (expressed as a number of points, i.e. $n = N_{Kth}$), the unbiased estimate of the scale of noise is calculated using the smallest n residuals:

$$\sigma_n^2 = \frac{\sum_{j=1}^{n} r_j^2}{n - p} \tag{12}$$

where $r_j$ is the jth smallest residual and p is the number of parameters in the model. The squared (n + 1)th residual is weighed against a threshold set by $\sigma_n^2$. If $r_{n+1}^2 > T^2 \sigma_n^2$, then the residual $r_{n+1}^2$ is considered as the first outlier residual. Then, those points whose squared residuals are greater than $T^2$ times the squared scale of noise are rejected. Based on a recent study on the distribution of noise in range data captured by a laser rangescanner [54], although the actual distribution of noise in such data is neither Gaussian nor white, it can be assumed to be Gaussian with good approximation. In this case, the value of the threshold T is often taken to be 2.5, based on the desired level of significance in the normal distribution (see footnote 1) [53]. The equivalent characterization of the point of transition from inliers to outliers occurs when:

$$\frac{\sigma_{n+1}^2}{\sigma_n^2} > 1 + \frac{T^2 - 1}{n - p + 1} \tag{13}$$
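The equivalence between the test on the (n + 1)th squared residual and Eq. (13) can be checked in one step. The derivation below is a sketch that assumes $\sigma_{n+1}^2$ is computed from the n + 1 smallest residuals with the same formula as Eq. (12):

$$
\begin{aligned}
\sigma_{n+1}^2 &= \frac{\sum_{j=1}^{n+1} r_j^2}{n + 1 - p}
               = \frac{(n - p)\,\sigma_n^2 + r_{n+1}^2}{n - p + 1}, \\
r_{n+1}^2 > T^2 \sigma_n^2
 \;&\Longleftrightarrow\;
 \sigma_{n+1}^2 > \frac{(n - p) + T^2}{n - p + 1}\,\sigma_n^2
 \;\Longleftrightarrow\;
 \frac{\sigma_{n+1}^2}{\sigma_n^2} > 1 + \frac{T^2 - 1}{n - p + 1}.
\end{aligned}
$$

The first line rewrites the sum of the n smallest squared residuals as $(n - p)\sigma_n^2$; the second follows because $\sigma_{n+1}^2$ increases monotonically with $r_{n+1}^2$.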

The above procedure, from random sampling to outlier rejection, is repeated as long as the remaining data are large enough to hold the remaining segments. Each pass generates a new segment containing all the inliers to the obtained fit (regardless of their geometrical location). As a result, the algorithm has the advantage of detecting and resolving occlusion while segmenting the data. The above tasks are iteratively performed until the number of remaining data points becomes less than the size of the smallest possible region in the desired application. Details of the implementation of MSSE are shown in Algorithm 1.

Fig. 10. First level of robust segmentation.

Fig. 11. Second level of robust segmentation.

Fig. 12. Third level of robust segmentation.
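The scale estimation and inlier/outlier separation for a single candidate fit can be sketched as follows. This is only an illustration of the test built on Eq. (12) with threshold T, not the full procedure (random sampling and the choice of the best candidate fit are omitted); the function name `msse_split` and its return format are assumptions.

```python
def msse_split(residuals, K, T=2.5, p=3):
    """Split the residuals of one candidate fit into inliers and outliers.

    residuals: residuals of all N points to the fit.
    K: smallest structure size as a fraction of N.
    Returns (sigma_n, inlier_indices, outlier_indices).
    """
    N = len(residuals)
    if N <= p:
        raise ValueError("need more points than model parameters")
    order = sorted(range(N), key=lambda i: residuals[i] ** 2)
    r2 = [residuals[i] ** 2 for i in order]   # ascending squared residuals
    n = max(int(K * N), p + 1)                # start from the K*N smallest
    ssum = sum(r2[:n])
    while n < N:
        sigma2 = ssum / (n - p)               # Eq. (12)
        if r2[n] > T * T * sigma2:            # (n+1)th residual: first outlier
            break
        ssum += r2[n]
        n += 1
    sigma_n = (ssum / (n - p)) ** 0.5
    return sigma_n, order[:n], order[n:]
```

Applying the function repeatedly to the points returned as outliers, each time with a new candidate fit, gives the sequential behaviour described above.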

3.7. How the hierarchical scheme reduces computation cost

A general drawback of using robust estimation is that no explicit formula exists to solve the objective function optimization problem for most of the estimators. An accurate solution can only be determined by searching in the space of all possible estimates. Consider all estimates determined by all possible p-tuples (i.e. three-tuples for 3D segmentation) of data points. In an exhaustive search scheme to minimize the median of squared residuals (using the well-known LMedS estimator), there are $\binom{n}{p} = n!/(p!(n - p)!)$ p-tuples, and it takes O(np) time to find the median of the residuals of the whole data for each p-tuple. The cost therefore increases very fast with n (number of data points) and p.

Fig. 13. (a) Intensity (left) and range (right) image of the front side of the Shrine of Remembrance. Range data of the building is captured by a Riegl (LMS-Z210) laser rangescanner. (b) Segmentation result of the first level of hierarchy. (c) Final result of hierarchical segmentation. (d) All possible detail (planar and decorative) of the head of the building is successfully segmented.
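To give a feel for the size of the exhaustive search, the number of p-tuples can be evaluated directly. The snippet is a small illustration; the function name is hypothetical and the grid size is the 250 × 382 sampling quoted for the Shrine of Remembrance scan in Section 4.2.

```python
import math

def exhaustive_tuple_count(n, p=3):
    """Number of distinct p-tuples among n points: n! / (p! (n - p)!)."""
    return math.comb(n, p)

n_points = 250 * 382   # grid size quoted for the Shrine of Remembrance scan
print(n_points, exhaustive_tuple_count(n_points))
```

For a range image of this size the count exceeds 10^14 three-tuples, before the per-tuple cost of evaluating residuals is taken into account.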

In order to reduce the cost of computation, instead of searching the whole space, one can apply random sampling as described in Section 2. However, random sampling by itself is again costly. As we mentioned in Section 2, there is a large disparity in the size of features in range data of building exteriors. As a result, in order to extract all details embedded in the data by a robust estimator (such as MSSE), we need to assign a very small value for the inlier ratio. For a three-dimensional scenario (p = 3) with very fine structures of interest, Eq. (2) can be simplified as:

$$m \approx -\log(1 - P)\, K^{-3} \tag{14}$$

In this equation, $(1 - \varepsilon)$, which is the smallest possible ratio of inliers in the application, is denoted by K. Since $K \ll 1$, the denominator of Eq. (2) is closely approximated by $-K^3$ ($\log(1 - x) \approx -x$ for small x), which makes m positive because $\log(1 - P)$ is negative.

If we assume that the minimum ratios of inliers in the three levels of the proposed hierarchical scheme are $K_1$, $K_2$ and $K_3$, then the segmentation process would be able to extract segments as small as $K = K_1 K_2 K_3$ times the total number of data. Thus, the total number of random samples required by the hierarchical technique is:

$$m_{HRS} = -\log(1 - P)\,[K_1^{-3} + K_2^{-3} + K_3^{-3}] \tag{15}$$

On the other hand, the number of samples required to segment the same size of structure by direct usage of a robust estimator is:

$$m_{dir} = -\log(1 - P)\, K_1^{-3} K_2^{-3} K_3^{-3} \tag{16}$$

Comparison of Eqs. (15) and (16) shows how using the hierarchical approach reduces the number of required samples (and consequently the computation time). This fact is experimentally shown in Table 3 of Section 4. This result shows that for segmenting fine structures as small as 1.2% of the whole data (where $K_1 = 0.2$, $K_2 = 0.15$ and $K_3 = 0.4$) the number of required samples is decreased by a factor of 1325.
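The reduction factor quoted above can be reproduced numerically from Eqs. (15) and (16). The sketch below writes both counts with the sign chosen so that they are positive; the function names are hypothetical, and the value of P is an assumption that cancels in the ratio.

```python
import math

def samples_hierarchical(levels, P):
    """Eq. (15): sum of per-level counts, -log(1 - P) * sum(K_i ** -3)."""
    return -math.log(1.0 - P) * sum(K ** -3 for K in levels)

def samples_direct(levels, P):
    """Eq. (16): a single pass with inlier ratio K = K_1 * K_2 * ..."""
    K = math.prod(levels)
    return -math.log(1.0 - P) * K ** -3

levels = (0.2, 0.15, 0.4)
P = 0.9   # assumed value; the ratio below does not depend on it
ratio = samples_direct(levels, P) / samples_hierarchical(levels, P)
print(round(ratio))  # 1325
```

The hierarchical count is a sum of three moderate terms, whereas the direct count is their product, which is where the three orders of magnitude come from.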

4. Experimental results

The hierarchical robust segmentation (HRS) technique has been applied to a number of both synthetic and real range datasets to evaluate the performance of the proposed algorithm. The synthetic data include complex multiple structures. Real data include range data of real-world scenes (here, historical building exteriors) captured by the experimental rangescanner device developed by the authors [36], as well as two commercially available rangescanner systems. The experimental results are presented in Sections 4.1 and 4.2. All experiments are performed on a desktop PC with a dual core 2.2 GHz CPU and 3 GB of RAM using the MATLAB environment.

Fig. 14. Quantitative outcome of three levels of hierarchical robust segmentation for data of the Shrine of Remembrance building, Melbourne, Australia. K is the relative size of smallest structure to be segmented and it is varied in each level. $\hat{\sigma}$ is the estimated scale of noise for each segment and assesses the goodness of hierarchical segmentation.

4.1. 3D synthetic data

Three-dimensional synthetic data containing four structures was generated for this set of experiments. As illustrated in Fig. 8, the inliers belong to the planar surfaces, each representing a structure, containing 1000, 800, 400 and 40 data points corrupted by an additive Gaussian noise of N(0, 0.1). The inliers are mixed with 30 gross outliers. The synthetic dataset is designed to represent data of a complex multi-structure scene where outliers and disparities in the population (i.e. size) exist. In this dataset, the number of outliers is close to the number of data points of the smallest structure, which is about 1.7% of the total data. The objective of the experiment on this dataset is to extract all structures and eliminate outliers.

The first experiment was performed with P = 0.99 and K = 0.017 (40/2270). From Eq. (2), the number of random samples required for segmentation using MSSE is too large to be computationally feasible in our computing environment. In the second experiment with P = 0.85 and K = 0.05, the number of required samples for segmentation using MSSE significantly decreased to about $1.5 \times 10^4$ and the process took about 2 h. Fig. 9 shows the result of this experiment. It is observed that although the computational cost (in terms of time and memory usage) is significantly decreased, the smallest structure could not be extracted. In addition, a false surface component was extracted (the red and blue structures in Fig. 8 are detected as one structure). This problem is referred to as a ‘bridging fit’ error [55] and the output of range segmentation is evaluated as under-segmentation.
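A dataset with the same composition as the synthetic scene of Section 4.1 can be generated as follows. This is a hedged sketch: the plane parameters, coordinate ranges, outlier range and the reading of N(0, 0.1) as a standard deviation of 0.1 are all assumptions made for illustration, and the function name is hypothetical.

```python
import random

def make_synthetic_scene(seed=0, noise_std=0.1):
    """Four noisy planar structures plus gross outliers, as (x, y, z, label)."""
    rng = random.Random(seed)
    # (a, b, c) of z = a + b*x + c*y and the number of points per plane;
    # the plane parameters are arbitrary illustrative choices.
    planes = [((0.0, 0.0, 0.0), 1000),
              ((5.0, 1.0, 0.0), 800),
              ((10.0, 0.0, -1.0), 400),
              ((15.0, 0.5, 0.5), 40)]
    points = []
    for label, ((a, b, c), count) in enumerate(planes):
        for _ in range(count):
            x, y = rng.uniform(0, 10), rng.uniform(0, 10)
            z = a + b * x + c * y + rng.gauss(0.0, noise_std)
            points.append((x, y, z, label))
    for _ in range(30):                       # gross outliers, label -1
        points.append((rng.uniform(0, 10), rng.uniform(0, 10),
                       rng.uniform(-20, 40), -1))
    rng.shuffle(points)
    return points
```

The smallest structure then holds 40 of 2270 points, so a single global pass would need K of about 0.017, which is the setting found to be infeasible in the first experiment.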

A bridging fit occurs when two distinct neighboring surfaces are segmented as one region. It occurs here due to the lack of enough samples when P is set to 0.85 and the size of the smallest structure of interest is set to 5% of the whole data. As mentioned in Section 3, to overcome this problem and delineate all possible features that exist in such a dataset, a single global use of robust estimation is not adequate. A step-wise robust approach needs to be considered for the segmentation problem of complex and large datasets.

To evaluate the performance of the HRS, the proposed algorithm has been applied to the synthetic dataset. Figs. 10–12 show the steps of the HRS on the synthetic data of multiple structures presented in Fig. 8. Fig. 10 shows the result of segmentation of the coarse/large structure using the HRS algorithm. In this stage, $K_1 = 0.15$ and P = 0.95. As shown in this figure, only one structure is extracted and the other structures are separated as a very coarse segment that needs to be further segmented.

In the second stage, the segmentation algorithm is applied to the remaining data resulting from the first level of segmentation (data shown in green in Fig. 10). In this stage, $K_2 = 0.1$ and P = 0.95, where $K_2$ is the size of the smallest structure in the remaining data and P is the probability of selecting a good sample. Fig. 11 illustrates the result of the second level of segmentation.

In the last stage of segmentation, the smallest (or finest) segments were detected and the remaining gross outliers were eliminated from the dataset. This segmentation result is shown in Fig. 12. In this stage, $K_3 = 0.1$ (size of the smallest structure one is interested in, in the remaining data) and P = 0.95.

Fig. 15. Hierarchical segmentation of the back side of the Royal Exhibition Building, Melbourne, Australia. Range data of the building is captured by a prototype laser rangescanner manufactured by the authors. (a) Intensity image of the building. (b) First level of segmentation (coarse segmentation). (c) Final result of hierarchical segmentation. This figure is best viewed in color.

4.2. Real data

The exteriors of the buildings chosen for this set of experiments are highly structured, with many large planar objects like walls, doors and roofs and smaller planar objects such as stairs, doorways and decorative parts. Range data of the first experiment is captured from the front view of a large building called the Shrine of Remembrance (Melbourne, Australia) by a Riegl (LMS-Z210) laser rangescanner. The building is pictured in Fig. 13a (left). The scanned range image of the building is sampled on a 250 × 382 grid and contains almost $10^5$ data points. The angular resolution of the scanner was set to 0.1° and its measurement error is typically 10 mm. The original range image (Fig. 13a, right) and the results of the first and last stages of the segmentation strategy (Fig. 13b and c) are shown in Fig. 13. To show the high accuracy of the proposed segmentation algorithm, the decorative part of the front exterior and its segmentation outcome are magnified in Fig. 13d.

To highlight the advantages of the hierarchical strategy, the results of the direct and the step-wise implementations of the robust range segmentation technique have been compared by using the range image of a typical building. Tables 2 and 3 summarize the various outcomes of each approach for different values of K (the proportional size of the smallest data group that would be considered a structure). Fig. 14 describes the results presented in Table 3 in quantitative detail. This figure shows the value of K considered for each level and the estimated scale of noise for each segment. This scale is used as a measure of goodness for each segment, so that further segmentation is applied only where the estimated scale of noise is considerably larger than the actual measurement noise.

Fig. 16. Hierarchical segmentation of the south view of Notre-Dame Church, France. Range data of the church captured by Leica HDS2500 laser rangescanner. (a) Intensity image of the church [56]. (b) First level of segmentation (coarse segmentation). (c) Final result of hierarchical segmentation. (d–f) All possible details (planar and decorative) of different parts of the building are successfully segmented. This figure is best viewed in color.

As shown in Table 2, when the value of K is decreased to extract more details (finer structures), the required computational cost increases significantly, to levels that would be considered impractical for most vision applications. At the same time, the value of $\sigma$ (estimated noise scale) for each segment also decreases, pointing to the fact that the segmentation has become more accurate with the smaller values of K. The under-segmentation problem, however, worsens when K is fairly small. Table 3 shows the results of the hierarchical approach to the robust range segmentation of the Shrine of Remembrance (see Fig. 13), requiring a three-level pyramid. At the first level, the algorithm has focused on the large/coarse segments (e.g. structures that contain at least 20% of the whole population) and has separated the data into four parts. Stages two and three have further refined those parts into smaller/finer segments, and a very accurate segmentation is achieved at the final stage. This table also shows that our hierarchical approach to the segmentation drastically decreases the required computation time (about 3 min versus 13 h) and takes full advantage of the high accuracy that the MSSE can produce.

In the second experiment, we applied our segmentation algorithm to the geometric data of the Melbourne Exhibition building exterior (Fig. 15a). The data set was captured by an experimental laser rangescanner system that has been developed by the authors [36]. The angular resolution and measurement error of the system are 0.1° and 300 mm, respectively, and the range image is sampled on a 320 × 615 lattice. Fig. 15b and c illustrate the first and final levels of segmentation for this set of data. Our results show that although the level of noise in this data set is much higher than in data obtained by commercial systems (mostly due to the limitations of the assembly system), the range segmentation has been successful in extracting fine details embedded in the front side of the building.

Range data of the Notre-Dame Church (Paris) is used for our third test (Fig. 16). The range data is captured by a Leica (HDS2500) laser rangescanner and contains $5 \times 10^5$ data points. The measurement error of the system is typically 4 mm in depth and its angular resolution generates about 25 mm spacing between the samples at 20 m. Fig. 16b and c show the first and last levels of range segmentation, respectively. Fig. 16d, e and f zoom in on different parts of the building that have been segmented to demonstrate the level of detail that can be extracted by our algorithm. It is important to note that although part of the data, shown in Fig. 16e, is missing due to the measurement limitation of the laser for reflective surfaces, the segmentation algorithm has been able to correctly partition the existing parts.

5. Conclusion

The main contribution of this work is twofold: (a) to analyze the main issues associated with the geometrical range segmentation of large and significant buildings; and (b) to present a novel method to efficiently partition range data of such objects into planar patches using a hierarchical coarse-to-fine segmentation approach. In this paper, we have explained how the proposed hierarchical segmentation algorithm extracts detailed planar patches of large and complex datasets and have provided theoretical proof as to why the proposed technique imposes modest computational cost. In a case study presented here, it has taken around 190 s to segment 28 planar patches embedded in the range data set of a decorative monument. The same task takes around 13 h when a similar robust segmentation technique is applied directly to this dataset. Our experimental results also show that the segmentation outcomes of the proposed method are more accurate and less prone to the usual over- or under-segmentation issues.

It is important to note that the algorithm fails to extract structures located very close together. This failure is directly related to the accuracy of the sensor. For instance, for data captured with decimeter accuracy in depth measurement (e.g. data of the Royal Melbourne Exhibition Building in Fig. 15), we do not expect to extract features located within a decimeter of each other. Another failure of the algorithm happens when there is a large construction error (relative to the accuracy of depth measurement). Both failures are beyond the control of the algorithm and perhaps can be fixed using extra information such as the symmetry of features in man-made objects or the corresponding intensity data for each structure.

Acknowledgments

The authors would like to thank Professor David Suter and his team at Monash University, Australia, for providing access to their Riegl laser rangescanner system and Professor Peter K. Allen and Mr. Paul Blaer at Columbia University, USA, for providing range data of Notre-Dame Church.

References

[1] F. Blais, A review of 20 years of range sensor development, Electronic Imaging 13 (1) (2004) 231–240.

[2] G. Godin, J. Domey, M. Picard, J.-A. Beraldin, J. Taylor, L. Cournoyer, M. Rioux, S. El-Hakim, R. Baribeau, F. Blais, P. Boulanger, Active optical 3D imaging for heritage applications, IEEE Transaction on Computer Graphics and Applications 22 (5) (2002) 24–36.

[3] W. Piekarski, 3D modeling with the Tinmith mobile outdoor augmented reality system, IEEE Journal of Computer Graphics and Applications 26 (1) (2006) 14–17.

[4] G. Godin, F. Blais, L. Cournoyer, J.-A. Beraldin, J. Domey, J. Taylor, M. Rioux, S. El-Hakim, Laser range imaging in archaeology: issue and results, in: CVPR Workshop on Applications of Computer Vision to Archaeology (ACVA’03), Madison, Wisconsin, 2003.

[5] C. Früh, A. Zakhor, An automated method for large-scale ground-based city model acquisition, International Journal of Computer Vision (IJCV) 60 (2004) 5–24.

[6] H. Zhao, R. Shibasaki, Reconstructing urban 3D model using vehicle-borne laser rangescanners, 3-D Digital Imaging and Modeling (3DIM) (2001) 349–356.

[7] K. Fujii, T. Arikawa, Reconstruction of 3D urban model using range image and aerial image, in: IEEE International Symposium of Geosciences and Remote Sensing (IGARSS), vol. 4, 2001, pp. 1928–1932.

[8] P.K. Allen, I. Stamos, A. Troccoli, B. Smith, M. Leordeanu, Y.C. Hsu, 3D Modeling of historic sites using range and image data, in: IEEE International Conference on Robotics and Automation (ICRA), Taiwan, vol. 1, 2003, pp. 145–150.

[9] A. Guarnieri, F. Remondino, A. Vettore, Digital photogrammetry and TLS data fusion applied to cultural heritage 3D modelling, in: International Society for Photogrammetry and Remote Sensing (ISPRS) Symposium, 2006.

[10] S. El-Hakim, J.-A. Beraldin, M. Picard, A. Vettore, Effective 3D modelling of heritage site, in: IEEE International Conference on 3-D Digital Imaging and Modeling (3DIM), Alberta, Canada, 2003, pp. 302–309.

[11] N.-J. Shih, H.-J. Wang, C.-Y. Lin, C.-Y. Liau, 3D Scan for the digital preservation of a historical temple in Taiwan, Advances in Engineering Software 38 (July) (2007) 501–512.

[12] D.C. Baker, S.S. Hwang, J.K. Aggarwal, Detection and segmentation of man-made objects in outdoor scenes: concrete bridges, Optical Society of America 6 (6) (1989) 938–950.

[13] S. Kumar, M. Hebert, Man-made structure detection in natural images using a causal multiscale random field, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR’03), PA, USA, vol. 1, 2003, pp. 119–126.

[14] L. Lu, K. Toyama, G.D. Hager, A two level approach for scene recognition, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR’05), MD, USA, vol. 1, 2005, pp. 688–695.

[15] X. Yong, F. Dagan, Z. Rongchun, Morphology-based multifractal estimation for texture segmentation, IEEE Transactions on Image Processing 15 (3) (2006) 614–623.

[16] I. Stamos, P. Allen, Geometry and texture recovery of scenes of large scale, Computer Vision and Image Understanding (CVIU) 88 (2) (2002) 94–118.

[17] H. Cantzler, R.B. Fisher, M. Devy, Improving architectural 3D reconstruction by plane and edge constraining, in: British Machine Vision Conference (BMVC), Britain, 2002, pp. 43–52.

490 R. Hesami et al. / Computer Vision and Image Understanding 114 (2010) 475–490

[18] H. Zhao, R. Shibasaki, A vehicle-borne urban 3D acquisition system using single-row laser range scanners, Systems, Man and Cybernetics, Part B 33 (4) (2003) 658–666.

[19] F. Han, Z. Tu, S.-C. Zhu, A stochastic algorithm for 3D scene segmentation and reconstruction, in: European Conference on Computer Vision (ECCV), Copenhagen, 2002, pp. 502–516.

[20] F. Han, Z. Tu, S.-C. Zhu, Range image segmentation by an effective jump-diffusion method, IEEE Transaction on Pattern Analysis and Machine Intelligence (PAMI) 26 (9) (2004) 1138–1153.

[21] D. Anguelov, B. Taskar, V. Chatalbashev, D. Koller, D. Gupta, G. Heitz, A. Ng, Discriminative learning of Markov random fields for segmentation of 3D scan data, in: International Conference on Computer Vision and Pattern Recognition (CVPR), CA, USA, vol. 2, 2005, pp. 169–176.

[22] R. Triebel, K. Kersting, W. Burgard, Robust 3D scan point classification using associative Markov networks, in: IEEE International Conference on Robotics and Automation (ICRA), Florida, USA, 2006, pp. 2603–2608.

[23] D.F. Wolf, G.S. Sukhatme, D. Fox, W. Burgard, Autonomous terrain mapping and classification using hidden Markov models, in: ICRA’05, Spain, 2005, pp. 2026–2031.

[24] T. Asai, M. Kanbara, N. Yokoya, 3D modelling of outdoor environments by integrating omnidirectional range and color images, in: Fifth International Conference on 3-D Digital Imaging and Modeling (3DIM’05), Canada, 2005, pp. 447–454.

[25] G. Osorio, P. Boulanger, F. Prieto, An experimental comparison of a hierarchical range image segmentation algorithm, in: Proceedings of the 2nd Canadian Conference on Computer and Robot Vision, 2005, pp. 571–578.

[26] P. Boulanger, G. Osorio, F. Prieto, Hierarchical segmentation of range images with contour constraints, in: Proceedings of the Fifth International Conference on 3-D Digital Imaging and Modeling (3DIM), Ontario, Canada, 2005, pp. 278–284.

[27] G. Yu, M. Grossberg, G. Wolberg, I. Stamos, Think globally, cluster locally: a unified framework for range segmentation, in: 3DPVT, 2008.

[28] A. Bab-Hadiashar, D. Suter, Robust segmentation of visual data using ranked unbiased scale estimate, Robotica 17 (1999) 649–660.

[29] A. Bab-Hadiashar, N. Gheissari, Range image segmentation using surface selection criterion, IEEE Transaction on Image Processing 15 (7) (2006) 2006–2018.

[30] X. Yu, T. Bui, A. Krzyzak, Robust estimation for range image segmentation and reconstruction, IEEE Transaction on PAMI 16 (5) (1994) 530–538.

[31] H. Wang, D. Suter, Robust adaptive-scale parametric model estimation for computer vision, IEEE Transaction on Pattern Analysis and Machine Intelligence (PAMI) 26 (11) (2004) 1459–1474.

[32] H. Chen, P. Meer, Robust regression with projection based M-estimators, in: International Conference on Computer Vision (ICCV), Nice, France, vol. II, 2003, pp. 878–885.

[33] R. Hoseinnezhad, A. Bab-Hadiashar, A novel high breakdown M-estimator for visual data segmentation, in: International Conference on Computer Vision (ICCV), Rio de Janeiro, Brazil, 2007.

[34] J.-A. Beraldin, M. Picard, S. El-Hakim, G. Godin, L. Borgeat, F. Blais, E. Paquet, M. Rioux, V. Valzano, A. Bandiera, Virtual reconstruction of heritage sites: opportunities and challenges created by 3D technologies, in: The International Workshop on Recording, Modeling and Visualization of Cultural Heritage, Switzerland, 2005.

[35] H. Zhao, Reconstructing textured urban 3D model by fusing ground-based laser range image and CCD image, University of Tokyo, 1999.

[36] R. Hesami, A. Bab-Hadiashar, N. Gheissari, Large-object range data acquisition, fusion and segmentation, in: DICTA’05, Cairns, Australia, 2005, pp. 258–265.

[37] A. Hoover. <http://www.marathon.csee.usf.edu/range/DataBase.html>.

[38] I. Stamos, G. Yu, G. Wolberg, S. Zokai, 3D Modelling using planar segments and mesh elements, in: Third International Symposium on 3D Data Processing, Visualization & Transmission (3DPVT), University of North Carolina, Chapel Hill, 2006, pp. 599–606.

[39] M.A. Fischler, R.C. Bolles, Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography, Communications of the ACM 24 (6) (1981) 381–393.

[40] P. Meer, Robust techniques for computer vision, in: G. Medioni, S.B. Kang (Eds.), Emerging Topics in Computer Vision, Prentice Hall, 2004, pp. 107–190.

[41] B. Tordoff, D.W. Murray, Guided sampling and consensus for motion estimation, in: European Conference on Computer Vision (ECCV), 2002, pp. 82–98.

[42] C.V. Stewart, A new robust operator for computer vision: application to range data, in: CVPR’94, USA, 1994, pp. 167–173.

[43] P.J. Rousseeuw, Least median of squares regression, Journal of American Statistical Association 79 (1984) 871–880.

[44] P.V.C. Hough, Methods and means for recognising complex patterns, US Patent 3 069 654, 1962.

[45] C.V. Stewart, MINPRAN: A new robust estimator for computer vision, IEEE Transactions on Pattern Analysis and Machine Intelligence 17 (10) (1995) 925–938.

[46] J.V. Miller, C.V. Stewart, MUSE: robust surface fitting using unbiased scale estimates, in: Computer Vision and Pattern Recognition, San Francisco, 1996, pp. 300–306.

[47] K.M. Lee, P. Meer, R.H. Park, Robust adaptive segmentation of range images, IEEE Transaction on Pattern Analysis and Machine Intelligence 20 (2) (1998) 200–205.

[48] H. Wang, D. Suter, MDPE: a very robust estimator for model fitting and range image segmentation, International Journal of Computer Vision 59 (2) (2004) 139–166.

[49] G. Desouza, A. Kak, Vision for mobile robot navigation: a survey, PAMI 24 (2002) 237–267.

[50] Q. Iqbal, J. Aggarwal, Retrieval by classification of images containing large man-made objects using perceptual grouping, Pattern Recognition 35 (7) (2002) 1463–1479.

[51] R. Hoseinnezhad, A. Bab-Hadiashar, D. Suter, Finite sample bias of robust scale estimators in computer vision problems, in: International Symposium on Visual Computing (ISVC), Nevada, USA, vol. 4291, 2006, pp. 445–454.

[52] R. Hoseinnezhad, A. Bab-Hadiashar, Consistency of robust estimators in multi-structural visual data segmentation, Pattern Recognition 40 (2007) 3677–3690.

[53] P.J. Rousseeuw, A.M. Leroy, Robust Regression and Outlier Detection, Wiley, New York, 1987.

[54] X. Sun, P.L. Rosin, R.R. Martin, F.C. Langbein, Noise analysis and synthesis for 3D laser depth scanners, Graphical Models 71 (2) (2009) 34–48.

[55] C. Stewart, Bias in robust estimation caused by discontinuities and multiple structure, IEEE Transaction on Pattern Analysis and Machine Intelligence 19 (8) (1997) 818–883.

[56] http://www.mcah.columbia.edu/bourbonnais/.