
A NOVEL VISION-BASED TREE CROWN AND SHADOW DETECTION ALGORITHM USING IMAGERY FROM AN UNMANNED AIRBORNE VEHICLE

Calvin Hung, Mitch Bryson and Salah Sukkarieh
Australian Centre for Field Robotics
The Rose Street Building J04, The University of Sydney
NSW 2006, Australia
+61 2 9351 7154

    {c.hung, m.bryson, s.sukkarieh}@acfr.usyd.edu.au

Abstract

The majority of weed detection research concentrates on problems where weeds grow within an artificial plantation, in which the weeds exist between lines of crops. These problems are typically solved by learning the plantation geometry using various pattern recognition techniques; any vegetation that does not fit the geometric model is then classified as a weed. In the case where weeds are distributed over a natural landscape, the distribution of weed and non-weed vegetation is more complex and cannot be discriminated using a simple geometric model that describes the relative distribution of vegetation. Further analysis of weed colour and texture is commonly used to improve classification results; however, the shape of the weed itself is not usually explored in this field as a means of weed detection.

In this paper, we examine the use of a template-matching tree crown detection technique (used successfully in Scandinavian forestry studies) for identifying woody weeds in a natural landscape based on their shadow. The proposed algorithm is divided into two stages: the first stage segments the image using colour and texture features; the second stage applies template matching using shape information related to the projected shadow of the woody weed, relying on the time of day, sun angle and UAV pose information collected by the onboard navigation system. We present experimental results of the approach using colour vision data collected from a UAV during operations at Julia Creek in July 2009 and August 2010.

1. Introduction

It is easy for humans to identify different types of objects in a visual environment; however, robust and efficient vision-based object recognition remains a challenging problem in the fields of robotics and machine vision. Progress has been made in past decades and different algorithms have been applied in various applications. Examples include manufacturing control such as bin picking [Perkins, 1978], [Agrawal et al., 2010] and sorting [Grimson, 1991],


object detection in remote sensing [Binford, 1989], [Mundy, 1990], pedestrian detection [Gavrila, 2000] and vehicle detection [Betke et al., 2000].

This paper concentrates on remote sensing object recognition in the context of aerial surveying. Various methods have been proposed for the detection and recognition of objects from aerial images [Thompson and Mundy, 1987], [Brooks et al., 1981]; however, these techniques mostly deal with well-defined man-made objects whose visual signature can be accurately modelled. Unlike man-made objects, natural objects such as trees are less uniform and difficult to generalise with geometric models. A related problem exists in the field of forest research, where various tree counting algorithms have been proposed to detect densely populated tree species with well-defined outlines [Pollock, 1996], [Larsen and Rudemo, 1997], [Olofsson et al., 2006].

An image frame is a two-dimensional projection of the real world from the camera's point of view; this loss of one dimension of the real world is one of the limiting factors in computer vision using monocular cameras. Techniques have been developed in computer vision to retrieve the lost dimension using shading, with photometric stereo [Brooks, 1989], and using shadow, by exploiting the fact that the shadow of an object is a projection from the point of view of the light source. In this paper we study the effect of utilising this extra shadow projection to add an extra dimension to the otherwise two-dimensional image. Similar approaches have been used in object recognition and bin picking by casting shadows [Agrawal et al., 2010] and in using ray-traced templates to find individual trees in aerial photographs [Larsen and Rudemo, 1997].
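To make the projection idea concrete, a simple flat-ground relation (an illustrative approximation, not a formula stated in the paper) links object height, sun angles and shadow geometry: an object of height h, with the sun at elevation angle theta_e and azimuth theta_a, casts a shadow displaced away from the sun.

```latex
% Illustrative flat-ground approximation (an assumption for exposition):
% shadow length L and its ground-plane displacement (east, north).
L = \frac{h}{\tan\theta_e}, \qquad
(\Delta x,\; \Delta y) = -L\,(\sin\theta_a,\; \cos\theta_a)
```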

The aim of this paper is to track or map the distribution of objects in an outdoor unstructured environment. For outdoor robotic applications, object recognition has proven challenging due to the structural complexity of natural objects. To deal with this complexity, this paper introduces a new method for remote sensing object detection and recognition for aerial surveying of vegetation. The technique models tree observations using a geometric model as well as colour, texture and shadow information to detect and identify individual trees.

The proposed algorithm is divided into two stages. The first stage is a basic image segmentation using colour and texture information, involving selecting colour and texture features from the original image and training Support Vector Machines (SVMs) [Burges, 1998], [Vapnik, 2000] to distinguish between background, trees and shadows in the image. In the second stage, a target template is generated using prior information. To quantify the shape information of the target template, it is necessary to construct a geometric object model, which is then used to produce an appearance template model. Based on the navigation solution, the position and attitude of the platform and the camera are known. Combining this information with a solar position model, the ideal appearance of the object with shading and shadow can be predicted. The relative position of the target object and its corresponding shadow is treated as context information and can be used as supporting evidence for detections. The proposed algorithm utilises different levels of vision features, including low-level features such as colour and luminance, intermediate-level features such as shape and textured regions, and high-level features such as context information.


Low-level vision assigns labels to every pixel, whereas high-level vision is responsible for labelling discrete objects. Different levels of vision features are used in the proposed algorithm to extract the most from the information-rich vision data.
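As a rough outline of how the two stages described above fit together, the following is a minimal sketch only; the three callables (segment_colour_texture, build_appearance_template, correlate) are hypothetical stand-ins for the stages in the text, not functions from the paper.

```python
def detect_trees(image, platform_pose, timestamp,
                 segment_colour_texture, build_appearance_template, correlate):
    """Illustrative two-stage pipeline: pixel-level segmentation followed by
    template matching against a pose- and sun-aware appearance template."""
    # Stage 1: label every pixel as background, tree or shadow using the
    # colour (HSV) and texture (Gabor) features and a trained SVM.
    class_map = segment_colour_texture(image)

    # Stage 2: predict how a tree and its shadow should appear in this frame
    # from the platform pose and the sun position at this time.
    template = build_appearance_template(platform_pose, timestamp)

    # Correlate the labelled image with the template; the resulting map gives
    # the likelihood of a target object at each image coordinate.
    return correlate(class_map, template)
```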

    2. Related Work

2.1 Object Recognition

The aim of object recognition is to identify predefined targets within an image. Object recognition using geometric models has gone through significant developments over the past four decades. Approaches range from traditional object geometric models to more recent developments in statistical learning methods [Mundy, 2006].

Three-dimensional object recognition using alignment is one of the first model-based object recognition algorithms; the algorithm projects a pre-generated model onto the image and then checks for the expected features [Huttenlocher and Ullman, 1987], [Huttenlocher, 1988]. This approach is chosen and extended in this paper: instead of having to model the object from multiple possible viewpoints, an exact object template can be generated using a synthetic geometric appearance model together with knowledge of the platform navigation solution and a sun path model. This extension makes the algorithm viewpoint invariant, greatly reduces the size of the search space and allows the recognition and detection algorithm to run more efficiently.

2.2 Tree Crown Detection

A few tree detection algorithms have been proposed in forest research over the past few years to detect natural vegetation in outdoor, unstructured aerial images. The proposed object detection algorithm is designed to tackle a problem with similar settings; therefore, similar tree crown detection algorithms are studied in this literature review.

Three methods are most commonly used in tree crown detection. The first is a valley-finding algorithm [Gougeon, 1995], in which local maxima of the image are treated as tree crowns and local minima are treated as the edges of the crowns. The second is a region-growing method [Erikson, 2003]: random seed points are generated in the image, the colour and texture attributes of the surrounding pixels are calculated, and similar pixels are grouped into the same region; the regions grow until their boundaries collide. The third is a template matching method [Pollock, 1996], in which a tree crown model is generated and matched against the grey-scale image. A summary and performance comparison can be found in [Erikson, 2005].

The template matching method is chosen in this study because, unlike the other two approaches, it uses both the low-level vision feature of luminance and the intermediate-level feature of shape.


Tree recognition in aerial images of forests based on a synthetic tree crown image model was first proposed by [Pollock, 1996]; this algorithm was then expanded to include shadow [Larsen and Rudemo, 1997], and further improvements and implementations to discriminate tree species were shown in [Olofsson et al., 2006].

3. Algorithm

Object recognition is the identification of certain target objects inside an image frame and is part of the more general pattern recognition problem. Pattern recognition can be performed using either a priori knowledge or statistical information learned from the data. In this object recognition application both approaches are used. Patterns such as colour and texture, which are not obvious to the human eye and are difficult to model directly, are learned statistically from the data set, whereas patterns such as the shape, scale and orientation of the target object can be modelled directly using a priori knowledge.

The object recognition algorithm is divided into two modules: an image segmentation module that takes the colour and texture information into account, and a model-object matching module that takes the shape, scale and context information into account. The two-stage approach breaks the otherwise difficult-to-solve vision problem down into manageable sub-problems. It also allows each module to be evaluated separately, so that future improvements can be made independently, and allows the algorithm to be generalised to other applications with a similar structure by re-learning the statistical models and using other prior knowledge.

3.1 Image Segmentation

The algorithm used to perform the first-stage image segmentation is discussed in this section. The image segmentation stage divides the original image into three classes based on colour and texture features. The aim is to change the representation of the image from arbitrary colours into meaningful labels which can be analysed by later stages. The three classes are the target object class, the shadow class and the background class.

The images were collected using a 3-CCD camera. To extract texture information, the MPEG-7 texture descriptors are used [Jain and Farrokhnia, 1991], [Manjunath and Ma, 1996]. The texture descriptors consist of 30 individual channels, a combination of six orientations and five octaves in the radial direction. Each channel is a 2D Gabor function, from which mean and variance values can be obtained. The images are captured in the Red Green Blue (RGB) colour space; to extract colour features, the images are transformed to the Hue, Saturation and Value (HSV) colour space to reduce sensitivity to changes in light intensity.
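A minimal sketch of per-pixel colour and texture feature extraction in this spirit is given below, using OpenCV Gabor kernels as a stand-in for the MPEG-7 homogeneous texture descriptor. The kernel parameters are illustrative guesses, and the sketch keeps raw per-pixel filter responses rather than the per-channel mean and variance statistics described above.

```python
import cv2
import numpy as np

def extract_features(bgr_image):
    """Build a per-pixel feature image: 3 HSV colour channels plus Gabor
    responses over 6 orientations x 5 octaves (30 texture channels)."""
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV).astype(np.float32) / 255.0
    grey = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY).astype(np.float32) / 255.0

    channels = [hsv[:, :, 0], hsv[:, :, 1], hsv[:, :, 2]]
    for octave in range(5):                        # 5 radial octaves
        wavelength = 4.0 * (2 ** octave)           # assumed spacing
        for k in range(6):                         # 6 orientations
            theta = k * np.pi / 6.0
            kernel = cv2.getGaborKernel(ksize=(31, 31), sigma=0.5 * wavelength,
                                        theta=theta, lambd=wavelength,
                                        gamma=0.5, psi=0.0)
            channels.append(cv2.filter2D(grey, cv2.CV_32F, kernel))

    # Stack into an H x W x 33 feature image (3 colour + 30 texture channels).
    return np.dstack(channels)
```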


After feature extraction, the colour and texture features are grouped into a single feature vector consisting of three colour channels and thirty texture channels. Each feature vector is then assigned a label representing its class. The aim is to segment the original image into three classes: object, shadow and background. An SVM is used as the classifier to predict the labels of the feature vectors.

Figure 1: Image Segmentation: The classifier converts the original colour images into images with meaningful class labels.
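A minimal sketch of how this SVM segmentation stage could be implemented with scikit-learn follows. The kernel choice and hyperparameters are assumptions, as the paper does not specify them, and the labelled training vectors are assumed to come from hand-labelled images passed through a feature extractor such as the one sketched above.

```python
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def train_segmenter(feature_vectors, labels):
    """Fit an SVM on per-pixel colour/texture features.
    Assumed label encoding: 0 = background, 1 = tree, 2 = shadow."""
    clf = make_pipeline(StandardScaler(),
                        SVC(kernel="rbf", C=1.0, gamma="scale"))
    clf.fit(feature_vectors, labels)
    return clf

def segment_image(clf, feature_image):
    """Predict a class label for every pixel of an H x W x D feature image."""
    h, w, d = feature_image.shape
    flat = feature_image.reshape(-1, d)
    return clf.predict(flat).reshape(h, w)
```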

3.2 Object Recognition Using Shape

Object detection algorithms based purely on statistical information have their performance limited by the quality of the training data. Additionally, prior knowledge such as the shape and position of the shadow given the incident light direction is too subtle to be incorporated into the statistical learning process. This necessitates separating the image segmentation, based on statistical learning, from the object appearance model, based on prior knowledge.

The algorithm used to generate the target object appearance model is discussed in this section. The platform navigation data and the target object outline are used as prior knowledge: a geometric model is constructed from the target object outline; the platform position and time are used to estimate the direction of the incident sunlight, which is then used to predict the direction and outline of the shadow; finally, the platform pose is used to estimate the appearance of the object and shadow in each observed image frame. The object appearance model encapsulates both the object shape and context information.
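A minimal sketch of the sun-geometry part of this step is shown below, assuming flat terrain and using the third-party pysolar package as one possible solar position model; the paper does not name the solar model it uses, and the coordinates and tree height in the example are illustrative only. The resulting ground-plane offset would still need to be projected into image coordinates using the camera pose, which is outside the scope of the sketch.

```python
import math
from datetime import datetime, timezone
from pysolar.solar import get_altitude, get_azimuth  # one possible solar model

def ground_shadow_offset(tree_height_m, lat_deg, lon_deg, when_utc):
    """Predict where the tip of a tree's shadow falls relative to its base,
    assuming flat terrain (an illustrative simplification)."""
    elevation = math.radians(get_altitude(lat_deg, lon_deg, when_utc))
    azimuth = math.radians(get_azimuth(lat_deg, lon_deg, when_utc))
    if elevation <= 0:
        raise ValueError("sun is below the horizon")

    # Flat-ground shadow length, displaced away from the sun.
    # (Recent pysolar versions measure azimuth clockwise from north.)
    length = tree_height_m / math.tan(elevation)
    d_east = -length * math.sin(azimuth)
    d_north = -length * math.cos(azimuth)
    return d_east, d_north

# Example with illustrative values: a 5 m tree near Julia Creek (approximate
# coordinates), mid-morning local time expressed in UTC.
print(ground_shadow_offset(5.0, -20.66, 141.75,
                           datetime(2010, 8, 15, 0, 30, tzinfo=timezone.utc)))
```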

In this algorithm simple geometric models are used to approximate the outlines of the target objects. In contrast to industrial applications, where more complicated CAD models are used, it is not possible to generalise and model natural objects such as trees with exact models.


Figure 2: Algorithm Summary: During each observation both vision and pose information are obtained. The vision information is processed using statistical learning and simplified into three classes. The pose information and the solar model are used to generate the object appearance model, which is then compressed into a prior template. The outputs of both approaches are combined using correlation template matching; a correlation map is produced at the end of the algorithm, indicating the likelihood of the target object at each image coordinate.
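A minimal sketch of this correlation step, under stated assumptions, is given below. The class-label encoding and the template layout are guesses rather than details from the paper: the tree and shadow templates are assumed to be crops of the same appearance template (tree pixels only and shadow pixels only), so the shadow offset is already encoded and the two score maps align.

```python
import cv2
import numpy as np

def correlation_map(class_map, tree_template, shadow_template):
    """Combine weak tree and shadow detections into a single likelihood map.
    Assumed labels: 0 = background, 1 = tree, 2 = shadow."""
    tree_plane = (class_map == 1).astype(np.float32)
    shadow_plane = (class_map == 2).astype(np.float32)

    # Weak detectors: correlate each binary plane with its template part.
    tree_score = cv2.matchTemplate(tree_plane,
                                   tree_template.astype(np.float32),
                                   cv2.TM_CCORR)
    shadow_score = cv2.matchTemplate(shadow_plane,
                                     shadow_template.astype(np.float32),
                                     cv2.TM_CCORR)

    # Strong detector: a tree response is only kept where it is supported by a
    # shadow response at the predicted relative position (context information).
    return tree_score * shadow_score
```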

4. Results and Discussion

The results generated using the proposed algorithm are presented in this section, and the performance of the algorithm is discussed.

Aerial image data collected from Queensland, Australia was used to evaluate the proposed algorithm. The flight area was at Julia Creek, in a rural area. The survey area was flat, with vegetation distributed sparsely and consisting mainly of isolated trees. The robotic platform used to collect the data was a one-third scale J3 Cub aircraft with a sensor payload consisting of a 3-CCD camera with a 200 m by 140 m field of view. The navigation data were collected using GPS and inertial sensors.


Figure 3: Scaled-down J3 Cub: This platform was used to collect aerial image data over Julia Creek, Queensland, Australia. The mission area is flat with sparsely distributed trees; the data set is ideal for testing the object detection algorithm.

The detection algorithm utilises the colour and texture information from the segmentation stage, the shape information from the weak object and shadow detectors, and the context information from the strong detector.

In this data set the target objects were trees, which varied slightly in size depending on environmental conditions. Three template sizes were defined according to prior knowledge of the typical size of the trees; these templates were created to capture most targets within the size range. The multi-scale correlation map is shown in Figure 4, where the responses to the large, medium and small templates are colour coded red, green and blue respectively.

To evaluate the overall performance of the algorithm, the correlation map was thresholded to produce regions of interest; the centroid of each region was calculated, and a correction vector was applied to compensate for the offset between the centre of the template and the corresponding part of the geometric model. This process is shown in Figure 4.


Figure 4: Detection Result: To evaluate the detection rate it is necessary to identify the location of each detected object. Top left: the original image; top right: the correlation map; bottom left: the thresholded correlation map with regions of interest; bottom right: the original image overlaid with the centroids of the corresponding regions of interest.
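A minimal sketch of this thresholding and centroid-correction step follows. The threshold value and the correction vector are left as inputs, since the paper does not report the specific values used.

```python
import cv2
import numpy as np

def detections_from_correlation(corr_map, threshold, correction_vector):
    """Threshold the correlation map, extract regions of interest and return
    one detection per region: the region centroid shifted by a correction
    vector accounting for the offset between the template centre and the
    tree crown within the template."""
    mask = (corr_map >= threshold).astype(np.uint8)
    num, labels, stats, centroids = cv2.connectedComponentsWithStats(mask)

    detections = []
    for i in range(1, num):                       # label 0 is the background
        cx, cy = centroids[i]
        detections.append((cx + correction_vector[0],
                           cy + correction_vector[1]))
    return detections
```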

The detection results from individual image frames were then transformed into global coordinates using an onboard mapping system [Bryson et al., 2010] to generate the global distribution of the vegetation. The result is shown in Figure 5. The overall sensitivity and specificity of the algorithm are both approximately 80%.


Figure 5: Mapping over part of the mission area. White dots represent the detected crowns.
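For reference, the reported figures correspond to the standard definitions below. The counts in the example are hypothetical, chosen only to show how an 80%/80% result arises; they are not the paper's data.

```python
def sensitivity_specificity(tp, fp, tn, fn):
    """Sensitivity = recall on true trees; specificity = recall on non-trees."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity

# Hypothetical counts: 80 detected trees, 20 missed trees,
# 80 correctly rejected regions, 20 false alarms -> (0.8, 0.8).
print(sensitivity_specificity(tp=80, fp=20, tn=80, fn=20))
```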

    5. Conclusion and Future Work

This paper presented a novel object detection algorithm using aerial photographs taken with a monocular camera. The algorithm utilised two aspects of pattern recognition: a statistical model learnt from the data, and prior knowledge derived from an understanding of the problem. The algorithm learnt colour and texture features of the target object and matched the labelled image to an object appearance model created using prior knowledge. Two weak detection maps were generated separately for the target object and the corresponding shadow, using the shape information. A strong detection map was then generated by combining the object and shadow detection maps, encapsulating the context information. The two-stage approach broke the otherwise difficult vision problem down into manageable sub-problems, allowing each stage to be evaluated and optimised separately; the algorithm achieved a final sensitivity of around 80% and a specificity also of around 80%. In this particular application, further vegetation classification could be performed with the global vegetation distribution map, by extracting features such as colour, texture, shape and size around each region of interest and training a separate classifier to differentiate species of vegetation.

There are a few future improvements that could be made. Firstly, the algorithm becomes confused when target objects are very close together; because the outline is an approximation rather than an exact model, it is not able to distinguish whether a detection is one large object or multiple smaller objects.


This ambiguity problem is intrinsic to the detection algorithm because performance is restricted by the resolution of the data. It could potentially be resolved by using an active sensing strategy in which the most ambiguous regions are selected for further surveying at a different resolution. The monocular image detection algorithm could also be combined with a stereo vision algorithm to provide extra depth information, although the depth estimates could be of poor quality due to the small baseline; depth could then be used as an extra input to the statistical learning algorithm. Finally, the algorithm could be extended to other object recognition problems in other aerial imaging scenarios, such as power-line surveying, traffic monitoring and wildlife population monitoring.

References

A. Agrawal, Y. Sun, J. Barnwell, and R. Raskar. Vision-guided robot system for picking objects by casting shadows. The International Journal of Robotics Research, 2010.

M. Betke, E. Haritaoglu, and L.S. Davis. Real-time multiple vehicle detection and tracking from a moving vehicle. Machine Vision and Applications, 12(2):69-83, 2000.

T.O. Binford. Spatial understanding: the successor system. In Proceedings of a Workshop on Image Understanding, page 20. Morgan Kaufmann Publishers Inc., 1989.

M.J. Brooks and B.K.P. Horn. Shape and source from shading. Shape from Shading, 1989.

R.A. Brooks. Symbolic reasoning among 3-D objects and 2-D models. Artificial Intelligence, 17:285-348, 1981.

M. Bryson, A. Reid, F. Ramos and S. Sukkarieh. Airborne vision-based mapping and classification of large farmland environments. Journal of Field Robotics, 27(5):632-655, 2010.

C.J.C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2):121-167, 1998.

M. Erikson. Segmentation of individual tree crowns in colour aerial photographs using region growing supported by fuzzy rules. Canadian Journal of Forest Research, 33(8):1557-1563, 2003.

M. Erikson and K. Olofsson. Comparison of three individual tree crown detection methods. Machine Vision and Applications, 16(4):258-265, 2005.

D. Gavrila. Pedestrian detection from a moving vehicle. In Computer Vision - ECCV 2000, pages 37-49, 2000.

F.A. Gougeon. A crown-following approach to the automatic delineation of individual tree crowns in high spatial resolution aerial images. Canadian Journal of Remote Sensing, 21(3):274-284, 1995.


W.E.L. Grimson. Object recognition by computer: the role of geometric constraints. 1991.

D.P. Huttenlocher. Three-dimensional recognition of solid objects from a two-dimensional image. AITR-1045, 1988.

D.P. Huttenlocher and S. Ullman. Object recognition using alignment. In Proceedings of the 1st International Conference on Computer Vision, pages 102-111, 1987.

A.K. Jain and F. Farrokhnia. Unsupervised texture segmentation using Gabor filters. Pattern Recognition, 24(12):1167-1186, 1991.

M. Larsen and M. Rudemo. Using ray-traced templates to find individual trees in aerial photographs. In Proceedings of the Scandinavian Conference on Image Analysis, volume 2, pages 1007-1014, 1997.

B.S. Manjunath and W.Y. Ma. Texture features for browsing and retrieval of image data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(8):837-842, August 1996.

J. Mundy. Object recognition in the geometric era: a retrospective. In Toward Category-Level Object Recognition, pages 3-28, 2006.

J.L. Mundy and A.J. Heller. The evolution and testing of a model-based object recognition system. In Proceedings of the Third International Conference on Computer Vision, pages 268-282, 1990.

K. Olofsson, J. Wallerman, J. Holmgren, and H. Olsson. Tree species discrimination using Z/I DMC imagery and template matching of single trees. Scandinavian Journal of Forest Research, 21:106-110, 2006.

W.A. Perkins. A model-based vision system for industrial parts. IEEE Transactions on Computers, pages 126-143, 1978.

R.J. Pollock. The automatic recognition of individual trees in aerial images of forests based on a synthetic tree crown image model. The University of British Columbia, Canada, 1996.

D. Thompson and J. Mundy. Three-dimensional model matching from an unconstrained viewpoint. In Proceedings of the 1987 IEEE International Conference on Robotics and Automation, volume 4, 1987.

V.N. Vapnik. The Nature of Statistical Learning Theory. Springer Verlag, 2000.