
Page 1: [IEEE 2013 IEEE 11th International Conference on Industrial Informatics (INDIN) - Bochum, Germany (2013.07.29-2013.07.31)] 2013 11th IEEE International Conference on Industrial Informatics

Evaluation of SIFT in machine vision applied to industrial automation

Gunter N. Loch, Charbel Szymanski, Marcelo R. Stemmer
Department of Automation and Systems

Federal University of Santa Catarina
Florianópolis, Brazil

{gloch,charbel,marcelo}@das.ufsc.br

Abstract—This paper presents an evaluation of the SIFT (Scale-Invariant Feature Transform) algorithm against two scenarios related to industrial automation. The first one is related to automated inspection systems. The second scenario describes vision-based robot and vehicle navigation. The evaluation was performed with the aim of verifying whether SIFT complies with the requirements of both scenarios, and of identifying how it does so.

I. INTRODUCTION

Machine vision is widely used in industry, mainly for process automation [1] [2]. Its use is spreading due to the ease of integration with other solutions, especially after the popularization of the PC in the industrial environment. The PC has increasingly been used to implement machine vision systems, enabling cost reductions [3] [1]. Another factor that contributes to cost reduction is that cameras are becoming cheaper. In some cases, solutions can be implemented with simple or low-resolution cameras.

One important challenge to the use of machine vision in industry is dealing with the variation of parameters such as image rotation and scale, and the illumination of the scene. This makes it hard to build a robust system based on image processing. As an important contribution to the machine vision area, and consequently helping to address this challenge, Lowe proposed the Scale-Invariant Feature Transform (SIFT) [4]. The next section briefly describes this algorithm.

II. THE SIFT ALGORITHM

Initially, the major goal of this technique was image matching. Later, several researchers used Lowe's proposal in other applications such as stereo correspondence, motion tracking, and object recognition. The algorithm has several parts that can be grouped into two major stages:

• Detection: Scale-Space Construction and Keypoint Localization.

• Description: Orientation Assignment and Keypoint Description.

Initially, keypoints are detected by searching for interest points over a scale-space representation of the original image. This representation is built using a pyramid of images with increasing scale and repeated Gaussian smoothing. Keypoint localization begins by finding the local maxima and minima over the differences between the Gaussian-smoothed images of the pyramid. These points are then further filtered using subpixel approximation (a second-order Taylor expansion) and the elimination of low-contrast points and edge responses.
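The detection stage above can be sketched in a few lines. The following Python fragment is an illustrative sketch, not the original C++/OpenCV implementation: it builds a 1D Gaussian scale-space, takes differences of adjacent levels (DoG), and flags local extrema over the 3x3 (position x scale) neighborhood. The signal and sigma progression are our own toy choices.

```python
import math

def gaussian_kernel(sigma):
    # Discrete 1D Gaussian, truncated at 3*sigma and normalized to sum to 1.
    radius = max(1, int(3 * sigma))
    k = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-radius, radius + 1)]
    s = sum(k)
    return [v / s for v in k]

def smooth(signal, sigma):
    # Convolution with replicated (clamped) borders.
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    n = len(signal)
    return [sum(w * signal[min(max(i + j - r, 0), n - 1)]
                for j, w in enumerate(k))
            for i in range(n)]

def dog_extrema(signal, sigmas):
    # Difference-of-Gaussians between adjacent scale levels, then a
    # local extremum test over the 3x3 (position x scale) neighborhood.
    levels = [smooth(signal, s) for s in sigmas]
    dog = [[b - a for a, b in zip(levels[i], levels[i + 1])]
           for i in range(len(levels) - 1)]
    keypoints = []
    for s in range(1, len(dog) - 1):
        for x in range(1, len(signal) - 1):
            v = dog[s][x]
            neigh = [dog[s + ds][x + dx]
                     for ds in (-1, 0, 1) for dx in (-1, 0, 1)
                     if (ds, dx) != (0, 0)]
            if v > max(neigh) or v < min(neigh):
                keypoints.append((x, s))
    return keypoints

# A signal containing a single blob centered at index 10.
signal = [math.exp(-((i - 10) ** 2) / 4.0) for i in range(21)]
kps = dog_extrema(signal, [1.0, 1.6, 2.56, 4.1])  # detects an extremum at x = 10
```

A real implementation adds octaves (downsampling), the subpixel refinement, and the contrast and edge-response filters mentioned above.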

In the description stage, SIFT calculates the principal orientations of each keypoint in order to guarantee its orientation invariance. Note that a keypoint can have more than one orientation and thus more than one descriptor. Finally, a descriptor is created using a 2D array of histograms over a window centered on the keypoint location in the image. The window size depends on the scale at which the keypoint was found.
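The orientation assignment can be sketched as a gradient-orientation histogram over the keypoint's neighborhood; any histogram peak close to the maximum yields an extra orientation, and thus an extra descriptor. This is a simplified Python illustration: the 36 bins and the 80% peak ratio follow Lowe's paper, while the patch handling is our own simplification (no Gaussian weighting or peak interpolation).

```python
import math

def dominant_orientations(patch, n_bins=36, peak_ratio=0.8):
    # Histogram of gradient orientations over a square patch (list of
    # rows), weighted by gradient magnitude; every bin whose count is
    # within peak_ratio of the maximum yields a keypoint orientation.
    hist = [0.0] * n_bins
    n = len(patch)
    for y in range(1, n - 1):
        for x in range(1, n - 1):
            dx = patch[y][x + 1] - patch[y][x - 1]
            dy = patch[y + 1][x] - patch[y - 1][x]
            ang = math.atan2(dy, dx) % (2 * math.pi)
            hist[int(ang / (2 * math.pi) * n_bins) % n_bins] += math.hypot(dx, dy)
    peak = max(hist)
    return [i * 360 // n_bins
            for i, v in enumerate(hist) if v > 0 and v >= peak_ratio * peak]

# A horizontal intensity ramp: every gradient points along +x (0 degrees).
ramp = [[float(x) for x in range(8)] for _ in range(8)]
```

With `ramp` as input, the single dominant orientation is 0 degrees; a vertical ramp would instead yield 90 degrees.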

III. RELATED WORKS

For some industrial applications, the position of an object is not important; for example, in work-piece inspection. In these cases, the defects are related to the object's features and not to its location. Bohlool and Taghanaki [1] developed an example of such a system, which performs product inspection and uses SIFT for image registration only. The defect detection was based on mathematical morphology and an image correlation algorithm. The defects were generically classified into three types: a) missed: pixels that exist in the original image but not in the sample image; b) extra: pixels that do not exist in the original image but exist in the sample image; and c) surface: defects that are neither missed nor extra. In their experiments, although SIFT's performance was low, it allowed the image registration to be accomplished with enough accuracy for the defect detection.
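The missed/extra classification used in [1] can be illustrated with a minimal sketch. It assumes the reference and sample images are already registered and binarized; the actual pipeline in [1] uses mathematical morphology and correlation, which we omit here.

```python
def classify_defects(reference, sample):
    # Pixel-wise comparison of two registered binary images (lists of
    # rows of 0/1): "missed" pixels exist only in the reference,
    # "extra" pixels exist only in the sample.
    missed, extra = [], []
    for y, (r_row, s_row) in enumerate(zip(reference, sample)):
        for x, (r, s) in enumerate(zip(r_row, s_row)):
            if r and not s:
                missed.append((x, y))
            elif s and not r:
                extra.append((x, y))
    return missed, extra

reference = [[1, 1, 0],
             [0, 1, 0]]
sample    = [[1, 0, 0],
             [0, 1, 1]]
missed, extra = classify_defects(reference, sample)  # missed=[(1, 0)], extra=[(2, 1)]
```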

In the work of Lin and Setiawan [5], the object's position is relevant. They used SIFT, an affine transformation matrix, and support vector machines (SVM) to recognize the orientation of objects under rigid constraints. Their solution was applied to a robot arm, in order to build an industrial manipulator robot. The accuracy of their solution for detecting the objects' orientation was 90% on average.

Recently, the use of robots in tasks that demand greater cognitive capabilities has led researchers to study methods to represent and identify objects and other high-level features. One of the first uses of the SIFT algorithm in robot navigation was done by [6]. They proposed a vision-based SLAM (Simultaneous Localization and Mapping) algorithm that tracks SIFT features. In order to build a 3D map, they used a trinocular stereo system to obtain the world positions of SIFT landmarks. These landmarks are defined by matching the keypoint descriptors of each camera, considering the epipolar constraints. This work demonstrated

978-1-4799-0752-6/13/$31.00 ©2013 IEEE


egomotion estimation through feature matching across frames, and the tracking of SIFT landmarks in a database.

There has also been some effort to evaluate these techniques with regard to the challenges of mobile robots and autonomous vehicles. As robots usually use low-resolution cameras, [7] tested SIFT under that condition. They used 104x67-pixel images obtained from an AIBO robot to evaluate Lowe's implementation and two other feature representation alternatives. They observed that SIFT is more robust to view changes, but not as good under blur and illumination changes.

Focusing on visual SLAM, [8] presented a performance evaluation framework for visual feature extraction and matching. They evaluated SIFT and two other feature extractors using a single camera for bearing-only SLAM. They pointed out that all the extractors can be tuned to perform well, but each one has advantages and drawbacks that make them suitable for different situations.

IV. SCENARIOS

For the evaluation of SIFT, the present work considers two scenarios with their respective features. Both scenarios are related to industrial automation and will be described in five categories: illumination conditions, dynamic of workspace, dynamic of scene, measurement accuracy, and time requirement. The dynamic of workspace in scenario one is defined as the variation of the background and object textures over the area that the vision system must consider.

In the case of the second scenario, the dynamic of workspace is related to the variation of the objects in the environment (positions and quantities) and the possible modifications in the world map. The dynamic of scene, in both scenarios, considers the variation of object position and orientation relative to the viewpoint of the camera(s). The other parameters listed above are considered self-explanatory.

A. Scenario One

This first scenario is related to shop floor automation and includes, for example, machine vision systems embedded into inspection machines, inspection systems used on conveyors, and vision systems for robot arms. The following list presents how common computer vision issues affect this type of application:

1) Illumination conditions
The illumination is controlled in most cases. As the workspace is static, even when there is influence from external illumination (in the case of software embedded into an inspection machine), it is small or constant.

2) Measurement accuracy
Systems in this scenario need high-precision measurement. Considering that this scenario comprises industrial inspection systems, machine vision may be seen as a subarea of optical metrology. Consequently, the system is expected to be flexible and accurate, according to the features inherent to this area [9]. Depending on the application, one millimeter may represent a defect, as in the inspection of mounted PCBs (printed circuit boards), in which one millimeter is a common size for an SMD (surface-mount device) pad.

3) Dynamic of workspace
The workspace is static and can be adjusted to improve the efficiency of the system. The improvements can be changes in the illumination conditions and the background.

4) Dynamic of scene
The dynamic of the scene is limited and the position of the objects is partly known. In other words, the variation of the objects' position is small and the position where the objects should be is known [3].

5) Time requirement
In mass-production industry, the inspection time is important in order not to slow down the production line. The processing time must be relatively short. In some cases, one or two seconds can be a long processing time. In general, the faster the better.

B. Scenario Two

There are several tasks that involve autonomous robots moving within an indoor environment. In industrial applications, they are generally associated with moving items from one place to another. In some cases, it is desirable or necessary to do that without modifying the environment. Visual guidance is a solution that can be used in such conditions. In order to solve this problem efficiently, a candidate solution must meet the following characteristics of this scenario:

1) Illumination conditions
Although the illumination of indoor environments is controllable to a certain degree, its variation is significant. For instance, shadows may arise from the movement of the robot through the environment. So, it is desirable to use a feature descriptor that is insensitive to potential changes in illumination.

2) Measurement accuracy
In order to navigate correctly, two pieces of information are required: the distance to landmarks and the current position in the environment. When using visual methods to estimate these values, the accuracy of the estimation is inversely proportional to the distance to the observed object.

3) Dynamic of workspace
On the shop floor, the dynamic of the vehicle workspace depends on the production layout. Some layouts are very rigid, so the positions of objects do not change over time. Other layouts are more flexible, so it is not possible to have an a priori map of the workspace. Usually the objects that the robot may encounter are known, so it is possible to build a database of images/features of these objects.

4) Dynamic of scene
As the robot moves through the environment, its viewpoint varies a lot. Even small movements can cause a significant change in the appearance of objects. For instance, a door observed from a distance of 2 meters presents a different set of features relative to an observation made from a distance of 30 centimeters. Also, common problems like occlusion and background clutter represent a challenge to a correct scene interpretation.


5) Time requirement
When used as the sole method of localization, it is imperative that the algorithm runs in a reasonable time relative to the speed of the vehicle.
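The inverse relation between distance and accuracy noted in the measurement-accuracy item above can be made concrete with the standard pinhole stereo relations. The sketch below is hypothetical: the 6 cm baseline is typical for small robot rigs, while the focal length and disparity error are assumed values.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    # Pinhole stereo relation: Z = f * B / d.
    return focal_px * baseline_m / disparity_px

def depth_error(focal_px, baseline_m, depth_m, disparity_err_px=0.5):
    # First-order error propagation of the relation above:
    # dZ = Z**2 / (f * B) * dd -- the depth error grows with the
    # square of the distance, hence accuracy falls off with range.
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_err_px

# Assumed rig: 6 cm baseline, 700 px focal length, +/-0.5 px disparity noise.
z = depth_from_disparity(700.0, 0.06, 21.0)  # 2.0 m
err_near = depth_error(700.0, 0.06, 2.0)
err_far = depth_error(700.0, 0.06, 4.0)      # twice the range, four times the error
```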

V. EVALUATION OF SIFT AGAINST THE SCENARIOS

Considering the features of the two scenarios presented above, an evaluation was made based on an analysis of experiments and of SIFT's features. Initially, each part of the SIFT algorithm will be analyzed in order to identify how it is useful to each scenario. Then, an evaluation of the algorithm as a whole will be made for each scenario.

Table I shows the results in a qualitative manner. A GOOD evaluation means that SIFT solves the problems of such a category almost alone. A REGULAR evaluation means that, besides using SIFT, further processing or other data are required to solve the challenges of such a category.

A. Evaluation of SIFT’s parts

1) Scale-Space Construction: Ideally there is no variation in scale in the first scenario but, in real implementations, uncertainties can generate it. Therefore, it is important that the algorithm is also invariant to scale, if only to guarantee the system's robustness. On the other hand, the dynamic of the scene in scenario two makes this feature of SIFT indispensable.

As explained in section IV-B, the appearance of an object can change as a robot moves through an environment. When there is enough image resolution, the scale-space representation allows the matching of frames with a considerable shift towards the observed object. However, as autonomous vehicles often have low-resolution cameras, the features of an object may differ substantially when obtained from different distances.

Also, the use of the Difference of Gaussians (DoG) to construct the scale-space helps with the time requirement. As shown by [10], the use of DoG leads to a great improvement in processing time, with a slightly worse matching accuracy compared to the Laplacian operator.

2) Keypoint Localization: The location of the keypoints in the scene is the most important information for defining an object's position. This, in turn, is one of the most used geometrical characteristics in optical metrology [11], and the only one covered by SIFT. The accuracy of the measurement estimation is strongly related to the number of keypoints: the larger the number of keypoints, the better the position estimation.
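The relation between the number of keypoints and the position estimate can be illustrated with a toy simulation: estimating an object's translation as the average displacement over matched keypoint pairs, where per-keypoint localization noise averages out as the number of matches grows (standard error ~ noise / sqrt(N)). The data below is synthetic, not from the experiments in this paper.

```python
import random

def estimate_translation(matches):
    # Average displacement over matched keypoint pairs (x1, y1, x2, y2).
    n = len(matches)
    tx = sum(x2 - x1 for x1, _, x2, _ in matches) / n
    ty = sum(y2 - y1 for _, y1, _, y2 in matches) / n
    return tx, ty

random.seed(0)
TRUE_TX, TRUE_TY = 5.0, -3.0

def noisy_match(noise=1.0):
    # A synthetic keypoint pair: the true translation plus localization noise.
    x1, y1 = random.uniform(0, 100), random.uniform(0, 100)
    return (x1, y1,
            x1 + TRUE_TX + random.gauss(0, noise),
            y1 + TRUE_TY + random.gauss(0, noise))

few = estimate_translation([noisy_match() for _ in range(4)])
many = estimate_translation([noisy_match() for _ in range(400)])
# With 400 matches the standard error of the estimate is ten times
# smaller than with 4 matches.
```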

Besides object positions, autonomous vehicles also use the keypoint locations to determine their own localization, both in world coordinates. Since SIFT provides the locations of keypoints in image coordinates, it is necessary to make further calculations to estimate the position of obstacles or the movement of the robot. For instance, a stereoscopic system can provide the depth information necessary to convert image coordinates to 3D. When using a single camera, it is possible to use other sensors (e.g. an IMU - Inertial Measurement Unit) to derive an initial estimate of the movement between two frames. A further improvement in the movement estimation can then be accomplished using the disparity between matched keypoints [12].

In order to get a better localization, a subpixel approximation is computed based on a Taylor expansion. This is especially important when using low-resolution cameras, as it provides a greater measurement resolution. Thus, by using this information, an inspection system is able to determine whether an object is absent or in a wrong position. However, as SIFT is based on appearance, it is not suitable for detecting shape problems. Because of that, it is not possible to guarantee a distribution of keypoints that covers the whole object. As SIFT only generates keypoints on distinctive parts of the image, defects in an area of an object that has no keypoints will not be detected.

Since it is not known where the keypoints will be generated, many defects caused by deformations may not be detected, because some deformations do not produce the significant changes in image appearance required to generate keypoints, or the keypoints are not generated at a place that represents the deformation well (for example, at the center of a deformation).
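The subpixel refinement mentioned above amounts to fitting a parabola (the second-order Taylor expansion) through an extremum and its neighbors; in 1D the offset of the fitted extremum has a closed form. A minimal sketch:

```python
def subpixel_offset(f_minus, f_center, f_plus):
    # Fit a parabola (second-order Taylor expansion) through three
    # neighboring samples; return the offset of its extremum from the
    # center sample. For a true local extremum it lies in [-0.5, 0.5].
    denom = f_plus - 2.0 * f_center + f_minus
    if denom == 0.0:
        return 0.0
    return -0.5 * (f_plus - f_minus) / denom

# Samples of f(x) = -(x - 0.3)**2 at x = -1, 0, 1: the refined
# extremum lands exactly at the true offset 0.3.
offset = subpixel_offset(-1.69, -0.09, -0.49)  # 0.3
```

The SIFT implementation applies the same idea in 3D (x, y, and scale), solving a small linear system instead of the scalar formula.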

3) Orientation Assignment: One of the main characteristics of SIFT is its invariance to rotation. Since the scene is practically static in scenario one, a system based on SIFT can concentrate on analyzing changes in the objects, such as the variation of object orientation [5].

In scenario two, the intrinsic viewpoint changes and the possible workspace variations can cause greater variation in keypoint orientation. In general, the second scenario needs more complex calculations to determine a reasonable estimate of an object's orientation. As in Lowe's original proposal, an affine transformation is commonly used in this case. [13] used this transformation in order to derive the deviation of a robot from its planned trajectory.

4) Keypoint Description: Since the keypoint descriptors are normalized, affine illumination variations do not affect the efficiency of the algorithm [4]. As cited in section IV-A, systems characterized in scenario one have active illumination control, so the invariance to illumination changes provided by SIFT can improve the stability of the system in this scenario. On the other hand, the movement of things can cause temporary shadows and other noise in scenario two. As there is no active control of illumination there, this characteristic is indispensable.
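The normalization that gives the descriptor its illumination invariance can be shown in a few lines. The sketch below follows the scheme in [4] (L2-normalize, clip components at 0.2, renormalize); the 8-element vector is a toy stand-in for the real 128-element descriptor.

```python
import math

def normalize_descriptor(desc, clip=0.2):
    # L2-normalize, clip large components at `clip` (reducing the
    # influence of large gradient magnitudes), then renormalize.
    norm = math.sqrt(sum(v * v for v in desc)) or 1.0
    clipped = [min(v / norm, clip) for v in desc]
    norm2 = math.sqrt(sum(v * v for v in clipped)) or 1.0
    return [v / norm2 for v in clipped]

desc = [0.1, 0.4, 0.05, 0.2, 0.3, 0.15, 0.1, 0.25]
brighter = [2.5 * v for v in desc]  # a multiplicative illumination change
# Both inputs normalize to the same vector: the contrast change cancels.
```

Additive brightness changes are already removed earlier, because the descriptor is built from image gradients rather than raw intensities.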

Since scenario one has a static workspace, it is possible to adjust it in order to improve the image analysis. For instance, the surface on which the objects lie (e.g. a conveyor or object support) can be enhanced (e.g. made flat and textureless), avoiding the arising of keypoints from the background. However, in our experiments, we have found that even with a complex background it is possible to use a similarity threshold on the descriptors, which substantially eliminates matches between good keypoints and keypoints from the background. We have performed some experiments to verify our analysis of SIFT. These experiments are detailed in sections V-B and V-C.

One of the most important characteristics of the algorithm is the distinctiveness of the descriptor, because it reduces the occurrence of false matches. This is important, for instance,


to minimize problems caused by background clutter. It also allows easy recognition of an object in a workspace, even when using a large database.
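Distinctiveness is typically exploited at matching time through Lowe's ratio test, which rejects a match when the second-nearest descriptor is almost as close as the nearest one, exactly the ambiguous situation that background clutter produces. A minimal sketch (the 0.8 ratio follows [4]; the 2D descriptors are toy data):

```python
def match_descriptors(query, database, ratio=0.8):
    # Lowe's ratio test: accept a match only when the nearest database
    # descriptor is clearly closer than the second nearest, which
    # suppresses ambiguous matches (e.g. against repetitive background).
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    matches = []
    for qi, q in enumerate(query):
        ranked = sorted((dist2(q, d), di) for di, d in enumerate(database))
        if len(ranked) >= 2 and ranked[0][0] < (ratio ** 2) * ranked[1][0]:
            matches.append((qi, ranked[0][1]))
    return matches

db = [[0.0, 0.0], [10.0, 10.0], [10.0, 0.0]]
# The first query is unambiguous; the second is equidistant from all
# database entries and gets rejected by the ratio test.
good = match_descriptors([[0.1, 0.1], [5.0, 5.0]], db)  # [(0, 0)]
```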

B. Experiments using SIFT in scenario one

The experiments performed in scenario one were focused on PCB visual inspection. The images used were acquired in grayscale with a Basler scA1390-17gc industrial camera. The camera was installed inside a machine that has good isolation against external illumination. The software was developed in C++, using version 2.4.0 of the OpenCV library. The SIFT implementation used in the experiments was the one distributed with OpenCV.

Figures 1, 3 and 4 illustrate the matching between two electronic components of the same type. Figure 2 shows an example of matching the same type of component against a board background. As can be seen, it was possible to use the same similarity threshold for both images and no false matches were made with the background. Even so, for a robust application, we recommend the use of a complementary technique to verify the absence of components. There are some false matches in figure 1, which do not allow high-accuracy measurements without a false-match filtering strategy. However, the absence of matches against the board background shows the descriptor's distinctiveness.

Fig. 1: Matching between two electronic components

Fig. 2: Matching between one electronic component and board background

Very close to the letter "Y" in figure 1, there is a small variation in keypoint location. This does not harm the component localization, but it is an example of a small variation in keypoint location that can harm a high-accuracy measurement. Nevertheless, the keypoint localization can be considered reasonable for several purposes, as shown in figure 3, where, in spite of differences on the component surfaces, many matches appear well located.

Fig. 3: Matching between two electronic components with small physical differences

The environment stability (dynamic of workspace and dynamic of scene) can be seen in figures 1 to 4. These pictures do not have any visible variation in illumination or scale. From this, we can infer that the small variations that might exist would be handled well by SIFT.

A simulated illumination change is shown in figure 4. Even with this variation, the matches are made successfully, evidencing that the descriptor normalization solves this problem, even if not completely, as commented in section V-A4. In this figure, false-match filtering was also applied. As a result, only good matches were left, but the execution time of the software was affected.

Fig. 4: Matching between two electronic components with illumination changes and false-match filtering

For the inspection of some components similar to SMD ceramic capacitors, we recommend the use of another technique, as suggested by [14]. These types of components show almost no difference in appearance and differ mostly in color and size.

C. Experiments using SIFT in scenario two

For the second scenario, the evaluation was made in a simulation environment built using the OGRE 3D rendering engine. As a test case, a robot was used in a simulated office room, shown in figure 5. The robot optics emulates the parameters of the Focus Robotics nDepth™ PCI vision system, with a 6 cm baseline stereo camera. The images generated for the robot vision simulation have a resolution of 640x480 pixels. The latest stable release of OpenCV (version 2.4.3 at the time of the experiment) was used as the implementation of SIFT.

Characteristic | SIFT features involved | Scenario one | Scenario two
Illumination conditions | Descriptor normalization | GOOD | —
Measurement accuracy | Taylor expansion, edge response elimination, number of keypoints | REGULAR | GOOD
Dynamic of workspace | Descriptor distinctiveness | GOOD | REGULAR
Dynamic of scene | Orientation assignment, scale-space construction, descriptor distinctiveness | GOOD | REGULAR
Time requirement | Scale-space construction | REGULAR | GOOD

TABLE I: Summary of the evaluation of SIFT against the scenarios

Fig. 5: Simulation Environment

Through the simulation tests, it is possible to see that the localization of keypoints with the Taylor approximation allows great precision in the calculation of positions in the stereoscopic system. So, if the feature matching is free of outliers, the measurement accuracy becomes very good.

On the other hand, in the current state of our simulation system, it is not possible to directly correlate our results about illumination conditions with the real world.

Regarding the dynamic of workspace, it was difficult to identify objects in some situations. Problems appeared with multiple instances of the same object lying near one another. Figure 6 shows the example of identifying two identical cabinets that are side by side. In this case, there were matches only with the cabinet on the right. On the other hand, SIFT is very robust to occlusion. Depending on how many features can be obtained from an object, it is possible to recognize it even if just a small part of it is visible.

Fig. 6: Failure to match multiple instances

As SIFT describes appearance features from a 2D image, it usually works well with planar objects (objects for which one dimension can be ignored). To be able to identify three-dimensional objects from any viewpoint, it is usually necessary to have several images of the object, each one from a different point of view. This condition requires the use of more complex databases that link equivalent features from each view of an object.
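Such a multi-view database can be sketched as a mapping from (object, view) pairs to descriptor lists, with recognition scoring each view by how many query descriptors it matches. The object names, toy descriptors, and threshold below are hypothetical, not from our simulation.

```python
def best_view(query, view_db, max_dist2=1.0):
    # view_db maps (object, view) -> descriptor list. Score each view
    # by how many query descriptors fall within max_dist2 of one of
    # its descriptors, and return the best-scoring entry.
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    best, best_score = None, -1
    for key, descs in sorted(view_db.items()):
        score = sum(1 for q in query
                    if any(dist2(q, d) <= max_dist2 for d in descs))
        if score > best_score:
            best, best_score = key, score
    return best, best_score

view_db = {
    ("cabinet", "front"): [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
    ("cabinet", "side"): [[5.0, 5.0], [6.0, 5.0]],
}
winner, score = best_view([[0.1, 0.0], [1.1, 0.1]], view_db)
```

A production system would replace the linear scan with an approximate nearest-neighbor index and add geometric verification of the winning view.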

Although SIFT isn’t one of the fastest feature extractors,the use of multiples threads can improve the performance. Withan implementation like [4],the execution of each frame takesaround a third of a second, which is good enough for the thedesired application.


VI. CONCLUSIONS

Considering the features of both scenarios presented and the attributes of SIFT, we can assert that this algorithm is suitable for industrial applications. Generally, SIFT is good for locating objects and detecting their orientation when the transformation from the test image to the model image can be defined as affine.

However, it is sensitive to perspective deformations, as occur when an autonomous vehicle moves through the workspace. It is also not suitable for measurements that need morphological information, as in the case of work-piece deformation inspection.

ACKNOWLEDGEMENTS

The present work has been sponsored by the Brazilian-German Collaborative Research Initiative on Production Technology (BRAGECRIM).

REFERENCES

[1] M. Bohlool and S. Taghanaki, “Cost-efficient Automated Visual Inspection system for small manufacturing industries based on SIFT,” in Image and Vision Computing New Zealand, IVCNZ 2008, 23rd International Conference. IEEE, Nov. 2008, pp. 1–6.

[2] A. Kumar, “Computer-vision-based fabric defect detection: a survey,” IEEE Transactions on Industrial Electronics, vol. 55, no. 1, pp. 348–363, Jan. 2008.

[3] E. N. Malamas, E. G. M. Petrakis, M. Zervakis, L. Petit, and J.-D. Legat, “A survey on industrial vision systems, applications and tools,” Image and Vision Computing, vol. 21, no. 2, pp. 171–188, 2003.

[4] D. Lowe, “Distinctive Image Features from Scale-Invariant Keypoints,” International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, Nov. 2004.

[5] C. Lin and E. Setiawan, “Object orientation recognition based on SIFT and SVM by using stereo camera,” in International Conference on Robotics and Biomimetics. IEEE, Feb. 2009, pp. 1371–1376.

[6] S. Se, D. Lowe, and J. Little, “Mobile Robot Localization and Mapping with Uncertainty using Scale-Invariant Visual Landmarks,” The International Journal of Robotics Research, vol. 21, no. 8, pp. 735–758, Aug. 2002.

[7] D. Q. Huynh, A. Saini, and W. Liu, “Evaluation of three local descriptors on low resolution images for robot navigation,” in 24th International Conference Image and Vision Computing New Zealand (IVCNZ). IEEE, Nov. 2009, pp. 113–118.

[8] J. Klippenstein and H. Zhang, “Quantitative evaluation of feature extractors for visual SLAM,” in Fourth Canadian Conference on Computer and Robot Vision, 2007, pp. 157–164.

[9] R. Schmitt and A. Pavim, “Flexible optical metrology strategies for the control and quality assurance of small series production,” in SPIE - Optical Measurement Systems for Industrial Inspection VI, P. H. Lehmann, Ed., vol. 7389. Society of Photo-Optical Instrumentation Engineers, 2009, pp. 738902–738902-12.

[10] A. Ramisa, D. Aldavert, S. Vasudevan, D. Toledo, and R. L. de Mantaras, “Evaluation of Three Vision Based Object Perception Methods for a Mobile Robot,” Journal of Intelligent & Robotic Systems, vol. 68, no. 2, pp. 185–208, Apr. 2012.

[11] T. Pfeifer, R. Freudenberg, G. Dussler, M. Zacher, and A. Bay, “Optical metrology - Enabling factor for successful production,” in 3rd Brazilian Congress of Metrology, 2003.

[12] P. Pinies, T. Lupton, S. Sukkarieh, and J. D. Tardos, “Inertial aiding of inverse depth SLAM using a monocular camera,” in International Conference on Robotics and Automation, ICRA 2007, 2007, pp. 2797–2802.

[13] K. Chen and W. Tsai, “Vision-based autonomous vehicle guidance for indoor security patrolling by a SIFT-based vehicle-localization technique,” IEEE Transactions on Vehicular Technology, vol. 59, no. 7, pp. 3261–3271, 2010.

[14] H. Wu, G. Feng, H. Li, and X. Zeng, “Automated Visual Inspection of Surface Mounted Chip Components,” in International Conference on Mechatronics and Automation. IEEE, Aug. 2010, pp. 1789–1794.
