
Transcript of B.Tech Thesis


Vision Based Robotic Manipulation of

Objects

A Graduate Project Report submitted to Manipal University in partial

fulfilment of the requirement for the award of the degree of

BACHELOR OF TECHNOLOGY

In

Instrumentation and Control Engineering

Submitted by

Susobhit Sen

Under the guidance of

Prof. (Dr.) Santanu Chaudhury

Professor, Dept. of Electrical Engineering, IIT Delhi

And

Prof. P Chenchu Sai Babu

Assistant Professor, Dept. of Instrumentation and Control Engineering,

MIT, Manipal

DEPARTMENT OF INSTRUMENTATION AND CONTROL ENGINEERING

MANIPAL INSTITUTE OF TECHNOLOGY

(A Constituent College of Manipal University)

MANIPAL – 576104, KARNATAKA, INDIA

July 2015


DEPARTMENT OF INSTRUMENTATION AND CONTROL ENGINEERING

MANIPAL INSTITUTE OF TECHNOLOGY

(A Constituent College of Manipal University)

MANIPAL – 576 104 (KARNATAKA), INDIA

Manipal

July 3rd, 2015

CERTIFICATE

This is to certify that the project titled Vision Based Robotic Manipulation of

Objects is a record of the bonafide work done by SUSOBHIT SEN (110921352)

submitted in partial fulfilment of the requirements for the award of the Degree of

Bachelor of Technology (B.Tech) in INSTRUMENTATION AND CONTROL

ENGINEERING of Manipal Institute of Technology Manipal, Karnataka, (A

Constituent College of Manipal University), during the academic year 2014-15.

P Chenchu Sai Babu

Assistant Professor

I & CE

M.I.T, Manipal

Prof. Dr. Dayananda Nayak

HOD, I & CE.

M.I.T, MANIPAL


ACKNOWLEDGMENTS

Throughout my engineering years at Manipal, it has been my privilege to work with and learn

from excellent faculty and friends. They all have had a significant impact on the quality of

my research, education and my professional growth. It is impossible to list and thank all of

them, but I would like to acknowledge everyone at the Department of Instrumentation and

Control Engineering for enabling access to quality education and an excellent learning

environment to students.

I would like to start by expressing my sincerest gratitude to my research project guide at

Indian Institute of Technology, Delhi (IIT Delhi) Professor (Dr.) Santanu Chaudhury for his

wisdom, patience, and for giving me the opportunity to pursue my final semester project in

PAR (Programme in Autonomous Robotics) Lab. His guidance and support were the most important assets that led to the completion of this report. I also thank him for acquainting me with new advances in the fields of computer vision, robotics and object identification, and for helping me figure out my research interest in robotics and computer vision.

I would also like to offer my sincere gratitude to Dr. Dayananda Nayak, HOD, Dept. of

Instrumentation and Control Engineering, MIT, Manipal and Mr P Chenchu Sai Babu,

Assistant Professor, Dept. of Instrumentation and Control Engineering, MIT Manipal, for

allowing me to complete my final year project in IIT Delhi and for providing their valuable

insights into my project.

I would also like to thank Mrs Shraddha Chaudhary, Mr Ashutosh Kumar and other members

at PAR Lab and IIT Delhi for helping me complete my project. I had a great time learning

from them and hopefully our paths will cross again.


ABSTRACT

This project addresses manipulating pellets in a cluttered 3D environment using a robotic

manipulator with help of a fixed camera and a laser range finder. The output data from the

camera and the laser sensor, after processing, guide the manipulator and the gripper to

perform the task at hand. The intention is to have an autonomous system to perform the task

of picking pellets and inserting them into a tube. This is not a straightforward implementation, as it involves a synergistic combination of data from the Micro Epsilon ScanCONTROL laser scanner and the camera with the position data of the KUKA KR-5, a 6-DOF robot fitted with a two-finger and a suction gripper. The fusion of data has been achieved through the derivation of a robust calibration between the optical sensors. The depth profile is added to the image information, providing an extra dimension. To pick up a workpiece from a collection of workpieces in a random clutter scenario, it is important to know its precise location. A location accuracy of 0.4 mm was achieved, which, combined with the 0.1 mm intrinsic positioning error of the KUKA, gave 90% repeatability in locating the pellet (with respect to the base frame) over the trials carried out. Pick-up using a suction gripper further increased the repeatability to 96%, owing to the compliance and tolerances of the gripper bellows.

Today, robots have taken on many complex responsibilities and serve mankind in innumerable ways. From assisting technicians on the assembly line of an automobile plant to performing complex laparoscopic procedures, they have proved to have great repeatability, accuracy and precision. Many manufacturing facilities still demand the classical "bin picking" procedure. This project deals with this classical problem combined with complexities such as identical and textureless workpieces and, more importantly, bin picking from a 3D arrangement of workpieces. In order to cater to the clutter and occlusion, the data required

for processing has to be three dimensional. This is achieved by sensor fusion and by

obtaining synchronised data from the sensors.


LIST OF TABLES

Table No  Table Title

3.1  Sensor Frame to TCP Calibration Matrix

3.2  TCP Frame to Base Frame Calibration Matrix

4.1  Rotation Matrix elements for camera calibration

4.2  Translation Matrix elements for camera calibration

4.3  Camera Intrinsic Matrix elements for camera calibration

4.4  Joint Calibration results (1)

4.5  Joint Calibration results (2)

4.6  URG Sensor results


LIST OF FIGURES

Figure No  Figure Title

2.1  Algorithm for pellet pickup

2.2  Surface fitting to cylinders

2.3  Estimation of object orientation using vision system

2.4  Query edge mapping

2.5  Application of Canny edge detection and depth edges in clutter

3.1  Experimental setup with robot end effector, bin and workspace

3.2  Different reference frames (left) and sensor mounted on robot end effector (right)

3.3  Pinhole camera model

3.4  Camera calibration feature points

3.5  Camera orientation (left) and re-projection error (right)

3.6  Laser and camera joint calibration

3.7  Camera and laser calibration using coplanar points

3.8  Laser scan of 3 different height objects

3.9  Single pellet from two different views

3.10  Pellets kept in random (clutter) arrangement

3.11  Image from camera (left), plot of points in X-Z plane (middle), plot of points in Y-Z plane (right)

3.12  Point cloud data (sorted), MATLAB

3.13  Point cloud data visualisation in MeshLab

3.14  MATLAB plot of data from laser scanner showing 3D arrangement of pellets

3.15  Simple block diagram representation of the algorithm

3.16  Hough transform of a standing and two lying pellets showing the longest edge and centre

3.17  3D arrangement of standing pellets

3.18  Images showing the different levels with pellets

3.19  Application of Hough transforms gives distinct circles

3.20  Algorithm for pickup of the pellet

3.21  Laser data for a single pellet with colour variation on the basis of height

3.22  Transformation of laser data onto image data

3.23  Mapping of laser data onto image data for cluttered arrangement

3.24  Image showing cylinder fitting (a), optimized results (b)

3.25  Normals and plane passing through a point on the cylinder

3.26  Dimension and range data of the Micro Epsilon SCANControl Laser Scanner

3.27  GUI of SCANControl software

4.1  Depth profile from URG sensor

4.2  Hough transforms detecting circles for complete data as well as for partial data


Contents

Acknowledgement

Abstract

List Of Figures

List Of Tables

Chapter 1 INTRODUCTION
1.1 Introduction
1.1.1 Area of work
1.1.2 Present day scenario of work
1.2 Motivation
1.2.1 Shortcomings of previous work
1.2.2 Importance of work in present context
1.2.3 Significance of end result
1.2.4 Objective of the project
1.3 Project work schedule
1.4 Organization of project report

Chapter 2 BACKGROUND
2.1 Introduction
2.2 Literature review and about the project

Chapter 3 METHODOLOGY
3.1 Introduction
3.2 Methodology
3.2.1 Calibration
3.2.2 Data interpretation
3.2.3 Algorithms for pick up
3.2.4 Component selections and justifications

Chapter 4 RESULT ANALYSIS
4.1 Introduction
4.2 Results
4.2.1 Calibration results
4.2.2 Laser sensor evaluation results
4.2.3 URG sensor results
4.2.4 Repeatability tests and analysis

Chapter 5 CONCLUSION AND FUTURE SCOPE
5.1 Work conclusion
5.2 Future scope of work

REFERENCES


CHAPTER 1

INTRODUCTION

1.1 INTRODUCTION

As biological organisms we are constantly inundated with stimuli informing us of a rich

variety of factors in the environment that may affect our well-being. These stimuli may be

sensory, social or informational. The ability to organize these simultaneous stimuli into

meaningful groups is a fundamental property of intelligence. In the practice of science, we

expose observations of nature to similar analysis. Robots, on the other hand, lack the ability to understand or interpret the data obtained from the environment. Building smarter, more

flexible, and independent robots that can interact with the surrounding environment is a

fundamental goal of robotics research. Potential applications are wide-ranging, including

automated manufacturing, entertainment, in-home assistance, and disaster rescue. One of the

long-standing challenges in realizing this vision is the difficulty of ‘perception’ and

‘cognition’, i.e. providing the robot with the ability to understand its environment and make

inferences that allow appropriate actions to be taken. Perception through inexpensive contact-free sensors such as cameras and lasers is essential for continuous and fast robot operation. In

this research, we address the challenge of robot perception in the context of industrial

robotics.

For any task that requires manipulation of objects in a workspace, an autonomous robotic

system would require data depending on the type of task at hand. Direct application may be

to solve a wide industrial problem of bin picking. Often human labor is employed to load an

automatic machine with work-pieces. Such jobs are monotonous and do not add any value to

the skill set of the workers. While the cost of labor is increasing, human performance remains

essentially constant. Furthermore, the environment in manufacturing areas is generally

unhealthy. Also, when work-pieces are inserted into machines by hand, limbs are often

exposed to danger. This is where the concept of process automation using computer vision

plays a vital role. Use of computer vision in such cases is an obvious solution.

A lot of research has already been put into automation of industrial practices. One of the

classical computer vision problems is estimation of the 3D pose of the workpiece. The currently widely used techniques employ a camera for pose and orientation estimation. This works well for a 2D or planar arrangement of workpieces but fails when they are arranged in 3D or in cluttered form. The reason is that an image from a stationary camera (fixed focus) only gives output in two dimensions, whereas the requirement is for three-


dimensional data. To obtain 3D data of the workpieces, pellets in this case, we require some other methodology. In this project, identical pellets (cylindrical in shape), textureless and of the same colour, are to be picked from a bin using a robotic arm. Pellets are located using data from a laser scanner, adding the aforementioned extra dimension for further computations. The scanner has a resolution of 24 micrometres, a scan line of 14.9 cm and a tunable frequency of 200–2000 Hz. This high accuracy and precision enables fast scanning of the workspace. Manipulation of the data has been done using OpenCV and PCL (Point Cloud Library) on Microsoft Visual Studio 2010.

1.1.1 Area of work

This research mainly deals with information fusion between two sensors and subsequently

locating a pellet that can be picked up. In this project, data from camera and a laser range

finder are analyzed, synthesized and combined together so that the exact location of pellets

may be found out. Our idea here is to pick the pellet which is least occluded. This would

include calibration of the individual sensors, subsequent optimization, image processing, point cloud processing, and finding the point-to-pixel correspondence between the laser data and the camera. Finally, mathematical and heuristic calculations yield the least occluded pellet for picking up.

1.1.2 Present day Scenario of work

Robotics has found a strong foothold in numerous manufacturing industries. The present

progress with respect to bin picking has been restricted to only planar arrangements. A planar

arrangement may be defined as a 2D arrangement of the object of interest. Recent advances have dealt with the problem of minor occlusion through different approaches using camera data. The novelty of this project lies in the additional use of a range sensor for localization of the object. Presently, most algorithms deal with the planar arrangement of identical objects. These include heuristic approaches and machine learning algorithms [4] (SVM, SVD, etc.). The main disadvantage of SVM and SVD is the requirement of a large data set for learning.


1.2 MOTIVATION

Industries generally rely on robots for automation to save time and money. One reason is that when

humans do the same task repetitively they feel uninterested and their efficiency to perform the task

with accuracy decreases over time. Another reason is the cost of human labor. Robots are designed

in a manner that they can perform these repetitive tasks with the same precision in less time. Pick and place of objects is one of the essential tasks in industry. After manufacturing a product, one has

to pick the manufactured object from plates/bin and organize them for shipment or storage. One

solution is to use manual labor for the whole process, where laborers pick objects and organize them

suitably. Another solution can be that humans arrange those objects in a tray and pass it to a robot.

A robot then picks them one by one and places them in a suitable fashion. A robot can be trained to go to particular locations in a tray again and again, and humans can arrange the objects in the tray

accordingly. But the use of a robot becomes inevitable when the workspace environment is not

suitable for humans. Nuclear reactors and hot furnaces are a few examples of such environments. One cannot survive beyond a certain amount of exposure to radioactive materials, so for the handling of such materials we need the help of robots. Our motivation comes

from similar situations, where no kind of human intervention is possible and the whole process

needs to be automated. This will not only save time, but will also save humans from being exposed

to an unhealthy environment. A general solution to this kind of problem is to make use of a stereo setup, in which two cameras are used to generate a 3D reconstruction of the workspace and estimate the pose of objects. To create a 3D reconstruction one needs to find correspondences between the two images taken by the two cameras of a given scene, i.e., match the same points on an object that are

common in both images. Generally, objects used in industry are simple and featureless. Finding point correspondences for such objects becomes very difficult and even impossible in certain cases. In this thesis we propose an approach that uses a mono-vision model combined with laser range finder data to solve the pose estimation problem.

1.2.1 Shortcoming of previous work

A long time back, the bin picking problem was tackled using mechanical vibratory feeders, where vision feedback was not available. But this solution had the problem of mechanical parts getting jammed, and it was highly dedicated to a single part type. Due to these shortcomings, the next generation of bin picking systems

performed grasping and manipulation operations using vision feedback. Computer vision has

made rapid progress in the last decade, moving closer to definitive solutions for longstanding

problems in visual perception such as object detection [9], [10], [11], object recognition and

pose estimation. While the huge strides made in these fields lead to important lessons, most

of these methods cannot be readily adapted to industrial robotics because many of the

common assumptions are either violated or invalid in such settings.


1.2.2 Importance of work in present context

This work has a lot of significance in the present context as an industrial alternative. This

may have a vast application in manufacturing set ups. A combination of laser range finder

with a camera can solve complex problems of object localisation and identification. In cases

of bin picking, this method is again very beneficial. The main advantage lies in the fact that the data from both sensors are combined, and the resultant data is an image with depth information.

1.2.3 Significance of the end result

This project deals with improving the conventional methods of object identification, pose

estimation and picking using a robotic arm, given that the workpieces are identical in shape, size

and texture. The end result showed a repeatability of 96% with a tolerable error of 0.3 mm.

1.2.4 OBJECTIVE

The project aims at:

o Joint calibration of the optical sensors

o "Hand-eye calibration" of the visual sensors with the robotic arm

o Finding the pose and orientation of objects placed in a 3D arrangement

o Fusion of data from two independent visual sensors

o Pellet picking with the help of a suction gripper

o Result and model verification


1.3 PROJECT WORK SCHEDULE

January 2015

o Processing of image from camera.

o Review of existing literature

February 2015

o Feature based contour extraction and Hough circle estimation.

o Processing data from Micro Epsilon SCANControl Laser Scanner

o Practical Calibration of camera to define camera coordinate system, Intrinsic as

well as extrinsic parameters

March 2015

o Algorithm and code designing for the use of above coordinate system in real time

using OpenCV.

o Extrinsic calibration of the Laser Scanner to the camera.

April 2015

o Algorithm Design for the pick up of pellet from a bin.

o Working with KUKA Robot to train it as per the problem statement.

May 2015

o Robustness analysis of algorithms.

o Final Implementation, revision and modifications.

June 2015

o PCL Point Cloud data analysis.

o Project report and Documentation.

1.4 ORGANISATION OF REPORT

CHAPTER 1: This chapter briefly describes the project and its aim. It also describes the shortcomings of previous work and the steps taken to overcome them. The project objectives and schedule are also included in this chapter.

CHAPTER 2: This chapter briefly describes the project title and the content of the project. It also lays emphasis on the relevance of the work in the present-day scenario and the literature review carried out for this project. Theoretical discussion along with a conclusion marks the end of this chapter.

CHAPTER 3: This chapter describes the methodology used in this project. The various components and software used are explained. The main block diagram of the project is also explained in this chapter.

CHAPTER 4: In this chapter the results obtained are explained, along with explanations for those results. The significance of the results is also discussed in this chapter.

CHAPTER 5: This chapter gives a brief summary of the work and the problems faced while doing the project. It also discusses the significance of the results obtained along with the future scope of the project.


CHAPTER 2

BACKGROUND THEORY

2.1 INTRODUCTION

This chapter deals with the project title and the literature review which was done to carry out

this project. General theoretical discussions are carried out as well as a brief description of the

latest research in this field are cited. This chapter also explains the important references that

have been used in derivation and algorithm designing.

2.2 LITERATURE REVIEW AND ABOUT THE PROJECT

Object identification and estimation of parameters cannot be done using a single camera.

Image segmentation doesn’t work in these cases as with increasing height, the background

essentially merges with the object and causes failure of conventional edge detection algorithms such as the Canny edge detector.

Fig 2.1 Algorithm for pellet pickup


Early vision-based solutions to this problem were based on modeling parts using 2D surface representations, these 2D representations being invariant shape descriptors [19]. There were other systems as well, which recognized scene objects from range data using various volumetric primitives such as cylinders [20].

2.2.1 EXTRACTION OF CYLINDERS AND ESTIMATION OF THEIR PARAMETERS FROM

POINT CLOUDS

This is a method [21] devised in order to closely approximate the position and orientation of cylinders from point cloud data. The main idea is region growing. The author takes a point on the surface of a cylinder and then selects a neighbourhood. The normals at these points give the axis of the cylinder. The RANSAC algorithm is used here. Once the axis of the cylinder is found, the other points that were not in the neighbourhood are taken, projections onto the axis are made, and the model gets optimized.

One of the disadvantages of this method is that it requires a complete model of the cylinder, and the cylinders should be sparsely arranged. This is not possible in our case, as the cylinders are in a cluttered environment; due to the clutter, we only obtain a partial point cloud.

2.2.2 EFFICIENT HOUGH TRANSFORM FOR AUTOMATIC DETECTION OF

CYLINDERS IN POINT CLOUDS

This research [22] essentially converts the 3D problem into a 2D problem. The initial steps are the same: a point is selected and a region is defined around it. This point selection is made on the basis of curvature. Next, a base point and a Gaussian sphere are defined around it. Once this is done, the author projects the normals onto the sphere through its origin. The next step is the application of the Hough circle transform to the intersection of the sphere and a plane (normal to the axis), which gives a circle. The cylinder centres can easily be found once the circles and their centres are detected.

Again, this method also works well only for sparse cylinders. Also, the cylinder dimensions have been assumed, so for cylinders of two different radii this method would fail.


Fig 2.2 Surface Fitting to cylinders

2.2.3 3D VISION SYSTEM FOR INDUSTRIAL BIN-PICKING APPLICATIONS

This paper [18] addresses the same problem of bin picking in a 3D cluttered environment. The authors propose an experimental setup using two laser projectors and a camera. The idea is somewhat similar to our methodology; they also use Hough transforms for detection. Here, pose estimation is a two-step process. This methodology was tested on bigger workpieces, whose surface area ratio compared to our workpiece is approximately 6:1, and hence the error tolerance of this model is higher than that of our system. So it cannot be applied directly to our problem statement.

Fig 2.3 Estimation of Object Orientation using vision system


2.2.4 FAST OBJECT LOCALIZATION AND POSE ESTIMATION IN HEAVY CLUTTER

FOR ROBOTIC BIN PICKING

This research [9] also addresses bin picking in a cluttered environment. The authors designed and introduced the FDCM (fast directional chamfer matching) algorithm, which is based on finding an alignment between a template edge map and a query edge map. This is further calculated using a warping and distance cost function.

Fig 2.4 Query edge mapping

The next step is the matching of directional edge pixels. Further, they propose an edge image.

Fig 2.5 Application of canny edge detection and depth edges in clutter

This algorithm only retains edge points with continuity and sufficient support; therefore, noise and isolated edges are filtered out. In addition, the directions recovered through the fitting procedure have been shown to be more precise than image gradients.

Page 20: B.Tech Thesis

20

2.2.5 TRAINING BASED OBJECT RECOGNITION IN CLUTTERED 3D POINT CLOUDS

This paper [17] combines machine learning with object identification in 3D point clouds. The algorithm is divided into two parts, training and detection. First, weak classifier candidates are generated and trained; once training is completed, objects in the detected point cloud are classified. This is done on the basis of an object template, defined as a collection of grids, and template matching. A feature value is also defined which corresponds to the ratio of matching grids.

2.3 CONCLUSION

On the basis of the analysis of the above papers, the algorithm we decided on had to first localize the pellets. Once that is done, normals and subsequently the curvature are found; curvature can further be used to distinguish cylindrical surfaces from planes. This is to be followed by rough cylinder fitting, which gives rough values of the axis and the centre with some error. This can further be optimized by using machine learning procedures and voting. Once a pellet is picked, its points are deleted.


CHAPTER 3

METHODOLOGY

3.1 INTRODUCTION

This chapter briefly describes the methodology used in the project. The idea here is to pick up pellets from a 3D clutter. The pellets may be tilted, standing or sleeping. The chapter presents the detailed methodology for solving the problem statement, including the assumptions and approximations made, block diagrams of the different approaches devised, component specifications and justification of the apparatus used, and preliminary result analysis followed by conclusions.

3.2 METHODOLOGY

In order to find the exact position of the pellets and to pick them up, we used three different approaches. In any cluttered environment with occlusion, we may classify the (identically shaped) objects into three cases: standing, sleeping and tilted cylindrical pellets. The diagram below shows the experimental setup with the robot, pellets in the bin, and vertical tubes (for insertion).

Fig 3.1 Experimental Setup with robot end effector, bin and workspace


3.2.1 CALIBRATION

This section discusses the various calibration procedures that were used in this project.

Calibration is one of the very important aspects of sensor fusion. When multiple sensors are

in play, it is essential to analyze data in a common frame of reference. Due to complexity of

the problem here, dealing with 3D data, various methods for calibration were used in order to

obtain a robust transformation.

Fig 3.2 Different reference frames(left) and sensors mounted on robot end effector(right)

To start with, it is important to understand the importance of reference frames. Every sensor

kept in the real world has its own reference frame. Hence it is important to map these

different reference frames into a common reference frame so as to enable robotic

manipulation of objects. For a combined use of the camera and the laser range finder, it is

important to first have a common reference system. Calibration is the process of estimating

the relative position and orientation between the laser range finder and the camera. It is

important as it effects the geometric interpretation of measurements. We describe theoretical

and experimental results for extrinsic calibration of a robotic arm consisting of a camera and

a 2D laser range finder. The calibration process is based on observing a planar checkerboard

and solving the constraints between “views” of a planar checkerboard calibration pattern

from a camera and a laser range finder. The entire calibration process has been explained

below as subsections for each of the visual devices.


3.2.1.1 CAMERA CALIBRATION

The camera is used to obtain the 2D data. An image maps the 3D world onto a 2D image plane. The

schematic of a pinhole camera is shown below [5]:

Fig 3.3 Pinhole camera model

Calibration of camera is also required to obtain the intrinsic as well as extrinsic properties of

the camera. Intrinsic properties tend to stay constant whereas the extrinsic properties depend

on the position and orientation of the camera. Calculation of the extrinsic parameters may be further divided into two parts: calculation of the rotation matrix and of the translation vector. A combination of both maps the 3D world to the pixels of an image. There are different approaches to camera calibration; [1] is a widely used method.

Fig 3.4 Camera calibration feature points


Fig 3.3 shows a camera with centre of projection O and the principal axis parallel to the Z axis. The image plane is at the focus, and hence a focal length 'f' away from O. A 3D point P = (X, Y, Z) is imaged on the camera's image plane at coordinate Pc = (u, v). As discussed above, we first find the camera calibration matrix C which maps the 3D point P to the 2D point Pc. Using similar triangles:

u / f = X / Z,  v / f = Y / Z      (3.1)

This is equivalent to

u = f X / Z      (3.2)

v = f Y / Z      (3.3)

From equations 3.2 and 3.3, we can write Pc in generalized homogeneous coordinates as:

Pc ~ [ f 0 0 0 ; 0 f 0 0 ; 0 0 1 0 ] [X Y Z 1]^T      (3.4)

For the case when the origin of the 2D image coordinate system does not coincide with the point where the Z axis intersects the image plane, we need to translate Pc to the desired origin. Let this translation be defined by (u0, v0). Hence, (u, v) now becomes

u = f X / Z + u0      (3.5)

v = f Y / Z + v0      (3.6)

This can be written in a form similar to 3.4 as

Pc ~ [ f 0 u0 0 ; 0 f v0 0 ; 0 0 1 0 ] [X Y Z 1]^T      (3.7)


The above matrix denotes the intrinsic characteristics of the camera. It is important to note

the units of the focal lengths and the offset from the centre. To find the extrinsic calibration matrix, we need a rotation and translation to make the camera coordinate system coincide with the configuration in Fig 3.3. Let the translation of the camera to the origin of the XYZ coordinate system be given by T = (Tx, Ty, Tz). Let the rotation applied to make the principal axis coincide with the Z axis be given by a 3 x 3 rotation matrix R. Then the matrix formed by first applying the translation followed by the rotation is the 3 x 4 matrix

E = (R | RT)      (3.8)

Hence, combining the intrinsic matrix of (3.7) with the extrinsic matrix E, we get the complete camera calibration matrix C:

Pc ~ C P, where C is the product of the intrinsic matrix and E      (3.9)

Fig 3.5 Camera orientations(left) and reprojection error(right)

The camera matrix C gives Pc that is the projection of P. C is a 3 x 4 matrix usually called the

complete camera calibration matrix. Note that since C is 3 x 4 we need P to be in 4D

homogeneous coordinates and Pc derived by CP will be in 3D homogeneous coordinates. The

exact 2D location of the projection on the camera image plane will be obtained by dividing

the first two coordinates of Pc by the third.
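To make the projection concrete, the following is a minimal C++ sketch of equations (3.5)-(3.9): a 3D point is moved into the camera frame with the extrinsic transform E = (R | RT) and then divided through by its depth. The focal length, principal point and pose used here are illustrative placeholders, not the calibrated values reported in Chapter 4.

    #include <iostream>

    // Minimal pinhole projection sketch: Pc ~ intrinsics * (R | RT) * P,
    // followed by division by the third (depth) coordinate.
    int main() {
        // Placeholder intrinsics: focal length and principal point, in pixels.
        double f = 1400.0, u0 = 1200.0, v0 = 1000.0;

        // Placeholder extrinsics: identity rotation, camera 500 mm above the scene.
        double R[3][3] = {{1, 0, 0}, {0, 1, 0}, {0, 0, 1}};
        double T[3] = {0.0, 0.0, 500.0};

        // A 3D point P in world coordinates (mm).
        double P[3] = {10.0, -20.0, 0.0};

        // Camera-frame coordinates: Xc = R * (P + T), i.e. translate then rotate.
        double Xc[3] = {0.0, 0.0, 0.0};
        for (int i = 0; i < 3; ++i)
            for (int j = 0; j < 3; ++j)
                Xc[i] += R[i][j] * (P[j] + T[j]);

        // Perspective division, equations (3.5) and (3.6).
        double u = f * Xc[0] / Xc[2] + u0;
        double v = f * Xc[1] / Xc[2] + v0;
        std::cout << "pixel: (" << u << ", " << v << ")\n";
        return 0;
    }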

3.2.1.2 EXTRINSIC CALIBRATION OF CAMERA AND LASER (Zhang and Pless, IROS

2004)

This method is one of the earliest works on the calibration of a laser range finder and a camera. The experimental results verify the calibration. One of the restrictions is the constraint that requires the checkerboard to be in "view" of both the laser range finder and the camera. The method is first to reduce the algebraic error of the constraint; this estimate is then refined by a non-linear optimisation which minimizes a re-projection error. The goal of this paper was to study a calibration method that finds the rotation Φ and the translation Δ which transform points in the camera coordinate system to points in the laser coordinate system.

The basic method here is to decompose the calibration into two parts: Extrinsic and Intrinsic.

The external calibration parameters are the position and orientation of the sensor relative to

some reference coordinate system [6]. The authors assume that the internal parameters of the camera are already calibrated. The arrangement is shown below in Fig 3.6.

Fig 3.6 Laser and camera joint calibration

In the camera equations, K is the camera intrinsic matrix, R a 3 x 3 orthonormal matrix representing the camera's orientation, and T a 3-vector representing its position. In real cases, the camera can exhibit significant lens distortion, which can be modelled as a 5-vector parameter consisting of radial and tangential distortion coefficients. The laser range finder reports readings which are distance measurements to points on a plane parallel to the floor. A laser coordinate system is defined with its origin at the laser range finder, and the laser scan plane is the plane Y = 0. Suppose a point P in the camera coordinate system is located at a point Pf in the laser coordinate system; then the rigid transformation from the camera coordinate system to the laser coordinate system can be described by:

Pf = Φ P + Δ      (3.10)

where Φ is a 3 x 3 orthonormal matrix representing the camera's orientation relative to the laser range finder and Δ is a 3-vector corresponding to its relative position.

Without loss of generality, we assume that the calibration plane is the plane Z = 0 in the

world coordinate system. In the camera coordinate system, the calibration plane can be

parameterized by 3-vector N such that N is parallel to the normal of the calibration plane, and

its magnitude, ||N||, equals the distance from the camera to the calibration plane. Using the camera equations, N can be expressed in terms of R3, the 3rd column of the rotation matrix R, and t, the centre of the camera in world coordinates (3.11). Since the laser points must lie on the calibration plane estimated from the


camera, we get a geometric constraint on the rigid transformation between the camera

coordinate system and the laser coordinate system. Given a laser point Pf in the laser coordinate system, from (3.10) we can determine its coordinate P in the camera reference frame as P = Φ^-1 (Pf - Δ). Since the point P is on the calibration plane defined by N, we have

N · Φ^-1 (Pf - Δ) = ||N||^2      (3.12)

The above problem is solved in two stages: first a linear solution is found using the constraints, and in subsequent steps it is further refined by optimisation. A notable fact about this paper is that the camera calibration matrix is also refined, and is hence more accurate than the camera calibration obtained using Zhang's method [8] alone.
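The constraint of equation (3.12) is what the calibration ultimately minimises over (Φ, Δ). The short C++ sketch below, assuming the Eigen library, evaluates that residual for one laser point and one checkerboard pose; a solver would sum the squared residuals over all points and poses. It is an illustration of the constraint, not the authors' published implementation.

    #include <Eigen/Dense>

    // Residual of constraint (3.12): a laser point Pf, mapped back into the
    // camera frame, must lie on the calibration plane parameterised by N.
    double planeResidual(const Eigen::Matrix3d& Phi,   // camera-to-laser rotation
                         const Eigen::Vector3d& Delta, // camera-to-laser translation
                         const Eigen::Vector3d& Pf,    // laser point, laser frame
                         const Eigen::Vector3d& N)     // plane parameter, camera frame
    {
        // P = Phi^-1 (Pf - Delta); Phi is orthonormal, so its inverse is its transpose.
        const Eigen::Vector3d P = Phi.transpose() * (Pf - Delta);
        // N.P - ||N||^2 vanishes when P lies exactly on the calibration plane.
        return N.dot(P) - N.squaredNorm();
    }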

3.2.1.3 CALIBRATION USING COPLANAR POINTS COMMON TO WORKSPACE

(Optimised using gradient descent optimisation methods)

Fig 3.7 Camera and laser calibration using coplanar points

This calibration method leveraged the fact that the laser scan line is also visible to the camera. Calibration required 4 points common to both the camera and the laser range finder in order to find the intrinsic as well as the extrinsic calibration matrices. Furthermore, after obtaining initial values, the resultant calibration matrix was further optimized using the gradient descent method. This required an additional 10 points.
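A minimal sketch of such a gradient-descent refinement is given below; the six-parameter pose vector, the numeric (central-difference) gradient, the step size and the iteration count are all illustrative assumptions, with the cost function left abstract (e.g. the mean squared distance between transformed laser points and their known reference positions).

    #include <array>
    #include <functional>

    using Pose = std::array<double, 6>;   // 3 rotation + 3 translation parameters

    // Refine a calibration pose by plain gradient descent on a user-supplied cost.
    Pose refine(Pose theta, const std::function<double(const Pose&)>& cost) {
        const double step = 1e-4, eps = 1e-6;
        for (int iter = 0; iter < 5000; ++iter) {
            Pose grad{};
            for (std::size_t i = 0; i < theta.size(); ++i) {   // central differences
                Pose hi = theta, lo = theta;
                hi[i] += eps;
                lo[i] -= eps;
                grad[i] = (cost(hi) - cost(lo)) / (2.0 * eps);
            }
            for (std::size_t i = 0; i < theta.size(); ++i)     // descent update
                theta[i] -= step * grad[i];
        }
        return theta;
    }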


Fig 3.8 Laser scan of 3 different height objects

In this method, two point vectors are first taken which are accurately known in both the base and the sensor frames of reference. Their cross product yields a direction vector perpendicular to both of them. Taking direction cosines with another point then gives the other two axes. In order to calculate accurately for 3D data, we take a further, non-coplanar point with a full (x, y, z) coordinate. These calculations provide an initial estimate of the calibration matrices. After optimization, we found the following result:

0.8001 0.5972 -0.0565 -39.2808

0.5996 -0.7989 0.0474 -14.6918

-0.0168 -0.0718 -0.9973 329.3757

0 0 0 1

Table 3.1 Sensor Frame to TCP Calibration Matrix

The above table shows the calibration matrix from the sensor frame to the TCP (Tool Centre Point) frame. The first 3 rows and 3 columns form the rotation matrix, and the first three elements of the last column are the translation elements.
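A minimal sketch of the initial estimation described above is given below, assuming the Eigen library: two direction vectors measured in both frames are turned into orthonormal frames via cross products, and the two frames give the sensor-to-reference rotation. It is an illustration of the idea, not the exact routine used in the project.

    #include <Eigen/Dense>

    // Build an orthonormal frame from two (non-parallel) direction vectors.
    Eigen::Matrix3d frameFrom(const Eigen::Vector3d& v1, const Eigen::Vector3d& v2) {
        Eigen::Matrix3d F;
        F.col(0) = v1.normalized();            // first axis along v1
        F.col(2) = v1.cross(v2).normalized();  // cross product: perpendicular to both
        F.col(1) = F.col(2).cross(F.col(0));   // completes a right-handed frame
        return F;
    }

    // Rotation taking sensor-frame coordinates into the reference frame, built
    // from the same pair of physical directions measured in each frame.
    Eigen::Matrix3d sensorToReference(const Eigen::Vector3d& s1, const Eigen::Vector3d& s2,
                                      const Eigen::Vector3d& r1, const Eigen::Vector3d& r2) {
        return frameFrom(r1, r2) * frameFrom(s1, s2).transpose();
    }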


Fig 3.9 Single pellet from two different views

The above figure shows the cylindrical pellet. For computational standardisation, all pellets are of the same dimensions and colour.

Fig 3.10 Pellets kept in random (clutter) arrangement

The transformation from the sensor frame to the base frame is done in two steps. The first one

is to find the rigid transformation to the TCP (Tool Centre Point) frame and then to find the

transformation to the Base frame. For every motion of the robot end effector during the

scanning procedure, this transformation will change.

To find the TCP-to-base conversion, we make use of the angles from the KUKA end effector, these being the roll, pitch and yaw angles. The rotation parameters are a combination of three elementary rotation matrices, one per angle:

Rz(yaw) = [ cos(yaw) -sin(yaw) 0 ; sin(yaw) cos(yaw) 0 ; 0 0 1 ]

Ry(pitch) = [ cos(pitch) 0 sin(pitch) ; 0 1 0 ; -sin(pitch) 0 cos(pitch) ]

Rx(roll) = [ 1 0 0 ; 0 cos(roll) -sin(roll) ; 0 sin(roll) cos(roll) ]

The multiplication of the above matrices gives us the complete rotation matrix. The 3x1 translation vector is directly the positional coordinates of the end effector.
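A minimal sketch of composing these transforms, assuming the Eigen library, is shown below; the Z-Y-X composition order is an illustrative assumption and should be checked against the actual KUKA angle convention.

    #include <Eigen/Dense>

    // 4x4 TCP-to-base matrix from the end-effector position and roll/pitch/yaw.
    Eigen::Matrix4d tcpToBase(double x, double y, double z,
                              double roll, double pitch, double yaw) {
        Eigen::Matrix3d R =
            Eigen::AngleAxisd(yaw,   Eigen::Vector3d::UnitZ()).toRotationMatrix() *
            Eigen::AngleAxisd(pitch, Eigen::Vector3d::UnitY()).toRotationMatrix() *
            Eigen::AngleAxisd(roll,  Eigen::Vector3d::UnitX()).toRotationMatrix();
        Eigen::Matrix4d T = Eigen::Matrix4d::Identity();
        T.block<3, 3>(0, 0) = R;                         // rotation part
        T.block<3, 1>(0, 3) = Eigen::Vector3d(x, y, z);  // translation part
        return T;
    }

    // A laser point is brought into the base frame by chaining the calibrations:
    // base <- TCP <- sensor, all in homogeneous coordinates.
    Eigen::Vector4d sensorToBase(const Eigen::Matrix4d& tcp2base,
                                 const Eigen::Matrix4d& sensor2tcp,
                                 const Eigen::Vector4d& pSensor) {
        return tcp2base * sensor2tcp * pSensor;
    }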

Q=

0.5490 0.8350 0.0169 177.35

-0.8348 -0.5500 0.0178 227.68

-0.0240 0.004 -0.9996 189.67

0 0 0 1

Table 3.2 TCP frame to Base frame calibration matrix

The above matrix is the calibration matrix that maps the points in the TCP frame to the base frame.

It is important to note that the element in the 1st row and 4th column changes for every scan line, because the position of the robot changes with every scan line. This is shown in the figures below:

Fig 3.11 Image from camera(left), Plot of points in X-Z plane(middle), Plot of points in Y-Z plane

(right)


Fig 3.12 Point Cloud Data (sorted) MATLAB.

3.2.2 DATA INTERPRETATION (LASER SCANNER)

A robust algorithm has been designed for picking up pellets from the clutter. After calibration is done and a common reference frame is decided, the next step is data acquisition. The Micro Epsilon SCANControl laser scanner gives a 2D profile of points in the X-Z plane. The X coordinate of a point gives its location in the sensor frame of reference, and the Z coordinate gives the height, with a least count of 1 micrometre.

We found that in the case of standing pellets, the location of the pellets may be found using only the laser data. The 2D profile from the laser is converted to a 3D reconstruction of the scene. This 3D data is a collection of points as shown in the figure. The X and Z coordinates are obtained from the laser scanner and the Y coordinate is obtained from the motion of the robotic arm. The step along the Y axis is 1 mm; this may be varied. A larger step would give sparse data, whereas a smaller step would yield dense data and might increase the computational load.


Fig 3.13 Point Cloud Data Visualisation in MeshLab

The point cloud data represents the environment and its 3D structures. The first step is to process the data. The point cloud data is saved in a .csv (comma separated value) or a .txt file. This data is obtained in "append" mode using the C++ SDK of the laser scanner manufacturer, Micro Epsilon. Once data acquisition is done, the data set is reduced to only the set of points above a fixed height threshold of about 0.5 mm. This rejects all the points on the ground. Due to the fine least count and high accuracy of the scanner, one can be reasonably certain that noise levels are very low.
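A minimal C++ sketch of this preprocessing is given below. It assumes the profiles were appended to a CSV file with one "x,z" pair per line and blank lines between profiles, and that the Y coordinate is recovered from the profile index with a 1 mm step; the file layout is an assumption, not the exact format written by the scanner SDK.

    #include <fstream>
    #include <sstream>
    #include <string>
    #include <vector>

    struct Point3D { double x, y, z; };

    // Read appended laser profiles and keep only points above the ground threshold.
    std::vector<Point3D> loadCloud(const std::string& path, double zThreshold = 0.5) {
        std::vector<Point3D> cloud;
        std::ifstream in(path);
        std::string line;
        int profileIndex = 0;          // one profile per robot step
        const double yStep = 1.0;      // robot advances 1 mm between profiles
        while (std::getline(in, line)) {
            if (line.empty()) { ++profileIndex; continue; }   // next scan line
            std::stringstream ss(line);
            std::string xs, zs;
            if (std::getline(ss, xs, ',') && std::getline(ss, zs, ',')) {
                double x = std::stod(xs), z = std::stod(zs);
                if (z > zThreshold)    // reject points on the ground plane
                    cloud.push_back({x, profileIndex * yStep, z});
            }
        }
        return cloud;
    }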

3.2.3 ALGORITHMS FOR PICKUP

After the data preprocessing, the next step is to precisely find the location of the pellet. As

mentioned earlier, the pellet that is least occluded has to be picked up first. This is because of

many benefits: the least occluded pellet has the most approachable path, so the movement of the robotic arm will not disturb the other pellets. It is also important to note that, since scanning takes time, it would be ideal to complete the entire clean-up with the least number of scans. To achieve that, the pellet that least disturbs the arrangement is the best option to pick, and it is often the least occluded pellet.

We have developed different approaches to address the pick-up. These can be subdivided into the following approaches.

3.2.3.1 PELLET PICKUP USING POSITION ESTIMATION (HEURISTIC APPROACH)

In this approach, we first try to find the least occluded pellet. This pellet has to be on the top

of the clutter, given that no other pellet lies outside the boundary of the clutter but inside the

workspace. The pellet on top will have the topmost point in the point cloud. This idea is used

to sort the data from top to bottom. The topmost point on the stack would correspond to the

pellet on top.


Fig 3.14 MATLAB plot of data from laser scanner showing 3D arrangement of pellets

As can be seen from the above scan, the topmost point corresponds to a point on the top pellet. The best pick-up point on the surface is the centre of the circular face in the case of a standing pellet and the centre of the curved surface in the case of a sleeping pellet, and we need to determine that position.

In order to do so, we apply an averaging filter to the data set of surface points. Surface points are extracted by applying a tolerance to the Z value of the topmost point. For example, if the Zmax value is 165.89 mm, a tolerance of approximately 2 mm takes care of all the manufacturing errors of the pellets. A typical standing pellet has approximately 1500 points and a sleeping pellet approximately 2500 points; these numbers vary with the position and orientation of the pellet.

In cases with two or more mutually occluding pellets at the same level, the calculation yields a wrong location because it uses averaging. This was avoided by sorting the values again on the basis of the X coordinate. The least value of the X coordinate gives the leftmost point in the workspace. The corresponding Y coordinate, combined with the sorted X value, is then used to find the location of the pellet.
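The heuristic can be summarised in the short sketch below, reusing the Point3D container from the preprocessing sketch; the 2 mm tolerance band follows the example above, and a non-empty cloud is assumed.

    #include <algorithm>
    #include <vector>

    // Heuristic pick-up point: sort by height, collect the surface points within
    // a tolerance of the topmost point, and average them.
    Point3D heuristicPickPoint(std::vector<Point3D> cloud, double tol = 2.0) {
        std::sort(cloud.begin(), cloud.end(),
                  [](const Point3D& a, const Point3D& b) { return a.z > b.z; });
        const double zMax = cloud.front().z;   // topmost point of the clutter
        Point3D sum{0.0, 0.0, 0.0};
        int n = 0;
        for (const Point3D& p : cloud) {
            if (zMax - p.z > tol) break;       // cloud is sorted by height
            sum.x += p.x; sum.y += p.y; sum.z += p.z;
            ++n;
        }
        return {sum.x / n, sum.y / n, sum.z / n};
    }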


Fig 3.15 Simple block diagram representation of the algorithm.

3.2.3.2 PELLET PICKUP USING HOUGH TRANSFORM

The mathematical approximation in the above methodology does not always hold good, as the averaging method sometimes gives erroneous results. Even a single error value directly hampers the average and other statistical computations, and error values are often encountered due to shadowing effects or back reflections.

In order to design a more robust system, we use the Hough transform to locate the centre of the pellet in the case of standing pellets, and a longest-edge algorithm for sleeping pellets. This method turned out to be robust: after setting the Hough transform parameters, the algorithm could determine the circles and also the centres of the circles.

The Hough transform is a feature extraction technique used in image analysis, computer

vision, and digital image processing. The purpose of the technique is to find imperfect

instances of objects within a certain class of shapes by a voting procedure. This voting

procedure is carried out in a parameter space, from which object candidates are obtained as

local maxima in a so-called accumulator space that is explicitly constructed by the algorithm

for computing the Hough transform. The classical Hough transform was concerned with the


identification of lines in the image, but the Hough transform [3] was later extended to identifying the positions of arbitrary shapes, most commonly circles or ellipses.

Fig 3.16 Hough Transform of a standing and two lying pellets showing the longest edge and centre.

The above image shows the Hough transform approximation for a single-layer application. This usually fails with a multi-layer arrangement. We experimented with multi-level checkerboard grid detection and used the data for calibration of the camera; this enhancement enabled us to apply Hough transforms to two layers as well. Due to the error rate and failures in identifying the circle centre, we then applied the Hough transform to the 3D point cloud data directly.

In order to apply the Hough transform to the 3D data, we first segregate the point cloud into different levels. These levels are defined in terms of n and d, where n and d are the length and diameter of the pellets respectively. The quantisation distributes the point cloud into the categories n, 2n, 3n, … and d, 2d, 3d, …. As the values of the diameter and the length are known, these values are compared to the Z coordinate of the point cloud data. The above-mentioned tolerance value comes into play while distributing the points into the quantised levels.

After the quantisation is done, we observe a set of points at each of the levels. For example, if we have a two-layer arrangement of standing pellets only, then we will find points at levels n and 2n. In case we have points at level d, that would mean that sleeping pellets are also present in the arrangement.


Fig 3.17 3D arrangement of standing pellets

Fig 3.18 Images showing the different levels with pellets: Level 1 (ground level), Level 2, Level 3, Level 4 (top level)

Using OpenCV, and by applying the Hough transform to the above data, we obtain the following result:

Fig 3.19 Application of the Hough transform gives distinct circles.


We observe that distinct circles are detected by the above Hough transform from the point cloud data. These two centres are the centres of pellets in the second level. Once the precise centres are obtained, these values are transferred to the KUKA and the end effector picks up the pellet.

Fig 3.20 Algorithm for pickup of the pellet

The advantage of this method is that the number of scans required is reduced. The averaging procedure explained in the previous section fails in many cases; this system is more robust and gives better results. However, in some cases, with a change of illumination or pellet colour while keeping the Hough parameters constant, this method did not work.
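A minimal OpenCV sketch of this level-wise circle detection is shown below, reusing the Point3D struct from the earlier sketches; the raster resolution, blur size and Hough parameters are illustrative assumptions that would need tuning for the actual data.

    #include <opencv2/imgproc.hpp>
    #include <vector>

    // Rasterise the points of one quantised height level into a binary image and
    // detect pellet circles (and their centres) with the Hough circle transform.
    std::vector<cv::Vec3f> circlesAtLevel(const std::vector<Point3D>& level,
                                          double mmPerPixel = 0.5) {
        cv::Mat img = cv::Mat::zeros(512, 512, CV_8UC1);
        for (const Point3D& p : level) {
            int u = static_cast<int>(p.x / mmPerPixel);
            int v = static_cast<int>(p.y / mmPerPixel);
            if (u >= 0 && u < img.cols && v >= 0 && v < img.rows)
                img.at<uchar>(v, u) = 255;
        }
        cv::GaussianBlur(img, img, cv::Size(5, 5), 2.0);  // close gaps between scan lines
        std::vector<cv::Vec3f> circles;                   // each circle: (x, y, radius)
        cv::HoughCircles(img, circles, cv::HOUGH_GRADIENT,
                         1.0,    // accumulator resolution
                         20.0,   // minimum distance between centres (pixels)
                         100.0,  // Canny high threshold
                         15.0,   // accumulator threshold
                         5, 60); // min and max radius (pixels)
        return circles;
    }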

3.2.3.3 MERGING POINT CLOUD DATA WITH IMAGE DATA

The point cloud data from the laser scanner has a point-to-pixel correspondence with the image taken from the camera. This was found using the calibration matrices correlating the point cloud data to the pixels. It does have some errors, which may be due to approximations and errors during the calibration process.
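A minimal sketch of this point-to-pixel mapping using OpenCV's projectPoints is given below; the rotation and translation passed in stand for the laser-to-camera calibration derived in this chapter, and lens distortion is handled by the distortion coefficients from the camera calibration.

    #include <opencv2/calib3d.hpp>
    #include <vector>

    // Map 3D laser points onto image pixels using the calibration between the
    // laser frame and the camera (rvec/tvec) plus the camera intrinsics.
    std::vector<cv::Point2f> laserToPixels(const std::vector<cv::Point3f>& laserPts,
                                           const cv::Mat& rvec,        // Rodrigues rotation
                                           const cv::Mat& tvec,        // translation
                                           const cv::Mat& K,           // intrinsic matrix
                                           const cv::Mat& distCoeffs)  // lens distortion
    {
        std::vector<cv::Point2f> pixels;
        cv::projectPoints(laserPts, rvec, tvec, K, distCoeffs, pixels);
        return pixels;
    }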

Fig 3.21 Laser data for a single pellet with colour variation on the basis of height

Fig 3.22 Transformation of laser data onto image data

In the above figure we can see the mapping of points on a pellet. The laser scan line is shown

in red. A similar mapping can be done with surfaces on a pile of pellets, as follows:


Fig 3.23 Mapping of laser data onto image data for a cluttered arrangement

The mapping gives an error of 2%, approximately 7-8 pixels. This mapping of data may be avoided in the case of standing or sleeping pellets, but it has to be used in the case of tilted pellets. The tilted case is a special case in which the points do not fall into either of the above-defined levels; it arises when we find residual points outside the quantised levels. These residual points are taken and mapped into the image, and the earlier image-based pose estimation algorithms are then applied.

3.2.3.4 PELLET POSE ESTIMATION USING POINT CLOUD LIBRARY

A complete and accurate pose estimate is obtained when we locate the axis of the cylindrical pellet along with the centre through which it passes. The exact axis may be estimated using curve fitting methods. Algorithms such as RANSAC fit surfaces or solids to the point cloud data. Most point cloud curve/solid fitting algorithms work on the basis of normals fitted to the surface. The angle (cosine) between normals on the same smooth surface differs from that between normals near the edges or vertices of the surface. This property of the normals helps fit curves to the point cloud data.
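A minimal PCL sketch along these lines is shown below: normals are estimated first, then RANSAC fits a cylinder model whose coefficients give a point on the axis, the axis direction and the radius. The neighbourhood size, distance threshold and radius limits are illustrative assumptions, not tuned project values.

    #include <pcl/ModelCoefficients.h>
    #include <pcl/point_types.h>
    #include <pcl/features/normal_3d.h>
    #include <pcl/search/kdtree.h>
    #include <pcl/segmentation/sac_segmentation.h>

    // Fit a cylinder to a (partial) pellet cloud using surface normals + RANSAC.
    pcl::ModelCoefficients fitCylinder(pcl::PointCloud<pcl::PointXYZ>::Ptr cloud) {
        pcl::search::KdTree<pcl::PointXYZ>::Ptr tree(new pcl::search::KdTree<pcl::PointXYZ>);
        pcl::PointCloud<pcl::Normal>::Ptr normals(new pcl::PointCloud<pcl::Normal>);

        pcl::NormalEstimation<pcl::PointXYZ, pcl::Normal> ne;
        ne.setSearchMethod(tree);
        ne.setInputCloud(cloud);
        ne.setKSearch(30);                     // neighbourhood size (assumed)
        ne.compute(*normals);

        pcl::SACSegmentationFromNormals<pcl::PointXYZ, pcl::Normal> seg;
        seg.setOptimizeCoefficients(true);
        seg.setModelType(pcl::SACMODEL_CYLINDER);
        seg.setMethodType(pcl::SAC_RANSAC);
        seg.setNormalDistanceWeight(0.1);
        seg.setMaxIterations(10000);
        seg.setDistanceThreshold(0.5);         // mm, assumed
        seg.setRadiusLimits(2.0, 15.0);        // plausible pellet radii in mm, assumed
        seg.setInputCloud(cloud);
        seg.setInputNormals(normals);

        pcl::PointIndices inliers;
        pcl::ModelCoefficients coeff;          // axis point, axis direction, radius
        seg.segment(inliers, coeff);
        return coeff;
    }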


Fig 3.24 Image showing cylinder fitting (a), optimized results (b)

Fig 3.25 Normals and plane passing through a point on the cylinder

3.2.4 COMPONENT SPECIFICATION AND JUSTIFICATION

The main component used here, along with the camera, is the Micro Epsilon SCANControl laser scanner, model 29xx-100. Its main specifications are a Z-axis range of 290 mm and a line length of 100 mm. The frequency of operation is variable between 300 and 2000 Hz, and a maximum of 1280 points may be obtained per profile. The laser has a 658 nm wavelength and 8 mW power; it is a Class 3B device and should not come into direct contact with the eyes. Inputs and outputs are configurable over Ethernet or RS422, which uses half-duplex communication.


Fig 3.26 Dimension and range data of the Micro Epsilon SCANControl Laser Scanner

Fig 3.27 GUI of the SCANControl software

There are other existing alternatives for this problem statement. The Microsoft Kinect is a widely used sensor for 3D data acquisition in computer vision, but this technology also has drawbacks. Error in the acquired data is one. Secondly, a dynamic environment requires re-calibration of the RGB-D sensor: small changes in the workspace, such as a change in light intensity, may trigger different output from the sensor. Moreover, it might not be feasible to mount the sensor on the end effector of an industrial robot. Another alternative may be the use of 3D laser sensors [2], but these are also bulky and pose a similar challenge as explained above. The data set of a 3D scan, that is the 3D point cloud, is usually exceptionally big. Bigger data poses a challenge in terms of computational capability and time required, as the entire process has to run online and continuously until the bin is empty. Hence this laser scanner hits a sweet spot in terms of frequency, range, compactness and optimum point cloud output.


CHAPTER 4

RESULT ANALYSIS

4.1 INTRODUCTION

This chapter deals with the analysis of the results obtained during the project. Various graphs are plotted along with explanations of the results. The significance of the results obtained is also discussed in this chapter.

4.2 RESULTS

This section consists of results with respect to calibration, pixel errors, optimisation and repeatability for different cases.

4.2.1 CALIBRATION RESULTS

Calibration is the process of estimating the relative position and orientation between the

laser range finder and the camera. It is important as it affects the geometric interpretation of measurements. This section gives calibration results for both optical devices, the camera as well as the laser range finder.

4.2.1.1 CAMERA CALIBRATION RESULTS

The camera calibration results are as follows:

Focal Length: fc = [ 2399.95141 2208.98499 ] ± [ 65.88081 131.74685 ]

Principal point: cc = [ 1200.15931 1307.56970 ] ± [ 55.39357 213.85137 ]

Skew: alpha_c = [ 0.00000 ] ± [ 0.00000 ] => angle of pixel axes = 90.00000 ±

0.00000 degrees

Distortion: kc = [ -0.17778 0.13136 -0.00637 0.00124 0.00000 ] ± [ 0.03964

0.09832 0.01405 0.00429 0.00000 ]

Pixel error: err = [ 0.80067 1.64925 ]

These results give the intrinsic parameters. The camera calibration matrices are given below:


R Matrix as shown below is the rotation matrix.

0.999449 -0.033176 0.000906

-0.033184 -0.999377 0.011981

0.000508 -0.012005 -0.999928

Table 4.1 Rotation Matrix elements for camera calibration

T Matrix as shown below is the translation matrix.

T =
-178.640255
161.384730
191.422883

Table 4.2 Translation Matrix elements for camera calibration

The CI matrix is the camera intrinsic parameter matrix, which depends on the focal length and pixel sizes.

1428.525269 0.000000 1241.642390

0.000000 1427.679987 1035.951211

0.000000 0.000000 1.000000

Table 4.3 Camera Intrinsic Matrix elements for camera calibration

The above results are obtained from Bouguet's camera calibration toolbox, which uses Zhang's calibration algorithm for a pinhole camera.

4.2.1.2 LASER CALIBRATION RESULTS

The laser calibration was a two-step procedure. This is done by first converting points from the laser sensor frame to the tool centre point (TCP) frame. The next step is to convert the points from the tool centre point frame to the base frame.

The sensor to TCP frame calibration is given by the following matrix :

0.8001 0.5972 -0.0565 -39.2808

0.5996 -0.7989 0.0474 -14.6918

-0.0168 -0.0718 -0.9973 329.3757

0 0 0 1

Table 4.4 Joint Calibration results (1)



After the coordinates are obtained in the TCP frame, the next conversion is to the base frame

using the following matrix.

0.5490 0.8350 0.0169 177.35

-0.8348 -0.5500 0.0178 227.68

-0.0240 0.004 -0.9996 189.67

0 0 0 1

Table 4.5 Joint Calibration results (2)

Multiplying a point in the sensor frame by the above two matrices converts it into the reference (base) frame. These values are the optimised values obtained using a least-squares approach.

4.2.2 URG SENSOR RESULTS

In order to explore alternatives to the laser scanner, we used a Hokuyo URG laser scanner as well. This scanner scans in angular form. Its error is significantly higher than that of the Micro Epsilon SCANControl laser scanner, which was hence used for all the algorithms and manipulations.

Fig 4.1 Depth profile from the URG sensor: (a) grey scale image, (b) depth profile, (c) fused segmented image

An in-depth analysis of the measurement errors may be found in the following table.

Page 46: B.Tech Thesis

46

S.No | Depth scanner coords (mm) x, y | 3D coords w.r.t camera (mm) x, y, z | Estimated 3D coords w.r.t camera (mm) x, y, z | Error (mm) x, y, z

1 | 265, -209 | 230, -67, 36 | 221, -46, 41 | 9, -21, -5
2 | 279, -57 | 87, 101, 46 | 66, 101, 40 | 21, 0, 6
3 | 285, 51 | -32, -108, 56 | -43, -127, 51 | 11, 19, 5
4 | 269, 119 | -108, -104, 53 | -115, -126, 72 | 7, 22, -19
5 | 307, -49 | 57, -121, 22 | 62, -94, 17 | -5, -27, 5
6 | 326, -114 | 128, 122, 4 | 102, 117, 10 | 26, 5, -6
7 | 325, -98 | 113, -122, 34 | 115, -124, 48 | -2, 2, -14
8 | 334, -58 | 79, -116, 36 | 75, 90, 25 | 4, -26, -11

Table 4.6 URG Sensor results

Due to the high error and coarse least count, this was not an ideal sensor for robotic manipulation.

4.2.3 REPEATABILITY TESTS AND ACCURACY ANALYSIS

The robustness of a system can only be established if it shows high repeatability and good accuracy. This is vital in robotics, where least counts are small and precision is of utmost importance. Repeatability is the variability of the measurements obtained by one operator measuring the same item repeatedly; it is also known as the inherent precision of the measurement equipment. Since a minimum number of scans is ideal, the Hough transform approach proved more robust: it synthesises all of the available data and fits approximate circles to those points, rather than requiring complete, perfect circles.
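The sketch below illustrates the kind of circle Hough voting that makes this possible on a single height slice of the point cloud; the fixed known pellet radius, the grid cell size and the vote threshold are assumptions for illustration, not the exact routine used in this project.

# Illustrative circle-Hough voting for 2D points from one height slice of the
# point cloud, assuming the pellet radius is known; partial arcs still vote
# for the correct centre, which is why occluded pellets can be located.
import numpy as np

def hough_circle_centres(points_xy, radius_mm, cell_mm=2.0, n_theta=90, min_votes=20):
    pts = np.asarray(points_xy, dtype=float)
    theta = np.linspace(0.0, 2.0 * np.pi, n_theta, endpoint=False)

    # Every point votes for all candidate centres lying at distance radius_mm from it.
    cx = (pts[:, 0:1] + radius_mm * np.cos(theta)).ravel()
    cy = (pts[:, 1:2] + radius_mm * np.sin(theta)).ravel()

    # Accumulate the votes on a coarse grid of candidate centres.
    ix = np.round(cx / cell_mm).astype(int)
    iy = np.round(cy / cell_mm).astype(int)
    acc = {}
    for a, b in zip(ix, iy):
        acc[(a, b)] = acc.get((a, b), 0) + 1

    # Return candidate centres with enough supporting votes.
    return [(a * cell_mm, b * cell_mm) for (a, b), v in acc.items() if v >= min_votes]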

Fig 4.2 Hough Transforms detecting circles for complete data as well as for partial data


4.2.4 RESULT ANALYSIS OF DIFFERENT ALGORITHMS

We started this pellet-picking project with the design aspects, speed and accuracy of the robot end effector in mind. Along the way, we devised several algorithms that reduce the number of scans required to model the workspace and subsequently pick a pellet from the stack. This report mainly discusses three algorithms: averaging to find the centre, point cloud analysis, and the Hough transform. The following table gives the experimental results for these algorithms.

Algorithm | Pellets picked per scan (average) | Accuracy in estimation of position (pick-up point) | Accuracy in picking up the pellet using suction gripper | Maximum pellets correctly located from one scan line
Averaging method | 1 | 78% | 76% | 4
Hough transform method | 5 | 96% | 96% | 8


CHAPTER 5

CONCLUSION AND FUTURE SCOPE OF WORK

5.1 SUMMARY

This project mainly dealt with estimating the pose and orientation of pellets by fusing image data and laser scans of the workspace. Once a pellet is located, it is picked out of the cluttered, occluded stack using a suction gripper.

Initially, both sensors required calibration, which was achieved with high accuracy using an optimisation method (gradient descent).

Once the data was available in the base frame, we first designed an algorithm that sorted the topmost points and used averaging and norm calculation to find a pick-up point. The results were accurate only when measurement errors were absent.
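A minimal sketch of this top-point averaging idea is given below; the height band, the distance threshold and the assumption that larger z means higher are illustrative choices, not the project's exact parameters.

# Illustrative top-point averaging: keep points near the top of the stack,
# average them to get a candidate pick-up point, and use the distance (norm)
# from that mean to reject outlying points.  Thresholds below are assumptions.
import numpy as np

def averaging_pick_point(cloud_xyz_mm, top_band_mm=5.0, max_norm_mm=30.0):
    pts = np.asarray(cloud_xyz_mm, dtype=float)
    z_top = pts[:, 2].max()                               # assumes larger z = higher
    top = pts[pts[:, 2] >= z_top - top_band_mm]           # points on the topmost pellet(s)
    centre = top.mean(axis=0)                             # first estimate of the pick-up point
    keep = np.linalg.norm(top - centre, axis=1) <= max_norm_mm
    return top[keep].mean(axis=0)                         # refined pick-up point (x, y, z)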

For a better and more reliable methodology, we introduced the concept of applying a Hough transform to the 3D point cloud data, for both standing and lying ("sleeping") pellets. To achieve this, we quantized the point cloud data into levels on the basis of height. This improved the accuracy to 96%.
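A minimal sketch of the height quantization step (the bin width is an assumed value):

# Illustrative height quantization: group point-cloud points into horizontal
# levels so that circle detection can run slice by slice.  The 5 mm bin width
# is an assumed value, not the one used in the project.
import numpy as np

def quantize_by_height(cloud_xyz_mm, bin_mm=5.0):
    pts = np.asarray(cloud_xyz_mm, dtype=float)
    z = pts[:, 2]
    edges = np.arange(z.min(), z.max() + bin_mm, bin_mm)
    levels = np.digitize(z, edges)                        # level index per point
    return {lvl: pts[levels == lvl] for lvl in np.unique(levels)}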

An algorithm was also derived for finding the pick-up surface on a tilted pellet. A better method has been proposed using the Point Cloud Library (PCL); this is essentially curve fitting on a set of data points.

The least occluded pellet has to be identified, so as to minimise the disturbance to the clutter caused by the movement of the robot end effector. Applying the Hough transform enabled us to correctly find pellet centres from the cloud data. This worked for occluded pellets as well, thereby reducing the number of scans required. The finally proposed PCL-based algorithm is also a reliable way to obtain the cylinder axes, together with the surface for the pick-up.
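One simple way to recover a cylinder axis from the quantized height levels is sketched below: fit a circle centre on each level and then fit a 3D line through the centres. This is an illustrative alternative, not the PCL cylinder-segmentation pipeline itself.

# Illustrative cylinder-axis estimation: fit a circle centre on each height
# level (algebraic least-squares circle fit), then fit a 3D line through the
# centres by SVD.  A simple alternative sketch, not the PCL implementation.
import numpy as np

def fit_circle_centre(xy):
    """Algebraic least-squares circle fit; returns (cx, cy)."""
    x, y = xy[:, 0], xy[:, 1]
    A = np.column_stack([x, y, np.ones_like(x)])
    b = x**2 + y**2
    a, bb, _ = np.linalg.lstsq(A, b, rcond=None)[0]
    return np.array([a / 2.0, bb / 2.0])

def estimate_axis(levels):
    """levels: dict of {level: Nx3 points}, e.g. from quantize_by_height()."""
    centres = []
    for pts in levels.values():
        if len(pts) >= 3:                                 # need at least 3 points per slice
            cx, cy = fit_circle_centre(pts[:, :2])
            centres.append([cx, cy, pts[:, 2].mean()])
    centres = np.asarray(centres)
    mean = centres.mean(axis=0)
    _, _, vt = np.linalg.svd(centres - mean)              # dominant direction = axis
    return mean, vt[0]                                     # point on axis, unit direction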


5.2 CONCLUSION

To conclude, bin picking of identical objects was implemented through the design of a robust algorithm that fuses data from a camera image and a point cloud from a 2D laser scanner. To reduce the number of scans, a Hough transform stage was added, and the results show 96% accuracy in locating and picking up pellets. The novelty of this work lies in its increased speed and in the accurate calibration obtained with optimisation methods. This calibration had not previously been done for an industrial robot, and the existing calibration methods, which have lower accuracy, were devised mainly for sensor fusion in mobile robotics and SLAM. A picking algorithm based on applying the Hough transform to the cloud data was also devised; it gave better results and increased the ratio of pellets picked per scan, thereby speeding up the entire process.

Reducing cycle time is very important in industrial processes, and this is possible only when the number of pellets correctly identified from a single accurate scan is high; otherwise another scan is needed, which takes additional time. Our Hough transform routine addresses this by correctly locating up to 8 pellets per scan line and picking them with 96% accuracy, since the calculated centres were exact. The averaging method is less suitable because it requires a new scan each time a pellet is picked: owing to its statistical nature, the result is a function of every point in the point cloud data.

5.3 FUTURE SCOPE

A complete cylinder fit would be truly accurate only when the axis can be determined from the point cloud. There is considerable future scope in our specific case, because the data may be incomplete and conventional cylinder-fitting approaches that assume complete models may not suffice. Mathematical methods based on planes are being investigated. A method that uses image data for foreground-background segmentation, followed by a voting scheme, may be introduced. Once that is achieved, the idea is to fit parametric equations to the points, or perhaps to use a Gaussian distribution to estimate the surface.


REFERENCES

Journal / Conference Papers

[1] Z. Zhang. "A flexible new technique for camera calibration", IEEE Transactions on

Pattern Analysis and Machine Intelligence, 22(11):1330-1334, 2000

[2] Nadia Payet and Sinisa Todorovic,"From Contours to 3D Object Detection and Pose

Estimation", 2011 IEEE International Conference on Computer Vision

[3] P. Tiwan, Riby A. Boby, Sumantra D. Roy, S. Chaudhury, S. K. Saha, "Cylindrical Pellet Pose Estimation in Clutter using a Single Robot Mounted Camera", ACM, July 04-06, 2013.

[4] Rigas Kouskouridas, Angelos Amanatiadis and Antonios Gasteratos, "Pose Manifolds

for Efficient Visual Servoing", 2012 IEEE.

[5] Hartley, R. and Zisserman, A. Multiple View Geometry in Computer Vision,

Cambridge University Press, Cambridge 2001.

[6] Q. Zhang and R. Pless, “Extrinsic Calibration of a Camera and Laser Range Finder”,

IEEE Intl. Conference on Intelligent Robots and Systems (IROS) 2004.

[7] Zhang, Z., 1998. A flexible new technique for camera calibration. In: IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1330-1334.

[8] Aliakbarpour, H., Nunez, P., Prado, J., Khoshhal, K., Dias, J., 2009. An efficient

algorithm for extrinsic calibration between a 3d laser range finder and a stereo camera

for surveillance. In: 14th International Conference on Advanced Robotics, Munich,

Germany. pp. 1 to 6.

[9] Ming-Yu Liu, Oncel Tuzel, Ashok Veeraraghavan, Yuichi Taguchi, Tim K. Marks and Rama Chellappa, "Fast object localization and pose estimation in heavy clutter for robotic bin picking", The International Journal of Robotics Research, published online 8 May 2012.

[10] Viola P and Jones M (2001) Rapid object detection using a boosted cascade of simple

features. In Proceedings of the IEEE Conference on Computer Vision and Pattern

Recognition, vol. 1, pp. 511–518.

[11] Dalal N and Triggs B (2005) Histograms of oriented gradients for human detection.

In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,

pp. 886–893.

[12] Ashutosh Saxena, Justin Driemeyer and Andrew Y. Ng, "Learning 3-D Object

Orientation from Images", NIPS workshop on Robotic Challenges for Machine

Learning, 2007.

[13] Eric Royer, Maxime Lhuillier, Michel Dhome, and Jean-Marc Lavest. 2007. "Monocular Vision for Mobile Robot Localization and Autonomous Navigation". Int. J. Comput. Vision 74, 3 (September 2007), 237-260. DOI=10.1007/s11263-006-0023-y http://dx.doi.org/10.1007/s11263-006-0023-y

[14] Danica Kragic and Markus Vincze. 2009."Vision for Robotics". Found. Trends Robot

1, 1 (January 2009), 1-78. DOI=10.1561/2300000001

http://dx.doi.org/10.1561/2300000001

[15] Michael J. Tarr and Isabel Gauthier. 1999. "Do viewpoint-dependent mechanisms generalize across members of a class?". In Object Recognition in Man, Monkey, and Machine, MIT Press, Cambridge, MA, USA, 73-110.

[16] S. Belongie, J. Malik, and J. Puzicha. 2002."Shape Matching and Object Recognition

Using Shape Contexts". IEEE Trans. Pattern Anal. Mach. Intell. 24, 4 (April 2002),

509-522. DOI=10.1109/34.993558 http://dx.doi.org/10.1109/34.993558

[17] Guan Pang; Neumann, U., "Training-Based Object Recognition in Cluttered 3D Point

Clouds," 3D Vision - 3DV 2013, 2013 International Conference on , vol., no., pp.87,94,

June 29 2013-July 1 2013 doi: 10.1109/3DV.2013.20

[18] Pochyly, A.; Kubela, T.; Singule, V.; Cihak, P., "3D vision systems for industrial bin-

picking applications," MECHATRONIKA, 2012 15th International Symposium , vol.,

no., pp.1,6, 5-7 Dec. 2012

[19] Zisserman, A.; Forsyth, D.; Mundy, J.; Rothwell, C.; Liu, J. & Pillow, N. (1994)."3D

object recognition using invariance", Technical report, Robotics Research

Group,University of Oxford, UK.

[20] Zerroug, M. & Nevatia, R. (1996)."3-D description based on the analysis of the

invariant and cvasi-invariant properties of some curved-axis generalize dcylinders",

IEEE Trans.Pattern Anal. Mach. Intell., vol. 18, no. 3, pp. 237-253.

[21] Trung-Thien Tran, Van-Toan Cao, Denis Laurendeau. "Extraction of cylinders and estimation of their parameters from point clouds", Computers & Graphics, Feb 2015, Pages 345-357.

[22] Tahir Rabbani, Frank van den Heuvel, "Efficient Hough Transform for Automatic Detection of Cylinders in Point Clouds", ISPRS Workshop "Laser Scanning", September 12-14, 2005.


PROJECT DETAILS

Student Details
Student Name: Susobhit Sen
Register Number: 110921352
Section / Roll No: A / 30
Email Address: [email protected]
Phone No (M): 09971798476

Project Details
Project Title: Robotic Manipulation of Objects Using Vision
Project Duration: 24 weeks
Date of reporting: 18th Jan, 2015

Organization Details
Organization Name: Indian Institute of Technology, Delhi (IIT Delhi)
Full postal address with pin code: Hauz Khas, New Delhi, 110016
Website address: www.iitd.ac.in

Supervisor Details
Supervisor Name: Dr. (Prof) Santanu Chaudhury
Designation: Professor, Electrical Engineering
Full contact address with pin code: Electrical Engineering Department, IIT Delhi, Hauz Khas, New Delhi, 110016
Email address: [email protected]
Phone No (M): 011-26512402

Internal Guide Details
Faculty Name: Mr. P Chenchu Sai Babu
Full contact address with pin code: Dept of Instrumentation and Control Engineering, Manipal Institute of Technology, Manipal – 576 104 (Karnataka State), INDIA
Email address: [email protected]