Android: Shape Classification using OpenCV, JavaCV and SVM


Description

A simple touch-based gesture shape classification on the Android platform using the OpenCV, JavaCV and SVM libraries.


Android: Simple Shape Recognition using OpenCV, JavaCV

Pi19404

April 9, 2013


Contents

Android JavaCV: Simple Shape Recognition using OpenCV, JavaCV
0.1 Introduction
0.2 Recognizing Gesture Shape
0.3 Data Pre Processing
0.4 Gesture Normalization
    0.4.1 Registering candidate
    0.4.2 Rejecting invalid Gesture
    0.4.3 Re-sampling points
    0.4.4 Scaling
    0.4.5 Translation
    0.4.6 Feature Extraction: Histogram of Oriented Gradients
0.5 Classification Task
0.6 Implementation Details
    0.6.1 Implementation details of HOG
    0.6.2 libSVM Files
0.7 Code
Bibliography


Android JavaCV: Simple Shape Recognition using OpenCV, JavaCV

0.1 Introduction

In this article a simple gesture shape recognition technique is explored. The features used to represent the gestures are Histogram of Oriented Gradients (HOG) features, and an SVM classifier is used as a discriminative classifier to classify the gesture shape.

0.2 Recognizing Gesture Shape

In an earlier article, HOG features were used to recognize shapes. The same approach is used here to recognize the gestures. Feature extraction and training are performed on the desktop; the training generates an SVM model file.

This file is copied to the AndroidGesture directory on the mobile device and is used by the SVM prediction code.

The SVM code is available in Java; it is used with slight modifications for the purpose of prediction.

0.3 Data Pre Processing

The data pre-processing steps are the same as those of the $1 gesture recognizer described in the earlier article; only the method to compute the similarity between the candidate and the template was changed there, from the Euclidean path distance to the FastDTW algorithm.

The pre-processing steps are included below.


0.4 Gesture Normalization

The template and the candidate may contain different numbers of sampled points, they may differ in size, and their spatial locations may not be the same, i.e. the template and candidate points would not line up. Hence the first step is to transform the candidate and template points so that they can be compared; this pre-processing step is called gesture normalization.

The aim is to transform the gestures so that the comparison is invariant to differences in position and scale.

0.4.1 Registering candidate

The first step is to capture the candidate gesture; this step is called registering the candidate. As mentioned earlier, the number of points captured depends on the spatial resolution of the device.

The gesture capture process is defined to be in one of three states: start, dragged, released.

The start state indicates that the gesture has started and that any previously stored information should be cleared. The dragged state indicates that a unistroke is being performed without lifting the finger and that the 2D co-ordinates of the gesture are being captured. The released state indicates that the finger has been lifted, the gesture capture process is complete, and the gesture recognition process should start.

The class AndroidDollar defines the Android routines to capture the touch gesture performed by the user.

The class DataCapture provides a high-level interface to capture the data and to initiate gesture recognition. The AndroidDollar class contains an instance of DataCapture whose methods are called based on the touch events detected from the user. The Java class DataVector captures the 2D co-ordinate information of the drawn gesture; the DataCapture class contains an instance of DataVector.
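The following is a minimal sketch, not the actual AndroidDollar source, of how Android MotionEvent actions could be mapped to the start/dragged/released states; the DataCapture method names used here (onStart, onDrag, onRelease) are assumptions for illustration.

import android.view.MotionEvent;
import android.view.View;

// Hypothetical listener mapping touch events to the three gesture states.
public class GestureTouchListener implements View.OnTouchListener {
    private final DataCapture capture; // assumed interface: onStart/onDrag/onRelease

    public GestureTouchListener(DataCapture capture) { this.capture = capture; }

    @Override
    public boolean onTouch(View v, MotionEvent event) {
        switch (event.getAction()) {
            case MotionEvent.ACTION_DOWN:   // "start": clear previous points
                capture.onStart(event.getX(), event.getY());
                return true;
            case MotionEvent.ACTION_MOVE:   // "dragged": accumulate 2D co-ordinates
                capture.onDrag(event.getX(), event.getY());
                return true;
            case MotionEvent.ACTION_UP:     // "released": run gesture recognition
                capture.onRelease(event.getX(), event.getY());
                return true;
        }
        return false;
    }
}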


The class PUtils contains all the methods for gesture pre-processing and recognition. The DataCapture class contains an instance of the PUtils class.

0.4.2 Rejecting invalid Gesture

A simple check is incorporated to determine whether the gesture was intentional, by specifying a path-length criterion. If the path length of the gesture is less than a specified threshold, no further processing is performed and a "no gesture recognized" status is displayed.

The PathLength method defined in the PUtils class simply computes the sum of linear distances between all adjacent points of the captured/template gesture.

public double PathLength(Vector points)
{
    double length = 0;
    // sum of Euclidean distances between successive points
    for (int i = 1; i < points.size(); i++)
    {
        length += Distance((Point) points.elementAt(i - 1), (Point) points.elementAt(i));
    }
    return length;
}

In the present implementation the path-length threshold used is 100. This is implemented by the PathLength method in the PUtils class.

0.4.3 Re-sampling points

Once the gesture has been captured, and before the candidate gesture is compared with the template gesture, some pre-processing operations are performed. Re-sampling is one such operation.

The re-sampling operation selects a fixed subset of points from the provided candidate/template gesture. This ensures that the candidate and template have the same number of points, enabling us to perform a point-based comparison.

The method used for sampling the data points is uniform sampling. The path length is divided by the number of re-sampling intervals (one less than the number of re-sampled points); this gives the interval length I between consecutive re-sampled points.

We start with the initial point, and the next point is selected such that the distance between the points is greater than or equal to the interval length. Let these points be labelled pt1 and pt2.

A linear path is assumed to exist between adjacent sampled points.

Using the simple trigonometric relationships of sin and cos (equivalently, linear interpolation along the segment), we can estimate the location lying between pt1 and pt2 at a distance equal to the uniform path interval.

This new co-ordinate replaces pt2 in the candidate/template co-ordinate array, and the same process is repeated until the last point of the co-ordinate array is reached.

double d = Distance(pt1, pt2);
if ((D + d) >= I)
{
    // interpolate at the remaining distance (I - D) along the segment pt1 -> pt2
    // computation of new x co-ordinate (cos relationship)
    double qx = pt1.x + ((I - D) / d) * (pt2.x - pt1.x);
    // computation of new y co-ordinate (sin relationship)
    double qy = pt1.y + ((I - D) / d) * (pt2.y - pt1.y);
    Point q = new Point(qx, qy);
    // adding the point to the re-sampled array
    dstPts.addElement(q);
    // inserting the point into the source array so it becomes the next pt1
    srcPts.insertElementAt(q, i);
    // resetting the cumulative distance
    D = 0.0;
}
else
{
    // accumulating distance travelled along the path
    D = D + d;
}

This is implemented by the Resample method in the PUtils class.

In the present implementation the number of re-sampled points used is 32.
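For completeness, here is a minimal sketch of how the fragment above typically sits inside a full re-sampling routine (in the style of the $1 recognizer); the names Distance, Point and the Vector-based interface mirror the fragments shown in this article, but the exact PUtils implementation may differ.

public Vector Resample(Vector points, int n)
{
    // uniform interval between consecutive re-sampled points
    double I = PathLength(points) / (n - 1);
    double D = 0.0;
    Vector dstPts = new Vector(n);
    Vector srcPts = new Vector(points);
    dstPts.addElement(srcPts.elementAt(0));
    for (int i = 1; i < srcPts.size(); i++)
    {
        Point pt1 = (Point) srcPts.elementAt(i - 1);
        Point pt2 = (Point) srcPts.elementAt(i);
        double d = Distance(pt1, pt2);
        if ((D + d) >= I)
        {
            double qx = pt1.x + ((I - D) / d) * (pt2.x - pt1.x);
            double qy = pt1.y + ((I - D) / d) * (pt2.y - pt1.y);
            Point q = new Point(qx, qy);
            dstPts.addElement(q);          // q becomes a re-sampled point
            srcPts.insertElementAt(q, i);  // q becomes pt1 on the next iteration
            D = 0.0;
        }
        else
        {
            D += d;
        }
    }
    // rounding errors can leave the last point out; append it if needed
    if (dstPts.size() == n - 1)
        dstPts.addElement(srcPts.elementAt(srcPts.size() - 1));
    return dstPts;
}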


0.4.4 Scaling

The next pre-processing step is to scale the co-ordinates so that their width and height remain within fixed bounds. First the bounding width and height of the current set of points are computed, which are simply max(x) - min(x) and max(y) - min(y). The new width and height are denoted by W and H, and all the points are scaled by the factors W/(max(x) - min(x)) and H/(max(y) - min(y)).

Rectangle B = BoundingBox(points);
Vector newpoints = new Vector(points.size());
for (int i = 0; i < points.size(); i++)
{
    Point p = (Point) points.elementAt(i);
    double qx = p.x * (size / B.Width);
    double qy = p.y * (size1 / B.Height);
    newpoints.addElement(new Point(qx, qy));
}

This step provides invariance with respect to scale, since all gestures are bounded to lie within a rectangle of the same size. This is implemented by the method ScaleToSquare1 in the PUtils class.

In the present implementation the scaling is done so that the bounding box is a square of dimension 250.

The above method scales the width and height independently (non-uniform scaling). Another option is to preserve the aspect ratio for thin, essentially one-dimensional gestures: compute the ratio of the shorter to the longer side of the bounding rectangle; if this ratio is closer to 0 than to 1, perform uniform scaling (which maintains the aspect ratio), otherwise perform the non-uniform scaling described above.

This is implemented by the ScaleDimTo method in the PUtils class.
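A minimal sketch of such an aspect-ratio test, written in the style of the other PUtils snippets in this article; the threshold parameter and the exact signature of the real ScaleDimTo method are assumptions.

// Scale to a square of side `size`, but preserve the aspect ratio for
// thin (nearly one-dimensional) gestures such as straight lines.
public Vector ScaleDimTo(Vector points, double size, double oneDRatioThreshold)
{
    Rectangle B = BoundingBox(points);
    double shortSide = Math.min(B.Width, B.Height);
    double longSide  = Math.max(B.Width, B.Height);
    boolean uniformly = (shortSide / longSide) <= oneDRatioThreshold; // closer to 0 than to 1
    Vector newpoints = new Vector(points.size());
    for (int i = 0; i < points.size(); i++)
    {
        Point p = (Point) points.elementAt(i);
        double qx = uniformly ? p.x * (size / longSide) : p.x * (size / B.Width);
        double qy = uniformly ? p.y * (size / longSide) : p.y * (size / B.Height);
        newpoints.addElement(new Point(qx, qy));
    }
    return newpoints;
}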

0.4.5 Translation

The first step required is the computation of the mean/centroid of the set of co-ordinate locations.

double xsum = 0, ysum = 0;
Enumeration e = points.elements();
while (e.hasMoreElements())
{
    Point p = (Point) e.nextElement();
    xsum += p.x;
    ysum += p.y;
}
return new Point(xsum / points.size(), ysum / points.size());

This is implemented by the method Centroid in class PUtils.

The next step is to translate all the points so that the centroid lies at the origin of the co-ordinate system, i.e. translate every point by (-centroid.x, -centroid.y).

Point c = Centroid(points);
Vector newpoints = new Vector(points.size());
for (int i = 0; i < points.size(); i++)
{
    Point p = (Point) points.elementAt(i);
    double qx = p.x - c.x;
    double qy = p.y - c.y;
    newpoints.addElement(new Point(qx, qy));
}

This is implemented in the method TranslateToOrigin in the PUtils class.

After the completion of gesture normalization, the candidate gesture points are plotted on an IplImage data structure available in JavaCV, and the image is passed to the feature-extraction method.
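A minimal sketch of this plotting step, assuming JavaCV's static OpenCV bindings; the image size, line thickness and the offset used to shift the origin-centred points into the image are illustrative assumptions, not the article's actual drawing code.

import static com.googlecode.javacv.cpp.opencv_core.*;

// Draw the normalized gesture as a polyline on a blank single-channel image.
IplImage plotGesture(Vector points, int width, int height)
{
    IplImage img = cvCreateImage(cvSize(width, height), IPL_DEPTH_8U, 1);
    cvSetZero(img);
    // normalized points are centred at the origin, so shift them to the image centre
    int ox = width / 2, oy = height / 2;
    for (int i = 1; i < points.size(); i++)
    {
        Point p1 = (Point) points.elementAt(i - 1);
        Point p2 = (Point) points.elementAt(i);
        cvLine(img,
               cvPoint((int) p1.x + ox, (int) p1.y + oy),
               cvPoint((int) p2.x + ox, (int) p2.y + oy),
               cvScalar(255, 255, 255, 0), 2, 8, 0);
    }
    return img;
}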

0.4.6 Feature Extraction: Histogram of Oriented Gradients

Histogram of oriented gradient features are used to represent theimage.

As the name suggests, it is a histogram of gradients along different orientation directions. The HOG descriptor has become one of the most popular low-level image representations in computer vision. Local shape information is often well described by the distribution of intensity gradients or edge directions, even without precise information about the location of the edges themselves. Shape information is encoded by HOG, and spatial information is encoded by sliding windows.

The derivatives of the image are computed along the x and y directions. Applying a Cartesian-to-polar transformation, we obtain the magnitude and orientation of the gradient at every point of the image.

We consider 9 orientation directions, i.e. an orientation resolution of 20°. We compute the histogram of oriented gradients that lie along these predefined orientations. This gives a feature vector of length 9 per block.

The image is subdivided into blocks and the HOG is computed over each block. To capture correlation amongst neighbouring blocks, a simple sliding-window technique is used.

To speed up the computation, integral images are used to quickly compute the sum of pixels (i.e. the histogram bin counts) over the windows.

The most basic features are raw pixel features; if we used raw pixels directly we would get a very long feature vector, whereas the HOG representation is relatively compact. Below is an example of HOG feature vectors computed for 9 images. A relatively small training set of 20 samples per class is used for training; the training time required is also very small and training can be performed in real time. The following parameters are used:

1. Image size: 160x120

2. Number of orientations: 12

3. Number of blocks: 3x3 = 9

4. Feature descriptor length: 12x9 = 108

The descriptors are computed for a set of training images, and the feature vector together with the class label is written to a CSV file.
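As a sketch of what that export step can look like when targeting libSVM directly (rather than a plain CSV), under the assumption that each descriptor is a double[] of length 108; the helper method name is illustrative, not part of the original code.

import java.io.FileWriter;
import java.io.IOException;

// Write one "label index:value ..." line per training sample, the format
// expected by svm_scale / svm_train.
static void appendLibSvmSample(FileWriter out, int label, double[] feature)
        throws IOException {
    StringBuilder sb = new StringBuilder();
    sb.append(label);
    for (int i = 0; i < feature.length; i++) {
        sb.append(' ').append(i + 1).append(':').append(feature[i]); // indices are 1-based
    }
    sb.append('\n');
    out.write(sb.toString());
}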

0.5 Classification Task

Given the feature set corresponding to an unknown gesture, we are required to classify it as one of the known classes. We use an SVM as the classification tool. The LibSVM software package is used to train the classifier and to test its accuracy on a small gesture set; we use the Java classes provided by the LibSVM package.

Figure 1: Normalized gesture plots

The file generated earlier is in a format suitable for the LibSVM package.

Each line of the file contains the class label followed by feature index and value pairs. The first task is to perform feature scaling so that all the features lie in a predefined range; this is required to ensure that the feature values along one dimension do not bias the classifier.

Along each dimension the features are scaled to lie in a fixed range. The default range provided by the LibSVM tool is (-1, 1).
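For reference, the min-max mapping that svm_scale applies along each feature dimension i can be written as below, where lower and upper are the target bounds (-1 and 1 by default) and min_i, max_i are taken from the training data; this is standard libSVM behaviour, stated here as a sketch rather than something spelled out in the article.

x'_i = lower + (upper - lower) * (x_i - min_i) / (max_i - min_i)

The same min_i and max_i values (stored in the range file) must be re-used when scaling the test data, which is why the -r option is passed to svm_scale in the commands below.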

1. train.file - input training data filename

2. test.file - input test data filename

3. t1.range - file containing feature scaling parameters

4. t1.scale - output data file name after performing feature scalingon training data file

5. t2.scale - output data file name after performing featurescaling on test data file

6. t1.model - SVM classifier model file


The command to perform feature scaling on the training data is:

java svm_scale -s t1.range train.file > t1.scale

The command to train the classifier is:

java svm_train t1.scale t1.model

The command to feature-scale the test data is:

java svm_scale -r t1.range test.file > t2.scale

The command to perform prediction on the test data is:

java svm_predict t2.scale t1.model test.out

We obtain perfect classification on this small data set. About 300 samples of each class were used for training, and the shapes included in the training stage exhibit high variability.

0.6 Implementation Details

The Java versions of the HOG feature-extraction and SVM code are used, as they provide a simpler interface with Android. In a later article a native C/C++ version of the code will be included.

0.6.1 Implementation details of HOG

The code used for HOG is based on the paper by Ludwig et al., 2009; the MATLAB code provided by Ludwig, 2010 is used with some modifications.

The HOG code is generic and can also be used to represent textured objects.

The first step of HOG is to compute first order gradients along xand y directions.

The code below creates the filters for computing first-order derivatives along the x and y directions.

// 1x3 kernel [-1 0 1] for the x-derivative
hx = cvCreateMat(1, 3, CV_32F);
FloatPointer hxf = hx.data_fl();
hxf.put(0, -1);
hxf.put(1, 0);
hxf.put(2, 1);
// 3x1 kernel for the y-derivative
hy = cvCreateMat(3, 1, CV_32F);
FloatPointer hyf = hy.data_fl();
hyf.put(0, 1);
hyf.put(1, 0);
hyf.put(2, -1);

The code below creates a floating-point representation of the image and computes the first-order derivatives along the x and y directions.

From the derivatives, a Cartesian-to-polar transformation is performed to obtain the gradient magnitude and orientation.

The orientation is normalized to lie between -pi and pi.

cvConvertScale(Im, Im1, 1.0, 0.0);
/** computing gradient along x and y directions **/
cvFilter2D(Im1, grad_xr, hx, cvPoint(1, 0));
cvFilter2D(Im1, grad_yu, hy, cvPoint(-1, -1));
/** cartesian to polar transformation **/
cvCartToPolar(grad_xr, grad_yu, magnitude, orientation, 0);
/** normalization of orientation to the range [-pi, pi] **/
cvSubS(orientation, cvScalar(pi, pi, pi, 0), orientation, null);

If the image is a colour image, the orientation corresponding to the dominant channel is extracted.

cvSplit(orientation, I1, I2, I3, null);
cvSplit(magnitude, I4, I5, I6, null);
FloatBuffer I4b = I4.getFloatBuffer();  // magnitude channels
FloatBuffer I5b = I5.getFloatBuffer();
FloatBuffer I6b = I6.getFloatBuffer();
FloatBuffer I1b = I1.getFloatBuffer();  // orientation channels
FloatBuffer I2b = I2.getFloatBuffer();
FloatBuffer I3b = I3.getFloatBuffer();

int i1 = 0;
while (i1 < I4b.capacity())
{
    float pt1 = I4b.get(i1); float pt2 = I5b.get(i1); float pt3 = I6b.get(i1);
    float max = pt1;
    // keep the magnitude and orientation of the dominant channel
    if (pt2 > max) { max = pt2; I4b.put(i1, pt2); I1b.put(i1, I2b.get(i1)); }
    if (pt3 > max) { I4b.put(i1, pt3); I1b.put(i1, I3b.get(i1)); }
    i1++;
}
cvCopy(I4, magnitude1, null);
cvCopy(I1, orientation1, null);


The next step is to create an IplImage data structure corresponding to each histogram bin, together with an integral image for each bin.

IplImage bins[] = new IplImage[B];
IplImage integrals[] = new IplImage[B];
for (int i = 0; i < B; i++) {
    bins[i] = cvCreateImage(cvGetSize(magnitude1), IPL_DEPTH_32F, 1);
    cvSetZero(bins[i]);
}
for (int i = 0; i < B; i++) {
    integrals[i] = cvCreateImage(cvSize(size.width() + 1, size.height() + 1), IPL_DEPTH_64F, 1);
    cvZero(integrals[i]);
}

Next, the IplImage corresponding to each histogram orientation bin is populated by segmenting the original magnitude image.

Then the integral image representation is computed for each per-bin magnitude image.

float temp_gradient = ptr1b.get(index);   // gradient orientation at this pixel
float temp_magnitude = ptr2b.get(index);  // gradient magnitude at this pixel
// assign the magnitude to the first bin whose upper edge exceeds the orientation
for (int i = 0; i < B; i++)
{
    if (temp_gradient <= -pi + (((i + 1) * 2 * pi) / B)) {
        ptrs[i].put(index, temp_magnitude);
        break;
    }
}

/** compute integral image for each orientation image **/
for (int i = 0; i < B; i++) {
    cvIntegral(bins[i], integrals[i], null, null);
}

As mentioned earlier, the image is divided into blocks and the HOG feature is computed for each block separately.

The calculateHOG_rect method in the HOG3_Fast class performs this operation using the integral image representation.

The inputs to the method are the x, y co-ordinates of the starting point, the width and height of the block, and the integral images.

If A, C, B, D are the corners of the image block in clockwise order (so that A and B are diagonally opposite, as are C and D), then the mean of the magnitudes within this block needs to be computed for each specific orientation image.

The method performs integral computation based on the formula

I = I(A) + I(B) - I(C) - I(D). (1)

This computation is performed for each orientation integral image, which provides 9 values for each block of the image. After the computation, L2 normalization is performed on the values.

This normalizes the histogram values to lie between 0 and 1.

for (int i = 0; i < B; i++) {
    IplImage a1 = integrals[i];
    DoubleBuffer da1 = integrals[i].getDoubleBuffer();
    // corner values of the integral image: a = top-left, b = bottom-right,
    // c = top-right, d = bottom-left
    double a = da1.get((cell.y() + 0) * a1.width() + cell.x());
    double b = da1.get((cell.y() + cell.height()) * a1.width() + cell.x() + cell.width());
    double c = da1.get((cell.y() + 0) * a1.width() + cell.x() + cell.width());
    double d = da1.get((cell.y() + cell.height()) * a1.width() + cell.x() + 0);
    double f = (float) ((a + b) - (c + d));
    hog_cell.put(i, f);
}
// L2 normalization of the block histogram (norm type 4 = CV_L2)
cvNormalize(hog_cell, hog_cell, 1, 0, 4, null);

After obtaining the N-bin descriptor for each block, the descriptors are concatenated; with 9 blocks this gives a final descriptor of length 9N.
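A minimal sketch of that concatenation over a 3x3 grid of blocks, assuming a hypothetical helper calculateHOG_rect(cell, integrals, B) that returns the B-bin normalized histogram of one block; the names and grid layout follow the parameters listed earlier (B = 12, 3x3 blocks, 108-dimensional descriptor), but this is illustrative rather than the actual HOG3_Fast code.

// Divide the image into a 3x3 grid of blocks and concatenate the per-block histograms.
double[] computeDescriptor(IplImage[] integrals, int width, int height, int B)
{
    int blocksX = 3, blocksY = 3;
    int cellW = width / blocksX, cellH = height / blocksY;
    double[] descriptor = new double[B * blocksX * blocksY];
    int k = 0;
    for (int by = 0; by < blocksY; by++) {
        for (int bx = 0; bx < blocksX; bx++) {
            CvRect cell = cvRect(bx * cellW, by * cellH, cellW, cellH);
            double[] hist = calculateHOG_rect(cell, integrals, B); // B-bin block histogram
            System.arraycopy(hist, 0, descriptor, k, B);
            k += B;
        }
    }
    return descriptor;
}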

SVM prediction routines are called to predict the class which isrepresented by the feature vector.

The output of the prediction routine is a class label and a probability.

Even if a gesture not belonging to the defined gesture set is performed, the classifier will still produce a class label and probability. Only gestures predicted with a probability above a certain threshold are considered valid.

For each gesture class a different probability threshold is used, chosen based on the test data.
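A minimal sketch of this rejection step using the libSVM Java API (svm.svm_predict_probability); the per-class threshold array and the feature-to-svm_node conversion are illustrative assumptions, not taken from the article's classifier.java.

import libsvm.*;

// Return the predicted class label, or -1 if the prediction is rejected.
static int predictWithRejection(svm_model model, double[] feature, double[] classThreshold)
{
    svm_node[] x = new svm_node[feature.length];
    for (int i = 0; i < feature.length; i++) {
        x[i] = new svm_node();
        x[i].index = i + 1;       // libSVM feature indices are 1-based
        x[i].value = feature[i];
    }
    int nrClass = svm.svm_get_nr_class(model);
    double[] prob = new double[nrClass];
    int label = (int) svm.svm_predict_probability(model, x, prob);
    // map the predicted label to its position in the model's label list
    int[] labels = new int[nrClass];
    svm.svm_get_labels(model, labels);
    for (int i = 0; i < nrClass; i++) {
        if (labels[i] == label) {
            return prob[i] >= classThreshold[i] ? label : -1;
        }
    }
    return -1;
}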


0.6.2 libSVM Files

The files for SVM prediction and scaling taken from the libSVM package are svm_model.java, svm_node.java, svm_parameter.java, svm_predict.java, svm.java and svm_problem.java.

feature_scale.java is a file written with reference to the libSVM sources; it performs feature scaling before classification is performed.

The classifier class defined in classifier.java provides a high-level interface to perform feature extraction, feature scaling and classification.

The output of the classification routine is a class label representing the shape.

The SVM model and feature-scaling files are placed on the sdcard and are used by the SVM routines.
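A minimal sketch of loading those files at start-up on the device; the sdcard directory and file names are illustrative (the article mentions both an AndroidGesture and an AndroidShapeClassifier directory), and parsing the range file is assumed to be handled by feature_scale.java.

import android.os.Environment;
import java.io.File;
import java.io.IOException;
import libsvm.*;

// Load the trained SVM model shipped to the sdcard; the range file holding the
// feature-scaling parameters would be read by feature_scale.java in the same way.
static svm_model loadModelFromSdcard() throws IOException {
    File dir = new File(Environment.getExternalStorageDirectory(), "AndroidShapeClassifier");
    File modelFile = new File(dir, "t1.model");   // name taken from the training step above
    return svm.svm_load_model(modelFile.getAbsolutePath());
}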

0.7 Code

The code can be found in the code repository at https://github.com/pi19404/m19404/tree/master/Android/AndroidOpenCV or https://code.google.com/p/m19404/source/browse/Android/AndroidOpenCV. The SVM training and scaling files are placed in the svm directory of the repository. Copy these files to the AndroidShapeClassifier directory on the mobile sdcard.



Bibliography

[1] Lisa Anthony and Jacob O. Wobbrock. "A lightweight multistroke recognizer for user interface prototypes". In: Proceedings of Graphics Interface 2010 (GI '10). Ottawa, Ontario, Canada: Canadian Information Processing Society, 2010, pp. 245-252. isbn: 978-1-56881-712-5. url: http://dl.acm.org/citation.cfm?id=1839214.1839258.

[2] Dynamic Time Warping. http://web.science.mq.edu.au/~cassidy/comp449/html/ch11s02.html.

[3] Ricardo Gutierrez-Osuna. "Introduction to Speech Processing". In: CSE@TAMU.

[4] O. Ludwig et al. "Trainable classifier-fusion schemes: An application to pedestrian detection". In: Intelligent Transportation Systems, 2009 (ITSC '09), 12th International IEEE Conference on. 2009, pp. 1-6. doi: 10.1109/ITSC.2009.5309700.

[5] Oswaldo Ludwig. HOG descriptor for Matlab. 2010. url: http://www.mathworks.in/matlabcentral/fileexchange/28689-hog-descriptor-for-matlab.

[6] Stan Salvador and Philip Chan. "Toward accurate dynamic time warping in linear time and space". In: Intelligent Data Analysis, vol. 11, no. 5. Amsterdam, The Netherlands: IOS Press, Oct. 2007, pp. 561-580. url: http://dl.acm.org/citation.cfm?id=1367985.1367993.
