[IEEE 2011 Annual IEEE India Conference (INDICON) - Hyderabad, India (2011.12.16-2011.12.18)] 2011...




On the discriminative power of different feature subsets for handwritten numeral recognition using the box-partitioning method

Seba Susan, Veerendra Singh

Department of Information Technology, Delhi Technological University,

New Delhi, India

Abstract— This paper seeks an optimal feature set for handwritten numeral recognition using the box-partitioning method. The feature under study is the mean normalized distance measure, which is the most popular descriptor in this regard. However, it is usually used in combination with other descriptors and does not give good classification results when used on its own. The descriptor vector is obtained by partitioning the numeral image into sub-boxes and computing the distance measure from each sub-box taken in order. A series of evaluations is carried out to determine the optimal size and number of sub-boxes by subjecting the resulting feature vectors to a rigorous handwritten numeral classification test using a simple MLP neural network classifier. Our results show that fixing the number of partitions along the horizontal and vertical axes of the image gives better results than the conventional technique of arbitrarily dividing the image into sub-boxes of pre-defined dimensions.

Keywords- Box Partitioning Method, Feature Descriptor, Mean Normalized distance measure, MLP Neural Network.

I. INTRODUCTION

The recognition of handwritten numerals finds wide application in pattern recognition and document analysis, and plays a significant role in the verification of documents in banking applications and in the detection of forgery. Recognition of handwritten numerals by an automated program is a challenging task because every person has a different writing style, and the problem is compounded by the diversity in shapes and sizes of the numerals. This makes the recognition of handwritten numerals an intensive area of research in pattern recognition.

Earlier systems based on optical character recognition (OCR) could recognize only handwritten numerals of fixed size, and 100 percent accuracy is out of reach with these systems. In [1] handwritten numerals are recognized using a fuzzy model with exponential membership functions. The feature extraction method used is the simple and efficient box-partitioning technique, which consists of partitioning each numeral image into a number of sub-boxes and computing a set of features from each sub-box. Some examples of feature-based handwritten numeral recognition techniques are given in [2-4]. Histograms and chain codes are used with neural network and statistical classifiers in [8], while transitions from foreground to background pixels in the vertical and horizontal directions are used as the descriptors for classification in [9]. A feature extraction technique that extracts the orientation from the numeral image and is then coupled with neural network classifiers is given in [5], while [6] discusses an OCR-based approach that relies on local features and combines neural networks and learning vector quantization to increase the recognition rate. In [7] the numeral images are partitioned into homogeneous regions, and the number of vertical, horizontal and diagonal divisions is fed to a classifier to identify the numeral image. Some shape-based techniques of handwritten numeral recognition are outlined in [11, 13], with prominence given to ring and sector shapes. Different classifiers have been used for the classification of handwritten numerals, the popular choices being neural networks [6,9,10] and the nearest neighbor classifier [13]. Robust techniques that tolerate distortion in the shape of the numeral are proposed in [14-17].

In this paper we explore the different possible feature subsets for the box-partitioning or block-based method used in [1,18,19] and determine, through evaluations, the optimal sub-box arrangement for feature extraction in order to achieve a high recognition rate. The box-partitioning approach is a simplified zoning [17] technique in which the numeral is enclosed inside a rectangular box and partitioned into equal-sized sub-boxes, rather than having different partitions for different sections of the image. The mean normalized distance measure is the feature selected for our study due to its relative popularity and higher recognition rate [1,19]. A simple MLP neural network is used as the classifier in our work, and a small but robust database of handwritten numerals is created especially for our experiments.

The paper is organized as follows. Section II describes the mean normalized distance feature for handwritten numeral recognition and the box-partitioning method. Section III describes the feature extraction algorithm used for our experiment. Section IV contains the experimental results and discussion. Finally, the conclusion drawn from the overall results is given in Section V.

II. THE MEAN NORMALIZED DISTANCE FEATURE AND THE BOX PARTITIONING METHOD

The features extracted from the numeral image play an important role in determining the recognition rate and reducing misclassification errors. In this paper we focus on the box-partitioning method [1,18,19] as the feature extraction technique, because it is easy to implement and its pixel-based calculations do not involve any computationally intensive operations, and on the mean normalized distance as the feature under study. An additional advantage of the selected feature is that it is seldom used by itself due to poor results, and it thus provides an ideal case study for the box-partitioning technique.

The box-partitioning or block-based method is described next. The numeral image is binarized with the numeral pixels in black against a white background, and the numeral is then bounded by a box which is normalized to a fixed-size image having an aspect ratio of approximately 1.5. The resulting image is partitioned into a pre-defined number of sub-boxes. Let the total number of numeral pixels inside a sub-box be denoted by L. Then $d_i$ is defined as the Euclidean distance of the numeral pixel i (1 ≤ i ≤ L), having coordinates (k,l), from the origin of the sub-box, having coordinates (r,c), and it is given by

$$d_i = \left((k - r)^2 + (l - c)^2\right)^{1/2} \tag{1}$$

Fig. 1 shows a single sub-box with the numeral pixels i = 1, 2, ..., L outlined in black. The scalar distances $d_1, d_2, \ldots, d_L$ are computed using (1) from the origin O of the sub-box, which is the lower right corner of the sub-box as shown in Fig. 1.

Fig. 1: Measuring the distances $d_1, d_2, \ldots, d_L$ in each sub-box

The mean normalized distance measure d for each sub-box is obtained by computing the mean of the normalized values of $d_i$ given in (1) for i = 1, 2, ..., L, as shown below,

$$d = \frac{1}{L} \sum_{i=1}^{L} \frac{d_i - \min(d_i)}{\max(d_i) - \min(d_i)} \tag{2}$$

where $\max(d_i)$ and $\min(d_i)$ represent the maximum and minimum values of $d_i$ respectively for i = 1, 2, ..., L. The Euclidean distance $d_i$ is computed for every numeral pixel in the sub-box using (1), then normalized and averaged over all L pixels as shown in (2) to generate a single distance measure for each sub-box. The resulting feature vector {d} generated from all the sub-boxes is given as the input to the classifier. The aspect ratio of the numeral image is set to about 1.5, following the work in [1].
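As an illustration of (1) and (2), the following is a minimal Python sketch of the measure for a single sub-box, not the authors' MATLAB implementation. The function name `mean_normalized_distance` and the coordinate-list input are assumptions made for this example. The value 10 for an empty sub-box follows Step 7 of Section III; returning 0.0 when all distances are equal (where (2) would be 0/0) is an assumption, since the paper does not address that case.

```python
import math

EMPTY_VALUE = 10.0  # value the paper assigns to sub-boxes without numeral pixels


def mean_normalized_distance(pixels, origin):
    """Mean of the min-max normalized distances of numeral pixels from origin.

    pixels: list of (k, l) coordinates of the numeral pixels in one sub-box.
    origin: (r, c) coordinates of the sub-box origin.
    """
    if not pixels:
        return EMPTY_VALUE
    r, c = origin
    # Eq. (1): Euclidean distance of every numeral pixel from the origin.
    d = [math.sqrt((k - r) ** 2 + (l - c) ** 2) for k, l in pixels]
    d_min, d_max = min(d), max(d)
    if d_max == d_min:
        # Assumption: Eq. (2) is undefined here (0/0), so return 0.0.
        return 0.0
    # Eq. (2): average of the normalized distances over the L pixels.
    return sum((di - d_min) / (d_max - d_min) for di in d) / len(d)


if __name__ == "__main__":
    # Distances 0, 3 and 4 normalize to 0, 0.75 and 1.
    print(mean_normalized_distance([(0, 0), (0, 3), (4, 0)], origin=(0, 0)))
```

Because of the min-max normalization, the measure always lies in [0, 1] for a non-empty sub-box, which keeps it clearly separated from the empty-box value of 10.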

III. FEATURE EXTRACTION ALGORITHM

The steps for feature extraction using the box-partitioning method are given below.

Step 1. The numeral images are captured by a digital camera.

Fig. 2 Captured image

Step 2. The numeral image is binarized such that black pixels represent the numeral against a white background.

Fig. 3 Binarized image

Step 3. The image is smoothed to connect all unconnected pixels using morphological operations. This ensures that there are no ‘extra’ pixels and the numeral is not broken up at any place along its curvature.

Fig. 4 Smoothened image

Step 4. Crop the numeral from the image by defining a bounding box based on the coordinates of the first and last pixels encountered along the horizontal and vertical directions.

Fig. 5 Cropped image
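The cropping in Step 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the image is taken to be a list of rows in which 1 marks a numeral pixel and 0 marks background (the paper instead draws the numeral in black on white), and the function name `crop_to_bounding_box` is hypothetical.

```python
def crop_to_bounding_box(image):
    """Crop a binary image (list of rows, 1 = numeral pixel) to the numeral.

    The bounding box runs from the first to the last row and column
    that contain at least one numeral pixel.
    """
    rows = [i for i, row in enumerate(image) if any(row)]
    if not rows:
        return []  # no numeral pixels at all
    cols = [j for j in range(len(image[0])) if any(row[j] for row in image)]
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    return [row[left:right + 1] for row in image[top:bottom + 1]]


if __name__ == "__main__":
    image = [
        [0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 1, 0],
        [0, 0, 0, 0],
    ]
    print(crop_to_bounding_box(image))
```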


Step 5. Normalize the cropped image to a pre-defined size of aspect ratio approximating 1.5.

Fig. 6 Normalized image

Step 6. The normalized image is partitioned into a pre-defined number of sub-boxes. Each sub-box contains either some part of the numeral or none of it, as shown in Fig. 7.

Fig 7. The 16 sub-boxes for the ‘0’ shown in Fig. 6

Step 7. Compute the mean normalized Euclidean distance measure d for each sub-box as explained in Section II. The resulting feature vector {d} is applied as input to the MLP classifier. The mean normalized distance measure d for each sub-box in Fig. 7 is shown in Fig. 8 in the same order. The sub-boxes that do not contain any portion of the numeral are assigned the value of ‘10’.

Fig. 8 The mean normalized distance feature computed for the 16 sub-boxes shown in Fig. 7

Step 8. The test inputs are applied after the training phase and the percentage of correct classification is calculated.
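Steps 6 and 7 can be sketched together as follows. This is a hedged illustration rather than the authors' code: the image is assumed to be a list of rows with 1 for numeral pixels, the image dimensions are assumed to be exact multiples of the partition counts, sub-boxes are visited row by row, and the origin is taken as the lower right corner in local (row, column) coordinates. The names `sub_box_feature` and `box_partition_features` are hypothetical, and returning 0.0 when all distances coincide is an assumption.

```python
import math


def sub_box_feature(pixels, origin, empty_value=10.0):
    """Mean normalized distance of the pixels from origin, as in Eqs. (1)-(2)."""
    if not pixels:
        return empty_value  # the paper assigns 10 to empty sub-boxes
    d = [math.hypot(k - origin[0], l - origin[1]) for k, l in pixels]
    lo, hi = min(d), max(d)
    if hi == lo:
        return 0.0  # assumption: Eq. (2) is undefined when all distances match
    return sum((x - lo) / (hi - lo) for x in d) / len(d)


def box_partition_features(image, n_rows, n_cols):
    """Split image into n_rows x n_cols sub-boxes and return one feature each."""
    height, width = len(image), len(image[0])
    if height % n_rows or width % n_cols:
        raise ValueError("image size must be a multiple of the partition counts")
    box_h, box_w = height // n_rows, width // n_cols
    origin = (box_h - 1, box_w - 1)  # lower right corner of each sub-box
    features = []
    for bi in range(n_rows):
        for bj in range(n_cols):
            # Collect numeral pixels in coordinates local to this sub-box.
            pixels = [
                (k, l)
                for k in range(box_h)
                for l in range(box_w)
                if image[bi * box_h + k][bj * box_w + l]
            ]
            features.append(sub_box_feature(pixels, origin))
    return features


if __name__ == "__main__":
    image = [
        [1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 1, 1],
    ]
    print(box_partition_features(image, 2, 2))
```

For a 40x32 normalized image with four partitions in each direction, the same call would produce the 16-dimensional feature vector used in test T2.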

IV. EXPERIMENTAL RESULTS AND DISCUSSION

A comprehensive evaluation is carried out in this work to determine the optimal feature subset for high accuracy in the classification of handwritten numerals. In the literature, the sub-boxes used by convention are square in shape, and their size and number are determined arbitrarily [1]. In this work we aim to determine the optimal arrangement of sub-boxes for high classification accuracy, under the constraint that the numeral image has an aspect ratio (AR) of around 1.5. The experiments are performed on a small but diverse database of handwritten numeral images shown in Fig. 9, captured by a digital camera (resolution of 10 megapixels and 3x optical zoom). The numerals selected for the tests are ‘0’, ‘7’ and ‘8’ due to the structural similarities among them.

Fig. 9 Numeral images for our experiment

The software used for our experiments is MATLAB version 7.9, run on a system with an Intel processor of speed 2.63 GHz. A simple Multi-layer Perceptron (MLP) neural network is used as the classifier for our experiments, the parameters of which are listed in Table 1. The simply structured MLP classifier is used together with the small and diverse database in order to study the worst-case scenario for classification. Out of the ten images for each numeral in Fig. 9, the first three are assigned for training and the remaining seven are the test images, defining a rigorous test condition for grading the different tests performed.

Table 1. Parameters of the MLP classifier

| Parameter | Value |
|---|---|
| Number of hidden layers | 1 |
| Number of output layers | 1 |
| Number of neurons in hidden layer | 2 |
| Number of neurons in output layer | 1 |
| Learning rate (LR) | 0.02 |
| Error threshold (ε) | 0.0001 |
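To make the configuration in Table 1 concrete, here is a small pure-Python sketch of a network with one hidden layer of two neurons and a single output neuron, trained with a learning rate of 0.02 until the error falls below 0.0001. Everything beyond those four numbers is an assumption made for illustration: the sigmoid activations, the squared-error loss, the per-sample weight updates, the weight initialization range, the `max_epochs` cap, and the class name `TinyMLP`. The paper also does not state how the three numerals are encoded on the single output, so the targets here are arbitrary values in [0, 1].

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyMLP:
    """One sigmoid hidden layer and one sigmoid output neuron (assumed design)."""

    def __init__(self, n_inputs, n_hidden=2, seed=0):
        rng = random.Random(seed)
        # Each hidden neuron holds n_inputs weights followed by a bias.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
                         for _ in range(n_hidden)]
        # The output neuron holds n_hidden weights followed by a bias.
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        hidden = [sigmoid(sum(w * v for w, v in zip(ws[:-1], x)) + ws[-1])
                  for ws in self.w_hidden]
        z = sum(w * h for w, h in zip(self.w_out[:-1], hidden)) + self.w_out[-1]
        return hidden, sigmoid(z)

    def mse(self, samples):
        return sum((t - self.forward(x)[1]) ** 2 for x, t in samples) / len(samples)

    def train(self, samples, lr=0.02, error_threshold=1e-4, max_epochs=10000):
        """Per-sample gradient descent; returns the number of epochs run."""
        for epoch in range(1, max_epochs + 1):
            for x, t in samples:
                hidden, out = self.forward(x)
                # Error signal at the output for the loss 0.5 * (t - out)^2.
                delta_out = (t - out) * out * (1.0 - out)
                # Hidden error signals use the output weights before the update.
                delta_hidden = [delta_out * self.w_out[j] * h * (1.0 - h)
                                for j, h in enumerate(hidden)]
                for j, h in enumerate(hidden):
                    self.w_out[j] += lr * delta_out * h
                self.w_out[-1] += lr * delta_out
                for j, ws in enumerate(self.w_hidden):
                    for i, v in enumerate(x):
                        ws[i] += lr * delta_hidden[j] * v
                    ws[-1] += lr * delta_hidden[j]
            if self.mse(samples) < error_threshold:
                return epoch
        return max_epochs
```

In the setting of the paper, the input `x` would be the feature vector {d} of one numeral image, for example 16 values for test T2.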

Table 2 shows the experimental setup for the ten tests labeled T1, T2, T3, T4, T5, T6, T7, T8, T9 and T10. The experimental setup indicates the partitioning plan for the boxed numeral image, with subdivisions along the horizontal (H) and vertical (V) directions marked separately (column 1 of Table 2). The row corresponding to test T4 in Table 2 represents the conventional setup for feature extraction in [1], wherein square sub-boxes of fixed size are used to partition the whole image, leading to an unequal number of boxes (6,4) along the H and V directions. The various tests performed are grouped into three main categories:

1. Increasing the number of sub-boxes and comparing square and rectangular shaped sub-boxes (T1-T4)
2. A hierarchical arrangement of sub-boxes (T5, T6)
3. Increasing the feature dimension from each sub-box by considering multiple origins in each sub-box as shown in Fig. 10 (T7-T10 for the arrangements shown in Fig. 10 (i) to (iv) respectively)
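The third category can be illustrated with a short Python sketch that computes one mean normalized distance per chosen origin and concatenates the results, so that 16 sub-boxes with three origins each give a 48-dimensional vector. Which corners correspond to panels (i) to (iv) of Fig. 10 cannot be recovered from the text, so the corner names below are hypothetical, as is the function name `multi_origin_features`. The sub-box is assumed to be a list of rows with 1 for numeral pixels, and returning 0.0 when all distances coincide is an assumption.

```python
import math


def corner_coordinates(height, width):
    """Local (row, column) coordinates of the four corners of a sub-box."""
    return {
        "upper_left": (0, 0),
        "upper_right": (0, width - 1),
        "lower_left": (height - 1, 0),
        "lower_right": (height - 1, width - 1),
    }


def multi_origin_features(sub_box, corners, empty_value=10.0):
    """Return one mean normalized distance per requested corner origin."""
    height, width = len(sub_box), len(sub_box[0])
    pixels = [(k, l) for k in range(height) for l in range(width) if sub_box[k][l]]
    coords = corner_coordinates(height, width)
    features = []
    for name in corners:
        if not pixels:
            features.append(empty_value)  # empty sub-box, as in Step 7
            continue
        r, c = coords[name]
        d = [math.hypot(k - r, l - c) for k, l in pixels]
        lo, hi = min(d), max(d)
        # Assumption: use 0.0 when all distances are equal (Eq. (2) undefined).
        features.append(0.0 if hi == lo else
                        sum((x - lo) / (hi - lo) for x in d) / len(d))
    return features


if __name__ == "__main__":
    print(multi_origin_features([[1, 1, 0, 1]], ["lower_right", "lower_left"]))
```

The example shows why extra origins add information: the same three pixels give different values when measured from different corners.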

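Values shown in Fig. 8 (mean normalized distance for the 16 sub-boxes of Fig. 7, arranged as a 4x4 grid):

0.5366 0.4815 0.5388 0.4304
0.5173 10 10 0.4968
0.4888 10 10 0.5359
0.4966 0.4374 0.3118 0.3436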


Table 2. The experimental setup for the various tests performed

| Test | Number of sub-divisions, (HxV) | Number of origins | Size of feature vector | Size of sub-box | AR of sub-box | Size of numeral image | AR of numeral image |
|---|---|---|---|---|---|---|---|
| T1 | 9, (3x3) | 1 | 9 | 12x10 | 1.2 | 42x30 | 1.4 |
| T2 | 16, (4x4) | 1 | 16 | 10x8 | 1.25 | 40x32 | 1.25 |
| T3 | 25, (5x5) | 1 | 25 | 8x6 | 1.33 | 40x30 | 1.33 |
| T4 | 24, (6x4) | 1 | 24 | 7x8 | 0.875 | 42x32 | 1.3125 |
| T5 | 1, (1x1) | 1 | 13 | 42x32 | 1.3125 | 42x32 | 1.3125 |
|    | 4, (2x2) |   |    | 21x16 | 1.3125 | 42x32 | 1.3125 |
|    | 9, (3x3) |   |    | 12x10 | 1.2 | 42x30 | 1.4 |
| T6 | 9, (3x3) | 1 | 25 | 12x10 | 1.2 | 42x30 | 1.4 |
|    | 16, (4x4) |   |    | 10x8 | 1.25 | 40x32 | 1.25 |
| T7 | 16, (4x4) | 2 | 32 | 10x8 | 1.25 | 40x32 | 1.25 |
| T8 | 16, (4x4) | 2 | 32 | 10x8 | 1.25 | 40x32 | 1.25 |
| T9 | 16, (4x4) | 2 | 32 | 10x8 | 1.25 | 40x32 | 1.25 |
| T10 | 16, (4x4) | 3 | 48 | 10x8 | 1.25 | 40x32 | 1.25 |

Table 3. The results of evaluation by various tests. Each cell gives A (B), where A = number of test samples (out of 7) correctly classified and B = percentage of correct classification.

| Numeral | T1 | T2 | T3 | T4 | T5 | T6 | T7 | T8 | T9 | T10 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7 (100%) | 6 (85.7%) | 7 (100%) | 7 (100%) | 7 (100%) | 7 (100%) | 7 (100%) | 0 (0%) | 7 (100%) | 7 (100%) |
| 7 | 0 (0%) | 7 (100%) | 0 (0%) | 0 (0%) | 1 (14.2%) | 2 (28.5%) | 6 (85.7%) | 6 (85.7%) | 0 (0%) | 7 (100%) |
| 8 | 6 (85.7%) | 4 (57.1%) | 7 (100%) | 6 (85.7%) | 7 (100%) | 4 (57.1%) | 4 (57.1%) | 5 (71.4%) | 4 (57.1%) | 6 (85.7%) |
| Mean classification rate | 61.9% | 80.93% | 66.67% | 61.9% | 71.4% | 61.86% | 80.93% | 52.36% | 52.36% | 95.23% |

Fig. 10: Multiple corners of the sub-box considered as origins for our experiments: (i) to (iii) two origins per sub-box; (iv) three origins per sub-box

It is observed from the classification results in Table 3 that the number, size and shape of the sub-boxes indeed play a significant role in determining the recognition rate. First, the conventional technique of pre-defining the size of the sub-box irrespective of the layout of the numeral image (test T4) fails to perform under the given rigorous conditions. The results of test T2 show that four partitions of the boxed numeral along each of the horizontal and vertical directions, resulting in a total of 16 sub-boxes, yield the highest classification accuracy among the single-origin tests (80.93%). The recognition rate falls when the number of sub-boxes is increased or decreased from 16 (tests T1, T3). The accuracy is further improved (95.23%) by tripling the feature information from each sub-box by including two more origins (test T10). The sub-boxes in all the tests except T4 are rectangular in shape, owing to the unequal sides of the boxed numeral image (aspect ratio (AR) of approximately 1.5) and the equal number of partitions along the H and V directions. The hierarchical feature vectors fail to classify efficiently, as shown by tests T5 and T6, indicating that mixing sub-box sizes to extract the feature is not a good idea. Increasing the feature dimension from each sub-box using the multiple-origin scheme shown in Fig. 10 gives varying results, with a high recognition rate for three origins (T10) and an origin-specific success rate when two origins are used (T7, T8, T9).

It is concluded from the results that the best classification is obtained when the image is partitioned into 16 rectangular sub-boxes (with the AR of each sub-box approximating the AR of the image), with four partitions each along the H and V directions. While the single-origin 16-dimensional feature vector (T2) gives a satisfactory classification result of 80.93%, the highest classification rate of 95.23% is achieved for the three-origin 48-dimensional feature set (T10) for the 16 sub-box arrangement. Therefore, the setups for tests T2 and T10 given in Table 2 are the most suitable for high-accuracy classification, compared with the conventional technique in T4.
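The mean classification rates in Table 3 can be re-derived from the counts of correctly classified samples with a few lines of Python. The counts below are the A values of Table 3; the dictionary layout and the function name `mean_classification_rate` are assumptions made for this example. The results may differ from the table in the second decimal place (for instance 80.95 against 80.93 for T2), apparently because the table averages per-numeral percentages that were already truncated to one decimal.

```python
# Correctly classified test samples (A in Table 3) for numerals 0, 7 and 8.
CORRECT_COUNTS = {
    "T1": (7, 0, 6), "T2": (6, 7, 4), "T3": (7, 0, 7), "T4": (7, 0, 6),
    "T5": (7, 1, 7), "T6": (7, 2, 4), "T7": (7, 6, 4), "T8": (0, 6, 5),
    "T9": (7, 0, 4), "T10": (7, 7, 6),
}


def mean_classification_rate(counts, samples_per_numeral=7):
    """Average of the per-numeral percentages of correct classification."""
    rates = [100.0 * c / samples_per_numeral for c in counts]
    return sum(rates) / len(rates)


rates = {test: mean_classification_rate(c) for test, c in CORRECT_COUNTS.items()}
best_test = max(rates, key=rates.get)

if __name__ == "__main__":
    for test, rate in rates.items():
        print(f"{test}: {rate:.2f}%")
    print("best:", best_test)
```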

V. CONCLUSION

Handwritten numeral recognition is a difficult problem because of the variation in handwriting styles and the similarities among the numerals. Handwritten numerals are often ambiguous and require an efficient feature extraction technique to reduce the classification error. In this paper we focused on the box-partitioning method, due to its ease of implementation and high efficiency, and on the mean normalized distance as the feature extracted for classification. We have analyzed the feature sets obtained from different types of sub-boxes, and we conclude that the results are best when the sub-boxes have the same aspect ratio as the numeral image and the number of divisions along the horizontal and vertical directions is four each, generating a 16-dimensional (single-origin) or 48-dimensional (three-origin) feature vector from the 16 sub-boxes. A rigorous test was performed on a small but diverse database, and a simple MLP neural network was used as the classifier to study the worst-case scenario. The results indicate that equal partitioning along the horizontal and vertical axes of the image yields better results than the conventional technique of pre-defining the sub-boxes and arbitrarily partitioning the image.

REFERENCES

[1] M. Hanmandlu, O. V. R. Murthy, "Fuzzy model based recognition of handwritten numerals", Pattern Recognition 40 (6) (2007) 1840–1854.

[2] J. Cao, M. Ahmadi, M. Shridhar, "Handwritten numeral recognition with multiple feature and multistage classifier", Pattern Recognition 28 (2) (1995) 153–160.

[3] C. N. Mahender, K. V. Kale, "Structured based Feature Extraction of Handwritten Marathi Word", International Journal of Computer Applications (0975–8887) 16 (6) (2011) 42–47.

[4] C. H. Leung, L. Sze, "Feature Selection in the Recognition of Handwritten Chinese Characters", Eng. Applic. Artif. Intell. 10 (5) (1997) 495–502.

[5] M. Blumenstein, X. Y. Liu, B. Verma, "A modified direction feature for cursive character recognition", IEEE International Joint Conference on Neural Networks, vol. 4, 2007, pp. 2983–2987.

[6] F. Camastra, A. Vinciarelli, "Combining neural and learning vector quantization for cursive character recognition", Neurocomputing 51 (2003) 147–159.

[7] S. Singh, M. Hewitt, "Cursive digit and character recognition on cedar database", International Conference on Pattern Recognition (ICPR 2000), Barcelona, Spain, 2000, pp. 569–572.

[8] F. Kimura, N. Kayahara, Y. Miyake, M. Shridhar, "Machine and human recognition of segmented characters from handwritten words", International Conference on Document Analysis and Recognition (ICDAR '97), Ulm, Germany, 1997, pp. 866–869.

[9] P. D. Gader, M. Mohamed, J.-H. Chiang, "Handwritten word recognition with character and inter-character neural networks", IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 27 (1997) 158–164.

[10] S. B. Cho, "Neural network classifiers for recognizing totally unconstrained handwritten numerals", IEEE Trans. Neural Networks 8 (1) (1997) 43–49.

[11] J. Rocha, T. Pavlidis, "A shape analysis model with applications to a character recognition system", IEEE Trans. Pattern Anal. Mach. Intell. 16 (4) (1994) 393–404.

[12] M. Hanmandlu, K. R. Murali Mohan, H. Kumar, "Neural-based handwritten character recognition", Fifth International Conference on Document Analysis and Recognition, Bangalore, India, 1999, pp. 241–244.

[13] H. Yan, "Handwritten digit recognition using an optimized nearest neighbor classifier", Pattern Recognition Lett. 15 (1994) 207–211.

[14] T. A. Mai, C. Y. Suen, "A generalized knowledge based system for the recognition of unconstrained handwritten numerals", IEEE Trans. Systems Man Cybernet. 20 (4) (1990) 835–848.

[15] C. Y. Suen, C. Nadal, R. Legault, T. A. Mai, L. Lam, "Computer recognition of unconstrained handwritten numerals", Proc. IEEE 80 (7) (1992) 1162–1180.

[16] T. Wakahara, "Toward robust handwritten character recognition", Pattern Recognition Lett. 14 (1993) 345–354.

[17] C. Y. Suen, Z. C. Li, "Analysis and recognition of alphanumeric handprints by parts", IEEE Trans. Systems Man Cybernet. 24 (4) (1994) 614–631.

[18] Majumdar, Chaudhuri, "A MLP classifier for both printed and handwritten Bangla numeral recognition", ICVGIP, 2006, 796-894.

[19] M. Hanmandlu, K. R. M. Mohan, S. Chakraborty, S. Goyal, D. Roy Choudhury, "Unconstrained handwritten character recognition based on fuzzy logic", Pattern Recognition 36 (3) (2003) 603–623.