07010206_226_offline Handwritten Character Recognition


Offline Handwritten Character Recognition

A Thesis submitted in partial fulfillment of the requirements for the award of the degree of

Bachelor of Technology

by

ANSHUL GUPTA

(07010206)

MANISHA SRIVASTAVA

(07010226)

Department Of Electronics And Communication Engineering

INDIAN INSTITUTE OF TECHNOLOGY GUWAHATI

ASSAM, INDIA - 781039

April, 2011



Certificate

This is to certify that the work reported in this thesis entitled "Offline Handwritten Character Recognition", in partial fulfilment of the requirements for the award of the Degree of Bachelor of Technology, is submitted by Anshul Gupta (07010206) and Manisha Srivastava (07010226) in the Department of Electronics and Communication Engineering, Indian Institute of Technology Guwahati, under the supervision of Dr. Chitralekha Mahanta, Department of ECE, IIT Guwahati. The matter embodied in this thesis has not been submitted elsewhere for the award of any other degree.

Place:
Date:

(Supervisor Signature)
Dr. Chitralekha Mahanta

Associate Professor

Dept. of Electronics and Communication Engineering
Indian Institute of Technology, Guwahati


Acknowledgment

First and foremost, we would like to take this opportunity to express our deepest and sincere gratitude to our thesis supervisor. We value the freedom she gave us to carry out research in the field of our interest, and we sincerely thank her for it. Her stimulating suggestions and encouragement helped us throughout the research and the writing of this thesis. We are very thankful for her continuous help and support during the entire semester.

Finally, we would like to thank our parents, siblings and friends for their immense love and support throughout our student life.


Abstract

Character Recognition (CR) has been an active area of research, and because of its diverse application environments it continues to be a challenging research topic. In this project, we focus specifically on off-line recognition of handwritten English words. The main approaches to off-line cursive word recognition can be divided into segmentation-based and holistic ones. The holistic approach is used to recognize limited-size vocabularies, where global features extracted from the entire word image are considered. As the size of the vocabulary increases, the complexity of the algorithms increases linearly because of the need for a larger search space and a more complex pattern representation. Additionally, the recognition rates decrease rapidly because the interclass variance in the feature space decreases. Segmentation-based strategies, on the other hand, employ bottom-up approaches, starting from the stroke or character level and working towards producing meaningful text. With the help of a segmentation stage, the problem is reduced to the recognition of simple isolated characters or strokes, which can be handled for an unlimited vocabulary.

Here we adopt segmentation-based character recognition using neural networks. A number of techniques are available for feature extraction and for training CR systems, each with its own strengths and weaknesses. We explore these techniques in order to obtain a good recognition rate.


1 Introduction

It is challenging to develop a practical cursive handwritten CR system that maintains high recognition accuracy independently of the quality of the input documents. Very often, adjacent characters touch or overlap. Therefore, in the segmentation-based strategy, it is essential to segment a given string correctly into its character components. The complexity of character segmentation stems from the wide variety of fonts, rapidly expanding text styles and poor image characteristics. Touched, overlapped, separated and broken characters are the major causes of segmentation errors. In most existing segmentation algorithms, human writing is evaluated empirically to deduce rules. Sometimes the derived rules are satisfactory, but there is no guarantee that they give optimum results for all writing styles. Moreover, human writing varies from person to person, and even for the same person depending on mood, speed, environment, etc. On the other hand, researchers have employed techniques such as artificial neural networks, hidden Markov models and statistical classifiers to extract rules from numerical data.

Another crucial module is a cursive character classifier for scoring individual characters. It has to cope with the high variability of cursive letters and their intrinsic ambiguity (letters such as "e" and "l", or "u" and "n", can have the same shape). The features used to train the neural network classifier also play a very important role: a good feature vector can significantly enhance the performance of a character classifier, whereas a poor one can degrade it considerably.

A generic character recognition system is shown in Figure 1. Its stages are as follows:

Input: Samples are read into the system through a scanner.

Preprocessing: Preprocessing converts the image into a form suitable for subsequent processing and feature extraction.

Segmentation: The most basic step in CR is to segment the input image into individual glyphs. This step separates sentences from the text, and subsequently words and letters from the sentences.

Feature extraction: Extracting the features of a character is a vital part of the recognition process; feature extraction captures the essential details of a character.

Classification: During classification, a character is assigned to the class to which it belongs.

Post Processing: Multiple CR techniques are combined, either in parallel or in series.


Figure 1: System Block Diagram: Off-line Handwritten character Recognition

2 History of character recognition systems

The very first effort in the direction of CR was made by Tyuring, who attempted to develop an aid for the visually handicapped [1]. The first character recognizers appeared around the 1940s. The early works concentrated either on machine-printed text or on a small set of well-separated handwritten text or symbols. Machine-printed CR generally used template matching, while for handwritten text, low-level image processing techniques were applied to the binary image to extract feature vectors, which were then fed to statistical classifiers [2],[3],[4]. A good survey of the CR techniques used until the 1980s can be found in [5]. The period from 1980 to 1990 witnessed growth in CR system development due to the rapid growth of information technology [6],[7],[8]. Structural approaches were introduced in many systems in addition to statistical methods [9],[10]. The syntactic and structural approaches require efficient extraction of primitives [11]. Chan et al. [12] discussed a structural approach for recognizing on-line handwriting. The recognition process starts with a sequence of points from the user and uses these points to extract structural primitives, which include different types of line segments and curves. However, there was an upper limit on the recognition rate, because CR research focused basically on shape recognition techniques without using any semantic information. Historical reviews of CR research and development during 1980-1990 can be found in [13] and [14] for the off-line and on-line cases, respectively.

After 1990, image processing and pattern recognition techniques were combined with artificial intelligence. Along with powerful computers and more accurate electronic equipment such as scanners, cameras and electronic tablets came the efficient use of modern methodologies such as neural networks (NNs), hidden Markov models (HMMs), fuzzy set reasoning and natural language processing. The 1990s systems for machine-printed off-line text [15],[16] and for limited-vocabulary, user-dependent on-line handwritten characters [17],[18] were satisfactory only for restricted applications. Although research on recognizing isolated handwritten characters has been quite successful, recognizing off-line cursive handwriting has proved to be a challenging problem. There is a large corpus of research on the application of character recognition in different domains, but no system to date has achieved the goal of system acceptability.

3 Applications

One application of CR systems is handwritten word recognition. Current research aims at developing constrained systems for limited-domain applications such as postal address reading, check sorting, tax form reading and office automation for text entry. Since the entire word can be used at once, it is possible to exploit correlations between adjacent characters. One way to do this is through contextual knowledge of syntax and a dictionary of possible words, which has been shown to be successful for reading handwritten address information on postmarked mail. Another potential application of CR systems is script recognition. CR systems also find applications in newly emerging areas, such as electronic libraries, multimedia databases and systems that require handwriting data entry.

4 Methodology Used

4.1 Segmentation

Most existing CR systems threshold the gray-level image and normalize the slant angle and baseline skew in the preprocessing stage. They then use the normalized binary image in the segmentation and recognition stages [19, 20, 21]. However, in some cases normalization may severely deform the writing, generating improper character shapes. Furthermore, binarization of the gray-scale document image loses useful information. To avoid the limitations of the binary image, some recent methods use the gray-level image [22]. In that case, however, insignificant details may suppress important shape information. The segmentation method used in this project is similar to that in [23], which employs an analytic approach on the gray-level image, supported by the binary image and a set of global features.

4.1.1 Heuristic Based Segmentation

4.1.1.1 Global Feature Estimation: In this stage, the input image is first binarized using a global threshold. Then the following operations are performed on the binarized image.

4.1.1.1.1 Stroke Width and Height Estimation: Stroke width estimation is a two-scan procedure. The first scan over each row of the binary image builds the stroke width histogram by counting the lengths of the black pixel runs in the horizontal direction. The mean run length, estimated over all rows, is then taken as the upper bound (maximum width) for the run length of the strokes. The second scan over the stroke width histogram discards those strokes whose run length is greater than the maximum width. Finally, the stroke width of the input word image is estimated as the average width of the strokes retained in the second scan. To estimate the stroke height, which is assumed to be the average height of the vertical strokes in the writing, a similar algorithm is used with the scanning procedure applied in the vertical direction. Here a minimum height is estimated instead of a maximum width, and in the second scan the runs whose lengths are smaller than the minimum height are discarded.
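A minimal sketch of the two-scan procedure, assuming a NumPy boolean image in which True marks ink. Run lengths are kept in a list rather than an explicit histogram, which gives the same means. The text does not say how the minimum height is obtained; taking the mean vertical run length, mirroring the width case, is an assumption.

```python
import numpy as np


def runs_along_rows(binary):
    """Lengths of every horizontal run of foreground (True) pixels, row by row."""
    lengths = []
    for row in np.asarray(binary, dtype=bool):
        run = 0
        for pixel in row:
            if pixel:
                run += 1
            elif run:
                lengths.append(run)
                run = 0
        if run:  # run touching the right border
            lengths.append(run)
    return np.array(lengths, dtype=float)


def estimate_stroke_width(binary):
    runs = runs_along_rows(binary)
    if runs.size == 0:
        return 0.0
    max_width = runs.mean()           # first scan: mean run length is the upper bound
    kept = runs[runs <= max_width]    # second scan: drop runs longer than the bound
    return float(kept.mean())


def estimate_stroke_height(binary):
    runs = runs_along_rows(np.asarray(binary, dtype=bool).T)  # vertical runs
    if runs.size == 0:
        return 0.0
    min_height = runs.mean()          # assumption: mean vertical run is the lower bound
    kept = runs[runs >= min_height]   # second scan: drop runs shorter than the bound
    return float(kept.mean())


img = np.zeros((10, 10), dtype=bool)
img[:, 2:5] = True   # vertical stroke, 3 pixels wide
img[0, :] = True     # one long horizontal run that the second scan discards
print(estimate_stroke_width(img), estimate_stroke_height(img))  # 3.0 10.0
```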


4.1.1.1.2 Slant Angle Detection: Slant is the deviation of the strokes from the vertical direction, and it depends on the writing style. In many handwriting recognition studies, slant correction is applied before the segmentation and recognition stages. However, this correction can seriously deform the characters. In [24] no slant correction was applied; instead, the slant angle was used later, in the segmentation stage. For slant angle estimation we have used the method of [25]. It involves rotating the image from −45° to 45°. At each rotation the horizontal projection is taken and its Wigner-Ville distribution (WVD, a joint function of time and frequency) is calculated. The angle that gives the maximum intensity after applying the WVD is taken as the estimated slant angle.
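The WVD-based estimator of [25] is not reproduced here. As a simpler stand-in for experimentation, the sketch below shears the image over the same −45° to 45° range and scores each angle by how peaked the column projection becomes (the sum of squared column counts). The shear, the score and the tie-break toward 0° are all assumptions, not the method of [25].

```python
import numpy as np


def estimate_slant(binary, angles=range(-45, 46)):
    """Return the shear angle in degrees that makes the strokes most upright.

    Positive angles mean strokes lean to the right going upward. Substitute
    criterion: sum of squared column counts, not the WVD of [25].
    """
    img = np.asarray(binary, dtype=bool)
    rows, cols = np.nonzero(img)
    if rows.size == 0:
        return 0
    height = img.shape[0]
    best_angle, best_score = 0, -1.0
    for angle in angles:
        # shift each pixel left in proportion to its height above the bottom row
        shift = (height - 1 - rows) * np.tan(np.radians(angle))
        new_cols = np.round(cols - shift).astype(int)
        counts = np.bincount(new_cols - new_cols.min()).astype(float)
        score = float(np.sum(counts ** 2))  # a peaky projection gives a large score
        # ties go to the angle closest to vertical
        if score > best_score or (score == best_score and abs(angle) < abs(best_angle)):
            best_angle, best_score = angle, score
    return best_angle


slanted = np.zeros((20, 30), dtype=bool)
for r in range(20):
    slanted[r, 5 + (19 - r)] = True   # 45-degree stroke leaning right
upright = np.zeros((20, 30), dtype=bool)
upright[:, 5] = True
```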

4.1.1.1.3 Baseline Extraction: The locations of the upper and lower baselines determine the presence of ascending and descending characters in a given word image. Baseline information is used in segmentation to avoid problems introduced by the ascending and descending portions of the characters. In [24], a new baseline extraction algorithm has been proposed. First, a preliminary centerline for each word image is determined by finding the horizontal line with the highest number of black pixel runs. Then the local minima below the preliminary centerline are identified, eliminating those on the ascending parts. The goal is to find the best fit to the local minima, with a high contribution from normal characters and a low contribution from descending characters. A weight is computed for each minimum by considering the average angle between that minimum and the rest of the minima. This approach assumes, independently of the writing style, that the average angles among the minima of normal characters are relatively small compared with the average angle between a descending minimum and the normal minima. Finally, a line-fitting algorithm is applied to the weighted local minima to obtain the lower baseline. To locate the upper baseline, the local maxima above the lower baseline are identified and their distances from the lower baseline are calculated. Those whose distance is less than the estimated stroke height are pruned. Next, the remaining distances are clustered into two classes, and a line parallel to the lower baseline is drawn through the mean value of the class containing the local maxima with the smaller distances. The center baseline is the parallel line equidistant from the upper and lower baselines.
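A partial sketch of the preliminary centerline and the weighted lower-baseline fit, assuming a NumPy boolean image and a list of already detected local minima as (x, y) pairs with y growing downward. The text does not give the weighting formula, so `1 / (1 + mean angle)` is a hypothetical choice; the weighted fit uses the `w` argument of `numpy.polyfit`.

```python
import numpy as np


def preliminary_centerline(binary):
    """Row index with the largest number of horizontal black runs."""
    img = np.asarray(binary, dtype=int)
    padded = np.pad(img, ((0, 0), (1, 0)))        # leading white column
    run_starts = np.diff(padded, axis=1) == 1      # 0 -> 1 transitions
    return int(np.argmax(run_starts.sum(axis=1)))


def fit_lower_baseline(minima):
    """Weighted line fit y = slope * x + intercept through local minima (x, y)."""
    pts = np.asarray(minima, dtype=float)
    weights = np.empty(len(pts))
    for i, (x0, y0) in enumerate(pts):
        others = np.delete(pts, i, axis=0)
        dx = np.abs(others[:, 0] - x0)
        dy = others[:, 1] - y0
        mean_angle = np.mean(np.abs(np.arctan2(dy, dx)))  # radians from horizontal
        weights[i] = 1.0 / (1.0 + mean_angle)              # hypothetical weighting
    slope, intercept = np.polyfit(pts[:, 0], pts[:, 1], 1, w=weights)
    return slope, intercept


# Four minima on a line at y = 50 plus one descender minimum at (15, 80).
minima = [(0, 50), (10, 50), (15, 80), (20, 50), (30, 50)]
slope, intercept = fit_lower_baseline(minima)
```

The descender receives a lower weight than the normal minima, so the fitted line is pulled toward it less than an unweighted least-squares fit would be.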

4.1.1.2 Determination of Segmentation Regions: The segmentation regions contain the potential segmentation boundaries between connected characters. The first step is to partition each word image into stripes along the slant angle direction, each of which contains a potential segmentation boundary. The rules applied to the binary word image for identifying the segmentation regions are based on the fact that a single maximum above the center baseline indicates a single character or a portion of a character, whereas the region between two adjacent local maxima carries a potential segmentation boundary.

Determination of the segmentation regions in each word image is accomplished in three steps:

Step:1 A straight line is drawn in the slant angle direction from each local maximum to the top of the word image. However, there may be an ascender character on this path, which should be avoided. While going upward in the slant direction, if a contour pixel is hit, this contour is followed until its slope changes to the opposite direction, which marks the end of the character. The direction of contour following is chosen opposite to the position of the local maximum relative to the first contour pixel hit by the slanted straight line. After this, a line is drawn from the maximum to the top of the word image in the slant direction.

Step:2 In this step, a path in the slant direction is drawn from each maximum to the lower baseline. However, the algorithm avoids passing through white pixels by selecting a black pixel whenever one exists in the left or right neighborhood of a white pixel.

Step:3 A process similar to the one in the first step is performed to determine the path from the lower baseline to the bottom of the word image. In this case, the aim is to find a path that does not cut any part of a descending character.

4.1.1.3 Segmentation Path: The segmentation problem can be represented as finding the shortest path from the top row to the bottom row that minimizes the cumulative cost function given in [24],

Cost= 1
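The cost function of [24] is not recoverable from this copy, so the sketch below only illustrates the shortest-path search itself, by dynamic programming over a per-pixel cost array. Using ink darkness (for example 255 minus the gray level, restricted to one segmentation region) as the cost, and letting the path step to one of the three pixels below, are assumptions.

```python
import numpy as np


def shortest_vertical_path(cost):
    """Minimum cumulative-cost path from the top row to the bottom row.

    From column c in one row the path may move to c-1, c or c+1 in the next row.
    Returns (column index for every row, total cost).
    """
    cost = np.asarray(cost, dtype=float)
    h, w = cost.shape
    acc = cost.copy()                     # acc[r, c]: best cost of a path ending at (r, c)
    back = np.zeros((h, w), dtype=int)    # predecessor column in row r - 1
    for r in range(1, h):
        for c in range(w):
            lo, hi = max(c - 1, 0), min(c + 2, w)
            k = lo + int(np.argmin(acc[r - 1, lo:hi]))
            back[r, c] = k
            acc[r, c] += acc[r - 1, k]
    path = [int(np.argmin(acc[-1]))]
    for r in range(h - 1, 0, -1):         # walk the predecessors back to the top
        path.append(int(back[r, path[-1]]))
    path.reverse()
    return path, float(acc[-1].min())


# Hypothetical cost image: 1 on ink, 0 along a diagonal gap between two characters.
cost = np.ones((4, 5))
for r, c in [(0, 1), (1, 2), (2, 3), (3, 3)]:
    cost[r, c] = 0.0
path, total = shortest_vertical_path(cost)
```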



4.1.2 Neural Network Based segmentation

In the word recognition system of [26], heuristic and intelligent methods are used to segment real-world handwritten words.

The gray-level image is converted to a binary image. Slant detection similar to that used in heuristic-based segmentation is employed, and slant correction is then applied. For both the training and testing phases, a heuristic feature detection algorithm is used to locate prospective segmentation points in handwritten words. Each word is inspected to locate characteristics representative of segmentation points.

4.1.2.1 Segmentation using a heuristic algorithm: A simple heuristic segmentation algorithm was implemented that scanned handwritten words for important features to identify valid segmentation points between characters. The algorithm first scanned the word looking for minima or arcs between letters, which are common in handwritten cursive script. For this, a histogram of vertical pixel densities was calculated for each word, by counting the vertical black pixels in each column of the word image. The histogram was examined for minima (low vertical pixel density), which may indicate possible segmentation points in the word. In many cases these arcs are the ideal segmentation points; however, for letters such as "a" and "o" an erroneous segmentation point could be identified. Therefore the algorithm incorporated a hole-seeking component that attempted to prevent invalid segmentation points from being found: if an arc was found, the algorithm checked for a hole to make sure it had not split a letter in half. Finally, the algorithm checked that a segmentation point was not too close to another one. This was done by determining whether the distance between the last segmentation point and the position being checked was equal to or greater than the average character width of the word. If the segmentation point in question was too close to the previous one, segmentation at that position was aborted. Conversely, if the distance between the position being checked and the last segmentation point was greater than the average character width, a segmentation point was forced.
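A rough sketch of this scan, assuming a NumPy boolean word image. The column density is the number of black pixels per column; `max_density` (what counts as a low-density arc) and `force_factor` are hypothetical parameters, and the hole test (two or more separate black runs in the column, as when cutting through the loop of an "o") is a crude stand-in for the hole-seeking component. The text forces a point as soon as the gap exceeds the average character width, which corresponds to `force_factor=1.0`; the default of 1.5 is an assumption that leaves some slack for wide letters.

```python
import numpy as np


def column_runs(column):
    """Number of separate black runs in one image column."""
    col = np.concatenate(([0], np.asarray(column, dtype=int)))
    return int(np.count_nonzero(np.diff(col) == 1))


def heuristic_segmentation_points(binary, avg_char_width, max_density=2, force_factor=1.5):
    img = np.asarray(binary, dtype=bool)
    density = img.sum(axis=0)                  # black pixels per column
    points, last = [], 0
    for x in range(1, img.shape[1] - 1):
        gap = x - last
        is_arc = (density[x] <= max_density
                  and density[x] <= density[x - 1]
                  and density[x] <= density[x + 1])
        crosses_hole = column_runs(img[:, x]) >= 2   # crude hole test
        if is_arc and not crosses_hole and gap >= avg_char_width:
            points.append(x)                   # valid arc far enough from the last cut
            last = x
        elif gap > force_factor * avg_char_width:
            points.append(x)                   # no valid arc in time: force a cut
            last = x
    return points


# Hypothetical word: a filled letter, a ligature, then an "o"-like loop.
word = np.zeros((10, 24), dtype=bool)
word[2:9, 2:7] = True      # filled letter
word[8, 7:10] = True       # ligature along the bottom
word[2, 10:17] = True      # loop: top
word[8, 10:17] = True      # loop: bottom
word[2:9, 10] = True       # loop: left side
word[2:9, 16] = True       # loop: right side
points = heuristic_segmentation_points(word, avg_char_width=6)
```

The ligature at column 7 is accepted, the thin top and bottom of the loop are rejected by the hole test, and the blank column after the loop gives the second point.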

4.1.2.2 Manual Segmentation of the database: Since no database of handwritten words was available to us, we created our own database for training the neural network segmenter. Twenty-six words were chosen that together contain all the upper- and lower-case letters, and 10 different samples of each word were written on paper by different writers. The images were then scanned and preprocessed to create a set of 260 words. Prior to ANN training, the heuristic feature detector was used to segment all words. The segmentation points output by the heuristic feature detector were manually analyzed so that their x coordinates could be categorized into correct and incorrect segmentation point classes. For each segmentation point, a matrix of pixels representing the segmentation area was extracted and stored in an ANN training file. The feature extractor divides the segmentation point matrix into small windows of equal size (5x5) and analyzes the density of black and white pixels. Therefore, instead of presenting the raw pixel values of the segmentation points to the ANN, only the density of each window is presented. For example, if a window contains 6 black pixels, a single value of 0.24 (the number of black pixels divided by 25) is written to the training file to represent that window. Along with each matrix, the desired output was also stored in the training file (0.1 for an incorrect segmentation point and 0.9 for a correct one), ready for ANN training.
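The window-density encoding can be sketched as follows, assuming the segmentation-area matrix is a NumPy array whose height and width are multiples of 5 (the patch size itself is not given above). The two target constants simply restate the 0.1 / 0.9 convention.

```python
import numpy as np

CORRECT_TARGET, INCORRECT_TARGET = 0.9, 0.1   # desired ANN outputs per pattern


def window_densities(patch, win=5):
    """Fraction of black pixels in each non-overlapping win x win window, row-major."""
    patch = np.asarray(patch, dtype=float)
    h, w = patch.shape
    if h % win or w % win:
        raise ValueError("patch size must be a multiple of the window size")
    blocks = patch.reshape(h // win, win, w // win, win)
    return blocks.sum(axis=(1, 3)).ravel() / (win * win)


patch = np.zeros((5, 10))
patch[0, :3] = 1
patch[1, :3] = 1            # 6 black pixels in the first window
features = window_densities(patch)
```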


4.1.2.3 Training of the ANN: For this step, a multi-layer feed-forward neural network trained with the backpropagation algorithm was used. The ANN was presented with the training pairs found in the previous step.

4.1.2.4 Testing phase of the segmentation technique: After ANN training, the words used for testing are also segmented using the heuristic, feature-based algorithm, this time without any manual processing. The segmentation points are automatically extracted and fed into the trained ANN, which verifies which segmentation points are correct and which are incorrect. Finally, after ANN verification, each test word should contain only valid segmentation points.

4.1.3 Results of segmentation:

Figure 2: (A) Neural-network-based segmentation (network: MLP; configuration: a single hidden layer of 90 neurons; training algorithm: traingdx in Matlab); (B) heuristic segmentation.

4.2 Feature Extraction

A compact and characteristic representation of the image is required in CR systems. For this purpose, a set of features is extracted for each class that helps distinguish it from other classes while remaining invariant to intra-class differences [27]. A good survey of feature extraction methods for CR can be found in [22].

The different representation methods can be categorized into three major classes:

1. Global Transformation and Series Expansion: includes Fourier transforms, Gabor transforms, wavelets, moments and the Karhunen-Loève expansion.

2. Statistical Representation: Zoning, Crossing and Distances, Projections.

3. Geometrical and Topological Representation: extracting and counting topological structures, geometrical properties, coding, graphs and trees, etc.

We have used the following three features.

4.2.1 Gradient Features

The method is similar to the one presented in [28].


4.2.1.1 Skeletonisation: Skeletonisation is applied to the binary image. Pixels that do not belong to the backbone of the character are deleted, and broad strokes are reduced to one-pixel-thin lines. This makes stroke thickness uniform across the training and testing data.

4.2.1.2 Normalization and Compression: Since handwriting varies a lot from person to person, after skeletonisation we apply a normalization process that scales each character to 32 x 32 pixels; the normalized character is used as the input to the neural network.
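A minimal normalization sketch, assuming a NumPy boolean character image: it crops to the ink bounding box and rescales to 32 x 32 by nearest-neighbour sampling. The cropping step and the choice not to preserve the aspect ratio are assumptions, since the text does not describe them. Nearest-neighbour downscaling can break one-pixel skeleton lines, so large characters may need a different resampling.

```python
import numpy as np


def normalize_character(binary, size=32):
    """Crop to the ink bounding box and rescale to size x size (nearest neighbour)."""
    img = np.asarray(binary, dtype=bool)
    rows = np.flatnonzero(img.any(axis=1))
    cols = np.flatnonzero(img.any(axis=0))
    if rows.size == 0:                        # blank input
        return np.zeros((size, size), dtype=bool)
    crop = img[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
    h, w = crop.shape
    r_idx = (np.arange(size) * h) // size     # source row for each output row
    c_idx = (np.arange(size) * w) // size     # source column for each output column
    return crop[np.ix_(r_idx, c_idx)]


img = np.zeros((10, 10), dtype=bool)
img[2, 2] = True
img[4, 6] = True
out = normalize_character(img)
```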

4.2.1.3 Gradient Feature Extraction: Each character is normalized to 32 x 32 pixels. The Sobel operator is used to calculate the gradient; it uses two templates to compute the gradient components in the horizontal and vertical directions, respectively. The templates are shown in Figure 3.

Figure 3: Horizontal and vertical templates for the Sobel operator.

The two gradient components at location (i, j) are calculated by:

$$g_v(i,j) = f(i-1,j+1) + 2f(i,j+1) + f(i+1,j+1) - f(i-1,j-1) - 2f(i,j-1) - f(i+1,j-1) \tag{5}$$

$$g_h(i,j) = f(i-1,j-1) + 2f(i-1,j) + f(i-1,j+1) - f(i+1,j-1) - 2f(i+1,j) - f(i+1,j+1) \tag{6}$$

The gradient strength and the direction are calculated as:

$$G(i,j) = \sqrt{g_v^2(i,j) + g_h^2(i,j)} \tag{7}$$

$$\theta(i,j) = \arctan\frac{g_v(i,j)}{g_h(i,j)} \tag{8}$$

In this way the gradient at every pixel of each character is calculated; when the quadrant is resolved from the signs of g_v and g_h, the direction θ lies between 0 and 2π.
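Equations (5) to (8) translate directly into array code. The sketch assumes i indexes rows and j indexes columns, pads the border by edge replication (the text does not say how borders are handled) and uses the four-quadrant `numpy.arctan2`, wrapped into [0, 2π), for the direction.

```python
import numpy as np


def sobel_gradients(f):
    """g_v, g_h, strength G and direction theta following eqs. (5)-(8).

    i indexes rows and j columns; borders use edge replication (an assumption).
    """
    p = np.pad(np.asarray(f, dtype=float), 1, mode="edge")
    H, W = p.shape[0] - 2, p.shape[1] - 2

    def shifted(di, dj):
        # shifted(di, dj)[i, j] == f(i + di, j + dj)
        return p[1 + di:1 + di + H, 1 + dj:1 + dj + W]

    gv = (shifted(-1, 1) + 2 * shifted(0, 1) + shifted(1, 1)
          - shifted(-1, -1) - 2 * shifted(0, -1) - shifted(1, -1))   # eq. (5)
    gh = (shifted(-1, -1) + 2 * shifted(-1, 0) + shifted(-1, 1)
          - shifted(1, -1) - 2 * shifted(1, 0) - shifted(1, 1))      # eq. (6)
    strength = np.hypot(gv, gh)                                        # eq. (7)
    theta = np.mod(np.arctan2(gv, gh), 2 * np.pi)                      # eq. (8), in [0, 2*pi)
    return gv, gh, strength, theta


ramp = np.tile(np.arange(5.0), (5, 1))   # intensity grows along j only
gv, gh, G, theta = sobel_gradients(ramp)
```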


4.2.2 Fourier Descriptor

The method adopted is similar to that of [29], where boundary detection is done first. Once a boundary image is obtained, the Fourier descriptors are found. This involves finding the discrete Fourier coefficients a[k] and b[k] for 0 ≤ k ≤ L − 1, where L is the total number of boundary points found, by applying the equations:

$$a[k] = \frac{1}{L}\sum_{m=1}^{L} x[m]\, e^{-jk\frac{2\pi}{L}m} \tag{9}$$

$$b[k] = \frac{1}{L}\sum_{m=1}^{L} y[m]\, e^{-jk\frac{2\pi}{L}m} \tag{10}$$

where x[m] and y[m] are the x and y coordinates, respectively, of the mth boundary point. As found in [29], descriptors produced using r[k] = |a[k]|² + |b[k]|² are less effective than the moduli of the complex coefficients, |a[k]| and |b[k]|. The values for k = 0 are discarded, as they only contain information about the position of the image. The coefficients for high values of k describe high-frequency features of the image but contain little information about the overall shape of the character, so these high-frequency components are also discarded. Hence only the first five coefficients, from k = 1 to k = 5, are considered.

Once the moduli of the coefficients have been found, the input vector is normalized to 1 to compensate for image scaling. To spread the input data more evenly over the input space, the mean and standard deviation vectors are computed over the whole set of test and training data. The jth component of input vector i_p is then calculated as:

$$i_{pj} = \left(i_{poj} - \mu_{oj}\right)\left[\beta\left(\frac{1}{\sigma_{oj}} - 1\right) + 1\right] \tag{11}$$

where i_poj is the jth component of the original vector (pattern) p, μ_oj is the mean of the jth components of the original vectors and σ_oj is the corresponding standard deviation. The coefficient β linearly controls the degree of standard deviation compensation: if β = 0, there is no compensation for variations of standard deviation between dimensions; if β = 1, the standard deviation of all dimensions is forced to equal 1, giving full standard deviation compensation.
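A sketch of eqs. (9) to (11), assuming the boundary points are already ordered along the contour. The coefficients are computed by a direct sum rather than an FFT. Interpreting "normalized to 1" as unit Euclidean length is an assumption, as is leaving zero-variance dimensions unscaled.

```python
import numpy as np


def fourier_coefficients(x, y):
    """a[k], b[k] for k = 0..L-1 following eqs. (9)-(10), with m = 1..L."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    L = len(x)
    m = np.arange(1, L + 1)
    k = np.arange(L)[:, None]
    basis = np.exp(-1j * k * (2 * np.pi / L) * m)   # L x L matrix of e^{-jk(2pi/L)m}
    return basis @ x / L, basis @ y / L


def modulus_features(x, y, n_coeffs=5):
    """|a[1..n]| and |b[1..n]|, scaled to unit Euclidean length (assumed meaning of 'normalized to 1')."""
    a, b = fourier_coefficients(x, y)
    feats = np.concatenate([np.abs(a[1:n_coeffs + 1]), np.abs(b[1:n_coeffs + 1])])
    return feats / np.linalg.norm(feats)


def standardize(patterns, beta=1.0):
    """Eq. (11): subtract the mean, then scale by beta * (1/sigma - 1) + 1."""
    P = np.asarray(patterns, dtype=float)
    mu = P.mean(axis=0)
    sigma = P.std(axis=0)
    sigma = np.where(sigma > 0, sigma, 1.0)   # leave constant dimensions unscaled
    return (P - mu) * (beta * (1.0 / sigma - 1.0) + 1.0)


# A unit circle sampled at 64 boundary points.
L = 64
t = 2 * np.pi * np.arange(1, L + 1) / L
a, b = fourier_coefficients(np.cos(t), np.sin(t))
```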

4.2.2.1 Fourier angle: It was also noted in [29] that if the moduli alone are not successful in discriminating all the classes, the phase angles of the coefficients can also be incorporated into the training set.


4.2.2.2 Fourier magnitude [30]: The FFT is of little use if one seeks rotation- and shift-invariant descriptors for the characters. Further, it has been observed that only the first few (say 10-15) Fourier coefficients are needed to describe the various characters adequately. Under these conditions there is no computational advantage in using the FFT to evaluate the Fourier coefficients.

The Fourier coefficients derived from eqs. (9) and (10) are not rotation or shift invariant (here, a shift occurs if the starting point of boundary following is arbitrary). To derive a set of Fourier descriptors that are invariant with respect to rotation and shift, the following operations are defined. For each n, a set of invariant descriptors r[n] is computed as:

$$r[n] = |a[n]|^2 + |b[n]|^2 \tag{12}$$

It can be shown that the r[n] are invariant to rotation and shift. A further refinement of the descriptors is obtained by eliminating the dependence of r[n] on the size of the character, computing a new set of descriptors:

$$s[n] = \frac{r[n]}{r[1]} \tag{13}$$

The Fourier coefficients |a[n]|, |b[n]| and the invariant descriptors s[n], n = 2, 3, ..., were derived for all the character specimens and stored in files for use in reconstruction and recognition.
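The invariance claim can be checked numerically. The sketch below (self-contained, so it repeats the coefficient computation of eqs. (9) and (10)) computes s[n] for a synthetic closed contour, then rotates, scales and translates the contour and moves its starting point; the descriptors should not change. The contour and the transformation parameters are arbitrary test values.

```python
import numpy as np


def fourier_coefficients(x, y):
    """a[k], b[k] as in eqs. (9)-(10), with m = 1..L."""
    L = len(x)
    m = np.arange(1, L + 1)
    k = np.arange(L)[:, None]
    basis = np.exp(-1j * k * (2 * np.pi / L) * m)
    return basis @ np.asarray(x, float) / L, basis @ np.asarray(y, float) / L


def invariant_descriptors(x, y, n_max=10):
    a, b = fourier_coefficients(x, y)
    r = np.abs(a) ** 2 + np.abs(b) ** 2   # eq. (12): rotation and start-point invariant
    s = r / r[1]                           # eq. (13): removes dependence on size
    return s[2:n_max + 1]                  # n = 2, 3, ..., n_max


L = 100
t = 2 * np.pi * np.arange(L) / L
x = 3 * np.cos(t) + 0.5 * np.cos(3 * t)   # arbitrary closed contour
y = np.sin(t) + 0.2 * np.sin(2 * t)

c, s_ = np.cos(0.5), np.sin(0.5)
x2 = 2 * (c * x - s_ * y) + 5             # rotate, scale by 2, translate
y2 = 2 * (s_ * x + c * y) - 3
x2, y2 = np.roll(x2, -7), np.roll(y2, -7)  # start boundary following 7 points later
```

Translation only changes the k = 0 coefficients, which s[n] never uses, so it is covered as well.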

4.3 Training of classifiers

4.3.1 Neural Network (NN) based classifiers [31].

A neural network is a massively parallel distributed processor that has a natural propensity for storing experiential knowledge and making it available for use. It resembles the brain in two respects:

1. It acquires knowledge through a learning process.

2. Knowledge is stored in the interconnections between neurons, known as synaptic weights.

Basically, learning is a process by which the free parameters (i.e., synaptic weights and bias levels) of a neural network are adapted through a continuing process of stimulation by the environment in which the network is embedded. The type of learning is determined by the manner in which the parameter changes take place. Broadly, learning can be classified into two types:

1. Supervised Learning: This form of learning assumes the availability of a labeled (i.e., ground-truthed) set of training data made up of N input-output examples.

2. Unsupervised Learning: This form of learning does not assume the availability of a set of training data made up of N input-output examples. Such networks learn to classify input vectors according to how they are grouped spatially, and tune their weights by considering a neighborhood.

In this project we consider the MLP and RBF networks as classifiers based on supervised learning. We have used the Matlab Neural Network Toolbox to implement these networks.


4.3.1.1 Multilayer Perceptron (MLP): This network is a feed-forward network because its structure does not contain any loop. As shown in Figure 4, a multilayer perceptron has an input layer of source nodes and an output layer of neurons (i.e., computation nodes); these two layers connect the network to the outside world. In addition to these two layers, the multilayer perceptron usually has one or more layers of hidden neurons, so called because they are not directly accessible from the input or the output. The hidden neurons extract important features contained in the input data. Each input node is connected to each node of the hidden layer by a synaptic weight. The input to a hidden node is the sum of all input nodes, weighted by the synaptic weights of the connections between the input nodes and that hidden neuron.

Figure 4: MLP structure.

Of the many possible activation functions, we selected the tan-sigmoid, log-sigmoid and pure linear functions.

1. Tan-sigmoid:

$$\mathrm{tansig}(n) = \frac{2}{1 + \exp(-2n)} - 1$$

2. Log-sigmoid:

$$\mathrm{logsig}(n) = \frac{1}{1 + \exp(-n)}$$

3. Pure linear:

$$\mathrm{purelin}(n) = n$$
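The three transfer functions can be written with the same names as the Matlab toolbox functions for readability; this is a NumPy sketch, not the toolbox code. Mathematically `tansig` is identical to `tanh`, which is the numerically safer choice for large |n|.

```python
import numpy as np


def tansig(n):
    """Tan-sigmoid: 2 / (1 + exp(-2n)) - 1, with range (-1, 1)."""
    return 2.0 / (1.0 + np.exp(-2.0 * n)) - 1.0


def logsig(n):
    """Log-sigmoid: 1 / (1 + exp(-n)), with range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-n))


def purelin(n):
    """Pure linear: the identity."""
    return n


n = np.linspace(-3.0, 3.0, 13)
```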

4.3.1.2 Radial Basis Function (RBF NN): The radial basis function NN (RBF NN) is a two-layer network. It falls under the category of feed-forward networks, whose graphs contain no loops. The basic structure of an RBF network is given below:


$$y = \sum_{j=1}^{K} w_j z_j \tag{14}$$

where w_j is the weight associated with the jth radial basis function, centered at μ_j, K is the number of radial basis functions, and z_j = φ(||x − μ_j||). The output y approximates a set of target values.

Figure 6: Complete RBF structure.
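A minimal RBF sketch of eq. (14), assuming Gaussian basis functions φ(d) = exp(−d²/(2·width²)) and output weights solved by linear least squares; both choices are assumptions, since the networks in this work were built with the Matlab toolbox.

```python
import numpy as np


def rbf_design(X, centers, width):
    """z_j = phi(||x - mu_j||) for every sample (rows) and centre (columns)."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return np.exp(-(d ** 2) / (2.0 * width ** 2))   # Gaussian phi (assumed)


def rbf_fit(X, Y, centers, width):
    """Output weights w_j by linear least squares on the hidden-layer outputs."""
    Z = rbf_design(X, centers, width)
    W, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    return W


def rbf_predict(X, centers, width, W):
    """y = sum_j w_j z_j, eq. (14)."""
    return rbf_design(X, centers, width) @ W


X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
Y = np.array([0.0, 1.0, 1.0, 0.0])           # XOR targets
W = rbf_fit(X, Y, centers=X, width=0.5)
```

When every training sample is used as a centre, the network reproduces the training targets exactly; this is one way the over-learning reported for the RBF network in 4.4.1 can arise.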

4.3.1.3 Training of neural network: For training the neural network, we divide the training dataset into two parts:

1. An estimation subset, used for training the model.

2. A validation subset, used for evaluating the model performance.

The network is finally tuned using the entire set of training examples and then tested on the test data. These networks are usually trained by the back-propagation algorithm, which consists of two phases:

1. Forward Phase: In this phase the free parameters of the network are fixed, and the input signal is propagated through the network, layer by layer. At the end of this phase, an error signal is calculated between the predicted output of the network and the actual output corresponding to the input sample presented.

2. Backward Phase: During this phase, the error signal e_i is propagated through the network in the backward direction. It is during this phase that the network weights are adjusted so as to minimize the error e_i in a statistical sense; generally the mean squared error (MSE) criterion is used.
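A compact NumPy sketch of the two phases for an MLP with one tan-sigmoid hidden layer and log-sigmoid outputs, trained by plain batch gradient descent on the MSE, together with a random estimation/validation split. It is not Matlab's `traingdx` (no momentum or adaptive learning rate), and the layer size, learning rate, epoch count and split fraction are hypothetical.

```python
import numpy as np


def split_estimation_validation(X, T, val_fraction=0.25, seed=0):
    """Randomly split the training data into estimation and validation subsets."""
    idx = np.random.default_rng(seed).permutation(len(X))
    n_val = int(round(val_fraction * len(X)))
    val, est = idx[:n_val], idx[n_val:]
    return X[est], T[est], X[val], T[val]


def train_mlp(X, T, hidden=8, lr=0.5, epochs=3000, seed=0):
    """One tansig hidden layer, logsig outputs, batch gradient descent on the MSE."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, T.shape[1]))
    b2 = np.zeros(T.shape[1])
    losses = []
    for _ in range(epochs):
        # forward phase: weights fixed, signal propagated layer by layer
        H = np.tanh(X @ W1 + b1)                      # tansig hidden layer
        Y = 1.0 / (1.0 + np.exp(-(H @ W2 + b2)))      # logsig output layer
        E = T - Y                                     # error signal e_i
        losses.append(float(np.mean(E ** 2)))         # MSE criterion
        # backward phase: propagate the error and adjust the weights
        dY = -2.0 * E / E.size                        # dMSE/dY
        d2 = dY * Y * (1.0 - Y)                       # through logsig
        d1 = (d2 @ W2.T) * (1.0 - H ** 2)             # through tansig
        W2 -= lr * H.T @ d2
        b2 -= lr * d2.sum(axis=0)
        W1 -= lr * X.T @ d1
        b1 -= lr * d1.sum(axis=0)
    return (W1, b1, W2, b2), losses


X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
T = np.array([[0.0], [1.0], [1.0], [0.0]])
params, losses = train_mlp(X, T)
```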


Figure 7: Back propagation Network.

4.3.1.4 Classification using neural networks: In classification problems, the purpose of the network is to assign each input to one of the classes. Each output unit has a continuous activation value between 0.0 and 1.0. To assign a class definitively from the outputs, the network must decide whether the outputs are reasonably close to 0.0 or 1.0; otherwise the class is regarded as undecided. Confidence levels (the accept and reject thresholds) decide how the network outputs are interpreted.
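A sketch of the accept/reject rule, assuming one output unit per class. The threshold values 0.7 and 0.3 are hypothetical, since the text does not give them.

```python
import numpy as np


def decide(outputs, accept=0.7, reject=0.3):
    """Return the winning class index, or None if the outputs are undecided."""
    outputs = np.asarray(outputs, dtype=float)
    winner = int(np.argmax(outputs))
    others = np.delete(outputs, winner)
    # accept only if the winner is close to 1.0 and every other unit is close to 0.0
    if outputs[winner] >= accept and np.all(others <= reject):
        return winner
    return None
```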

4.3.2 Support Vector Machine (SVM) [32]

Support vector machines are based on the concept of decision planes that define decision boundaries. A decision plane is one that separates a set of objects having different class memberships. Most classification tasks, however, are not that simple, and often more complex structures are needed to make an optimal separation, i.e., to correctly classify new objects (test cases) on the basis of the available examples (training cases). For example, in Figure 8 the GREEN and RED objects would require a curve (which is more complex than a line) to separate them.

Figure 8: Complex classification problem.

Support vector machines are particularly suited to handling such tasks. Figure 9 illustrates the basic idea behind support vector machines. The original objects (left side of the schematic) are mapped, i.e., rearranged, using a set of mathematical functions known as kernels. The process of rearranging the objects is known as mapping (transformation). Note that in this new setting the mapped objects (right side of the schematic) are linearly separable; thus, instead of constructing the complex curve (left schematic), all we have to do is find an optimal line that separates the GREEN and RED objects.


Figure 9: Classification using SVM.
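The effect of such a mapping can be checked numerically with a small sketch. It assumes made-up GREEN points inside the unit circle and RED points surrounding them. It also assumes the hypothetical map (x, y) -> (x, y, x^2 + y^2), under which the plane z = 1 separates the two groups. The second part checks the identity behind the kernel trick for the degree-2 polynomial kernel K(u, v) = (u . v)^2: the kernel equals a dot product in a mapped space without that space ever being formed. Neither function is taken from the thesis.

```python
import math


def lift(p):
    """Hypothetical explicit map (x, y) -> (x, y, x^2 + y^2)."""
    x, y = p
    return (x, y, x * x + y * y)


def separated_by_plane(p, level=1.0):
    """Linear rule in the lifted space: the plane z = level splits the classes."""
    return "GREEN" if lift(p)[2] < level else "RED"


def poly2_kernel(u, v):
    """Degree-2 homogeneous polynomial kernel K(u, v) = (u . v)^2."""
    return (u[0] * v[0] + u[1] * v[1]) ** 2


def phi(u):
    """Explicit feature map whose dot product reproduces poly2_kernel."""
    x, y = u
    return (x * x, math.sqrt(2.0) * x * y, y * y)


# Made-up data: GREEN inside the unit circle, RED surrounding it, so no
# straight line separates them in the original 2-D plane.
green = [(0.2, 0.1), (-0.3, 0.4), (0.0, -0.5), (0.5, 0.5)]
red = [(1.5, 0.0), (-1.2, 1.1), (0.3, -1.6), (-1.0, -1.0)]

u, v = (1.0, 2.0), (3.0, -1.0)
k_direct = poly2_kernel(u, v)                          # kernel in the input space
k_mapped = sum(a * b for a, b in zip(phi(u), phi(v)))  # dot product after mapping
```

In a kernel SVM only values such as `k_direct` are needed. This is why the mapped space never has to be constructed explicitly.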

A Support Vector Machine (SVM) is primarily a classifier that performs classification tasks by constructing hyperplanes in a multidimensional space that separate cases of different class labels.
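To make the hyperplane idea concrete, here is a minimal sketch of a two-class linear SVM trained with the Pegasos stochastic sub-gradient method on the regularised hinge loss. This is a swapped-in textbook training method, not the SVM implementation used in this work. For simplicity the bias is treated as an extra constant feature and regularised along with the weights. The toy data, `lam` and the epoch count are assumptions.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos: stochastic sub-gradient descent on
    lam/2 * ||w||^2 + mean(max(0, 1 - y_i * <w, x_i>)).

    X: list of feature vectors; y: labels in {-1, +1}.
    The bias is the last weight, paired with a constant feature 1.0.
    """
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)                      # decreasing step size
            xi = list(X[i]) + [1.0]
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1.0 - eta * lam) * wj for wj in w]   # shrink: regulariser gradient
            if margin < 1.0:                           # hinge loss is active
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, xi)]
    return w


def svm_predict(w, x):
    """Side of the hyperplane <w, (x, 1)> = 0 on which x falls."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0.0 else -1


X = [[2, 2], [3, 3], [2, 3], [3, 2], [-2, -2], [-3, -3], [-2, -3], [-3, -2]]
y = [1, 1, 1, 1, -1, -1, -1, -1]
w = train_linear_svm(X, y)
```

Multi-class problems such as the 52 character classes here are usually handled by combining several such binary machines (for example one-versus-one or one-versus-rest). This sketch shows only the binary case.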

4.4 Testing results of MLP, RBF and SVM on the extracted features

Fourier features with phase, ||a(k)|| and ||b(k)|| are used for the comparison of classifiers.

4.4.1 Performance of neural network classifiers:

Figure 10: (A) MLP with structure [80 (first hidden layer), 50 (second hidden layer), 50 (third hidden layer)]. Algorithm used: gradient descent with momentum (traingdx in MATLAB). Learning rate: adaptive, with initial value 0.2. Momentum: 0.9. Results are very poor on the training set. (B) RBF network: results are good on the training data, but over-learning is high, hence the results on the test data are poor.
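The training rule named in Figure 10 can be sketched on a one-dimensional quadratic. This is a minimal illustration loosely modelled on the rule MATLAB documents for traingdx:

- A momentum step is taken.
- A step that raises the error by more than a factor `max_perf_inc` is discarded, and the learning rate is multiplied by `lr_dec`.
- Whenever the error falls, the learning rate is multiplied by `lr_inc`.

The initial learning rate 0.2 and momentum 0.9 come from the caption. The factors 1.05, 0.7 and 1.04, and resetting the momentum after a rejected step, are assumptions of this sketch rather than MATLAB's actual code.

```python
def gd_momentum_adaptive(loss, grad, w, lr=0.2, mc=0.9, lr_inc=1.05,
                         lr_dec=0.7, max_perf_inc=1.04, epochs=200):
    """Minimise a scalar loss with momentum and an adaptive learning rate."""
    dw = 0.0                      # previous weight change (momentum term)
    perf = loss(w)
    for _ in range(epochs):
        step = mc * dw - lr * grad(w)
        new_w = w + step
        new_perf = loss(new_w)
        if new_perf > perf * max_perf_inc:
            # Error grew too much: discard the step and shrink the learning rate.
            lr *= lr_dec
            dw = 0.0              # assumption: restart momentum after a rejection
        else:
            if new_perf < perf:
                lr *= lr_inc      # error fell: try a larger learning rate
            w, dw, perf = new_w, step, new_perf
    return w, perf


# Toy problem: minimise (w - 3)^2 starting from w = 0.
w_opt, final_loss = gd_momentum_adaptive(lambda w: (w - 3.0) ** 2,
                                         lambda w: 2.0 * (w - 3.0), 0.0)
```

With momentum 0.9 the iterate overshoots the minimum several times. The rejection rule is what keeps those overshoots from growing.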

4.4.2 Performance of Support Vector Machine classifiers:

For the SVM, the recognition rate on the training data is 98.86%, and the learning is close to optimal. The recognition rate on the test data is 62.93%. On the test data, the SVM outperforms the other two networks.

4.4.3 Comparison between all four feature vectors with SVM:

We now have to pick the best feature extraction technique for our system. For that, we tested the SVM with different feature vectors. The table below shows the recognition rate (%) on the training data for all four feature vectors.

Feature vector: recognition rate
Fourier with magnitude s(k), ||a(k)|| and ||b(k)||: 86.66%
Fourier with phase, ||a(k)|| and ||b(k)||: 98.74%
Fourier with magnitude s(k), ||a(k)||, ||b(k)|| and phase: 98.04%
Gradient features: 40.50%

4.5 Post-processing: Combining the CR techniques

Fusion is a powerful method for improving the recognition rates produced by various techniques. It takes advantage of the different errors produced by different techniques, emphasizing the strengths and avoiding the weaknesses of the individual techniques. Researchers have found that in many real-world applications it is better to fuse multiple techniques to improve the results. Fusion can be done in the following two ways:

Serial Architecture: In this method, the output of one classifier is fed into the next classifier. Four basic methodologies are used, viz. sequential, selective [33], boosting [34] and cascade [35].

Parallel Architecture: This method combines the results of more than one independent algorithm by using one of the following methodologies: voting, Bayesian [36], Dempster-Shafer theory [37], behavior-knowledge space [38], mixture of experts [39] and stacked generalization.
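Voting, the simplest of these parallel rules, can be sketched as plurality voting over the class labels returned by independent classifiers. Treating a tie for first place as undecided is an assumption of this sketch. Unlike the Borda count on which this system's fusion is based, voting ignores the order and confidence of each classifier's alternative answers.

```python
from collections import Counter


def plurality_vote(labels):
    """Label returned by most classifiers, or None when the top count is tied."""
    ranked = Counter(labels).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]
```

For example, `plurality_vote(["a", "a", "b"])` returns `"a"`. Three classifiers that all disagree produce no decision.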

Here we use a method based on the Borda count, inspired by [40], to combine the results of the following techniques:

Technique 1: SVM on the moduli of Fourier coefficients ||a(k)|| and ||b(k)|| and the magnitude s(k).

Technique 2: SVM on the moduli of Fourier coefficients ||a(k)|| and ||b(k)|| and the phase.

Technique 3: SVM on the moduli of Fourier coefficients ||a(k)|| and ||b(k)||, the phase and the magnitude s(k).

4.5.1 Conventional Borda Count

The conventional Borda count for a string in the lexicon is the sum, over the ranked lexicons produced by the various techniques, of the number of strings ranked below that string [40].
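A minimal sketch of the conventional Borda count follows. It assumes each technique returns a ranked list (best first) of lexicon strings, and that a string absent from a list gets nothing from that list. The example strings other than "Rolled" are made up for illustration.

```python
def conventional_borda(ranked_lists):
    """Sum, over all techniques, of the number of strings ranked below each string."""
    scores = {}
    for ranking in ranked_lists:
        n = len(ranking)
        for position, word in enumerate(ranking):   # position 0 = best
            scores[word] = scores.get(word, 0) + (n - 1 - position)
    return scores


rankings = [
    ["Rolled", "Roller", "Ruled"],   # technique 1
    ["Roller", "Rolled", "Ruled"],   # technique 2
    ["Rolled", "Ruled", "Roller"],   # technique 3
]
scores = conventional_borda(rankings)
best = max(scores, key=scores.get)
```

Here "Rolled" collects 2 + 1 + 2 = 5 points and wins, even though technique 2 ranked it second.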

4.5.2 Modified Borda Count

Instead of counting the number of strings below the string to be recognized, a rank is assigned and used in the calculation of the Borda count. The rank for a particular string can be calculated using the following formula:

$$\text{Rank} = 1 - \frac{\text{position of the string in the top } N \text{ strings}}{N}$$


The rank is 0 if the string is not in the top N choices. We have taken N = 3, so only the top three words from each technique are considered when calculating the rank. Secondly, the confidence values produced by the different techniques are considered. The confidence value for all three predicted words of a given technique is the confidence that the classifier has in its predicted string, even if that string is not a valid lexicon word. This can be estimated by summing the scores of the predicted characters. This is reasonable because the top three strings are chosen based on their similarity with the predicted string. The similarity between the predicted string and the lexicon words is found from the number of matching characters and their relative positions.
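The candidate selection and confidence estimate can be sketched as follows. The thesis measures similarity from the number of matching characters and their relative positions but does not give the exact measure. This sketch therefore swaps in Python's `difflib.get_close_matches`, which ranks candidates by `SequenceMatcher.ratio`, as a stand-in. Matching is done case-insensitively by assumption. The per-character scores passed to `string_confidence` are hypothetical classifier outputs.

```python
from difflib import get_close_matches


def top_candidates(predicted, lexicon, n=3):
    """The n lexicon words most similar to the predicted string (best first)."""
    lowered = {word.lower(): word for word in lexicon}
    matches = get_close_matches(predicted.lower(), list(lowered), n=n, cutoff=0.0)
    return [lowered[m] for m in matches]


def string_confidence(char_scores):
    """Confidence in a predicted string: the sum of its per-character scores."""
    return sum(char_scores)


lexicon = ["Moderated", "Puzzle", "Rolled", "Climate"]
candidates = top_candidates("PzZzfe", lexicon)   # one technique's output for Puzzle
```

With this stand-in measure the garbled output PzZzfe still puts Puzzle first, because the two strings share "p", "zz" and "e" in order.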

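The final Borda count of a lexicon word is then

$$\text{Borda count} = (\text{rank} \times \text{confidence})_{\text{tech1}} + (\text{rank} \times \text{confidence})_{\text{tech2}} + (\text{rank} \times \text{confidence})_{\text{tech3}}$$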
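The rank formula and the confidences can be combined in a minimal sketch of the modified Borda count. It assumes positions are counted from 0. With N = 3, the three candidates of a technique then get ranks 1, 2/3 and 1/3, and any other word gets 0. Counting from 1 is another possible reading of the formula. The usage lines reuse the Puzzle figures reported in Section 6. Only each technique's top candidate is kept, because the lower-ranked candidates are not reported there.

```python
def modified_borda(results, n=3):
    """results: one (top_words, confidence) pair per technique, best word first."""
    scores = {}
    for words, confidence in results:
        for position, word in enumerate(words[:n]):
            rank = 1.0 - position / n          # 1, 2/3, 1/3 for N = 3
            scores[word] = scores.get(word, 0.0) + rank * confidence
    return scores


# Top candidates and confidences reported for the word Puzzle in Section 6.
results = [(["Puzzle"], 1.2), (["Puzzle"], 1.2), (["Climate"], 2.17)]
scores = modified_borda(results)
best = max(scores, key=scores.get)
```

Puzzle scores 1.2 + 1.2 = 2.4, which beats the single higher-confidence vote for Climate (2.17).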


5 Final Results

Figure 11: Result on Moderated.


Figure 12: Result on Puzzle.


Figure 13: Result on Rolled.

6 Discussions

In the case of Moderated, the neural network segmentor failed to segment "te". This is expected because, owing to the way this pair of characters was written, the segmentor treated them as a hole. The outputs of the three techniques are MOrerlmd, MOGeraED and MOrerlmd, which have very little similarity with the word Moderated. This error is due to the low discriminative ability of the Fourier features and their combinations in our case, where they have to distinguish 52 different classes. The error is corrected in the post-processing step, where the Borda count combined over all three parallel techniques is highest for the lexicon word Moderated. Hence, the system outputs the correct word, Moderated.

In the case of Puzzle, u is incorrectly segmented into two parts. The outputs of the three techniques are PzZzfe, PCZZfc and PsZzme, which have very little similarity with the word Puzzle. Again, this is due to the low discriminative ability of the Fourier features. Here the output of two techniques is Puzzle, with confidence 1.2 each, while the third technique predicted Climate with confidence 2.17. This error is corrected when the Borda counts of all three techniques are combined, giving the highest score to the word Puzzle. Hence, the system outputs the correct word, Puzzle.

In the case of Rolled, the segmentation is perfect, but the outputs of the three techniques are quite different from the word Rolled. Combining the results of the three parallel techniques, however, gives the highest score to the word Rolled, so the system outputs the correct word.

7 Conclusions

We thus conclude that the proposed system gives fairly good results on the test samples that were presented to it. We could not report the recognition accuracy as a percentage because we did not have enough test samples. We tested both heuristic and neural network based segmentation and found that the latter gave better results. This is reasonable because the heuristic algorithm is based on rules that are deduced empirically. There is no guarantee that they give optimum results for different styles of writing, so validating them with a neural network becomes essential.

Moreover, our character recognition network has 52 output classes, whereas most of the literature uses separate classifiers for upper- and lower-case characters. We tested different neural networks that have been used in the past for character recognition. We tried different MLP configurations with up to 3 hidden layers. The best results were obtained with the [80 50 50] configuration, with a validation performance of 0.01 in 640 epochs. The training algorithm used in this case was gradient descent with momentum (traingdx in MATLAB). We also tested an RBF neural network and obtained a performance (MSE) of 0.0010155 in 1800 epochs; this network suffered from over-learning and gave poor results on the test data. Apart from neural networks, we tried a Support Vector Machine classifier on the same feature set. It achieved 98% classification accuracy on the training data set and 62.93% on the test data set. Finally, we selected the SVM as it outperformed the MLP and RBF networks.

For feature extraction we started with gradient features, which in our case produced very poor results. We then tried Fourier features such as the moduli of the Fourier coefficients, the magnitude, the phase and their various combinations as feature vectors. We obtained the best results with the moduli of the Fourier coefficients and the phase, with a recognition accuracy of 98.74% on the training data set. We have used three combinations of Fourier descriptors in our final system.

Post-processing that uses a lexicon is imperative, as there is no other way to find the errors that have crept in at any of the previous stages. The only way to do so is to verify whether the predicted word is a valid lexicon word. Incorporating this into our final system using the Borda count thus improved the overall efficiency of the system.

8 Future work

The performance of neural network based segmentation can be improved by using a larger database. More research can be done to develop a better feature vector that incorporates transform-based, statistical and directional features for character recognition. SVM outperformed the other classifiers in character classification because it performs classification by constructing hyperplanes in a multidimensional space that separate samples of different class labels. Other recently developed techniques, such as Dempster-Shafer theory, could be used to combine the different CR techniques. Even within the Borda count, other techniques could be explored that give different confidences to each predicted lexicon word for a given classifier. Experiments can also be done to give different weights to each of the parallel CR techniques according to their performance on the validation data.

References

[1] J. Mantas, "An overview of character recognition methodologies," Pattern Recognition, vol. 19, no. 6, pp. 425-430, 1986.

[2] T. S. El-Sheikh and R. M. Guindi, "Computer recognition of Arabic cursive scripts," Pattern Recognition, vol. 21, no. 4, pp. 293-302, 1988.

[3] S. Mori, K. Yamamoto, and M. Yasuda, "Research on machine recognition of handprinted characters," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-6, no. 4, pp. 386-405, 1984.

[4] C. Suen, M. Berthod, and S. Mori, "Automatic recognition of handprinted characters: the state of the art," Proceedings of the IEEE, vol. 68, no. 4, pp. 469-487, 1980.

[5] C. Tappert, C. Suen, and T. Wakahara, "The state of the art in online handwriting recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, pp. 787-808, Aug. 1990.

[6] R. Bozinovic and S. Srihari, "Off-line cursive script word recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11, pp. 68-83, Jan. 1989.

[7] V. Govindan and A. Shivaprasad, "Character recognition: a review," Pattern Recognition, vol. 23, no. 7, pp. 671-683, 1990.

[8] Q. Tian, P. Zhang, T. Alexander, and Y. Kim, "Survey: omnifont-printed character recognition," in Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series (K.-H. Tzou and T. Koga, eds.), vol. 1606 of Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, pp. 260-268, Nov. 1991.

[9] A. Belaid and J.-P. Haton, "A syntactic approach for handwritten mathematical formula recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-6, no. 1, pp. 105-111, 1984.

[10] Y. Ding, F. Kimura, Y. Miyake, and M. Shridhar, "Evaluation and improvement of slant estimation for handwritten words," in Proceedings of the Fifth International Conference on Document Analysis and Recognition (ICDAR '99), pp. 753-756, Sept. 1999.

[11] S. M. Lucas, E. Vidal, A. Amiri, S. Hanlon, and J.-C. Amengual, "A comparison of syntactic and statistical techniques for off-line OCR," in Proceedings of the Second International Colloquium on Grammatical Inference and Applications, (London, UK), pp. 168-179, Springer-Verlag, 1994.

[12] K.-F. Chan and D.-Y. Yeung, "Recognizing on-line handwritten alphanumeric characters through flexible structural matching," 1999.

[13] S. Mori, C. Suen, and K. Yamamoto, "Historical review of OCR research and development," Proceedings of the IEEE, vol. 80, pp. 1029-1058, July 1992.

[14] C. Tappert, C. Suen, and T. Wakahara, "The state of the art in online handwriting recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, pp. 787-808, Aug. 1990.

[15] H. Avi-Itzhak, T. Diep, and H. Garland, "High accuracy optical character recognition using neural networks with centroid dithering," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 17, pp. 218-224, Feb. 1995.

[16] I. Bazzi, R. Schwartz, and J. Makhoul, "An omnifont open-vocabulary OCR system for English and Arabic," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, pp. 495-504, June 1999.

[17] J. Hu, S. G. Lim, and M. K. Brown, "Writer independent on-line handwriting recognition using an HMM approach," Pattern Recognition, vol. 33, no. 1, pp. 133-147, 2000.

[18] A. Meyer, "Pen computing: a technology overview and a vision," SIGCHI Bulletin, vol. 27, pp. 46-90, July 1995.

[19] G. Kim and V. Govindaraju, "A lexicon driven approach to handwritten word recognition for real-time applications," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, pp. 366-379, Apr. 1997.

[20] M. Mohamed and P. Gader, "Handwritten word recognition using segmentation-free hidden Markov modeling and segmentation-based dynamic programming techniques," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, pp. 548-554, May 1996.


[21] A. A. Atici and F. T. Yarman-Vural, "A heuristic algorithm for optical character recognition of Arabic script," Signal Processing, vol. 62, no. 1, pp. 87-99, 1997.

[22] Øivind Due Trier, A. K. Jain, and T. Taxt, "Feature extraction methods for character recognition: a survey," Pattern Recognition, vol. 29, no. 4, pp. 641-662, 1996.

[23] S.-W. Lee, D.-J. Lee, and H.-S. Park, "A new methodology for gray-scale character segmentation and recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, pp. 1045-1050, Oct. 1996.

[24] N. Arica and F. Yarman-Vural, "Optical character recognition for cursive handwriting," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, pp. 801-813, June 2002.

[25] E. Kavallieratou, N. Fakotakis, and G. Kokkinakis, "Skew angle estimation for printed and handwritten documents using the Wigner-Ville distribution," Image and Vision Computing, vol. 20, no. 11, pp. 813-824, 2002.

[26] M. Blumenstein and B. Verma, "Neural-based solutions for the segmentation and recognition of difficult handwritten words from a benchmark database," in Proceedings of the Fifth International Conference on Document Analysis and Recognition (ICDAR '99), pp. 281-284, Sept. 1999.

[27] I.-S. Oh, J.-S. Lee, and C. Suen, "Analysis of class separation and combination of class-dependent features for handwriting recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, pp. 1089-1094, Oct. 1999.

[28] D. Singh, M. Dutta, and S. H. Singh, "Neural network based handwritten Hindi character recognition system," in Proceedings of the 2nd Bangalore Annual Computer Conference (COMPUTE '09), (New York, NY, USA), pp. 15:1-15:4, ACM, 2009.

[29] I. P. Morns and S. S. Dlay, "Character recognition using Fourier descriptors and a new form of dynamic semisupervised neural network," Microelectronics Journal, vol. 28, no. 1, pp. 73-84, 1997.

[30] M. Shridhar and A. Badreldin, "High accuracy character recognition algorithm using Fourier and topological descriptors," Pattern Recognition, vol. 17, no. 5, pp. 515-524, 1984.

[31] S. Haykin, Neural Networks: A Comprehensive Foundation. Upper Saddle River, NJ, USA: Prentice Hall PTR, 1st ed., 1994.

[32] C. Wei, "StatSoft, Inc., Tulsa, OK: STATISTICA, version 8," AStA Advances in Statistical Analysis, vol. 91, pp. 339-341, 2007, doi: 10.1007/s10182-007-0038-x.

[33] S. Gopisetty, R. Lorie, J. Mao, M. Mohiuddin, A. Sorin, and E. Yair, "Automated forms-processing software and services," IBM Journal of Research and Development, vol. 40, pp. 211-230, March 1996.

[34] H. Drucker, R. E. Schapire, and P. Simard, "Improving performance in neural networks using a boosting algorithm," in Advances in Neural Information Processing Systems 5 [NIPS Conference], (San Francisco, CA, USA), pp. 42-49, Morgan Kaufmann Publishers Inc., 1993.

[35] J. Park, V. Govindaraju, and S. Srihari, "OCR in a hierarchical feature space," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, pp. 400-407, Apr. 2000.

[36] H.-J. Kang and S.-W. Lee, "Combining classifiers based on minimization of a Bayes error rate," in Proceedings of the Fifth International Conference on Document Analysis and Recognition (ICDAR '99), pp. 398-401, Sept. 1999.

[37] L. Xu, A. Krzyzak, and C. Suen, "Methods of combining multiple classifiers and their applications to handwriting recognition," IEEE Transactions on Systems, Man and Cybernetics, vol. 22, no. 3, pp. 418-435, 1992.

[38] Y. Huang and C. Suen, "A method of combining multiple experts for the recognition of unconstrained handwritten numerals," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 17, pp. 90-94, Jan. 1995.

[39] R. A. Jacobs, M. I. Jordan, S. J. Nowlan, and G. E. Hinton, "Adaptive mixtures of local experts," Neural Computation, vol. 3, pp. 79-87, March 1991.

[40] B. Verma, P. Gader, and W. Chen, "Fusion of multiple handwritten word recognition techniques," Pattern Recognition Letters, vol. 22, no. 9, pp. 991-998, 2001.
