6 HANDWRITTEN CHARACTER RECOGNITION USING … CHARACT… · handwriting recognition systems ......

International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print),

ISSN 0976 - 6375(Online), Volume 6, Issue 2, February (2015), pp. 54-74 © IAEME

54

HANDWRITTEN CHARACTER RECOGNITION USING

FEED-FORWARD NEURAL NETWORK MODELS

Nilay Karade1, Dr. Manu Pratap Singh

2, Dr. Pradeep K. Butey

3

1A-304, Shivpriya Towers, Jaitala, Nagpur-440039, Maharshatra, India

2Department of Computer Science, Dr. B.R.Ambedkar University, Khandari,

Agra - 282002, Uttar Pradesh, India

3HOD (Computer Science), Kamala Nehru Mahavidyalaya, Nagpur, India

ABSTRACT

Handwritten character recognition has been vigorous and tough task in the field of pattern

recognition. Considering its application to various fields, a lot of work is done and is being

continuing to improve the results through various methods. In this paper we have proposed a system

for individual handwritten character recognition using multilayer feed-forward neural networks. For

the experimental purpose we have taken 15 samples of lower & upper case handwritten English

alphabets in scanned image format i.e. 780 different handwritten character samples. There are two

methods of feature extraction are used to construct the pattern vectors for training set. This training

set is presented to the six different feed-forward neural networks namely newff, newfit, newpr,

newgrnn, newrb and newrbe. The test pattern set is used to evaluate the performance of these neural

networks models. The results are compared to find the accuracy in recognition of the respective

models. The number of hidden layer, number of neurons in hidden layer, validation checks and

gradient factors of the neural networks models are taken into consideration during the training.

Keywords: Character Recognition, multilayer feed-forward Artificial Neural Network,

Backpropagation, Handwriting recognition, Pattern Classification

1. INTRODUCTION

These days computer have been penetrated in every field and the work is being done at a

higher speed with greater accuracy. Pattern recognition through computer is a challenging task and

this task becomes more critical if the pattern is in the form of handwritten curve script. Pattern

INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING &

TECHNOLOGY (IJCET)

ISSN 0976 – 6367(Print)

ISSN 0976 – 6375(Online)

Volume 6, Issue 2, February (2015), pp. 54-74

© IAEME: www.iaeme.com/IJCET.asp

Journal Impact Factor (2015): 8.9958 (Calculated by GISI)

www.jifactor.com

IJCET

© I A E M E



55

recognition, as a subject, spans a number of scientific disciplines, uniting them in search for a

solution to the common problem of recognizing the pattern of a given class and assigning the name

of identified class. Pattern recognition is the categorization of input data into identifiable classes

through the extraction of significant attributes of the data from irrelevant background detail. A

pattern class is a category determined by some common attributes. It is true that the older

handwritten documents are digitized but the 100% automation of work cannot be achieved. The

handwriting recognition has helped a lot to the advancement of automation process [1]. The

handwriting recognition systems are broadly classified into two types, namely online and offline

handwritten recognition. In online approach, the two dimensional coordinates of the consecutive

points are symbolize as a function of time. Also the sequence of the strokes made by the writer is on

hand. Whereas in the case of off-line handwriting recognition approach the written script is captured

with the help of devices like scanner and the whole script is available as an image [2]. When both

these approaches are compared, it has been found that due the temporal information available with

the online approach, it is superior to that of off line approach [3]. On the other hand in the off-line

systems, the neural networks have been productively used for capitulate comparably high recognition

accuracy levels [1]. A number of applications such as document analysis, mailing address

interpretation, bank processing etc. require offline handwriting recognition system [1, 4]. Thus, the

off-line handwriting recognition enjoys the first choice by many researchers in order to investigate

and discover the novel methods that would get better recognition correctness. It is widely used in

image processing, pattern recognition, and artificial intelligence.

During the last few years the researchers have proposed many mathematical approaches to

solve the pattern recognition problems. Recognition strategies heavily depend on the nature of the

data to be recognized. In the cursive case, the problem is made complex by the fact that the writing is

fundamentally ambiguous as the letters in the word are generally linked together, poorly written and

may even be missing. On the contrary, hand printed word recognition is more related to printed word

recognition, the individual letters composing the word being usually much easier to isolate and to

identify. As a consequence of this, methods working on a letter basis (i.e., based on character

segmentation and recognition) are well suited to hand printed word recognition while cursive scripts

require more specific and/or sophisticated techniques. Inherent ambiguity must then be compensated

by the use of contextual information.

Neural network computing has been expected to play a significant role in a computer-based

system of recognizing handwritten characters. This is because a neural network can be trained quite

readily to recognize several instances of a written letter or word, and can then be generalized to

recognize other different instances of that same letter or word. This capability is vital to the

realization of robust recognition of handwritten characters or scripts, since characters are rarely

written twice in exactly the same form. There have been reports of successful use of neural networks

for the recognition of handwritten characters [11, 12], but we are not aware of any general

investigation which might shed light on the systematic approach of a complete neural network

system for the automatic recognition of cursive character. The techniques of artificial neural

networks are widely used for pattern recognition task over the conventional approaches to handle

such type of problem due to the following reasons:

1. The same alphabet character written by the same person can vary in shape, size and style.

2. It is not only the case with same person but also the shape, size and style of the same character

can vary from person to person.

3. Character image scanned in offline method might have poor quality due to noise present within

it.

4. As there are no pre defined rules about the look of the visual character, the rules should be

heuristically deduced form the set of sample data. The human brain by its very nature does the

same thing using the features discussed in the following two points.



56

5. The human brain can read handwritings of various people having different fashion of writing

because it is adaptive to slight variations and errors in pattern.

6. It can take hold of new styles present in the character due to its ability of learning from

experiences with no time.

J. Praddep, E.Srinivasan,and S.Himavathi [1] have proposed a handwritten character

recognition system using neural network by means of diagonal based feature extraction method.

They have stated with the binarization of the image which results in binary image, which further

undergoes the edge detection and dilation and then segmentation. In the process of segmentation a

series of characters is decomposed into sub image of each individual character, each of which is

converted into 90x60 pixels for classification and recognition process. Each character image

obtained in such a way that it is divided into 54 equal zones, each of size 10 x10 pixels and

then features are extracted from each zone pixels by moving along its diagonals. They have

ended up with 54 features for each of the character. Another feature extraction method gives them 69

features by averaging the values placed in zones row wise and column wise. A feed forward back

propagation neural network having two hidden layers with architecture of 54-100-100-38 is used to

perform the classification with both the features with vertical, horizontal and diagonal orientation

and have found 92.69, 93.68 , 97.80 percent accuracy and 92.69, 94.73, 98.54 percent accuracy,

respectively.

Kauleshwar Prasad, Devvrat C. Nigam, Ashmika Lakhotiya and Dheeren Umre [3] have

converted the character image into a binary image, and then apply character extraction algorithm in

which it has empty traverse list initially. A row is scan pixel by pixel and on getting black pixel, it is

checked if it is already in the traverse list. It is checked that if it is already there then it is ignored,

otherwise added to the traverse list using edge detection algorithm. They have claimed to have good

results by using feed-forward Backpropagation neural network and also stated that poorly chosen

feature extraction method gives poor results.

Ankit Sharma and Dipti R Chaudhary [4] have achieved the accuracy of 85%, using feed

forward neural network. The special form of Reduction is used which includes the noise removal and

edge detection for the feature extraction of grayscale images.

Chirag I Patel, Ripal Patel, Palak Patel [5] have achieved the accuracy of 91%, 889%, 91%,

91%, 94%, 94% using different models of Backpropagation neural networks. After character

extraction and edge detection from the document, it goes under the process of normalization where

the images having various sizes are normalized to a uniform size. The resultant image is applied with

‘Line Fitting’, a skew detection technique to correct this skewedness, in which it is rotated by an

angle θ. The constructed pattern from this method is further used for the training by Backpropagation

algorithm of feed-forward multilayer neural networks.

Anita Pal and Dayashankar [7] have used a multilayer perceptron with one hidden layer to

recognize Handwritten English Character. Boundary tracing along with Fourier Descriptor is used to

extract the feature from the handwritten character. By analyzing its shape and judge against its

features, a character is identified. Test result was found to have fine recognition accuracy of 94%

for handwritten English characters with less training time.

The genetic algorithm is used with feed forward neural network architecture as the hybrid

evolutionary algorithm [27] for the recognition of handwritten English alphabets. In this paper each

character is considered as the gray level image and divided into sixteen parts. The mean of each part

is considered as one feature of the pattern. Thus, there are sixteen features in real numbers are used

as the pattern vector for each image. The trained network performed well for the pattern

classification for test patterns.

In this paper we consider the two approaches of feature extraction from the images of

handwritten capital and small letters of English alphabets. The first method of feature extraction uses

the row wise mean value of the pixels for a processed image of size n x n. The second method



57

consider the each pixel value of the dilated image of size n x n. These features are used to construct

the pattern vectors. The two training sets are formed from these samples examples of pattern vectors.

There are six different feed forward neural networks models are used with six different learning

methods. The performances of these neural networks with different learning rules are analyzed. The

rate of recognition for patterns from the test pattern set is also evaluated. The performance evaluation

indicates that the Radial bias function (RBF) neural network architecture performs better than other

neural network models for both the methods of feature extraction. The rate of recognition for test

pattern set in RBF is found better with respect to other neural network models.

Rest of the paper is containing 6 sections. Section 2 of the paper describes the feature

extraction methods for handwritten English characters. The section 3 discusses about feed forward

neural networks and Backpropagation learning and Radial basis function. The section 4 describes the

experiment and simulation design. The Section 5 presents the simulated results and discussion.

Section 6 describes the conclusion followed by the references.

2. FEATURE EXTRACTION

Feature extraction and selection can be defined as extracting the most representative

information from the raw data, which minimizes within class. The pattern variability while

enhancing between class pattern variability so that, a set of feature are extracted from each class that

helps to distinguish it from other classes, while remaining invariant to characteristic differences

within the class. Here we are considering the feature extraction from the input stimuli with two

methods namely the row wise mean of pixel from a scanned image and each pixel value of the

image. In our approaches we have considered the input data in the form of fifteen different set of

each hand written capital and small English characters by five different peoples. It is quite natural

that the five different people considered the different hand writing and different writing style for

every character. So, in this way we have total 780 samples. Among these 780 samples we used 520

samples for training and the remaining 260 samples were used in test pattern set. Now to prepare our

training set of input output pattern pairs, we consider each scanned hand written character as a color

bit map image. This color bitmap image of a character is now changed into gray level image and then

into binary image as shown in figure 1.

Fig 1 (a) gray level image Fig 1 (b) Binary Image

Now we obtain the images after the edge detection and dilation for both the methods of

feature extraction. The edged and dilated images can show in figure 2.

Fig 2 (a) edged image Fig 2 (b) dilate image



58

Hence to obtain the uniform pattern vector for every input stimulus we make the dilated

images of equal sizes by resizing the images into the size of 30 x 30 as shown in figure 3.

Fig 3: uniform resize images

Now in the first method of feature extraction we construct the pattern vector for the processed

images of English alphabets by taking row wise mean of image of size 30 x 30. The obtained pattern

vector will represent in column matrix of order 30 x 1. Thus we have an input pattern matrix of order

30 x 520 with target output pattern matrix of order 6 x 520 because to distinguish characters from

each other we require 52 different classes so that we use the 6 binary digits to present the target

output pattern vector.

In the second method of feature extraction we construct the pattern vector for the processed

images of English alphabets by taking each pixel value of the image. Therefore we have an input

pattern vector of size 900 x 1. Thus we have an input pattern matrix of order 900 x 520 with target

output pattern matrix of 6 x 520.

Thus, we have constructed the training set of input output patterns pairs to analyze the

performance of multilayer feed forward neural networks with six different learning methods. We

have also constructed our test pattern set to verify the performance of networks. Our test pattern set

consist with another set of hand written characters i.e. order 30 x 30 and 900 x 30 for both the

methods of pattern presentation respectively. Input pattern for these test character set are constructed

in same manner as we did for training set pattern.

3. FEED FORWARD NEURAL NETWORKS MODEL

The neural approach applies biological concepts to machines for pattern recognition. The

outcome of this effort is invention of artificial neural networks. Neural networks can be viewed as

massively parallel computing systems consisting of an extremely large number of simple processors

with many interconnections. Neural network models attempt to use some organizational principles

(such as learning, generalization, adaptively, fault tolerance, distributed representation, and

computation) in a network of weighted directed graphs in which the nodes are artificial neurons and

directed edges (with weights) are connections between neuron outputs and neuron inputs. The main

characteristics of neural networks are that they have the ability to learn complex nonlinear input-

output relationships, use sequential training procedures, and adapt themselves to the data. The most

commonly used family of neural networks for pattern classification tasks [13] is the feed-forward

network, which includes multilayer perceptron and Radial-Basis Function (RBF) networks. These

networks are organized into layers and have unidirectional connections between the layers. The

learning process involves updating network architecture and connection weights so that a network

can efficiently perform a specific pattern recognition task. The increasing popularity of neural

network models to solve pattern recognition problems has been primarily due to their seemingly low

dependence on domain-specific knowledge (relative to model-based and rule-based approaches) and

due to the availability of efficient learning algorithms. Neural networks provide a new suite of

nonlinear algorithms for feature extraction (using hidden layers) and classification (e.g., multilayer

perceptron). In spite of the seemingly different underlying principles, most of the well known neural



59

network models are implicitly equivalent or similar to classical statistical pattern recognition

methods. Ripley [14] and Anderson et al. [15] also discuss the relationship between neural networks

and statistical pattern recognition. Despite these similarities, neural networks do offer several

advantages such as, unified approaches for feature extraction & classification and flexible procedures

for finding good, moderately nonlinear solutions. The advantages of neural networks are their

adaptive-learning, self-organization and fault-tolerance capabilities. For these outstanding

capabilities, neural networks are used for pattern recognition applications. The goal in pattern

recognition is to use a set of example solutions to some problem to infer an underlying regularity

which can subsequently be used to solve new instances of the problem. In the case of feed-forward

networks, the set of example solutions (called a training set), comprises sets of input values together

with corresponding sets of desired output values. The training set is used to determine an error

function in terms of the discrepancy between the predictions of the network, for given inputs, and the

desired values of the outputs given by the training set. A common example of an error function

would be the squared difference between desired and actual output, summed over all outputs and

summed over all patterns in the training set. The learning process then involves adjusting the values

of the parameters to minimize the value of the error function. This kind of error Backpropagation

would be used to reconstruct the input patterns and make them free from error which increases the

performance of the neural networks. However, effective learning algorithms were only known for the

case of networks in which at most one of the layers comprised adaptive interconnections. Such

networks were known variously as perceptron [16] and Adaline [17], and were seriously limited in

their capabilities [18].

The feed forward neural network consists of an input layer of units, one or more hidden

layers, and an output layer. Each node in the layer has one corresponding node in the next layer,

thus creating the stacking effect. The input layer’s nodes have output functions that deliver data to

the first hidden layer nodes. The hidden layer(s) are the processing layer, where all of the actual

computation takes place. Each node in hidden layer computes a sum based on its input from the

previous layer (either the input layer or another hidden layer). The sum is then “compacted” by a

sigmoid function (a logistic transfer function), which changes the sum to a limited and manageable

range. The output sum from the hidden layers is passed on to the output layer, which produces the

final network. The feed-forward networks may contain any number of hidden layers, network with

a single hidden layer can learn any set of training data that a network with multiple layers can

learn, depends upon the complexity of the problem [19]. In feed forward neural network an input

may be either a raw/preprocessed signal or image. Alternatively, some specific features can also be

used. If specific features are used as input, there number and selection is crucial and application

dependent. Weights are connected between an input and a summing node and affect to the

summing operation. The Bias or threshold value is considered as a weight with constant input 1 i.e.

x0=1 and w0=θ, usually the weight are randomized in the beginning [20, 21].

The neuron is the basic information processing unit of a neural network. It consists of: A set of

links, describing the neuron inputs, with weights, w1,w2,w3….wn , An adder function (linear

combiner) for computing the weighted sum as:

j

m

j

j xwv ∑=

=1

(3.1)

And activation function (squashing function) for limiting the amplitude of the neuron output as

shown in figure 3.1

)( bvy += ϕ (3.2)



60

where,

j

m

j

j xwv ∑=

=0

(3.3)

owb =

The output at every node can finally calculates by using sigmoid function

Kxexfy

−+==

1

1)( ;where K is the adaption constant (3.4)

Figure 4: The Functioning of neural network architecture.

The supervised learning mechanism is commonly used to train the feed forward multilayer

neural network architecture. In this learning process a pattern is presented at the input layer. The

pattern will be transformed in its passage through the layers (hidden) of the network until it reaches

the output layer. The units in the output layer all belong to a different category. The outputs of the

network as they are now compared with the outputs as they ideally would have been if this pattern

were correctly classified, in the later case unit with the correct category would have had the largest

output value and the output values of the other output units would have been very small. On the

basis of this comparison all the connection weights are modified a little bit to guarantee that, the

next time this same pattern is presented at the inputs, the value of the output unit that corresponds

with the correct category is a little bit higher than it is now and that, at same time, the output values

of all the other incorrect outputs are little bit lower than they are now. The differences between the

actual outputs and the idealized outputs are propagated back from the top layer to lower layers to

be used at these layers to modify connection weights. Thus it is consider as back propagation

learning algorithm.

The Backpropagation (BP) learning algorithm is currently the most popular supervised

learning rule for performing pattern classification tasks [20]. It is not only used to train feed forward

neural networks such as the multilayer perceptron, it has also been adapted to recurring neural

networks [21]. The BP algorithm is a generalization of the delta rule, known as the least mean square

algorithm. Thus, it is also called the generalized delta rule. The BP overcomes the limitations of the

perceptron learning enumerated by Minsky and Papert [22]. Due to the BP algorithm, the MLP can

be extended to many layers. This algorithm propagates backward the error between the desired signal

and the network output through the network. After providing an input pattern, the output of the

network is then compared with a given target pattern and the error of each output unit calculated.

This error signal is propagated backward, and a closed-loop control system is thus established. The

Weights

Summing

Function

Bias

b

Activation

Function

Local

Field

v

Output

y

x1

x2

xm

ω2

wm

w1

∑ )(−ϕ

………….



61

weights can be adjusted by a gradient-descent approach. In order to implement the BP algorithm, a

continuous, nonlinear, monotonically increasing, differentiable activation function is required as

Logistic Sigmoid function or hyperbolic tangent function.

So that, to provide the training for multi-layer feed forward network to approximate an

unknown function, based on some training data consisting of pairs ( ) Szx ∈, , the input pattern vector

x represents a pattern of input to the network, with desired output pattern vector z from the training

set S. The objective function for optimization or minimization is defined in the sum of

instantaneously squared error as:

∑=

−=J

j

jj

P STE1

2)(2

1 (3.5)

where 2

)( jj ST − is the squared difference between the actual output of the network on the output

layer for the presented input pattern P and the target output pattern vector for the pattern P. All the

network parameters ( )1−mW and mθ , m = 2 ・・・M, can be combined and represented by the matrix

[ ]ijwW = . So that, the error function E can be minimized by applying the gradient-descent procedure

as:

W

EW

∂

∂−=∆ η (3.6)

where η is a learning rate or step size, provided that it is a sufficiently small positive number.

Applying the chain rule the equation (3.6) can express as

( ) ( )

( )

( )mij

mj

mj

mij w

u

u

E

w

E

∂

∂

∂

∂=

∂

∂ +

+

1

1 (3.7)

while ( )

( ) ( )( ) ( ) ( )( ) ( )m

im

jmm

jmij

mij

mj

oowww

u=+

∂

∂=

∂

∂∑

+

+

1

1

θ (3.8)

and ( ) ( )

( )

( ) ( )( ) ( )( )11

1

1

11

++

++

+

++ ∂

∂=

∂

∂

∂

∂=

∂

∂m

j

m

jm

j

im

j

m

j

m

j

m

j

uo

E

u

o

o

E

u

Eφ (3.9)

For the output unit m=M-1

( ) jmj

eo

E=

∂

∂+1

(3.10)

For the hidden units, m = 1,2,3………,M − 2,

( )

1

221

2

+

=++ ∑

+

∂

∂=

∂

∂ mj

j

mmj

m

u

E

o

Eω

ω ω

ω (3.11)

Define the delta function by

( )( )mp

mj

u

E

∂

∂=δ (3.12)



62

for m = m = 2,3………,M. By substituting (3.7), (3.11), and (3.12) into (3.9), we finally obtain the

following equations:

For the output units, m = M − 1,

( ) ( ) ( )( )M

jM

jjM

j ue φδ −= (3.13)

For hidden units, m = 1,……..,M − 2,

( ) ( ) ( )( ) ( ) 1

1

22

+

=

+∑

+

−= m

J

mMj

Mjj

Mj

m

ue ω

ω

ω ωδφδ (3.14)

Equations (3.13) and (3.14) provide a recursive method to solve ( )1+mjδ for the whole network. Thus,

W can be adjusted by

( )( ) ( )m

im

jmij

oE 1+−=

∂

∂δ

ω (3.15)

For the activation transfer functions, we have the following relations for the logistic function

( ) ( ) ( )[ ]uuu φβφφ −= 1 (3.16)

For the tanh function

( ) ( )[ ]uu21 φβφ −= (3.17)

The update for the biases can be done in two ways. The biases in the (m+1)th

layer i.e. θ(m+1)

can be expressed as the expansion of the weight W(m)

, that is, ( ) ( ) ( )( )m

J

mm

m 1,01,01 ..........,.........

+=+ ωωθ .

Accordingly, the output o(m) is expanded into ( ) ( ) ( )( )m

J

mm

mooo ....,,.........,1 1= . Another way is to use a

gradient-descent method with regard to θ(m)

, by following the above procedure. Since the biases can

be treated as special weights, these are usually omitted in practical applications. The algorithm is

convergent in the mean if max

20

λη << , where λmax is the largest eigenvalue of the autocorrelation of

the vector x, denoted as C [23]. When η is too small, the possibility of getting stuck at a local

minimum of the error function is increased. In contrast, the possibility of falling into oscillatory traps

is high when η is too large. By statistically preprocessing the input patterns, namely, de-correlating

the input patterns, the excessively large eigenvalues of C can be avoided and thus, increasing η can

effectively speed up the convergence. PCA preconditioning speeds up the BP in most cases, except

when the pattern set consists of sparse vectors. In practice, η is usually chosen to be 0 < η < 1 so that

successive weight changes do not overshoot the minimum of the error surface. The BP algorithm can

be extended or improved by adding a momentum term [24] and known as Gradient Descent with

momentum term. As per this learning rule the weight update between output layer and hidden layer

is represented by following weight updating equations as:

( ) ( )( )( )sw

sww

Esw

ho

H

i

ho

ho

ho∆−

+∆+∂

∂−=+∆ ∑

= ααη

1

11

1

(3.18)



63

Whereas the weight update between hidden layer and input layer can be represent as:

( ) ( )( )( )sw

sww

Esw

ho

N

i

ih

ih

ih∆−

+∆+∂

∂−=+∆ ∑

= ααη

1

11

1

(3.19)

Where α is the momentum factor, usually 0 < α ≤ 1.

The BP algorithm is a supervised gradient-descent technique, wherein the MSE between the

actual output of the network and the desired output is minimized. It is prone to local minima in the

cost function. The performance can be improved and the occurrence of local minima reduced by

allowing extra hidden units, lowering the gain term, and with modified training with different initial

random weights. These are also efficient variant of Backpropagation learning algorithms like

conjugate descent, Levenberg-Marquardt Backpropagation and Radial bias functions. There are

different six neural networks are used with these learning techniques namely feed forward network,

fitting network, pattern recognition, generalized regression neural network and Radial basis neural

networks. These models and learning algorithms are used to improve the performance of feed

forward multilayer network architecture for the given training set.

3.1 Radial Basis Function

In this section, we investigate the network structure related to the multi layer feed-forward neural

network (FFNN), implemented using the Radial Basis Function. RBF networks emulate the behavior of

certain biological networks. RBF-MLP is essentially feed forward neural network with three layers

namely Input, Hidden and Output. The single hidden layer consists of the locally tuned or locally

sensitive units, and the output layer (in most cases) consists of binary responsive units. In the hidden layer

units, the unit response is localized and decreases as a function of the distance of input from the unit’s

receptive field center. The RBF-MLP uses a static Gaussian function as the nonlinearity for the hidden

layer neurons. The Gaussian function responds only to a small region of the input space where the

Gaussian is centered. The key to a successful implementation of these networks is to find suitable centers

for the Gaussian functions [25] in supervisory mode. The process starts with the training of input layer. Its

function is to obtain the Gaussian centers and the widths from the input samples. Thus achieved centers

are then prearranged within the weights of the hidden layer. The output of this layer is derived from the

input samples weighted by a Gaussian combination. The advantage of using the radial basis function is

that it discovers the input to output map using local approximations [26]. Usually the supervised segment

is simply a linear combination of the approximations. Since linear combiners have few weights, these

networks train extremely fast and require fewer training samples.

In contrast to the classical MLP the activation of a neuron is not given by the weighted sum of its

all inputs but by the computation of a RBF. The RBF that we use is Gaussian Function, which can be

expressed as:

∅�� =

| ��|�

��

(3.1.1)

Where ф is the Gaussian Function, x is the input to the neuron i, µ i is the basis of neuron i and σi

is the amplitude of neuron i. The input layer has i nodes, the hidden and the output layer have k and j

neurons, respectively. Each input neuron corresponds to a component of an input vector x. Each node in

the hidden layer uses an RBF as its non linear activation function and performs a non-linear transform of

the input. The output layer is a linear combiner, mapping the nonlinearity into a new space. The RBF-

MLP can achieve a global optimal solution to the adjustable weights in the minimum MSE by using the

linear optimization method. Therefore for an input pattern x, the output of the jth

node of output layer can

be defined as:



64

j

K

k

kikkjj wxwxy 0

1

)()( +−=∑=

µφ For all j=1, 2…….J, (3.1.2)

Where )(xy j is the jth

output of the RBF-MLP, kjw as the connection weight from the kth

hidden

unit to the jth

output unit , jw0 is the threshold or network bias term, kµ is the prototype or centre of the kth

hidden unit.

The RBF )(xφ is typically selected as the Gaussian function as:

)2

exp()(2

2

k

ki

k

xx

σ

µφ

−−= (3.1.3)

For k = 1, 2… K where kσ represents the width of the neuron. Where x is the N- dimensional

input vector and kµ is the vector determining the centre of the Radial Basis function kφ . The

weight vector between the input layer and the kth

hidden layer neuron can be interpreted as the

centre kµ Therefore for an input pattern x, the Error of the network can be defined as same in

equation (3.5).

The error function has been considered in equation (3.5) is the least mean square (LMS). This

error will minimize along with the decent gradient of error surface in the weight space between hidden

layer and the output layer. The same error will be minimized with the Gaussian Radial Basis function’s

parameter as defined in equation (3.1.3). Now, we obtain the expression for the derivatives of the error

function with respect to the weights Radial Basis function parameters for the set of P pattern pairs (xp,

yp) as; where p=1 to P.

ik

p

ikw

Ew

∂

∂−=∆ 1η (3.1.4)

k

p

k

E

µηµ

∂

∂−=∆ 2

(3.1.5)

and k

p

k

E

σησ

∂

∂−=∆ 3

(3.1.6)

The update equation for weight in a standard MLP is represented as;

)1()()()1( −∆+∆+=+ tWtWtWtW kikiikik αη (3.1.7)

Where )(tWik the state of weight matrix at iteration t is, )1( +tWik is the state of weight matrix at next

iteration, )1( −tWik is the state of weight matrix at previous iteration, )(tW ki∆ is current change/

modification in weight matrix, α is standard momentum variable to accelerate learning process and η is the learning rate of the network.

Since �� is the outcome of Radial Basis Function used and gradient for the network is given by partial

differentiation of this error with respect to different parameters. Hence from the equation (3.5) we have,

∆�� =�� ∑ �� − ��

�� ∗ ! ��"��!

�

��

(3.1.8)



65

∆#�� == �$ ∑ �� ∗ �� − ��

��,� ∗ &�"'��(�� ∗

! ��"��!

�

��

(3.1.9)

And, ∆)�� =�* ∑ �� ∗ �� − ��

��,� ∗|&�"'��|

$(�+ ∗

! ��"��!

�

��

(3.1.20)

We have from equations (3.1.8), (3.1.9) & (3.1.20) the expressions for change in weight vector &

Radial basis function parameters to accomplish the learning in supervised way. The setting of the Radial

Basis function parameters with supervised learning represents a non linear optimization problem which

will typically be computationally intensive and may be proven to find local minima of the error function.

Thus, for reasonable well localized RBF, an input will generate a significant activation in a small region

and the opportunity of getting stuck at a local minimum is small. Hence, the training of the network for L

pattern pair i.e. (xl, y

l) will accomplish in iterative manner with the modification of weight vector.

4. EXPERIMENT AND SIMULATION DESIGN

In this paper we have implemented two feature extraction methods on six different artificial

neural network models in Matlab, namely feed forward network (newff), fitting network (newfit),

generalized regression (newgrnn), pattern recognition (newpr), radial basis network (newrb) and

exact radial basis network (newrbe) with Levenberg-Marquardt Backpropagation and Radial bias

functions . In this simulation design for each neural network model we have created 2 networks, one

for lower case another for upper case characters which consume the input retrieved from first feature

extraction method. Similarly another two networks are created for the same models of neural

networks those use data generated from second method of feature extraction. Thus, there are four

neural networks created for each model of neural network. The architectural detail of the each model

is presented in table 1, 2, 3, 4, 5 and 6 respectively.

(1) Newff network with Levenberg-Marquardt learning rule

Table 1: Architecture detail about Newff

Description Network 1 Network 2

Number of hidden layers 3 2

Number neurons in hidden layer 37-23-7 21-11

Number neurons in output layer 5 5

Number of inputs 30 30

Transfer function tansig- tansig- tansig tansig- tansig

Training function trainlm trainlm

Learning rate 1.0000e-003 1.0000e-003

Max number of epochs 1000 1000

Error goal 0 0

Number of samples of each alphabet for pattern 10 10

Number of samples of each alphabet for training 5 5



66

(2) Newfit network with Levenberg-Marquardt learning rule

Table 2: Architecture detail about Newfit

Description Network 3 Network 4

Number of hidden layers 3 2

Number neurons in hidden layer 31-17-9 21-11

Number neurons in output layer 5 5

Number of inputs 30 30

Transfer function tansig- tansig- tansig tansig- tansig

Training function trainlm trainlm

Learning rate 1.0000e-003 1.0000e-003

Max number of epochs 1000 1000

Error goal 0 0

Number of samples of each alphabet for pattern 10 10

Number of samples of each alphabet for

training 5 5

(3) Newgrnn Network with Radial Basis Function

Table 3: Architecture detail about Newgrnn

Description Network 5

Number of hidden layers 1

Number neurons in hidden layer 260

Number neurons in output layer 5

Number of inputs 30

Number of samples of each alphabet for pattern 10

Number of samples of each alphabet for training 5

(4) NewPR network with Levenberg-Marquardt learning rule

Table 4: Architecture detail about Newpr



Number neurons in hidden layer 41-31-17-7


Number of inputs 30

Transfer function tansig- tansig- tansig- tansig

Training function trainscg

Max number of epochs 1000

Error goal 0





67

(5) Newrebe network with Radial Basis Function

Table 5: Architecture detail about Newrbe





Number of inputs 30



(6) Newrb network with Radial Basis Function

Table 6: Architecture detail about Newrb





Number of inputs 30



Therefore six neural network models are used with eight neural network architectures. The

two different supervised learning methods are used i.e. Levenberg-Marquardt learning and Radial

Basis function approximation. The simulation results are obtained from all these networks for both

the feature extraction methods.

5. RESULT AND DISCUSSION

The simulated results are obtained from both the methods of feature extraction with all the six

models of neural networks by using Levenberg-Marquardt Backpropagation learning and Radial

Basis approximation. The training set consists with handwritten English capital and small alphabets.

The performance of neural network model for training and testing is presented with regression value

and regression line for the simulated output values of the neural network models. The performance

of all the six neural network models for training and testing is presented in table 7, 8, 9, 10, 11 & 12

and figure 5, 6, 7, 8, 9, 10, 11 & 12.

Table 7: Simulated Results for Newff model with Levenberg-Marquardt learning rule

Description Pattern data training

regression value

Average of regression value for test

data samples

Network 1 using Feature

Extraction method 1 0.33743 0.211826









68

Figure 5: Performance of Network1 for both the feature extraction methods


Table 8: Simulated Results for Newfit model with Levenberg-Marquardt learning rule


regression value

Average of regression value

test data samples












69


Table 9: Simulated Results for Newgrnn model with Radial Basis Function Approximation

Description Pattern data training regression

value

Average of regression value test data

samples




Extraction method 2 1 0.72463


Table 10: Simulated Results for NewPR model with Levenberg-Marquardt learning rule

Description Pattern data training regression

value

Average of regression value test

data samples

Network 6 using Feature Extraction

method 1 0.485805 0.343696


method 2 0.846857 0.396131




70

Table 11: Simulated Results for Newrbe model with Radial Basis Function Approximation


regression value

Average of regression value test

data samples


method 1 1 0.403733


method 2 1 0.112004


Table 12: Simulated Results for Newrb model with Radial Basis Function Approximation


regression value

Average of regression

value test data samples


method 1 1 0.303037


method 2 1 0.112487

Figure 12: Performance of Network 8 for both the feature extraction methods

The simulation results of training are indicating that the performance of network models with

Radial Basis function approximation is better than network models with Levenberg-Marquardt



71

Backpropagation learning technique for the second feature extraction method i.e. each pixel value of

the resize and processed image. Now we evaluate the performance of these trained neural network

models for reorganization of handwritten English capital and small alphabets, those did not present

during the training. The performances of these networks are presented in table 13 and table 14. The

table 13 is presenting the performance of all the six neural network models for the prototype input

patterns processed with first method of feature extraction whereas the table 14 is presenting the

performance of all he six neural network models for the same input patterns processed with second

method of feature extraction. The first row of both the tables is representing the rate of correct

recognition for the presented input patterns. The second row of both the tables is presenting the

correct number of recognized pattern among the presented arbitrary patterns.

Table 13: Performance of all the six models for pattern recognition of presented prototype input

patterns using first method of feature extraction

Description newff newfit newgrnn newpr newrbe newrb

% of characters

recognized 10 0 10 20 25 30

Total no. of

characters

recognized

2 0 2 4 5 6

Presented

Prototype

Patterns

Correct and Incorrect Recognition

newff newfit newgrnn newpr netrbe newrb

e � � � � � �

j � � � � � �

k � � � � � �

m � � � � � �

n � � � � � �

p � � � � � �

q � � � � � �

t � � � � � �

u � � � � � �

v � � � � � �

B � � � � � �

E � � � � � �

H � � � � � �

J � � � � � �

K � � � � � �

L � � � � � �

R � � � � � �

X � � � � � �

Y � � � � � �

Z � � � � � �

From the table 13 it can observer that the performance of Radial Basis function neural

network is better than the other neural networks models. Its performance is even better than the exact

radial basis function network. It correctly recognized 6 out of 20 prototype arbitrary input patterns of

handwritten English alphabets. These patterns did not use in the training set and selected as the

samples of test patterns.



72

Table 14: Performance of all the six models for pattern recognition of presented prototype input

patterns using second method of feature extraction

newff newfit newgrnn newpr newrbe newrb

%. of

characters

recognized

5 0 85 5 15 15

Total no.

of

characters

recognized

1 0 17 1 3 3

e � � � � � �

j � � � � � �

k � � � � � �

m � � � � � �

n � � � � � �

p � � � � � �

q � � � � � �

t � � � � � �

u � � � � � �

v � � � � � �

B � � � � � �

E � � � � � �

H � � � � � �

J � � � � � �

K � � � � � �

L � � � � � �

R � � � � � �

X � � � � � �

Y � � � � � �

Z � � � � � �

From the table 14 it can observer that the performance of generalized neural network model

trained with Radial Basis function approximation is better than the other neural networks models. Its

performance is even better than the exact radial basis function network and Radial basis Network. It

correctly recognized 17 out of 20 prototype arbitrary input patterns of handwritten English alphabets.

It is quit noticeable that the performance of neural network is better for second method feature

extraction i.e. each pixel value of the resize image only for generalized neural network with radial

basis function approximation whereas the performance of other neural network models is better for

first method of feature extraction i.e. mean value of pixel of processed image.

6. CONCLUSION

This paper presented the performance evaluation of six different models of feed forward

neural networks trained with Levenberg-Marquardt Backpropagation learning technique and Radial

basis function approximation for the handwritten curve script of capital and small English alphabets.

There are two feature extractions method used. In the first method the row wise mean of the

proceeded image of alphabets is considered and in second method each pixel value of the resize and

precede image is considered. The simulated results are indicating that the generalized neural

network trained with radial basis function approximation for second method of feature extraction



73

yields the highest rate of recognition i.e. 85% for randomly chosen 10 lower case and 10 uppercase

characters. The remaining models of neural networks are showing poor performance irrespective of

the feature extraction method. The following observations are considered from the simulation of

performance evaluation:

1. First method of feature extraction uses 30 features for each character whereas second method

of feature extraction uses 100 features for the each character. Thus, it seems that more the

number of features more is the accuracy level as far as generalized neural network model is

concern.

2. In the training process the regression value for Radial basis network is found perfect but during

the validation for the test pattern the performance degrades rapidly. Thus the network is well

tuned for the training set but not able to generalize the behavior. It is working as good

approximation and bad generalization.

3. The second method of feature extraction is providing more feature values in the pattern

information with respect to the first method of feature extraction. Therefore, the performance

of each neural network model is found better for the second feature extraction method.

7. REFERENCES

1. J. Pradeep, E. Srinivasan and S. Himavathi, “Diagonal based feature extraction for

handwritten alphabets recognition system using neural network”, International Journal of

Computer Science & Information Technology (IJCSIT), 3 (1) 27-38 (2011)

2. R. Plamondon and S. N. Srihari, “On-Line and Off-Line handwriting recognition recognition

– A complete survey”, IEEE Transaction on pattern Recognition and Machine Intelligence,

22 (1) 63-84 (2000)

3. Kauleshwar Prasad, D. C. Nigam, Ashmika Lakhotiya and Dheeren Umre, “Character

Recognition Using Matlab’s Neural Network Toolbox”, International Journal of u- and e-

Service, Science and Technology, 6 (1) 13-20 (2013)

4. Ankit Sharma and Dipti R Chaudhary, “Character Recognition Using Neural Network”,

International Journal of Engineering Trends and Technology (IJETT), 4 (4) 662-667 (2013)

5. Chirag I. Patel, Ripal Patel and Palak Patel, “Handwritten Character Recognition using

Neural Network”, International Journal of Scientific & Engineering Research, 2 (5) 1-6

(2011)

6. Manish Mangal and Manu Pratap Singh, “Handwritten English vowels recognition using

hybrid evolutionary Feed-forward neural network”, Malaysian Journal of Computer Science,

19 (2) 169-187 (2006).

7. Anita pal and Dayashankar Singh, “Handwritten english character recognition using neural

network”, International Journal of Computer Science & Communication, 1 (2) 141-144

(2010).

8. K. Y. Rajput and Sangeeta Mishra, “Recognition and editing of Devnagri handwriting using

neural network”, Proc33dings of SPIT-IEEE Colloquium and International Conference,

Mumbai, India, 1 66-70 (2008)

9. Meenakshi Sharma and Kavita Khanna, “Offline signature verification using supervised and

unsupervised neural networks”, International Journal of Computer Science and Mobile

Computing, 3 (7) 425-436 (2014).

10. Priyanka Sharma and Manavjeet Kaur, “Classification in Pattern Recognition: A Review“,

International Journal of Advanced Research in Computer Science and Software Engineering,

3 (4) 298-306 (2013)

11. K. Fukushima and N. Wake, "Handwritten alphanumeric character recognition by the

neocognitron.", IEEE Trans. on Neural Networks, 2 (3) 355-365 (1991).



74

12. Y. L. Cun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbad and L. D. Jackel,

"Handwritten digit recognition with a Backpropagation network", Neural Information

Processing Systems, Touretzky editor, Morgan Kaufmann Publishers, (2) 396-404 (1990).

13. A. K. Jain, J. Mao and K. M. Mohiuddin, “Artificial Neural Networks: A Tutorial.”,

Computer, 31-44, (1996).

14. B. Ripley, “Statistical Aspects of Neural Networks.”, Networks on Chaos: Statistical and

Probabilistic Aspects. U. Bornndorff-Nielsen, J. Jensen, and W. Kendal, eds., Chapman and

Hall, (1993).

15. J. Anderson, A. Pellionisz and E. Rosenfeld, “Neuro-computing 2: Directions for Research”,

Cambridge Mass.: MIT Press, (1990).

16. F. Rosenblatt, “Principles of Neurodynamics: Perceptron and the Theory of Brain

Mechanisms”, Spartan Books, Washington, D.C., (1962).

17. B. Widrow and M. A. Lehr, “30 years of adaptive neural networks: perceptron, Madeline, and

Backpropagation.”, Proceedings of the IEEE 78 (9) 1415-1442 (1990).

18. M. L. Minsky and S. A. Papert, “Perceptron.”, Cambridge, MA: MIT Press. Expanded

Edition, (1990).

19. S. B. Cho, “Fusion of neural networks with fuzzy logic and genetic algorithm”, IOS Press,

363–372 (2002).

20. B. Widrow and M. E. Hoff, ”Adaptive switching circuits” IRE Eastern Electronic Show &

Convention (WESCON1960), Convention Record, (4) 96–104 (1960).

21. P. J. Werbos, “Beyond regressions: New tools for prediction and analysis in the behavioral

sciences”, PhD Thesis, Harvard University, Cambridge, MA, (1974).

22. F. J. Pineda, “Generalization of back-propagation to recurrent neural networks”, Physical Rev

Letter, (59) 2229–2232 (1987).

23. R. Battiti, and F. Masulli, “BFGS optimization for faster automated supervised learning”, In:

Proc. Int. Neural Network Conf. France, (2) 757-760 (1990)

24. D. E. Rumelhart, G. E. Hinton and R. J. Williams, “Learning internal representations by error

propagation”, MIT Press, Cambridge, (1) 318–362 (1986).

25. P. Muneesawang and L. Guan, "Image retrieval with embedded sub-class information using

Gaussian mixture models", Proceedings of International Conference on Multimedia and

Expo, (2003).

26. S. Lee. “Off-Line Recognition of Totally Unconstrained Handwritten Numerals Using

Multilayer Cluster Neural Network”, IEEE Trans. Pattern Anal. Mach. Intell. 18 (6) 648-652

(1996).

27. S. Shrivastava, S. and Manu Paratp Singh, “Performance evaluation of feed-forward

neural network with soft computing techniques for hand written English alphabets”,

Journal of Applied Soft Computing, Elsevier, (11) 1156-1182 (2011).

28. V. Subba Ramaiah and R. Rajeswara Rao, “Automatic Text-Independent Speaker Tracking

System Using Feed-Forward Neural Networks (FFNN)” International journal of Computer

Engineering & Technology (IJCET), Volume 5, Issue 1, 2014, pp. 11 - 20, ISSN Print: 0976

– 6367, ISSN Online: 0976 – 6375.

29. M. M. Kodabagi, S. A. Angadi and Chetana. R. Shivanagi, “Character Recognition of

Kannada Text In Scene Images Using Neural Network” International Journal Of Graphics

And Multimedia (IJGM), Volume 4, Issue 1, 2014, pp. 9 - 19, ISSN Print: 0976 – 6448, ISSN

Online: 0976 –6456.

30. Ms. Aruna J. Chamatkar and Dr. P.K. Butey, “Performance Analysis of Data Mining

Algorithms with Neural Network” International journal of Computer Engineering &

Technology (IJCET), Volume 6, Issue 1, 2015, pp. 1 - 11, ISSN Print: 0976 – 6367, ISSN

Online: 0976 – 6375.

6 HANDWRITTEN CHARACTER RECOGNITION USING … CHARACT… · handwriting recognition systems ......

Documents

Transcript of 6 HANDWRITTEN CHARACTER RECOGNITION USING … CHARACT… · handwriting recognition systems ......