CHAPTER 5

HANDWRITTEN MARATHI CHARACTER RECOGNITION USING NEURAL NETWORK

A Neural Network Based Handwritten Character Recognition for Marathi Script


5.1 Introduction

Handwritten character recognition is gaining popularity because of its potential
applications, such as form filling, postal automation and banking, where it reduces the
data-entry effort and saves time. Developing such a system is nevertheless a challenge,
because the shape of a character varies with the writer, the writer's physical and mental
condition, the acquisition device, the pen width, the ink color and many other factors.
Moreover, handwritten Marathi characters tend to be more complex owing to their
structure, shape and the presence of modifiers, as discussed earlier. These factors demand
a pattern recognition system that addresses the challenges at each of its stages. This
chapter discusses the development of a handwritten Marathi character recognition system
using neural networks. Section 5.2 discusses the need for a neural network. Section 5.3
describes the single stage recognition system, while Section 5.4 describes the multistage
recognition system. Experiments and results are presented in Section 5.5, and Section 5.6
presents the concluding remarks.

5.2 Need for neural network

The handwritten numeral recognition experiments with neural networks reported in
Chapter 4 gave encouraging results, which motivated the adoption of neural networks as
the recognition tool for handwritten Marathi characters. Although the neural network
required a long training time, testing of the characters was very fast.

5.3 Single stage recognition system

The block diagram of the system built in the initial phase is shown in Figure 5.1.
Here the character images were scanned one by one, manually cropped to a fixed dimension
of, say, 52x52, and stored in the database. A binarization operator then converted the
gray scale character image into a binary image before feature extraction. Various
features, such as standard deviation, Euclidean distance and Radon features, were
extracted and applied to the neural network during both training and testing. The neural
network had to be trained for a large number of characters and their samples. The output
of the neural network gave the recognized character. The recognition rate of the single
stage system was not satisfactory, because the number of characters to be classified was
large and there were few training samples per character. Moreover, the images in the
database required a large amount of memory, and the large neural network that had to be
built took a long time to train.

Figure 5.1 Single stage Marathi character recognition system (blocks: Character image,
Pre-processing, Feature Extraction, Neural Network, Recognized character)

To overcome these limitations and to improve the recognition rate, a multistage
recognition system was developed. This system segments and crops the characters
automatically. A two stage structural classifier then classifies the pre-processed
characters on the basis of their structural features and stores them in the database
automatically. In this way 24 structural classes are obtained. The final recognition is
done by applying the features derived from the characters to a neural network. A separate
neural network is built for each of the 24 classes, and the character under test is
recognized by the network of the structural class to which it belongs. The next section
discusses the multistage recognition system in detail.

5.4 Multi stage recognition system

The proposed system, designed to recognize handwritten Marathi characters, is
shown in Figure 5.2. The blocks of the system are discussed in detail in the following
subsections. Unlike the system discussed in the previous section, this system adopts a
multistage recognition scheme. Here, the characters are first pre-classified into groups
based upon the similarity of their structural features, and the character is then
recognized by the neural network built for the class to which it belongs.

Figure 5.2 Multi stage recognition system (blocks: Character image, Pre-processing,
Segmentation, Structural classification, Image Resize, Feature Extraction, Neural
Network, Recognized character)



5.4.1 Data collection

The system is designed to recognize the 39 handwritten Marathi characters shown
in Figure 5.3. The characters are scanned at 300 dpi and saved in bmp file format. A
flatbed scanner is used to scan the pages for training as well as testing.

The handwritten Marathi character dataset was collected from more than 10 writers.
About 100 samples were collected per character, resulting in more than 4000 character
samples in the database.

5.4.2 Pre-processing

Pre-processing plays an important role in handwritten character recognition as in

any other pattern recognition task. Improper selection of parameters during pre-

processing may result in variations in the shape of the character eventually affecting the

recognition rate. The following algorithms are implemented in this stage to obtain a

binary image while keeping the shape characteristics intact as far as possible.

• Binarization

A point operator converts the gray scale character images to binary. This operator
separates the pixels whose values lie within a specified range, i.e. the object, from the
rest, i.e. the background. This is done by choosing a threshold that separates the object
from the background. Here, the threshold is chosen by uniform thresholding after
normalization. In uniform thresholding, pixels above the threshold are set to white and
those below it are set to black. Uniform thresholding requires knowledge of the gray
levels; otherwise the target features might not be selected, or might be misclassified,
after thresholding. The handwritten characters were therefore checked for the global
features at various threshold values before a threshold was finalized. On testing about
one third of the characters in the database, a normalized threshold value of 0.85 was
found to give correct selection of the global features in most of the cases.

Figure 5.3 Characters used in the proposed system
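The uniform thresholding described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in this work: the function name `binarize`, the 8-bit gray level range and the 3x3 sample patch are assumptions made for the example, and a pixel exactly at the threshold is treated as black.

```python
def binarize(gray, threshold=0.85, max_level=255):
    """Uniform thresholding on gray levels normalized to the range [0, 1]."""
    # Assumption: 8-bit input, so normalization divides by max_level = 255.
    # Pixels above the threshold become white (1), all others black (0).
    return [[1 if pixel / max_level > threshold else 0 for pixel in row]
            for row in gray]


# Hypothetical 3x3 gray scale patch: bright paper with a dark diagonal stroke.
gray = [[250, 240, 90],
        [245, 60, 235],
        [80, 230, 255]]
binary = binarize(gray)
print(binary)
```

With this convention the ink of the character is 0 (black) and the paper is 1 (white).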

• Averaging

A handwritten character is often thinner at its curvature than at its other parts.
The stroke is therefore more likely to break at this point during binarization. Hence, a
3x3 averaging operator is applied before binarization; it blurs the image, which bridges
small gaps and retains the actual shape of the character.
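A minimal sketch of the 3x3 averaging step is given below, assuming the image is a list of rows of gray levels. The function name `average_3x3`, the border handling (edge pixels are replicated) and the sample stroke are assumptions made for illustration; the source does not state how borders are treated.

```python
def average_3x3(gray):
    """Replace each pixel by the mean of its 3x3 neighbourhood."""
    rows, cols = len(gray), len(gray[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    # Assumption: pixels outside the image repeat the border value.
                    rr = min(max(r + dr, 0), rows - 1)
                    cc = min(max(c + dc, 0), cols - 1)
                    total += gray[rr][cc]
            out[r][c] = total / 9.0
    return out


# Hypothetical dark horizontal stroke (0) on white paper (255) with a one-pixel gap.
gray = [[255] * 7 for _ in range(5)]
for c in range(7):
    if c != 3:
        gray[2][c] = 0
smoothed = average_3x3(gray)
print(round(smoothed[2][3] / 255, 3))  # normalized gray level at the gap after blurring
```

In this example the gap pixel falls below the normalized threshold of 0.85 after averaging, so a subsequent binarization keeps the stroke connected, at the cost of a slightly thicker stroke.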

• Opening

Handwritten characters show various undesirable effects after binarization, such as
unwanted strokes, gaps and breaks. The unwanted strokes occur most often between the
pen-lifting and pen-placing points, and their occurrence depends upon the writing style
and the ink viscosity. These strokes may result in unwanted feature detection after
binarization. To avoid this, the binarized image is cleaned using the morphological
opening operator. Morphological opening removes thin protrusions, breaks thin connections
and smooths the object contour. The morphological opening of image I by structuring
element B is simply the erosion of I by B followed by the dilation of the result by B, as
indicated in equation 5.1. Here the opening removes all objects smaller than 40 pixels,
using 8-connectivity.

$I \circ B = (I \ominus B) \oplus B$ (5.1)

where $I \circ B$ denotes the opening of the image I with the structuring element B.
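The cleaning step, as described, removes every connected object smaller than 40 pixels under 8-connectivity. One way to sketch this is a connected-component area filter, shown below. This is an assumption about how the step could be realized, not a literal erosion followed by dilation with an explicit structuring element; the function name `remove_small_objects` and the test image are hypothetical, and the foreground (ink) is assumed to be encoded as 1.

```python
from collections import deque


def remove_small_objects(binary, min_area=40):
    """Keep only 8-connected foreground components with at least min_area pixels."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    cleaned = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first search collects one connected component.
            component = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(component) >= min_area:
                for y, x in component:
                    cleaned[y][x] = 1
    return cleaned


# Hypothetical image: a 6x8 block (48 pixels) and a stray two-pixel diagonal stroke.
binary = [[0] * 12 for _ in range(10)]
for r in range(1, 7):
    for c in range(1, 9):
        binary[r][c] = 1
binary[8][10] = binary[9][11] = 1
cleaned = remove_small_objects(binary)
print(sum(map(sum, cleaned)))  # number of foreground pixels that survive
```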



5.4.3 Segmentation

Isolated handwritten Marathi characters are written on plain pages. During
training, the page images are pre-processed, and the characters are segmented and stored
in the database after structural classification. During testing, the pages are again
pre-processed, segmented, structurally classified and recognized automatically. The lines
and characters are written in such a way that they do not overlap. They are segmented
using horizontal and vertical projection profiles: the peaks of the profiles correspond
to the lines and characters, and the empty stretches between the peaks separate them. The
number of lines and the number of characters in each line are thus determined. An array
of size 1 x k is created, which stores the upper left and lower right corner coordinates
of each character, where k is the total number of characters in the document.
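The projection-profile segmentation can be sketched as follows, assuming a binary page in which ink pixels are 1. The helper names `runs` and `segment_characters` and the tiny test page are assumptions for illustration; the bounding boxes here use the top and bottom of the text line rather than a tight fit around each character.

```python
def runs(profile):
    """Return (start, end) index pairs of consecutive non-zero profile entries."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i
        elif value == 0 and start is not None:
            spans.append((start, i - 1))
            start = None
    if start is not None:
        spans.append((start, len(profile) - 1))
    return spans


def segment_characters(page):
    """Return ((top, left), (bottom, right)) corner pairs, one per character."""
    boxes = []
    row_profile = [sum(row) for row in page]            # horizontal projection
    for top, bottom in runs(row_profile):               # one run per text line
        line = page[top:bottom + 1]
        col_profile = [sum(col) for col in zip(*line)]  # vertical projection
        for left, right in runs(col_profile):           # one run per character
            boxes.append(((top, left), (bottom, right)))
    return boxes


# Hypothetical page with two lines: two characters on the first, one on the second.
page = [[0, 0, 0, 0, 0, 0, 0],
        [0, 1, 1, 0, 1, 1, 0],
        [0, 1, 1, 0, 1, 0, 0],
        [0, 0, 0, 0, 0, 0, 0],
        [0, 0, 1, 1, 1, 0, 0],
        [0, 0, 0, 0, 0, 0, 0]]
boxes = segment_characters(page)
print(len(boxes), boxes)
```

The length of `boxes` plays the role of k, the total number of characters in the document.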

Marathi characters have a header line on top. The header lines of the characters
in a word join to form a continuous line on top of the word. Removing this line makes it
easier to separate the characters in a word. If a header line is drawn on the character
or word, the horizontal projection profile of the image is computed in order to remove
it. The row max_row with the maximum number of black pixels, max_count, is obtained. The
width of the header line is then computed by finding its starting row, start_head, and
its ending row, end_head. The rows corresponding to 50% of max_count on the upper and
lower sides of max_row are taken as start_head and end_head respectively. This percentage
was fixed after analyzing about one third of the images in the database. The image is
then cleaned again with the opening operator to remove the unwanted remains of the header
line. Characters without the header line are used here for database creation.
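The header line removal can be sketched as below. The interpretation of the 50% rule is an assumption: starting from max_row, the sketch extends the header upwards and downwards while the row count stays at or above half of max_count. The function name `remove_header_line` and the sample word are hypothetical, and ink pixels are assumed to be 1.

```python
def remove_header_line(binary, fraction=0.5):
    """Blank the rows of the header line found from the horizontal projection profile."""
    profile = [sum(row) for row in binary]
    max_count = max(profile)
    max_row = profile.index(max_count)
    limit = fraction * max_count
    # Assumption: the header spans the contiguous rows around max_row whose
    # ink count is at least `fraction` of max_count.
    start_head = max_row
    while start_head > 0 and profile[start_head - 1] >= limit:
        start_head -= 1
    end_head = max_row
    while end_head < len(profile) - 1 and profile[end_head + 1] >= limit:
        end_head += 1
    cleaned = [[0] * len(row) if start_head <= i <= end_head else list(row)
               for i, row in enumerate(binary)]
    return cleaned, start_head, end_head


# Hypothetical word image: a two-row header line above two character strokes.
word = [[0, 0, 0, 0, 0, 0, 0, 0],
        [1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 0],
        [0, 1, 0, 0, 0, 1, 0, 0],
        [0, 1, 1, 0, 0, 1, 0, 0],
        [0, 1, 0, 0, 0, 1, 0, 0]]
cleaned, start_head, end_head = remove_header_line(word)
print(start_head, end_head)
```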

5.4.4 Structural classification

The large Marathi character set, with its wide range of variations in writing
style, demands a pre-classification of the characters before the final recognition. The
pre-classification is done in two stages based upon the structural features. These stages
are

1. Detection of global features,

2. Detection of local features.

The first stage classifies the characters using global features: the presence of a
vertical line in the character, its position in the character, and the presence of holes.
These features classify the characters coarsely into six classes. The detection of global
features is followed by the detection of local features, which further divide each of the
six classes into four classes. The local features are more character-specific than the
global features.

• Detection of global features

Global features used for classifying the characters at first stage include:

1. Presence of vertical bar in the character,

2. Position of the vertical bar

3. Presence of the enclosed region.

About 60% of Devanagari characters contain a vertical line. This vertical line is
at the center in two of the characters, while in the rest it is towards the end. The
remaining 40% of the characters do not have a vertical line. Another feature, the enclosed
region, is present in approximately 56% of the characters; the figure is approximate
because it depends on the writing style of the individual writer.

To detect whether these features are present in the character, the following algorithm is
implemented.

Detection of the presence of a vertical bar in the character: The vertical projection
profile of the character image f(m,n) is calculated in order to find the column with the
maximum number of pixels, nmax. The average height of the vertical bar is taken to be 85
percent of the total height of the image. This value is set as a threshold TV to detect
the presence of a vertical bar in a character. Thus, if

nmax ≥ TV (5.2)

then a vertical bar is said to be present; otherwise there is no vertical bar in the
character.

Detection of the position of the vertical bar in the character: If a vertical bar is
detected, its location is found so as to classify the character further according to the
position of the bar within it. An average threshold TM of 30 percent is set for the
position of the vertical bar in the character. If

T ≥ TM (5.3)

then the vertical bar is towards the center; otherwise it is towards the end, where

T = ((n − nmax)/n) × 100 (5.4)
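The two tests in equations 5.2 to 5.4 can be sketched as follows. Several readings here are assumptions: in equation 5.2 the quantity compared with TV is taken to be the ink count of the densest column, while in equation 5.4 n is taken to be the image width and nmax the 1-based position of that column. The function names and the sample characters are hypothetical, and ink pixels are assumed to be 1.

```python
def classify_vertical_bar(char, tv_percent=85, tm_percent=30):
    """Return 'NB' (no bar), 'MB' (mid bar) or 'EB' (end bar)."""
    height, width = len(char), len(char[0])
    column_counts = [sum(column) for column in zip(*char)]  # vertical projection
    max_count = max(column_counts)
    if max_count < tv_percent / 100 * height:   # equation 5.2 not satisfied
        return "NB"
    # Assumption: the bar position is the 1-based index of the densest column.
    bar_column = column_counts.index(max_count) + 1
    t = (width - bar_column) / width * 100      # equation 5.4
    return "MB" if t >= tm_percent else "EB"    # equation 5.3


def sample(bar_column=None, size=10):
    """Hypothetical test character: a short horizontal stroke plus an optional bar."""
    char = [[0] * size for _ in range(size)]
    for c in range(2, 5):
        char[5][c] = 1
    if bar_column is not None:
        for r in range(size):
            char[r][bar_column] = 1
    return char


labels = [classify_vertical_bar(sample()),
          classify_vertical_bar(sample(4)),
          classify_vertical_bar(sample(8))]
print(labels)
```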

Table 5.1 First stage structural classification

                                      Global features
Class                          Mid bar   End bar   Enclosed region
No bar enclosed (NBE)             0         0             1
No bar not enclosed (NBNE)        0         0             0
Mid bar enclosed (MBE)            1         0             1
Mid bar not enclosed (MBNE)       1         0             0
End bar enclosed (EBE)            0         1             1
End bar not enclosed (EBNE)       0         1             0

Detection of the presence of an enclosed region in the character: Here, 8-adjacency is
used to find the connected components and hence the enclosed regions. Two foreground
pixels p and q are said to be connected if there exists an 8-connected path between them
consisting entirely of foreground pixels. Table 5.1 shows the classification of characters
based upon these global features.
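One possible way to test for an enclosed region is sketched below: the background is flood-filled from outside the character, and any background pixel that is not reached must lie inside a hole. This particular procedure is an assumption, not necessarily the method used here. The background fill uses 4-connectivity, the usual complement of 8-connected foreground. The names `has_enclosed_region` and `first_stage_class` are hypothetical; the class labels follow Table 5.1.

```python
from collections import deque


def has_enclosed_region(char):
    """True if some background pixel cannot be reached from outside the character."""
    rows, cols = len(char), len(char[0])
    # Pad with a one-pixel background border so the outside forms one region.
    padded = ([[0] * (cols + 2)]
              + [[0] + list(row) + [0] for row in char]
              + [[0] * (cols + 2)])
    reached = [[False] * (cols + 2) for _ in range(rows + 2)]
    queue = deque([(0, 0)])
    reached[0][0] = True
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):  # 4-connected background
            ny, nx = y + dy, x + dx
            if (0 <= ny < rows + 2 and 0 <= nx < cols + 2
                    and padded[ny][nx] == 0 and not reached[ny][nx]):
                reached[ny][nx] = True
                queue.append((ny, nx))
    return any(padded[y][x] == 0 and not reached[y][x]
               for y in range(rows + 2) for x in range(cols + 2))


def first_stage_class(bar, enclosed):
    """Combine the bar label ('NB', 'MB', 'EB') with the enclosed-region flag."""
    return bar + ("E" if enclosed else "NE")


ring = [[1, 1, 1],
        [1, 0, 1],
        [1, 1, 1]]
open_shape = [[1, 1, 1],
              [1, 0, 0],
              [1, 1, 1]]
print(has_enclosed_region(ring), has_enclosed_region(open_shape))
print(first_stage_class("EB", has_enclosed_region(ring)))
```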

• Detection of local features

The local features used here are:

1. Presence of endpoints in the lower part of the character.

To find these features, the binary image f(m,n) is first thinned to yield a
character that is a single pixel wide. The thinned character is then passed to the
hit-or-miss transformation to find its endpoints. Eight structuring elements are used to
detect endpoints in all eight directions. The image is then partitioned into four
quadrants as shown in Figure 5.4. A vector V = [V1 V2 V3 V4] is defined, where V1, V2, V3
and V4 are set or reset to indicate the presence or absence of endpoints in quadrants 1,
2, 3 and 4 respectively. Only quadrants 3 and 4 are of interest here. The combination of
the values of V3 and V4 classifies the character into one of four classes, 00, 01, 10 and
11: class 00 indicates that there is no endpoint in quadrants 3 and 4, class 01 indicates
that there is an endpoint in quadrant 3 and none in quadrant 4, and so on. Table 5.2 shows
this classification.
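The second-stage rule can be sketched as below. Two simplifications are assumed. First, instead of the hit-or-miss transform with eight structuring elements, an endpoint of the thinned character is taken to be a foreground pixel with exactly one 8-connected neighbour. Second, since the quadrant layout of Figure 5.4 is not reproduced here, quadrant 3 is assumed to be the lower left and quadrant 4 the lower right of the image. The function name `lower_endpoint_class` is hypothetical; the class label is formed as in Table 5.2 (first digit for quadrant 4, second digit for quadrant 3).

```python
def lower_endpoint_class(skeleton):
    """Return '00', '01', '10' or '11' from endpoints in the lower two quadrants."""
    rows, cols = len(skeleton), len(skeleton[0])
    v3 = v4 = 0
    for r in range(rows):
        for c in range(cols):
            if skeleton[r][c] != 1:
                continue
            neighbours = sum(
                skeleton[r + dr][c + dc]
                for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                if (dr or dc) and 0 <= r + dr < rows and 0 <= c + dc < cols)
            if neighbours == 1 and r >= rows / 2:   # endpoint in the lower half
                if c < cols / 2:
                    v3 = 1   # assumed: quadrant 3 is the lower left
                else:
                    v4 = 1   # assumed: quadrant 4 is the lower right
    return f"{v4}{v3}"


# Hypothetical thinned shapes: an arch with two legs, and a single vertical stroke.
arch = [[0, 1, 1, 1, 1, 0],
        [0, 1, 0, 0, 1, 0],
        [0, 1, 0, 0, 1, 0],
        [0, 1, 0, 0, 1, 0],
        [0, 1, 0, 0, 1, 0],
        [0, 0, 0, 0, 0, 0]]
stroke = [[0, 1, 0, 0, 0, 0] for _ in range(5)] + [[0] * 6]
print(lower_endpoint_class(arch), lower_endpoint_class(stroke))
```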



Table 5.2 Second stage structural classification

                       Local features
Class   End point in quadrant 4   End point in quadrant 3
00      Absent                    Absent
01      Absent                    Present
10      Present                   Absent
11      Present                   Present

Figure 5.4 Character partitioning for end point detection

Figure 5.5 Two stage structural classification. Stage 1 asks whether a vertical line is
present, giving NB (No bar), MB (Mid bar) or EB (End bar), and whether an enclosed region
is present, giving the six classes NBE, NBNE, MBE, MBNE, EBE and EBNE. Stage 2 splits each
of these six classes into the four classes 00, 01, 10 and 11.



The entire two stage structural classification is shown in Figure 5.5. After this

classification, 24 classes are obtained which form the entire database.
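The count of 24 classes follows from combining the six first-stage classes with the four second-stage classes. The short sketch below enumerates them; the variable names are chosen for illustration only, and the `CLASS/xx` label format follows the notation used in the tables of this chapter.

```python
from itertools import product

GLOBAL_CLASSES = ["NBE", "NBNE", "MBE", "MBNE", "EBE", "EBNE"]  # stage 1
LOCAL_CLASSES = ["00", "01", "10", "11"]                         # stage 2

structural_classes = [f"{g}/{l}" for g, l in product(GLOBAL_CLASSES, LOCAL_CLASSES)]
print(len(structural_classes))   # 6 x 4 classes
print(structural_classes[:4])
```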

5.4.5 Image resize

After the structural classification, the characters are resized to a fixed size. The
resized image is then used for extracting the features. Bicubic interpolation is used for
resizing, in which the output pixel value is a weighted average of the pixels in the
nearest 4-by-4 neighborhood. Figure 5.6 shows the image resizing operation, in which a
binary image of size 57x35 is resized to 16x16.

5.4.6 Feature extraction

The feature extraction stage in character recognition, as in any pattern recognition
task, plays a major role in improving the recognition accuracy [111]. The features are
extracted from the resized binary characters. The characteristics available for
recognition therefore lie solely in the shape variations, the orientation of the character
and the position of the strokes within it. The selected features should capture these
properties. Hence, several feature extraction techniques that take them into account are
implemented, their outputs are applied to the neural network, and the results are
analyzed. The features extracted are:

1. Euclidean distance features

2. Radon features

3. Normalized pixel density features

Figure 5.6 Image resizing a) Cropped image, and b) Resized image



Here the Euclidean distance features capture the distance between the strokes, the
Radon features capture the orientation of the character, and the normalized pixel density
features capture the shape of the character. These feature extraction techniques are
explained next.

• Euclidean distance feature

To account for the distances between the features within the character, the
Euclidean distance transform is computed. Distance transforms play a central role in the
comparison of binary images, particularly for images resulting from local feature
detection techniques such as edge or corner detection. The distance between pixels can be
measured using the Euclidean distance transform [112], in which the value at a pixel is
linearly proportional to the Euclidean distance between that pixel and the object pixel
closest to it. The Euclidean distance DE between two pixels (i,j) and (k,l) is:

$D_E[(i,j),(k,l)] = [(i-k)^2 + (j-l)^2]^{1/2}$ (5.5)

Thus the Euclidean distance, the straight-line distance between two pixels,
provides a measure of the separation of points in the image. For a binary image, the
transform calculates the distance between each pixel that is set to off (0) and the
nearest nonzero pixel, as shown in Figure 5.7. Consider a 5x5 binary image as shown below:

(a) bw =

    0 0 0 0 0
    0 1 0 0 0
    0 0 0 0 0
    0 0 0 1 0
    0 0 0 0 0

The Euclidean distance features obtained for this image with the instruction bwdist(bw)
are:

(b) D =

    1.4142 1.0000 1.4142 2.2361 3.1623
    1.0000 0      1.0000 2.0000 2.2361
    1.4142 1.0000 1.4142 1.0000 1.4142
    2.2361 2.0000 1.0000 0      1.0000
    3.1623 2.2361 1.4142 1.0000 1.4142

Figure 5.7 Euclidean distance calculation a) Binary image, and b) Euclidean distance



Here each feature is the distance from that pixel to the nearest non-zero pixel. As
seen above, the dimensions of D are the same as those of bw. For an image of size 16x16,
the total number of Euclidean distance features obtained is 256. To reduce the number of
features, the horizontal and vertical profiles of the feature matrix D are calculated,
i.e. the entries of D are summed row-wise and column-wise. Thus 16 features are obtained
for the horizontal profile of D and 16 for the vertical profile. These 32 features are
applied to the neural network.
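The distance transform and its profile reduction can be sketched in plain Python as below. This is a brute-force illustration and not the bwdist routine itself: the function names are hypothetical, the image is assumed to contain at least one non-zero pixel, and which of the two sums is called the horizontal profile is an assumption (row sums are listed first here). The 5x5 image is the one used in the text.

```python
import math


def distance_transform(bw):
    """Euclidean distance from every pixel to the nearest non-zero pixel."""
    ones = [(r, c) for r, row in enumerate(bw) for c, v in enumerate(row) if v]
    return [[min(math.hypot(r - y, c - x) for y, x in ones)
             for c in range(len(bw[0]))]
            for r in range(len(bw))]


def profile_features(d):
    """Row-wise sums followed by column-wise sums of the distance matrix."""
    row_sums = [sum(row) for row in d]
    col_sums = [sum(col) for col in zip(*d)]
    return row_sums + col_sums


bw = [[0, 0, 0, 0, 0],
      [0, 1, 0, 0, 0],
      [0, 0, 0, 0, 0],
      [0, 0, 0, 1, 0],
      [0, 0, 0, 0, 0]]
D = distance_transform(bw)
features = profile_features(D)
print([round(v, 4) for v in D[0]])
print(len(features))   # 5 + 5 here; 16 + 16 = 32 for a 16x16 image
```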

• Radon feature

The Radon transform is used to compute the projections of an object in an image.
Applying the Radon transform to an image f(m,n) for a given set of angles can be thought
of as computing the projection of the image along those angles. The resulting projection
is the sum of the intensities of the pixels in each direction, i.e. a line integral. The
radon function computes projections of an image matrix along specified directions. A
projection of a two-dimensional function f(x,y) is a set of line integrals. The radon
function computes the line integrals from multiple sources along parallel paths, or
beams, in a certain direction. The beams are spaced 1 pixel unit apart. To represent an
image, the radon function takes multiple parallel-beam projections of the image from
different angles by rotating the source around the center of the image. Figure 5.8 shows a
single projection at a specified rotation angle.

Figure 5.8 Parallel-beam projection at rotation angle theta

For example, the line integral of f(x,y) in the vertical direction is the projection of
f(x,y) onto the x-axis; the line integral in the horizontal direction is the projection
of f(x,y) onto the y-axis. Figure 5.9 shows horizontal and vertical projections for a
simple two-dimensional function.

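Projections can be computed along any angle θ. In general, the Radon transform
of f(x,y) is the line integral of f parallel to the y´-axis:

$R_\theta(x') = \int_{-\infty}^{\infty} f(x'\cos\theta - y'\sin\theta,\; x'\sin\theta + y'\cos\theta)\, dy'$ (5.6)

where

$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$ (5.7)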

Figure 5.10 illustrates the geometry of the Radon transform.

The Radon transform is a mapping from Cartesian rectangular coordinates to a
distance and an angle, known as polar coordinates, in which each point corresponds to a
straight line in the spatial domain. Conversely, each point in the spatial domain becomes
a sine curve in the projection domain [113]. The Radon transform of an image I for the
angles specified in the vector theta can be computed using the radon function with the
syntax:

[R,xp] = radon(I, theta);

where the columns of R contain the Radon transform for each angle in theta, and the vector
xp contains the corresponding coordinates along the x´-axis. The center pixel of I is
defined to be floor((size(I) + 1)/2); this is the pixel on the x´-axis corresponding to
x´ = 0. The algorithm first divides each pixel in the image into four subpixels and
projects each subpixel separately, as shown in Figure 5.11.

Figure 5.9 Horizontal and vertical projections of a simple function

Figure 5.10 Geometry of the Radon transform

Figure 5.11 Radon transform calculation

Each subpixel's contribution is split proportionally between the two nearest bins,
according to the distance between the projected location and the bin centers. If the
subpixel projection hits the center point of a bin, that bin gets the full value of the
subpixel, i.e. one-fourth of the value of the pixel. If the subpixel projection hits the
border between two bins, the subpixel value is split evenly between the bins.
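The subpixel projection scheme described above can be sketched as follows. This is a simplified illustration and not a reimplementation of the radon function, so its values should not be expected to match that function exactly. The function name `radon_projection`, the placement of subpixel centres at ±0.25 of a pixel, the upward-pointing y-axis and the rule used for the number of bins (which yields 27 bins, from -13 to 13, for a 16x16 image) are assumptions made for the sketch.

```python
import math


def radon_projection(image, theta_deg):
    """Project an image onto the x'-axis at angle theta; returns (bins, xp)."""
    rows, cols = len(image), len(image[0])
    # Centre pixel floor((size + 1) / 2) in 1-based indexing, converted to 0-based.
    cy, cx = (rows + 1) // 2 - 1, (cols + 1) // 2 - 1
    # Assumed bin-count rule: enough bins to hold the projected image diagonal.
    half = math.ceil(math.hypot(rows - (rows - 1) // 2 - 1,
                                cols - (cols - 1) // 2 - 1)) + 1
    xp = list(range(-half, half + 1))
    bins = [0.0] * len(xp)
    cos_t = math.cos(math.radians(theta_deg))
    sin_t = math.sin(math.radians(theta_deg))
    for r in range(rows):
        for c in range(cols):
            if image[r][c] == 0:
                continue
            for dy in (-0.25, 0.25):          # four subpixels per pixel
                for dx in (-0.25, 0.25):
                    x = (c - cx) + dx
                    y = (cy - r) + dy         # image rows grow downwards
                    pos = x * cos_t + y * sin_t + half   # fractional bin index
                    lower = math.floor(pos)
                    frac = pos - lower
                    value = image[r][c] / 4.0
                    # Linear split between the two nearest bins.
                    bins[lower] += value * (1.0 - frac)
                    if frac > 0:
                        bins[lower + 1] += value * frac
    return bins, xp


# Hypothetical 16x16 character: one vertical and one horizontal stroke.
image = [[0] * 16 for _ in range(16)]
for i in range(16):
    image[i][10] = 1
    image[3][i] = 1
radon_features = []
for theta in (0, 45, 90):
    bins, xp = radon_projection(image, theta)
    radon_features.extend(bins)
print(len(xp), len(radon_features))   # bins per angle, total features
```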

• Normalized pixel density feature

First, the cropped binary character is stretched to a fixed dimension of 70x50
pixels. The resized character image is used to calculate the normalized pixel density
features. The character is partitioned into 35 non-overlapping zones of size 10x10, i.e.
100 pixels each, as shown in Figure 5.12.

The number of zero pixels s(x, y) in each zone is found, where x = 1, 2, …, 7 and
y = 1, 2, …, 5. The normalized pixel density features npd(x, y) are calculated as

$npd(x, y) = \dfrac{100 - s(x, y)}{100}$ (5.8)
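Equation 5.8 can be sketched as follows, assuming the resized character is stored as 70 rows by 50 columns and divided into a 7x5 grid of zones with 100 pixels each. The function name `normalized_pixel_density`, the row-major ordering of the zones and the sample image are assumptions for illustration.

```python
def normalized_pixel_density(char, zone=10):
    """Fraction of non-zero pixels in each zone, scanned row by row."""
    rows, cols = len(char), len(char[0])
    area = zone * zone                       # 100 pixels per zone
    features = []
    for top in range(0, rows, zone):
        for left in range(0, cols, zone):
            zeros = sum(1
                        for r in range(top, top + zone)
                        for c in range(left, left + zone)
                        if char[r][c] == 0)  # s(x, y)
            features.append((area - zeros) / area)   # equation 5.8
    return features


# Hypothetical 70x50 image: the first zone is full, the second is half full.
char = [[0] * 50 for _ in range(70)]
for r in range(10):
    for c in range(15):
        char[r][c] = 1
npd = normalized_pixel_density(char)
print(len(npd), npd[:3])
```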

5.4.7 Neural network design

Figure 5.12 Normalized pixel density calculation (a) Original image,
(b) Resized binary image, and (c) Character partitioning

Artificial neural networks are one of the popular techniques used for
classification because of their learning and generalization abilities. They have
traditionally been used for handwritten character recognition in various languages
[114, 115], including Devanagari. Of the various architectures, the multilayer perceptron
(MLP) is widely used. The MLP is a fully connected network with an input layer, hidden
layers and an output layer, in which every neuron in a layer is connected to every neuron
in the next layer by a weighted link through which the state of the neuron is
transmitted. Each layer can have its own activation function and number of neurons. Such
a network is shown in Figure 5.13. It consists of an input layer, a hidden layer and an
output layer. The feature vector is applied from the input layer as the input signal to
the neurons in the hidden layer. The neurons are connected to each other by links that
carry weights. A bias is similar to a weight: it acts exactly as a weight on a connection
from a unit whose activation is always one. Each neuron in the hidden layer includes a
nonlinear activation function.

This operation is described by [116]:

$a^{m+1} = f^{m+1}(W^{m+1} a^{m} + b^{m+1})$ (5.9)

for m = 0, 1, …, M-1, where M is the number of layers in the network. The neurons in the
first layer receive external inputs or features:

$a^{0} = p$ (5.10)

which provides the starting point for the network. The outputs of the neurons in the last
layer are considered the network outputs:

$a^{M} = t$ (5.11)
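The layer-by-layer propagation of equations 5.9 to 5.11 can be sketched in plain Python as below. The network size, the weights and the choice of a sigmoid hidden layer with a linear output layer are hypothetical values chosen for illustration; they are not the settings used in this work.

```python
import math


def sigmoid(n):
    return 1.0 / (1.0 + math.exp(-n))


def linear(n):
    return n


def layer_forward(weights, bias, inputs, activation):
    """One application of equation 5.9: a_next = f(W a + b)."""
    return [activation(sum(w * a for w, a in zip(row, inputs)) + b)
            for row, b in zip(weights, bias)]


def forward(layers, p):
    """Propagate the input p (equation 5.10) through all layers."""
    a = p
    for weights, bias, activation in layers:
        a = layer_forward(weights, bias, a, activation)
    return a   # output of the last layer (equation 5.11)


# Hypothetical network: 2 inputs, 2 sigmoid hidden neurons, 1 linear output neuron.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], sigmoid),
    ([[2.0, 0.0]], [-1.0], linear),
]
output = forward(layers, [1.0, 1.0])
print(output)
```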

Figure 5.13 Multilayer perceptron

Once the network weights and biases are initialized, the network is ready for training.
The hidden neurons enable the network to learn complex tasks by extracting progressively
more meaningful features from the input vectors. A learning rule is a procedure for
modifying the weights and biases of a network; its purpose is to train the network to
perform a pattern recognition task. During training, the weights and biases of the
network are iteratively adjusted to minimize the error. The training process requires a
set of examples of proper network behavior, i.e. network inputs p and target outputs t.
As each input is applied to the network, the network output is compared to the target.
The error is calculated as the difference between the target output and the network
output. The goal is to minimize the average of the sum of these errors. The simplest
implementation of back propagation learning updates the network weights and biases in the
direction in which the performance function decreases most rapidly, i.e. the negative of
the gradient. One iteration of this algorithm can be written as

$x_{k+1} = x_k - \alpha_k g_k$ (5.12)

where $x_k$ is the vector of current weights and biases, $g_k$ is the current gradient,
and $\alpha_k$ is the learning rate.

The gradient descent algorithm can be implemented in two ways: incremental mode
and batch mode. In incremental mode, the gradient is computed and the weights are updated
after each input is applied to the network. In batch mode, the weights and biases are
updated only after the entire training set has been applied to the network; the gradients
calculated for the individual training examples are added together to determine the
change in the weights and biases. The batch steepest descent training function is used
for training the network. The weights and biases are updated in the direction of the
negative gradient of the performance function. The training parameters associated with
this type of training are:

• epochs

• goal

• learning rate

The learning rate is multiplied by the negative of the gradient to determine the
changes to the weights and biases. The larger the learning rate, the bigger the step. If
the learning rate is too large, the algorithm becomes unstable; if it is too small, the
algorithm takes a long time to converge. Training stops when the number of iterations
exceeds epochs or when the performance function drops below goal.
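The roles of epochs, goal and learning rate in batch steepest descent can be illustrated with a deliberately small example: a single linear neuron trained on a hypothetical data set, using the mean squared error as the performance function. The function name `train_batch`, the data and the parameter values are assumptions for illustration and are not the settings of the character recognition networks.

```python
def train_batch(inputs, targets, learning_rate=0.1, epochs=1000, goal=1e-6):
    """Batch steepest descent for one linear neuron a = w * p + b."""
    w, b = 0.0, 0.0
    n = len(inputs)

    def errors_and_mse():
        errors = [t - (w * p + b) for p, t in zip(inputs, targets)]
        return errors, sum(e * e for e in errors) / n

    for epoch in range(epochs):
        errors, mse = errors_and_mse()
        if mse < goal:                      # performance goal reached
            return w, b, mse, epoch
        # Gradients are accumulated over the whole training set (batch mode).
        grad_w = sum(-2.0 * e * p for e, p in zip(errors, inputs)) / n
        grad_b = sum(-2.0 * e for e in errors) / n
        w -= learning_rate * grad_w         # step along the negative gradient
        b -= learning_rate * grad_b
    _, mse = errors_and_mse()
    return w, b, mse, epochs                # stopped by the epoch limit


# Hypothetical training set generated from t = 2p + 1.
w, b, mse, used = train_batch([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(round(w, 2), round(b, 2), used)
```

For this data set a learning rate well above about 0.2 makes the updates diverge, while a much smaller value needs far more epochs to reach the goal, which mirrors the trade-off described above.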

5.5 Experiments and Results

A sufficient number of samples in the database ensures that the final
classification accuracy is independent of the structural classification results. There is
no limitation on the size of the database. In fixed-writer systems, the parameters for
the structural classification can be varied according to the writing style of the writer.
The parameters for mid bar and end bar detection were selected after testing about one
third of the characters in the database.

• Pre-processing results

Pre-processing includes image cleaning and binarization. Figure 5.14 gives the
results of all the pre-processing algorithms applied to a character. The original image
is shown in Figure 5.14 (a). The binarization result obtained with a threshold value of
0.85 is shown in Figure 5.14 (b); the character breaks at the curvature. Hence a 3x3
averaging filter is applied before binarization to bridge the gap, as seen in Figure 5.14
(c). Finally, image opening removes the unwanted isolated stroke, whose area is less than
40 pixels, as shown in Figure 5.14 (d).

Figure 5.14 Preprocessing results for a character (a) Original image, (b)
Binarized image, (c) Binarization after averaging, and (d) Opening result

The pre-processing results for a word are shown in Figure 5.15. Figure 5.15 (a)
shows the original image of the word, Figure 5.15 (b) shows the binarization result, and
Figure 5.15 (c) shows the result of binarization after averaging, which gives a smooth
boundary. Since there are no isolated dots or strokes in this image, the opening
operation gives the same result as Figure 5.15 (c). The result of applying all the
pre-processing algorithms to the document shown in Figure 5.16 (a) is given in
Figure 5.16 (b).

Figure 5.15 Preprocessing results for a word (a) Original word, (b)
Binarization result, and (c) Binarization after averaging result

• Segmentation result

Figure 5.17 shows the segmentation results. First, the header line in the
pre-processed word image in Figure 5.17 (a) is removed, as shown in Figure 5.17 (b). Then
the characters are segmented and cropped, as shown in Figure 5.17 (c), and passed to the
structural classification stage. Similarly, the pre-processed characters (without header
line) in the document shown in Figure 5.16 are segmented, cropped and passed to the
structural classification stage.

Figure 5.16 Preprocessing results for a document
(a) Original image, (b) Pre-processing result

Figure 5.17 Segmentation results (a) Pre-processed image,
(b) Header line removal, (c) Character segmentation

• Structural classification result

The pre-processed character is classified into one of 24 classes as discussed

earlier. The characters in the document shown in Figure 5.16 are segmented and

classified into various classes based upon their structural parameters as shown in Table

5.3.

Table 5.3 Structural classification of characters in the document in Figure 5.16

Sr. no.   Class      Characters classified
1.        NBE/00
2.        NBE/01
3.        NBE/11
4.        NBNE/00
5.        NBNE/01
6.        NBNE/10
7.        NBNE/11
8.        MBE/00
9.        MBE/01
10.       MBNE/01
11.       EBE/01
12.       EBE/11
13.       EBNE/01
14.       EBNE/11



As seen in row 1 of the table, three characters are classified as NBE/00 (No bar
enclosed/00). This means that in all these characters the vertical bar is absent, the
enclosed region is present and there are no endpoints in the lower quadrants, hence Class
00. In row 2, the global features remain the same as in row 1, but an endpoint is found
in quadrant 3, hence an NBE character with Class 01. Similar reasoning can be applied to
all the other characters in the table to find their final class out of the 24 classes.

Handwritten characters may take different shapes depending on the writing style of
the writer. As a result, the same character may be classified into different classes.
Figure 5.18 shows the character 'la' written by four different writers. The first sample
is classified to the class 'NBNE/11' (no vertical bar, no enclosed region, end points in
both the 3rd and 4th quadrants), the second to 'NBE/11' (no vertical bar, enclosed region
present, end points in both the 3rd and 4th quadrants), the third to 'EBNE/11' (end bar
present, no enclosed region, end points in both the 3rd and 4th quadrants), and the
fourth to 'EBE/11' (end bar present, enclosed region present, end points in both the 3rd
and 4th quadrants).

The above example shows that a character may belong to any class, depending on the
writing style and the features detected. Table 5.4 lists all the classes in the database,
the number of characters in each class and the characters in it. In this table, the same
character can be found in many classes because of the variations in the writing styles of
the writers.

Figure 5.18 Character 'la' assigned to different structural classes
(a) NBNE/11, (b) NBE/11, (c) EBNE/11, (d) EBE/11

Table 5.4 Characters in the structural class

Sr. no.   Class      No. of characters classified   Characters classified
1.        NBE/00      6
2.        NBE/01     24
3.        NBE/10      5
4.        NBE/11     16
5.        NBNE/00     3
6.        NBNE/01    21
7.        NBNE/10     8
8.        NBNE/11    27
9.        MBE/00      3
10.       MBE/01      5
11.       MBE/10      2
12.       MBE/11      5
13.       MBNE/00     1
14.       MBNE/01     2
15.       MBNE/10     0                             ____
16.       MBNE/11     1
17.       EBE/00      2
18.       EBE/01     16
19.       EBE/10      1
20.       EBE/11     14
21.       EBNE/00     1
22.       EBNE/01    18
23.       EBNE/10     3
24.       EBNE/11    19

• Feature extraction results

The characters are stored in the database after the two-stage structural classification discussed previously. Each character is then resized to a fixed size, and the resized character is used for feature extraction. Three kinds of features are extracted: Euclidean distance features, Radon features and normalized pixel density features. The feature extraction technique, resize factor and number of features are given in Table 5.5.



Table 5.5 Feature extraction parameter settings

Sr. no. | Feature extraction technique | Resize factor | No. of features
1. | Euclidean distance transform | 16x16 | 32 (16 in horizontal and 16 in vertical directions)
2. | Radon transform | 16x16 | 81 (27 features in each of the 0°, 45° and 90° directions)
3. | Normalized pixel density features | 70x50 | 35 (character partitioned into 35 non-overlapping blocks of 10x10 pixels, i.e. 100 pixels each)

Figure 5.19 shows the Euclidean distance features of the image in both the horizontal and vertical directions. The 16 features in the horizontal direction and the 16 features in the vertical direction are appended to give 32 features in all.
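This section does not restate how the distance map is reduced to 16 + 16 values, so the following Python sketch is only one plausible reading and not necessarily the method used here. It assumes a 16x16 binary image with ink pixels equal to 1, a brute-force Euclidean distance from every pixel to the nearest ink pixel, and row sums and column sums of that map as the two groups of 16 features. All names are hypothetical.

```python
import math


def distance_map(img):
    """Euclidean distance from each pixel to the nearest ink pixel (value 1)."""
    ink = [(r, c) for r, row in enumerate(img)
           for c, v in enumerate(row) if v]
    if not ink:
        raise ValueError("image contains no ink pixels")
    return [[min(math.hypot(r - ir, c - ic) for ir, ic in ink)
             for c in range(len(img[0]))]
            for r in range(len(img))]


def euclidean_features(img):
    """Append the row sums and the column sums of the distance map.

    For a 16x16 image this gives 16 + 16 = 32 values (assumed reduction).
    """
    dist = distance_map(img)
    row_sums = [sum(row) for row in dist]
    col_sums = [sum(col) for col in zip(*dist)]
    return row_sums + col_sums


# Example: a single vertical stroke in the first column of a 16x16 image.
image = [[1 if c == 0 else 0 for c in range(16)] for _ in range(16)]
features = euclidean_features(image)
print(len(features))  # 32
```

Whichever reduction is used, the point carried over from the text is that the two directional groups are concatenated into a single 32-element input vector.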

[Figure 5.19 plots: ‘Euclidean features for character 'ha'’ and ‘Euclidean features for character 'pha'’. Each has panel (a) and panel (b); the plot covers feature indices 1 to 16 on a value scale of 0 to 30, with two series, ‘Vertical direction’ and ‘Horizontal direction’.]

Figure 5.19 Euclidean distance features: (a) original image, (b) its features in the vertical and horizontal directions

Figure 5.20 shows the Radon features for the respective images. The 27 features given by R in each of the three directions, namely 0°, 45° and 90°, are appended to give 81 features in all.
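The construction of the 81 Radon features can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the implementation used in this work: ink pixels are taken to be 1, each ink pixel is added to the nearest integer bin of xp = x cos θ + y sin θ (library implementations usually interpolate between bins), the image centre is taken as pixel (7, 7) of the 16x16 image, and 27 bins are used to match the xp axis of Figure 5.20, which runs from -13 to 13. All names are hypothetical.

```python
import math

N_BINS = 27                # xp from -13 to 13, as on the Figure 5.20 axis
ANGLES_DEG = (0, 45, 90)   # the three directions named in the text


def radon_projection(img, angle_deg, n_bins=N_BINS):
    """Accumulate ink pixels along lines perpendicular to the xp axis."""
    rows, cols = len(img), len(img[0])
    cy, cx = (rows - 1) // 2, (cols - 1) // 2  # assumed centre pixel
    theta = math.radians(angle_deg)
    half = n_bins // 2
    bins = [0] * n_bins
    for r in range(rows):
        for c in range(cols):
            if img[r][c]:
                x, y = c - cx, cy - r          # y axis points upwards
                xp = x * math.cos(theta) + y * math.sin(theta)
                k = math.floor(xp + 0.5) + half  # nearest integer bin
                if not 0 <= k < n_bins:
                    raise ValueError("pixel projects outside the xp range")
                bins[k] += 1
    return bins


def radon_features(img):
    """Append the 27-bin projections at 0, 45 and 90 degrees (81 values)."""
    features = []
    for angle in ANGLES_DEG:
        features.extend(radon_projection(img, angle))
    return features


# Example: a vertical stroke through the centre column of a 16x16 image.
image = [[1 if c == 7 else 0 for c in range(16)] for _ in range(16)]
print(len(radon_features(image)))  # 81
```

In this simplified form every projection sums to the number of ink pixels, which is a convenient sanity check on the binning.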



[Figure 5.20 plots: ‘Radon features for character 'ha'’ and ‘Radon features for character 'pha'’. Each has panel (a) and panel (b); the plot shows R on a scale of 0 to 20 against xp from -13 to 13, with three series: 0 degrees, 45 degrees and 90 degrees.]

Figure 5.20 Radon features: (a) original image, (b) its features at 0, 45 and 90 degrees

Finally, the normalized pixel density features for the respective character images are shown in Figure 5.21. Here 35 features are obtained by partitioning the 70x50 character into 35 non-overlapping blocks and counting the number of zeros (zero-valued pixels) in each block. Each of the three feature sets is applied separately to the neural networks built for each of the 24 classes, and the results are analyzed.
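The normalized pixel density features can be sketched in Python as follows. The sketch assumes 10x10 blocks, which is the square block size that tiles a 70x50 image into exactly 35 non-overlapping blocks, and it assumes that normalization means dividing the count of zeros by the block area; the section states only that zeros are counted. The function name and the row-by-row block order are hypothetical.

```python
def pixel_density_features(img, block=10):
    """Fraction of zero-valued pixels in each non-overlapping block.

    For a 70x50 image and 10x10 blocks this returns 7 * 5 = 35 values.
    """
    rows, cols = len(img), len(img[0])
    if rows % block or cols % block:
        raise ValueError("image size must be a multiple of the block size")
    features = []
    for r0 in range(0, rows, block):
        for c0 in range(0, cols, block):
            zeros = sum(1
                        for r in range(r0, r0 + block)
                        for c in range(c0, c0 + block)
                        if img[r][c] == 0)
            features.append(zeros / (block * block))  # assumed normalization
    return features


# Example: a 70x50 image of ones with the first block set to zero.
image = [[1] * 50 for _ in range(70)]
for r in range(10):
    for c in range(10):
        image[r][c] = 0
features = pixel_density_features(image)
print(len(features), features[0])  # 35 1.0
```

Because each value is a fraction between 0 and 1, the 35 features are directly comparable across blocks regardless of stroke thickness in any one block.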

• Neural network results

Table 5.6 Neural network parameter settings

Parameter | Value
Number of inputs | With Euclidean features: 32; with Radon features: 81; with normalized pixel density features: 35
Number of hidden layers | 1
Number of neurons in hidden layer | Equal to number of inputs
Hidden layer activation function | Log-sigmoid transfer function
Number of neurons in output layer | Number of characters in the structural class
Output layer activation function | Linear
Learning rate | 0.5
Goal | 0.001
Error function | mse
Maximum number of epochs | 1000
Training algorithm | Gradient descent backpropagation

Figure 5.21 Normalized pixel density features: (a) original image, (b) its features

A separate neural network is built for each of the 24 classes, giving 24 networks in all. Each network is trained using two thirds of the characters in the database, while the remaining one third is used for testing. During testing, the character under test is passed to the one network, out of the 24, that corresponds to the class it is assigned by structural classification. The number of input neurons equals the number of features extracted, and the number of output neurons equals the number of characters in the structural class, so the number of output neurons differs across the 24 classes. The parameter settings of the neural network are given in Table 5.6.
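The sizing rule and the data split described above can be written down in a few lines of Python. This is a minimal sketch with hypothetical names; in particular, the section does not say how the two thirds used for training are chosen, so taking the first two thirds of the list is an assumption made only for illustration.

```python
# Number of inputs per feature set, from Table 5.6.
N_FEATURES = {"euclidean": 32, "radon": 81, "pixel_density": 35}


def layer_sizes(feature_set, n_characters_in_class):
    """Return (inputs, hidden neurons, outputs) for one class network.

    The hidden layer has as many neurons as there are inputs, and the
    output layer has one neuron per character in the structural class.
    """
    n_in = N_FEATURES[feature_set]
    return (n_in, n_in, n_characters_in_class)


def split_two_thirds(samples):
    """Split samples into (training, testing) in a 2:1 ratio.

    Assumption: a simple positional split; the selection rule is not
    stated in the text.
    """
    cut = (2 * len(samples)) // 3
    return samples[:cut], samples[cut:]


# Example: the NBNE/11 class holds 27 characters (Table 5.4).
print(layer_sizes("pixel_density", 27))  # (35, 35, 27)
train, test = split_two_thirds(list(range(9)))
print(len(train), len(test))             # 6 3
```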

Results showed that some characters which exhibit large shape variations get classified into many structural classes. For example, the samples of one character were classified into 13 different classes out of 24 during database creation. The recognition rate for such characters is comparatively lower: because their shape varies so widely, their samples are spread over many classes and fewer samples are available in the database for each of them. The characters with smaller shape variations give a recognition rate of more than 98%.

Table 5.7 Recognition performance without structural classification

Sr. no. | Feature extraction technique | Resize factor | Number of features | Recognition technique | Training time (sec) | Recognition rate (%)
1. | Radon features | 16x16 | 81 | Neural network | 135.15 | 81.88
2. | Euclidean features | 16x16 | 32 | Neural network | 194.50 | 82.64
3. | Normalized pixel density features | 70x50 | 35 | Neural network | 160.97 | 84.00

Table 5.8 Recognition performance with structural classification

Sr. no. | Feature extraction technique | Resize factor | Number of features | Recognition technique | Training time (sec) | Recognition rate (%)
1. | Radon features | 16x16 | 81 | Neural network | 36.05 | 89.00
2. | Euclidean features | 16x16 | 32 | Neural network | 74.57 | 90.14
3. | Normalized pixel density features | 70x50 | 35 | Neural network | 54.63 | 91.54



The recognition performance using the various features and the multilayer perceptron without structural classification is presented in Table 5.7. The table also shows the resize factor, the number of features obtained with each technique, the time required to train the single large network (with the number of inputs equal to the number of features extracted and the number of outputs equal to 39), and the recognition rate.

Table 5.8 gives the recognition performance with structural classification, which results in 24 classes and hence 24 neural networks. Each network again has as many inputs as there are extracted features, but its number of outputs equals the number of characters in the respective structural class. The table indicates that structural classification improves the recognition performance considerably. The time required to test a character is approximately 0.05 sec on an Intel Core 2 Duo CPU running at 2 GHz with 2 GB RAM.
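The comparison between Table 5.7 and Table 5.8 can be checked with a few lines of arithmetic. The Python sketch below uses only the values reported in the two tables; the dictionary and function names are hypothetical, and the training-time ratio is simply the quotient of the two reported times, since the section does not state how the time in Table 5.8 is aggregated over the 24 networks.

```python
# (training time in sec, recognition rate in %) from Tables 5.7 and 5.8
WITHOUT_STRUCTURAL = {
    "Radon": (135.15, 81.88),
    "Euclidean": (194.50, 82.64),
    "Normalized pixel density": (160.97, 84.00),
}
WITH_STRUCTURAL = {
    "Radon": (36.05, 89.00),
    "Euclidean": (74.57, 90.14),
    "Normalized pixel density": (54.63, 91.54),
}


def compare():
    """Return {feature: (rate gain in percentage points, time ratio)}."""
    result = {}
    for name, (t_before, rate_before) in WITHOUT_STRUCTURAL.items():
        t_after, rate_after = WITH_STRUCTURAL[name]
        gain = round(rate_after - rate_before, 2)
        ratio = round(t_before / t_after, 2)
        result[name] = (gain, ratio)
    return result


for name, (gain, ratio) in compare().items():
    print(f"{name}: +{gain:.2f} points, training time ratio {ratio:.2f}")
```

On the tabulated values, the gain in recognition rate is between 7 and 8 percentage points for all three feature sets, and the reported training time drops by a factor of roughly 2.6 to 3.7.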

• Recognized character results

The index of the output neuron with maximum value is used to find the

recognized character. The recognized character is displayed as text using Kiran font.
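The decision rule above, together with the network structure of Table 5.6 (one log-sigmoid hidden layer and a linear output layer), can be sketched in Python as follows. This is a minimal forward-pass illustration only: the weights are hand-picked toy values rather than trained ones, the transliterated character names are placeholders, and all function names are hypothetical.

```python
import math


def logsig(x):
    """Log-sigmoid transfer function used in the hidden layer."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(features, w_hidden, b_hidden, w_out, b_out):
    """One hidden log-sigmoid layer followed by a linear output layer."""
    hidden = [logsig(sum(w * x for w, x in zip(ws, features)) + b)
              for ws, b in zip(w_hidden, b_hidden)]
    return [sum(w * h for w, h in zip(ws, hidden)) + b
            for ws, b in zip(w_out, b_out)]


def recognize(features, network, class_characters):
    """Return the character whose output neuron has the maximum value."""
    outputs = forward(features, *network)
    best = max(range(len(outputs)), key=outputs.__getitem__)
    return class_characters[best]


# Toy network: 2 inputs, 2 hidden neurons, 3 outputs (illustrative weights).
toy_network = (
    [[10.0, 0.0], [0.0, 10.0]],               # hidden weights
    [-5.0, -5.0],                             # hidden biases
    [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],     # output weights
    [0.0, 0.0, 0.0],                          # output biases
)
characters = ["ka", "kha", "ga"]              # placeholder class members
print(recognize([1.0, 0.0], toy_network, characters))  # ka
```

In the multistage system the same rule would be applied to whichever of the 24 networks the structural class selects, with the list of characters taken from that class.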

5.6 Concluding remarks

The recognition of handwritten Marathi characters is quite a challenging task. The single stage recognition technique fails to give satisfactory performance. Hence a multistage recognition system is designed to meet the challenges. This system improves the performance considerably over the single stage classification system, as indicated in Table 5.7 and Table 5.8. With Radon features the recognition rate improves by 7.12 percentage points (81.88% to 89.00%), with Euclidean features by 7.50 points (82.64% to 90.14%), and with normalized pixel density features by 7.54 points (84.00% to 91.54%). Since the Radon features are not invariant to the orientation of the character, the recognition rate with Radon features is the lowest in both classification systems. The normalized pixel density features yield the highest recognition accuracy, 91.54%, in the multistage recognition system.
