handwritten marathi character recognition using neural network
CHAPTER 5
HANDWRITTEN MARATHI CHARACTER RECOGNITION USING NEURAL NETWORK
A Neural Network Based Handwritten Character Recognition for Marathi Script
5. Handwritten Marathi Character Recognition using Neural Network 65
5.1 Introduction
Handwritten character recognition is gaining popularity because of its potential
applications, which reduce the effort of data entry and save time in tasks such as form
filling, postal automation and banking. Developing such a system is challenging, however,
because the shape of a character varies with the writer, the physical and mental
condition of the writer, the acquisition device, the pen width, the ink color and many
other factors. Moreover, handwritten Marathi characters tend to be more complex because of
their structure, shape and the presence of modifiers, as discussed earlier. These factors
demand a pattern recognition system that addresses the challenges at each of its stages.
This chapter discusses the development of a handwritten Marathi character recognition
system using neural networks. Section 5.2 discusses the need for a neural network.
Section 5.3 describes the single stage recognition system, while Section 5.4 describes the
multistage recognition system. Experiments and results are presented in Section 5.5, and
Section 5.6 presents the concluding remarks.
5.2 Need for neural network
The study of handwritten numeral recognition in Chapter 4 gave encouraging results, which
motivated the adoption of neural networks as the recognition tool for handwritten Marathi
characters. Although the neural network required a long training time, testing of the
characters was very fast.
5.3 Single stage recognition system
The block diagram of the system built in the initial phase is shown in Figure 5.1.
Here, character images were scanned one by one, manually cropped to a fixed dimension
(for example 52x52), and stored in the database. A binarization operator then converted
the gray scale character image into a binary image before feature extraction. Various
features, such as standard deviation, Euclidean distance and Radon features, were
extracted and applied to the neural network during both training and testing. The neural
network had to be trained for a large number of characters and their samples. The output
of the neural network gave the recognized character. The recognition rate of the single
stage system was not satisfactory, because the number of characters to be classified was
large and there were few training samples per character. Moreover, the images in the
database required a large amount of memory, and the large neural network that had to be
built took a long time to train.
Figure 5.1 Single stage Marathi character recognition system (block diagram: Character
image → Pre-processing → Feature extraction → Neural network → Recognized character)
To overcome these limitations and to improve the recognition rate, a multistage
recognition system was developed. This system segments and crops the characters
automatically. A two stage structural classifier then classifies the pre-processed
characters based on their structural features and stores them in the database
automatically. In this way 24 structural classes are obtained. The final recognition is
done by applying the features derived from the characters to a neural network. A separate
neural network is built for each of the 24 classes, and a character under test is
recognized by the network of the structural class to which it belongs. The next section
discusses the multistage recognition system in detail.
5.4 Multi stage recognition system
The proposed system for recognizing handwritten Marathi characters is shown in
Figure 5.2. The blocks of the system are discussed in detail in the following subsections.
Unlike the system discussed in the previous section, this system adopts a multistage
recognition scheme. Here, the characters are first pre-classified into groups based on
the similarity of their structural features, and each character is then recognized by
the neural network built for the class to which it belongs.
Figure 5.2 Multi stage recognition system (block diagram: Character image →
Pre-processing → Segmentation → Structural classification → Image resize → Feature
extraction → Neural network → Recognized character)
5.4.1 Data collection
The system is designed to recognize the 39 handwritten Marathi characters shown
in Figure 5.3. The pages used for training and testing are scanned with a flatbed scanner
at 300 dpi and stored in bmp file format.
The handwritten Marathi character dataset was collected from more than 10 writers.
About 100 samples were collected per character, resulting in more than 4000 character
samples in the database.
5.4.2 Pre-processing
Pre-processing plays an important role in handwritten character recognition, as in
any other pattern recognition task. Improper selection of parameters during
pre-processing may alter the shape of the character and eventually affect the
recognition rate. The following algorithms are implemented in this stage to obtain a
binary image while keeping the shape characteristics intact as far as possible.
Figure 5.3 Characters used in the proposed system
• Binarization
A point operator converts the gray scale character images to binary. This operator
separates the pixels whose values lie within a specified range, i.e. the object, from the
rest, i.e. the background. This is done by choosing a threshold that separates the object
from the background. Here, the threshold is chosen by uniform thresholding after
normalization. In uniform thresholding, pixels above the threshold are set to white and
those below the threshold are set to black. Uniform thresholding requires knowledge of
the gray levels; otherwise the target features may not be selected or may be
misclassified after thresholding. The handwritten characters were therefore tested and
checked for the global features at various threshold values before a threshold was
finalized. On testing about one third of the characters in the database, a normalized
threshold value of 0.85 was found to be optimum, giving correct selection of the global
features in most cases.
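The uniform thresholding step can be illustrated with a minimal Python sketch. It assumes 8-bit gray levels that are normalized by dividing by 255; the function name `binarize` and the sample patch are hypothetical and are not taken from the original implementation.

```python
def binarize(gray, threshold=0.85):
    """Normalize 8-bit gray levels to [0, 1]; pixels above the threshold become 1 (white)."""
    return [[1 if (p / 255.0) > threshold else 0 for p in row] for row in gray]

# Hypothetical 3x4 gray scale patch: dark ink (low values) on light paper.
gray = [
    [250, 240,  30, 245],
    [235,  20,  25, 230],
    [210, 225, 238, 255],
]
binary = binarize(gray)
for row in binary:
    print(row)
```

Note that the pixel with gray level 210 (about 0.82 after normalization) falls below 0.85 and is set to black, which shows how a high threshold keeps faint ink as part of the object.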
• Averaging
When a character is handwritten, the stroke is often thinner at a curvature than in other
parts of the character. Such a point is likely to break during binarization. Hence, a 3x3
averaging operator is applied before binarization; it blurs the image, which bridges
small gaps and retains the actual shape of the character.
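A minimal sketch of the 3x3 averaging operator is given below. The border handling (averaging only over the neighbours that exist) and the name `average_3x3` are assumptions made for illustration.

```python
def average_3x3(gray):
    """Replace each pixel by the mean of its 3x3 neighbourhood.

    Border pixels are averaged over the neighbours that exist (an assumption).
    """
    rows, cols = len(gray), len(gray[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [gray[i][j]
                      for i in range(max(0, r - 1), min(rows, r + 2))
                      for j in range(max(0, c - 1), min(cols, c + 2))]
            out[r][c] = sum(window) / len(window)
    return out

# A dark stroke (0) with a one-pixel light gap (255) in the middle.
stroke = [
    [255, 255, 255, 255, 255],
    [  0,   0, 255,   0,   0],
    [255, 255, 255, 255, 255],
]
smoothed = average_3x3(stroke)
print(smoothed[1][2])
```

After averaging, the gap pixel drops from 255 to roughly 198, which is below 0.85 x 255, so a subsequent binarization at the 0.85 threshold would treat it as ink and bridge the gap.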
• Opening
Handwritten characters show various undesirable effects, such as unwanted strokes,
gaps or breaks, which appear after binarization. The unwanted strokes occur most often
between the pen lifting and pen placing points, and their occurrence depends upon the
writing style and the ink viscosity. These strokes may result in unwanted feature
detection after binarization. To avoid this, the binarized image is cleaned using the
morphological opening operator. Morphological opening removes thin protrusions, breaks
thin connections and smooths the object contour. The morphological opening of an image I
by a structuring element B is the erosion of I by B followed by the dilation of the
result by B, as given in equation (5.1). Here, the opening is set up to remove all
objects smaller than 40 pixels, using 8-connectivity.
$$ I \circ B = (I \ominus B) \oplus B \tag{5.1} $$
where $I \circ B$ denotes the opening of the image I with the structuring element B.
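Equation (5.1) can be sketched in a few lines of Python. The example assumes that foreground pixels are 1, that B is a 3x3 square, and that pixels outside the image count as background; it does not implement the 40-pixel area criterion mentioned above. All function names are hypothetical.

```python
OFFSETS = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]  # 3x3 square B

def _get(img, r, c):
    """Pixel value, treating everything outside the image as background (0)."""
    return img[r][c] if 0 <= r < len(img) and 0 <= c < len(img[0]) else 0

def erode(img):
    """A pixel survives only if B, centred on it, fits entirely in the foreground."""
    return [[int(all(_get(img, r + dr, c + dc) for dr, dc in OFFSETS))
             for c in range(len(img[0]))] for r in range(len(img))]

def dilate(img):
    """A pixel is set if B, centred on it, touches any foreground pixel."""
    return [[int(any(_get(img, r + dr, c + dc) for dr, dc in OFFSETS))
             for c in range(len(img[0]))] for r in range(len(img))]

def opening(img):
    """Erosion followed by dilation, as in equation (5.1)."""
    return dilate(erode(img))

# A 3x3 blob with a thin one-pixel protrusion to the right.
img = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 1],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
opened = opening(img)
for row in opened:
    print(row)
```

The blob is preserved while the thin protrusion disappears, which is the behaviour the text attributes to opening.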
5.4.3 Segmentation
Isolated handwritten Marathi characters are written on plain pages. During
training, the page images are pre-processed, and the characters are segmented and
stored in the database after structural classification. During testing, the pages are
again pre-processed, segmented, structurally classified and recognized automatically.
The lines and characters are written in such a way that they do not overlap, and they are
segmented using horizontal and vertical projection profiles. Peaks of the projection
profiles separate the lines and the characters in the document, which gives the number of
lines and the number of characters in each line. An array of size 1 x k is created to
store the upper left and lower right corner coordinates of each character, where k is the
total number of characters in the document.
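The projection profile segmentation can be sketched as follows. This is an illustration under stated assumptions: ink pixels are 1, and completely empty rows and columns separate lines and characters. The helper names `runs` and `segment` are hypothetical, and each box uses the top and bottom of its text line rather than a tight vertical fit.

```python
def runs(profile):
    """Return (start, end) index pairs of consecutive non-zero entries in a profile."""
    spans, start = [], None
    for i, v in enumerate(profile):
        if v and start is None:
            start = i
        elif not v and start is not None:
            spans.append((start, i - 1))
            start = None
    if start is not None:
        spans.append((start, len(profile) - 1))
    return spans

def segment(img):
    """Bounding boxes (top, left, bottom, right) of the characters on a page."""
    boxes = []
    row_profile = [sum(row) for row in img]            # horizontal projection
    for top, bottom in runs(row_profile):              # one span per text line
        col_profile = [sum(img[r][c] for r in range(top, bottom + 1))
                       for c in range(len(img[0]))]    # vertical projection of the line
        for left, right in runs(col_profile):
            boxes.append((top, left, bottom, right))
    return boxes

# Tiny page: two characters on the first line, one on the second.
page = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1, 0],
    [0, 1, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
boxes = segment(page)
print(boxes)
```

The returned list plays the role of the 1 x k array of corner coordinates described above, with k equal to `len(boxes)`.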
Marathi characters have a header line on top. The header lines of all the
characters in a word join, forming a continuous line on top of the word. This makes it
easier to separate the characters in a word. If a header line is drawn on the
character or word, the horizontal projection profile of the image is computed to remove
it. The row max_row with the maximum number of black pixels, max_count, is obtained. The
width of the header line is then computed by finding its starting row, start_head, and
its ending row, end_head. The rows corresponding to 50% of max_count on the upper and
lower sides of max_row are taken as start_head and end_head respectively. This
percentage was fixed after analyzing about one third of the images in the database. The
image is then cleaned again with the opening operator to remove any remains of the header
line. Characters without header line are used for database creation.
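One possible reading of the header line removal is sketched below. It assumes ink pixels are 1 and interprets the 50% rule as "extend the header upwards and downwards from max_row while the row count stays at or above half of max_count"; this interpretation and the function name `remove_header_line` are assumptions, not the original code.

```python
def remove_header_line(img, fraction=0.5):
    """Blank the rows of the header line; return the cleaned image and the row range."""
    profile = [sum(row) for row in img]          # horizontal projection profile
    max_count = max(profile)
    max_row = profile.index(max_count)
    start_head = max_row
    while start_head > 0 and profile[start_head - 1] >= fraction * max_count:
        start_head -= 1
    end_head = max_row
    while end_head < len(img) - 1 and profile[end_head + 1] >= fraction * max_count:
        end_head += 1
    cleaned = [[0] * len(row) if start_head <= r <= end_head else list(row)
               for r, row in enumerate(img)]
    return cleaned, start_head, end_head

# Two-row header line above two vertical strokes.
word = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
]
cleaned, start_head, end_head = remove_header_line(word)
print(start_head, end_head)
```

In this example the rows with 6 and 5 ink pixels are removed, while the rows with only 2 ink pixels (the strokes below the header) are kept.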
5.4.4 Structural classification
The large Marathi character set, together with the wide range of variations in
writing style, demands a pre-classification of the characters before the final
recognition. The pre-classification is a two stage classification based upon the
structural features. These stages are
1. Detection of global features,
2. Detection of local features.
The first stage classifies the characters using global features: the presence of a
vertical line in the character, its position in the character and the presence of holes
(enclosed regions). These features classify the characters coarsely into six classes.
The detection of global features is followed by the detection of local features, which
further divides each of the six classes into four classes. The local features are more
character specific than the global features.
• Detection of global features
Global features used for classifying the characters at first stage include:
1. Presence of vertical bar in the character,
2. Position of the vertical bar
3. Presence of the enclosed region.
About 60% of Devanagari characters exhibit a vertical line. This vertical line is at
the center in two of the characters, while in the rest it is towards the end. The
remaining 40% of the characters do not have a vertical line. The other feature, the
enclosed region, is present in approximately 56% of the characters. The figure is
approximate because it depends on the writing style of the individual writer.
The following algorithms are implemented to detect whether these features are present in
the character.
Detection of vertical bar in the character: The vertical projection profile of the
character image f(m,n) is calculated in order to find the column with the maximum number
of pixels, $n_{max}$. The average height of the vertical bar is taken to be 85 percent of
the total height of the image. This value is set as a threshold $T_V$ for the presence of
a vertical bar in a character. Thus if
$$ n_{max} \geq T_V \tag{5.2} $$
then a vertical bar is said to be present; otherwise there is no vertical bar in the
character.
Detection of position of vertical bar in the character: If a vertical bar is detected,
its location is found so that the character can be classified further according to the
location of the bar within the character. An average threshold $T_M$ of 30 percent is
set for the position of the vertical bar in the character. If
$$ T \geq T_M \tag{5.3} $$
then the vertical bar is towards the center, otherwise it is towards the end, where
$$ T = \frac{n - n_{max}}{n} \times 100 \tag{5.4} $$
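The two tests can be sketched in Python under explicit assumptions, since the text uses the symbol n_max both for a pixel count and for a column position. The sketch assumes that equation (5.2) compares the largest column count with 85% of the image height, and that in equation (5.4) n is the image width and n_max is the 1-based index of that column. The names `vertical_bar` and `bar_image` are hypothetical.

```python
def vertical_bar(img, tv_percent=85, tm_percent=30):
    """Return 'none', 'mid' or 'end' for a binary image with ink pixels equal to 1."""
    height, width = len(img), len(img[0])
    profile = [sum(img[r][c] for r in range(height)) for c in range(width)]
    count = max(profile)                          # tallest column of ink
    if count < tv_percent / 100 * height:         # equation (5.2) not satisfied
        return "none"
    col = profile.index(count) + 1                # assumed: 1-based column of the bar
    t = (width - col) / width * 100               # equation (5.4)
    return "mid" if t >= tm_percent else "end"    # equation (5.3)

def bar_image(col, size=10):
    """Hypothetical test image: blank square with a full-height bar in one column."""
    return [[1 if c == col else 0 for c in range(size)] for _ in range(size)]

print(vertical_bar(bar_image(8)))   # bar near the right edge
print(vertical_bar(bar_image(4)))   # bar near the middle
```

With a bar in the ninth of ten columns, T is 10 percent, which is below the 30 percent threshold, so the bar is reported as an end bar; a bar in the fifth column gives T equal to 50 percent and is reported as a mid bar.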
Table 5.1 First stage structural classification
                                 Global features
Class                            Mid bar   End bar   Enclosed region
No bar enclosed (NBE)            0         0         1
No bar not enclosed (NBNE)       0         0         0
Mid bar enclosed (MBE)           1         0         1
Mid bar not enclosed (MBNE)      1         0         0
End bar enclosed (EBE)           0         1         1
End bar not enclosed (EBNE)      0         1         0
Detection of presence of enclosed region in the character: Here, 8-adjacency is used to
find the connected components and hence the enclosed regions. Two foreground
pixels p and q are said to be connected if there exists an 8-connected path between them
consisting entirely of foreground pixels. Table 5.1 shows the classification of characters
based upon these global features.
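One common way to test for an enclosed region is sketched below. It is a substitute technique chosen for illustration, not necessarily the procedure used here: the background is flood-filled from the image border using 4-connectivity (the complement of 8-connected foreground), and any background pixel that is never reached is treated as a hole. The name `has_enclosed_region` is hypothetical.

```python
from collections import deque

def has_enclosed_region(img):
    """True if some background pixel (0) cannot be reached from the image border."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    # Start from every background pixel that lies on the border.
    queue = deque((r, c) for r in range(rows) for c in range(cols)
                  if (r in (0, rows - 1) or c in (0, cols - 1)) and img[r][c] == 0)
    for r, c in queue:
        seen[r][c] = True
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):   # 4-connected background
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and not seen[nr][nc] and img[nr][nc] == 0:
                seen[nr][nc] = True
                queue.append((nr, nc))
    return any(img[r][c] == 0 and not seen[r][c]
               for r in range(rows) for c in range(cols))

ring = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]        # closed loop of ink
open_shape = [[1, 1, 1], [1, 0, 0], [1, 1, 1]]  # loop with an opening on the right
print(has_enclosed_region(ring), has_enclosed_region(open_shape))
```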
• Detection of local features
The local features used here are:
1. Presence of endpoints in the lower part of the character.
To find this feature, the binary image f(m,n) is first thinned to yield a single
pixel wide character. The thinned character is then passed to the hit-or-miss
transformation to find its endpoints. Eight structuring elements are used to detect
endpoints in all eight directions. The image is then partitioned into four quadrants as
shown in Figure 5.4. A vector V = [V1 V2 V3 V4] is defined, where V1, V2, V3 and V4 are
set or reset to indicate the presence or absence of endpoints in quadrants 1, 2, 3 and 4
respectively. Only quadrants 3 and 4 are of interest here. The combination of the values
of V4 and V3 classifies the character into one of four classes, 00, 01, 10 and 11. Class
00 indicates that there is no endpoint in quadrants 3 and 4, class 01 indicates that
there is an endpoint in quadrant 3 and no endpoint in quadrant 4, and so on. Table 5.2
shows this classification.
Table 5.2 Second stage structural classification
         Local features
Class    End point in quadrant 4    End point in quadrant 3
00       Absent                     Absent
01       Absent                     Present
10       Present                    Absent
11       Present                    Present
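The second stage can be sketched as follows. Two assumptions are made: endpoints are found by counting 8-neighbours on the thinned image (a pixel with exactly one neighbour), which is a simpler stand-in for the hit-or-miss transformation with eight structuring elements, and quadrants 3 and 4 are taken to be the lower-left and lower-right quarters of the image, since Figure 5.4 is not reproduced here. Function names are hypothetical.

```python
def endpoints(skel):
    """Pixels of a one-pixel-wide skeleton that have exactly one 8-neighbour."""
    rows, cols = len(skel), len(skel[0])
    pts = []
    for r in range(rows):
        for c in range(cols):
            if skel[r][c]:
                neighbours = sum(skel[i][j]
                                 for i in range(max(0, r - 1), min(rows, r + 2))
                                 for j in range(max(0, c - 1), min(cols, c + 2))) - 1
                if neighbours == 1:
                    pts.append((r, c))
    return pts

def local_class(skel):
    """Two-bit class 'V4V3', ordered as in Table 5.2."""
    rows, cols = len(skel), len(skel[0])
    v3 = v4 = 0
    for r, c in endpoints(skel):
        if r >= rows / 2:          # lower half of the image only
            if c < cols / 2:
                v3 = 1             # assumed: quadrant 3 is lower-left
            else:
                v4 = 1             # assumed: quadrant 4 is lower-right
    return f"{v4}{v3}"

arch = [
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0],
]
stroke = [[0, 1, 0, 0] for _ in range(4)]
print(local_class(arch), local_class(stroke))
```

The arch has free ends in both lower quarters and falls into class 11, while the single stroke on the left has a lower endpoint only in the assumed quadrant 3 and falls into class 01.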
Figure 5.4 Character partitioning for end point detection
Figure 5.5 Two stage structural classification (decision tree: Stage 1 asks "Is vertical
line present?" and "Is enclosed region present?", leading through the NB (no bar), MB (mid
bar) and EB (end bar) branches to the classes NBNE, NBE, MBNE, MBE, EBNE and EBE; Stage 2
splits each of these six classes into the classes 00, 01, 10 and 11)
The entire two stage structural classification is shown in Figure 5.5. After this
classification, 24 classes are obtained which form the entire database.
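The way the two stages combine into 24 classes can be illustrated with a small lookup. The label format "NBE/01" follows the class names used in the tables of this chapter; the function name `structural_class` and the argument encoding are hypothetical.

```python
# Stage 1: (bar position, enclosed region present) -> global class, as in Table 5.1.
GLOBAL_CLASSES = {
    ("none", False): "NBNE", ("none", True): "NBE",
    ("mid", False): "MBNE",  ("mid", True): "MBE",
    ("end", False): "EBNE",  ("end", True): "EBE",
}

def structural_class(bar, enclosed, v4, v3):
    """Combine the stage 1 class with the stage 2 bits V4 and V3 (Table 5.2)."""
    return f"{GLOBAL_CLASSES[(bar, enclosed)]}/{v4}{v3}"

all_classes = [structural_class(bar, enclosed, v4, v3)
               for (bar, enclosed) in GLOBAL_CLASSES
               for v4 in (0, 1) for v3 in (0, 1)]
print(len(all_classes), structural_class("none", True, 0, 1))
```

Six global classes times four local classes give the 24 structural classes, each of which has its own neural network.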
5.4.5 Image resize
After the structural classification, the characters are resized to a fixed size, and the
resized image is used for feature extraction. Bicubic interpolation is used for
resizing, in which the output pixel value is a weighted average of the pixels in the
nearest 4-by-4 neighborhood. Figure 5.6 shows the resizing operation, where a binary
image of size 57x35 is resized to 16x16.
5.4.6 Feature extraction
The feature extraction stage in character recognition, as in any pattern recognition
task, plays a major role in improving the recognition accuracy [111]. The features are
extracted from the resized binary characters. Thus, the characteristics used for
recognition lie solely in the shape variations, the orientation of the character and the
position of the strokes in the character, and the selected features should capture these
properties. Hence, several feature extraction techniques that take these properties into
account are implemented. The resulting features are applied to the neural network and
the results are analyzed. The features extracted are:
1. Euclidean distance features
2. Radon features
3. Normalized pixel density features
Figure 5.6 Image resizing a) Cropped image, and b) Resized image
Here, the Euclidean distance features consider the distance between the strokes, the
Radon features take into account the orientation of the character, and the normalized
pixel density features consider the shape of the character. These feature extraction
techniques are explained next.
• Euclidean distance feature
To account for the distances between features within the character, the Euclidean
distance transform is computed. Distance transforms play a central role in the
comparison of binary images, particularly images resulting from local feature
detection techniques such as edge or corner detection. The distance between pixels can
be measured using the Euclidean distance transform [112], in which the value at a pixel is
linearly proportional to the Euclidean distance between that pixel and the object pixel
closest to it. The Euclidean distance $D_E$ between two pixels (i,j) and (k,l) is:
$$ D_E[(i,j),(k,l)] = \left[ (i-k)^2 + (j-l)^2 \right]^{1/2} \tag{5.5} $$
Thus the Euclidean distance provides a measure of the separation of points in the image:
it is the straight-line distance between two pixels. For a binary image, the transform
gives the distance between each pixel that is set to off (0) and the nearest nonzero
pixel, as shown in Figure 5.7. Consider the 5x5 binary image shown below:
bw =
0 0 0 0 0
0 1 0 0 0
0 0 0 0 0
0 0 0 1 0
0 0 0 0 0
The Euclidean distance transform of this image, computed with the instruction bwdist(bw), is:
D =
1.4142 1.0000 1.4142 2.2361 3.1623
1.0000 0 1.0000 2.0000 2.2361
1.4142 1.0000 1.4142 1.0000 1.4142
2.2361 2.0000 1.0000 0 1.0000
3.1623 2.2361 1.4142 1.0000 1.4142
Figure 5.7 Euclidean distance calculation a) Binary image,
and b) Euclidean distance
Here, each feature is the distance from that pixel to the nearest non-zero pixel. As
seen above, the dimensions of D are the same as those of bw. For an image of size 16x16,
the total number of Euclidean distance features is therefore 256. To reduce this
number, the horizontal and vertical profiles of the feature matrix D are calculated,
i.e. the features are added row-wise and column-wise. This gives 16 features for the
horizontal profile of D and 16 for the vertical profile. These 32 features are applied
to the neural network.
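The distance transform and its profile reduction can be reproduced with a brute-force Python sketch. It is not the bwdist implementation; it simply applies equation (5.5) to every pair of pixels and assumes the image contains at least one non-zero pixel. The function names are hypothetical.

```python
import math

def distance_transform(bw):
    """Distance from each pixel to the nearest non-zero pixel (brute force)."""
    on = [(r, c) for r, row in enumerate(bw) for c, v in enumerate(row) if v]
    return [[min(math.hypot(r - i, c - j) for i, j in on)
             for c in range(len(bw[0]))] for r in range(len(bw))]

def profile_features(d):
    """Row sums followed by column sums of the distance matrix."""
    row_sums = [sum(row) for row in d]
    col_sums = [sum(col) for col in zip(*d)]
    return row_sums + col_sums

# The 5x5 example image from the text.
bw = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
]
D = distance_transform(bw)
features = profile_features(D)
for row in D:
    print(" ".join(f"{v:.4f}" for v in row))
print(len(features))
```

For this 5x5 image the reduction yields 5 + 5 = 10 features; for a 16x16 image the same function yields the 32 features described above.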
• Radon feature
The Radon transform is used to compute the projection of an object in its image.
Applying the Radon transform to an image f(m,n) for a given set of angles can be thought
of as computing the projection of the image along those angles. The resulting
projection is the sum of the intensities of the pixels in each direction, i.e. a line
integral. A projection of a two-dimensional function f(x,y) is a set of line integrals.
The radon function computes the line integrals from multiple sources along parallel
paths, or beams, in a certain direction. The beams are spaced 1 pixel unit apart. To
represent an image, the radon function takes multiple parallel-beam projections of the
image from different angles by rotating the source around the center of the image.
Figure 5.8 shows a single projection at a specified rotation angle.
Figure 5.8 Parallel-beam projection at rotation angle theta
For example, the line integral of f(x,y) in the vertical direction is the projection of
f(x,y) onto the x-axis; the line integral in the horizontal direction is the projection of
f(x,y) onto the y-axis. Figure 5.9 shows horizontal and vertical projections for a simple
two-dimensional function.
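Projections can be computed along any angle θ. In general, the Radon transform
of f(x,y) is the line integral of f parallel to the y'-axis
$$ R_\theta(x') = \int_{-\infty}^{\infty} f(x'\cos\theta - y'\sin\theta,\; x'\sin\theta + y'\cos\theta)\, dy' \tag{5.6} $$
where
$$ \begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \tag{5.7} $$
Figure 5.10 illustrates the geometry of the Radon transform.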
The Radon transform is a mapping from Cartesian rectangular coordinates to a distance
and an angle, known as polar coordinates, in which each point corresponds to a
straight line in the spatial domain. Conversely, each point in the spatial domain becomes
a sine curve in the projection domain [113]. The Radon transform of an image I for the
angles specified in the vector theta can be computed using the radon function with the
syntax:
[R,xp] = radon(I, theta);
where the columns of R contain the Radon transform for each angle in theta, and the vector
xp contains the corresponding coordinates along the x'-axis. The center pixel of I is
defined to be floor((size(I) + 1)/2); this is the pixel on the x'-axis corresponding to
x' = 0. The algorithm first divides each pixel in the image into four subpixels and
projects each subpixel separately, as shown in Figure 5.11.
Each subpixel's contribution is split proportionally between the two nearest bins,
according to the distance between the projected location and the bin centers. If the
subpixel projection hits the center point of a bin, that bin gets the full value of
the subpixel, i.e. one-fourth of the value of the pixel. If the subpixel projection hits
the border between two bins, the subpixel value is split evenly between the bins.
Figure 5.9 Horizontal and vertical projections of a simple function
Figure 5.10 Geometry of the Radon transform
Figure 5.11 Radon transform calculation
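The idea of a parallel-beam projection can be illustrated with a deliberately simplified Python sketch. It is not the radon function: each pixel is projected as a whole onto the nearest bin along the x'-axis, using the first row of equation (5.7), instead of being divided into four subpixels whose values are split between bins. The centre convention, the bin count and the name `radon_projection` are assumptions.

```python
import math

def radon_projection(img, theta_deg):
    """Nearest-bin projection of an image onto the x'-axis at angle theta (degrees)."""
    rows, cols = len(img), len(img[0])
    cy, cx = (rows - 1) / 2, (cols - 1) / 2            # assumed image centre
    half = math.ceil(math.hypot(rows, cols) / 2) + 1   # enough bins for any angle
    bins = [0.0] * (2 * half + 1)
    t = math.radians(theta_deg)
    for r in range(rows):
        for c in range(cols):
            if img[r][c]:
                x, y = c - cx, cy - r                  # y axis points upwards
                xp = x * math.cos(t) + y * math.sin(t) # x' from equation (5.7)
                bins[math.floor(xp + 0.5) + half] += img[r][c]
    return bins

# 5x5 image with a full-height vertical bar in the fourth column.
img = [[0, 0, 0, 1, 0] for _ in range(5)]
p0 = radon_projection(img, 0)
p90 = radon_projection(img, 90)
print(max(p0), max(p90))
```

At 0 degrees all five bar pixels fall into the same bin, giving a single peak of 5, whereas at 90 degrees they spread over five bins with a value of 1 each. This difference is why the projections carry information about the orientation of strokes.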
• Normalized pixel density feature
First, the cropped binary character is stretched to a fixed dimension of 70x50
pixels. The resized character image is used to calculate the normalized pixel density
features. The character is partitioned into 35 non-overlapping zones of size 10x10
(100 pixels each), as shown in Figure 5.12.
The number of zero pixels s(x, y) in each zone is found, where x = 1, 2, …, 7 and
y = 1, 2, …, 5. The normalized pixel density features npd(x, y) are calculated as
$$ npd(x, y) = \frac{100 - s(x, y)}{100} \tag{5.8} $$
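The zone computation can be sketched in Python as follows, assuming a 70x50 binary image stored as a list of rows and 10x10 zones, so that each zone holds 100 pixels. The function name `npd_features` and the test image are hypothetical.

```python
def npd_features(img, zone=10):
    """Normalized pixel density of each zone: (pixels in zone - zero pixels) / pixels in zone."""
    rows, cols = len(img), len(img[0])
    feats = []
    for zr in range(0, rows, zone):
        for zc in range(0, cols, zone):
            s = sum(1 for r in range(zr, zr + zone)
                    for c in range(zc, zc + zone) if img[r][c] == 0)   # zero pixels
            feats.append((zone * zone - s) / (zone * zone))
    return feats

# Hypothetical 70x50 image whose five leftmost columns are non-zero.
img = [[1 if c < 5 else 0 for c in range(50)] for _ in range(70)]
feats = npd_features(img)
print(len(feats), feats[0], feats[1])
```

The image yields 7 x 5 = 35 features. Each zone in the first column of zones is half filled and gets the value 0.5, while all other zones get 0.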
Figure 5.12 Normalized pixel density calculation (a) Original image,
(b) Resized binary image, and (c) Character partitioning
5.4.7 Neural network design
Artificial neural networks are among the popular techniques used for
classification because of their learning and generalization abilities. They have
traditionally been used for handwritten character recognition in various languages and
scripts [114, 115], including Devanagari. Among the various architectures, the
multilayer perceptron (MLP) is widely used. The MLP is a fully connected network with an
input layer, hidden layers and an output layer, in which every neuron in a layer is
connected to every neuron in the next layer by a weighted link through which the state of
the neuron is transmitted. Each layer can have its own activation function and its own
number of neurons. Such a network is shown in Figure 5.13. It consists of an input
layer, a hidden layer and an output layer. The feature vector is applied from the input
layer as the input signal to the neurons in the hidden layer. The neurons are connected
to each other by links that are associated with weights. A bias is similar to a weight:
it acts exactly as a weight on a connection from a unit whose activation is always one.
Each neuron in the hidden layer includes a nonlinear activation function.
This operation is described by [116]:
$$ a^{m+1} = f^{m+1}\left( W^{m+1} a^{m} + b^{m+1} \right) \tag{5.9} $$
for m = 0, 1, …, M-1, where M is the number of layers in the network. The neurons in the
first layer receive external inputs or features:
$$ a^{0} = p, \tag{5.10} $$
which provides the starting point for the network. The outputs of the neurons in the last
layer are considered the network outputs:
$$ a^{M} = t \tag{5.11} $$
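The layer recursion can be illustrated with a small forward pass in plain Python. The weights, biases and activation functions below (a sigmoid hidden layer and a linear output layer) are made-up values for illustration and are not the network used in this work.

```python
import math

def sigmoid(n):
    return 1.0 / (1.0 + math.exp(-n))

def linear(n):
    return n

def layer(W, a, b, f):
    """One step of equation (5.9): f(W a + b), with W given as a list of rows."""
    return [f(sum(w * x for w, x in zip(row, a)) + bias) for row, bias in zip(W, b)]

def forward(p, layers):
    a = p                          # equation (5.10): the input starts the recursion
    for W, b, f in layers:         # equation (5.9) for m = 0, ..., M-1
        a = layer(W, a, b, f)
    return a                       # output of the last layer, equation (5.11)

# Hypothetical network: 2 inputs -> 2 hidden neurons -> 1 output.
layers = [
    ([[0.5, -0.5], [1.0, 1.0]], [0.0, -1.0], sigmoid),
    ([[1.0, -1.0]], [0.5], linear),
]
output = forward([1.0, 1.0], layers)
print(output)
```

For the input [1, 1] the hidden activations are sigmoid(0) = 0.5 and sigmoid(1), so the single output equals 1 - sigmoid(1), approximately 0.269.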
Once the network weights and biases are initialized, the network is ready for training.
The hidden neurons enable the network to learn complex tasks by extracting
progressively more meaningful features from the input vectors. A learning rule is a
procedure for modifying the weights and biases of a network; its purpose is to train
the network to perform a pattern recognition task. During training, the weights and
biases of the network are iteratively adjusted to minimize the error. The training
process requires a set of examples of proper network behavior, i.e. network inputs p and
target outputs t. As each input is applied to the network, the network output is compared
with the target. The error is calculated as the difference between the target output and
the network output, and the goal is to minimize the average of the sum of these errors.
The simplest implementation of back propagation learning updates the network weights and
biases in the direction in which the performance function decreases most rapidly, i.e.
the negative of the gradient. One iteration of this algorithm can be written as
$$ x_{k+1} = x_k - \alpha_k g_k \tag{5.12} $$
where $x_k$ is the vector of current weights and biases, $g_k$ is the current gradient,
and $\alpha_k$ is the learning rate.
Figure 5.13 Multilayer perceptron
This gradient descent algorithm can be implemented in two ways: incremental mode and
batch mode. In incremental mode, the gradient is computed and the weights are updated
after each input is applied to the network. In batch mode, the weights and biases are
updated only after the entire training set has been applied to the network; the
gradients calculated for the individual training examples are added together to
determine the change in the weights and biases. The batch steepest descent training
function is used here for training the network, so the weights and biases are updated in
the direction of the negative gradient of the performance function. The training
parameters associated with this type of training are:
• epochs
• goal
• learning rate
The learning rate is multiplied by the negative of the gradient to determine the
changes to the weights and biases. The larger the learning rate, the bigger the step. If
the learning rate is too large, the algorithm becomes unstable; if it is too small, the
algorithm takes a long time to converge. Training stops when the number of iterations
exceeds epochs or when the performance function drops below goal.
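The role of the three training parameters can be seen in a minimal batch steepest descent loop. The sketch trains a single linear neuron on made-up data and assumes the mean squared error as the performance function; it illustrates the update of equation (5.12) with a constant learning rate and is not the training function used in this work.

```python
def train_batch(samples, epochs=1000, goal=1e-6, lr=0.1):
    """Batch steepest descent for one linear neuron a = w*p + b."""
    w, b = 0.0, 0.0
    perf = float("inf")
    for _ in range(epochs):                       # stop 1: epoch limit reached
        errors = [t - (w * p + b) for p, t in samples]
        perf = sum(e * e for e in errors) / len(samples)
        if perf < goal:                           # stop 2: performance goal met
            break
        # Gradients accumulated over the whole training set (batch mode).
        gw = sum(-2 * e * p for e, (p, _) in zip(errors, samples)) / len(samples)
        gb = sum(-2 * e for e in errors) / len(samples)
        w, b = w - lr * gw, b - lr * gb           # step along the negative gradient
    return w, b, perf

# Hypothetical training set generated from t = 2p + 1.
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
w, b, perf = train_batch(samples, epochs=5000)
print(round(w, 2), round(b, 2))
```

With these settings the loop stops because the performance goal is reached well before the epoch limit, and the learned parameters are close to w = 2 and b = 1.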
5.5 Experiments and Results
A sufficient number of samples in the database ensures that the final
classification accuracy is independent of the structural classification results. There is
no limitation on the size of the database. The parameters for the structural
classification can be varied according to the writing style of the writer in the case of
fixed writer systems. The parameters for mid bar and end bar detection were selected
after testing about one third of the characters in the database.
• Pre-processing results
Pre-processing includes image cleaning and binarization. Figure 5.14 gives the
results of all the pre-processing algorithms applied to a character. The original image is
shown in Figure 5.14 (a). The binarization result obtained with a threshold value of 0.85
is shown in Figure 5.14 (b); the character breaks at the curvature. Hence an averaging
filter with a 3x3 mask is applied to bridge the gap, as seen in Figure 5.14 (c). Finally,
image opening removes the unwanted isolated stroke, whose area is less than 40 pixels, as
shown in Figure 5.14 (d).
Figure 5.14 Preprocessing results for a character (a) Original image, (b)
Binarized image, (c) Binarization after averaging, and (d) Opening result
Figure 5.15 Preprocessing results for a word (a) Original word, (b)
Binarization result, and (c) Binarization after Averaging result
The pre-processing results for a word are shown in Figure 5.15. Figure 5.15 (a)
shows the original image of the word, Figure 5.15 (b) shows the binarization result, and
Figure 5.15 (c) shows the result of binarization after averaging, which gives a smooth
boundary. Since there are no isolated dots or strokes in this image, the
opening operation gives the same result as Figure 5.15 (c). The result of applying all
the pre-processing algorithms to the document shown in Figure 5.16 (a) is given in
Figure 5.16 (b).
• Segmentation result
Figure 5.17 shows the segmentation results. First, the header line in the
pre-processed word image in Figure 5.17 (a) is removed, as shown in Figure 5.17 (b).
The characters are then segmented and cropped, as shown in Figure 5.17 (c), and
passed to the structural classification stage. Similarly, the pre-processed characters
(without header line) in the document shown in Figure 5.16 are segmented, cropped and
passed to the structural classification stage.
Figure 5.16 Preprocessing results for a document
(a) Original image, (b) Pre-processing result
Figure 5.17 Segmentation results (a) Pre-processed image,
(b) header line removal, (c) character segmentation
• Structural classification result
The pre-processed character is classified into one of 24 classes as discussed
earlier. The characters in the document shown in Figure 5.16 are segmented and
classified into various classes based upon their structural parameters as shown in Table
5.3.
Table 5.3 Structural classification of characters in the document in Figure 5.16
Sr. no. Class Characters classified
1. NBE/00
2. NBE/01
3. NBE/11
4. NBNE/00
5. NBNE/01
6. NBNE/10
7. NBNE/11
8. MBE/00
9. MBE/01
10. MBNE/01
11. EBE/01
12. EBE/11
13. EBNE/01
14. EBNE/11
As seen in row 1 of the table, three characters are classified as NBE/00 (no bar,
enclosed region, endpoint code 00). This means that in all these characters the vertical
bar is absent, an enclosed region is present and there are no endpoints in the lower
quadrants, hence Class 00. In row 2, the global features remain the same as in row 1, but
an endpoint is found in quadrant 3, hence an NBE character with Class 01. Similar
reasoning can be applied to all other characters in the table to find their final class
out of the 24 classes.
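The class names can be read mechanically, which also shows where the count of 24 comes from: three bar types, two enclosed-region states and four endpoint codes. The sketch below is illustrative only. Reading "MB" as "mid bar" is an assumption, since this section does not expand it, and so is reading the first digit of the endpoint code as quadrant 4 (the text only confirms that 01 marks an endpoint in quadrant 3).

```python
from itertools import product

# Global structural features encoded in the class name.
BAR = {
    "NB": "no vertical bar",
    "MB": "mid bar",            # assumed expansion of MB
    "EB": "end bar",
}
REGION = {"E": "enclosed region present", "NE": "no enclosed region"}
# Endpoint code for the lower quadrants. "01" = quadrant 3 follows the
# NBE/01 example in the text; "10" = quadrant 4 is an assumption.
ENDPOINTS = {
    "00": "no endpoints in the lower quadrants",
    "01": "endpoint in quadrant 3",
    "10": "endpoint in quadrant 4",
    "11": "endpoints in quadrants 3 and 4",
}


def all_classes():
    """Enumerate every structural class label, e.g. 'NBE/00'."""
    return [f"{bar}{region}/{code}"
            for bar, region, code in product(BAR, REGION, ENDPOINTS)]


def describe(label):
    """Split a label such as 'EBNE/11' into its three descriptions."""
    shape, code = label.split("/")
    bar, region = shape[:2], shape[2:]
    return BAR[bar], REGION[region], ENDPOINTS[code]


classes = all_classes()
print(len(classes))          # 3 bar types x 2 region states x 4 codes
print(describe("NBE/00"))
```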
The handwritten characters may take different shapes depending on the writing style of
the writer. This may result in the same character being classified into different classes.
Figure 5.18 shows the character ‘la’ written by four different writers. Here, the first
sample is classified to the class ‘NBNE/11’ (no vertical bar, no enclosed region, end
points in both the 3rd and 4th quadrants), the second is classified to ‘NBE/11’ (no
vertical bar, enclosed region present, end points in both the 3rd and 4th quadrants), the
third belongs to ‘EBNE/11’ (end bar present, no enclosed region, end points in both the
3rd and 4th quadrants), while the fourth is classified to ‘EBE/11’ (end bar present,
enclosed region present, end points in both the 3rd and 4th quadrants).
The above example shows that a character may fall into any class depending on the
writing style and the features detected. Table 5.4 lists all the classes in the database,
the number of characters in each class and the characters in it. In this table, the same
character can be found in many classes because of the variations in the writing style of
the writers.
Figure 5.18 Character ‘la’ assigned to different structural classes: (a) NBNE/11, (b) NBE/11, (c) EBNE/11, (d) EBE/11
Table 5.4 Characters in the structural class
Sr. no.  Class     No. of characters classified  Characters classified
1.       NBE/00    6
2.       NBE/01    24
3.       NBE/10    5
4.       NBE/11    16
5.       NBNE/00   3
6.       NBNE/01   21
7.       NBNE/10   8
8.       NBNE/11   27
9.       MBE/00    3
10.      MBE/01    5
11.      MBE/10    2
12.      MBE/11    5
13.      MBNE/00   1
14.      MBNE/01   2
15.      MBNE/10   0                             ____
16.      MBNE/11   1
17.      EBE/00    2
18.      EBE/01    16
19.      EBE/10    1
20.      EBE/11    14
21.      EBNE/00   1
22.      EBNE/01   18
23.      EBNE/10   3
24.      EBNE/11   19
• Feature extraction results
The characters are stored in the database after the two stage structural
classification discussed previously. The character is further resized to a fixed size. The
resized character is used for feature extraction. Here Euclidean distance features, Radon
features and normalized pixel density features are extracted. The feature extraction
technique, resize factor and number of features are given in Table 5.5.
Table 5.5 Feature extraction parameter settings
Sr. no.  Feature extraction technique       Resize factor  No. of features
1.       Euclidean distance transform       16x16          32 (16 in horizontal and 16 in vertical directions)
2.       Radon transform                    16x16          81 (27 features in each of the 0°, 45° and 90° directions)
3.       Normalized pixel density features  70x50          35 (character partitioned into 35 non-overlapping blocks of size 10x10)
Figure 5.19 shows the Euclidean distance features for the image in both the
horizontal and vertical directions. The 16 features in the horizontal direction and the
16 features in the vertical direction are appended to get 32 features in all.
[Plots: Euclidean features for characters 'ha' and 'pha'; series: vertical direction, horizontal direction; x-axis: feature index 1 to 16]
Figure 5.20 shows the Radon features for the respective images. The 27 features
given by R in each of the three directions, namely 0°, 45° and 90°, are appended
to get 81 features in all.
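How three 27-value projections combine into an 81-element vector can be illustrated as follows. This section does not say how R is computed, so the nearest-bin projection below is a simplification chosen for illustration and not the method used in this work. The ink convention (1 = ink) and the helper names are likewise assumptions.

```python
import math


def projection(img, angle_deg, n_bins=27):
    """Sum ink along parallel lines, one bin per unit of projected offset."""
    h, w = len(img), len(img[0])
    # Rounding keeps cos(90 degrees) at exactly 0 so that bin edges are stable.
    cos_t = round(math.cos(math.radians(angle_deg)), 12)
    sin_t = round(math.sin(math.radians(angle_deg)), 12)
    bins = [0] * n_bins
    for r in range(h):
        for c in range(w):
            x = c - (w - 1) / 2          # centred column coordinate
            y = (h - 1) / 2 - r          # centred row coordinate, up is positive
            offset = math.floor(x * cos_t + y * sin_t + 0.5)
            idx = offset + n_bins // 2   # the centre bin holds offset 0
            if 0 <= idx < n_bins:
                bins[idx] += img[r][c]
    return bins


def radon_features(img, angles=(0, 45, 90)):
    """Append the projections for each angle into one feature vector."""
    feats = []
    for angle in angles:
        feats.extend(projection(img, angle))
    return feats


# Toy 3x3 "L" shape; a real input here would be the 16x16 resized character.
toy = [
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 1],
]
feats = radon_features(toy)
print(len(feats))  # 3 directions x 27 bins
```

For a 16x16 image the projected offsets stay within the 27 bins at all three angles, which is consistent with the xp axis running from -13 to 13.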
Figure 5.19 Euclidean distance features: (a) original image, (b) its features in the vertical and horizontal directions
[Plots: Radon features for characters 'ha' and 'pha'; series: 0 degrees, 45 degrees, 90 degrees; axes: xp (-13 to 13) and R]
Finally, the normalized pixel density features for the respective character images
are shown in Figure 5.21. Here 35 features are obtained by partitioning the 70x50 sized
character into 35 non-overlapping blocks and counting the number of zeros in each block.
Each of these feature sets is applied separately to the neural networks built for each
of the 24 classes, and the results are analyzed.
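The block partitioning can be sketched as below. A 70x50 character split into 35 equal non-overlapping blocks gives 100 pixels per block, which the sketch takes as 10x10 blocks in a 7 by 5 grid. Treating 70 as the number of rows, treating 0 as the ink value, and normalizing by dividing each count by the block area are assumptions made for illustration.

```python
def pixel_density_features(img, block=10, normalize=True):
    """Count the zero-valued pixels in each non-overlapping block.

    img is a list of rows (assumed 70 rows x 50 columns, 0 = ink).
    With normalize=True each count is divided by the block area; that
    normalization rule is an assumption, not taken from the text.
    """
    rows, cols = len(img), len(img[0])
    if rows % block or cols % block:
        raise ValueError("image size must be a multiple of the block size")
    feats = []
    for r0 in range(0, rows, block):
        for c0 in range(0, cols, block):
            zeros = sum(1
                        for r in range(r0, r0 + block)
                        for c in range(c0, c0 + block)
                        if img[r][c] == 0)
            feats.append(zeros / (block * block) if normalize else zeros)
    return feats


# Toy character: white (1) background with ink (0) in half of the first block.
img = [[1] * 50 for _ in range(70)]
for r in range(10):
    for c in range(5):
        img[r][c] = 0
features = pixel_density_features(img)
print(len(features), features[0])
```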
Figure 5.20 Radon features: (a) original image, (b) its features at 0, 45 and 90 degrees
• Neural network results
Table 5.6 Neural network parameter settings
Parameters                           Values
Number of inputs                     With Euclidean features: 32
                                     With Radon features: 81
                                     With normalized pixel density features: 35
Number of hidden layers              1
Number of neurons in hidden layer    Equal to number of inputs
Hidden layer activation function     Log-sigmoid transfer function
Number of neurons in output layer    Number of characters in the structural class
Output layer activation function     Linear
Learning rate                        0.5
Goal                                 0.001
Error function                       mse
Maximum number of epochs             1000
Training algorithm                   Gradient descent backpropagation
Figure 5.21 Normalized pixel density features: (a) original image, (b) its features
A separate neural network is built for each of the 24 classes, giving 24 networks
in all. Each network is trained using two thirds of the characters in the database,
while the remaining one third is used for testing. During testing, the network used is
the one, out of the 24, for the class to which the character under test is assigned by
structural classification. The number of input neurons equals the number of features
extracted. The number of output neurons equals the number of characters in the structural
class, so it differs across the 24 classes. The parameter settings of the neural network
are given in Table 5.6.
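The per-class network layout can be sketched as follows. The sketch covers only the structure in Table 5.6 (one hidden layer of log-sigmoid units equal in number to the inputs, and linear outputs) and the selection of a network by structural class. Training by gradient descent backpropagation with the listed learning rate, goal and epoch limit is left out. The weight initialisation range, the helper names and the skipping of empty classes are assumptions made for illustration.

```python
import math
import random


def logsig(x):
    """Log-sigmoid transfer function."""
    return 1.0 / (1.0 + math.exp(-x))


class ClassNetwork:
    """One-hidden-layer network: log-sigmoid hidden units, linear outputs."""

    def __init__(self, n_inputs, n_outputs, rng):
        n_hidden = n_inputs  # hidden neurons equal to the number of inputs
        # Initial weights in [-0.5, 0.5] are an assumed choice.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
                   for _ in range(n_outputs)]
        self.b2 = [0.0] * n_outputs

    def forward(self, x):
        hidden = [logsig(sum(w * v for w, v in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        return [sum(w * h for w, h in zip(row, hidden)) + b
                for row, b in zip(self.w2, self.b2)]


def build_networks(class_sizes, n_features, seed=0):
    """One network per structural class; empty classes are skipped here."""
    rng = random.Random(seed)
    return {label: ClassNetwork(n_features, n_chars, rng)
            for label, n_chars in class_sizes.items() if n_chars > 0}


def split_two_thirds(samples):
    """First two thirds for training, the rest for testing."""
    cut = (2 * len(samples)) // 3
    return samples[:cut], samples[cut:]


# Three of the class sizes from Table 5.4, with 35 pixel density features.
class_sizes = {"NBE/00": 6, "NBE/01": 24, "MBNE/10": 0}
networks = build_networks(class_sizes, n_features=35)
scores = networks["NBE/00"].forward([0.0] * 35)  # network chosen by class
train, test = split_two_thirds(list(range(9)))
```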
Results showed that some characters which exhibit large shape variations get
classified into many structural classes. For example, the samples of one character were
classified into 13 different classes out of 24 during database creation. The recognition
rate for such characters is comparatively lower because their shape shows wide
variations and, as a result, the number of samples available in each class of the
database is small. The characters with smaller shape variations give a recognition rate
of more than 98%.
Table 5.7 Recognition performance without structural classification
Sr. no.  Feature extraction technique      Resize factor  Number of features  Recognition technique  Training time (sec)  Recognition rate (%)
1.       Radon features                    16x16          81                  Neural network         135.15               81.88
2.       Euclidean feature                 16x16          32                  Neural network         194.50               82.64
3.       Normalized pixel density feature  70x50          35                  Neural network         160.97               84.00
Table 5.8 Recognition performance with structural classification
Sr. no.  Feature extraction technique      Resize factor  Number of features  Recognition technique  Training time (sec)  Recognition rate (%)
1.       Radon feature                     16x16          81                  Neural network         36.05                89.00
2.       Euclidean features                16x16          32                  Neural network         74.57                90.14
3.       Normalized pixel density feature  70x50          35                  Neural network         54.63                91.54
The recognition performance using various features and the multilayer perceptron
without structural classification is presented in Table 5.7. It also shows the resize
factor, the number of features obtained with each technique, the recognition rate and
the time required to train the single large network, whose number of inputs equals the
number of features extracted and whose number of outputs equals 39.
Table 5.8 gives the recognition performance with structural classification,
which results in 24 classes and in turn 24 neural networks. The number of inputs of
each network again equals the number of features extracted, but the number of outputs
equals the number of characters in the respective structural class. The table indicates
that the implementation of structural classification improves the recognition rate
considerably. The time required to test a character is approximately 0.05 sec when
tested on an Intel Core 2 Duo CPU running at 2 GHz with 2 GB RAM.
• Recognized character results
The index of the output neuron with maximum value is used to find the
recognized character. The recognized character is displayed as text using Kiran font.
5.6 Concluding remarks
The recognition of handwritten Marathi characters is quite a challenging task.
The single stage recognition technique fails to give satisfactory performance. Hence a
multistage recognition system is designed to meet the challenges. This system improves
the performance considerably over the single stage classification system, as indicated in
Table 5.7 and Table 5.8. With Radon features the recognition rate improves by 7.12
percentage points (81.88% to 89.00%), with Euclidean features by 7.50 points (82.64% to
90.14%) and with normalized pixel density features by 7.54 points (84.00% to 91.54%).
Since the Radon features are not invariant to the orientation of the character, the
recognition rate with Radon features is the lowest in both classification systems. The
normalized pixel density features yield the highest recognition accuracy of 91.54% in
the multistage recognition system.
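The gain for each feature set can be recomputed directly from the recognition rates listed in Table 5.7 and Table 5.8. The short check below does only that subtraction. The dictionary names are illustrative.

```python
# Recognition rates (%) copied from Table 5.7 and Table 5.8.
single_stage = {"Radon": 81.88, "Euclidean": 82.64,
                "Normalized pixel density": 84.00}
multistage = {"Radon": 89.00, "Euclidean": 90.14,
              "Normalized pixel density": 91.54}

# Gain in percentage points, rounded to two decimals.
gains = {name: round(multistage[name] - single_stage[name], 2)
         for name in single_stage}

for name, gain in gains.items():
    print(f"{name}: +{gain:.2f} percentage points")
```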