Aocr Hmm Presentation

AOCR: Arabic Optical Character Recognition

ABDEL RAHMAN GHAREEB KASEM

ADEL SALAH ABU SEREEA

MAHMOUD ABDEL MONEIM ABDEL MONEIM

MAHMOUD MOHAMMED ABDEL WAHAB

Main contents

Introduction to AOCR

Feature extraction

Preprocessing

AOCR system implementation

Experimental results

Conclusion & future directions

Applications

Introduction

• Why AOCR?

• What is OCR?

• What is the problem in AOCR?

• What is the solution?

• Pre-segmentation.

• Auto-segmentation.



Preprocessing

1. Image rotation

2. Segmentation: line segmentation, word segmentation

3. Image enhancement

Preprocessing: 1. Image rotation

Problem of tilted image

Preprocessing: 1. Process rotated image

[Figure: the image rotated by -1, -2, -3 and -4 degrees]

[Figure: threshold effect; panels: clear zeros, mean value, 0.2 × mean value]

Gray scale vs. black/white in the rotation process

[Figure: panels: original image, gray scale, black/white]


Preprocessing

2. Segmentation.

What is the segmentation process?

Why do we need segmentation in Arabic OCR?

Which algorithm is used for segmentation?

Preprocessing: 2. Segmentation

[Figure: line-level segmentation]

[Figure: word-level segmentation]
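The slides do not state which segmentation algorithm was used. A common approach, assumed here purely for illustration, is the horizontal projection profile: count the ink pixels in every row and cut the page wherever a row is empty. The function name and the list-of-rows image format are hypothetical.

```python
def segment_lines(image):
    """Split a binary page (list of rows, 1 = ink, 0 = background) into text lines.

    Returns a list of (top, bottom) row ranges, with bottom exclusive.
    """
    # Horizontal projection: number of ink pixels in every row.
    profile = [sum(row) for row in image]
    lines, start = [], None
    for y, ink in enumerate(profile):
        if ink > 0 and start is None:
            start = y                      # a text line begins
        elif ink == 0 and start is not None:
            lines.append((start, y))       # an empty row closes the line
            start = None
    if start is not None:                  # a line touching the bottom edge
        lines.append((start, len(image)))
    return lines


page = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
]
print(segment_lines(page))  # [(1, 3), (4, 5)]
```

The same idea applied to the columns of one line image would give a rough word-level segmentation, provided the gaps between words are wider than the gaps inside words.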



Preprocessing: 3. Image enhancement

Noise reduction by morphological operations.

Very important note:

Apply image enhancement operations to small images, not to one large image.

[Figure: the same sample Arabic text shown as one large image (marked X) and as separate small images]



Feature Extraction

[Figure: sample Arabic text image]

• Feature Selection

The features should:

• Suit the HMM technique (i.e. window-scanning-based features).

• Suit word-level recognition (not character-level).

• Retain as much information as possible.

• Achieve high accuracy with a short processing time.

We select features such that:

• The previous points are satisfied.

• Each feature is designed around the slice technique.

[Figure: an Arabic word image divided into slices n1 to n7; each slice yields one feature vector]

Features describe whole words, not single characters, because the algorithm is based on a segmentation-free concept.

We avoid structural features because they are hard to implement and need a long processing time.

To achieve high accuracy with the lowest processing time, we use simple features and overlap adjacent slices to smooth the extracted data.

[Figure: overlapping slices on an Arabic word image]
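The slicing with overlap can be sketched as follows. This is a minimal illustration, assuming the image is a list of pixel rows and that the window and the overlap are whole numbers of pixel columns; the function names are hypothetical and not taken from the presentation.

```python
def slice_starts(width, window, overlap):
    """Left column of every slice when a window slides over `width` columns."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must be non-negative and smaller than the window")
    step = window - overlap                # columns advanced between slices
    return list(range(0, width - window + 1, step))


def slices(image, window, overlap):
    """Cut a binary image (list of rows) into vertical slices of `window` columns."""
    width = len(image[0])
    return [[row[x:x + window] for row in image]
            for x in slice_starts(width, window, overlap)]


print(slice_starts(7, 2, 1))  # [0, 1, 2, 3, 4, 5]
print(slice_starts(8, 4, 1))  # [0, 3]
```

Each slice then produces one feature vector, so the word image becomes a sequence of vectors that an HMM can consume frame by frame.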

(1) Background Count

Calculate the vertical distances (in pixels) of the background regions in each slice, where each background region is bounded by two foreground regions.

Example: slice of two pixels with one overlap.

[Figure: an Arabic word image with background distances d1, d2 and d3 between foreground regions; the feature vector of the selected slice is (d1, d2, d3)]

[Figure: feature plot]
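A minimal sketch of the background count for one slice is given below. It assumes a slice is a list of rows with 1 for foreground, and that a row counts as foreground if any of its pixels is set; that rule for slices wider than one pixel is an assumption, as is the function name.

```python
def background_count(slice_):
    """Lengths of the background runs enclosed between foreground pixels, top to bottom."""
    # Collapse the slice to one column: a row is foreground if any pixel is set.
    column = [1 if any(row) else 0 for row in slice_]
    distances, run, seen_ink = [], 0, False
    for pixel in column:
        if pixel:
            if seen_ink and run:
                distances.append(run)   # gap closed by foreground on both sides
            seen_ink, run = True, 0
        elif seen_ink:
            run += 1                    # background below some foreground
    return distances


example = [[0, 0], [1, 0], [0, 0], [0, 0], [0, 1], [1, 1], [0, 0], [1, 0], [0, 0]]
print(background_count(example))  # [2, 1]
```

Background above the first and below the last foreground pixel is not counted, because it is not bounded on both sides. A fixed-length feature vector would need padding or truncation, which the slides do not describe.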

(2) Baseline Count

Calculate the number of black pixels above the baseline (as a positive value) and the number of black pixels below the baseline (as a negative value) in each slice.

Example:

[Figure: a thinned word image with its baseline; X1 = number of black pixels above the baseline, X2 = number of black pixels below the baseline; feature vector (X1, X2)]

[Figure: feature plot]
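The baseline count could be computed per slice roughly as follows. The sketch assumes the baseline row index is already known and that pixels lying on the baseline row itself are ignored; both are assumptions, since the slides do not say how the baseline is found or treated.

```python
def baseline_count(slice_, baseline):
    """Return (x1, x2): ink above the baseline as +count, ink below as -count.

    `baseline` is a row index; rows with a smaller index are above it.
    """
    above = sum(sum(row) for row in slice_[:baseline])
    below = sum(sum(row) for row in slice_[baseline + 1:])
    return above, -below


example = [[1, 0], [1, 1], [1, 1], [0, 1], [0, 0]]
print(baseline_count(example, baseline=2))  # (3, -1)
```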

(3) Centroid

For each slice we compute its centroid (cx, cy), so the feature vector contains a sequence of centroids.

Example:

[Figure: feature vector (cx, cy)]
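As an illustration, the centroid of a slice is the mean position of its foreground pixels. The sketch below assumes pixel coordinates measured from the top-left corner of the slice and returns (0.0, 0.0) for an empty slice, which is an arbitrary convention chosen here.

```python
def centroid(slice_):
    """Centroid (cx, cy) of the foreground pixels of a slice (list of rows)."""
    sum_x = sum_y = count = 0
    for y, row in enumerate(slice_):
        for x, pixel in enumerate(row):
            if pixel:
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        return (0.0, 0.0)               # empty slice: arbitrary convention
    return (sum_x / count, sum_y / count)


print(centroid([[1, 1], [0, 0], [1, 1]]))  # (0.5, 1.0)
```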


(4) Cross Count

For each slice we count the number of crossings from background (white) to foreground (black).

Example:

[Figure: a slice with two crossings; feature vector (2)]
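A crossing can be counted by scanning a one-pixel column from top to bottom and noting every change from background to foreground. In this sketch the area above the image is treated as background, which is an assumption.

```python
def cross_count(column):
    """Number of background-to-foreground (0 -> 1) transitions down a column."""
    crossings, previous = 0, 0          # above the image counts as background
    for pixel in column:
        if pixel and not previous:
            crossings += 1
        previous = pixel
    return crossings


print(cross_count([0, 1, 1, 0, 0, 1, 0]))  # 2
```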


(5) Euclidean Distance

We find the average foreground pixel in the regions above and below the baseline, then measure the Euclidean distance from the baseline to each average point, with a positive value for the point above and a negative value for the point below.

Example: slice of one pixel without overlap.

[Figure: a thinned word image with its baseline; D1 = Euclidean distance above the baseline, D2 = Euclidean distance below the baseline; feature vector (D1, D2)]

[Figure: feature plot]

(6) Horizontal Histogram

For each slice we compute its horizontal histogram (the sum of each row in the slice).

Example: slice of four pixels with one overlap.

[Figure: feature plot]

(7) Vertical Histogram

For each slice we compute its vertical histogram (the sum of each column in the slice).

Example: slice of two pixels with one overlap.

[Figure: feature plot]
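A minimal sketch of the vertical histogram feature, using the example setting of a two-pixel window with one pixel of overlap. The image format (list of rows, 1 = black) and the function name are assumptions made for illustration.

```python
def vertical_histogram(image, window=2, overlap=1):
    """One feature vector per slice: the column sums inside that slice."""
    width = len(image[0])
    # Vertical summation: number of black pixels in every column.
    column_sums = [sum(row[x] for row in image) for x in range(width)]
    step = window - overlap
    return [column_sums[x:x + window]
            for x in range(0, width - window + 1, step)]


image = [
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 1],
]
print(vertical_histogram(image))  # [[1, 3], [3, 1], [1, 1]]
```

The horizontal histogram is the same computation with rows and columns swapped, giving one value per row of each slice.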

(8) Weighted Vertical Histogram

The same as the previous feature, except that each row of the image is multiplied by a weight; the weight vector applied to the whole image has a triangular shape.

Example: slice of two pixels with one overlap.

[Figure: triangular weight vector, with axis values 1 and -1]

[Figure: feature plot]
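The weighting step can be sketched as below. The exact shape of the weight vector is not recoverable from the slide, so `triangle_weights` is a hypothetical guess: -1 at the top and bottom rows rising linearly to +1 at the middle row. Only the idea of multiplying each row by its weight before the column summation comes from the text.

```python
def triangle_weights(height):
    """Hypothetical triangular weights: -1 at the edges, +1 at the middle row."""
    if height < 2:
        return [1.0] * height
    mid = (height - 1) / 2
    return [1 - 2 * abs(y - mid) / mid for y in range(height)]


def weighted_column_sums(image, weights):
    """Column sums after multiplying every row of the image by its weight."""
    width = len(image[0])
    return [sum(w * row[x] for w, row in zip(weights, image))
            for x in range(width)]


image = [[1, 0], [0, 1], [1, 1], [0, 0], [1, 0]]
weights = triangle_weights(len(image))
print(weights)                               # [-1.0, 0.0, 1.0, 0.0, -1.0]
print(weighted_column_sums(image, weights))  # [-1.0, 1.0]
```

The weighted column sums would then be cut into overlapping slices exactly as for the plain vertical histogram.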



Implementation of HMM-Based AOCR Using HTK

Data preparation

Creating Monophone HMMs

Recognizer Evaluation

Data preparation

The Task Grammar

The Dictionary

Recording the Data

Creating the Transcription Files

Coding the Data

The Task Grammar

Isolated AOCR Grammar -----> Mini project

Connected AOCR Grammar -----> Final project

Isolated AOCR Grammar

$name = a1 | a2 | a3 | a4 | a5 | … | a28 | a29;

( SENT-START <$name> SENT-END )

a1 ---> ا

a2 ---> ب

a3 ---> ت

a4 ---> ث

a29 ---> space
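Writing out 29 (or 125) alternatives by hand is error-prone, so the grammar text can be generated. The helper below is a hypothetical convenience, not part of HTK; it only builds the two grammar lines shown above for a given number of models.

```python
def isolated_grammar(num_models=29):
    """Text of the task grammar: a repeated choice among models a1..aN."""
    names = " | ".join(f"a{i}" for i in range(1, num_models + 1))
    return f"$name = {names};\n( SENT-START <$name> SENT-END )\n"


print(isolated_grammar(3))
# $name = a1 | a2 | a3;
# ( SENT-START <$name> SENT-END )
```

Saved to a file such as `gram` (an assumed name), the text would be the input from which HParse builds the word network, for example with `HParse gram wdnet`.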

Connected AOCR Grammar

$name = a1 | a2 | a3 | a4 | a5 | … | a124 | a125;

( SENT-START <$name> SENT-END )

a1 ---> ا

a2 ---> ـا

a11 ---> ــبــ

a23 ---> ـ جـ

a124 ---> لله

a125 ---> ـــــــ

Why Grammar?

[Figure: word network from Start through the parallel models a1, a2, a3, …, a124, a125 to End]

How is it created?

HParse creates it.

[Diagram: Grammar → HParse → Word Net (Wdnet)]

The Dictionary

Our dictionary is limited???


Recording the Data

[Diagram: 2-D signal (image) → feature extraction → 1-D vector → transformer → .wav]

Creating the Transcription Files

Word level MLF

Phone level MLF

Word level MLF

#!MLF!#
"*/1.lab"
فصل
.
"*/2.lab"
في الفرق بين الخالق والمخلوق
.
"*/3.lab"
وما ابراهيم وآل ابراهيم الحنفاء والأنبياء فهم
.
"*/4.lab"
يعلمون انه لا بد من الفرق بين الخالق والمخلوق
.

[Figure: the four text-line images transcribed above]

Phone level MLF

#!MLF!#
"*/1.lab"
a74
a51
a88
.
"*/2.lab"
a74
a108
a123
a1
a86
a75
a38
a77
a123
…

[The slide shows the word level MLF alongside for comparison.]

Coding the Data

[Diagram: waveform files (S0001.wav, S0002.wav, S0003.wav, etc.), a configuration file and a script file → HCopy → MFCC files (S0001.mfc, S0002.mfc, S0003.mfc, etc.)]


Creating Flat Start Monophones

Re-estimation


The first step in HMM training is to define a prototype model.

The parameters of this model are not important; its purpose is to define the model topology.

The Prototype

~o <VecSize> 39 <MFCC_0_D_A>
~h "proto"
<BeginHMM>
<NumStates> 5
<State> 2
<Mean> 39
0.0 0.0 0.0 . . .
<Variance> 39
1.0 1.0 1.0 . . .
<State> 3
<Mean> 39
0.0 0.0 0.0 . . .
<Variance> 39
1.0 1.0 1.0 . . .
<State> 4
<Mean> 39
0.0 0.0 0.0 . . .
<Variance> 39
1.0 1.0 1.0 . . .
<TransP> 5
0.0 1.0 0.0 0.0 0.0
0.0 0.6 0.4 0.0 0.0
0.0 0.0 0.6 0.4 0.0
0.0 0.0 0.0 0.7 0.3
0.0 0.0 0.0 0.0 0.0
<EndHMM>
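Because the experiments later vary the number of states, it can help to generate the prototype text instead of editing it by hand. The function below is a hypothetical helper, not an HTK tool. It writes zero means, unit variances and a left-to-right transition matrix, and for simplicity uses one self-loop probability (`stay`) for every emitting state, whereas the slide uses 0.6, 0.6 and 0.7.

```python
def prototype(vec_size=39, num_states=5, kind="MFCC_0_D_A", stay=0.6, name="proto"):
    """Text of a left-to-right HTK prototype with zero means and unit variances."""
    def vector(value):
        return " ".join([f"{value:.1f}"] * vec_size)

    lines = [f"~o <VecSize> {vec_size} <{kind}>", f'~h "{name}"',
             "<BeginHMM>", f"<NumStates> {num_states}"]
    for state in range(2, num_states):          # emitting states only
        lines += [f"<State> {state}",
                  f"<Mean> {vec_size}", vector(0.0),
                  f"<Variance> {vec_size}", vector(1.0)]
    lines.append(f"<TransP> {num_states}")
    for i in range(num_states):
        row = [0.0] * num_states
        if i == 0:
            row[1] = 1.0                        # entry state always moves on
        elif i < num_states - 1:
            row[i], row[i + 1] = stay, 1.0 - stay
        lines.append(" ".join(f"{p:.1f}" for p in row))
    lines.append("<EndHMM>")
    return "\n".join(lines) + "\n"


print(prototype(vec_size=3, num_states=5))
```

For an 11-state model, the setting that scored best in the results, the call would be `prototype(num_states=11)`.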

Initialization Process

[Diagram: proto → HCompV → initialized proto and vFloors (in hmm0)]

Initialized prototype

~o <VecSize> 39 <MFCC_0_D_A>
~h "proto"
<BeginHMM>
<NumStates> 5
<State> 2
<Mean> 39
-5.029420e+000 1.948325e+000 -5.192460e+000 . . .
<Variance> 39
1.568812e+001 1.038746e+001 2.110239e+001 . . .
<State> 3
<Mean> 39
-5.029420e+000 1.948325e+000 -5.192460e+000 . . .
<Variance> 39
1.568812e+001 1.038746e+001 2.110239e+001 . . .
<State> 4
<Mean> 39
-5.029420e+000 1.948325e+000 -5.192460e+000 . . .
<Variance> 39
1.568812e+001 1.038746e+001 2.110239e+001 . . .
<TransP> 5
0.0 1.0 0.0 0.0 0.0
0.0 0.6 0.4 0.0 0.0
0.0 0.0 0.6 0.4 0.0
0.0 0.0 0.0 0.7 0.3
0.0 0.0 0.0 0.0 0.0
<EndHMM>

Vfloors Contents

~v varFloor1
<Variance> 39
1.568812e-001 1.038746e-001 2.110239e-001 . . .
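The numbers above show what the flat start does: every state receives the same global mean and variance of the training data, and the variance floor is 0.01 times the global variance (1.568812e+001 becomes 1.568812e-001). The sketch below reproduces that arithmetic on toy data. It is not HCompV itself, and dividing by n rather than n - 1 is an assumption.

```python
def flat_start(frames, floor_scale=0.01):
    """Global mean, variance and variance floor of a list of feature vectors."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in frames) / n for d in range(dim)]
    floor = [floor_scale * v for v in var]      # floor = 0.01 x global variance
    return mean, var, floor


mean, var, floor = flat_start([[1.0, 2.0], [3.0, 6.0]])
print(mean)   # [2.0, 4.0]
print(var)    # [1.0, 4.0]
```

The floor keeps a variance from collapsing towards zero during re-estimation when a state sees very little data.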

Creating Initialized Models

[Diagram: hmmdefs, headed by ~o <VecSize> 39 <MFCC_0_D_A>, holds one copy of the initialized proto for each model (a1, a2, …, a125)]

Creating Macros File

[Diagram: macros file = ~o <VecSize> 39 <MFCC_0_D_A> followed by the contents of the vFloors file]

Re-estimation Process

[Diagram: hmmdefs and macros (from the initialized proto produced by HCompV), training files (MFC files), phone-level transcription and the monophones list → HERest → re-estimated hmmdefs and macros]

Recognition Process

[Diagram: trained models, test files, word network (wnet) and the dictionary (dict) → HVite → recognized words]

Recognizer Evaluation

[Diagram: reference transcription and recognized transcription → HResults → accuracy]
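HTK reports accuracy as (N - D - S - I) / N × 100, where N is the number of reference labels and D, S and I are the deletions, substitutions and insertions found by aligning the recognized labels with the reference. The sketch below illustrates the formula with a plain edit-distance alignment; HResults uses its own weighted alignment, so this is a simplification, and the label names are only examples.

```python
def edit_errors(reference, recognized):
    """Minimum number of substitutions, deletions and insertions between two label lists."""
    previous = list(range(len(recognized) + 1))
    for i, ref in enumerate(reference, start=1):
        current = [i]
        for j, hyp in enumerate(recognized, start=1):
            current.append(min(
                previous[j] + 1,                  # deletion
                current[j - 1] + 1,               # insertion
                previous[j - 1] + (ref != hyp),   # match or substitution
            ))
        previous = current
    return previous[-1]


def accuracy(reference, recognized):
    """Percent accuracy: 100 * (N - errors) / N."""
    n = len(reference)
    return 100.0 * (n - edit_errors(reference, recognized)) / n


ref = ["a74", "a51", "a88", "a1"]
rec = ["a74", "a52", "a88"]            # one substitution, one deletion
print(accuracy(ref, rec))              # 50.0
```

Because insertions are subtracted as well, accuracy can be negative when the recognizer outputs many spurious labels.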



Experimental Results

1- Main Problem

1-1 Requirements:

• Connected character recognition.

• Multi-sizes.

• Multi-fonts.

• Handwritten.

1-2 Variables:

• Tool.

• Method used to train and test.

• Model parameters.

• Feature parameters.

Tool:

How can it operate on images?

Data input to HTK:

• Discrete: input images (failed).

• Continuous: input a continuous waveform (succeeded).

2- Isolated Character Recognition

2-1 Single size (16), single font (Simplified Arabic Fixed).

2-2 Multi-sizes character recognition.

2-3 Variable lengths character recognition.

2-1 Single Size (16), Single Font (Simplified Arabic Fixed)

• Best method.

• Best number of states.

• Best window size.

Best method:

A model for each character (35 models) vs. a model for each character in each position (116 models)

(Vertical histogram, 11 states, window = 2.5)

No. of models    Accuracy
35               99.14%
116              100%

Best number of states:

(Vertical histogram, number of models = 35, window = 2 pixels)

No. of states    Accuracy
3                96.55%
11               99.14%

Best window size:

(2-D histogram, number of models = 124, 11 states)

[Figure: accuracy vs. window size (ticks 1.2, 1.7, 1.5); accuracy axis from 97.00% to 100.50%]

2-2 Multi-Sizes Character Recognition

Sizes (12, 14, 16):

(2-D histogram, number of models = 124, 11 states)

[Figure: accuracy vs. window size (ticks 1.6, 1.84, 2); accuracy axis from 85.00% to 105.00%]

2-3 Variable Lengths Character Recognition

Train with different lengths:

The vertical histogram gives higher accuracy than the 2-D histogram.

(Vertical histogram, number of models = 35, window = 2 pixels)

[Figure: accuracy vs. window size (ticks 4.6, 3.7, 2.8, 2.33, 1.66); accuracy axis from 0.00% to 120.00%]

Make a model for the dash:

• Training:

1. Train with characters (without dash) and a dash model.

2. Train with different lengths and a dash model.

3. Train with different lengths and a dash model; if a character has a dash at its end, define it as a character model followed by the dash model (the correct way).

• Testing:

• Vertical histogram: failed to recognize the dash model with all methods (it was recognized as a space).

• 2-D histogram: for window size = 2.6, accuracy = 100%.

3- Connected Character Recognition

3-1 Single size (16), single font (Simplified Arabic Fixed).

3-2 Parameter optimization.

3-3 Multi-sizes character recognition.

3-4 Fusion by feature concatenation.

3-1 Single Size (16), Single Font (Simplified Arabic Fixed)

Best method (from a simple experiment on 10 words):

The correct way to do word recognition is to train the character models on words or lines.

Assumptions:

• Training data: 25 pages (495 lines).

• Simplified Arabic Fixed (font size = 16).

• Images: 300 dpi, black and white.

• Testing data: 4 pages (74 lines).

• Feature properties: window = 2 × frame.

Vertical histogram:

[Figure: accuracy vs. window size (ticks 6, 6.5, 7, 7.5, 8); accuracy axis from 88.00% to 95.00%]

2-D histogram:

[Figure: accuracy vs. window size (ticks 4.99, 5.33, 5.89); accuracy axis from 89.00% to 97.00%]

3-2 Parameter Optimization

• Line level vs. word level.

• Optimum number of mixtures.

• Optimum number of states.

• Optimum initial transition probability.

• Optimum window vs. frame ratio.

• Line level vs. word level

Assumptions:

• Simplified Arabic Fixed (font size = 16).

• Testing data: same as the training data.

• Feature type: vertical histogram, window = 2 × frame.

• Images: 300 dpi, black and white.

Level         Accuracy
Line level    84.99%
Word level    85.36%

Conclusion:

We will concentrate on line segmentation instead of word segmentation because of:

• The disadvantages of word segmentation: the window size is limited because of the small size of a word image, and accuracy decreases as the number of mixtures increases.

• The simplicity of line segmentation compared with word segmentation in preprocessing.

• Optimum number of mixtures

One-dimensional features:

• Training data: 495 lines.

• Testing data: same as the training data.

• Feature type: vertical histogram, window = 2 × frame, window size = 6.5 pixels.

[Figure: accuracy vs. number of mixtures (ticks 1, 3, 5, 10, 15); accuracy axis from 70.00% to 100.00%]

Two-dimensional features:

• Training data: 495 lines.

• Testing data: same as the training data.

• Feature type: 2-D histogram, window = 2 × frame, window size = 5.33 pixels, N = 4.

[Figure: accuracy vs. number of mixtures (ticks 1, 5, 7, 10); accuracy axis from 86.00% to 98.00%]

• Optimum number of states

One-dimensional features:

[Figure: accuracy vs. number of states (ticks 6, 8, 11, 13); accuracy axis from 80.00% to 100.00%]

Two-dimensional features:

Assumptions: as previous.

Results:

Number of states    Accuracy
8                   92.52%
11                  95.02%

• Optimum initial transition probability

• Almost equally likely probabilities: failed.

• Random probabilities: very bad.

• Each state may only stay in itself or go to the next state, with the probability of staying higher than the probability of moving to the next state: succeeded.

0 1 0 0 0
0 0.7 0.3 0 0
0 0 0.6 0.4 0
... and so on.
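The successful initialization can be built for any number of states with a short helper. This is an illustrative sketch with hypothetical names: `stay` lists the self-loop probability of each emitting state, and whatever probability remains goes to the next state.

```python
def left_to_right(num_states, stay):
    """Transition matrix in which emitting state i stays with stay[i - 1], else advances."""
    matrix = [[0.0] * num_states for _ in range(num_states)]
    matrix[0][1] = 1.0                            # entry state always moves on
    for i, p in zip(range(1, num_states - 1), stay):
        matrix[i][i] = p
        matrix[i][i + 1] = round(1.0 - p, 10)     # rounding hides float noise
    return matrix                                 # last (exit) row stays all zero


for row in left_to_right(5, [0.7, 0.6, 0.7]):
    print(row)
# [0.0, 1.0, 0.0, 0.0, 0.0]
# [0.0, 0.7, 0.3, 0.0, 0.0]
# [0.0, 0.0, 0.6, 0.4, 0.0]
# [0.0, 0.0, 0.0, 0.7, 0.3]
# [0.0, 0.0, 0.0, 0.0, 0.0]
```

Keeping every entry other than "stay" and "next" at zero matters because re-estimation does not revive a transition whose probability starts at zero, so the left-to-right order of the slices is preserved.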

• Optimum window vs. frame ratio

• Assumptions: as previous (2-D feature).

• Results:

Overlapping ratio    Accuracy
0.4                  91.70%
0.5                  93.92%
0.6                  92.52%

Maximum accuracy for all features:

Feature type          Max. accuracy
2-D histogram         95.96%
Euclidean distance    87.16%
Cross count           91.51%
Weighted histogram    95.75%
Baseline count        89.70%
Background count      91.61%
Vertical histogram    96.97%

3-3 Multi-Sizes Character Recognition

Resizing the test data only:

• Training data: Simplified Arabic Fixed, font size = 16.

• Testing data: Simplified Arabic Fixed, font size = 12-16-18 (after resize), 60 lines.

• Feature type: vertical histogram.

Font size    Accuracy
18           76.21%
14           79.74%
16           96.97%

Resizing the training and test data:

• Training data: Simplified Arabic Fixed, font size = 14-16-18 (after resize), (324 * 3) lines.

• Testing data: (324 * 3) lines, same as training.

• Feature type: vertical histogram.

Accuracy = 92.15%

3-4 Feature Concatenation

Concatenates the vertical histogram and the 2-D histogram.

Scale (vertical histogram)    Window size    Accuracy
4                             4.2            69.02%
4                             5.57           77.17%
No scale                      5              84.09%



Future works

Improving the printed text system:

• Database: increase its size to support multi-sizes and multi-fonts.

• Preprocessing improvements: improve the image enhancement to solve the problem of noisy pages, and develop a robust system to handle problems that depend on the nature of the input pages (delete frames, borders, pictures, tables, etc.).

• Search for new features and combine them to improve the accuracy.

• Training and testing improvements: tie the models; use the adaptation supported by the HTK tool, which may improve the multi-size system (size independence); use the tri-phones technique to solve the problems of overlapping.

• Improve the response time (implement all preprocessing programs in C++).

• Increase the accuracy by feature fusion.

Build a multi-language system (language-independent system).

Develop the handwritten system, especially because HMMs can attack this problem efficiently.

Develop the on-line system.


Applications

Automatic Form Recognition

Bank Check Reading

[Figure: a Banque Misr check form with fields for check number, payee name, amount in words, amount in figures and signature]

Digital libraries:

All books, magazines, newspapers, etc. can be stored as soft copies on PCs and CDs.

Transcription of historical archives & "non-death" of paper:

All archived papers and documents can be stored as soft-copy files.
