MAIN PPT2

7/31/2019 MAIN PPT2

1/31

UNICODE OCR

N. Manasa (07T81A1220)A.Sruthi (07T81A1249)

B.Venkatalaxmi (07T81A1256)

N.Shanthi kishore (07T81A1239)

By

Under the Guidance of

R.Ravinder reddy

7/31/2019 MAIN PPT2

2/31

Abstract

IntroductionSystem Analysis

System Requirements

System Design

7/31/2019 MAIN PPT2

3/31

Abstract

Character degradation is a big problem for machine printed character recognition.

Two main reasons for degradation are extrinsic image degradation such as blurring

and low image dimension, and intrinsic degradation caused by font variations. A

recognition method that combines two complementary classifiers is proposed in

this paper.

The local feature based classifier extracts the local contour direction changes,which is effective for character patterns with less structure deterioration.

The global feature based classifier extracts the texture distribution of the character

image, which is effective when the character structure is hard to discriminate. The

two complementary classifiers are combined by candidate fusion in a coarse-to-fine

style. Experiments are carried on degraded Chinese character recognition.

The results prove the effectiveness of our method.

7/31/2019 MAIN PPT2

4/31

Introduction

The process of converting scanned images of machine-printed or handwritten text

(numerals, letters, and symbols) into a computer-processable format; also known as

optical character recognition (OCR).

A typical OCR system contains three logical components: an image scanner,

OCR software and hardware, and an output interface.The image scanner optically captures text images to be recognized. Text images

are processed with OCR software and hardware.

The process involves three operations: document analysis (extracting individual

character images), recognizing these images (based on shape), and contextual

processing (either to correct misclassifications made by the recognition algorithm

or to limit recognition choices).

The output interface is responsible for communication of OCR system results to

the outside world.

7/31/2019 MAIN PPT2

5/31

OCR is the mechanical or electronic translation of scanned images of handwritten,

typewritten or printed text into machine-encoded text.

There are many different approaches to optical character recognition problem. One

of the most common and popular approaches is based on neural networks, which

can be applied to different tasks, such as pattern recognition, time series prediction,

function approximation, clustering, etc.

As more and more convenient document capture devices emerge in the market, the

demands for degraded character recognition increase dramatically. Many research

results are published in recent years for degraded character recognition.

The solution to the intrinsic shape degradation can be well solved by the

nonlinear normalization and the block based local feature extraction used in

handprint character recognition.

7/31/2019 MAIN PPT2

6/31

As for the extrinsic image degradation, a comprehensive study on

the image degradation model on Latin character set is presented in.

There are basically two approaches for the extrinsic degradation:

the local based grayscale feature extraction and the global based

texture feature extraction . While most of the methods focus onhow to solve one of the above problems, few papers robust against

the combination of the both.

The recognition confidence and estimated image blurring level areused to combine a local feature based classifier with a global

feature based classifier. However, the hierarchical recognition

structure cannot efficiently handle the mixture cases in real

environment.

7/31/2019 MAIN PPT2

7/31

System Analysis:

oExisting System

oDisadvantage

oProposed System

oAdvantage

7/31/2019 MAIN PPT2

8/31

Existing System

In the existing system the Character degradation is a big problemfor machine printed character recognition.

Two main reasons for degradation are the intrinsic degradation

caused by character shape variation and the extrinsic imagedegradation such as blurring and low image dimension.

A mixture of the above factors makes degraded character

recognition a difficult task.

7/31/2019 MAIN PPT2

9/31

7/31/2019 MAIN PPT2

10/31

Proposed System

In the proposed system intrinsic problem and extrinsic

problems can be solved by the complementary classifiermethod which consisting of local and global features

two classification processes are executed in parallel under a

coarse-to fine recognition structure.

In the proposed system the Unicode OCR method uses hybrid

recognition algorithm is proposed to solve the problem in the

existing system.

It is used to find the characters fonts, its size, its width and its

height.

It mainly employees an approach called Neural Networks.

7/31/2019 MAIN PPT2

11/31

Neural networks

Neural Networks usually called Artificial Neural Networks.

It is a mathematical model or computational model that isinspired by the structure or functional aspects of biological

neural networks.

A neural network consists of an interconnected group of

artificial neurons, and it processes information using aconnectionist approach to computation.

An artificial neuron receives a number of inputs either from

original data, or from the output of other neurons in the neural

network. Each input comes via a connection that has a strength or

weight. These weights correspond to synaptic efficiency in a

biological neuron.

7/31/2019 MAIN PPT2

12/31

In this it consists of 3 phases which are Training set, Loading, and Recognition

1. Training set: In the training step we teach the network to respond with desiredoutput for a specified input. For this purpose each training sample is

represented by two components: possible input and the desired network's

output for the input.

2. Loading : After the training step is done, we can give an arbitrary input to the

network and the network will form an output, from which we can resolve a

pattern type presented to the network.

3. Recognition: Recognition will be done only if errors will be less i.e nothing

but On each learning epoch all samples from the training set are presented to

the network and the summary squared error is calculated. When the error

becomes less than the specified error limit, then the training is done and the

network can be used for recognition. 92 percent accuracy in recognition proves

the good performance of the proposed system.

7/31/2019 MAIN PPT2

13/31

Proposed System(cont..) How to recognize ?

We need to input to the trained network and get itsoutput. Then we should find an element in the output vectorwith the maximum value. The element's number will point usto the recognized pattern:

13

7/31/2019 MAIN PPT2

14/31

Advantage

In our system classifiers combination was implemented

using fusion to solve the problem of degradation.

The result shows that the proposed method is much more

robust than any of the individual classifier

Normalization technique was used which provide exact

pixel identification and high speed in character recognition .

7/31/2019 MAIN PPT2

15/31

REQUIREMENTS

DESIGNS

SCREEN SHOTS

7/31/2019 MAIN PPT2

16/31

7/31/2019 MAIN PPT2

17/31

DFD DIGRAMS

17

Source image Processing usingartificial neuralnetwork

Characterreorganization(paragraphs, lines,words, characters)

Unicode mapping ClassificationFeature extraction (characterheight, width, horz lines,vertical lines, slope lines)

Recognized text

7/31/2019 MAIN PPT2

18/311818

7/31/2019 MAIN PPT2

19/31

Use-case diagram

7/31/2019 MAIN PPT2

20/31

Class diagram

7/31/2019 MAIN PPT2

21/31

SEQUENCE DIAGRAM

7/31/2019 MAIN PPT2

22/31

Screen shots

7/31/2019 MAIN PPT2

23/31

7/31/2019 MAIN PPT2

24/31

Load train set

i k

7/31/2019 MAIN PPT2

25/31

Saving network

7/31/2019 MAIN PPT2

26/31

Loading network and the image

7/31/2019 MAIN PPT2

27/31

7/31/2019 MAIN PPT2

28/31

7/31/2019 MAIN PPT2

29/31

Future Enhancements

The future research directions include better coarse

classification and the robustness evaluation under other

degradation types.

Experiments are carried on degraded Chinese characterrecognition.

Even we can implement our system to detect the degraded

characters of other languages and to convert them into required

language.

7/31/2019 MAIN PPT2

30/31

7/31/2019 MAIN PPT2

31/31

THANK YOU

MAIN PPT2

Documents

Transcript of MAIN PPT2