MAIN PPT2

download MAIN PPT2

of 31

Transcript of MAIN PPT2

  • 7/31/2019 MAIN PPT2

    1/31

    UNICODE OCR

    N. Manasa (07T81A1220)A.Sruthi (07T81A1249)

    B.Venkatalaxmi (07T81A1256)

    N.Shanthi kishore (07T81A1239)

    By

    Under the Guidance of

    R.Ravinder reddy

  • 7/31/2019 MAIN PPT2

    2/31

    Abstract

    IntroductionSystem Analysis

    System Requirements

    System Design

  • 7/31/2019 MAIN PPT2

    3/31

    Abstract

    Character degradation is a big problem for machine printed character recognition.

    Two main reasons for degradation are extrinsic image degradation such as blurring

    and low image dimension, and intrinsic degradation caused by font variations. A

    recognition method that combines two complementary classifiers is proposed in

    this paper.

    The local feature based classifier extracts the local contour direction changes,which is effective for character patterns with less structure deterioration.

    The global feature based classifier extracts the texture distribution of the character

    image, which is effective when the character structure is hard to discriminate. The

    two complementary classifiers are combined by candidate fusion in a coarse-to-fine

    style. Experiments are carried on degraded Chinese character recognition.

    The results prove the effectiveness of our method.

  • 7/31/2019 MAIN PPT2

    4/31

    Introduction

    The process of converting scanned images of machine-printed or handwritten text

    (numerals, letters, and symbols) into a computer-processable format; also known as

    optical character recognition (OCR).

    A typical OCR system contains three logical components: an image scanner,

    OCR software and hardware, and an output interface.The image scanner optically captures text images to be recognized. Text images

    are processed with OCR software and hardware.

    The process involves three operations: document analysis (extracting individual

    character images), recognizing these images (based on shape), and contextual

    processing (either to correct misclassifications made by the recognition algorithm

    or to limit recognition choices).

    The output interface is responsible for communication of OCR system results to

    the outside world.

  • 7/31/2019 MAIN PPT2

    5/31

    OCR is the mechanical or electronic translation of scanned images of handwritten,

    typewritten or printed text into machine-encoded text.

    There are many different approaches to optical character recognition problem. One

    of the most common and popular approaches is based on neural networks, which

    can be applied to different tasks, such as pattern recognition, time series prediction,

    function approximation, clustering, etc.

    As more and more convenient document capture devices emerge in the market, the

    demands for degraded character recognition increase dramatically. Many research

    results are published in recent years for degraded character recognition.

    The solution to the intrinsic shape degradation can be well solved by the

    nonlinear normalization and the block based local feature extraction used in

    handprint character recognition.

  • 7/31/2019 MAIN PPT2

    6/31

    As for the extrinsic image degradation, a comprehensive study on

    the image degradation model on Latin character set is presented in.

    There are basically two approaches for the extrinsic degradation:

    the local based grayscale feature extraction and the global based

    texture feature extraction . While most of the methods focus onhow to solve one of the above problems, few papers robust against

    the combination of the both.

    The recognition confidence and estimated image blurring level areused to combine a local feature based classifier with a global

    feature based classifier. However, the hierarchical recognition

    structure cannot efficiently handle the mixture cases in real

    environment.

  • 7/31/2019 MAIN PPT2

    7/31

    System Analysis:

    oExisting System

    oDisadvantage

    oProposed System

    oAdvantage

  • 7/31/2019 MAIN PPT2

    8/31

    Existing System

    In the existing system the Character degradation is a big problemfor machine printed character recognition.

    Two main reasons for degradation are the intrinsic degradation

    caused by character shape variation and the extrinsic imagedegradation such as blurring and low image dimension.

    A mixture of the above factors makes degraded character

    recognition a difficult task.

  • 7/31/2019 MAIN PPT2

    9/31

  • 7/31/2019 MAIN PPT2

    10/31

    Proposed System

    In the proposed system intrinsic problem and extrinsic

    problems can be solved by the complementary classifiermethod which consisting of local and global features

    two classification processes are executed in parallel under a

    coarse-to fine recognition structure.

    In the proposed system the Unicode OCR method uses hybrid

    recognition algorithm is proposed to solve the problem in the

    existing system.

    It is used to find the characters fonts, its size, its width and its

    height.

    It mainly employees an approach called Neural Networks.

  • 7/31/2019 MAIN PPT2

    11/31

    Neural networks

    Neural Networks usually called Artificial Neural Networks.

    It is a mathematical model or computational model that isinspired by the structure or functional aspects of biological

    neural networks.

    A neural network consists of an interconnected group of

    artificial neurons, and it processes information using aconnectionist approach to computation.

    An artificial neuron receives a number of inputs either from

    original data, or from the output of other neurons in the neural

    network. Each input comes via a connection that has a strength or

    weight. These weights correspond to synaptic efficiency in a

    biological neuron.

  • 7/31/2019 MAIN PPT2

    12/31

    In this it consists of 3 phases which are Training set, Loading, and Recognition

    1. Training set: In the training step we teach the network to respond with desiredoutput for a specified input. For this purpose each training sample is

    represented by two components: possible input and the desired network's

    output for the input.

    2. Loading : After the training step is done, we can give an arbitrary input to the

    network and the network will form an output, from which we can resolve a

    pattern type presented to the network.

    3. Recognition: Recognition will be done only if errors will be less i.e nothing

    but On each learning epoch all samples from the training set are presented to

    the network and the summary squared error is calculated. When the error

    becomes less than the specified error limit, then the training is done and the

    network can be used for recognition. 92 percent accuracy in recognition proves

    the good performance of the proposed system.

  • 7/31/2019 MAIN PPT2

    13/31

    Proposed System(cont..) How to recognize ?

    We need to input to the trained network and get itsoutput. Then we should find an element in the output vectorwith the maximum value. The element's number will point usto the recognized pattern:

    13

  • 7/31/2019 MAIN PPT2

    14/31

    Advantage

    In our system classifiers combination was implemented

    using fusion to solve the problem of degradation.

    The result shows that the proposed method is much more

    robust than any of the individual classifier

    Normalization technique was used which provide exact

    pixel identification and high speed in character recognition .

  • 7/31/2019 MAIN PPT2

    15/31

    REQUIREMENTS

    DESIGNS

    SCREEN SHOTS

  • 7/31/2019 MAIN PPT2

    16/31

  • 7/31/2019 MAIN PPT2

    17/31

    DFD DIGRAMS

    17

    Source image Processing usingartificial neuralnetwork

    Characterreorganization(paragraphs, lines,words, characters)

    Unicode mapping ClassificationFeature extraction (characterheight, width, horz lines,vertical lines, slope lines)

    Recognized text

  • 7/31/2019 MAIN PPT2

    18/311818

  • 7/31/2019 MAIN PPT2

    19/31

    Use-case diagram

  • 7/31/2019 MAIN PPT2

    20/31

    Class diagram

  • 7/31/2019 MAIN PPT2

    21/31

    SEQUENCE DIAGRAM

  • 7/31/2019 MAIN PPT2

    22/31

    Screen shots

  • 7/31/2019 MAIN PPT2

    23/31

  • 7/31/2019 MAIN PPT2

    24/31

    Load train set

    i k

  • 7/31/2019 MAIN PPT2

    25/31

    Saving network

  • 7/31/2019 MAIN PPT2

    26/31

    Loading network and the image

  • 7/31/2019 MAIN PPT2

    27/31

  • 7/31/2019 MAIN PPT2

    28/31

  • 7/31/2019 MAIN PPT2

    29/31

    Future Enhancements

    The future research directions include better coarse

    classification and the robustness evaluation under other

    degradation types.

    Experiments are carried on degraded Chinese characterrecognition.

    Even we can implement our system to detect the degraded

    characters of other languages and to convert them into required

    language.

  • 7/31/2019 MAIN PPT2

    30/31

  • 7/31/2019 MAIN PPT2

    31/31

    THANK YOU