Form Reader Project Report

  • 8/13/2019 Form Reader Project Report

    Andree Ang Kisjanto Surya, ICT Batch 2007 - Image Processing Assignment

    1 Introduction

    This document is written to fulfill the grading requirement for the Image Processing subject. It serves as a personal report on the Form Reader Project given to the students of ICT 2007 in the Image Processing subject. The objective of the Form Reader Project is to design and implement a form reader application (OCR and OMR) using image processing techniques written from scratch.

    In this document, the design of the application is described in detail, and the test results of the implemented solution are presented. Known issues are listed together with possible solutions to the existing problems.

    2 Application Design

    The design of the application is presented in 4 sections: form design, form preprocessing, Optical Character Recognition (OCR), and Optical Mark Reader (OMR).

    2.1 Form Design

    In this project, the form to be read is designed as a simplified version of the official answer sheet of the national final examination (UN) in Indonesia, with OCR capabilities. The design of the form can be seen in Illustration 1.

    Illustration 1: Form design

    In this design, the name should be written in the provided name boxes and will be extracted by the application using OCR. The answers should be marked by darkening the chosen answers (A, B, C, D, or E for each number) and will be extracted using OMR.

    In order to handle location shifting in the raw input image, 3 corners (upper left, upper right, and bottom left) are marked with cross signs to indicate the area that should be read by the application.

    2.2 Form Preprocessing

    The form preprocessing step involves the image enhancement and preparation required to transform the raw input image into a form suitable for further processing (i.e. OCR and OMR). The key tasks involved in the form preprocessing step are described in Illustration 2.

    2.2.1 Binarization

    The binarization process is performed using a simple global thresholding scheme (Gonzales and Woods 2010, p. 760) with a static threshold value of 2 (the best value obtained from experiment). Binarization with this scheme has several weaknesses and can be improved. Please see section 4 for further detail.

    2.2.2 Inversion

    It is not uncommon to consider the value of 1 as "exists" or "foreground" and the value of 0 as "not exists" or "background", thus this scheme is used in this application. The purpose of inversion is to convert the raw input image so that it adheres to this scheme.
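The two steps above can be sketched as follows. This is a minimal illustration in pure Python, not the report's actual code: the function names `binarize` and `invert`, the list-of-lists image layout, and the threshold of 128 are all assumptions made for the example.

```python
def binarize(gray, threshold):
    """Global thresholding: 1 where the pixel is brighter than the threshold, else 0."""
    return [[1 if pixel > threshold else 0 for pixel in row] for row in gray]


def invert(binary):
    """Swap 0 and 1 so that dark ink becomes foreground (1) and paper becomes background (0)."""
    return [[1 - pixel for pixel in row] for row in binary]


# Hypothetical 2x3 grayscale patch: dark ink (low values) on bright paper (high values).
patch = [[250, 30, 245],
         [240, 25, 35]]

# 128 is an arbitrary example threshold, not the value used in the report.
foreground = invert(binarize(patch, threshold=128))
```

After both steps, the dark pixels of the patch are marked 1 and the paper pixels 0, which is the convention the later stages rely on.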

    Illustration 2: Form preprocessing procedures

    2.2.3 Find Corners and Remove Border

    In order to remove the unnecessary border in the raw input image, the coordinates of the corner marks have to be extracted. This can be done by analyzing the horizontal and vertical histograms of the corners of the raw input image. In this application, the size of the corners to be analyzed is 50x50 pixels.

    Through the histogram analysis of each corner, we obtain the coordinates of the corner marks, which allows us to remove the unnecessary border. The result of this operation can be seen in Illustration 4.
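The report does not spell out how the histograms are evaluated. The sketch below assumes the corner region is already binarized with foreground = 1, and that the center of the cross mark lies where the row sums and column sums peak. That peak rule is an assumption for illustration, and `projection_histograms` and `find_cross_center` are hypothetical names.

```python
def projection_histograms(region):
    """Return (row_sums, column_sums) of a binary region given as a list of rows."""
    row_sums = [sum(row) for row in region]
    col_sums = [sum(col) for col in zip(*region)]
    return row_sums, col_sums


def find_cross_center(region):
    """Estimate the (x, y) center of a cross mark as the peak of both histograms.

    Returns None when the region contains no foreground pixel at all.
    """
    row_sums, col_sums = projection_histograms(region)
    if max(row_sums) == 0:
        return None
    return col_sums.index(max(col_sums)), row_sums.index(max(row_sums))


# Hypothetical 7x7 corner region with a small cross centered at x=2, y=3.
region = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
center = find_cross_center(region)
```

A peak-based rule like this is simple but fragile: any form content that falls inside the corner region also contributes to the sums, which is the collision problem raised in the known issues.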

    The current process for finding the corner mark coordinates has several weaknesses and can be improved. Once the coordinates have been found, they can also be used to make the application more robust against a variety of input conditions (e.g. skew, rotation), which is currently not implemented. All of these possibilities are discussed in section 4.

    Illustration 3: Corners of the raw input image with size 50x50 (pixel)
    (a) upper left, (b) upper right, (c) bottom left

    Illustration 4: Form after preprocessing (binarized, inverted, border removed)

    2.3 Optical Character Recognition (OCR)

    The OCR capabilities in this application are used to extract the name written in the name boxes. The process is briefly described in Illustration 5.

    Illustration 5: "Read name" action flowchart

    2.3.1 Boundary Removal

    After getting a name box using a predefined coordinate, width, and height, we have to remove unwanted components from the name box in order to optimize the character recognition process. One such component is the boundary of the name box (see Illustration 6(a)). To remove the boundary, a technique called extraction of connected components (Gonzales and Woods 2010, p. 667) is used.

    We start with a mask M, an image with the same dimensions as the name box A, in which all pixel values are 0 except on the boundary, where they are 1 (see Illustration 6(b)). The intersection of A and M, referred to as X_0 (Illustration 6(c)), is the starting component of this technique. From there, the following operation is performed iteratively until X_k = X_{k-1}, at which point X_k contains all components connected with the border of A (Illustration 6(e)).

    $$X_k = (X_{k-1} \oplus S) \cap A, \qquad k = 1, 2, 3, \ldots$$

    S in this case is the structuring element, whose form, size, and center point can be seen in Illustration 6(d). After the boundary has been found, boundary removal is as simple as subtracting X_k from A. The result can be seen in Illustration 6(f).
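The iteration can be sketched in pure Python as below. The structuring element used in the report is only shown in its illustration, so a 3x3 square is assumed here; `dilate`, `remove_border_components`, and `SQUARE_3X3` are hypothetical names, and the name box is assumed to be a binary list of rows with foreground = 1.

```python
# Assumed structuring element: a 3x3 square centered on the pixel.
SQUARE_3X3 = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def dilate(image, offsets):
    """Binary dilation of a list-of-rows image by the given (dy, dx) offsets."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            if image[r][c]:
                for dy, dx in offsets:
                    rr, cc = r + dy, c + dx
                    if 0 <= rr < height and 0 <= cc < width:
                        out[rr][cc] = 1
    return out


def remove_border_components(a, offsets=SQUARE_3X3):
    """Remove every component of binary image `a` that touches the image border."""
    height, width = len(a), len(a[0])
    # X_0 = A intersected with the mask M (1 on the outermost rows and columns).
    x = [[a[r][c] if r in (0, height - 1) or c in (0, width - 1) else 0
          for c in range(width)] for r in range(height)]
    while True:
        grown = dilate(x, offsets)
        # X_k = (X_{k-1} dilated by S) intersected with A
        nxt = [[grown[r][c] & a[r][c] for c in range(width)] for r in range(height)]
        if nxt == x:  # converged: X_k == X_{k-1}
            break
        x = nxt
    # A - X_k: keep only the pixels that are not connected to the border.
    return [[a[r][c] - x[r][c] for c in range(width)] for r in range(height)]


# Hypothetical 7x7 name box: a frame on the border and a small glyph inside.
box = [
    [1, 1, 1, 1, 1, 1, 1],
    [1, 0, 0, 0, 0, 0, 1],
    [1, 0, 1, 1, 0, 0, 1],
    [1, 0, 1, 0, 0, 0, 1],
    [1, 0, 1, 1, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1],
]
cleaned = remove_border_components(box)
```

In this example the frame is removed and the five glyph pixels survive. If the glyph touched the frame it would be removed as well, which is the weakness discussed in the known issues.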

    This technique results in the removal of all components in A that are connected with the border of A.

    Illustration 6: Boundary removal process
    (a) Name box A, (b) mask M, (c) starting point X_0, (d) structuring element S, (e) X_k, (f) A - X_k

    Illustration 7: Trimming process

    [...]

    Let W and H be the width and height of the input image, (x, y) be a pixel location in the input image, and (v, w) be the corresponding output pixel location (after mapping). The pixel mapping can be done using the following formula.

    $$v = \left[\frac{W_{new}}{W}\,x\right], \qquad w = \left[\frac{H_{new}}{H}\,y\right]$$

    $$f'(v, w) = f(x, y)$$

    The square brackets in the preceding formulas denote the nearest integer (rounding) function.

    This approach, which can be referred to as forward mapping (Gonzales and Woods 2010, p. 109), has several problems. For example, two or more pixel locations in the input image can be mapped to the same location in the output image, or a pixel location in the output image may not be assigned a pixel value at all. Because of these problems, the inverse mapping approach is more commonly used in practice. The idea of inverse mapping is to scan the pixels of the output image first. For each pixel location (v, w) in the output image, compute the corresponding pixel location (x, y) in the input image. The formula for inverse mapping, which is the inverse of the preceding formula, is as follows.

    $$x = \left[\frac{W}{W_{new}}\,v\right], \qquad y = \left[\frac{H}{H_{new}}\,w\right]$$

    $$f'(v, w) = f(x, y)$$

    Please note that nearest neighbor interpolation is implicitly carried out by the nearest integer (rounding) function. An example of the scaling result can be seen in Illustration 8.
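Inverse mapping with nearest neighbor interpolation can be sketched as follows. The function name `scale_nearest` and the list-of-rows image layout are assumptions. Two details go beyond the formula: rounding is done as floor(value + 0.5), and the result is clamped to the last row or column, because rounding can otherwise point one pixel past the edge of the input image.

```python
def scale_nearest(image, new_width, new_height):
    """Scale a list-of-rows image by scanning output pixels (inverse mapping)."""
    height, width = len(image), len(image[0])
    output = []
    for w in range(new_height):  # w: output row index
        # y = [H / H_new * w], clamped to the last input row
        y = min(height - 1, int(height / new_height * w + 0.5))
        row = []
        for v in range(new_width):  # v: output column index
            # x = [W / W_new * v], clamped to the last input column
            x = min(width - 1, int(width / new_width * v + 0.5))
            row.append(image[y][x])
        output.append(row)
    return output


# Hypothetical inputs: enlarge a 2x2 image to 4x4, and shrink a 4x1 image to 2x1.
enlarged = scale_nearest([[1, 2], [3, 4]], new_width=4, new_height=4)
shrunk = scale_nearest([[1, 2, 3, 4]], new_width=2, new_height=1)
```

Because every output pixel is visited exactly once, no output location is left empty and none is written twice, which are the two problems of forward mapping named above.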

    2.3.4 Matching

    After the character in the name box has been prepared, the next step is to match this character against the known set of characters in the database. In the current implementation, the matching process uses a minimum distance classifier, computing the Euclidean distance (Tan, Steinbach and Kumar 2006) between the input image and the database records.

    Suppose that f(k) is the pixel value in position k of the input image, f'(k) is the pixel value in position k of the database record, and n is the total number of pixels in the input image. The Euclidean distance between these two images can be computed using the following formula.
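With the symbols defined above, the standard definition of the Euclidean distance can be written as below. The name $d$ for the distance is chosen here for illustration, and the form follows the textbook definition rather than a formula taken from the report.

$$d(f, f') = \sqrt{\sum_{k=1}^{n} \bigl(f(k) - f'(k)\bigr)^2}$$

For binary images each squared difference is either 0 or 1, so $d$ is the square root of the number of pixels in which the two images differ.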

    Each character is compared against every database record. The database record with the smallest distance to the character is considered the matching record.
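A minimum distance classifier of this kind can be sketched as follows. The database layout (a dictionary from character label to a flattened pixel list) and the names `euclidean_distance` and `classify` are assumptions for illustration, not the report's actual data structures.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equally long flattened pixel lists."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def classify(pixels, records):
    """Return the label of the database record closest to `pixels`."""
    return min(records, key=lambda label: euclidean_distance(pixels, records[label]))


# Hypothetical 2x2 "database" with two records, flattened row by row.
records = {
    "A": [1, 0, 1, 0],
    "B": [0, 1, 0, 1],
}
best = classify([1, 0, 1, 1], records)
```

Both images must already be scaled to the same size, which is why the scaling step precedes matching.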

    2.4 Optical Mark Reader (OMR)

    The Optical Mark Reader (OMR) capabilities in the application are used to extract the filled answer (A, B, C, D, or E) for each number (from 1 to 5). The process of answer extraction is summarized in Illustration 9.

    Illustration 8

    The horizontal division of a region into 5 sub-regions can be seen in Illustration 10. The variable THRESHOLD is the minimal value that S(k) must reach in order to be considered a valid choice. If the value of S(k) is less than THRESHOLD, it is considered an invalid answer, e.g. the form filler did not fill in any answer for the corresponding number. The value of THRESHOLD is currently set to 25, obtained from experiment.
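The decision rule can be sketched as below. The report defines S(k) only in its flowchart, so this sketch assumes S(k) is the number of foreground pixels in sub-region k and that the sub-region with the highest score is taken as the answer. The function name `read_answer` and the example values are hypothetical.

```python
CHOICES = "ABCDE"


def read_answer(region, threshold):
    """Return the marked choice of a binary answer region, or None if invalid.

    The region is split horizontally into 5 equal sub-regions. S(k) is assumed
    to be the count of foreground pixels (value 1) in sub-region k.
    """
    width = len(region[0])
    scores = []
    for k in range(len(CHOICES)):
        left = k * width // len(CHOICES)
        right = (k + 1) * width // len(CHOICES)
        scores.append(sum(sum(row[left:right]) for row in region))
    best = max(range(len(CHOICES)), key=lambda k: scores[k])
    if scores[best] < threshold:
        return None  # no sub-region is dark enough to count as an answer
    return CHOICES[best]


# Hypothetical 2x10 answer region in which the third sub-region is darkened.
region = [
    [0, 0, 0, 0, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 0, 0, 0, 0],
]
answer = read_answer(region, threshold=3)
```

Comparing only the best score against the threshold does not detect the case where two choices are darkened for the same number. Whether the report's flowchart handles that case cannot be told from the text.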

    Illustration 9: "Get answer" action flowchart

    Illustration 10: Division of an answer region into 5 sub-regions

    3 Test Results

    The test is conducted against two input images. The first input is a scanned empty form which is filled with the aid of an image manipulation software (GIMP). The second input is a scanned form, filled manually using a 2B pencil. Compared to the first input image, the second input image is shifted by about 2 pixels to the right and 1 pixels to the bottom.

    3.1 Test Result 1

    The condition of the first input image:

    - Properly scanned, without rotation or location shifting.
    - Filled with the aid of an image manipulation software (GIMP).
    - Filled using a pitch black fill and font color.
    - Filled using the same font as the one used in the database records (Arial).

    Table 1: Matching results for input image 1

    Character | Best Match | Euclidean Distance with Database Records
    [...] (A) 4.36, (B) 9.17, (C) 9.33, (D) 9.38, (E) 9.06, (F) 8.83, (G) 9.17, [...]
    [...] (A) 9.11, (B) 7.62, (C) 7.14, (D) 7.35, (E) 5.83, (F) 7.07, (G) 8.25, [...]
    N | N | (A) 9.11, (B) 8.12, (C) 9.11, (D) 8.25, (E) 8.49, (F) 8.37, (G) 8.6, [...]

    [...]

    3.2 Test Result 2

    The condition of the second input image:

    - Location is shifted by about 2 pixels to the right and 1 pixels to the bottom.
    - Filled manually using a 2B pencil.

    The handwriting style used in the second input image can be seen in Illustration 11.

    Table 2: Matching results for input image 2

    Character | Best Match | Euclidean Distance with Database Records
    [...] (A) 6.32, (B) 7.81, (C) 8.37, (D) 8.43, (E) 8.19, (F) 8.19, (G) 8.43, [...]
    [...] (A) 9.33, (B) 8.94, (C) 9.22, (D) 8.12, (E) 8.6, (F) 8.37, (G) 8.49, [...]
    [...] (A) 8.00, (B) 8.77, (C) 7.62, (D) 8.66, (E) 8.54, (F) 8.19, (G) 8.31, [...]
    [...] 9.43, (T) 7.87, (U) 9.11, (V) 8.31, (W) 9.11, (X) 8.72, (Y) 8.00, (Z) 9.06
    N | N | (A) 9.00, (B) 9.17, (C) 8.6, (D) 9.06, (E) 9.17, (F) 8.49, (G) 7.62, [...]
    [...] (A) 9.54, (B) 8.6, (C) 7.00, (D) 8.6, (E) 7.75, (F) 8.25, (G) 8.00, [...]

    4 Known Issues and Possible Solutions

    The current implementation of the form reader application is not practical enough for real use. The performance of the form reader, especially in the OCR module, needs improvement. Moreover, the application is not robust enough to handle the high variability of user inputs and the problems that may occur in the scanning process (e.g. rotation, position shift, wrong orientation). In this section, all of the known issues are presented along with suggestions on how to solve each problem.

    1. The binarization process is conducted using a simple global thresholding scheme with a static threshold value. Should the brightness level of the input image differ, unwanted behavior may occur (e.g. the object of interest may be lost because its intensity level is not as strong as in normal conditions). A more robust approach would be to use a global thresholding scheme with a dynamic threshold value, e.g. Otsu's method (Gonzales and Woods 2010, p. 765).

    2. The application makes no provision for noisy input images. Noise is problematic because it could be mistaken for the actual object of interest. This would cause problems, especially in corner mark detection and OCR. To handle noisy input images, an image smoothing operation should be performed in the form preprocessing step, perhaps using one of the most common image smoothing techniques, averaging filters (Gonzales and Woods 2010, p. 174).

    3. The process for finding the location of the corner marks does not handle location shifting in the input image very well. The size of the corner regions searched for corner marks is too small (50 x 50 pixels), so if the image is shifted too much, the corner marks cannot be found. If we make the corner regions larger, it increases the risk of a collision with the actual form content, which would lead to subsequent difficulties in histogram analysis (Illustration 12). This problem can be solved by assigning a different color to the corner marks in the form design. Another approach would be to increase the space between the corner marks and the actual content of the form.

    4. Currently, the corner marks in the form are only used to remove the unnecessary border in the raw input image. In order for the application to be more robust against geometric distortion (e.g. resolution mismatch, location shifting, rotation, shear, wrong orientation), the locations of these corner marks can be used as tie points in an image registration process (Gonzales and Woods 2010, p. 111). Image registration is the process of aligning two or more images of the same scene. Based on the location information of these tie points, we can conduct image registration using an affine transformation (Gonzales and Woods 2010, p. 109).

    5. In the OCR module, the process of boundary removal for the name boxes is not robust. For example, if the character written by the form filler touches the border (which is not uncommon), the character would also be removed by the operation, resulting in a meaningless image. A more robust approach would be to use a simple histogram analysis to differentiate between the name box border and the character.

    6. The current OCR design described in section 2.3 has proven not effective enough to handle the large variability of handwriting styles. Several improvements can be made to enhance the character recognition performance. Image smoothing using an averaging filter can be used so that the matching is more tolerant of a slight shape shift in the letter. A morphological operation, thinning (Gonzales and Woods 2010, p. 674), can be used to obtain the skeletonised representation of the letter in order to avoid the problem caused by differences in letter width. The number of database records should also be increased to include various samples of handwriting. An Artificial Neural Network (Tan, Steinbach, and Kumar 2006, p. 246) or Support Vector Machine (Tan, Steinbach, and Kumar 2006, p. 256) may also be incorporated as the classifier.

    Illustration 12: Corner regions with larger size (100 x 100 pixels)
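Otsu's method, suggested in the first item of the list above, can be sketched as follows. This is a generic textbook-style implementation for 8-bit grayscale images stored as lists of rows, not code from the report, and `otsu_threshold` is a hypothetical name. It returns the gray level t that maximizes the between-class variance when pixels with value <= t form one class and the remaining pixels the other.

```python
def otsu_threshold(gray):
    """Return the threshold (0..255) that maximizes the between-class variance."""
    histogram = [0] * 256
    for row in gray:
        for pixel in row:
            histogram[pixel] += 1
    total = sum(histogram)
    sum_all = sum(level * count for level, count in enumerate(histogram))

    best_t, best_variance = 0, -1.0
    weight_low, sum_low = 0, 0
    for t in range(256):
        weight_low += histogram[t]       # number of pixels with value <= t
        if weight_low == 0:
            continue                     # lower class still empty
        weight_high = total - weight_low
        if weight_high == 0:
            break                        # upper class empty from here on
        sum_low += t * histogram[t]
        mean_low = sum_low / weight_low
        mean_high = (sum_all - sum_low) / weight_high
        # Between-class variance, up to the constant factor 1 / total**2.
        variance = weight_low * weight_high * (mean_low - mean_high) ** 2
        if variance > best_variance:
            best_variance, best_t = variance, t
    return best_t


# Hypothetical image with dark ink (10) and bright paper (200).
image = [[10, 10, 200],
         [200, 10, 200]]
t = otsu_threshold(image)
```

Because the threshold is derived from the histogram of each input image, a globally darker or brighter scan shifts the threshold with it, which is the robustness the static threshold lacks.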

    5 References

    Gonzales, RC and Woods, RE 2010, Digital Image Processing, 3rd edn, Pearson, United States of America.

    Tan, PM, Steinbach, M and Kumar, V 2006, Introduction to Data Mining, Pearson, United States of America.