10.1.1.89.6415

    J. M. White, G. D. Rohrer

    Image Thresholding for Optical Character Recognition and Other Applications Requiring Character Image Extraction

    Two new, cost-effective thresholding algorithms for use in extracting binary images of characters from machine- or hand-printed documents are described. The creation of a binary representation from an analog image requires such algorithms to determine whether a point is converted into a binary one because it falls within a character stroke or a binary zero because it does not. This thresholding is a critical step in Optical Character Recognition (OCR). It is also essential for other Character Image Extraction (CIE) applications, such as the processing of machine-printed or handwritten characters from carbon copy forms or bank checks, where smudges and scenic backgrounds, for example, may have to be suppressed. The first algorithm, a nonlinear, adaptive procedure, is implemented with a minimum of hardware and is intended for many CIE applications. The second is a more aggressive approach directed toward specialized, high-volume applications which justify extra complexity.

    Introduction

    One of the most significant problems in Optical Character Recognition (OCR) is the conversion of nonideal analog images into ideal binary images [1]. The original documents which are scanned for characters are often dirty, multi-colored, and produced by a variety of pens, markers, pencils, or printer mechanisms. Characters are often smeared or smudged, and are sometimes written with either very light strokes that are difficult to detect or very heavy strokes that tend to broaden and run together when imaged. The scanning hardware, due to technology and cost limitations, may have nonuniform illumination over the scan field, sensitivity and dark current variations from element to element in the sensing array, and nonideal resolution characteristics from the lens and from crosstalk in the array.

    A similar but more inclusive thresholding problem may be called Character Image Extraction (CIE), which describes the suppression of unwanted background patterns so that only printed or handwritten characters may be captured as electronic images. As with OCR, this process involves converting nonideal analog images of characters into ideal ones, but the binary images may be compressed for storage or distribution, sorted, or used in computerized printing. The CIE process is different from digital facsimile, where a pseudo-gray-scale reproduction of the image is desired [2]. CIE output is binary, as opposed to multi-level gray scale, and consists of black picture elements (pixels) where character strokes are written and white pixels elsewhere. Pictorial content and noise which do not conform to the criteria for character strokes are eliminated. These images can be compressed more efficiently than digital facsimile and, therefore, are used for electronic distribution, sorting, and computerized printing as well as for OCR.

    To overcome the difficulties of character extraction, the designer of a thresholding algorithm or circuit must use as much a priori information about the character images as is practical for the cost range of the equipment being designed. For example, the width of a typical character stroke is about 0.2 mm, with some of the widest strokes up to about 1 mm. (This stroke width encompasses most characters found in CIE applications.) The overall size of a character ranges from about 2.5 mm wide by 4.2 mm high (typewriter output) to 6 mm by 9 mm for hand-printed characters. Except for

    © Copyright 1983 by International Business Machines Corporation. Copying in printed form for private use is permitted without payment of royalty provided that (1) each reproduction is done without alteration and (2) the Journal reference and IBM copyright notice are included on the first page. The title and abstract, but no other portions, of this paper may be copied or distributed royalty free without further permission by computer-based and other information-service systems. Permission to republish any other portion of this paper must be obtained from the Editor.

    J. M. WHITE AND G. D. ROHRER, IBM J. RES. DEVELOP. VOL. 27 NO. 4 JULY 1983


    dots on the i and j and for punctuation marks, characters are made of strokes which are ideally long, but narrow, connected groups of black pixels. The thresholding circuits should delete background levels which are changing over regions larger than the character size. Maximum detectability should occur for dimensions appropriate to the character stroke. Ideally, this last maximization process should not be done independently for each dimension in the two-dimensional space. This double maximization would emphasize dots and dot-like noise more than lines, which have high spatial frequency components in only one direction.

    The decreasing cost of digital image capture and processing hardware, especially CCD-scanned (Charge Coupled Device) photodiode arrays and memory chips, has made it possible to consider approaches to thresholding that have not been practical before. Interline pauses required by some systems make digital thresholding more favorable than strictly analog thresholding. The former can be controlled by the system clock and time-independent digital memory, whereas analog time constants are fixed and require uninterrupted operation.

    State-of-the-art document scanning systems provide discretely sampled output in a rectilinear grid. One typical grid is 240 pixels/inch (approximately 0.1 mm/pixel) both horizontally and vertically. This provides an average of at least two samples, or pixels, per stroke width, a condition which guarantees that at least one sample will fall totally within the stroke. This also produces 5.4 million pixels for a typical 8.5 by 11-inch sheet of paper or 1.0 million pixels for a typical 2.75 by 6-inch bank check. Processing this amount of data at high speeds requires special real-time processing algorithms in order to minimize hardware costs. The approaches we have developed and tested, and which form the basis of this paper, are thus significantly different from typical low-speed, iterative digital processing of photographic or satellite data [3].
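The pixel counts quoted above follow directly from the grid spacing. The short check below is only an arithmetic illustration; the function and variable names are hypothetical and do not come from the paper.

```python
# Arithmetic check of the scan-grid figures quoted in the text.
# All names here are illustrative assumptions, not part of the original paper.
PIXELS_PER_INCH = 240
MM_PER_INCH = 25.4


def pixel_count(width_in, height_in, ppi=PIXELS_PER_INCH):
    """Number of samples in a width_in x height_in inch rectilinear scan."""
    return round(width_in * ppi) * round(height_in * ppi)


pitch_mm = MM_PER_INCH / PIXELS_PER_INCH   # sample spacing in millimetres
sheet = pixel_count(8.5, 11)               # 8.5 by 11-inch sheet of paper
check = pixel_count(2.75, 6)               # 2.75 by 6-inch bank check

print(f"pitch = {pitch_mm:.3f} mm/pixel")
print(f"sheet = {sheet / 1e6:.2f} million pixels")
print(f"check = {check / 1e6:.2f} million pixels")
```

The sheet comes to 2040 by 2640 samples and the check to 660 by 1440 samples, which round to the 5.4 million and 1.0 million pixels given in the text.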

    The first approach is a dynamic threshold algorithm [4]. The black/white decision is determined by a threshold level which is continually changed as the scanned gray-scale data stream changes. The basic threshold calculation may be represented (in each dimension) as a first-order difference equation with a nonlinear coefficient. The nonlinear term was heuristically determined, but was modeled after the response of a resistor-capacitor-diode circuit. Not only does this algorithm result in near-optimal image fidelity, it also can be built with a very small amount of logic and memory hardware and can be programmed with very few lines of code if a microprocessor implementation is preferred. This ensures low cost as well as high-speed performance. Although not described in detail in this paper, a modification of the dynamic threshold algorithm was created for calculating the threshold across a segmented scan field (with eight parallel outputs from the scanning array). This modification was simulated in software, not in real-time hardware, with successful results in image quality.

    The second algorithm is a unique combination of simple algorithms that more fully utilizes the linear width and connectedness of character strokes. Some of the simplicity of the first thresholding algorithm is sacrificed, but typically more of the pixels in the background region are made white. This results in a more idealized output, and the improved output is more noticeable when the scanned gray-scale data are distorted or have dark background regions. Unlike other multi-operational algorithms that can achieve this idealized output [5], the black/white decision in the second algorithm is determined in only one processing pass of the scanned data. This greatly reduces the complexity of the implementation and allows the algorithm to be economically feasible for high-speed imaging.

    This algorithm is a label and search process. Before the final black/white decision is made, the pixels lying near an edge (sharp change in gray-scale data) are labeled. Pixels located on the dark side of an edge are distinguished from those on the light side. The light and dark sides of an edge are identified by a sum of the differences calculation that is an approximation of the Laplacian operator (two-dimensional second derivative). The final black/white decision is based on a search operation of the labeled image. Unlike the results with our Dynamic Threshold and many other algorithms [6], the sharp edge of a large background pattern is eliminated by this algorithm since the other edge does not appear within the specified distance. Pixels within a character stroke are made black because associated edges can be found. The high spatial frequency associated with the edge is not sufficient in itself to cause a black/white transition in the output.

    Dynamic Threshold Algorithm

    Functional description

    The main objective of the Dynamic Threshold Algorithm is to set a threshold for the binary (1 for black, 0 for white) decision about a given pixel. The approach conceptually is to compare the gray value of the pixel with some average of gray values in some approximately character-size neighborhood about the pixel. If the pixel is significantly darker than the neighboring pixels, it is called black. Two difficulties arise with the obvious approach of uniformly averaging the gray values in a circular neighborhood. The first problem is one of cost: Storing many lines of gray-level pixel data becomes prohibitive. An averaging approach must be developed which does not require referral back and forth to other lines of scan data. The second problem is one of performance: If the contrast ratio of the character to the background is


    Figure 1 Update functions f (upper curve) and g (lower curve) provide the rate of response of the running average to changes in the gray values input to the Dynamic Threshold Algorithm hardware. Abscissas vary from -127 to +128, representing the sign and the seven most significant bits of the difference. Ordinate values vary from 10 to -63. [Plot axes: input (ROM address) versus read-only memory (ROM) output values; regions marked "Update toward black" and "Update toward white".]

    high, then the threshold level should be increased by an additional amount to reduce noise from dirt, smudges, background printing, etc. Furthermore, the average must adapt quickly after leaving a very dark character so that a following lighter character will not be eliminated.

    The solution to the storage requirement problem for averaging is to use a running average instead of a true average. To calculate a running average, y(n), from a stream of sampled, digitized, raster-scanned gray values, u(n), a fraction, f, of the input gray value is added to a complementary amount of the previous average, y(n - 1):

    $$y(n) = f\,u(n) + (1 - f)\,y(n - 1). \tag{1}$$

    This equation can be expanded to show explicitly that y(n) is a nonuniformly weighted average of the current and past pixels:

    $$y(n) = \sum_{i=0}^{\infty} f\,(1 - f)^{i}\,u(n - i). \tag{2}$$

    (Note that, for negative arguments, u is taken to be the value set in the history buffer when the thresholding system is initialized.) However, to implement the running average, it is instructive to rewrite Eq. (1) as

    $$y(n) = y(n - 1) + f\,[u(n) - y(n - 1)]. \tag{3}$$

    Thus, the average can be updated by adding to the previous value a fraction of the difference between the current gray pixel value and the prior average value. When implemented in hardware, this latter expression for the running average requires very few components.

    Figure 2 Bias functions are used to offset the decision level and to eliminate noisy backgrounds. Since the input is 6 bits (values 0 to 63) and the output is 8 bits (0 to 255), a multiplication of approximately four is included in the table, which scales the output to the range covered by the average. [Plot abscissa: input (address).]

    Replacing the constant value f in Eq. (3) with a function f gives a more versatile, nonlinear equation:

    $$y(n) = y(n - 1) + f[u(n) - y(n - 1)]. \tag{4}$$

    Certain restrictions should be applied to the function f. It should equal zero only when its argument is zero, and otherwise it should have a value between zero and the value of its argument. This will guarantee that the average never exceeds the range of the input gray pixel values and that the average will converge to the input value for uniformly gray areas.
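The running-average recursion and the restrictions on the update function can be illustrated with a minimal sketch. The two update functions below are hypothetical stand-ins: they merely satisfy the stated restrictions (zero only at zero, otherwise between zero and the argument) and are not the curves of Fig. 1.

```python
def running_average(samples, f, y0=0.0):
    """Running average y(n) = y(n-1) + f(u(n) - y(n-1)), with f a function."""
    y = y0
    history = []
    for u in samples:
        y = y + f(u - y)          # add an increment that depends on the difference
        history.append(y)
    return history


def linear_update(d, fraction=0.25):
    """Constant-fraction case: the increment is a fixed fraction of the difference."""
    return fraction * d


def asymmetric_update(d):
    """Hypothetical nonlinear update: fast for positive differences, slow for negative.

    The rates 0.5 and 0.0625 are assumptions chosen only for illustration.
    """
    return 0.5 * d if d > 0 else 0.0625 * d


# A uniformly gray area: the average converges to the input value.
flat = running_average([40] * 200, linear_update)
# Starting above the input, the slow branch still converges and never leaves
# the range spanned by the initial value and the input.
slow = running_average([10] * 200, asymmetric_update, y0=63.0)
print(flat[-1], slow[-1])
```

Because each increment lies between zero and the difference, every new average is a point between the old average and the current input, which is why the result stays within the input range.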

    The nonlinearity permitted by the use of a function instead of a multiplicative constant provides a solution to the second problem described above; i.e., the average can be made to


    Figure 3 Components used to implement the Dynamic Threshold Algorithm. Inputs 13 and 14 are used to initialize the horizontal and vertical averages at the beginning of each line and column, respectively. Input is 6-bit gray scale data from the A/D converter connected to the scanning array. Output 15 is the binary result of the thresholding.

    adjust rapidly to large, high-contrast signals, and to have a tendency to follow the black peaks of the character stroke pixels. The peak-following characteristic is similar to that of a rectification circuit, and in fact, the function f illustrated in Fig. 1(a) is similar to the current-voltage characteristics of a leaky diode. Other thresholding implementations [7] have actually used diodes in a peak-following scheme, but the analog approach does not allow fine tuning of the adaptation rate and is not clock-controlled as is the digital approach.

    Equation (4) only gives one-dimensional averaging. Images are two-dimensional, and experience has shown that two-dimensional averaging greatly enhances performance. To achieve a two-dimensional running average, vertical averages, z(n), are stored for each column of the image, and as that column is reached, the vertical average is updated by the horizontal average value y(n):

    $$z(n) = z(n - \Pi) + g[y(n) - z(n - \Pi)], \tag{5}$$

    where Π is the number of pixels in a scan line. Thus, we have a vertical average of the horizontal average. The update function, g, shown in Fig. 1(b), operates more rapidly than the first-stage, horizontal update for two reasons: The first stage usually eliminates the extreme values; also, there is little look-ahead, if any, in the vertical direction.
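A minimal sketch of the two-stage averaging follows: one stored vertical average per column, updated from the horizontal average as each column is reached. The update functions, the initial values, and the choice to restart the horizontal average at each line are assumptions made for illustration.

```python
def f_update(d):
    """Hypothetical horizontal update (assumed linear, slower)."""
    return 0.25 * d


def g_update(d):
    """Hypothetical vertical update (assumed linear, faster than f_update)."""
    return 0.5 * d


def two_dimensional_average(image, f, g, y_init=63.0, z_init=63.0):
    """Return z for every pixel of image (a list of equal-length rows).

    z_col[c] holds the vertical average left by the previous line in column c,
    i.e. the value one scan line earlier in the raster stream.
    """
    width = len(image[0])
    z_col = [z_init] * width
    result = []
    for row in image:
        y = y_init                      # horizontal average restarted per line
        z_row = []
        for c, u in enumerate(row):
            y = y + f(u - y)            # horizontal running average
            z_col[c] = z_col[c] + g(y - z_col[c])   # vertical average of y
            z_row.append(z_col[c])
        result.append(z_row)
    return result


flat_image = [[30] * 12 for _ in range(5)]
z_flat = two_dimensional_average(flat_image, f_update, g_update, 30.0, 30.0)

mixed_image = [[63] * 6 + [10] * 6 for _ in range(5)]
z_mixed = two_dimensional_average(mixed_image, f_update, g_update)
```

Only one line of averages is stored, which is the storage saving the running-average formulation is meant to provide.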

    Since the running average is one-sided in that it only averages over past pixels, it is desirable to store some number N of scanned pixel values and use this delayed value, u(n - N), in comparison with the dynamic threshold or average z(n). We found that N = 8 was a good choice for data scanned at 240 pixels/inch. This number is four times the number of pixels in the narrowest of typical strokes. A minimal additional improvement was observed when we also added an entire line to the delay; i.e., N = 8 + Π.

    One other feature is required to complete the Dynamic Threshold Algorithm, and that is a bias between the gray value of the pixel being compared, u(n - N), and the two-dimensional average, z(n). Without bias, the threshold decision would be determined by noise fluctuations in uniform areas. Additional amounts of bias are required to guarantee the suppression of residual images of the specially colored boxes used in many OCR forms. Color filtering of the optical image eliminates most of the contrast from these boxes, but due to the variety of inks which are used and the tolerances which are specified to accommodate these inks, biases of five to twenty percent may be required for some applications.

    The bias may be a function of the history, z(n), of the localized pixel, u(n), or of both. In our implementation, the biasing function, h, was based only on the localized pixel. That is, h[u(n - N)] was compared with z(n). Figure 2 illustrates typical biasing functions. The dashed line indicates the unbiased condition in which h[u(n)] is equivalent to u(n). For those cases indicating a bias toward white, the output decision will be white unless the pixel is definitely darker than the neighborhood. Conversely, if the pixel is dark on an absolute scale, then the output decision will be black unless the pixel is relatively lighter than its neighborhood by the indicated amount. Various bias curves are indicated and were selectable for the various color drop-out modes of operation.
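Putting the pieces of the functional description together, the decision for a raster stream can be sketched as below. This is a simplified floating-point illustration, not the hardware design: the update functions, the bias amount of 4 gray levels, the initial value of 63, and working on the 0 to 63 scale (without the scaling by four used in the hardware) are all assumptions. The comparison rule (black when the average exceeds the biased gray level) and the changeover at 40 percent of full scale follow the text.

```python
from collections import deque


def f_update(d):
    return 0.25 * d            # hypothetical horizontal update


def g_update(d):
    return 0.5 * d             # hypothetical vertical update


def bias(u, changeover=0.4 * 63, amount=4.0):
    """Hypothetical bias table: toward black below the changeover, toward white above."""
    return u - amount if u < changeover else u + amount


def dynamic_threshold(pixels, line_len, f, g, h, delay=8, init=63.0):
    """Return one binary decision per input sample (1 = black, 0 = white).

    The decision emitted at stream index n refers to pixel n - delay.
    """
    y = init
    z = [init] * line_len                           # one vertical average per column
    history = deque([init] * delay, maxlen=delay)   # values used for negative indices
    out = []
    for n, u in enumerate(pixels):
        col = n % line_len
        if col == 0:
            y = init                                # restart horizontal average per line
        y += f(u - y)
        z[col] += g(y - z[col])
        delayed = history[0]                        # u(n - delay)
        history.append(u)
        out.append(1 if z[col] > h(delayed) else 0)
    return out


LINE_LEN = 40
row = [63] * LINE_LEN
row[20] = row[21] = 10                              # a two-pixel-wide dark stroke
pixels = row * 20
out = dynamic_threshold(pixels, LINE_LEN, f_update, g_update, bias)
blank = dynamic_threshold([63] * 400, LINE_LEN, f_update, g_update, bias)
```

In this toy input the stroke pixels come out black eight samples after they enter the stream, and a blank page stays entirely white because the bias toward white keeps the biased gray level above the average.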

    Implementation

    The low cost of implementation makes the Dynamic Threshold Algorithm potentially useful in many OCR applications. Figure 3 is a block diagram of the implementation in dedicated digital hardware. A flow diagram for a simple microprocessor software implementation would be very similar. Several nonobvious manipulations of the data stream were used to keep the number of parts to a minimum without sacrificing quality.

    Figure 4 Pseudo-gray reproduction of stress test document data as scanned.

    Figure 5 Result of using the Dynamic Threshold Algorithm on data of Fig. 4. Insert is an enlargement of lower right numbers. Each pixel corresponds to 1/240 inch in the original document.

    The data from the analog-to-digital converter (ADC) was 6 bits per pixel, representing gray levels of 0 to 63. The 6-bit-wide data path is indicated by 1 in Fig. 3. The subtraction of u(n) from y(n - 1) was performed by complementing u(n) and adding it to y(n), as indicated in blocks 2 and 3. The horizontal average y(n) was an 8-bit value, ranging from 0 to 255. The six bits from block 2 were added in block 3 as the most significant bits, in effect multiplying u(n) by 4. This permitted the updating of history values by 1 part in 256, instead of 1 part in 64, so that very slow rates of updating could be realized.

    The carry bit from the adder (3 in Fig. 3) along with the seven most significant bits were used to address ROM 4, which stored the function f. This incremental result from the ROM was added to the buffered history value in block 5, and the new value y(n) was passed to buffer 6.
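The table-lookup update can be sketched in integer arithmetic as follows. This is an illustration under stated assumptions, not the hardware: the difference is formed here as 4u - y with ordinary signed arithmetic rather than by complement-and-add, the ROM address is modeled as the difference shifted right by one bit (sign plus seven bits), and the table contents are hypothetical.

```python
def build_update_rom(shift_up=1, shift_down=3):
    """Hypothetical 256-entry increment table indexed by (difference >> 1) + 128."""
    rom = []
    for addr in range(256):
        d = (addr - 128) * 2                  # difference represented by this address
        if d > 0:
            inc = max(1, d >> shift_up)       # assumed fast update in one direction
        elif d < 0:
            inc = -max(1, (-d) >> shift_down) # assumed slow update in the other
        else:
            inc = 0
        rom.append(inc)
    return rom


def update_average(y, u, rom):
    """One update of the 8-bit average y (0..255) from a 6-bit input u (0..63)."""
    diff = (u << 2) - y                       # input aligned to the top six bits of y
    addr = (diff >> 1) + 128                  # sign and seven most significant bits
    return y + rom[addr]


ROM = build_update_rom()

y_up = 0
for _ in range(50):
    y_up = update_average(y_up, 40, ROM)      # target on the 8-bit scale is 160

y_down = 255
trace = []
for _ in range(300):
    y_down = update_average(y_down, 0, ROM)
    trace.append(y_down)
```

Holding the average at 8 bits while the input has only 6 bits is what allows increments as small as 1 part in 256, so the slow branch can move by a single count per sample.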

    A similar arrangement is used to perform the vertical averaging. The only difference is that a shift register 7 and buffer 8 are used to store the line of history values and provide the delay. The resulting value z(n) is applied to one side of an 8-bit comparator. The upper data path consists of the shift register 7, which provides the delay N between the average term and pixel under comparison, and the bias table, ROM 9, which took a 6-bit input and provided a biased, 8-bit output so that the upper channel would in effect also have the 4 times multiplication. Since the ROM had a 9-bit input, the three extra inputs (two are shown as 10 and 11) could be used to select eight different bias tables.

    Results

    The Dynamic Threshold Algorithm has been tested with a wide variety of documents. OCR statistics on thousands of


    Figure 6 Biased gray levels (solid curve) and two-dimensional running average (dotted curve) for the horizontal line (a) and the vertical column (b) intersecting in the lower right 6 of Figs. 4 and 5. Intersection coordinates are 1428 and 357. Output of threshold is white when the biased gray level is larger and is black when the average is larger. [Plot abscissas: (a) pixels along row 357; (b) column 1428 (top to bottom).]

    characters scanned from live application documents and thresholded using the Dynamic Threshold Algorithm hardware compared favorably with results using prior thresholding methods. For purposes of illustration, we have selected three document samples which exemplify worst case problems (Figs. 4 and 5), a typical application (Fig. 8), and an idealized image test pattern (Fig. 9).

    Figure 4 is a computer/photocomposer pseudo-halftone reproduction of the actual gray-scale data scanned from a stress test document. The reading of all characters on this document far exceeds the capabilities of most OCR machines (some characters are too light, others are smudged, erased, or marked over); in a number of instances the minimal-hardware Dynamic Threshold Algorithm fails. However, it is instructive to see what the problems are and how closely we border on a failure.

    Figure 5 shows the output of the Dynamic Threshold Algorithm operating on the data of Fig. 4. The 6 in the bottom row of Fig. 5(b) is an example of a common problem: the center of a loop is almost filled in but needs to be open for best operation of the recognition logic. The gray value in the center of the loop is actually darker than many of the other character strokes, as can be seen in Fig. 6, which shows the biased gray levels and average output values as they would appear at the input of the comparator (12 in Fig. 3). The opening of the 6 is located approximately at the intersection of column 1428 and row 357. (The document is 2048 by 384 pixels.) The threshold dynamically shifts to capture the opening of the 6 as desired.

    If a dark area is roughly the size of a character, then it will tend to remain dark. Even with biasing, which strongly emphasizes contrast since these levels straddle the break in the bias curve (Fig. 2), the center left row of characters [Fig. 4(a)] is not machine-readable. Figure 7(a) indicates that there is not enough time for the vertical average to be adjusted to such a dark background during the processing of one character line, but if the problem line is replicated [as in Fig. 7(b)], then the likelihood of detecting and reading the characters is enhanced for the second line of characters in the dark background.

    Figure 8 is a composite of a form using a green drop-out background with boxes to constrain the location of characters to be recognized and of segments of thresholded output. The bias function of the Dynamic Threshold Algorithm does ignore the residual contrast of the boxes against the white paper. The area about the scanned characters is then free of any noise which would reduce the effectiveness of the recognition process.

    Figure 9 is a portion of the IEEE Facsimile Test Chart, which has been scanned and thresholded using the Dynamic Threshold Algorithm. In addition to demonstrating the high resolution of the 240 pixel/inch system, the portion of the face shows how the bias toward black (lower left part of the curves in Fig. 2) retains the overall pictorial content of the large dark areas. Were the bias toward white for all gray values, then any large uniform area would appear as white with only transition areas having black output pixels. This is, of course, an option of the user and is controlled by the bias


    Figure 7 Biased gray levels (solid curve) and two-dimensional average (dotted curve) along column 252 of Figs. 4 and 5. Excessive overall darkening of a character-sized area results in a loss of information (a), but replicating the dark line (b) shows that the threshold tends to adapt as desired when the dark area is larger than a single character dimension. [Plot abscissa: column 252 (top to bottom) of document T931303.]

    tables selected for use with a particular document. As shown in Fig. 2, the bias tables which were used change from bias toward black to bias toward white at 40 percent of full scale.

    Integrated Function Algorithm

    Functional description

    The second algorithm is intended for extracting characters from complex images in an effective and rapid manner. The main objective is to remove as much of the nonessential background as economically feasible to allow for efficient compression and subsequent handling of the binary image. In some CIE applications this can be difficult to achieve

    Figure 8 Sections of actual OCR document (a, c) with green "drop-out" boxes surrounding the OCR regions, and (b, d) Dynamic Threshold Algorithm output of corresponding areas.

    Figure 9 Result of scanning IEEE Facsimile Test Chart with Dynamic Threshold Algorithm. Reproduction of the large black areas in the face section is accomplished by biasing the lower gray values toward black; otherwise, all largely uniform areas would be white.

    because the contrast ratio of text to the background is extremely small. For bank checks, as an example, a ratio of less than 20% is not uncommon, while ratios within the background alone can exceed 20%. This obviously restricts the use of contrast as the sole measurement in thresholding. Other measurements should be made and appropriately weighted to generate a more idealized output. The Integrated Function Algorithm, therefore, employs several measurements to deal with these complex images. Width of the text, sharpness of the text edges, and contrast ratio all contribute to the threshold decision process.

    The Integrated Function Algorithm processes images in the following fashion: First, digitized data from the raster scanner are processed by gradient-like operators to identify and label pixels in, or very close to, areas where sharp changes exist in the gray-level image. These regions are typically edges of text (characters) or background areas having high contrast change. This localized edge information is then interrogated to separate the text edges from those


    belonging to extraneous background. The separation decision is based on correlating these edges to character stroke widths. Then, internal regions of the character strokes are made black (binary 1) while all other areas are made white (binary 0).

    To accurately identify the pixels in the vicinity of an edge, a measurement of the changes in gray-level values is used. The measurement, defined as activity operator A(i, j), is the absolute sum of approximated derivatives for both scan and raster directions taken over a small area. The derivative, dx, as proposed by Sobel [8], in the scan direction for the gray-level u located at pixel i in raster j is

    $$dx(i, j) = u(i - 1, j) - u(i + 1, j). \tag{6}$$

    Similarly,

    $$dy(i, j) = u(i, j - 1) - u(i, j + 1). \tag{7}$$

    The change activity for one pixel, a(i, j), is defined in this algorithm as the absolute sum of these Sobel derivatives:

    $$a(i, j) = |dx(i, j)| + |dy(i, j)|. \tag{8}$$

    To maximize the detectability of edges for most hand- and machine-printed characters, the activity defined in Eq. (8) is summed over nine pixels. The form of the activity operator for one pixel, A(i, j), is then

    $$A(i, j) = \sum_{n=-1,0,1} \; \sum_{m=-1,0,1} a(i + n, j + m). \tag{9}$$

    This operator takes on large values in the vicinity of character edges and relatively lower values elsewhere. A simple thresholding technique, therefore, is sufficient to pick only those pixels lying close to character edges. One need not be too concerned about incorrectly identifying some pixels. Their positions are usually uncorrelated (not related to a stroke edge), which results in their being removed or properly labeled in the black/white process. Figure 10 shows the histogram of A(i, j) values collected from a high-speed scanner. The operator is well-behaved, as demonstrated in the distributions for uniform, black background to the very busy background of a scenic bank check. Few pixels exceed the value of 25 in all cases. A(i, j) behaves much the same in the presence of text. This is shown in Fig. 11. Threshold values between 15 and 24 are sufficient to separate most background pixels from those in the vicinity of character edges.
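The activity operator of Eqs. (6) through (9) and the simple threshold on it can be sketched directly. The image is indexed as u[j][i] (raster j, pixel i); clamping coordinates at the image border and the threshold value of 20 are assumptions made for this illustration, since the text does not specify border handling.

```python
def activity(u):
    """Activity operator A for every pixel of u, a list of equal-length rasters."""
    rows, cols = len(u), len(u[0])

    def px(i, j):
        # Border handling is an assumption: coordinates are clamped to the image.
        return u[min(max(j, 0), rows - 1)][min(max(i, 0), cols - 1)]

    def a(i, j):
        dx = px(i - 1, j) - px(i + 1, j)      # Eq. (6)
        dy = px(i, j - 1) - px(i, j + 1)      # Eq. (7)
        return abs(dx) + abs(dy)              # Eq. (8)

    return [
        [sum(a(i + n, j + m) for n in (-1, 0, 1) for m in (-1, 0, 1))  # Eq. (9)
         for i in range(cols)]
        for j in range(rows)
    ]


# A vertical step edge: light (60) on the left, dark (10) on the right.
step = [[60] * 5 + [10] * 5 for _ in range(8)]
A = activity(step)

T = 20                                        # hypothetical threshold in the 15..24 range
edge = [[value >= T for value in row] for row in A]
print(A[4])
print([i for i, flag in enumerate(edge[4]) if flag])
```

For this step edge the operator is nonzero only in the four columns nearest the transition and zero in the uniform areas, which is the behavior the thresholding step relies on.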

    The next step is to label all identified edge pixels according to the sign of an approximated Laplacian operator, ddxy. Referring to Eqs. (6) and (7), the operator for pixel (i, j) is

    $$ddxy(i, j) = dx(i - 1, j) - dx(i + 1, j) + dy(i, j - 1) - dy(i, j + 1). \tag{10}$$


    Figure 10 Activity operator histograms. Figures show activity operator, A(i, j), histograms for areas comprised of 2560 pixels, 20 by 128, at 240 pixels per inch. Gray-level data, u(i, j), is 6 bits per pixel, representing values from 0 to 63. (a) Scanner background (noise); (b) plain white paper; and (c) scenic background of a bank check. [Histogram abscissa: activity operator value.]

    In terms of gray-level values, u, the operator simply is

    $$ddxy(i, j) = u(i + 2, j) + u(i - 2, j) + u(i, j + 2) + u(i, j - 2) - 4\,u(i, j). \tag{11}$$
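The gray-level form of the operator can be checked numerically against a second difference built from the first derivatives of Eqs. (6) and (7). With those derivative definitions, the second difference taken as dx(i - 1, j) - dx(i + 1, j) + dy(i, j - 1) - dy(i, j + 1) expands to exactly the gray-level expression; the order of the terms in each difference fixes the sign. The sketch below is illustrative only and is evaluated at interior pixels, so no border rule is needed.

```python
def dx(u, i, j):
    return u[j][i - 1] - u[j][i + 1]          # Eq. (6), image indexed as u[j][i]


def dy(u, i, j):
    return u[j - 1][i] - u[j + 1][i]          # Eq. (7)


def ddxy_from_derivatives(u, i, j):
    """Second difference assembled from the first derivatives."""
    return dx(u, i - 1, j) - dx(u, i + 1, j) + dy(u, i, j - 1) - dy(u, i, j + 1)


def ddxy_from_gray(u, i, j):
    """Gray-level form: four neighbors two pixels away minus four times the center."""
    return (u[j][i + 2] + u[j][i - 2] + u[j + 2][i] + u[j - 2][i]
            - 4 * u[j][i])


# Deterministic 6-bit test image (values 0..63), 10 by 10 pixels.
size = 10
image = [[(7 * i * i + 3 * j + i * j) % 64 for i in range(size)] for j in range(size)]
interior = [(i, j) for j in range(2, size - 2) for i in range(2, size - 2)]
forms_agree = all(
    ddxy_from_derivatives(image, i, j) == ddxy_from_gray(image, i, j)
    for i, j in interior
)

# A single dark pixel (10) on a light background (60).
spot = [[60] * 5 for _ in range(5)]
spot[2][2] = 10
center_value = ddxy_from_gray(spot, 2, 2)
print(forms_agree, center_value)
```

With larger gray values representing lighter pixels, as in Fig. 6, the operator is positive at a pixel that is darker than its neighbors two samples away and zero in uniform areas.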


    Figure 11 Activity operator influenced by text edges. Scan size and resolution are as described in Fig. 10. (a) Distribution of A(i, j) values for text on a clean background; (b) text on a scenic bank check. [Histogram bins of activity operator value: 0 to 4, 5 to 14, 15 to 24, 25 and above.]

    Historically, this operator is not well behaved on typical image data [9]. In this algorithm, however, ddxy is only applied for those pixels having an activity operator value, A(i, j), greater than or equal to a threshold, T. This effectively smooths ddxy because the Laplacian tends to be stable in these areas.

    This combination of A(i, j) and ddxy(i, j) is formed to generate a three-level image in which only the sharpest edges are identified and labeled. A pixel in the new image, S(i, j), is defined as

    S ( i , j )= - f A ( i , j )2 a n d d d x y ( i , j)


    line could be calibrated to the line width restrictions, so that wide, dark background areas can now be dropped from the final image.

    Implementation

    A block diagram showing the major processing functions for this algorithm is presented in Fig. 13. All functions could be realized in one or more microprocessors, depending on processing time requirements. Calculation of the edge-finding and labeling factors should be done in parallel for efficient processing. The edge correlation for generating the final image can be buffered in an economical fashion and performed in another processor operating serially. For high-speed applications parallel processing channels could be used. Each channel has to overlap the adjacent channels by a line width to ensure that no seams are created in the binary image.

    The algorithm was implemented in a special-purpose processor. Digitized input from the A/D converter is identical to that described for the Dynamic Threshold Algorithm. The derivatives, dx and dy, for each pixel are calculated with two adders and enough RAM to hold two scan rasters of gray levels. Operators ddxy and A are calculated in a similar arrangement. The threshold, T, is set to be a binary fraction of A values and is bounded by 15 and 25. The threshold is either increased or decreased depending on the number of A values between 15 and 25 in eight rasters.

    To limit the amount of hardware and to reduce processing time required to do the edge correlation, the edge search is limited to two directions, x and y. A line or raster store of the S image provides the data necessary to test for the ordered sequences in the x direction. Data for the sequence test in the y direction are accumulated through keeping a history of the most recent edge sequence for each pixel. Only the sequence type (1 for +, - and 0 for -, +) and the number of rasters since it occurred (binary coded run-length) need to be stored. This search matrix is shown on a typical S image in Fig. 14. The basic criterion for making a pixel black requires that the pixel be bounded by edges only in one direction.

    More accurate decisions can be made with little added expense by expanding the decision matrix to include two edge tests for each direction (double search decision matrix). Here a four-pixel cluster (2 by 2) is evaluated simultaneously with the edge search in the two rows and columns that contain the cluster. The pixels must be bounded in both rows or both columns to be made black.

    The Integrated Function Algorithm was implemented using the double search decision matrix. This further enabled us to remove much of the nonessential background from busy or complex images.
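    The double search decision can be illustrated with a short sketch. It takes two precomputed Boolean masks, `row_ok` and `col_ok` (hypothetical names), that say whether each pixel passed the edge test along its row or along its column. "Bounded in both rows" is interpreted here as all four cluster pixels passing the row test, which is an assumption about the wording above. One output value is produced per 2 by 2 cluster, giving a half-resolution image.

```python
def double_search(row_ok, col_ok):
    """Collapse each 2x2 cluster to one output pixel (True = black).

    A cluster is black when all four of its pixels are bounded along
    their rows, or all four are bounded along their columns.
    """
    height, width = len(row_ok), len(row_ok[0])
    out = []
    for r in range(0, height - 1, 2):
        out_row = []
        for c in range(0, width - 1, 2):
            cluster = [(r, c), (r, c + 1), (r + 1, c), (r + 1, c + 1)]
            rows_agree = all(row_ok[i][j] for i, j in cluster)
            cols_agree = all(col_ok[i][j] for i, j in cluster)
            out_row.append(rows_agree or cols_agree)
        out.append(out_row)
    return out
```

    Requiring agreement across both rows (or both columns) is what makes this test stricter than the single search: an isolated background edge that happens to bound one row of the cluster is not enough to turn the cluster black.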

    Figure 13 Block diagram of the Integrated Function system.

    Figure 14 Decision matrix positioned on an internal pixel S(i, j) of a typical stroke image. The edge search is limited to two directions and eight pixels. This is sufficient to define character line widths up to 1 mm for a scanner resolution of 240 pixels per inch.
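    As a rough check on the line-width figure in the Figure 14 caption, the pixel pitch at 240 pixels per inch can be converted to millimetres. Whether the pixel under test is counted in addition to the eight searched pixels is not stated, so both cases are shown as an assumption-dependent estimate.

```latex
\[
\text{pixel pitch} = \frac{25.4\ \text{mm/inch}}{240\ \text{pixels/inch}} \approx 0.106\ \text{mm},
\qquad
8 \times 0.106\ \text{mm} \approx 0.85\ \text{mm},
\qquad
9 \times 0.106\ \text{mm} \approx 0.95\ \text{mm}.
\]
```

    Either reading is consistent with the stated limit of line widths up to 1 mm.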


    Results

    To evaluate the effectiveness of removing background, we tested the algorithm with a variety of scenic bank checks. The gray-level images of these checks were collected from a high-speed scanner (low signal-to-noise ratio). Thus, in addition to the busy backgrounds, these gray levels contained distortions created by the scanner hardware.

    Figure 15 shows small portions of bank check images produced by two versions of the algorithm. The 120/240-pixel images illustrate the benefits of using the double search decision matrix. (The 240-pixel output represents the single row/column search matrix.) Considerably more of the background is dropped from the output in the 120/240-pixel images. The four pixels in the 2 by 2 cluster in the double matrix images are made either all black or all white, leading to a 120-pixel output resolution. This shows that much of the character shapes can be maintained through a reduction in resolution. The double search matrix can also be used to produce binary images at the scan resolution.

    Figure 16 shows a scenic bank check and its 120-pixel binary image produced by the Integrated Function system

    IBM J. RES. DEVELOP. VOL. 27 NO. 4 JULY 1983 J. M. WHITE AND G. D. ROHRER


    Figure 15 Binary images produced by Integrated Function Algorithm thresholding. Originals (left) were scanned at 240 pixels per inch, creating gray-level images of 128 by 512 pixels. Center images (120/240 pixels) are 120-pixel-per-inch binary images produced from 240-pixel-per-inch gray levels. Binary images on right (240 pixels per inch) are produced from the same set of gray levels without the 2:1 reduction feature.

    Figure 16 Binary image of a scenic bank check. Document (original at top) was scanned at 240 pixels per inch, creating a gray-level image of 660 by 1440 pixels. The binary image (below) produced by the Integrated Function Algorithm is 120 pixels per inch.

    with the double search decision matrix. Although the background contains marked changes in contrast, much of it has been removed. Some of the background, however, does correlate with normal stroke widths and is retained. This can be readily seen in the upper right corner of the image.

    Summary

    The conversion of nonideal analog images into digitized images is a significant problem. Text (characters) must be extracted from unclear backgrounds in a cost-effective manner. This paper has described two solutions to this problem. The Dynamic Threshold Algorithm provides near-optimal performance and can be built with a very small amount of hardware. The test results indicate that this algorithm is appropriate for many CIE applications. By sacrificing some simplicity, the Integrated Function Algorithm is capable of producing more idealized output from very busy or complex documents. Scenic bank-check images have been used to illustrate the background removal capabilities.

    Acknowledgments

    We remember with appreciation the support of the late Georg Gaebelein in the OCR activity, and we would like to thank co-inventors R. L. Melamud and J. D. Nihart for their contributions to the Dynamic Threshold Algorithm and G. A. Davidson for providing the software to model and test the Integrated Function concept. D. G. Abraham provided assistance in the modeling and testing of the Integrated Function Algorithm, and S. J. Skocz collected the images for Figs. 15 and 16. F. C. Mintzer reproduced Figs. 4, 5, and 9 for us on an electronic photocomposer at the IBM Thomas J. Watson Research Center in Yorktown Heights, NY.

    References

    1. J. R. Hicks and J. C. Eby, Jr., Signal Processing Techniques in Commercially Available High-speed Optical Character Reading Equipment, J. SPIE (Society of Photo-Optical Instrumentation Engineers) 180 (Real-Time Signal Processing II), 205-216 (1979).
    2. J. M. White, Recent Advances in Thresholding Techniques for Facsimile, J. Appl. Photographic Engr. 6, 2, 49-57 (April 1980).
    3. H. . Andrews, Digital Image Processing, IEEE Catalog No. EHO 133-9, Institute of Electrical and Electronics Engineers, New York, 1978.
    4. R. L. Melamud, J. D. Nihart, and J. M. White, Dynamic Threshold Device, U.S. Patent 4,345,314, August 17, 1982.
    5. Y. Yasuda, M. Dubois, and T. S. Huang, Data Compression for Check Processing Machines, Proc. IEEE 68, 7, 874-885 (July 1980).
    6. K. Y. Wong, Multi-function Auto Thresholding Algorithm, IBM Tech. Disclosure Bull. 21, 7, 3001-3003 (December 1978).
    7. R. E. Penny, Dynamic Threshold Setting Circuit, IBM Tech. Disclosure Bull. 18, 6, 1962-1965 (November 1975).
    8. R. O. Duda and P. E. Hart, Pattern Recognition and Scene Analysis, John Wiley & Sons, Inc., New York, 1973, p. 271.
    9. P. T. Cahill, R. J. R. Knowles, O. Tsen, T. Lowinger, and R. ouapinya, Evaluation of Edge Detection Algorithm Applied to Nuclear Medicine Images, Proceedings of the Fifth International Conference on Pattern Recognition, December 1980, pp. 1296-1300.

    Received November 18, 1982; revised February 18, 1983

    Gene D. Rohrer IBM Information Products Division, 1001 W. T. Harris Boulevard, Charlotte, North Carolina 28257. Mr. Rohrer is manager of the Advanced Technology Department at the Charlotte laboratory. The department is currently engaged in projects for the banking industry. His previous experience includes the development of various recognition systems for IBM document processors as well as the advanced development of image systems. He also worked on the design and development of signal processing algorithms and control systems for a large variety of electromechanical devices. Mr. Rohrer received a B.S. in electrical engineering from North Carolina State University in 1967 and an M.S. in electrical engineering from Syracuse University in 1970. He is a member of Eta Kappa Nu, Phi Kappa Phi, and the Institute of Electrical and Electronics Engineers.

    James M. White IBM Information Products Division, 1001 W. T. Harris Boulevard, Charlotte, North Carolina 28257. Dr. White is a member of the Advanced Technology Department in the Charlotte laboratory, where he is involved in the capture and use of conventional and electronic images. He was graduated with a B.S. in physics from the Georgia Institute of Technology in 1967, obtained an M.S. in applied physics from Stanford University in 1969, spent two years at the U.S. Army electronics laboratory in Fort Monmouth, New Jersey, and then returned to Stanford and completed his Ph.D. thesis in 1973. He started his professional career at the IBM Thomas J. Watson Research Center, Yorktown Heights, New York, where he spent two years working on integrated optics, then made major contributions in the design and testing of a CCD/photodiode scanner array, a document capture system, a gray-scale image processing algorithm, and the application of new printing technologies for images. In 1978 Dr. White transferred to the Charlotte location and is currently working on projects related to check reader/sorters. Dr. White is a member of Tau Beta Pi and Sigma Pi Sigma, and a senior member of the Institute of Electrical and Electronics Engineers.
