00201844

Character Segmentation Techniques for Handwritten Text - A Survey

Christopher E. Dunn and P. S. P. Wang
College of Computer Science, Northeastern University, Boston, MA 02166 USA

0-8186-2915-0/92 $3.00 © 1992 IEEE

Abstract

This paper is a survey of techniques for segmenting images of handwritten text into individual characters. The topic is broken into two categories: segmentation and segmentation-recognition techniques. Several approaches to each are outlined, and each is analyzed for its relevance to printed, cursive, on-line and off-line input data.

1 Introduction

In character recognition, the process of segmenting the data has become more important as recognition techniques have improved. The unconstrained nature of handwritten text has become the next hurdle to overcome [14,16]. Segmentation is needed because handwritten characters frequently interfere with one another. Common ways in which characters can interfere include overlapping, touching, connected, and intersecting pairs (Figure 1) [8]. An additional problem with printed text is the occurrence of broken characters: multi-stroke characters such as "5" and "t", as well as characters that are broken by definition, such as "i" and "j".

Figure 1: Overlapping, touching, connected, and intersecting character pairs.

Cursive writing is by definition a connected sequence of characters, hence every character must be segmented at one or more points. In addition to the interference problems above, the representation of cursive text is inherently ambiguous. For example, a "u" followed by an "i" is represented in much the same way as a "w". Compounding this ambiguity is the imprecision with which writers may form letters such as "m" and "n": when the tops are concave as opposed to convex, they appear to be "w" and "u" respectively.

Another distinction to make in handwritten data is whether the input has been acquired by on-line or off-line techniques. On-line data is acquired by any device that uniquely orders the stroke information. This makes segmentation easier, since characters are usually written in a sequential order of strokes; the exceptions are the cross on a "t" and the dot on an "i", which tend to come at the end of a word in cursive writing. Off-line data, by contrast, is typically scanned from a previously written text image. If the actual stroke sequence is needed, other techniques must be used to infer it.

Related to the difficulty of separating characters is the problem of eliminating connecting strokes and tails that are extraneous to the characters themselves. Some segmentation techniques attempt to identify these ligatures as separate entities rather than as part of the characters. Finally, noise introduced by preprocessing techniques such as thresholding and thinning of the input raster image must be eliminated.

2 Straight Segmentation

Straight segmentation techniques are those which can be used as an individual component in a text analysis process, as opposed to integrated segmentation-recognition techniques, which by design depend on the recognition process. This type of segmentation is usually designed with rules that attempt to identify all of the character segmentation points and only those points. Any technique of this sort may be integrated with the recognition process as a verification of its success, but its usage does not depend on it.

2.1 Region Finding

One straightforward approach to the segmentation of printed characters is to use a series of region finding, region grouping and region splitting algorithms. Region finding is a simple technique that identifies all disjoint regions in the image. We will assume that all the images discussed are unthinned, binary images unless otherwise stated. The pixels are originally labeled ON/OFF, where ON signifies the data areas. To find regions, the image is examined pixel by pixel until an ON value is found. Once found, it is labeled with a new region number, and its neighbors are searched for additional ON values. If a neighbor is ON, it is given the same label, and the search proceeds to its neighbors. Thus the search proceeds recursively on the neighbors of ON pixels until no further ON neighbors are found. The algorithm then returns to its scan of the entire input image. The result is that all disjoint regions are identified, and all pixels in a given region carry that region's unique number. Region finding is sufficient to segment characters which are overlapping [5,9]. If the size of each region is calculated, then very small regions, which are noise, may be eliminated at this point.
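The region finding procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the image is a list of rows of 0/1 values, neighbors are taken to be 8-connected (the text does not say which neighborhood is used), and the recursion is replaced by an explicit stack so that large regions do not exhaust the call stack. The names `label_regions` and `drop_small_regions` are hypothetical.

```python
def label_regions(image):
    """Label 8-connected ON regions of a binary image (list of rows of 0/1).

    Returns (labels, sizes): a grid of region numbers (0 = OFF) and a
    dict mapping each region number to its pixel count.
    """
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    sizes = {}
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if not image[r][c] or labels[r][c]:
                continue  # OFF pixel, or already part of a labeled region
            next_label += 1
            labels[r][c] = next_label
            stack = [(r, c)]  # explicit stack instead of recursion
            size = 0
            while stack:
                y, x = stack.pop()
                size += 1
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = next_label
                            stack.append((ny, nx))
            sizes[next_label] = size
    return labels, sizes


def drop_small_regions(labels, sizes, min_size):
    """Erase regions smaller than min_size pixels (assumed to be noise)."""
    return [[lab if lab and sizes[lab] >= min_size else 0 for lab in row]
            for row in labels]


image = [[1, 1, 0, 0, 0],
         [1, 0, 0, 0, 1],
         [0, 0, 1, 0, 1]]
labels, sizes = label_regions(image)
cleaned = drop_small_regions(labels, sizes, min_size=2)
```

In this small example the isolated pixel in the bottom row forms its own region and is removed by the size filter, while the other two regions keep their labels.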

2.2 Grouping Regions

In order to deal with broken characters or those which have separated parts, grouping procedures are applied. One simple grouping method is to calculate the smallest bounding box that completely encloses each region. If, for any two regions, the bounding box of one region completely encloses the other region, then the enclosed region is relabeled to the value of the enclosing region. Thus the resulting region is composed of two disjoint sub-regions [7]. This is helpful for connecting regions that have been separated due to noise induced by thresholding procedures, which transform grey level images to binary ones. An enhancement to this method is to allow a given percentage of the enclosed region to lie outside the bounding box of the enclosing region.
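A minimal sketch of enclosure grouping follows, assuming each region is given as a list of (row, column) pixels keyed by its label. The parameter `min_fraction` is a hypothetical name for the enhancement described above: 1.0 demands complete enclosure, lower values tolerate part of the region lying outside the box. To keep the relabeling well defined, this sketch only merges a region into one with a strictly larger bounding box, which is an assumption of the example and not a rule taken from the surveyed work.

```python
def bounding_box(pixels):
    """Smallest (top, left, bottom, right) box enclosing the pixels."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    return min(ys), min(xs), max(ys), max(xs)


def box_area(box):
    top, left, bottom, right = box
    return (bottom - top + 1) * (right - left + 1)


def fraction_inside(pixels, box):
    """Fraction of the pixels that fall inside the box."""
    top, left, bottom, right = box
    inside = sum(1 for y, x in pixels
                 if top <= y <= bottom and left <= x <= right)
    return inside / len(pixels)


def group_enclosed(regions, min_fraction=1.0):
    """Map each region label to the label of the region that absorbs it."""
    boxes = {lab: bounding_box(px) for lab, px in regions.items()}
    parent = {}
    for inner, pixels in regions.items():
        hosts = [outer for outer in regions
                 if outer != inner
                 and box_area(boxes[outer]) > box_area(boxes[inner])
                 and fraction_inside(pixels, boxes[outer]) >= min_fraction]
        if hosts:
            # Prefer the largest enclosing box when several qualify.
            parent[inner] = max(hosts, key=lambda lab: box_area(boxes[lab]))

    def root(lab):
        while lab in parent:  # parents always have larger boxes: no cycles
            lab = parent[lab]
        return lab

    return {lab: root(lab) for lab in regions}


regions = {
    1: [(r, 0) for r in range(5)] + [(4, c) for c in range(1, 5)],  # L shape
    2: [(2, 2)],            # fully inside the box of region 1
    3: [(2, 4), (2, 5)],    # half inside the box of region 1
}
strict = group_enclosed(regions)
relaxed = group_enclosed(regions, min_fraction=0.5)
```

With strict enclosure only region 2 is absorbed; relaxing the threshold to one half also absorbs region 3.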

Besides simple enclosure grouping, other regions may be grouped by proximity operators. A proximity operator is a set of bounding boxes, where each box is empirically determined based on the relative position of individual pen strokes in a character class. For example, the character "5" can be broken into two regions, where one is the top horizontal line and the second is the rest of the character. These two regions correspond to the two strokes of a pen which are typically made when constructing the character. The character "5" is frequently constructed such that the two strokes do not touch. A proximity operator for the character class "5" would be a set of two bounding boxes such that, when overlaid on the two regions of a broken "5", the regions would be enclosed by the corresponding bounding boxes. Proximity operators can be constructed by overlaying several test images of a character class, then computing the bounding box of the composite image of each sub-region. Proximity operators can be matched by aligning the input regions and the proximity operator via the centroids of the two sets. As with the simpler grouping procedure, the matching may be done on the basis of a threshold corresponding to the percentage of enclosed area for each region and bounding box pair [5].
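Matching a proximity operator can be illustrated as follows. The sketch assumes that the operator's boxes are stored relative to the operator's own centroid, that the candidate regions are supplied in the same order as the boxes, and that `min_fraction` plays the role of the enclosure threshold; all of these, along with the function names and the box coordinates, are choices made for the example and are not taken from [5].

```python
def centroid(pixel_sets):
    """Centroid (row, col) of all pixels in a collection of regions."""
    points = [p for pixels in pixel_sets for p in pixels]
    n = len(points)
    return (sum(y for y, _ in points) / n, sum(x for _, x in points) / n)


def matches_operator(regions, operator, min_fraction=0.9):
    """Check whether each region fits its corresponding operator box.

    regions:  list of pixel lists [(row, col), ...]
    operator: list of boxes (top, left, bottom, right), expressed
              relative to the operator centroid at (0, 0)
    """
    if len(regions) != len(operator):
        return False
    cy, cx = centroid(regions)  # align input and operator at the centroid
    for pixels, (top, left, bottom, right) in zip(regions, operator):
        inside = sum(1 for y, x in pixels
                     if top <= y - cy <= bottom and left <= x - cx <= right)
        if inside / len(pixels) < min_fraction:
            return False
    return True


# A broken "5": a detached top bar and the remaining body stroke.
top_bar = [(10, 22), (10, 23), (10, 24)]
body = [(12, 20), (13, 20), (14, 21), (15, 21)]
five_operator = [(-3, 0, -1, 3), (-1, -2, 4, 0)]  # hypothetical boxes

is_five = matches_operator([top_bar, body], five_operator)
swapped = matches_operator([body, top_bar], five_operator)
```

Here `is_five` is true because both strokes fall inside their boxes once the centroids are aligned, and `swapped` is false because the strokes are paired with the wrong boxes.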

2.3 Splitting Regions

Touching characters mean that some regions will require splitting. This is typically done by first identifying the maxima and minima of a contour along the bottom and top of the region respectively. The region is then split by describing a path from a minimum to a maximum, where the minimum and maximum are aligned vertically within some threshold distance. If the two points are connected by a single solid area, then the splitting path can be made by bisecting this area. When the minima and maxima do not align properly, one must be chosen and a cut is made vertically through the adjacent solid area [2,5]. An estimate of the average size of a single character can be used to identify which regions are candidates for splitting.

An alternative method for splitting regions has been tried on connected pairs of handwritten digits. This method first bisects the region with a straight horizontal line. Then the points at which the line crosses the region's data are calculated. If an even number are found, then the split is started at a point midway between the two middle crossing points. The split follows the middle of the non-data area in both an upward and a downward direction. If a data region is encountered during the split, it is cut vertically until a non-data area is again found. The split finishes at the top and bottom of the region's bounding box. With digits there will be only one or two crossings of the horizontal line, so the starting point will be accurate if there is an even number of crossings and it is known that the region comprises two characters. For an odd number of crossings, the roughness of the left and right profiles of the region is inspected. A rough profile indicates the number of crossings that the digit may have. This is used to decide the starting point for an odd number of crossings. This splitting technique can be considered a context directed technique, since it relies on specific information about the character domain, such as the roughness profiles and the number of vertical lines in digits [12,13].
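The choice of starting point for an even number of crossings can be sketched as below. The example treats each run of consecutive ON pixels along the bisecting row as one crossing, which is an interpretation of the description above; it covers only the starting point, not the path following or the profile analysis used for an odd number of crossings. The function names are hypothetical.

```python
def on_runs(row):
    """Return (start, end) column pairs for each run of ON pixels in a row."""
    runs = []
    start = None
    for x, value in enumerate(row):
        if value and start is None:
            start = x
        elif not value and start is not None:
            runs.append((start, x - 1))
            start = None
    if start is not None:
        runs.append((start, len(row) - 1))
    return runs


def split_start_column(region):
    """Column at which to start the split, or None if it cannot be decided.

    region is a binary image (list of rows) cropped to the bounding box.
    Only an even, non-zero number of crossings is handled here.
    """
    middle_row = region[len(region) // 2]  # the horizontal bisecting line
    runs = on_runs(middle_row)
    if not runs or len(runs) % 2:
        return None
    left = runs[len(runs) // 2 - 1]
    right = runs[len(runs) // 2]
    # Midway between the end of the left crossing and the start of the right.
    return (left[1] + right[0]) // 2


two_digits = [
    [0, 1, 1, 0, 0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0, 0, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 1, 0],
]
start = split_start_column(two_digits[:2] + [two_digits[1]])
```

For the two-crossing middle row used here, the split starts in the gap between the two strokes.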

2.4 Tail Removal

Extraneous stroke tails may need to be eliminated from regions, since they may produce errors in the subsequent recognition process. This can be done by calculating the vertical density (vertical histogram) of pixels over the input region. Tails will be indicated by low density sections connected to higher density sections. Heuristics based upon the maximum likely tail length for a given character set are then devised to truncate the identified tails [5].
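One possible form of such a heuristic is sketched below. It assumes that tails appear at the left and right ends of the region, that a column counts as low density when it holds at most `low` ON pixels, and that no more than `max_tail` columns are ever cut from either side. Both parameters and the function names are hypothetical; the actual heuristics in [5] are not reproduced here.

```python
def column_density(region):
    """Vertical histogram: number of ON pixels in each column."""
    return [sum(column) for column in zip(*region)]


def trim_tails(region, low=1, max_tail=None):
    """Cut low-density columns from the left and right ends of a region."""
    density = column_density(region)
    n = len(density)
    left = 0
    while left < n and density[left] <= low:
        left += 1
    right = n
    while right > left and density[right - 1] <= low:
        right -= 1
    if max_tail is not None:
        # Never remove more than the maximum likely tail length per side.
        left = min(left, max_tail)
        right = max(right, n - max_tail)
    return [row[left:right] for row in region]


region = [
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 1, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1, 1],  # long baseline stroke with tails on both sides
]
trimmed = trim_tails(region)
limited = trim_tails(region, max_tail=1)
```

Without a limit both two-column tails are removed; with `max_tail=1` only one column is cut from each side.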


2.5 Presegmentation

An alternative to splitting regions as described above, where the rules attempt to find all of the segmentation points and only those points, is to use a simpler algorithm for finding all "possible" segmentation points, with the intention of identifying which are the actual segmentation points at a later time. This is referred to as presegmentation, and it offers two advantages. One is that most presegmentation algorithms overestimate the possible segmentation points, so that points tend not to be missed as they are with other direct segmentation methods. The other is that presegmentation leads to schemes where evidence for the confidence value of each segmentation hypothesis (presegmentation point) is successively gained through later steps in the segmentation process. The result is a more robust segmentation process, since several rules may act in concert to increase or decrease the confidence of a segmentation point, rather than a single rule or algorithm as was previously described.

Presegmentation may be accomplished in several ways. With printed characters there is a high probability that connected characters will either intersect one another with a four-way or three-way junction, or touch each other, with the strokes forming a high degree of curvature where they meet. Presegmentation will then consist of identifying all such junctions and points of high curvature. Notice that the character "4" has both these types of features and will be presegmented, with the understanding that later rules must throw out these segmentation hypotheses. In the case of junction points, standard algorithms for thinning the data have the byproduct that junction points are identified [9]. Points of high curvature are found by calculating the derivative of the slope of the input lines, then finding the minima or maxima outside of an empirically determined threshold value [4,10,11].
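The curvature test can be sketched for a stroke given as a polyline. In this illustration the slope is represented by the direction angle of each segment, its derivative by the turning angle between consecutive segments, and a vertex is reported when the magnitude of the turning angle exceeds a threshold and is a local extremum. The polyline representation, the threshold in radians, and the function names are assumptions of the example.

```python
import math


def turning_angles(points):
    """Signed change of direction at each interior vertex of a polyline.

    points is a list of (x, y); the result has len(points) - 2 entries,
    where entry i belongs to points[i + 1].
    """
    directions = [math.atan2(y1 - y0, x1 - x0)
                  for (x0, y0), (x1, y1) in zip(points, points[1:])]
    turns = []
    for a0, a1 in zip(directions, directions[1:]):
        delta = (a1 - a0 + math.pi) % (2 * math.pi) - math.pi  # wrap to [-pi, pi)
        turns.append(delta)
    return turns


def high_curvature_points(points, threshold):
    """Indices of vertices whose turning angle is a local extremum
    with magnitude above the threshold (radians)."""
    turns = turning_angles(points)
    found = []
    for i, turn in enumerate(turns):
        magnitude = abs(turn)
        before = abs(turns[i - 1]) if i > 0 else 0.0
        after = abs(turns[i + 1]) if i + 1 < len(turns) else 0.0
        if magnitude > threshold and magnitude >= before and magnitude >= after:
            found.append(i + 1)
    return found


corner = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]   # right-angle bend
straight = [(0, 0), (1, 0), (2, 0), (3, 0)]
corner_points = high_curvature_points(corner, threshold=1.0)
straight_points = high_curvature_points(straight, threshold=1.0)
```

The right-angle bend is reported as a presegmentation candidate at its corner vertex, while the straight stroke yields none.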

2.6 Rule Based Methods

One way to verify presegmentation points is to develop heuristics based on the structure of characters in the domain of interest and the possible ways they may interfere. This has been done for digits [8]. For example, if two three-way junctions are connected vertically by a single line, then the line is said to be common to two digits. On the other hand, if the line connecting the three-way junctions is horizontal and near the top of the image, then it is said to be a connecting ligature. However, these rule based heuristics may be difficult or impossible to design for very large sets of character classes.

3 Segmentation-Recognition Techniques

Unlike straight segmentation techniques, segmentation-recognition techniques rely on character recognition methods to alter the confidence values of segmentation hypotheses. Cursive text requires this type of segmentation due to the inherent ambiguity found when letters are juxtaposed. This ambiguity derives primarily from the fact that script letters are connected and that, in English script, many similar strokes are shared among characters. This makes cursive text unsuitable for straight segmentation techniques. The variants of segmentation-recognition share two common properties. First, the input word is presegmented into segmentation hypotheses such that it is highly probable that all the true segmentation points between characters are accounted for. Second, subsets of segmentation hypotheses are searched to find the optimal set of segmentation points, based on stroke or character recognition information.

3.1 Elastic Matching

One of the first types of segmentation-recognition strategies to be investigated was elastic matching of on-line script [15]. In this process text data is gathered via an on-line device and encoded as a series of discrete vectors of equal length. The ends of the vectors are the presegmentation points. Elastic matching compares the vector sequence against the vector information of character prototypes. A distance metric is formulated to gauge the difference between the prototype characters and the input vector sequence. This metric takes into account the difference in relative position and slope of the vectors, and is summed over all matched vector pairs. It is termed elastic matching since the prototype vectors are not required to be mapped in a one-to-one relationship with the input vectors. In the case of an input vector sequence corresponding to one character, the elastic matching process is a search to find the optimum mapping between the input and each character prototype based on the distance metric. To optimize the search, a threshold is set such that the distance of every partial mapping during the search must stay below the threshold, or else the mapping is pruned from the search process. In general, since a mapping between a character prototype and the input text may use only a portion of the input, the remaining input may be used to match against another character prototype. In this way the input is mapped against all possible combinations of prototype characters, and with all possible segmentation points between the input vectors. Further optimization can be done by designing heuristics based on the average size of an input character to reduce the number of mappings searched.
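The idea can be illustrated with a small dynamic programming sketch. It is not the method of [15]: the alignment below is a generic dynamic-time-warping style recurrence swapped in for the pruned search described above, each vector is assumed to be encoded as a (slope angle, relative height) pair, and the cost function, the weight, and the `max_len` limit on vectors per character are all hypothetical.

```python
import math


def vector_cost(a, b, position_weight=1.0):
    """Difference in slope angle plus weighted difference in height."""
    d_angle = abs(a[0] - b[0])
    d_angle = min(d_angle, 2 * math.pi - d_angle)
    return d_angle + position_weight * abs(a[1] - b[1])


def elastic_distance(seq, proto):
    """Cost of the best many-to-many alignment of two vector sequences."""
    inf = float("inf")
    n, m = len(seq), len(proto)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_previous = min(d[i - 1][j - 1], d[i - 1][j], d[i][j - 1])
            d[i][j] = vector_cost(seq[i - 1], proto[j - 1]) + best_previous
    return d[n][m]


def segment_and_recognize(seq, prototypes, max_len):
    """Try every segmentation of seq into pieces of at most max_len vectors
    and every prototype for each piece; return (cost, [(label, start, end)])."""
    inf = float("inf")
    n = len(seq)
    best = [(inf, [])] * (n + 1)
    best[0] = (0.0, [])
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            base_cost, base_path = best[start]
            if base_cost == inf:
                continue
            piece = seq[start:end]
            for label, proto in prototypes.items():
                cost = base_cost + elastic_distance(piece, proto)
                if cost < best[end][0]:
                    best[end] = (cost, base_path + [(label, start, end)])
    return best[n]


prototypes = {"a": [(0.0, 0.0), (1.0, 0.5)], "b": [(-1.0, 1.0)]}
word = [(0.0, 0.0), (1.0, 0.5), (-1.0, 1.0)]
total, pieces = segment_and_recognize(word, prototypes, max_len=2)
```

In this toy case the input is an exact concatenation of the two prototypes, so the best segmentation has zero cost and places the boundary after the second vector. Limiting pieces to `max_len` vectors corresponds to the average character size heuristic mentioned above.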

3.2 Presegmentation at Higher Structural Levels

Most other techniques for segmentation-recognition attempt to presegment the input at a higher structural level. One such technique for cursive writing uses the minima along the input contour. First the input is slant corrected so that strokes are aligned vertically. Then two horizontal reference lines are found such that they align with the top and bottom of the small lowercase letters (e.g. a, c, e, i, m, n, o, r, s, u, v, w, x). Next, the local minima of the contour which lie inside these reference lines are found. This allows only the connecting strokes to be used as minima, since these always fall within the region defined by the reference lines. Of the minima found, those which have an additional stroke directly above are thrown out. This eliminates presegmentation in the middle of some letters (e.g. a, c, o, s, x). The slant correction done earlier makes this filtering easier, since only a vertical search from the minimum point is now required. Other points may be thrown out based on their proximity to each other, thus allowing only one presegmentation point per connecting ligature [1,4].

After presegmentation, the text input is represented as a sequence of text sections. Letters are then hypothesized over all subsequences of text sections via some recognition procedure. Then the possible letter sequences are searched as above, with a best-first strategy and pruning of letter sequences whose cost value is above an empirically determined threshold. The above method has been used for off-line cursive script.
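A best-first search over letter hypotheses can be sketched with a priority queue. The example assumes that a recognizer has already produced, for each span of consecutive text sections, a list of (letter, cost) hypotheses with non-negative costs; the data layout, the function name, and the cost values are invented for illustration.

```python
import heapq


def best_first_words(n_sections, hypotheses, max_cost):
    """Enumerate letter sequences covering all sections, cheapest first.

    hypotheses maps (start, end) section spans to lists of (letter, cost).
    Partial sequences whose cost exceeds max_cost are pruned.
    """
    heap = [(0.0, 0, "")]  # (cost so far, next section, letters so far)
    results = []
    while heap:
        cost, position, word = heapq.heappop(heap)
        if position == n_sections:
            results.append((cost, word))
            continue
        for (start, end), letters in hypotheses.items():
            if start != position:
                continue
            for letter, letter_cost in letters:
                total = cost + letter_cost
                if total <= max_cost:  # prune expensive sequences
                    heapq.heappush(heap, (total, end, word + letter))
    return results


# Three text sections; the first two could be "c" + "l" or a single "d".
hypotheses = {
    (0, 1): [("c", 0.25)],
    (1, 2): [("l", 0.5)],
    (0, 2): [("d", 0.5)],
    (2, 3): [("o", 0.125)],
    (1, 3): [("b", 2.0)],
}
words = best_first_words(3, hypotheses, max_cost=1.0)
```

Because the cheapest partial sequence is always expanded first, complete sequences are produced in order of increasing cost, and the expensive "b" hypothesis never enters the queue.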

Recognition of presegmented text is sometimes the most computationally intensive part of the overall text recognition process, while searching for the appropriate combination of recognized segments is less expensive. In this case, reducing the complexity of segment recognition will have the greatest effect on reducing the overall complexity. This is one motivation for presegmenting at the stroke level. This has been done with on-line printed text [6], as well as with off-line cursive text [3]. On-line printed text is naturally presegmented by the input process, which includes pen-up and pen-down information. For on-line script, local minima and maxima and junction points can be used. The segments of curvature between presegmentation points are matched to a set of basic stroke types via an alignment procedure. The alignment process yields a "goodness of fit" value between the prototype stroke and the curve segment. A search is then performed to find the optimum combination of strokes such that they form a sequence of characters.

4 Conclusions

Straight segmentation is the technique of forming rules to identify members of a character set without identifying their specific classification. It is useful for printed character sets but will not work for cursive text. The primary advantage of straight segmentation is that it greatly reduces the complexity of the search for a word hypothesis, since the character boundaries are predetermined. However, this type of segmentation is subject to error even in the case of printed letters. Segmentation-recognition strategies are more expensive due to the increased complexity of the search for optimum word hypotheses. However, the inherent ambiguity of cursive text requires this type of segmentation.

References

[1] R. M. Bozinovic and S. N. Srihari, Off-Line Cursive Script Word Recognition, IEEE PAMI, 11(1):68-83, Jan. 1989.

[2] E. Cohen, J. J. Hull and S. N. Srihari, Reading and Understanding Handwritten Addresses, Proc. USPS Adv. Tech. Conf., pp. 822-836, 1990.

[3] S. Edelman, T. Flash and S. Ullman, Reading Cursive Handwriting by Alignment of Letter Prototypes, International Journal of Computer Vision, 5(3):303-331, 1990.

[4] J. T. Favata and S. N. Srihari, Recognition of Handwritten Words for Address Reading, Proc. USPS Adv. Tech. Conf., pp. 191-205, 1990.

[5] R. Fenrich and S. Krishnamoorthy, Segmenting Diverse Quality Handwritten Digit Strings in Near Real-Time, Proc. USPS Adv. Tech. Conf., pp. 523-537, 1990.

[6] T. Fujisaki, T. E. Chefalas, J. Kim, C. C. Tappert and C. G. Wolf, Online Run-on Character Recognizer: Design and Performance, IBM Research Report, 1990.

[7] E. Mandler, Advanced Preprocessing Technique for On-Line Recognition of Handprinted Symbols, Computer Recognition and Human Production of Handwriting, Eds. R. Plamondon, C. Y. Suen and M. L. Simner, World Scientific, pp. 19-36, 1989.

[8] B. T. Mitchell and A. M. Gillies, A Model-Based Computer Vision System for Recognizing Handwritten ZIP Codes, Machine Vision and Applications, 2:231-243, 1989.

[9] T. Pavlidis, Algorithms for Graphics and Image Processing, Computer Science Press, Rockville MD, pp. 129-214, 1982.

[10] R. Plamondon and P. Yergeau, A System for the Analysis and Synthesis of Handwriting, Proc. Int. Workshop on Frontiers in Handwriting Recognition, Chateau de Bonas, France, pp. 167-179, 1991.

[11] L. R. B. Schomaker and H. Teulings, A Handwriting Recognition System Based on the Properties and Architectures of the Human Motor System, Proc. Int. Workshop on Frontiers in Handwriting Recognition, Chateau de Bonas, France, pp. 195-211, 1991.

[12] M. Shridhar and A. Badreldin, Recognition of Isolated and Simply Connected Handwritten Numerals, Pattern Recognition, 19(1):1-12, 1986.

[13] M. Shridhar and A. Badreldin, Context-Directed Segmentation Algorithm for Handwritten Numerical Strings, Image and Vision Computing, 5(1):3-9, Feb. 1987.

[14] C. Y. Suen, Ed., Frontiers in Handwriting Recognition, Concordia Univ. Press, Montreal, 1990.

[15] C. C. Tappert, Cursive Script Recognition by Elastic Matching, IBM J. Res. Development, 26(6):765-771, Nov. 1982.

[16] P. S. P. Wang (Ed.), Character and Handwriting Recognition - Expanding Frontiers, World Scientific, Teaneck, NJ, 1991.