Handwriting Recognition CPSC 4600 @ UTC/CSE. Handprint Recognition aims to design systems which are...
- Slide 1
Handwriting Recognition CPSC 4600 @ UTC/CSE

Slide 2
Handprint Recognition
Handprint recognition aims to design systems that are able to recognize handwriting of natural language. Methods and recognition rates depend on the level of constraints on the handwriting. The constraints are mainly characterized by:
- the types of handwriting
- the number of writers (scriptors)
- the size of the vocabulary
- the spatial layout

Slide 3
Methods and Strategies
Recognition strategies depend heavily on the nature of the data to be recognized. In the cursive case, the problem is made complex by the fact that the writing is fundamentally ambiguous: the letters in a word are generally linked together, poorly written, and may even be missing. In contrast, hand-printed word recognition is closer to printed word recognition, since the individual letters composing the word are usually much easier to isolate and identify.

Slide 4
Character Recognition
Character recognition techniques can be classified according to two criteria:
- the way preprocessing is performed on the data
- the type of decision algorithm
Preprocessing techniques include:
- global transforms (correlation, Fourier descriptors, etc.)
- local comparison (local density, intersections with straight lines, variable masks, etc.)
- geometrical or topological characteristics (strokes, loops, openings, diacritical marks, skeleton, etc.)
Decision methods include:
- various statistical methods
- neural networks
- structural matching (on trees, chains, etc.)
- stochastic processing (Markov chains, etc.)

Slide 5
Two main types of strategies have been applied to this problem:
- the holistic approach: recognition is performed globally on the whole representation of words, and there is no attempt to identify characters individually.
The main advantage of holistic methods is that they avoid word segmentation.
- the analytical approach: deals with several levels of representation corresponding to increasing levels of abstraction (usually the feature level, the grapheme or pseudo-letter level, and the word level). Words are not considered as a whole, but as sequences of smaller units that must be easily related to characters, so that recognition is independent of a specific vocabulary.
Word Recognition

Slide 6
Form-based Handprint Recognition
In 1994, the National Institute of Standards and Technology (NIST) released to the public a standard reference form-based handprint recognition system for evaluating optical character recognition (OCR).
http://www.utc.edu/Faculty/Li-Yang/CPSC415/4-Handwriting/hsfsys2.pdf

Slide 7
Form-based Handprint Recognition
The NIST system is designed to read the hand-printed characters written on Handwriting Sample Forms (HSF). The form is designed to collect a large sample of handwriting to support handprint recognition research. NIST Special Database 19 (SD19) contains 3669 completed forms, each filled in by a unique writer and scanned as a binary image at 11.8 pixels per millimeter. The dataset also contains over 800,000 segmented and labeled character images from these forms.

Slide 8
A blank form is provided that can be printed, filled in, scanned, and recognized.

Slide 9
System Components
- Batch Initialization
- Load Form Image
- Register Form Image
- Remove Form Box
- Isolate Lines of Handprint
- Segment Text Lines
- Normalize Characters
- Extract Feature Vectors
- Classify Characters
- Spell-correct Text Lines

Slide 10
Batch Initialization
Load pre-computed items from training:
- a list of image files to be processed
- coordinate locations of dominant form structures used for form registration
- spatial template containing the coordinate location
- basis functions used for feature extraction
- neural network weights for classification
- dictionaries for spelling correction
There are four types of fields: numeric, lowercase, uppercase, and preamble paragraph. Each type of field requires a separate set of basis functions and neural network weights.

Slide 11
Register Form Image
To reliably isolate the handprint on a form, form registration automatically estimates the amount of rotation and translation in the image. Because most forms contain a fixed configuration of vertical and horizontal lines, parallel rays are traced across the image, accumulating the number of black pixels along each ray. A range of ray angles is sampled, and the angle producing the maximum response is used to estimate the rotational skew.

Slide 12
A prototypical form is scanned, its rotational distortion is automatically measured and removed, and the positions of the detected dominant lines are stored for future registration. The image on this slide is the result of logically ORing corresponding pixels across a set of 500 registered images.

Slide 13
Remove Form Box
Given a field sub-image, black pixels corresponding to the handwriting must be separated from the black pixels corresponding to the form. The box must be located within the field sub-image and its sides removed intelligently so as to preserve overlapping characters. The sides of the box are detected using a run-based technique that tracks the longest runs across the sub-image.
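The slides do not spell out the run-based technique, so the following is only a minimal sketch of one plausible reading: for each row of a binary field sub-image, find the longest horizontal run of black pixels and flag rows whose run spans most of the width as candidate box sides. The function names, the `min_fraction` threshold, and the toy `field` image are all assumptions made for illustration; they are not taken from the NIST system.

```python
def longest_run(row):
    """Return (start, length) of the longest run of black (1) pixels in a row."""
    best_start, best_len = 0, 0
    start = None  # start index of the run currently being tracked
    for x, pixel in enumerate(row):
        if pixel:
            if start is None:
                start = x
            length = x - start + 1
            if length > best_len:
                best_start, best_len = start, length
        else:
            start = None
    return best_start, best_len


def candidate_box_rows(image, min_fraction=0.8):
    """Rows whose longest black run covers at least min_fraction of the width.

    min_fraction is a hypothetical threshold chosen for this example.
    """
    width = len(image[0])
    return [y for y, row in enumerate(image)
            if longest_run(row)[1] >= min_fraction * width]


# Toy field: a box (rows 1 and 5) with a handwritten '+' overlapping its sides.
field = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 0, 0, 1, 0],
    [0, 1, 0, 1, 1, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0, 0, 0, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]

print(candidate_box_rows(field))
```

The same idea applied to columns would give candidate vertical sides. Deciding which pixels on a detected side belong to an overlapping stroke is a separate step that this sketch does not attempt.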
Overlapping character strokes are identified using spatial cues, and only pixels that are distinctly part of the form's box are removed.

Slide 14
Remove Form Box

Slide 15
Isolate Lines of Handprint
A connected component is defined as the largest set of black pixels where each pixel is a direct neighbor of at least one other black pixel in the component. For multiple-line responses, no lines are provided within the paragraph box to guide the writer, so a bottom-up approach is used to isolate the lines of handprint within a paragraph. Each component is represented by its geometric center. To reconstruct the handprinted lines of text, a nearest-neighbor search is performed left-to-right and top-to-bottom through the system of 2-dimensional points.

Slide 16
Isolate Lines of Handprint

Slide 17
Segment Text Lines
Connected components are used as first-order approximations to single and complete characters. They frequently represent single characters and are computed very quickly. Errors occur when characters touch one another and when characters are written with disconnected strokes (which occurs naturally with dotted letters).

Slide 18
A simple adaptive model of writing style
In a simple adaptive model of writing style, fragmented characters are reconstructed, multiple characters are split, and noise components are identified and discarded.
M. D. Garris, Component-Based Handprint Segmentation Using Adaptive Writing Style Model, NIST Internal Report 5843, June 1996.

Slide 19
Model Writing Style
To adapt to variations in handwriting style, one needs to be able to statistically capture how much black ink (or how many pixels) in an image is likely to constitute a single character. Two simple statistical features are measured from each isolated image of handwriting:
- The estimated stroke width (esw) approximates the width of the lines comprising the characters.
- The estimated character height (ech) is the maximum height of all the connected components in the image.
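As a rough illustration of the connected-component definition and of the ech feature, the sketch below labels components with a breadth-first flood fill and takes the tallest one. It assumes 8-connectivity for "direct neighbor" (the slides do not say whether diagonals count), and the function names and toy image are hypothetical. Estimating esw is not shown, since the slides do not describe how it is measured.

```python
from collections import deque


def connected_components(image):
    """Return a list of components, each a list of (row, col) black pixels.

    Assumes 8-connectivity: diagonal neighbors belong to the same component.
    """
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    components = []
    for y in range(height):
        for x in range(width):
            if not image[y][x] or seen[y][x]:
                continue
            # Flood fill outward from this unvisited black pixel.
            seen[y][x] = True
            queue = deque([(y, x)])
            pixels = []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def component_height(pixels):
    """Bounding-box height of a component, in pixels."""
    rows = [y for y, _ in pixels]
    return max(rows) - min(rows) + 1


def estimated_character_height(image):
    """ech: the maximum height over all connected components in the image."""
    return max(component_height(c) for c in connected_components(image))


# Toy image: a vertical bar, a bar with a diagonal step, and an isolated dot.
image = [
    [1, 0, 0, 0, 1, 0],
    [1, 0, 0, 0, 1, 0],
    [1, 0, 0, 0, 0, 1],
    [0, 0, 1, 0, 0, 1],
]

print(len(connected_components(image)), estimated_character_height(image))
```

With 4-connectivity instead, the diagonal step in the second stroke would split it into two components, which is one reason the choice of neighborhood matters for segmentation.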
Standard stroke pixel (ssp) = square of one stroke width
Standard stroke area (ssa) = estimated stroke width * estimated character height

Slide 20
if (component.area < (0.5 * ssa)) then Noise
where component.area (structure member a of component c) is the pixel area of the component and ssa is the pixel area of a standard stroke.
if ((component.width < (2 * esw)) && (component.height < (3 * esw))) then Dot
where component.width (structure member w of component c) is the pixel width of the component and component.height is its pixel height.

Slide 21
Characters that required the merging of connected components

Slide 22
Multiple Character Detection
Before one can split touching characters, one must be able to detect that multiple characters exist in a component image. A simple aspect ratio (ar) was tested:
ar = w / ech
where w is the width of the component and ech is the estimated character height for the field. The larger the width is relative to the height, the more likely the component contains multiple characters. A training set of single and touching character components was used to compute a range of aspect ratio samples, and a threshold was empirically derived.

Slide 23
Multiple Character Detection
Standard stroke count (ssc):
ssc = p / ssa
where p is the black pixel count of the component.

Slide 24
Vertically Straight Cut
An example of multiple characters
A component determined to contain multiple touching characters must be further analyzed to derive a strategy for splitting the characters.

Slide 25
Vertically Straight Cut
Perpendicular distances are computed from the left and right feature points to the detector line, and the larger of the two distances is stored along with the x-position of the vertical cut. By minimizing the maximum perpendicular distance across the range of cuts, the vertical cut is selected whose left and right pieces both contain maximal pixel data and both qualify as single characters.

Slide 26
Contoured Cut Path
A single straight cut does not always divide the component satisfactorily.
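The component tests from slides 19 to 23 can be written down directly as small functions. This is only a sketch of the formulas as stated on the slides. The `Component` class, the function names, and the sample values for esw and ech are assumptions for illustration. The empirically derived thresholds for ar and ssc are not given in the slides, so the sketch only computes the features and leaves thresholding to the caller.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """Hypothetical container for the measurements the slides refer to."""
    width: int   # pixel width of the component (w)
    height: int  # pixel height of the component
    area: int    # black pixel count of the component (a, also used as p)


def standard_stroke_area(esw, ech):
    """ssa = estimated stroke width * estimated character height."""
    return esw * ech


def is_noise(c, ssa):
    """Slide 20: a component smaller than half a standard stroke is noise."""
    return c.area < 0.5 * ssa


def is_dot(c, esw):
    """Slide 20: a component narrower than 2*esw and shorter than 3*esw is a dot."""
    return c.width < 2 * esw and c.height < 3 * esw


def aspect_ratio(c, ech):
    """Slide 22: ar = w / ech; larger values suggest touching characters."""
    return c.width / ech


def standard_stroke_count(c, ssa):
    """Slide 23: ssc = p / ssa, the number of standard strokes of ink present."""
    return c.area / ssa


# Assumed field statistics, chosen only to make the arithmetic easy to follow.
esw, ech = 3, 30
ssa = standard_stroke_area(esw, ech)

speck = Component(width=2, height=2, area=4)
wide = Component(width=45, height=28, area=360)

print(is_noise(speck, ssa), is_dot(speck, esw))
print(aspect_ratio(wide, ech), standard_stroke_count(wide, ssa))
```

Note that the small component above satisfies both the noise test and the dot test, so the order in which the two rules are applied matters. The slides do not state that order.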
When a straight cut is not enough, a more sophisticated non-straight path is required. Starting at the x-position of the optimal vertical cut, a search (or trace) is initiated from the top of the component downwards and from the bottom of the component upwards. The downward trace (the top-trace) behaves much like sand being dribbled down the side of a complex surface.

Slide 27
Segment Text Lines

Slide 28
Normalize Characters and Extract Feature Vectors
The segmented character images vary greatly in size, slant, and shape. Image normalization is performed to deal with the size and slant of writing, leaving the recognition process primarily the task of differentiating characters by variation in shape. The Karhunen-Loève (KL) transform is applied to these binary pixel vectors in order to reduce dimensionality, suppr