Handwriting recognition slides Boeing

Language Technologies Institute. Handwriting Recognition: A Project of the Boeing/Carnegie Mellon Aerospace Data Analytics Lab. Project members: Daniel Clothiaux, Vivian Robison, Tejashree Gharat, Vipul Mascarenhas. Project mentors: Dr. Ravi Starzl, Dr. Barnabas Poczos.

Transcript of Handwriting recognition slides Boeing

Page 1: Handwriting recognition slides Boeing

Language Technologies Institute

Handwriting Recognition

A Project of the Boeing/Carnegie Mellon Aerospace Data Analytics Lab

Project members:

Daniel Clothiaux

Vivian Robison

Tejashree Gharat

Vipul Mascarenhas

Project mentors:

Dr. Ravi Starzl

Dr. Barnabas Poczos

Page 2: Handwriting recognition slides Boeing

Contents At-a-Glance

• Task and Goals

• Approach

• Challenges

• Solutions

• Project Roadmap and Context

Page 3: Handwriting recognition slides Boeing

The Task

• Handwriting recognition (HWR) and transcription of airplane maintenance-related work/job/task cards and similar paper forms

Task / Goals Approach Challenges Solutions Roadmap / Context[ Forms OCR ]

Page 4: Handwriting recognition slides Boeing

The Goals

• Automatic form-type identification

• High-quality OCR / transcription of printed and handwritten characters

• Association of content with the proper data field

Page 5: Handwriting recognition slides Boeing

The Approach

Page 6: Handwriting recognition slides Boeing

The Approach

Page 7: Handwriting recognition slides Boeing

The Approach

Page 8: Handwriting recognition slides Boeing

IFE Turn Check Carried out saw DMC-R787-A 44-25-00-48A-300B-A ROV A1/ 01-Nov 2013. All Ops OK.Outfit Toolkit DACOSL 01 checked complete.

The Approach

IFE Turn Check Required. Outfit toolkit PACOSL 01 in use

Page 9: Handwriting recognition slides Boeing

The Approach

IFE Turn Check Carried out saw DMC-R787-A 44-25-00-48A-300B-A ROV A1/ 01-Nov 2013. All Ops OK.Outfit Toolkit DACOSL 01 checked complete.

IFE Turn Check Required. Outfit toolkit PACOSL 01 in use

Task Card Datastore

Subject

Action

Page 10: Handwriting recognition slides Boeing

The Challenges

Automatic form recognition and processing

• Form Identification

• Deskewing / Denoising

• Segmentation

Robust handwriting OCR

• Network Design

• DNN Overfitting

Page 11: Handwriting recognition slides Boeing

Form Processing

• Recognition by Convolutional Template Matching

• Minimizing L2 distance to template image with rotation and shearing

Transformations: shear x-axis, shear y-axis, clockwise rotation

Sum of absolute differences in pixel intensities

Evaluate at each pixel in search image

Page 12: Handwriting recognition slides Boeing

OCR

• Character recognition by Deep Neural Networks

*Example of LeNet Convolutional Neural Network

• Enough power for the task, but watch overfitting
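As a rough illustration of the LeNet example, the sketch below walks through the feature-map sizes of the classic LeNet-5 layout (32x32 input, 5x5 convolutions, 2x2 subsampling). The slide does not say which network the project actually used, so the layer list and the helper names `conv_out` and `lenet5_feature_maps` are assumptions for illustration only.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def lenet5_feature_maps(input_size=32):
    """Return (layer, channels, side) for the convolutional part of
    the classic LeNet-5 layout."""
    c1 = conv_out(input_size, 5)       # 5x5 convolution, 6 maps
    s2 = conv_out(c1, 2, stride=2)     # 2x2 subsampling
    c3 = conv_out(s2, 5)               # 5x5 convolution, 16 maps
    s4 = conv_out(c3, 2, stride=2)     # 2x2 subsampling
    return [("C1", 6, c1), ("S2", 6, s2), ("C3", 16, c3), ("S4", 16, s4)]


maps = lenet5_feature_maps()
for name, channels, side in maps:
    print(f"{name}: {channels} maps of {side}x{side}")

# Values fed into the fully connected layers after the last subsampling.
_, last_channels, last_side = maps[-1]
flattened = last_channels * last_side * last_side
print("flattened features:", flattened)
```

The small number of inputs to the fully connected layers is one reason a network of this size can be trained quickly, while still having enough capacity to overfit a limited handwriting data set.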

Page 13: Handwriting recognition slides Boeing

OCR

Control overfitting with organic data sets enhanced by generative writing engines

1. Boeing Data

2. NIST and public data

3. Font-based generation

4. RNN driven generation

Estimated ~2 billion examples

Page 14: Handwriting recognition slides Boeing

Error Analysis


• Current system errors stem from writing style differences.

• Additional data from Boeing will help address the problem.

Page 15: Handwriting recognition slides Boeing

NIST Special Database 19

Handwriting Sample Forms

• MIS (Multiple Image Set) allows multiple images to be stored together, with one or more images stored as a continuous raster.

Page 16: Handwriting recognition slides Boeing

NIST Special Database 19

Handwriting Sample Forms

Tasks:

1. Fetch the header and data from the MIS file.

2. Decode the data (encoded with CCITT Group 4 compression).

3. Encode it again for the target file format.

4. Identify the header and file chunks of the target file format.

5. Convert the MIS header and body information to the new format and write it to file.


MIS content Sample Header
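Steps 3 to 5 of the task list can be sketched as follows. The slide does not name the target file format; PNG is assumed here only because it is built from a header and typed chunks. The input is assumed to be an already decoded bilevel raster given as rows of 0/1 pixels, so reading the MIS header and decoding CCITT Group 4 are not shown. The function names are hypothetical.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_chunk(kind, payload):
    """One PNG chunk: length, type, payload, CRC-32 over type + payload."""
    crc = zlib.crc32(kind + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + kind + payload + struct.pack(">I", crc)


def pack_row(bits):
    """Pack a row of 0/1 pixels into bytes, most significant bit first."""
    out = bytearray((len(bits) + 7) // 8)
    for i, bit in enumerate(bits):
        if bit:
            out[i // 8] |= 0x80 >> (i % 8)
    return bytes(out)


def bilevel_to_png(rows):
    """Encode rows of 0/1 pixels as a 1-bit grayscale PNG.
    In this PNG color type, 0 is black and 1 is white."""
    height, width = len(rows), len(rows[0])
    # IHDR: width, height, bit depth 1, color type 0 (grayscale),
    # compression 0, filter 0, interlace 0.
    ihdr = struct.pack(">IIBBBBB", width, height, 1, 0, 0, 0, 0)
    # Each scanline is prefixed with filter type 0 (no filtering).
    raw = b"".join(b"\x00" + pack_row(row) for row in rows)
    return (
        PNG_SIGNATURE
        + png_chunk(b"IHDR", ihdr)
        + png_chunk(b"IDAT", zlib.compress(raw))
        + png_chunk(b"IEND", b"")
    )


png_bytes = bilevel_to_png([[1, 0, 1], [0, 1, 0]])
print(len(png_bytes), "bytes")
```

Writing `png_bytes` to a file opened in binary mode would complete step 5. Width and height would come from the MIS header, and the pixel polarity may need inverting depending on how the decoded raster represents ink.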

Page 17: Handwriting recognition slides Boeing

Vertical Projection for Character Segmentation

Example 1:

Example 2:
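A minimal sketch of vertical projection, assuming a binarized text line where 1 marks ink and 0 marks background: ink is counted per column, and the line is cut wherever the count drops below a threshold. The names `vertical_projection`, `segment_columns`, and `min_ink` are hypothetical, and touching or slanted characters would need more than this.

```python
def vertical_projection(image):
    """Count ink pixels (value 1) in each column of a binary image."""
    width = len(image[0])
    return [sum(row[x] for row in image) for x in range(width)]


def segment_columns(image, min_ink=1):
    """Return (start, end) column ranges, end exclusive, covering runs
    of columns whose ink count is at least min_ink."""
    profile = vertical_projection(image)
    segments = []
    start = None
    for x, count in enumerate(profile):
        if count >= min_ink and start is None:
            start = x                      # a character begins
        elif count < min_ink and start is not None:
            segments.append((start, x))    # a gap ends the character
            start = None
    if start is not None:
        segments.append((start, len(profile)))
    return segments


line_image = [
    [0, 1, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 1, 1, 0, 0, 1, 1],
]
print(vertical_projection(line_image))
print(segment_columns(line_image))
```

Each returned range can be used to crop one candidate character from the line before it is passed to the recognizer.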

Page 18: Handwriting recognition slides Boeing

Project Roadmap

Beta System, ~1 Year (task cards)

Deployment System, ~2 Years (multiple forms)

• Style Quantification (Publication)

• Balanced semi-supervised training set (Publication)

• Advanced form processing and segmentation for task cards

• High quality OCR for task cards

• Improved generalizable OCR for multiple form types

• Advanced form processing and segmentation for multiple form types

Page 19: Handwriting recognition slides Boeing

Project Context

• Raw Data (PDF / Image)

• Text Analysis

• Parts Inventory Optimization

• Sensor Analysis

• Handwriting Recognition

Page 20: Handwriting recognition slides Boeing

Thank you