Handwriting Recognition for Genealogical Records Luke Hutchison lukeh@email.byu.edu FHT 2003.


Transcript of Handwriting Recognition for Genealogical Records Luke Hutchison lukeh@email.byu.edu FHT 2003.

Page 1: Handwriting Recognition for Genealogical Records Luke Hutchison lukeh@email.byu.edu FHT 2003.

Handwriting Recognition for Genealogical Records

Luke Hutchison
lukeh@email.byu.edu

FHT 2003

Page 2: Handwriting Recognition for Genealogical Records Luke Hutchison lukeh@email.byu.edu FHT 2003.

Church Extraction Effort

• Nov 2002: Church released US 1880 and Canadian 1881 Census
  • 55 million names
  • 11 million man-hours

• Granite Vault: contains 2.3 million rolls of microfilm (= about 6 million 300-page volumes)

• Approximate extraction time for one person (based on the above census): 280 years, 24/7

• We don't have that sort of time
• Need automated extraction: handwriting recognition

Page 3: Handwriting Recognition for Genealogical Records Luke Hutchison lukeh@email.byu.edu FHT 2003.

Example Microfilm Images

Page 4: Handwriting Recognition for Genealogical Records Luke Hutchison lukeh@email.byu.edu FHT 2003.

Handwriting Recognition

• Two different fields:
• Online Handwriting Recognition
  Writer's pen movements captured
  Velocity, acceleration, stroke order etc.
  Style can be constrained (e.g. Graffiti gestures)
• Offline Handwriting Recognition
  Only pixels
  Cannot constrain style (documents already written)

• Offline is harder (less information)

• Genealogical records are all offline

Mary
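To make the online/offline contrast concrete, here is a minimal sketch of the two kinds of input. The sample values, the millisecond timestamps, and the names `stroke`, `bitmap` and `speeds` are assumptions for illustration and do not come from the talk.

```python
from math import hypot

# Hypothetical online sample: (x, y, t_ms) pen positions with millisecond timestamps.
# The order of the samples is the stroke order.
stroke = [(0.0, 0.0, 0), (3.0, 4.0, 10), (6.0, 8.0, 30)]


def speeds(points):
    """Pen speed between consecutive samples, in distance units per millisecond."""
    return [
        hypot(x1 - x0, y1 - y0) / (t1 - t0)
        for (x0, y0, t0), (x1, y1, t1) in zip(points, points[1:])
    ]


# Offline input for a similar mark: only ink (1) and background (0) pixels.
# Timing and stroke order are gone.
bitmap = [
    [0, 0, 1],
    [0, 1, 0],
    [1, 0, 0],
]

print(speeds(stroke))
```

Nothing in `bitmap` says where the pen started or how fast it moved, which is the missing information the slide refers to.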

Page 5: Handwriting Recognition for Genealogical Records Luke Hutchison lukeh@email.byu.edu FHT 2003.

Online Handwriting Recognition

• Modern systems are moderately successful,
• e.g. Microsoft Research's new Tablet PC:

Polynomial coefficients e.g. [0.94, 0.05, 0.29,...]

Page 6: Handwriting Recognition for Genealogical Records Luke Hutchison lukeh@email.byu.edu FHT 2003.

Offline Handwriting Recognition

• A difficult problem
• Almost as many approaches as there are researchers
• e.g.
  • Pattern Recognition
  • Statistical analysis
  • Mathematical modelling
  • Physics-based modelling
  • Subgraph matching / graph search
  • Neural networks / machine learning
  • Fractal image compression
  • ... (too many to list) ...

Page 7: Handwriting Recognition for Genealogical Records Luke Hutchison lukeh@email.byu.edu FHT 2003.

Previous Work: Offline-to-Online Conversion

• Finding contour

• Finding midline

• Stroke ordering – difficult problem
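The contour step can be sketched roughly as follows, assuming the word has already been binarized into ink (1) and background (0) pixels. The 4-neighbour rule and the name `contour_pixels` are illustrative choices, not necessarily what the cited prior work used; midline finding and stroke ordering are not shown.

```python
def contour_pixels(image):
    """Return the set of (row, col) ink pixels that touch background or the image border."""
    rows, cols = len(image), len(image[0])
    contour = set()
    for r in range(rows):
        for c in range(cols):
            if not image[r][c]:
                continue  # background pixels are never part of the contour
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                outside = not (0 <= nr < rows and 0 <= nc < cols)
                if outside or not image[nr][nc]:
                    contour.add((r, c))
                    break
    return contour


# Hypothetical 3x3 ink blob surrounded by background.
blob = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

outline = contour_pixels(blob)
```

Only the centre pixel of the blob is interior; every other ink pixel ends up on the contour.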

Page 8: Handwriting Recognition for Genealogical Records Luke Hutchison lukeh@email.byu.edu FHT 2003.

Offline-to-Online Conversion ctd.

• Especially difficult with genealogical records:

• Stroke ordering: difficult

• Broken lines / blobs?

• Not practical

Page 9: Handwriting Recognition for Genealogical Records Luke Hutchison lukeh@email.byu.edu FHT 2003.

Previous Work: Holistic Matching

• Whole word is stretched to match known words

• Sources of variation compound across word
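One common way to "stretch" a whole word against known words is dynamic time warping over a one-dimensional profile of the word image. The sketch below uses that technique as a stand-in; the talk does not say which matching method the prior work used, and the per-column ink counts in `templates` and `unknown` are made up.

```python
def dtw_distance(a, b):
    """Dynamic time warping cost between two numeric sequences."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Either sequence may be stretched (repeat an element) or both advance together.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(a)][len(b)]


# Hypothetical per-column ink counts for two known words.
templates = {"short": [1, 3, 1], "tall": [1, 5, 5, 1]}
# An unknown word that is a horizontally stretched version of "short".
unknown = [1, 1, 3, 3, 1]

best = min(templates, key=lambda word: dtw_distance(unknown, templates[word]))
```

Because a single profile covers the whole word, a distortion anywhere in it adds to the total cost, which is one way to read the slide's remark that variation compounds across the word.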

Page 10: Handwriting Recognition for Genealogical Records Luke Hutchison lukeh@email.byu.edu FHT 2003.

Previous Work: Sliding Window

• Narrow vertical window slides across word
• A state machine recognizes sequences

• Results good, but sensitive to noise
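The windowing half of this approach can be sketched as below. The feature (ink count per window), the window width, and the name `window_features` are assumptions for illustration; the state machine that would consume the resulting sequence is not shown.

```python
def window_features(image, width=2):
    """Ink-pixel count inside each position of a vertical window sliding left to right."""
    cols = len(image[0])
    column_ink = [sum(row[c] for row in image) for c in range(cols)]
    return [sum(column_ink[c:c + width]) for c in range(cols - width + 1)]


# Hypothetical binarized word image: 1 = ink, 0 = background.
word = [
    [1, 0, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [1, 1, 0, 1, 1],
]

features = window_features(word)
```

A single stray speck of ink changes the count of every window that covers it, which hints at why such systems are sensitive to noise.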

Page 11: Handwriting Recognition for Genealogical Records Luke Hutchison lukeh@email.byu.edu FHT 2003.

Previous Work: Parascript

• Features detected & put in sequence
• Letters warped to best match sequence of features

• Complex; sensitive to noise

Page 12: Handwriting Recognition for Genealogical Records Luke Hutchison lukeh@email.byu.edu FHT 2003.

Handwriting Recognition

• Some aspects of Handwriting Recognition:

• Segmentation problem (can't read word until it is segmented; can't segment word until it is read)

• Different handwriting styles

• Use of dictionary to correct for errors in reading

nr?

m?

Srnitb --> Smith
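Dictionary correction of a misreading such as "Srnitb" can be sketched with plain edit distance. The name list and the helper names `edit_distance` and `correct` are hypothetical, and the talk does not say which string metric was used.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


def correct(reading, dictionary):
    """Return the dictionary entry closest to the raw reading."""
    return min(dictionary, key=lambda word: edit_distance(reading, word))


# Hypothetical surname dictionary.
names = ["Smith", "Snider", "Smart", "Jones"]
print(correct("Srnitb", names))
```

Plain edit distance charges two edits for reading "m" as "rn". A recognizer could instead give visually confusable groups like that a lower cost, but that refinement is not shown here.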

Page 13: Handwriting Recognition for Genealogical Records Luke Hutchison lukeh@email.byu.edu FHT 2003.

Thesis Approach: Preprocessing

Outlines of word are traced and smoothed:

Handwriting slope is corrected for automatically:
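Assuming "slope" here means the lean of the letters, the correction can be pictured as a horizontal shear of the traced outline points. In this sketch the slant angle is simply given; how the thesis estimates it automatically is not described in the slides, and `deslant` is a hypothetical name.

```python
from math import radians, tan


def deslant(points, slant_degrees):
    """Shear (x, y) points horizontally so strokes leaning by slant_degrees become vertical.

    y is measured upward from the baseline, so points on the baseline do not move.
    """
    k = tan(radians(slant_degrees))
    return [(x - k * y, y) for x, y in points]


# A stroke leaning 45 degrees to the right: x grows as fast as y.
leaning = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
upright = deslant(leaning, 45.0)
```

After the shear, all three points share (up to floating-point rounding) the same x coordinate, so the stroke stands upright.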

Page 14: Handwriting Recognition for Genealogical Records Luke Hutchison lukeh@email.byu.edu FHT 2003.

Segmentation

• Goal: robustly cut letters into segments
• Match multiple segments to detect letters
• Easier than matching whole letter

Page 15: Handwriting Recognition for Genealogical Records Luke Hutchison lukeh@email.byu.edu FHT 2003.

Dynamic Global Search

• Assemble word spelling from possible letter readings

Best path: “Williarw Suwkino” (65% confidence)
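A best-path search over candidate letter readings might look roughly like the dynamic program below. The lattice layout (readings spanning one or more segments), the confidences, and the name `best_spelling` are all assumptions; the slides do not give the thesis's actual search procedure or scoring.

```python
def best_spelling(n_boundaries, candidates):
    """Find the highest-scoring spelling through a lattice of letter readings.

    candidates: list of (start, end, letter, confidence) with start < end, where
    start and end are segment boundaries numbered 0 .. n_boundaries - 1.
    Returns (spelling, score), scoring a path by the product of its confidences,
    or None if no path reaches the last boundary.
    """
    best = {0: ("", 1.0)}
    for end in range(1, n_boundaries):
        options = [
            (best[start][0] + letter, best[start][1] * confidence)
            for start, stop, letter, confidence in candidates
            if stop == end and start in best
        ]
        if options:
            best[end] = max(options, key=lambda option: option[1])
    return best.get(n_boundaries - 1)


# Hypothetical readings for four segments (boundaries 0..4):
# the middle two segments read either as "r" + "n" or as a single "m".
candidates = [
    (0, 1, "S", 0.9),
    (1, 2, "r", 0.5), (2, 3, "n", 0.5),
    (1, 3, "m", 0.6),
    (3, 4, "i", 0.8),
]

spelling, score = best_spelling(5, candidates)
```

Here the path "S", "m", "i" beats "S", "r", "n", "i" because its product of confidences is higher. Letting one reading span several segments is how a search like this sidesteps having to fix the segmentation before reading.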

Page 16: Handwriting Recognition for Genealogical Records Luke Hutchison lukeh@email.byu.edu FHT 2003.

Results (1)

Page 17: Handwriting Recognition for Genealogical Records Luke Hutchison lukeh@email.byu.edu FHT 2003.

Results (2)

Page 18: Handwriting Recognition for Genealogical Records Luke Hutchison lukeh@email.byu.edu FHT 2003.

Results (3)

Page 19: Handwriting Recognition for Genealogical Records Luke Hutchison lukeh@email.byu.edu FHT 2003.

Results (4)

In general: results even worse – system only worked well on words it was specifically trained on

Page 20: Handwriting Recognition for Genealogical Records Luke Hutchison lukeh@email.byu.edu FHT 2003.

The Human Brain's Visual System

Retina

Page 21: Handwriting Recognition for Genealogical Records Luke Hutchison lukeh@email.byu.edu FHT 2003.

The Human Brain's Visual System

Angular edge detectors

Retina

Page 22: Handwriting Recognition for Genealogical Records Luke Hutchison lukeh@email.byu.edu FHT 2003.

The Human Brain's Visual System

Angular edge detectors

Retina

Line / curve detectors ... ... ...

Page 23: Handwriting Recognition for Genealogical Records Luke Hutchison lukeh@email.byu.edu FHT 2003.

The Human Brain's Visual System

Angular edge detectors

Retina

Line / curve detectors

Feature detectors

... ... ...

Page 24: Handwriting Recognition for Genealogical Records Luke Hutchison lukeh@email.byu.edu FHT 2003.

The Human Brain's Visual System

Angular edge detectors

Retina

Line / curve detectors

Feature detectors

... ... ...

Lateral inhibition

Feedback

Page 25: Handwriting Recognition for Genealogical Records Luke Hutchison lukeh@email.byu.edu FHT 2003.

The Human Brain's Visual System

Angular edge detectors

Retina

Line / curve detectors

Feature detectors

Letter / word shape recognizers

... ... ...

Lateral inhibition

Feedback

J

Page 26: Handwriting Recognition for Genealogical Records Luke Hutchison lukeh@email.byu.edu FHT 2003.

The Human Brain's Visual System

Angular edge detectors

Retina

Line / curve detectors

Feature detectors

Letter / word shape recognizers

... ... ...

Lateral inhibition

Feedback

J

Joseph

Page 27: Handwriting Recognition for Genealogical Records Luke Hutchison lukeh@email.byu.edu FHT 2003.

Conclusions

• Handwriting recognition is important for genealogy...
  ...but it is hard

• Current methods don't work very well...
  ...and they don't operate much like the human brain

• Future work should focus on understanding the brain, and emulating it as much as possible, e.g. with:
  • Hierarchical reasoning
  • Feedback
  • Lateral inhibition
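As a toy illustration of lateral inhibition, the sketch below lets each recognizer unit be suppressed by the activity of its competitors. The update rule, the `strength` parameter, and the activation values are assumptions chosen for simplicity, not a model taken from the talk or from neuroscience literature.

```python
def lateral_inhibition(activations, strength=0.5):
    """Suppress each unit in proportion to the summed activity of the other units.

    Results below zero are clipped to zero, so weak competitors are silenced.
    """
    total = sum(activations)
    return [max(0.0, a - strength * (total - a)) for a in activations]


# Hypothetical outputs of three competing letter recognizers at one position.
raw = [1.0, 0.5, 0.25]
sharpened = lateral_inhibition(raw)
```

The strongest unit survives at a reduced level and the two weaker ones drop to zero, so a graded set of guesses turns into a clear winner.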

Page 28: Handwriting Recognition for Genealogical Records Luke Hutchison lukeh@email.byu.edu FHT 2003.

Questions?

Luke Hutchison
lukeh@email.byu.edu

Page 29: Handwriting Recognition for Genealogical Records Luke Hutchison lukeh@email.byu.edu FHT 2003.