Online Chinese Character Handwriting Recognition for Linux Presenter: Ran CHENG (Kelvin) Primary...

Online Chinese Character Handwriting Recognition for Linux

Presenter: Ran CHENG (Kelvin)
Primary Supervisor: Jim Hogan
Associate Supervisor: Jinhai Cai


Content

- Background
- Introduction
- Related material
- Handwriting Recognition System
- Evaluation
- Future work

Background

Why?
- Why handwriting?
  - One of the most important input methods
- Why Chinese characters?
  - Potentially large market
  - One of the I18N goals
- Why online?
  - The only feasible runtime input method
  - Frequently used
- Why Linux?
  - Fast-developing OS

Who?
- Who is the sponsor?
  - Red Hat Linux

What?
- What will be the deliverables?
  - One handwriting software prototype
  - A feasible handwriting recognition algorithm

Introduction

- Handwriting types
  - Online
  - Offline
  - Signature
- The current online Chinese handwriting market
  - Most products are commercial, not open source
  - Some open source recognisers exist, but not for Chinese
- Aim:
  - Online handwriting recognition and recognition accuracy
  - Recognition of Chinese characters
  - Implementation of a handwriting recognition algorithm under Linux

Related material

- Hidden Markov Model (HMM)
- Chinese Character Processing

Hidden Markov Model (HMM)

- What is an HMM?
  - A Markov process with unknown parameters
  - The challenge is to determine the hidden parameters from the observable sequence
- Example
  - Two people in different cities {Bob, Carol}
  - They talk over the phone
  - Weather and activities
    - {Sunny, Rainy, Cloudy} {Walk, Shopping, Cleaning}
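
The weather and activity example can be sketched in a few lines of Python. The weather is the hidden state and the activity reported over the phone is the observation. All probabilities below are made up for illustration; the slides do not give any numbers, and the function name `viterbi` is likewise only a label chosen here.

```python
# Hidden states (weather) and observations (activities) from the slide.
STATES = ["Sunny", "Rainy", "Cloudy"]

# Assumed, illustrative probabilities (not from the presentation).
START = {"Sunny": 0.5, "Rainy": 0.3, "Cloudy": 0.2}
TRANS = {
    "Sunny":  {"Sunny": 0.6, "Rainy": 0.1, "Cloudy": 0.3},
    "Rainy":  {"Sunny": 0.2, "Rainy": 0.6, "Cloudy": 0.2},
    "Cloudy": {"Sunny": 0.3, "Rainy": 0.3, "Cloudy": 0.4},
}
EMIT = {
    "Sunny":  {"Walk": 0.7, "Shopping": 0.2, "Cleaning": 0.1},
    "Rainy":  {"Walk": 0.1, "Shopping": 0.3, "Cleaning": 0.6},
    "Cloudy": {"Walk": 0.3, "Shopping": 0.5, "Cleaning": 0.2},
}


def viterbi(obs, states, start, trans, emit):
    """Return (probability, path) of the most likely hidden state sequence."""
    # best[s] = (probability, path) of the best path that ends in state s
    best = {s: (start[s] * emit[s][obs[0]], [s]) for s in states}
    for o in obs[1:]:
        best = {
            s: max(
                ((p * trans[prev][s] * emit[s][o], path + [s])
                 for prev, (p, path) in best.items()),
                key=lambda item: item[0],
            )
            for s in states
        }
    return max(best.values(), key=lambda item: item[0])


prob, path = viterbi(["Walk", "Shopping", "Cleaning"], STATES, START, TRANS, EMIT)
print(path)  # ['Sunny', 'Cloudy', 'Rainy']
```

With these assumed numbers, hearing "Walk, Shopping, Cleaning" on three consecutive calls is best explained by sunny, then cloudy, then rainy weather.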

Chinese Character Processing

- Character segmentation
- Pre-processing
- Pattern representation
- Classification
- Context processing

Handwriting Recognition System

- Writing pad
- Data collection, organization and format
- Feature analysis
- Training state initialisation and optimisation
- Character recognition

Writing pad

- Basic functions
- Taking input from user

Data collection

- 42 Chinese characters for 43 strokes and variations
  - All the Chinese character strokes
  - Frequently used characters
- From 5 different people
- 40 training examples for each character

Data organization

Data format

Feature analysis

- Character decomposition
  - Each stroke is represented by 5 states
- State decomposition
  - Each state contains the statistical probability distribution of 16 features
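
One possible way to hold this decomposition in memory is sketched below. The slides fix only the two sizes (5 states per stroke, 16 features per state); everything else is an assumption made here for illustration. In particular, the sketch treats the 16 features as 16 quantised pen directions, which the presentation does not state, and the names `direction_feature`, `StrokeModel` and `CharacterModel` are hypothetical.

```python
import math
from dataclasses import dataclass, field

N_STATES = 5     # from the slides: each stroke is represented by 5 states
N_FEATURES = 16  # from the slides: each state covers 16 features


def direction_feature(dx, dy, n_bins=N_FEATURES):
    """Hypothetical feature: quantise a pen movement into one of n_bins directions."""
    angle = math.atan2(dy, dx)  # in (-pi, pi]
    return round(angle / (2 * math.pi / n_bins)) % n_bins


@dataclass
class StrokeModel:
    # emit[s][f]: probability that state s produces feature value f
    emit: list = field(
        default_factory=lambda: [[1.0 / N_FEATURES] * N_FEATURES
                                 for _ in range(N_STATES)]
    )
    # stay[s]: probability of remaining in state s at the next observation
    stay: list = field(default_factory=lambda: [0.5] * N_STATES)


@dataclass
class CharacterModel:
    label: str
    strokes: list  # one StrokeModel per stroke, in writing order


model = CharacterModel(label="土", strokes=[StrokeModel() for _ in range(3)])
print(len(model.strokes), len(model.strokes[0].emit), len(model.strokes[0].emit[0]))
```

The uniform starting values are placeholders; training would replace them with distributions estimated from the collected samples.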

Training state initialisation

- Observation segmentation
- Feature distribution
- State transition
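
The three initialisation steps can be sketched as follows. This is a minimal illustration, not the presenter's implementation: it assumes that each training sample is a list of integer feature indices (0 to 15), that the first segmentation simply splits each sample evenly across the 5 states, and that add-one smoothing is used to avoid zero probabilities. The function names are hypothetical.

```python
N_STATES = 5
N_FEATURES = 16


def uniform_segments(length, n_states=N_STATES):
    """Observation segmentation: assign each observation to a state, split evenly."""
    return [min(t * n_states // length, n_states - 1) for t in range(length)]


def initialise(samples, n_states=N_STATES, n_features=N_FEATURES):
    """Estimate initial feature distributions and self-transition probabilities."""
    counts = [[1.0] * n_features for _ in range(n_states)]  # add-one smoothing
    stays = [1.0] * n_states   # times a state was followed by itself
    moves = [1.0] * n_states   # times a state was followed by the next state
    for obs in samples:
        states = uniform_segments(len(obs), n_states)
        for t, (s, o) in enumerate(zip(states, obs)):
            counts[s][o] += 1  # feature distribution
            if t + 1 < len(obs):  # state transition
                if states[t + 1] == s:
                    stays[s] += 1
                else:
                    moves[s] += 1
    emit = [[c / sum(row) for c in row] for row in counts]
    stay = [stays[s] / (stays[s] + moves[s]) for s in range(n_states)]
    return emit, stay


emit, stay = initialise([[0, 0, 1, 1, 4, 4, 8, 8, 12, 12]])
print(uniform_segments(10))  # [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
```

Each row of `emit` sums to 1, and `stay[s]` is the estimated probability of remaining in state `s`; the probability of moving on to the next state is `1 - stay[s]`.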

Training state optimisation

- Viterbi algorithm

Training state optimisation (continued)

Training state optimisation (continued)

- Observation segmentation
- Feature distribution
- State transition
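
One optimisation pass could look like the sketch below: the Viterbi algorithm finds the best state path for each training sample, that path replaces the previous observation segmentation, and the feature distributions and state transitions are re-counted from it. The sketch assumes a left-to-right model (a state either repeats or advances to the next one), integer feature indices as observations, and samples at least as long as the number of states. None of these details are given in the slides, and the function names are hypothetical.

```python
import math

NEG_INF = float("-inf")


def _log(p):
    return math.log(p) if p > 0 else NEG_INF


def viterbi_align(obs, emit, stay):
    """Best left-to-right state path for obs, with its log-probability."""
    n = len(emit)
    score = [NEG_INF] * n
    score[0] = _log(emit[0][obs[0]])  # paths must start in the first state
    back = []
    for o in obs[1:]:
        new, ptr = [NEG_INF] * n, list(range(n))
        for s in range(n):
            best = score[s] + _log(stay[s])  # remain in s
            if s > 0:
                adv = score[s - 1] + _log(1 - stay[s - 1])  # advance from s - 1
                if adv > best:
                    best, ptr[s] = adv, s - 1
            new[s] = best + _log(emit[s][o])
        score = new
        back.append(ptr)
    s = n - 1  # paths must end in the last state
    path = [s]
    for ptr in reversed(back):
        s = ptr[s]
        path.append(s)
    path.reverse()
    return path, score[n - 1]


def reestimate(samples, emit, stay):
    """Re-segment every sample with Viterbi, then re-count both distributions."""
    n, k = len(emit), len(emit[0])
    counts = [[1.0] * k for _ in range(n)]  # add-one smoothing
    stays, moves = [1.0] * n, [1.0] * n
    for obs in samples:
        path, _ = viterbi_align(obs, emit, stay)  # observation segmentation
        for t, (s, o) in enumerate(zip(path, obs)):
            counts[s][o] += 1  # feature distribution
            if t + 1 < len(path):  # state transition
                if path[t + 1] == s:
                    stays[s] += 1
                else:
                    moves[s] += 1
    new_emit = [[c / sum(row) for c in row] for row in counts]
    new_stay = [stays[s] / (stays[s] + moves[s]) for s in range(n)]
    return new_emit, new_stay


path, _ = viterbi_align([0, 0, 1, 1], [[0.9, 0.1], [0.1, 0.9]], [0.5, 0.5])
print(path)  # [0, 0, 1, 1]
```

Calling `reestimate` repeatedly until the paths stop changing is one common stopping rule; the slides do not say which rule was used.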

Character recognition

1. Create a ranking list.
2. Pick up a reserved input file as the observation file in the Viterbi algorithm.
3. Pick up the distribution probability and transition probability files for a character stored in the database or file system.
4. Run the Viterbi algorithm and record the overall probability (we only used the overall path in the state transition optimisation, and only use the overall probability here).
5. According to the probability, insert the character at the proper position in the ranking list.
6. Repeat steps 2 to 5 until no more character data is left in the database or file system.
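
The recognition loop above can be sketched as follows. An in-memory dictionary stands in for the database or file system, each character is simplified to a single left-to-right chain of states, and the two toy models are invented for illustration; the names `viterbi_score`, `rank` and `MODELS` are hypothetical. Log-probabilities are used so that long observation sequences do not underflow.

```python
import bisect
import math

NEG_INF = float("-inf")


def _log(p):
    return math.log(p) if p > 0 else NEG_INF


def viterbi_score(obs, emit, stay):
    """Log-probability of the best left-to-right path (the path itself is not kept)."""
    n = len(emit)
    score = [NEG_INF] * n
    score[0] = _log(emit[0][obs[0]])
    for o in obs[1:]:
        new = []
        for s in range(n):
            best = score[s] + _log(stay[s])
            if s > 0:
                best = max(best, score[s - 1] + _log(1 - stay[s - 1]))
            new.append(best + _log(emit[s][o]))
        score = new
    return score[-1]


def rank(obs, models):
    ranking = []                                 # step 1: create a ranking list
    for label, (emit, stay) in models.items():   # steps 3 and 6: each stored character
        score = viterbi_score(obs, emit, stay)   # step 4: overall probability
        bisect.insort(ranking, (-score, label))  # step 5: insert at the proper position
    return [(label, -neg) for neg, label in ranking]


# Toy models with 2 states and 2 feature values (assumed numbers).
MODELS = {
    "工": ([[0.9, 0.1], [0.1, 0.9]], [0.5, 0.5]),
    "土": ([[0.1, 0.9], [0.9, 0.1]], [0.5, 0.5]),
}

observation = [0, 0, 1, 1]                       # step 2: the observation sequence
print([label for label, _ in rank(observation, MODELS)])  # ['工', '土']
```

Keeping the list sorted on insertion means the top five candidates are simply the first five entries once all characters have been scored.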

Evaluation

- 67% (56/84) of the characters are correctly recognised
- 98.8% (83/84) of the characters are recognised in the top five positions
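
Both figures are top-k accuracies over the 84 test inputs: top-1 for the first line and top-5 for the second. A minimal sketch of the calculation, assuming each test result is stored as the true character plus the ranking list produced for it (the function name `top_k_accuracy` is hypothetical):

```python
def top_k_accuracy(results, k):
    """results: list of (true_label, ranked_labels) pairs, best candidate first."""
    hits = sum(1 for truth, ranking in results if truth in ranking[:k])
    return hits / len(results)


# The counts reported on the slide.
print(f"top-1: {56 / 84:.1%}")  # top-1: 66.7%
print(f"top-5: {83 / 84:.1%}")  # top-5: 98.8%
```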

Future work

- Writing pad XInput support
- Relative position handling
  - For instance, “工” and “土”
- Duration handling
  - For instance, “士” and “土”

Questions?

Thank you