Tablet PC Capstone CSE 481b: Microsoft's Cursive Handwriting Recognizer. Jay Pittman and the entire Microsoft Handwriting Recognition Research and Development Team

  • Slide 1
  • Tablet PC Capstone CSE 481b: Microsoft's Cursive Handwriting Recognizer. Jay Pittman and the entire Microsoft Handwriting Recognition Research and Development Team
  • Slide 2
  • Agenda: Neural Network Review; Basic Recognition Architecture; Language Model; Personalization; Error Reporting; New Languages
  • Slide 3
  • Handwriting Recognition Team. An experiment: a research group, but not housed in MSR; positioned inside a product group. Our direction and inspiration come directly from the users. This isn't for everyone, but we like it. A dozen researchers, half with PhDs; mostly CS, but 1 Neuroscience, 1 Chemistry, 1 Industrial Engineering, 1 Speech. Roughly half are neural network researchers, with various other recognition technologies represented.
  • Slide 4
  • Neural Network Review. A directed acyclic graph: nodes and arcs, each containing a simple value. Nodes contain activations; arcs contain weights. Activations represent soft booleans, ranging from 0.0 to 1.0. Weights represent excitatory and inhibitory connections, roughly symmetric about 0. At run time we do a forward pass, which computes activations from inputs to hiddens, and then to outputs. From outside, the app sees only input nodes and output nodes. [Diagram: small three-layer network with example activations on the input, hidden, and output nodes and example weights on the arcs]
  • Slide 5
  • Neural Network Forward Pass. Each node computes act = F(Σ(in × weight) + bias), where F(x) = 1 / (1 + e^(−x)) is the logistic function. The inputs are features computed from ink; the outputs are probability estimates of letters. [Diagram: the same example network, annotated with the forward-pass computation]
  • Slide 6
  • Neural Network Training. Start with a fixed architecture and a random set of weights. Iterate randomly through training samples. For each training sample, do a forward pass and compute the error of each output (size and direction). Compute what change in individual weights (size and direction) would reduce each output error. Reduce the change to a small fraction. Repeat this walk through the training samples over and over, in different random orders. [Diagram: the same example network]
  • Slide 7
  • C Example: Forward Pass
    typedef struct {          /* all numeric fields are floats */
        int     cActivations; /* number of nodes in this layer */
        int     cInputs;      /* number of inputs feeding it */
        float  *Activations;
        float  *Biases;
        float **Weights;      /* [cActivations][cInputs] */
        float  *Inputs;
    } LAYER;

    float Logistic(float in)
    {
        return 1.0f / ((float)exp((double)-in) + 1.0f);
    }

    void Forward(LAYER *pLayer)
    {
        int i;
        for (i = 0; i < pLayer->cActivations; i++) {
            int j;
            float in = pLayer->Biases[i];
            for (j = 0; j < pLayer->cInputs; j++)
                in += pLayer->Inputs[j] * pLayer->Weights[i][j];
            pLayer->Activations[i] = Logistic(in);
        }
    }
  • Slide 8
  • TDNN: Time Delayed Neural Network. This is still a normal back-propagation network; all the points in the previous several slides still apply. The difference is in the connections: connections are limited, and weights are shared. The input is segmented, and the same features are computed for each segment. Small detail, edge effects: for the first two and last two columns, the hidden nodes and input nodes that reach outside the range of our input receive zero activations. [Diagram: a window of input segments (item 1 through item 6) feeding hidden units that share one set of weights]
  • Slide 9
  • Segmentation. [Diagram: ink segmented at the midpoints of upward strokes; feature rows show tops, bottoms, and tops and bottoms]
  • Slide 10
  • Training. We use back-propagation training. We collect millions of words of ink data from thousands of writers: young and old, male and female, left-handed and right-handed. Natural text, newspaper text, URLs, email addresses, numeric values, street addresses, phone numbers, dates, times, currency amounts, etc. We collect in more than two dozen languages around the world. Training on such large databases takes weeks. We constantly worry about how well our data reflect our customers: their writing styles and their text content. We can be no better than the quality of our training sets, and that goes for our test sets too. We are teaching the computer to read.
  • Slide 11
  • Recognizer Architecture. Ink segments feed the TDNN, which produces an output matrix of letter probability estimates per segment. A beam search combines the output matrix with the lexicon to produce a top-10 list. [Diagram: ink segments → TDNN → output matrix → beam search over the lexicon trie → top-10 list: dog 68, clog 57, dug 51, doom 42, divvy 37, ooze 35, cloy 34, doxy 29, client 22, dozy 13]
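A greatly simplified sketch of how the output matrix and the lexicon meet: each candidate word is scored by multiplying the TDNN's probability estimate for its i-th letter against the i-th ink segment. The real system runs a beam search that also considers letters spanning multiple segments; the fixed one-segment-per-letter assumption, the function name, and the scores below are illustrative only.

```c
#include <string.h>

#define ALPHABET 26

/* Score a candidate word against a per-segment letter-probability
 * matrix, where matrix[seg][c] = P(letter 'a'+c | segment seg).
 * Returns 0 if the word length does not match the segment count
 * (a simplification; the real beam search handles variable spans). */
float ScoreWord(const char *word, int cSegments,
                float matrix[][ALPHABET])
{
    int i;
    float score = 1.0f;
    if ((int)strlen(word) != cSegments)
        return 0.0f;
    for (i = 0; i < cSegments; i++)
        score *= matrix[i][word[i] - 'a'];
    return score;
}
```

Ranking every lexicon word by this score (and pruning low-scoring prefixes as you go, which is what the beam does) yields a top-10 list like the one in the diagram.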
  • Slide 12
  • Maintaining Ambiguity. The TDNN does NOT tell us which letter it is, at least not as a definite answer. Instead it gives us probability estimates for each and every character it might be. The same shape might be different pieces of different letters. It is important to keep all theories alive for now, so we can decide later, after we add in more information from the language model. I suppose "maintaining ambiguity" is a euphemism for procrastinating.
  • Slide 13
  • Error Correction: SetTextContext(). Goal: better context usage for error correction scenarios. 1. User writes "Dictionary". 2. Recognizer misrecognizes it as "Dictum". 3. User selects "um" and rewrites "ionary". 4. TIP notes the partial word selection and puts the recognizer into correction mode, with left context "Dict" and empty right context. 5. Beam search artificially recognizes the left context. 6. Beam search runs the new ink as normal. 7. Beam search artificially recognizes the right context. This produces "ionary" in the top-10 list; TIP must insert it to the right of "Dict". [Diagram: per-character scores for the forced left context "Dict"]
  • Slide 14
  • Language Model. We get better recognition if we bias our interpretation of the output matrix with a language model. Better recognition means we can handle sloppier cursive: you can write faster, in a more relaxed manner. The lexicon (system dictionary) is the main part, but there is also a user dictionary, and there are regular expressions for things like dates and currency amounts. We want a generator: we ask it, what characters could come next after this prefix? It answers with a set of characters. We still output the top letter recognitions, in case you are writing a word out-of-dictionary; you will just have to write more neatly.
  • Slide 15
  • Lexicon. [Diagram: lexicon trie over words such as dog, clog, walking, running, analyse/analyze, colour/color. Simple nodes link letters; leaf nodes mark the end of a valid word; nodes may be flagged U.S. only, U.K. only, Australian only, or Canadian only; leaves carry a unigram score, the log of the word's probability]
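The trie above is exactly the "generator" the language-model slide asks for: walk down the prefix, then read off the children. This is a sketch under assumed names and a first-child/next-sibling layout, not the recognizer's actual data structure.

```c
#include <string.h>

typedef struct TRIENODE {
    char              ch;           /* letter on the arc into this node */
    int               fEndOfWord;   /* leaf marker: prefix is a valid word */
    int               unigramScore; /* log of word probability, if a word */
    struct TRIENODE  *firstChild;
    struct TRIENODE  *nextSibling;
} TRIENODE;

/* Generator: write the characters that may legally follow `prefix`
 * into `out` (caller-supplied, large enough) and return how many
 * were found; 0 means the prefix is not in the lexicon. */
int NextChars(const TRIENODE *root, const char *prefix, char *out)
{
    const TRIENODE *node = root;   /* root's children are first letters */
    const char *p;
    int n = 0;

    out[0] = '\0';
    /* Walk down the trie along the prefix. */
    for (p = prefix; *p; p++) {
        const TRIENODE *child = node->firstChild;
        while (child && child->ch != *p)
            child = child->nextSibling;
        if (!child)
            return 0;              /* prefix not in lexicon */
        node = child;
    }
    /* Every child of the final node is a legal continuation. */
    for (node = node->firstChild; node; node = node->nextSibling)
        out[n++] = node->ch;
    out[n] = '\0';
    return n;
}
```

During the beam search, only letter theories whose characters appear in this continuation set survive, which is how the lexicon biases the output matrix.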
  • Slide 16
  • Offensive Words. The lexicon includes all the words in the spellchecker, and the spellchecker includes obscenities; otherwise they would get marked as misspelled. But people get upset if these words are offered as corrections for other misspellings, so the spellchecker marks them as restricted. We live in an apparently stochastic world: we will throw up 6 theories about what you were trying to write, and if your ink is near an obscene word, we might include it. Dilemma: we want to recognize your obscene word when you write it (otherwise we are censoring, which is NOT our place), but we DON'T want to offer these outputs when you don't write them. Solution (weak): we took these words out of the lexicon. You can still write them, because you can write out-of-dictionary, but you have to write very neat cursive or nice handprint. This only works at the word level: we can't remove words with dual meanings, and we can't handle phrases that are obscene when the individual words are not.
  • Slide 17
  • Regular Expressions. Many are built in and callable by ISVs and web pages: number, digit string, date, time, currency amount, phone number; name, address, arbitrary word/phrase list; URL, email address, file name, login name, password, isolated character. Many components of the above: month, day of month, month name, day name (of week), year; hour, minute, second; local phone number, area code, country code; first name, last name, prefix, suffix; street name, city, state or province, postal code, country. None: yields an out-of-dictionary-only system (turns off the language model). Great for form-filling apps and web pages; accuracy is greatly improved. Use SetFactoid() or SetInputScope(). This is in addition to the ability to load the user dictionary: one could load 500 color names for a color field in a form-based app, or 8,000 drug names in a prescription app, or 2,000 stock symbols.
  • Slide 18
  • Regular Expressions. A simple regular expression compiler is available at run time, so ISVs can add their own regular expressions; one could imagine the DMV adding automobile VINs. Blood pressure example: (!IS_DIGITS)/(!IS_DIGITS) p(!IS_DIGITS). Latitude example: (!IS_DIGITS)((!IS_TIME_MINORSEC)((!IS_TIME_MINORSEC))+)+ (N|S)
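To make the blood-pressure factoid concrete, here is a rough stand-in for what that pattern accepts: digits, a slash, digits, then "p" and more digits (e.g. "120/80 p 72"). The factoid compiler itself is not public, so this uses plain sscanf, and the exact whitespace the real grammar allows is an assumption.

```c
#include <stdio.h>

/* Returns 1 if `s` looks like "<digits>/<digits> p<digits>" with
 * nothing trailing, 0 otherwise. A sketch, not the factoid engine. */
int MatchesBloodPressure(const char *s)
{
    int systolic, diastolic, pulse;
    char tail;
    /* %c only matches if extra characters follow the pulse. */
    int n = sscanf(s, "%d/%d p%d%c", &systolic, &diastolic, &pulse, &tail);
    return n == 3;
}
```

An ISV pattern like this constrains the beam search the same way the lexicon does: only character sequences the grammar can generate stay alive.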
  • Slide 19
  • Default Factoid. Used when no factoid is set; intended for natural text, such as the body of an email. Includes the system dictionary, user dictionary, hyphenation rule, number grammar, and URL grammar, all wrapped by optional leading punctuation and trailing punctuation. Hyphena