Word Prediction in Hebrew Preliminary and Surprising Results
August 6th ISAAC 2008
Word Prediction in Hebrew
Preliminary and Surprising Results
Yael Netzer, Meni Adler, Michael Elhadad
Department of Computer Science, Ben Gurion University, Israel
Outline
• Objectives and example
• Methods of Word Prediction
• Hebrew Morphology
• Experiments and Results
• Conclusions?
Word Prediction - Objectives
• Ease word insertion in textual software
  – by guessing the next word
  – by giving a list of possible options for the next word
  – by completing a word given a prefix
• General idea: guess the next word given the previous ones
  [Input w1 w2] → [guess w3]
Objectives
Word Prediction Example
I s_____
I s_____ → verb, adverb?
I s_____ → verb: sang? maybe. singularized? hopefully
I saw a _____
I saw a _____ → noun / adjective
I saw a b____
I saw a b____ → brown? big? bear? barometer?
I saw a bird in the _____
I saw a bird in the _____ → [semantics would help here]
I saw a bird in the z____
I saw a bird in the z____ → obvious (?)
Statistical Methods
• Statistical information
  – Unigrams: probability of isolated words
    • Independent of context, offer the most likely words as candidates
  – More complex language models (Markov Models)
    • Given w1..wn, determine most likely candidate for wn+1
  – Most common method in applications is the unigram (see references in [Garay-Vitoria and Abascal, 2004])
Word Prediction Methods
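The unigram approach can be illustrated with a minimal sketch. This is not the system described in the talk: the toy corpus, the function names `train_unigrams` and `complete`, and the tie-breaking rule are assumptions made for illustration only.

```python
from collections import Counter

def train_unigrams(tokens):
    """Count how often each word occurs in the training text."""
    return Counter(tokens)

def complete(prefix, unigrams, menu_size=5):
    """Return up to menu_size words that start with prefix, most frequent first."""
    candidates = [(w, c) for w, c in unigrams.items() if w.startswith(prefix)]
    # Sort by descending count, then alphabetically so that ties are deterministic.
    candidates.sort(key=lambda wc: (-wc[1], wc[0]))
    return [w for w, _ in candidates[:menu_size]]

# Toy English corpus (hypothetical), echoing the "I saw a b____" example.
corpus = "i saw a bird in the zoo and i saw a big brown bear".split()
unigrams = train_unigrams(corpus)
print(complete("b", unigrams, menu_size=3))
```

Because a unigram model ignores context, the menu for the prefix "b" is the same after "I saw a" as anywhere else; the Markov models on the slide condition on the preceding words instead.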
Syntactic Methods
• Syntactic knowledge
  – Consider sequences of part of speech tags
    [Article] [Noun] → predict [Verb]
  – Phrase structure
    [Noun Phrase] → predict [Verb]
  – Syntactic knowledge can be statistical or based on hand-coded rules
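A statistical version of the tag-sequence idea can be sketched as follows. The hand-tagged toy corpus, the tag names, and the helper functions are hypothetical; a real system would use a tagger and a much larger corpus.

```python
from collections import Counter, defaultdict

# Hypothetical hand-tagged toy corpus: (word, tag) pairs.
tagged = [
    ("the", "Article"), ("bird", "Noun"), ("sings", "Verb"),
    ("a", "Article"), ("bear", "Noun"), ("sleeps", "Verb"),
    ("the", "Article"), ("big", "Adjective"), ("bear", "Noun"), ("eats", "Verb"),
]

def train_tag_bigrams(tagged_tokens):
    """Count how often each tag is followed by each other tag."""
    transitions = defaultdict(Counter)
    for (_, prev_tag), (_, next_tag) in zip(tagged_tokens, tagged_tokens[1:]):
        transitions[prev_tag][next_tag] += 1
    return transitions

def predict_next_tag(prev_tag, transitions):
    """Return the tag that most often follows prev_tag, or None if unseen."""
    following = transitions.get(prev_tag)
    if not following:
        return None
    return following.most_common(1)[0][0]

def candidates_with_tag(tag, tagged_tokens):
    """List the known words that carry the given tag."""
    return sorted({w for w, t in tagged_tokens if t == tag})

transitions = train_tag_bigrams(tagged)
next_tag = predict_next_tag("Noun", transitions)
print(next_tag, candidates_with_tag(next_tag, tagged))
```

The predicted tag only narrows the candidate set; ranking the words inside that set still needs word frequencies or an n-gram model.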
Semantic Methods
• Semantic knowledge
  – Assign semantic categories to words
  – Find a set of rules which constrain the possible candidates for the next word
    • [eat verb] → predict [word of category food]
  – Not widely used in word prediction, mostly because it requires complex hand coding and is too inefficient for real-time operation
Word Prediction Knowledge Sources
• Corpora: texts and frequencies• Vocabularies (Can be domain specific)• Lexicons with syntactic and/or semantic
knowledge• User’s history • Morphological analyzers• Unknown words models
Evaluation of Word Prediction
• Keystroke savings
• Time savings
• Overall satisfaction
  – Cognitive overload (length of choice list vs. accuracy)
• A predictor is considered adequate if its hit ratio remains high as the required number of selections decreases.
  Keystroke savings = 1 - (# of actual keystrokes / # of expected keystrokes)
Word Prediction Evaluation
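The keystroke-savings measure can be made concrete with a small simulation. The cost model below is an assumption, not taken from the slides: one keystroke per typed letter, one keystroke to pick a word from the menu (which also inserts the following space), and a word that is never offered costs its letters plus a space. The names `make_suggester` and `count_keystrokes` are hypothetical.

```python
from collections import Counter

def keystroke_savings(actual_keystrokes, expected_keystrokes):
    """1 - (# of actual keystrokes / # of expected keystrokes)."""
    if expected_keystrokes <= 0:
        raise ValueError("expected_keystrokes must be positive")
    return 1 - actual_keystrokes / expected_keystrokes

def make_suggester(training_words, menu_size):
    """Build a unigram suggester: most frequent words matching the typed prefix."""
    counts = Counter(training_words)
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    def suggest(prefix):
        return [w for w in ranked if w.startswith(prefix)][:menu_size]
    return suggest

def count_keystrokes(message_words, suggest):
    """Return (actual, expected) keystrokes for typing the message."""
    actual = expected = 0
    for word in message_words:
        expected += len(word) + 1            # every letter plus a space
        for typed in range(len(word) + 1):
            if word in suggest(word[:typed]):
                actual += typed + 1          # letters typed so far plus one selection
                break
        else:
            actual += len(word) + 1          # never offered: typed in full
    return actual, expected

training = "i saw a bird in the zoo i saw a bear".split()
suggest = make_suggester(training, menu_size=3)
actual, expected = count_keystrokes(["i", "saw", "a", "bear"], suggest)
print(actual, expected, round(keystroke_savings(actual, expected), 3))
```

The message here is drawn from the training words, so the savings are optimistic; a fair evaluation would use held-out text. The "keystrokes needed in %" figure reported later in the talk corresponds to 100 × actual / expected under this kind of accounting.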
Work in non-English Languages
• Languages with rich morphology:
  – n-gram-based methods offer quite reasonable prediction [Trost et al. 2005] but can be improved with more sophisticated syntactic/semantic tools
• Suggestions for inflected languages (e.g. Basque)
  – Use two lexicons: stems and suffixes
  – Add syntactic information to dictionaries and grammatical rules to the system, offer stems and suffixes
  – Combine these two approaches: offer inflected nouns.
Hebrew Word Prediction
Motivation for Hebrew
• We need word prediction for Hebrew
  – No known previous published research for Hebrew.
• We wanted to test our morphological analyzer in a useful application.
Hebrew
Initial Hypothesis
Word prediction in Hebrew will be complicated: morphological and syntactic knowledge will be needed.
Hebrew Ambiguity
• Unvocalized writing: most vowels are “dropped”
  inherent → inhrnt
• Affixation: prepositions and possessives are attached to nouns
  in her note → inhrnt
  in her net → inhrnt
• Rich Morphology
  – ‘inhrnt’ could be inflected into different forms according to sing/pl, masc/fem properties: inhrnti, inhrntit, inhrntiot
  – Other morphological properties may leave ‘inherent’ unmodified (construct/absolute forms for noun compounding).
Ambiguity Level
• These variations create a high level of ambiguity:
  – English lexicon: inherent → inherent.adj
  – With Hebrew word formation rules: inhrnt →
      in.prep her.pro.fem.poss note.noun
      in.prep her.pro.fem net.noun
      inherent.adj.masc.absolute
      inherent.adj.masc.construct
• Parts of speech tagset:
  – Hebrew: theoretically ~300K, in practice ~3.6K distinct forms
  – English: 45-195 tags
• Number of possible morphological analyses per word:
  – English: 1.4 (average # words / sentence: 12)
  – Hebrew: 2.7 (average # words / sentence: 18)
(Real Hebrew) Morphological Ambiguity
• bzlm בצלם
  – bzelem (name of an association) בצלם
  – b-zalem (while taking a picture) בצלם
  – bzalam (their onion) בצלם
  – b-zila-m (under their shades) בצלם
  – b-zalam (in a photographer) בצלם
  – ba-zalam (in the photographer) בצלם
  – b-zelem (in an idol) בצלם
  – ba-zelem (in the idol) בצלם
Hebrew Morphology
Morphological Analysis
Given a written form, recover the following information:
• Lexical category (part-of-speech)
  – noun, verb, adjective, adverb, preposition…
• Inflectional properties
  – gender, number, person, tense, status…
• Affixes
  – Prefixes: מ ש ה ו כ ל ב (prepositions, conjunctions, definiteness)
  – Pronoun suffix: accusative, possessive, nominative
Morphological Analysis
Example: given the form בצלם, propose the following analyses:
• בצלם
  – proper-noun בצלם
• בצלם
  – verb, infinitive בצלם
• בצלם
  – noun, singular, masculine בצל-ם
• בצלם
  – noun, singular, masculine ב-צל-ם
• בצלם בצלם
  – noun, singular, masculine, absolute ב-צלם
  – noun, singular, masculine, construct ב-צלם
• בצלם בצלם
  – noun, definite, singular, masculine ב-צלם
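How one written form yields several segmentations can be sketched with a toy analyzer. This is only an illustration under strong assumptions: the prefix list, suffix list, and four-entry lexicon below are hypothetical stand-ins, and the sketch ignores vocalization, the verb reading, definiteness, and the construct/absolute distinction that a real Hebrew analyzer must handle.

```python
# Toy resources (assumptions for illustration, not the authors' lexicon).
PREFIXES = ["", "ב"]   # "" = no prefix; ב = the preposition "in"
SUFFIXES = ["", "ם"]   # "" = no suffix; ם = a possessive pronoun suffix ("their")
LEXICON = {
    "בצלם": "proper-noun",
    "בצל": "noun",
    "צלם": "noun",
    "צל": "noun",
}

def analyses(form):
    """Enumerate every prefix + stem + suffix split whose stem is in the lexicon."""
    results = []
    for prefix in PREFIXES:
        if not form.startswith(prefix):
            continue
        rest = form[len(prefix):]
        for suffix in SUFFIXES:
            if suffix and not rest.endswith(suffix):
                continue
            stem = rest[:len(rest) - len(suffix)]
            if stem in LEXICON:
                results.append((prefix, stem, suffix, LEXICON[stem]))
    return results

for prefix, stem, suffix, category in analyses("בצלם"):
    print(repr(prefix), repr(stem), repr(suffix), category)
```

Even this reduced setup returns four segmentations for the single form; choosing among them in context is the disambiguation task.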
Morphological Disambiguation
A difficult task in Hebrew:
Given a written form, select in context the correct morphological analysis out of all possible analyses.
We have developed a successful* system to perform morphological disambiguation in Hebrew [Adler et al, ACL06, ACL07, ACL08].
*(93% for POS tagging and 90% for full morphology analysis, which was used in this test)
Word Prediction in Hebrew
• We looked at Word Prediction as a sample task to show off the quality of our Morphological Disambiguator
• But first… we checked a simple baseline
Baseline: n-gram methods
• Check n-gram methods (unigram, bigram, trigram)
• Four sizes of selection menus: 1, 5, 7 and 9
• Various training sets of 1M, 10M and 27M words to learn the probabilities of n-grams.
• Various genres.
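An n-gram baseline with a selection menu can be sketched as follows. The slides do not say how the three model orders were combined or smoothed, so the scheme here is an assumption: fill the menu from trigram continuations first, then bigram, then unigram. The toy corpus and the names `train` and `menu` are hypothetical.

```python
from collections import Counter, defaultdict

def train(tokens):
    """Collect unigram, bigram and trigram counts from a token list."""
    uni = Counter(tokens)
    bi = defaultdict(Counter)
    tri = defaultdict(Counter)
    for a, b in zip(tokens, tokens[1:]):
        bi[a][b] += 1
    for a, b, c in zip(tokens, tokens[1:], tokens[2:]):
        tri[(a, b)][c] += 1
    return uni, bi, tri

def menu(history, model, prefix="", size=5):
    """Propose up to `size` next words, preferring the longest matching context."""
    uni, bi, tri = model
    sources = []
    if len(history) >= 2:
        sources.append(tri.get(tuple(history[-2:]), Counter()))
    if history:
        sources.append(bi.get(history[-1], Counter()))
    sources.append(uni)
    picked = []
    for counts in sources:  # most specific context first
        ranked = sorted(counts.items(), key=lambda wc: (-wc[1], wc[0]))
        for word, _ in ranked:
            if word.startswith(prefix) and word not in picked:
                picked.append(word)
                if len(picked) == size:
                    return picked
    return picked

tokens = "i saw a bird in the zoo . i saw a bear in the zoo . i saw the zoo".split()
model = train(tokens)
print(menu(["i", "saw"], model, size=1))
print(menu(["in", "the"], model, prefix="z", size=3))
```

The `size` parameter plays the role of the menu sizes 1, 5, 7 and 9 compared in the experiments.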
Prediction results using n-grams only
[Chart not preserved: keystrokes needed to enter a message, in % (smaller is better)]
For the trigram model trained on the 27M-word corpus: very good results!
Adding Syntactic Information
P(wn | w1,…,wn-1) = λ1 P(wn-i,…,wn | LM) + λ2 P(w1,…,wn | μ)
– μ is the morpho-syntactic HMM (morphological disambiguator)
– Combine P(w1,…,wn|μ) with the probabilistic language model LM in order to rank each word candidate given previous typed words.
– If the user typed "I saw", and the next word candidates are {him, hammer}, we use the HMM model to calculate p(I saw him|μ) and p(I saw hammer|μ), in order to tune the probability given by the n-gram.
* Trained on a 1M sized corpus.
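The weighted combination above can be sketched in a few lines. All numbers here are made up for illustration: the slides give neither the values of λ1 and λ2 nor any probabilities, and the function names are hypothetical.

```python
def interpolate(p_lm, p_mu, lambda_lm, lambda_mu):
    """Weighted sum of the n-gram score and the morpho-syntactic HMM score."""
    return lambda_lm * p_lm + lambda_mu * p_mu

def rank(candidates, lambda_lm=0.7, lambda_mu=0.3):
    """Order candidate words by interpolated score, best first.

    `candidates` maps each word to a pair:
    (probability under the n-gram model LM, probability of the sentence under μ).
    """
    scored = {
        word: interpolate(p_lm, p_mu, lambda_lm, lambda_mu)
        for word, (p_lm, p_mu) in candidates.items()
    }
    return sorted(scored, key=lambda word: -scored[word])

# Hypothetical scores for the "I saw ___" example.
candidates = {"him": (0.20, 0.010), "hammer": (0.25, 0.001)}
print(rank(candidates))                                # n-gram score dominates
print(rank(candidates, lambda_lm=0.1, lambda_mu=0.9))  # HMM score dominates
```

With these invented numbers the ranking flips when the weights change, which shows why the choice of λ1 and λ2 matters for whether the HMM term helps or hurts.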
Results with morpho-syntactic knowledge
Model sequences of parts of speech with morphological features
Results w/o syntactic knowledge
Some Notes on Results
• n-grams perform very well (high level of keystroke saving)
• High rate for all genres
• And the expected:
  – Better prediction when trained on more data
  – Better prediction with tri-grams
  – Better prediction with larger window
• Morpho-syntactic information did not improve results (in fact, it hurt!)
Results
Conclusion
• Statistical data on a language with rich morphology yields good results (keystrokes needed, in %; smaller is better)
  – down to 29% with nine word proposals
  – 34% for seven proposals
  – 54% for a single proposal
• Syntactic information did not improve the prediction.
• Explanation: morphology did not help because p(w1,…,wn|μ) is computed on an unfinished sentence
Hebrew Word Prediction - Conclusions
תודה
Thank you
Technical Information
• CMU – N-grams
• Storage – Berkeley DB to store knowledge for WP: Mapping n-grams
• More questions on technology – [email protected]