Post on 03-Jan-2016
An Investigation of Statistical Machine Translation (Spanish to
English)
Raghav Bashyal
SMT
Statistical Machine Translation
Possible only through computers
Global audience
Use of statistical techniques to produce natural translations
Kevin Knight's Book
SMT has two parts
The second part, N-grams, is simple
The first part, the alignment portion, is difficult
After many long projects, I made my own algorithm
Before that, an introduction to the characters
NLTK – simplifying input of corpora
Corpora – hold text
N-Grams – the frequency of a phrase
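To make "the frequency of a phrase" concrete, here is a minimal sketch that counts bigrams (two-word phrases) in a small corpus. It uses only the Python standard library rather than NLTK, and the names `bigram_counts` and `toy_corpus` are made up for illustration; they are not from the project.

```python
from collections import Counter

def bigram_counts(sentences):
    """Count how often each pair of adjacent words occurs in the sentences."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        # zip(words, words[1:]) pairs every word with the word that follows it
        counts.update(zip(words, words[1:]))
    return counts

# Hypothetical toy corpus, invented for this example
toy_corpus = [
    "the monkey eats",
    "the monkey sleeps",
    "a monkey eats",
]

counts = bigram_counts(toy_corpus)
print(counts[("the", "monkey")])  # 2
print(counts[("monkey", "the")])  # 0, this word order never occurs
```

A word order that reads naturally in English tends to have a high count, and an unnatural one a low or zero count. That is what lets N-grams rank candidate translations.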
Algorithm
1. Match
   a. Take small Spanish input
   b. Look through the corpus to find instances of the input
   c. Collect the Spanish sentences in which this input was found, as well as the English translation right below each sentence
   d. Compare the English sentences to discover similar words
   e. Find the most common similar words and find permutations of them
2. Check
   a. Gather bi-gram values for each permutation using the bigram calculator
   b. Calculate the probabilities for each permutation with Knight’s formula
   c. Return the most probable permutation as the most likely simple translation
Development
Simple – goal was to translate
Corpora – functional “cosas” and “monkey”
Results
It works! “el mono” = “the monkey”
Deeper understanding of SMT’s power (Google Translate)
Expand, elaborate upon algorithm