An Investigation of Statistical Machine Translation (Spanish to English) Raghav Bashyal.




SMT

Statistical Machine Translation
Possible only through computers
Global audience
Use of statistical techniques to produce natural translations


Kevin Knight's Book

SMT has two parts
The second part, N-grams, is simple
The first part, the alignment portion, is difficult
After many long projects, I made my own algorithm
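For reference, the two parts can be written as one expression. This assumes the split is the usual noisy-channel one from Kevin Knight's SMT tutorial workbook; the symbols e (a candidate English sentence) and f (the Spanish input) are conventional notation, not taken from the slides.

```latex
\hat{e} = \arg\max_{e} \; P(e) \, P(f \mid e)
```

Under that assumption, P(e) is the language model estimated from N-grams (the second part), and P(f | e) is the translation model that depends on word alignment (the first part).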


Before that, an introduction to the characters

NLTK – simplifies input of corpora
Corpora – hold text
N-grams – sequences of N consecutive words; counting them gives the frequency of a phrase
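To make the N-gram idea concrete, here is a minimal sketch of counting bigrams (N = 2). NLTK provides helpers for this, such as `nltk.bigrams`, but the sketch uses only the standard library so it runs without a corpus download. The function name `bigram_counts` and the toy sentences are made up for illustration.

```python
from collections import Counter

def bigram_counts(sentences):
    """Count adjacent word pairs (bigrams) across a list of sentences."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        # zip(words, words[1:]) pairs each word with the word that follows it
        counts.update(zip(words, words[1:]))
    return counts

# Toy corpus, invented for illustration only.
toy_corpus = ["the monkey eats", "the monkey sleeps", "a monkey eats"]
counts = bigram_counts(toy_corpus)

print(counts[("the", "monkey")])     # how often "the monkey" occurs
print(counts[("monkey", "sleeps")])  # how often "monkey sleeps" occurs
```

A pair that never occurs simply has a count of zero, because `Counter` returns 0 for missing keys.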


Algorithm

1. Match
   a. Take a small Spanish input
   b. Look through the corpus to find instances of the input
   c. Collect the Spanish sentences in which this input was found, as well as the English translation right below each sentence
   d. Compare the English sentences to discover similar words
   e. Find the most common similar words and find permutations of them
2. Check
   a. Gather bigram values for each permutation using the bigram calculator
   b. Calculate the probabilities for each permutation with Knight’s formula
   c. Return the most probable permutation as the most likely simple translation
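The Match and Check steps above can be sketched roughly as follows. This is not the author's code: the toy sentence pairs, the function names, and the choice to keep as many shared words as the input has words are all assumptions. The slides do not spell out "Knight's formula", so the sketch stands in the usual bigram estimate count(w1 w2) / count(w1), with add-one smoothing added so unseen pairs do not score zero.

```python
from collections import Counter
from itertools import permutations

# Hypothetical toy parallel corpus: (Spanish, English) sentence pairs.
PAIRS = [
    ("el mono come", "the monkey eats"),
    ("veo el mono", "i see the monkey"),
    ("el mono duerme", "the monkey sleeps"),
    ("la casa es grande", "the house is big"),
]

def match(phrase, pairs, top_n):
    """Step 1: find sentences containing the phrase and return the
    top_n English words that appear in the most matching sentences."""
    # Simple substring test; a real system would match on word boundaries.
    hits = [en for es, en in pairs if phrase in es]
    shared = Counter()
    for en in hits:
        shared.update(set(en.split()))  # count each word once per sentence
    return [word for word, _ in shared.most_common(top_n)]

def bigram_score(words, unigrams, bigrams):
    """Step 2: product of smoothed bigram probabilities over the sequence."""
    vocab = len(unigrams)
    score = 1.0
    for w1, w2 in zip(words, words[1:]):
        # Add-one smoothing is an assumption, not something the slides state.
        score *= (bigrams[(w1, w2)] + 1) / (unigrams[w1] + vocab)
    return score

def translate(phrase, pairs):
    """Run Match, then Check, and return the most probable word order."""
    unigrams, bigrams = Counter(), Counter()
    for _, en in pairs:
        tokens = en.split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    # Assumption: the output has as many words as the Spanish input.
    candidates = match(phrase, pairs, top_n=len(phrase.split()))
    best = max(permutations(candidates),
               key=lambda p: bigram_score(p, unigrams, bigrams))
    return " ".join(best)

print(translate("el mono", PAIRS))
```

On this toy corpus the shared words are "the" and "monkey", and the bigram check prefers the order "the monkey" because that pair occurs in the English sentences while "monkey the" never does.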


Development

Simple – goal was to translate
Corpora – functional “cosas” and “monkey”


Results

It works! “el mono” = “the monkey”
Deeper understanding of SMT’s power (Google Translate)
Expand and elaborate upon the algorithm