MT and Translator's Tools

Machine Translation and Translation Technology. Jimmy O’Regan, The Apertium Project. OSS Bar Camp, 19 September 2009. Outline: Tools for Translators, Free Language Data, Machine Translation.

Transcript of MT and Translator's Tools

Page 1: MT and Translator's Tools

Outline: Tools for Translators | Free Language Data | Machine Translation

Machine Translation and Translation Technology

Jimmy O’Regan

The Apertium Project

OSS Bar Camp, 19 September 2009

Jimmy O’Regan Machine Translation and Translation Technology

Page 2: MT and Translator's Tools

1 Free Language Data

Page 3: MT and Translator's Tools

Terminology | Localisation vs. Translation

Tools for Translators

Page 4: MT and Translator's Tools

Some Terminology

Internationalisation
Giving software the capability to display text in another language. In Open Source, this generally means adding support for gettext.

Localisation
Customising the messages displayed to the user to appear in the manner most appropriate for them: in their language, or their dialect.

Translation
Converting text from one language to another.

Page 11: MT and Translator's Tools

Localisation vs. Translation

Localisation and translation are sometimes, but not always, the same.

Documents may need to be localised, but not translated. A British company with an Irish office still needs to localise their documents: any reference to “our London office” will need to be changed to “our Dublin office”.

Page 14: MT and Translator's Tools

Localisation vs. Translation

Localised translations can also have additional requirements:

gettext allows numbers to be specially treated: “%d file(s)” ugliness is not necessary.
English and Spanish need two forms of words for number: singular and plural.
Polish needs three: singular, plural (2 to 4), and quantity (5 and above; the plural returns for numbers ending in 2 to 4, apart from 12 to 14).
Slovenian needs four: singular, dual, plural, and quantity.
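The number rules above can be sketched by hand as follows. gettext itself picks the form from the catalogue's Plural-Forms header (through ngettext); this sketch only spells out the usual rules for English, Polish and Slovenian. The function names and the tiny `FORMS` table are illustrative assumptions.

```python
def plural_index_en(n):
    # Two forms: singular for exactly 1, plural otherwise.
    return 0 if n == 1 else 1


def plural_index_pl(n):
    # Three forms: 1; numbers ending in 2-4 (except 12-14); everything else.
    if n == 1:
        return 0
    if 2 <= n % 10 <= 4 and not 10 <= n % 100 < 20:
        return 1
    return 2


def plural_index_sl(n):
    # Four forms: singular, dual, plural (3-4), and the rest, decided by
    # the last two digits.
    r = n % 100
    if r == 1:
        return 0
    if r == 2:
        return 1
    if r in (3, 4):
        return 2
    return 3


# Hypothetical message table: one list of forms per language.
FORMS = {
    "en": (plural_index_en, ["%d file", "%d files"]),
    "pl": (plural_index_pl, ["%d plik", "%d pliki", "%d plików"]),
}


def files_message(lang, n):
    index_for, forms = FORMS[lang]
    return forms[index_for(n)] % n


for n in (1, 2, 5, 22):
    print(files_message("en", n), "|", files_message("pl", n))
```

The point is that the program never builds "file(s)" itself: it hands over the number, and the language's own rule picks the form.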

Page 22: MT and Translator's Tools

Localisation

Software localisation is a huge business area for proprietary software.
One that traditionally lags behind Open Source.
That advantage is usually due to the efforts of a handful of dedicated volunteers for the majority of languages.
But they’re catching up: Facebook is using Open Source-like efforts for their translations.

Page 26: MT and Translator's Tools

Localisation Tools

Unsurprisingly then, localisation is very well supported by Open Source software:

Pootle (http://translate.sourceforge.net/wiki/pootle/index - Web-based)
Virtaal (http://translate.sourceforge.net/wiki/virtaal/index - cross platform)
poEdit (http://www.poedit.net/ - cross platform)
Lokalize (http://userbase.kde.org/Lokalize - KDE)
GTranslator (http://gtranslator.sourceforge.net/ - GNOME)

Page 27: MT and Translator's Tools

Translation Tools

Unfortunately, there’s only one real equivalent tool for general translation: OmegaT

Page 28: MT and Translator's Tools

Common Features

All of these tools include these features:

Translation Memory
Automatically reuse previous translations

Fuzzy matching
Suggest translations similar to previously translated sentences

Terminology Management
Give suggestions from a per-project dictionary
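To make the first two features concrete, here is a small sketch of a translation memory with fuzzy matching. It uses difflib from the Python standard library as a stand-in similarity measure; the tools named above have their own matching algorithms, and the sample memory, the `lookup` function and the 0.75 threshold are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# A toy English-to-Spanish translation memory: source segment -> translation.
MEMORY = {
    "Open the file": "Abrir el archivo",
    "Save the file": "Guardar el archivo",
    "Close the window": "Cerrar la ventana",
}


def lookup(source, memory=MEMORY, threshold=0.75):
    """Return (score, known_source, translation) tuples, best match first."""
    # Exact hit: reuse the previous translation as-is.
    if source in memory:
        return [(1.0, source, memory[source])]
    # Otherwise suggest translations of similar, previously seen segments.
    matches = []
    for known, target in memory.items():
        score = SequenceMatcher(None, source.lower(), known.lower()).ratio()
        if score >= threshold:
            matches.append((score, known, target))
    return sorted(matches, reverse=True)


for score, known, target in lookup("Open the files"):
    print(f"{score:.2f}  {known!r} -> {target!r}")
```

A fuzzy match is only a suggestion: the translator still has to adapt "Abrir el archivo" to the plural, which is exactly why the score is shown alongside it.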

Page 35: MT and Translator's Tools

Translate Toolkit

http://translate.sourceforge.net/

A set of common tools for translation/localisation:

Translation Memory server
Format conversion
Terminology management
Quality control tools

All brought to you by the wonderful people of translate.org.za

Page 36: MT and Translator's Tools

Free Language Software Needs Free Data

Just like “Free Software Needs Free Documentation”, so too does Free Language Software need Free Data.
Usually, this means we have to make it ourselves.
Unfortunately, the community of developers of free language software, and thus free language data, is quite small.

Page 39: MT and Translator's Tools

The importance of spell checkers

Spell checking data packages are the absolute bare minimum of technological support for a language.
Usually, the people who develop them tend to be involved in other areas of Free language software:

Page 40: MT and Translator's Tools

The importance of spell checkers

Kevin Scannell
Makes the Irish spell checking data. And An Gramadóir, an Irish language grammar checker. And created a WordNet/thesaurus for Irish. And contributed the language data for Apertium’s Irish–Scots Gaelic translator. Etc.

Marcin Miłkowski
Heavily involved in the Polish spell checking data. And LanguageTool, a multilingual grammar checker that’s integrated with Open Office. And maintains the open Polish thesaurus.

Page 49: MT and Translator's Tools

The importance of spell checkers

translate.org.za
Make the spelling checkers for several South African languages, as well as many of the tools for translators already mentioned: Virtaal, Translate Toolkit, Pootle. Much of Apertium’s English–Afrikaans translator was made directly by translate.org.za developers, as well as Apertium’s dbus interface, and a GUI. (Virtaal allows translators to use machine translations as a basis for their work.)

Page 50: MT and Translator's Tools

The importance of spell checkers

Spell checkers are set to become even more important. Hunspell, which is fast becoming the standard spell checker in Open Source projects, now includes morphological analysis and generation. This will greatly improve, among other things, terminology management in translator’s tools.

At the moment, if you have “dog” in your terminology list, the translation tool will see that and only that: “dogs” will go unrecognised. With morphological analysis, the tool can know that “dogs” is not only related to “dog”, but is the plural of a noun: another assistance to the translator.
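The “dog”/“dogs” case can be sketched as below. This does not call Hunspell: the `ANALYSES` table is a hand-written stand-in for whatever a morphological analyser would return, and every name in the block is hypothetical.

```python
# Per-project terminology list: source term (lemma) -> preferred translation.
TERMS = {"dog": "perro"}

# Hand-written stand-in for analyser output: surface form -> (lemma, tags).
ANALYSES = {
    "dog": ("dog", "noun, singular"),
    "dogs": ("dog", "noun, plural"),
}


def exact_lookup(word, terms=TERMS):
    # What a tool does without morphology: match the surface form only.
    return terms.get(word.lower())


def lemma_lookup(word, terms=TERMS, analyses=ANALYSES):
    # With morphology: reduce the word to its lemma first, and keep the tags
    # so the translator can be told which form is needed.
    lemma, tags = analyses.get(word.lower(), (word.lower(), "unknown"))
    target = terms.get(lemma)
    if target is None:
        return None
    return (lemma, tags, target)


print(exact_lookup("dogs"))
print(lemma_lookup("dogs"))
```

The first call finds nothing, while the second recovers the term and reports that a plural noun is wanted; with morphological generation, a tool could go one step further and propose the plural of the target term.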

Page 51: MT and Translator's Tools

Is Machine Translation a Translator’s Tool? | N-Grams | Moses | Apertium

Machine Translation

Machine translation has a bad reputation.

Page 52: MT and Translator's Tools

Mechanical Translation

Mechanical translation, of any form, does tend, inevitably, to have mistakes.
For centuries, paintings of Moses portrayed him as having horns.
A translator of the Latin Vulgate added the wrong vowel: he thought that Moses had horns, not that his face was glowing.
And people were killed for wishing to correct that, and other mistakes.
Proofread translations, always.

Page 57: MT and Translator's Tools

Types of Machine Translation

Dictionary lookup – the most basic form of MT
Translation Memory – also a basic form of MT
Example Based Machine Translation – considered the most accurate form of MT, but there are few if any examples “in the wild”.
Statistical Machine Translation – currently the darling of research and the basis of Google Translate. Solves a lot of old problems, but introduces new ones. And breaks a lot of things that “used to work”.
Rule Based Machine Translation – the oldest kind of MT, dating back to the 1950s. The kind I work with, so obviously it’s the best!!!
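Dictionary lookup, the first type in the list, is simple enough to sketch in a few lines. The toy English-to-Spanish dictionary and the choice to mark unknown words with an asterisk are assumptions made for this example.

```python
# Toy bilingual dictionary; a real one would be far larger.
DICTIONARY = {"the": "el", "dog": "perro", "eats": "come", "bread": "pan"}


def translate(sentence, dictionary=DICTIONARY):
    # Word-for-word replacement: no reordering, no agreement, no inflection.
    words = sentence.lower().split()
    return " ".join(dictionary.get(word, "*" + word) for word in words)


print(translate("The dog eats bread"))
print(translate("The dogs eat bread"))
```

The second sentence shows the limits at once: “dogs” and “eat” are not in the dictionary, and even if they were, nothing here would make the article agree with the noun. The other approaches in the list attack those gaps in different ways.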

Page 61: MT and Translator's Tools

OutlineTools for TranslatorsFree Language DataMachine Translation

Is Machine Translation a Translator’s Tool?N-GramsMosesApertium

Types of Machine Translation

Dictionary lookup – the most basic form of MT

Translation Memory – also, a basic form of MT

Example Based Machine Translation

– considered the mostaccurate form of MT, but there are few if any examples “inthe wild”.

Statistical Machine Translation – currently the darling ofresearch and the basis of Google Translate. Solves a lot of oldproblems, but introduces new one. And breaks a lot of thingsthat “used to work”.

Rule Based Machine Translation – The oldest kind of MT,dating back to the 1950s. – The kind I work with, soobviously it’s the best!!!

Jimmy O’Regan Machine Translation and Translation Technology

Page 62: MT and Translator's Tools

OutlineTools for TranslatorsFree Language DataMachine Translation

Is Machine Translation a Translator’s Tool?N-GramsMosesApertium

Types of Machine Translation

Dictionary lookup – the most basic form of MT

Translation Memory – also, a basic form of MT

Example Based Machine Translation – considered the mostaccurate form of MT, but there are few if any examples “inthe wild”.

Statistical Machine Translation – currently the darling ofresearch and the basis of Google Translate. Solves a lot of oldproblems, but introduces new one. And breaks a lot of thingsthat “used to work”.

Rule Based Machine Translation – The oldest kind of MT,dating back to the 1950s. – The kind I work with, soobviously it’s the best!!!

Jimmy O’Regan Machine Translation and Translation Technology

Page 63: MT and Translator's Tools

OutlineTools for TranslatorsFree Language DataMachine Translation

Is Machine Translation a Translator’s Tool?N-GramsMosesApertium

Types of Machine Translation

Dictionary lookup – the most basic form of MT

Translation Memory – also, a basic form of MT

Example Based Machine Translation – considered the mostaccurate form of MT, but there are few if any examples “inthe wild”.

Statistical Machine Translation

– currently the darling ofresearch and the basis of Google Translate. Solves a lot of oldproblems, but introduces new one. And breaks a lot of thingsthat “used to work”.

Rule Based Machine Translation – The oldest kind of MT,dating back to the 1950s. – The kind I work with, soobviously it’s the best!!!

Jimmy O’Regan Machine Translation and Translation Technology

Page 64: MT and Translator's Tools

OutlineTools for TranslatorsFree Language DataMachine Translation

Is Machine Translation a Translator’s Tool?N-GramsMosesApertium

Types of Machine Translation

Dictionary lookup – the most basic form of MT

Translation Memory – also, a basic form of MT

Example Based Machine Translation – considered the mostaccurate form of MT, but there are few if any examples “inthe wild”.

Statistical Machine Translation – currently the darling ofresearch and the basis of Google Translate.

Solves a lot of oldproblems, but introduces new one. And breaks a lot of thingsthat “used to work”.

Rule Based Machine Translation – The oldest kind of MT,dating back to the 1950s. – The kind I work with, soobviously it’s the best!!!

Jimmy O’Regan Machine Translation and Translation Technology

Page 65: MT and Translator's Tools

OutlineTools for TranslatorsFree Language DataMachine Translation

Is Machine Translation a Translator’s Tool?N-GramsMosesApertium

Types of Machine Translation

Dictionary lookup – the most basic form of MT

Translation Memory – also, a basic form of MT

Example Based Machine Translation – considered the mostaccurate form of MT, but there are few if any examples “inthe wild”.

Statistical Machine Translation – currently the darling ofresearch and the basis of Google Translate. Solves a lot of oldproblems, but introduces new one.

And breaks a lot of thingsthat “used to work”.

Rule Based Machine Translation – The oldest kind of MT,dating back to the 1950s. – The kind I work with, soobviously it’s the best!!!

Jimmy O’Regan Machine Translation and Translation Technology

Page 66: MT and Translator's Tools

OutlineTools for TranslatorsFree Language DataMachine Translation

Is Machine Translation a Translator’s Tool?N-GramsMosesApertium

Types of Machine Translation

Dictionary lookup – the most basic form of MT

Translation Memory – also, a basic form of MT

Example Based Machine Translation – considered the mostaccurate form of MT, but there are few if any examples “inthe wild”.

Statistical Machine Translation – currently the darling ofresearch and the basis of Google Translate. Solves a lot of oldproblems, but introduces new one. And breaks a lot of thingsthat “used to work”.

Rule Based Machine Translation – The oldest kind of MT,dating back to the 1950s. – The kind I work with, soobviously it’s the best!!!

Jimmy O’Regan Machine Translation and Translation Technology

Page 67: MT and Translator's Tools

OutlineTools for TranslatorsFree Language DataMachine Translation

Is Machine Translation a Translator’s Tool?N-GramsMosesApertium

Types of Machine Translation

Dictionary lookup – the most basic form of MT

Translation Memory – also, a basic form of MT

Example Based Machine Translation – considered the mostaccurate form of MT, but there are few if any examples “inthe wild”.

Statistical Machine Translation – currently the darling ofresearch and the basis of Google Translate. Solves a lot of oldproblems, but introduces new one. And breaks a lot of thingsthat “used to work”.

Rule Based Machine Translation

– The oldest kind of MT,dating back to the 1950s. – The kind I work with, soobviously it’s the best!!!

Jimmy O’Regan Machine Translation and Translation Technology

Page 68: MT and Translator's Tools

OutlineTools for TranslatorsFree Language DataMachine Translation

Is Machine Translation a Translator’s Tool?N-GramsMosesApertium

Types of Machine Translation

Dictionary lookup – the most basic form of MT

Translation Memory – also, a basic form of MT

Example Based Machine Translation – considered the mostaccurate form of MT, but there are few if any examples “inthe wild”.

Statistical Machine Translation – currently the darling ofresearch and the basis of Google Translate. Solves a lot of oldproblems, but introduces new one. And breaks a lot of thingsthat “used to work”.

Rule Based Machine Translation – The oldest kind of MT,dating back to the 1950s.

– The kind I work with, soobviously it’s the best!!!

Jimmy O’Regan Machine Translation and Translation Technology

Page 69: MT and Translator's Tools

OutlineTools for TranslatorsFree Language DataMachine Translation

Is Machine Translation a Translator’s Tool?N-GramsMosesApertium

Types of Machine Translation

Dictionary lookup – the most basic form of MT

Translation Memory – also, a basic form of MT

Example Based Machine Translation – considered the mostaccurate form of MT, but there are few if any examples “inthe wild”.

Statistical Machine Translation – currently the darling ofresearch and the basis of Google Translate. Solves a lot of oldproblems, but introduces new one. And breaks a lot of thingsthat “used to work”.

Rule Based Machine Translation – The oldest kind of MT,dating back to the 1950s. – The kind I work with, soobviously it’s the best

!!!

Jimmy O’Regan Machine Translation and Translation Technology

Page 70: MT and Translator's Tools

OutlineTools for TranslatorsFree Language DataMachine Translation

Is Machine Translation a Translator’s Tool?N-GramsMosesApertium

Types of Machine Translation

Dictionary lookup – the most basic form of MT

Translation Memory – also, a basic form of MT

Example Based Machine Translation – considered the mostaccurate form of MT, but there are few if any examples “inthe wild”.

Statistical Machine Translation – currently the darling ofresearch and the basis of Google Translate. Solves a lot of oldproblems, but introduces new one. And breaks a lot of thingsthat “used to work”.

Rule Based Machine Translation – The oldest kind of MT,dating back to the 1950s. – The kind I work with, soobviously it’s the best!!!

Jimmy O’Regan Machine Translation and Translation Technology

Page 71: MT and Translator's Tools

OutlineTools for TranslatorsFree Language DataMachine Translation

Is Machine Translation a Translator’s Tool?N-GramsMosesApertium

Is Machine Translation a Translator’s Tool?

Yes.

That might be hard to accept, particularly if you only speak English. But for closely-related, similar languages, machine translation can be as effective and accurate as a spelling checker.

Page 75: MT and Translator's Tools

Uses of Machine Translation

1. Assimilation: understanding a text.

2. Dissemination: preparing a text for translation. That is, preparing a rough draft for a translator, who then edits the text.

Page 81: MT and Translator's Tools

“Phrase-Based” SMT

Most current research (and commercial use) of Statistical MT uses “phrase-based” SMT.

The problem is, it’s not phrase-based.
It’s N-Gram based.

Page 84: MT and Translator's Tools

N-Grams

An n-gram is a sequence of n consecutive tokens.

For text, these are usually (not always!) words...
...and punctuation is counted as a “word”.

Page 87: MT and Translator's Tools

Unigrams

“This is John’s dog.”

Example

This
is
John
’s
dog
.

Page 89: MT and Translator's Tools

Bigrams

“This is John’s dog.”

Example

This is
is John
John ’s
’s dog
dog .

Page 91: MT and Translator's Tools

Trigrams

“This is John’s dog.”

Example

This is John
is John ’s
John ’s dog
’s dog .
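
The unigram, bigram and trigram lists for “This is John’s dog.” can be generated mechanically. A minimal sketch, assuming a simple regular-expression tokeniser that splits off the possessive and the punctuation (the names tokenize and ngrams are made up for this example, not taken from any toolkit):

```python
import re


def tokenize(text):
    """Split text into words, a detached possessive 's, and punctuation marks."""
    # Accept both the straight and the typographic apostrophe.
    return re.findall(r"\w+|['’]s\b|[^\w\s]", text)


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = tokenize("This is John's dog.")
for n in (1, 2, 3):
    print(n, [" ".join(gram) for gram in ngrams(tokens, n)])
```

Each n-gram starts one token after the previous one, so a sentence of six tokens gives six unigrams, five bigrams and four trigrams.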

Page 93: MT and Translator's Tools

N-Gram Language Models

An n-gram language model is a collection of n-grams of every order from n down to 1: a trigram model includes trigrams, bigrams, and unigrams.
Each n-gram is listed along with its frequency (according to a particular corpus).
But, most importantly...
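
A minimal sketch of such a table of n-grams and frequencies. The function name build_counts and the two-sentence corpus are made up for illustration; real toolkits also store smoothed probabilities, not just raw counts.

```python
from collections import Counter


def build_counts(sentences, max_n=3):
    """Count every n-gram of order 1..max_n in a list of tokenised sentences."""
    counts = Counter()
    for tokens in sentences:
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


# Toy corpus, invented for this example.
corpus = [
    ["the", "dog", "barks"],
    ["the", "dog", "sleeps"],
]
model = build_counts(corpus)
for gram, freq in sorted(model.items(), key=lambda item: (len(item[0]), item[0])):
    print(" ".join(gram), freq)
```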

Page 98: MT and Translator's Tools

N-Gram Language Models

N-grams overlap.

Page 99: MT and Translator's Tools

N-Gram Language Models

When a sequence of words is queried against a language model, the language model software computes the combined likelihood of the n-grams of orders 1 to n in that sequence.
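
A rough sketch of such a query, restricted to bigrams for brevity. It assumes add-one smoothing, which is only one of many possible choices, and the function names and the toy corpus are invented for the example.

```python
import math
from collections import Counter


def train_bigram_counts(sentences):
    """Collect unigram and bigram counts, padding each sentence with boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams


def log_likelihood(tokens, unigrams, bigrams):
    """Sum the log-probabilities of each word given the word before it."""
    vocab_size = len(unigrams)
    padded = ["<s>"] + tokens + ["</s>"]
    total = 0.0
    for prev, word in zip(padded, padded[1:]):
        # Add-one smoothing so that unseen bigrams do not get probability zero.
        prob = (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
        total += math.log(prob)
    return total


# Toy corpus, invented for this example.
corpus = [["the", "dog", "barks"], ["the", "dog", "sleeps"], ["a", "dog", "barks"]]
uni, bi = train_bigram_counts(corpus)
print(log_likelihood(["the", "dog", "barks"], uni, bi))
print(log_likelihood(["dog", "the", "barks"], uni, bi))
```

The word order seen in the corpus gets the higher (less negative) score, and that comparison is what a speech recogniser or an MT system uses to choose between candidates.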

Page 100: MT and Translator's Tools

N-Gram Language Models

Possibly the first use of n-gram language models was in Automatic Speech Recognition (ASR).

Page 101: MT and Translator's Tools

N-Gram Language Models

For basic uses of ASR, such as call centres, a custom grammar is used.

In a mobile phone, such a grammar could look like this:

Example

CALLWORD : phone call dial
ZEROWORD : zero oh
NUMBER : one two three four five six seven eight nine ZEROWORD
NUMBERS : NUMBER* NUMBER
COMMAND : CALLWORD NUMBERS
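
To show how little such a grammar accepts, here is a small sketch that checks an utterance against it. It assumes the grammar is read as five rules (CALLWORD, ZEROWORD, NUMBER, NUMBERS, COMMAND), with NUMBERS meaning one or more NUMBER words; the function name is_command is made up.

```python
CALLWORD = {"phone", "call", "dial"}
ZEROWORD = {"zero", "oh"}
NUMBER = {"one", "two", "three", "four", "five",
          "six", "seven", "eight", "nine"} | ZEROWORD


def is_command(utterance):
    """Return True if the utterance matches COMMAND : CALLWORD NUMBERS."""
    words = utterance.lower().split()
    if len(words) < 2:  # need a call word and at least one number word
        return False
    return words[0] in CALLWORD and all(word in NUMBER for word in words[1:])


for text in ("call oh eight seven", "dial", "phone five pizza"):
    print(text, "->", is_command(text))
```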

Page 103: MT and Translator's Tools

N-Gram Language Models

However, for continuous dictation, building such grammars is an almost infinite task.

Instead of defining long, complicated grammars that define, for example, when the sound /mi:t/ represents “meet” and when it represents “meat”, n-gram language models allow the correct word to be chosen based on the context of the surrounding words.
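
A toy illustration of the “meet” / “meat” choice, assuming bigram counts taken from a four-sentence corpus invented for this example. The helper choose_word is hypothetical; a real recogniser combines such scores with acoustic scores.

```python
from collections import Counter

# Tiny made-up corpus; a real model would be trained on millions of sentences.
corpus = [
    "nice to meet you",
    "pleased to meet you",
    "we eat meat and fish",
    "they eat meat and bread",
]

bigram_counts = Counter()
for sentence in corpus:
    words = sentence.split()
    bigram_counts.update(zip(words, words[1:]))


def choose_word(previous, candidates, following):
    """Pick the candidate seen most often between its two neighbours."""
    def score(word):
        return bigram_counts[(previous, word)] + bigram_counts[(word, following)]
    # On a tie, max() keeps the first candidate in the list.
    return max(candidates, key=score)


print(choose_word("to", ["meet", "meat"], "you"))
print(choose_word("eat", ["meet", "meat"], "and"))
```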

Page 105: MT and Translator's Tools

N-Gram Language Models

This was an obvious progression for ASR, which uses statistical modelling to choose in context which sound is most likely, based on the surrounding sounds.

Page 106: MT and Translator's Tools

N-Gram Language Models

Now, language models are being used in all areas of language technology.
The problem is: useful language models are huge, and can be computationally costly to use unless you have a data centre.
Google, for example, use language models for everything:

Spell checking (Search, Google Wave, GMail)

Search queries (“Did you mean?”)

Machine translation

Page 109: MT and Translator's Tools

N-Gram-Based SMT

Basic Statistical MT uses a probabilistic dictionary: each word pair has a probability assigned.

The interesting part is how they get those dictionaries.
A program, usually GIZA++ (Open Source), reads a pair of texts: one in the source language, and one in the target language.

Page 112: MT and Translator's Tools

N-Gram-Based SMT

For each word in each sentence of the source language, the probable translation is considered to be every word in the target sentence; as more words are seen, the translations are re-evaluated. The next time word 1 is used, perhaps “possible translation 1” is present, but “possible translation 2” is absent from the sentence: the probability of the former is increased; the latter, decreased.

And so on, over the course of the text, the probabilities of each word are re-evaluated; then the whole text is processed again, and again, until the probabilities settle at a reasonable level.

The result is a probabilistic dictionary.
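
The re-evaluation loop can be sketched in a few lines. This is a simplified version in the spirit of IBM Model 1, one of the word alignment models that GIZA++ implements; it is not GIZA++'s own code, and the function name and the three sentence pairs are made up for illustration.

```python
from collections import defaultdict


def train_word_probabilities(pairs, iterations=10):
    """Estimate P(target word | source word) from sentence pairs by repeated re-evaluation."""
    target_vocab = {tw for _, target in pairs for tw in target}
    # Start by treating every co-occurring target word as equally likely.
    prob = {}
    for source, target in pairs:
        for sw in source:
            for tw in target:
                prob[(sw, tw)] = 1.0 / len(target_vocab)

    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        for source, target in pairs:
            for tw in target:
                # Share one unit of evidence for tw among the source words
                # of this sentence, in proportion to the current estimates.
                norm = sum(prob[(sw, tw)] for sw in source)
                for sw in source:
                    share = prob[(sw, tw)] / norm
                    count[(sw, tw)] += share
                    total[sw] += share
        # Re-estimate the dictionary from the collected evidence.
        for (sw, tw), value in count.items():
            prob[(sw, tw)] = value / total[sw]
    return prob


# Toy parallel text, invented for this example.
pairs = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]
dictionary = train_word_probabilities(pairs)
for (sw, tw), p in sorted(dictionary.items()):
    print(f"{sw} -> {tw}: {p:.2f}")
```

Nothing tells the program that “das” means “the”; the pairing wins only because the two words keep turning up together while their competitors do not.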

Page 113: MT and Translator's Tools

N-Gram-Based SMT

N-grams come into the picture in two ways:

1. Evaluating multiple probable translations: similarly to Speech Recognition, each choice is evaluated against a language model.

2. N-grams as “words”: as well as considering individual words, each n-gram is considered as a possible “phrase”, and treated as an individual word. This helps to cut down on ambiguous terms: “basketball coach” vs. “coach driver”.
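
A sketch of the second point, treating n-grams as dictionary entries. The phrase table below is hand-written for illustration (English to Spanish) and carries no probabilities, and the greedy longest-match lookup is a simplification: a real decoder scores many competing segmentations against the language model.

```python
# Hypothetical toy phrase table: n-grams are looked up exactly like single words.
PHRASE_TABLE = {
    ("basketball", "coach"): "entrenador de baloncesto",
    ("coach", "driver"): "conductor de autocar",
    ("coach",): "entrenador",
    ("the",): "el",
}


def translate(tokens, table, max_n=2):
    """Translate left to right, always preferring the longest n-gram in the table."""
    output = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in table:
                output.append(table[key])
                i += n
                break
        else:
            # Unknown word: copy it through unchanged.
            output.append(tokens[i])
            i += 1
    return " ".join(output)


print(translate(["the", "basketball", "coach"], PHRASE_TABLE))
print(translate(["the", "coach", "driver"], PHRASE_TABLE))
```

Because the bigram entries are matched before the single word “coach”, the two senses never compete.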

Page 119: MT and Translator's Tools

Moses

Moses is an Open Source SMT system. Moses has distinct advantages over several other SMT systems:

1 It’s Open Source
Actively developed, and supported by a large community

2 Factored Models
Moses is able to make use of linguistic information.

3 Open Data
The Moses developers also recognise the importance of Free Linguistic Data, and have provided the EuroParl corpus so that others may build a statistical MT system using it. Also, the maintainers of the JRC Acquis, the corpus of EU legal text (and most of the data behind Google Translate’s support for most official EU languages), have prepared their corpus for use with Moses.
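To give a feel for what “linguistic information” can look like in a factored model, here is a small Python sketch. Factored corpora for Moses attach extra factors to each token, separated by `|`. The particular factor order used here (surface form, lemma, part of speech) is an assumption for illustration, and the class and function names are hypothetical.

```python
from typing import List, NamedTuple

class FactoredToken(NamedTuple):
    surface: str  # the word as written
    lemma: str    # dictionary form
    pos: str      # part-of-speech tag

def parse_factored(line: str) -> List[FactoredToken]:
    """Split a line of 'surface|lemma|pos' tokens into structured tokens."""
    tokens = []
    for raw in line.split():
        parts = raw.split("|")
        if len(parts) != 3:
            raise ValueError(f"expected 3 factors, got {len(parts)}: {raw!r}")
        tokens.append(FactoredToken(*parts))
    return tokens

def to_factored(tokens: List[FactoredToken]) -> str:
    """Serialise structured tokens back into the pipe-separated form."""
    return " ".join("|".join(token) for token in tokens)

if __name__ == "__main__":
    sentence = parse_factored("the|the|DT houses|house|NNS")
    print([token.lemma for token in sentence])
```

With factors available, a model can, for example, translate lemmas and generate the surface form separately, rather than treating “house” and “houses” as unrelated words.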

Apertium

Apertium is an Open Source Machine Translation platform.

Rule Based

Statistical disambiguation

Follows the UNIX philosophy:
The system is a pipeline
Each piece “does one thing, and does it well”
(Not quite: analysis and generation of words are performed by separate modes of the same program)
Each component can be easily replaced
The apertium program itself is just a shell script that calls the correct pipeline.

Several statistics-based tools for building data: dictionaries, rules

Doesn’t run on Windows yet. Shame on you for cheering! ;)
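The pipeline design can be sketched in a few lines of Python. This is a toy model, not Apertium's actual code: the stage names, the tiny dictionaries, and the simplified `lemma<tag>` notation are all assumptions made for illustration, and the statistical disambiguation stage is left out.

```python
from functools import reduce

# Toy data, invented for illustration; real dictionaries are far richer.
SOURCE_LEXICON = {"gatos": "gato<n><pl>", "gato": "gato<n><sg>"}
BILINGUAL = {"gato": "cat"}
TARGET_LEXICON = {"cats": "cat<n><pl>", "cat": "cat<n><sg>"}

def lexical_processor(lexicon, mode):
    """One 'program', two modes: analysis and generation (the 'not quite' case)."""
    if mode == "analyse":
        table = dict(lexicon)
    elif mode == "generate":
        table = {analysis: surface for surface, analysis in lexicon.items()}
    else:
        raise ValueError(f"unknown mode: {mode}")
    # Unknown units pass through unchanged.
    return lambda units: [table.get(unit, unit) for unit in units]

def transfer(units):
    """Swap source lemmas for target lemmas, keeping the tags."""
    out = []
    for unit in units:
        lemma, sep, tags = unit.partition("<")
        out.append(BILINGUAL.get(lemma, lemma) + sep + tags)
    return out

def run_pipeline(stages, text):
    """Feed the output of each stage into the next, like a shell pipe."""
    return " ".join(reduce(lambda data, stage: stage(data), stages, text.split()))

STAGES = [
    lexical_processor(SOURCE_LEXICON, "analyse"),
    transfer,
    lexical_processor(TARGET_LEXICON, "generate"),
]

if __name__ == "__main__":
    print(run_pipeline(STAGES, "gatos"))
```

Because each stage only consumes and produces a list of units, any one of them can be swapped for another implementation without touching the rest, which is the point of the “each component can be easily replaced” bullet.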

Some Errors

(Spanish – English)
Fondo Monetario Internacional
International Monetary bottom

(Catalan – English)
Fidel Castro
Faithful Castrate

Apertium

Apertium was born from a set of translators developed at the Universitat d’Alacant, as part of the OpenTrad project. Originally designed to translate between the Romance languages of Spain, it has been expanded over time to support more distant languages:
First English–Catalan
More recently, Basque to Spanish

Community

As part of the OpenTrad project, Apertium had a community of developers, but limited to university and business developments. Thanks mostly to Francis Tyers, Apertium has in recent years begun to also acquire a community of volunteer contributors.

The first release from the volunteer community was our Welsh to English translator (mostly designed by Kevin Donnelly, who also maintains the Welsh spell checking data).

This summer, we took part in Google’s Summer of Code programme, with 8 successful students. One of the translators developed during GSoC, for Norwegian Bokmål–Nynorsk, has (within a month of release) been used to translate 30 articles on the Nynorsk Wikipedia.

Language Pairs Supported

Spanish – Catalan        Spanish – Romanian
French – Catalan         Occitan – Catalan
English – Galician       Occitan – Spanish
Spanish – Portuguese     English – Catalan
English – Spanish        English – Esperanto
Spanish – Galician       French – Spanish
Esperanto – Spanish      Welsh – English
Breton – French          Esperanto – Catalan
Portuguese – Catalan     Portuguese – Galician
Basque – Spanish         Nynorsk – Bokmål
