Making the Most of Machine Translation Today An eBook from the ULG library

Machine Translation (MT) is ever-present in the translation industry. The technology is being used to shorten project timelines for Language Service Providers (LSPs) and reduce costs for clients as they localize content around the globe. MT is an amazing tool for business when it’s used properly. Often, this means working with a linguist to post-edit translated documents to ensure translations are accurate and readable.

Although MT has reached amazing new levels of accuracy, it’s still not on par with human linguistic expertise. But that doesn’t mean using MT is a fruitless endeavor; in fact, that’s far from the case. This article describes how global companies can best leverage MT without sacrificing the quality of translation projects.

History

Machine Translation is a relatively new technology, with origins in the early 20th century, that has consistently experienced rapid advancements since the 1980s. Georges Artsrouni and Petr Smirnov-Troyanskii worked to create MT in the 1930s, with Smirnov-Troyanskii laying the groundwork for what was needed in an MT system.

Artsrouni, an engineer, and Troyanskii, a Russian academic, both applied for patents that are considered the definitive precursors to modern-day MT. The two men sought patents for electromechanical tools that could be used as translation dictionaries (Hutchins).

Troyanskii generally garners more acclaim in MT history, given that he suggested the system would require the following component parts: an editor familiar with a source language to convert words to base forms; a machine, which would turn them into equivalent forms in the target language; and a second editor who would edit the machine translations (Craciunescu, Gerding-Salas, Stringer-O’Keeffe, 2004).

Attempts to create a successful MT system continued into the 1950s and 1960s with the advent of computers. In 1954, the first public MT demonstration took place when IBM partnered with Georgetown University, an event widely covered in the media and considered to be quite a feat at the time. Although advancements continued in the field, the thought of digitally translating languages seemed less and less like a possibility and more like a flawed experiment. Hopes were diminished when the United States government created an advisory committee on the technology, the Automatic Language Processing Advisory Committee (ALPAC), in 1964. The committee released an unflattering report on MT’s progress, saying research wasn’t advancing at the rate it should.

“No one can guarantee, of course, that we will not suddenly or at least quickly attain machine translation, but we feel that this is very unlikely,” the report stated.

In ALPAC’s paper, released in 1966, computational linguistics researcher Victor Yngve, who was working at MIT at the time, provided a similarly doubtful outlook.

“As to the possibility of fully automatic translation, I am convinced we will someday reach the point where this will be feasible and economical. However, there is considerable basic knowledge required that we simply don’t have at the moment, and it is anybody’s guess how soon this knowledge can be obtained,” said Yngve.

The report put a stop to MT research for some time, before important developments resurfaced in the 1980s.

MT research and development boomed in the 1990s and 2000s with the advent of the digital age. Statistical and example-based translation came to the fore during that period, and today the most recent advances, such as Neural Machine Translation (NMT), are again making waves in the industry.

Currently, MT is seen as a technology that has improved drastically since its infancy but still has a good amount of room to grow in terms of accuracy. While MT has gotten better at deciphering context thanks to NMT, which is able to process complete sentences at a time instead of individual words or phrases, it is still usually necessary to include post-editing (PE) for the best results. PE refers to the process in which a human translator cleans up a document after it goes through an MT system.

Functions

MT is now virtually ubiquitous in the modern world, and sites like Google Translate and Bing allow anyone to use the technology.

MT is used in many different ways by companies within different industry verticals, including Legal, Life Sciences, Manufacturing, Information Technology, Finance and Consumer Products. Businesses have had success using MT to reduce translation costs and more efficiently convey global messages.

MT can be used together with other technologies to expedite translation projects, including:

• Language Identification: Language Identification (LI) automatically identifies the language(s) a document contains based on a pre-populated text corpus stored in the system.

• Optical Character Recognition: Optical Character Recognition (OCR) scans photos or documents containing text and converts that text into an editable format. OCR is commonly used for scanned PDFs or other non-editable documents.

• Terminology Integration: Some MT systems can incorporate Translation Memory (TM) or glossary databases to aid in keeping terminology consistent.
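
As a rough illustration of the Language Identification step listed above, the following Python sketch compares the character trigrams of an input text against small per-language samples and picks the closest match. The sample sentences, the language codes and the function names are assumptions made for this example only; a production LI component would rely on a much larger pre-populated corpus and a more robust scoring method.

```python
from collections import Counter

# Hypothetical miniature "corpus": one short sample sentence per language.
# A real system would store far larger reference texts.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and the cat is on the table",
    "es": "el rápido zorro marrón salta sobre el perro perezoso y el gato está en la mesa",
    "de": "der schnelle braune fuchs springt über den faulen hund und die katze ist auf dem tisch",
}


def trigrams(text):
    """Count overlapping three-character sequences in the lowercased text."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


# Pre-compute one trigram profile per language from the stored samples.
PROFILES = {lang: trigrams(sample) for lang, sample in SAMPLES.items()}


def identify(text):
    """Return the language code whose profile shares the most trigrams with the text."""
    query = trigrams(text)

    def overlap(profile):
        return sum(min(count, profile[gram]) for gram, count in query.items())

    return max(PROFILES, key=lambda lang: overlap(PROFILES[lang]))


print(identify("the dog is on the table"))
print(identify("el gato está en la mesa"))
```

With samples this small the sketch only works reliably on text that resembles the stored sentences, which is why the size and quality of the underlying corpus matter in practice.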

MT can be a boon for businesses, but its limitations may also outweigh its capabilities if it is used inappropriately or without proper preparation.

Recently, Google and Microsoft have developed Neural MT (NMT) engines and incorporated them into their applications. This relatively new technology promises to replace Phrase-Based Machine Translation (PBMT). NMT employs artificial intelligence algorithms that can derive meaning from whole sentences or ideas using so-called neural networks, whereas PBMT works with individual words or segments of a sentence (Wu et al., 2016).

NMT is loosely modeled on the networks of neurons in the human brain, where information is sent to different “layers” to be processed before output. The biggest benefit of NMT is its speed and accuracy, which is possible thanks to its ability to use algorithms to learn linguistic rules on its own from statistical models.

Statistical MT, including PBMT, on the other hand, uses predictive algorithms to translate text. These systems are built upon parallel bilingual text corpora, which serve as a basis for “matching” to create output with the highest probability of being correct.
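
The “matching” idea behind phrase-based systems can be sketched in a few lines of Python. The phrase table below, its German-to-English entries and its probabilities are invented for illustration; in a real system they would be estimated from a parallel corpus, and the decoder would also weigh word order and target-language fluency rather than greedily picking the most probable phrase.

```python
# Hypothetical toy phrase table: P(target phrase | source phrase).
# These entries and probabilities are made up for the example.
PHRASE_TABLE = {
    "guten morgen": {"good morning": 0.9, "good day": 0.1},
    "guten": {"good": 1.0},
    "morgen": {"tomorrow": 0.6, "morning": 0.4},
    "welt": {"world": 1.0},
}


def translate(sentence, max_phrase_len=2):
    """Greedy phrase matching: longest known phrase first, most probable target wins."""
    words = sentence.lower().split()
    output = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase starting at position i, then shorter ones.
        for n in range(min(max_phrase_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            candidates = PHRASE_TABLE.get(phrase)
            if candidates:
                # Pick the target phrase with the highest probability.
                output.append(max(candidates, key=candidates.get))
                i += n
                break
        else:
            # Unknown word: pass it through untranslated.
            output.append(words[i])
            i += 1
    return " ".join(output)


print(translate("Guten Morgen Welt"))
print(translate("morgen"))
```

Note how the single word “morgen” is translated differently on its own than as part of a longer matched phrase. Working on fixed segments like this is the limitation that NMT addresses by considering the whole sentence.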

When MT Works For Business

Before using MT for corporate translation projects, ask yourself a few key questions to determine whether the solution is appropriate:

• Is a quick turnaround time necessary?

• Will the translation be distributed externally?

• Is cost a critical factor?

• Is the source documentation confidential, or protected by regulation?

If quality is your top priority, it is highly unlikely that you will create accurate translations without the help of a human translator.

On the other hand, if there is a need to translate a large number of documents to determine which materials are relevant (for example, if a law firm has a large number of multilingual court documents to review), MT can be very helpful. MT is also handy in less formal situations to generate basic translations of emails, internal communications or memos (getting the so-called “gist”). Getting the “gist” of foreign-language correspondence with MT is much less time-consuming, labor-intensive and costly than calling in a professional linguist. But always keep data security in mind: MT applications that use standard security methods, such as 256-bit encryption, and that do not store user data are critical for translating emails and websites quickly, easily and securely.

For translations that will be distributed externally (such as PR and brand-sensitive materials), it is usually best to use a human translator or a combination of human and machine translation through post-editing (PE). Marketing and advertising copy isn’t usually a good candidate for MT, given the complicated and often idiomatic language it contains.

From a practical standpoint, it’s also important to realize that a substantial amount of setup and preparation goes into MT. If you don’t have the time and resources to properly train an MT system, it will produce very inconsistent results.

Related to this point is the fact that MT systems work on language pairs and can only translate into a target language the system has been built for. No MT engine can automatically translate every language; it needs to be properly trained with existing language data for each pair.

When MT Doesn’t Work

The biggest drawback of MT is its ongoing inability to pick up on linguistic nuances, something humans do naturally.

As mentioned earlier, MT systems are weak at translating marketing copy like blogs, taglines, or proposals accurately and readably. Likewise, literature and other creative or persuasive writing are examples of texts where MT may leave readers confused or put off. Since MT systems are incapable of sensing and transmitting the “feelings” or “emotions” being employed in creative copy, MT can’t successfully convey the copy’s message.

In the field of healthcare, using MT for “mission critical” documentation, such as medical device instructions, is still not recommended. Whenever patient safety is at stake, it’s a good rule always to rely on the human touch.

Projects where you should avoid relying solely on MT include:

• Those in highly regulated fields such as medical devices or healthcare, where safety could be compromised by a poor translation

• Those that include content that will be distributed externally or used as branding or promotional materials

• Those with content that contains nuanced, complicated meanings such as literature or creative/persuasive texts

That said, it is a good idea for businesses to use MT as a primer, or first pass, to figure out the “gist” of a text in another language.

Another key issue is security. It is tempting to rely on free, online MT offerings, but these almost always lack confidentiality, and in the end you may be paying a higher price for the translation than you expect.

Breaching client confidentiality could mean legal action and damage to professional relationships. Google Translate, for example, stores any and all data it receives. By inputting your content into Google, you are allowing Google to make free use of it for its own purposes. From the site’s terms of service:

“When you upload, submit, store, send or receive content to or through our Services, you give Google (and those we work with) a worldwide license to use, host, store, reproduce, modify, create derivative works (such as those resulting from translations, adaptations or other changes we make so that your content works better with our Services), communicate, publish, publicly perform, publicly display and distribute such content.”

“Successful application of Machine Translation requires post editing and an engine that will stand up to confidentiality requirements faced by users in regulated fields, or those with sensitive documentation. Modern day machine translation is an invaluable tool when used with the help of professional human linguists.”

– Kristen Giovanis, President, United Language Group

The bottom line is this: MT is a cost-effective and efficient way to perform an initial translation of content. And, in some cases, using post-edited MT output will produce the results you need.

The biggest mistake you can make is relying solely on MT for translation if your final product will be client-facing or distributed to the outside world.

Know Your Audience, Industry and Scope of Your Project

MT is an extremely valuable asset to businesses that know how to use it properly. Using MT to translate the gist of a text, or to process an enormous volume of documents and sort out which ones must be translated by a human, is hugely beneficial.

Combining PE and MT results in faster turnaround times than standard human translation (because a linguist isn’t needed to complete the initial translation) and saves money.

MT tools used in a secure environment can be extremely advantageous in the corporate world. Perhaps most importantly, for MT to be truly effective, businesses that plan to use it need to understand how the technology will be used and what the scope of their project is.

Always remember that MT cannot pick up on the linguistic complexities that a human translator would, and that MT almost always produces a less-than-perfect translation.

Here are some best practices around the use of MT:

• Use MT to get the “gist” of a text quickly and cost-effectively.

• Employ MT when you have a large number of documents that need to be translated quickly.

• Use MT when reviewing documents internally.

• Understand the risks of using MT. The best way to produce accurate translations is to combine machine and human translation.

• Know your audience, your industry and the scope of your project before you deploy MT.
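
For teams that want to apply these guidelines consistently, the checklist can be written down as a simple rule set. The Python sketch below is one possible reading of the advice in this eBook, not an official decision tool: the field names, the order of the rules and the wording of the recommendations are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Project:
    """Hypothetical project description; the field names are illustrative."""
    distributed_externally: bool
    safety_critical: bool         # e.g. medical device instructions
    creative_or_marketing: bool   # e.g. taglines, literature, persuasive copy
    confidential: bool


def recommend_workflow(project: Project) -> List[str]:
    """Return plain-language recommendations based on the guidelines above."""
    advice = []
    if project.safety_critical or project.creative_or_marketing:
        # Nuanced or high-risk content: do not rely on MT alone.
        advice.append("human translation")
    elif project.distributed_externally:
        advice.append("MT with human post-editing")
    else:
        # Internal review or triage of large document sets.
        advice.append("raw MT for gisting")
    if project.confidential:
        advice.append("avoid free online MT; use a secure engine")
    return advice


internal_memo = Project(distributed_externally=False, safety_critical=False,
                        creative_or_marketing=False, confidential=True)
print(recommend_workflow(internal_memo))
```

The ordering of the checks reflects the priority this eBook gives to safety and brand-sensitive content over speed and cost; a company with different priorities would reorder or extend the rules.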

References:

Craciunescu, O., Gerding-Salas, C., & Stringer-O’Keeffe, S. (2004). Machine Translation and Computer-Assisted Translation: A New Way of Translating? Retrieved from: http://www.translationjournal.net/journal/29computers.htm

Wu, Y., Schuster, M., Chen, Z., Le, Q., Norouzi, M. (2016). Google’s Neural Machine Translation System: Bridging the Gap Between Human and Machine Translation. Retrieved from: https://arxiv.org/pdf/1609.08144v1.pdf

AlSukhni, E., Alkabi, M., Alsmadi, I. (2016). An Automatic Evaluation for Online Machine Translation: Holy Quran Case Study. Retrieved from: http://www.thesai.org/Downloads/Volume7No6/Paper_14-An_Automatic_Evaluation_for_Online_Machine_Translation.pdf

Hutchins, J. Two Precursors of Machine Translation: Artsrouni and Trojanskij. Retrieved from: http://www.hutchinsweb.me.uk/IJT-2004.pdf

Automatic Language Processing Advisory Committee. Language and Machines: Computers in Translation and Linguistics. Retrieved from: http://www.mt-archive.info/ALPAC-1966.pdf
