TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Monaco, Joel Sigling, AVB, 25 March 2012

Page 1

TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE

A Moses MT engine for legal translation

By Joël Sigling

Page 2

A Moses MT engine for legal translation

Modern technology in a traditional sector

Joël Sigling, Director

TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Monte Carlo, 25 March 2012

Page 3

AVB Translations background

• Amstelveens Vertaalburo: founded 1972 – traditional, high-quality agency

• Translation World: founded 2002, tech-savvy all-round player

• Merger in 2010 >> AVB Translations: premium brand with strong tech focus

• Top 5 player in The Netherlands, 2011 turnover € 4.6 million

• Core business: general translations – legal, financial, technical, … NO software localization (yet!)

Page 4

History of MT interest

• Member of TAUS since 2008, 1st round table Amsterdam

• Visited TAUS User Conferences in US since 2009

• Sense of urgency developed, merger distraction 2010

• Action in 2011 after merger

• 2011: choice for Dutch <> English legal (not IT-related!) domain engine

• Why SMT, why Moses? Quicker, cheaper, similar quality (as research shows)

Page 5

Why legal domain MT engine?

• Legal translations are approx. 40% of AVB business, 80% Dutch <> English

• Not the obvious choice: people said MT wouldn’t work for legal: sentences too long, material too intricate

• Statistical MT is suited to non-stylistic materials, e.g. legal

• If this works, we can make MT happen for all other domains

Page 6

MT engine objectives

• Increased productivity, no BLEU % target, but tangible, practical results. How much extra can a translator do when compared to HT?

• Tool to offer usable quality with very quick turnarounds for high volume (typical “Friday afternoon lawyer requests”)

• Becoming an MT front runner in the non-localization sector for Dutch (5th language in Europe after FIGS)

Page 7

Developing the Moses engine

• Choice between in-house and external development

• In-house: control, developing expertise, lower long-term cost

• External: lower initial cost, much more expertise > best for now

• Our pre-requisites for development option:

• Ownership and free access to engine

• Assurance data will not be used or copied by builder

• Acceptable costs for development & usage

• Skilled partner > AsiaOnline, CrossLang, Pangeanic, LetsMT, SmartMate??

• CrossLang > all of the above, closest to our office, independent

Page 8

What we needed

• Large quantities of high-quality translation data

• Aligning existing high-quality legal translations (took longest to prepare)

• Existing legal TMs

• Going forward: company-/industry-specific terminology

• Ways to measure gains

• Not just automated evaluation % increase, but also tangible improvements > we are entrepreneurs, not scientists

• CrossLang automated assessment tool (TER, BLEU, NIST, METEOR)

• Manual assessment: e.g. how many hours for post-editing 10,000 words?
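To make the automated side of this assessment concrete, the following is a minimal sketch of a sentence-level BLEU score, one of the four metrics named above. It is a simplified illustration (single reference, whitespace tokenization, no smoothing), not the CrossLang assessment tool; the function names `ngrams` and `sentence_bleu` and the example sentences are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis, reference, max_n=4):
    """Simplified single-reference BLEU on whitespace tokens, in [0, 1]."""
    hyp = hypothesis.split()
    ref = reference.split()
    if not hyp:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp, n)
        ref_counts = ngrams(ref, n)
        # Clipped matches: a hypothesis n-gram counts at most as often
        # as it occurs in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        if matches == 0 or total == 0:
            return 0.0  # no smoothing in this sketch
        log_precisions.append(math.log(matches / total))
    # Brevity penalty: penalize hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    mt_output = "the court dismissed the claim today"
    human_ref = "the court dismissed the claim"
    print(round(sentence_bleu(mt_output, human_ref), 3))
```

Production tools normally compute BLEU over a whole test set rather than per sentence, so scores from a sketch like this are only indicative.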

Page 9

Input data

• Highest quality AVB Dutch <> English legal translations: approx. 700k words per language. Predominantly civil law.

• Not fully reviewed AVB TM, still high-quality: approx. 10 million words per language. Predominantly civil law.

• Legal translations harvested by CrossLang, more diverse legal material: 7 million words per language

Page 10

CrossLang automated test results

• Best results from AVB + harvested data, with AVB data given extra weight

• Results particularly good in civil law domain (bulk of AVB input data)

• Results improved dramatically for other legal domains by adding harvested data
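The slides do not say how the AVB data was weighted. One simple and common approach is to repeat the in-domain sentence pairs in the training corpus; the sketch below illustrates that idea only and is an assumption, not a description of what CrossLang did. The function names and the weight of 2 are hypothetical; the word counts are the approximate per-language figures from the input data slide (0.7 + 10 million AVB words, 7 million harvested words).

```python
def weighted_corpus(in_domain, out_of_domain, weight=2):
    """Build a training corpus in which in-domain pairs appear `weight` times.

    Both inputs are lists of (source, target) sentence pairs.
    """
    if weight < 1:
        raise ValueError("weight must be at least 1")
    return in_domain * weight + out_of_domain


def in_domain_share(in_words, out_words, weight=1):
    """Fraction of training words that come from the in-domain data."""
    return weight * in_words / (weight * in_words + out_words)


if __name__ == "__main__":
    avb_words = 700_000 + 10_000_000   # approx. figures from the slides
    harvested_words = 7_000_000        # approx. figure from the slides

    for w in (1, 2):                   # weight 2 is a hypothetical choice
        share = in_domain_share(avb_words, harvested_words, w)
        print(f"weight {w}: AVB share of training words = {share:.1%}")

    # Toy example with placeholder sentence pairs.
    avb = [("nl zin 1", "en sentence 1")]
    harvested = [("nl zin 2", "en sentence 2")]
    print(len(weighted_corpus(avb, harvested, weight=2)))
```

Other weighting schemes exist, such as training separate models per corpus and interpolating them, so the repetition shown here is only one possible reading of "weighted extra".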

Page 11

AVB results in practice

• Test done in CrossLang production assessment tool: productivity 5% higher for post-editing than for human translation (human output in this case was very high, >1000 words per hour; PE even higher)

Page 12

AVB results in practice

• Live rush translations done in past two weeks:

• 1,500 word trial done for law firm needing high volume in very short time. Post-edited in 75 minutes. Customer happy with quality/price ratio.

• 25,000 words in two days with moderate PE effort by two post-editors. Quality estimate 80-90% of human translation.

• 4,500 words in 3 hours with almost full PE effort by one post-editor. Quality estimate >90% of human translation

• 15,000 words in one day, done by two post-editors. Quality estimate 80-90% of human translation
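The throughput implied by these four jobs can be worked out with simple arithmetic. The sketch below does so under stated assumptions: the 1,500-word trial is assumed to have had a single post-editor (the slide does not say), "days" are taken as reported without converting to working hours, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RushJob:
    words: int
    post_editors: int
    duration: float  # elapsed time, in hours or days (see `unit`)
    unit: str        # "hour" or "day"

    def rate(self) -> float:
        """Words per post-editor per unit of elapsed time."""
        return self.words / (self.post_editors * self.duration)


def hours_needed(words: int, words_per_hour: float) -> float:
    """Post-editing hours required for a volume at a given hourly pace."""
    return words / words_per_hour


JOBS = [
    RushJob(1_500, 1, 75 / 60, "hour"),  # one post-editor is an assumption
    RushJob(25_000, 2, 2, "day"),
    RushJob(4_500, 1, 3, "hour"),
    RushJob(15_000, 2, 1, "day"),
]

if __name__ == "__main__":
    for job in JOBS:
        print(f"{job.words:>6,} words: {job.rate():,.0f} "
              f"words per post-editor per {job.unit}")
    trial_pace = JOBS[0].rate()
    print(f"10,000 words at the trial pace: "
          f"{hours_needed(10_000, trial_pace):.1f} hours")
```

On these figures the trial works out to 1,200 words per hour and the 4,500-word job to 1,500 words per hour, while the two multi-day jobs come to 6,250 and 7,500 words per post-editor per day.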

Page 13

AVB results in practice

• Tests and live projects show great potential in two areas:

• Producing usable translations very quickly and at 50-60% of normal translation cost. Margins are similar to normal translation, but likely to improve!

• Higher productivity, i.e. lower production cost and increased margins.

Page 14

CrossLang Gateway benefits

• Standard Moses engine offers no high-level functions

• Only plain text files, always sentence by sentence, experimental recasing, experimental tag handling

• CrossLang Gateway offers Java service layer (not wrapper scripts)

• Most common file formats: Word, XML, XLIFF

• Adjustable text segmentation

• Hardened, alignment-based tag handling

• Advanced recasing tool based on alignment data

• Named entity recognition & (re)tokenization

• Terminology checking and replacement

Gateway features crucial to processing our material properly

Page 15

Conclusions

• Developing a good engine is not an “out of the box” task

• Sufficient high-quality data is necessary for good results

• Results are very promising, our objectives can be achieved

• Working with a value added partner is recommended

• The need to integrate the MT solution into the translation workflow is apparent

Page 16

Phone: +31 20 645.66.10

Mobile: +31 625.025.475

E-mail: [email protected]

@JoelAVB

Address: Ouderkerkerlaan 50, 1185 AD Amstelveen, The Netherlands

Website: www.avb.nl