TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE
A Moses MT engine for legal translation
By Joël Sigling
Modern technology in a traditional sector
Joël Sigling, Director
Monte Carlo, 25 March 2012
AVB Translations background
• Amstelveens Vertaalburo: founded 1972, traditional high-quality agency
• Translation World: founded 2002, tech-savvy all-round player
• Merged in 2010 to form AVB Translations: premium brand with a strong tech focus
• Top 5 player in the Netherlands, 2011 turnover €4.6 million
• Core business: general translations (legal, financial, technical, …); NO software localization (yet!)
History of MT interest
• Member of TAUS since 2008; 1st round table in Amsterdam
• Have attended TAUS user conferences in the US since 2009
• A sense of urgency developed, but the 2010 merger was a distraction
• Took action in 2011, after the merger
• 2011: chose to build a Dutch <> English engine for the legal (not IT-related!) domain
• Why SMT, why Moses? Quicker, cheaper, similar quality (as research shows)
Why a legal-domain MT engine?
• Legal translations make up approx. 40% of AVB business, 80% of it Dutch <> English
• Not the obvious choice: people said MT wouldn’t work for legal texts because the sentences are too long and the material too intricate
• Statistical MT is suited to non-stylistic material, e.g. legal texts
• If this works, we can make MT happen for all other domains
MT engine objectives
• Increased productivity: no BLEU % target, but tangible, practical results. How much more can a translator produce compared with human translation (HT)?
• A tool offering usable quality with very quick turnarounds for high volumes (typical “Friday afternoon lawyer requests”)
• Becoming an MT front runner in the non-localization sector for Dutch (the 5th language in Europe after FIGS: French, Italian, German, Spanish)
Developing the Moses engine
• Choice between in-house and external development
• In-house: control, developing expertise, lower long-term cost
• External: lower initial cost, much more expertise; best option for now
• Our prerequisites for the development option:
  • ownership of and free access to the engine
  • assurance that our data will not be used or copied by the builder
  • acceptable costs for development and usage
  • a skilled partner: AsiaOnline, CrossLang, Pangeanic, LetsMT, SmartMate?
• CrossLang chosen: meets all of the above, closest to our office, independent
What we needed
• Large quantities of high-quality translation data
• Aligning existing high-quality legal translations (took longest to prepare)
• Existing legal TMs
• Going forward: company-/industry-specific terminology
• Ways to measure gains
• Not just a % increase in automated evaluation scores, but also tangible improvements: we are entrepreneurs, not scientists
• CrossLang automated assessment tool (TER, BLEU, NIST, METEOR)
• Manual assessment: e.g. how many hours does it take to post-edit 10,000 words?
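BLEU, one of the automated metrics listed above, compares MT output with a human reference translation by counting shared n-grams. The sketch below computes corpus-level BLEU with one reference per segment and no smoothing. It is only an illustration of the metric. It is not CrossLang's assessment tool, whose implementation the slides do not describe.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU, one reference per segment, no smoothing.

    hypotheses, references: lists of token lists, aligned segment by segment.
    """
    matches = [0] * max_n  # clipped n-gram matches per order
    totals = [0] * max_n   # hypothesis n-grams per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # the geometric mean is zero if any order has no match
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: punish output that is shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision)


ref = ["the lease is hereby terminated".split()]
hyp = ["the lease is hereby terminated today".split()]
print(round(corpus_bleu(hyp, ref), 4))
```

Production evaluations usually rely on a standard implementation, such as the `multi-bleu.perl` script shipped with Moses or sacreBLEU, so that tokenization and scores stay comparable across systems.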
Input data
• Highest-quality AVB Dutch <> English legal translations: approx. 700k words per language, predominantly civil law
• AVB TM, not fully reviewed but still high quality: approx. 10 million words per language, predominantly civil law
• Legal translations harvested by CrossLang, more diverse legal material: 7 million words per language
CrossLang automated test results
• Best results from AVB + harvested data, with AVB data given extra weight
• Results particularly good in civil law domain (bulk of AVB input data)
• Results improved dramatically for other legal domains by adding harvested data
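The slides do not say how the AVB data was "weighted extra". One simple technique in phrase-based SMT is to oversample the in-domain corpus: its sentence pairs are repeated before training so they count more in the phrase statistics. The minimal sketch below shows that idea. The function name and the repetition factor are hypothetical and were not taken from the AVB/CrossLang setup.

```python
def build_training_corpus(in_domain, out_of_domain, in_domain_weight=2):
    """Concatenate parallel corpora, repeating in-domain pairs to upweight them.

    in_domain, out_of_domain: lists of (source, target) sentence pairs.
    in_domain_weight: hypothetical integer repetition factor (assumption).
    """
    if in_domain_weight < 1:
        raise ValueError("in_domain_weight must be a positive integer")
    # Repeating the in-domain pairs raises their share of the phrase counts.
    return list(in_domain) * in_domain_weight + list(out_of_domain)


avb = [("de huurder", "the tenant")]
harvested = [("de rechter", "the judge"), ("het vonnis", "the judgment")]
corpus = build_training_corpus(avb, harvested, in_domain_weight=3)
print(len(corpus))
```

Moses trains from a pair of line-aligned plain-text files, so the combined list would then be written out as one source file and one target file. Alternatives such as training separate models and combining them are also common, but the deck does not mention them.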
AVB results in practice
• Test done in the CrossLang production assessment tool: post-editing productivity 5% higher than human translation (human output in this case already very high at >1,000 words per hour; PE even higher)
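The productivity comparison above reduces to simple throughput arithmetic. It also answers the manual-assessment question of how many hours 10,000 words take. The sketch uses 1,000 words/hour as an illustrative human-translation rate, because the slide gives only ">1000", and applies the reported 5% gain. The function names are hypothetical.

```python
def productivity_gain(ht_words_per_hour, pe_words_per_hour):
    """Relative throughput gain of post-editing (PE) over human translation (HT)."""
    return (pe_words_per_hour - ht_words_per_hour) / ht_words_per_hour


def hours_needed(words, words_per_hour):
    """Hours of work for a job at a given throughput."""
    return words / words_per_hour


# Illustrative rates (assumption): HT at 1,000 words/hour, PE 5% higher.
ht_rate = 1000.0
pe_rate = 1050.0

print(f"gain: {productivity_gain(ht_rate, pe_rate):.0%}")      # gain: 5%
print(f"HT, 10,000 words: {hours_needed(10_000, ht_rate):.2f} h")  # 10.00 h
print(f"PE, 10,000 words: {hours_needed(10_000, pe_rate):.2f} h")  # 9.52 h
```

At these rates, a 5% throughput gain saves roughly half an hour per 10,000 words.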
• Live rush translations done in the past two weeks:
• 1,500-word trial for a law firm needing high volume in a very short time. Post-edited in 75 minutes. Customer happy with the quality/price ratio.
• 25,000 words in two days with moderate PE effort by two post-editors. Quality estimate 80-90% of human translation.
• 4,500 words in 3 hours with almost full PE effort by one post-editor. Quality estimate >90% of human translation.
• 15,000 words in one day, done by two post-editors. Quality estimate 80-90% of human translation.
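To compare the live jobs above on a common basis, the throughput per post-editor can be computed from the reported figures. The sketch keeps each job's elapsed time in the slide's own unit (hours or days) because the length of a working day is not stated. The trial's post-editor count is also not given, so 1 is an assumption. The names are hypothetical.

```python
# Live rush jobs as reported: (label, words, elapsed time, unit, post-editors).
# The post-editor count for the trial is an assumption (not on the slide).
JOBS = [
    ("1,500-word trial", 1_500, 1.25, "hour", 1),
    ("25,000-word job", 25_000, 2, "day", 2),
    ("4,500-word job", 4_500, 3, "hour", 1),
    ("15,000-word job", 15_000, 1, "day", 2),
]


def throughput(words, elapsed, post_editors=1):
    """Words per post-editor per unit of elapsed time."""
    return words / (elapsed * post_editors)


for label, words, elapsed, unit, editors in JOBS:
    rate = throughput(words, elapsed, editors)
    print(f"{label}: {rate:,.0f} words per post-editor per {unit}")
```

Days are deliberately not converted to hours: rush jobs often run longer than a standard working day, and assuming 8 hours would understate the day-based rates.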
• The test and live projects show great potential in two areas:
• Producing usable translations very quickly and at 50-60% of normal translation cost. Margins are similar to those of normal translation, but likely to improve!
• Higher productivity, i.e. lower production cost and increased margins.
CrossLang Gateway benefits
• Standard Moses engine offers no high-level functions:
  • only plain-text files, always sentence by sentence
  • experimental recasing, experimental tag handling
• CrossLang Gateway offers a Java service layer (not wrapper scripts):
  • most common file formats: Word, XML, XLIFF
  • adjustable text segmentation
  • hardened, alignment-based tag handling
  • advanced recasing tool based on alignment data
  • named entity recognition & (re)tokenization
  • terminology checking and replacement
Gateway features were crucial to processing our material properly
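The slides describe Gateway's tag handling only as "alignment-based" and do not explain how it works. The general idea can be sketched like this: a formatting tag that covers a span of source tokens is mapped onto the target tokens that those source tokens are aligned to, so the tag follows the word even when the word order changes. The function names and the word alignment below are hypothetical illustrations, not CrossLang's implementation.

```python
def project_spans(tag_spans, alignment):
    """Map tag spans from source token positions to target token positions.

    tag_spans: dict tag -> (first_src, last_src), inclusive token indices
    alignment: iterable of (src_index, tgt_index) word-alignment links
    Returns dict tag -> (first_tgt, last_tgt), or None if nothing is aligned.
    """
    projected = {}
    for tag, (start, end) in tag_spans.items():
        targets = [t for s, t in alignment if start <= s <= end]
        projected[tag] = (min(targets), max(targets)) if targets else None
    return projected


def insert_tags(tokens, spans):
    """Wrap each span of tokens in XML-style open/close tags."""
    out = list(tokens)
    for tag, span in spans.items():
        if span is None:
            continue  # unaligned span: a real system needs a fallback here
        first, last = span
        out[first] = f"<{tag}>{out[first]}"
        out[last] = f"{out[last]}</{tag}>"
    return " ".join(out)


src = "Partij A zal de overeenkomst ondertekenen".split()
tgt = "Party A will sign the agreement".split()
# Hypothetical alignment: the verb moves from final position in Dutch.
links = [(0, 0), (1, 1), (2, 2), (3, 4), (4, 5), (5, 3)]
src_tags = {"i": (5, 5)}
print(insert_tags(src, src_tags))  # Partij A zal de overeenkomst <i>ondertekenen</i>
print(insert_tags(tgt, project_spans(src_tags, links)))  # Party A will <i>sign</i> the agreement
```

The same alignment links could drive recasing, for example by copying the capitalization of an aligned source token. Nested or overlapping tags need extra ordering rules that this sketch does not handle.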
Conclusions
• Developing a good engine is not an “out of the box” task
• Sufficient high-quality data is necessary for good results
• Results are very promising, our objectives can be achieved
• Working with a value added partner is recommended
• The need to integrate the MT solution into the translation workflow has become apparent
Phone: +31 20 645.66.10
Mobile: +31 625.025.475
E-mail: [email protected]
Twitter: @JoelAVB
Address: Ouderkerkerlaan 50, 1185 AD Amstelveen, The Netherlands
Website: www.avb.nl