Effective Post-Editing in Human & Machine Translation - qt21.eu
Funded by the 7th Framework Programme of the European Commission through the contract 296347.
Stephen Doherty & Federico Gaspari
Centre for Next Generation Localisation
Dublin City University, Ireland
December 5th, 2013
Effective Post-Editing in Human & Machine Translation Workflows: Critical Knowledge & Techniques
Outline
1. Critical overview of post-editing;
2. Post-editing scenarios;
3. Post-editing strategies;
4. To PE or not to PE?
5. Do-it-yourself post-editing (follow-up task);
6. Questions and comments.
What is Post-Editing?
• The correction of texts that have been translated from a source language into a target language by a machine translation system (Allen, 2001);
• This can mean "tidying up the raw output, correcting mistakes, revising entire, or, in the worst case, retranslating entire sections" (Somers, 2001, p. 138).
Why Learn about Post-Editing?
• Transferable skill to other aspects of language and translation work:
o Pre-editing;
o Editing and proofreading;
o Improved knowledge of CAT processes, esp. TM;
o Improved knowledge of MT systems and what they can and cannot do.
• Marketable skill:
o Personal and professional questions;
o Bang for the buck?
Critical Overview of PE
• Basic options to maximise the effectiveness of MT:
o limiting the input/source text: controlled language, sub-language;
o post-editing the raw MT output (e.g. combined with system customisation).
• Post-editing (PE):
o new skill that is acquired with experience;
o little formal training available, but valuable transferable skill;
o different from checking or revising human translation.
• Increases in PE productivity (i.e. time gains) depend mainly on:
o experience with the PE task;
o expertise in the domain;
o familiarity with the language pair;
o knowledge of the specific MT system (kind and frequency of errors):
• differences between statistical and rule-based systems.
PE Serves Different Needs from (revising) Human Translation
• The aim of PE is to improve the output, not necessarily to make it perfect:
o post-edited output must become (more) usable or understandable;
o the least possible effort must be applied, quickly:
• the priority is to save time (not to lose the speed gains due to MT) and money;
o the extent and accuracy of PE are negotiated/specified on a case-by-case basis, depending on the user’s needs and requirements.
• Different “types” and levels of post-editing (in companies, organisations):
o no PE:
• internal circulation, almost never external publication (knowledge bases with customised MT);
o minimum or medium PE:
• internal circulation, rarely external publication;
o full/complete PE (but… is it worth it?):
• very rarely internal circulation, mostly external publication.
• PE helps to translate texts that would otherwise remain monolingual.
Differences between PE & Revising Human Translations
• Key skills for PE (also relevant in revising HT?):
o excellent word-processing and editing skills;
o ability to work and make corrections directly on screen;
o general knowledge of the problems and challenges faced by MT;
o specific knowledge of the weaknesses of the particular MT system;
o knowledge of source and target languages (at what level? It depends…);
o quick decision-making as to what to correct and how (or which errors to ignore);
o ability to always balance PE speed and cost with respect to required quality;
o ability to adapt to different specifications required for each job;
o different from working in a CAT environment:
• fuzzy matches within a translation memory tool are past human translations!
Differences between PE & Revising Human Translations
“This question [that has never really been touched upon before in the field of traditional translation] concerns the acceptance and use of half-finished texts. Within the [human translation] profession, creating half-finished texts is a non-issue because producing a partially completed translated text is not something that human translators do.”
(Allen, 2003: 297-298, my emphasis)
PE Scenarios
• Differences in errors:
o MT systems do not have real-world knowledge or contextual awareness;
o MT errors are possible at any level: lexical, grammatical, syntactic, etc.;
o not only linguistic errors, but also factual ones:
• MT is less likely than humans to make “distraction” errors, e.g. with numbers, measures, etc.;
• MT can produce garbled output (in which case it is obvious that extensive PE is required);
• but relatively subtle MT errors may be difficult to detect and correct (e.g. statistical MT systems might occasionally omit negations).
• Differences in the errors mean that different corrections are needed.
• Differences in the required final quality of the target text:
o human translation (esp. with revision) aims at optimal, publishable quality;
o the final goal of PE is not necessarily publishable quality.
Factors Affecting PE Effectiveness
• One has to balance and optimise quality/speed/cost in relation to the intended use of the final translation:
o length of use of the translation;
o type, length and “visibility” of the document;
o turnaround time;
o needs and expectations of the end user(s);
o ability of the readers to make use of a less-than-perfect text;
o available and viable options.
• PE guidelines vary hugely in terms of, e.g.:
o when to use PE (vs. manual translation from scratch);
o how to do PE, its global approach and specific corrections.
Priorities in PE Differ from Those Applying to (the Revision of) HT
• Factors to be considered (priorities):
o PE is there to save time and money (optimal quality not essential);
o understandability and correctness of general meaning are key.
• Factors to be ignored (irrelevant in PE scenarios):
o details or nuances (of information, meaning, style, register, etc.);
o elegance, fluency, naturalness of expression, etc.
• The MT quality for a given language combination determines the need for, and the type/level of, PE.
• PE can be an aspect of diagnostic MT evaluation, i.e. giving feedback to MT developers to rectify frequent/important errors.
Post-Editing Strategies
• Like translation, PE can have various levels of quality requirements, e.g. gisting or high-quality dissemination;
• A requirement unique to PE is to ascertain whether it would be best to post-edit the text or to translate it manually from scratch;
• These estimations may be quick judgements or more formal measures:
o for example, a scale where evaluators are asked to estimate the effort required (Specia et al. 2009):
• 1. Requires complete translation
• 2. Post-editing quicker than retranslation
• 3. Little post-editing needed
• 4. Fit for purpose
• PE may be carried out by translators, editors, bilinguals, and even monolinguals (e.g. crowdsourcing).
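The Specia et al. (2009) scale above can be turned into a simple triage step. The following is a minimal sketch. It assumes each segment has already been scored 1-4 by an evaluator. The mapping from scores to actions (1 -> translate from scratch, 2-3 -> post-edit, 4 -> no PE) and the helper names `pe_decision` and `triage` are hypothetical illustrations and are not part of the cited work.

```python
from collections import Counter

# The 4-point effort scale listed on the slide (Specia et al. 2009).
SCALE = {
    1: "Requires complete translation",
    2: "Post-editing quicker than retranslation",
    3: "Little post-editing needed",
    4: "Fit for purpose",
}


def pe_decision(score: int) -> str:
    """Map one segment's effort score to a workflow action.

    The mapping is an illustrative assumption, not part of the scale itself.
    """
    if score not in SCALE:
        raise ValueError(f"score must be one of {sorted(SCALE)}, got {score!r}")
    if score == 1:
        return "translate from scratch"
    if score == 4:
        return "no PE"
    return "post-edit"  # scores 2 and 3


def triage(scores: list[int]) -> Counter:
    """Count how many segments fall into each action."""
    return Counter(pe_decision(s) for s in scores)


if __name__ == "__main__":
    # Hypothetical evaluator scores for six segments.
    segment_scores = [4, 3, 2, 1, 3, 3]
    for action, count in triage(segment_scores).most_common():
        print(f"{action}: {count}")
```

Counting the actions per text gives a rough, early signal of whether a job as a whole is better suited to PE or to manual translation.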
Post-Editing Strategies
• PE guides, while still not commonplace, vary greatly depending on the company, language pair and MT system;
• PE concerns three texts:
o the original source text;
o the raw MT output;
o the post-edited MT output, i.e. the target.
• Common PE operations include:
o fixing punctuation and capitalisation;
o changing sentence and phrase structures;
o editing grammatical agreement, e.g. singular/plural, masculine/feminine;
o retranslating whole words or expressions.
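Some surface-level operations from this list (punctuation, spacing, capitalisation) can be partly automated before a human post-editor starts. The following is a minimal sketch using Python's standard `re` module. The two rules and the `auto_fix` name are hypothetical illustrations and do not reproduce the rule syntax of any particular tool. Structural and agreement fixes still need a human.

```python
import re

# Hypothetical surface-level rules, applied in order to each raw MT segment.
# They are deliberately conservative and English-oriented.
RULES = [
    (re.compile(r"\s+([,.;:!?])"), r"\1"),  # remove space before punctuation
    (re.compile(r"\s{2,}"), " "),           # collapse runs of whitespace
]


def auto_fix(segment: str) -> str:
    """Apply the surface rules, then capitalise the first letter of the segment."""
    fixed = segment
    for pattern, replacement in RULES:
        fixed = pattern.sub(replacement, fixed)
    fixed = fixed.strip()
    if fixed and fixed[0].islower():
        fixed = fixed[0].upper() + fixed[1:]
    return fixed


if __name__ == "__main__":
    raw = "the file  is ready , please check it ."
    print(auto_fix(raw))  # The file is ready, please check it.
```

Rules like these are language-specific. French typography, for example, requires a space before `:` and `;`, so the first rule would introduce errors in French output.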
Post-Editing Strategies
• Machine Translation Workflows:
o Rule-based and corpus-based (aka data-driven);
o RBMT uses (often manually written) grammatical and lexical rules to govern the translation process;
o Data-driven systems, such as statistical MT (SMT) systems, are built from large monolingual corpora and bilingual parallel corpora of human translations;
o More recent hybrid systems and human-in-the-loop scenarios.
Typical workflow where MT and PE are done outside the formal translation process, e.g. without a TM suggestion
Human-in-the-loop workflow where the translator is presented with both TM and MT suggestions (above a defined threshold) which they can choose to accept, reject, or edit as necessary, and the process and product are incorporated back into the system.
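The human-in-the-loop workflow can be sketched as a filter over candidate suggestions plus a feedback log. This is a minimal illustration under stated assumptions. Each suggestion carries a score between 0 and 1 (a fuzzy-match score for TM, a quality estimate for MT). The per-origin thresholds are hypothetical. The `Suggestion`/`Feedback` types and the function names are invented for illustration and are not taken from any CAT tool.

```python
from dataclasses import dataclass

# Hypothetical per-origin thresholds; real tools make these configurable.
THRESHOLDS = {"TM": 0.75, "MT": 0.50}
ACTIONS = {"accept", "reject", "edit"}


@dataclass
class Suggestion:
    origin: str   # "TM" (fuzzy match) or "MT"
    text: str
    score: float  # 0.0-1.0: fuzzy-match score for TM, quality estimate for MT


@dataclass
class Feedback:
    source: str
    origin: str
    action: str   # "accept", "reject" or "edit"
    final: str    # the translator's final target segment


def suggestions_to_show(candidates: list[Suggestion]) -> list[Suggestion]:
    """Keep suggestions at or above their origin's threshold, best first."""
    kept = [c for c in candidates if c.score >= THRESHOLDS[c.origin]]
    return sorted(kept, key=lambda c: c.score, reverse=True)


def record(log: list[Feedback], source: str, shown: Suggestion,
           action: str, final: str) -> Feedback:
    """Store the translator's decision so it can be fed back into TM/MT."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action {action!r}")
    if action == "accept" and final != shown.text:
        raise ValueError("an accepted suggestion must be used unchanged")
    entry = Feedback(source, shown.origin, action, final)
    log.append(entry)
    return entry


if __name__ == "__main__":
    candidates = [
        Suggestion("TM", "target text from TM", 0.82),
        Suggestion("MT", "target text from MT", 0.64),
        Suggestion("TM", "weaker TM match", 0.60),  # below the TM threshold
    ]
    shown = suggestions_to_show(candidates)
    log: list[Feedback] = []
    record(log, "source segment", shown[0], "accept", shown[0].text)
    print([s.origin for s in shown], len(log))
```

The log is the part that closes the loop. Accepted and edited segments can be added back to the TM or used as MT training data. Rejected ones point to where the thresholds may need adjusting.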
Post-Editing Strategies
• Two main approaches:
o fast PE and conventional PE (Loffler-Laurian 1996)
• Fast PE:
o Fast turnaround;
o Limited resources;
o Only essential corrections made to enable understanding.
• Conventional PE:
o Aims to produce output of 'gold standard' human translation quality;
o More resources required.
Post-Editing Strategies
• The deciding factor is what the text is intended to be used for:
o Gisting -> fast PE;
o Publication -> conventional PE.
• There are also cases where no PE is required (Allen 2003), especially when working at the sentence level;
• The choice is also a question of available resources and expertise.
Post-Editing Strategies
• Error-based approaches:
o evaluating the output to see the error types;
o focusing on specific types;
o refining the MT system and/or linguistic pre-processing;
o avoiding repetitive errors (which cost time and cause frustration).
• Issues:
o no control over TM and/or MT content, so errors are propagated;
o the onus of quality is shared, unknown, or not considered;
o consistency in TM and MT data (Moorkens et al. 2013).
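In essence, the error-based approach counts which error types recur in the raw MT output and targets the most frequent ones, whether through PE guidelines, system refinement or pre-processing. The sketch below assumes errors have already been annotated by hand as (segment id, error type) pairs. The annotations, category names and helper functions are hypothetical.

```python
from collections import Counter

# Hypothetical annotations: one (segment_id, error_type) pair per error
# found while reviewing raw MT output.
ANNOTATIONS = [
    (1, "lexical"), (1, "agreement"), (2, "punctuation"),
    (3, "lexical"), (3, "omitted negation"),
    (4, "lexical"), (4, "agreement"),
]


def error_profile(pairs: list[tuple[int, str]]) -> Counter:
    """Total number of errors per type."""
    return Counter(error_type for _, error_type in pairs)


def recurring_types(pairs: list[tuple[int, str]], min_segments: int = 2) -> list[str]:
    """Error types seen in at least `min_segments` distinct segments.

    Repetitive errors like these are candidates for fixing in the MT system
    or in pre-processing, rather than by hand in every segment.
    """
    segments_by_type: dict[str, set[int]] = {}
    for seg_id, error_type in pairs:
        segments_by_type.setdefault(error_type, set()).add(seg_id)
    return sorted(t for t, segs in segments_by_type.items() if len(segs) >= min_segments)


if __name__ == "__main__":
    print(error_profile(ANNOTATIONS).most_common(2))  # [('lexical', 3), ('agreement', 2)]
    print(recurring_types(ANNOTATIONS))               # ['agreement', 'lexical']
```

Frequency is not the only criterion. A single omitted negation can matter more than several minor lexical slips, so weighting types by severity would be a natural extension.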
Post-Editing Strategies
• Typical issues for each type of MT system:
• SMT tends to have issues with...
• RBMT tends to have issues with...
• However, hybrid approaches make this distinction less clear;
• Increased need for in-house guides based on in-house requirements, systems, and assets
To PE or not to PE?
• PE is becoming a widespread activity in the translation/localisation industry (Allen 2003, Yunker 2008, O'Brien 2011);
• Clear advantages in industry applications in terms of productivity through informed combinations of MT with PE (O’Brien 2007, Aikawa et al. 2007, Guerberof 2009, Groves & Schmidtke 2009, Tatsumi 2009);
• Absence of best practice and lack of training materials and resources;
• Huge variance in areas of application, business needs, resources, and expertise;
• Estimated time/effort versus actual time/effort?
• Are translators automatically good post-editors? (de Almeida 2013)
• A case of trial and error.
Do-It-Yourself Post-Editing
• Aim:
o to put what we’ve learned today into practice, and to challenge our estimations of how long PE might take;
• Time:
o 15 to 20 minutes.
• Follow-up short webinar to discuss results, language-specific issues, tips, and evaluation of our estimations and results:
o Tuesday, December 10th:
o http://www2.gotomeeting.com/register/458586994
Do-It-Yourself Post-Editing
Part One:
1. Find two short general texts (~200 words each) in any language you have proficiency in, so that we can translate them into English;
2. On the basis of your expectations of MT and PE, decide upon one of the two texts to translate yourself manually to a publishable standard, and record how long you estimate this will take;
3. Translate this text manually while recording the actual time it takes (e.g. using a watch, mobile phone, or the clock on your computer).
Do-It-Yourself Post-Editing
Part Two:
1. For the other text, translate it with a statistical MT system (e.g. http://translate.google.com/) and a rule-based MT system (e.g. http://www.babelfish.com/). For some languages only one type of engine may be available, and that’s fine too;
2. Once you have your MT output(s), decide which you will post-edit based on which output you think will take less time to PE to a publishable standard - record how long you estimate this will take;
3. Post-edit this MT output while recording the actual time it takes;
4. If you wish to share your times with others so that we can make comparisons and have a richer feedback session, let us know your estimated and actual times via http://goo.gl/7zxJM9
5. Check back for the follow-up webinar and results via http://www2.gotomeeting.com/register/458586994
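The two texts will rarely be exactly the same length, so a couple of simple ratios make the comparison between your estimated and actual times fairer. The sketch below normalises by word count (words per hour) and reports how far each estimate was off. The `Task` type, the helper names and the example figures are hypothetical and are not results from the exercise.

```python
from dataclasses import dataclass


@dataclass
class Task:
    label: str
    words: int             # source word count
    estimated_min: float   # your estimate, in minutes
    actual_min: float      # measured time, in minutes

    def words_per_hour(self) -> float:
        return self.words / self.actual_min * 60

    def estimate_error_pct(self) -> float:
        """Positive when the task took longer than estimated."""
        return (self.actual_min - self.estimated_min) / self.estimated_min * 100


def throughput_gain_pct(manual: Task, post_edit: Task) -> float:
    """Throughput of PE relative to manual translation, in percent."""
    return (post_edit.words_per_hour() / manual.words_per_hour() - 1) * 100


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    manual = Task("manual translation", words=200, estimated_min=40, actual_min=50)
    pe = Task("post-editing", words=210, estimated_min=20, actual_min=30)
    for task in (manual, pe):
        print(f"{task.label}: {task.words_per_hour():.0f} words/hour, "
              f"estimate off by {task.estimate_error_pct():+.0f}%")
    print(f"PE throughput gain: {throughput_gain_pct(manual, pe):+.0f}%")
```

With only two short, different texts, text difficulty remains a confound. Treat the result as a rough indication to discuss in the follow-up session, not as a measurement.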
Online Resources
• PET:
o stand-alone, open-source tool to post-edit and assess machine or human translations while gathering detailed statistics about post-editing time, amongst other effort indicators - http://pers-www.wlv.ac.uk/~in1676/pet/
• MateCAT:
o web-based CAT tool that uses MT, machine learning and quality estimation techniques, where post-editing can be carried out and learned from - http://www.matecat.com/matecat/the-project/
• Google Translator Toolkit:
o self-serve TM, MT, and post-editing environment in the cloud - http://translate.google.com/toolkit
• Accept:
o European project to improve PE and MT, with its own environment - http://www.accept-project.eu/
• Microsoft Translator Hub:
o self-serve TM, MT, and post-editing environment in the cloud - http://hub.microsofttranslator.com/
• KantanMT:
o self-serve TM, MT, and post-editing environment in the cloud, with automated post-editing expressions (PEX) to enhance manual PE - http://www.kantanmt.com/help_about_pex.php
• SmartMATE:
o self-serve TM, MT, and post-editing environment in the cloud - http://www.smartmate.co/
• More information on translation quality assessment, quality estimation, and industry reports on translation technology, including evaluation and training - http://www.qt21.eu/launchpad/content/training
Thank you for your attention!
Q & A
Suggested Readings
Allen, J. (2003) “Post-editing”. Chapter 16 in Somers, H. (ed.) Computers and Translation: A Translator’s Guide. Amsterdam and Philadelphia: John Benjamins, 297-317.
Petrits, A., F. Braun-Chen, J.M. Martínez García, C. Ross, R. Sauer, A. Torquati & A. Reichling (2001) “The Commission’s MT System: Today and Tomorrow”. In Maegaard, B. (ed.) Proceedings of the MT Summit VIII. European Association for Machine Translation.
Senez, D. (1998a) “The Machine Translation Help Desk and the Post-Editing Service”. Terminologie & Traduction, 1, 1998. European Commission: OPOCE.
Senez, D. (1998b) “Post-editing service for machine translation users at the European Commission”. In Proceedings of Translating and the Computer 20. Aslib.
Wagner, E. (1985) “Post-editing Systran – A challenge for Commission Translators”. Terminologie & Traduction, 3, 1985. European Commission: OPOCE.
Guerberof Arenas, Ana (2009) “Productivity and Quality in the Post–editing of Outputs from Translation Memories and Machine Translation”. Localisation Focus 7(1): 11-21. http://isg.urv.es/library/papers/2009_Ana_Guerberof_Vol_7-11.pdf
Guerberof Arenas, Ana (2013) “What do professional translators think about post-editing?”. The Journal of Specialised Translation 19: 75-95. www.jostrans.org/issue19/art_guerberof.pdf
O’Brien, Sharon (2002) “Teaching Post-editing: A Proposal for Course Content”. Proceedings of the 6th EAMT Workshop on “Teaching Machine Translation”. EAMT/BCS, UMIST, Manchester, UK. 99-106. http://mt-archive.info/EAMT-2002-OBrien.pdf
Poulis, Alexandros and David Kolovratnik (2012) “To Post-edit or not to Post-edit? Estimating the Benefits of MT Post-editing for a European Organization”. Proceedings of the AMTA 2012 Workshop on Post-editing Technology and Practice (WPTP 2012). http://amta2012.amtaweb.org/AMTA2012Files/html/9/9_paper.pdf
Moorkens, J., Doherty, S., O’Brien, S. & Kenny, D. (2013) “A virtuous circle: laundering translation memory data using statistical machine translation”. Perspectives: Studies in Translatology. http://www.tandfonline.com/eprint/dUaZx8QXKFS5aUBISbBM/full