Challenges in Predicting Machine Translation Utility for Human Post-Editors
Michael Denkowski and Alon Lavie
Language Technologies Institute, Carnegie Mellon University
October 29, 2012
[Diagram: an MT system turns source text into a fast translation; human translators turn source text into a good translation. Can we get a good, fast translation?]
MT with Human Post-Editing
[Diagram: the MT system's fast translation is handed to translators, who post-edit it into a good, fast translation. When the MT output is not usable, the outcome is instead a very slow re-translation.]
Introduction
Utility prediction: we need to reliably predict the usability of automatic translations.
"Referenceless" utility prediction:
• Corresponds to the confidence estimation task
• Confidence estimation for post-editing (Specia, 2011)
• WMT 2012 Shared Quality Estimation Task, for post-editing (Callison-Burch et al., 2012)
Reference-aided utility prediction:
• Corresponds to the MT evaluation task
• This work
This Work
Machine translation as a starting point for human translators
• Goal is utility for post-editing
• Compare post-editing to traditional adequacy-driven tasks
Examine results of a post-editing experiment
• Simulate a real-world localization scenario
• Examine challenges in predicting translation usefulness for human translators
Adequacy Tasks
Adequacy: semantic similarity to reference translations
Significant research efforts on improving end quality of machine translation:
• ACL Workshops on Statistical Machine Translation (Callison-Burch et al., 2011)
• NIST Open Machine Translation Evaluations (Przybocki et al., 2009)
Measured by absolute scores or rankings
Motivation: MT for user consumption, input for other NLP tasks
Post-Editing
Human-targeted translation edit rate (HTER, Snover et al., 2006)
1. Human translators correct MT output
2. Automatically calculate number of edits using TER
TER = (# of edits) / (# of reference words)
Edits: insertion, deletion, substitution, block shift
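As a concrete reference, here is a minimal sketch of the TER computation in Python. It assumes whitespace tokenization and ignores TER's block-shift operation, so it reduces to word-level Levenshtein distance; real TER implementations additionally search for block shifts at cost 1.

    def simple_ter(hypothesis, reference):
        """Word-level edit distance (insert/delete/substitute) divided by
        reference length. Full TER additionally allows block shifts."""
        hyp, ref = hypothesis.split(), reference.split()
        # d[i][j] = edits needed to turn hyp[:i] into ref[:j]
        d = [[0] * (len(ref) + 1) for _ in range(len(hyp) + 1)]
        for i in range(len(hyp) + 1):
            d[i][0] = i
        for j in range(len(ref) + 1):
            d[0][j] = j
        for i in range(1, len(hyp) + 1):
            for j in range(1, len(ref) + 1):
                sub = 0 if hyp[i - 1] == ref[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,        # delete from hyp
                              d[i][j - 1] + 1,        # insert into hyp
                              d[i - 1][j - 1] + sub)  # substitute or match
        return d[len(hyp)][len(ref)] / len(ref)

For HTER, the "reference" passed in is the human post-edited version of the MT output rather than an independent reference translation.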
Translation Example: WMT 2011 Czech–English Track
Ref: He was supposed to pay half a million to Lubos G.
1: He had for Lubosi G. to pay half a million crowns. (HTER: 0.27)
Post-edited: He had to pay Lubos G. half a million crowns.
2: He had to pay lubosi G. half a million kronor. (HTER: 0.09)
Post-edited: He had to pay Lubos G. half a million kronor.
Translation Example: WMT 2011 Czech–English Track
Ref: The problem is that life of the lines is two to four years.
1: The problem is that life is two lines, up to four years. (0.49, 0.29)
Post-edited: The problem is that life of the lines is two to four years.
2: The problem is that the durability of lines is two or four years. (0.34, 0.14)
Post-edited: The problem is that the life of lines is two to four years.
MT Post-Editing Experiment
90 sentences from Google Docs documentation
Translated from English to Spanish by two systems:
• Microsoft Translator
• Moses system (Europarl)
180 MT outputs total
Sent to human translators at the Kent State Institute for Applied Linguistics for post-editing
Translators never saw the reference translations
MT Post-Editing Experiment
Data collected from professional translators (in training):
Post-edited translations
Expert post-editing ratings:
1: No editing required
2: Minor editing, meaning preserved
3: Major editing, meaning lost
4: Re-translate
From parallel data:
Independent reference translations
MT Post-Editing Experiment
Evaluate post-edited results using standard MT evaluation metrics:
BLEU (Papineni et al., 2002):
• n-gram precision with a brevity penalty
TER (Snover et al., 2006):
• Minimum edit distance
Meteor (Denkowski and Lavie, 2011):
• Tunable alignment-based metric
Task: Reference-assisted utility prediction
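For intuition about what BLEU measures, here is a minimal sentence-level sketch (real BLEU is corpus-level, supports multiple references, and the smoothing used here is an illustrative shortcut rather than the standard scheme):

    from collections import Counter
    from math import exp, log

    def simple_bleu(hypothesis, reference, max_n=4):
        """Geometric mean of clipped 1..4-gram precisions times a brevity
        penalty. Corpus BLEU pools n-gram counts over all sentences."""
        hyp, ref = hypothesis.split(), reference.split()
        log_precisions = []
        for n in range(1, max_n + 1):
            hyp_ngrams = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
            ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
            # Clip each n-gram's count by its count in the reference.
            matched = sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
            total = max(sum(hyp_ngrams.values()), 1)
            log_precisions.append(log(max(matched, 0.1) / total))  # crude smoothing
        brevity = 1.0 if len(hyp) >= len(ref) else exp(1 - len(ref) / max(len(hyp), 1))
        return brevity * exp(sum(log_precisions) / max_n)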
MT Post-Editing Results
Average rating: 1.69
Average HTER: 12.4
Automatic metric scores:

                 BLEU   TER    Meteor
    Post-edited  79.2   12.4   90.0
    MT vs Ref    31.7   49.5   58.2
    Post vs Ref  34.1   48.3   59.2
MT Post-Editing Results
    r        4-pt   BLEU   TER    Meteor
    4-point  –      0.32   0.28   0.33
    HTER     0.49   0.26   0.24   0.27
Metric correlation with post-editing scores
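The table's r is presumably plain Pearson correlation between sentence-level metric scores and the human scores; a self-contained sketch (the talk does not specify the exact implementation used):

    def pearson_r(xs, ys):
        """Pearson correlation between two equal-length lists of scores."""
        n = len(xs)
        mean_x, mean_y = sum(xs) / n, sum(ys) / n
        cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        sd_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
        sd_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
        return cov / (sd_x * sd_y)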
MT Post-Editing Experiment
Oracle experiment: tune Meteor to maximize correlation
How well can we (over)fit expert post-editing ratings?
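The oracle tuning loop amounts to a search over Meteor's parameters for the setting whose scores best correlate with the ratings. A hypothetical sketch, where score_with_params stands in for running Meteor under one parameter setting (this is not the actual Meteor tuning code):

    from itertools import product

    def oracle_tune(segments, ratings, param_grid, score_with_params):
        """Grid-search parameter settings; keep the one whose segment-level
        scores correlate best with expert ratings (deliberate overfitting)."""
        best_r, best_setting = float("-inf"), None
        for values in product(*param_grid.values()):
            setting = dict(zip(param_grid.keys(), values))
            scores = [score_with_params(seg, setting) for seg in segments]
            r = pearson_r(scores, ratings)  # from the sketch above
            if r > best_r:
                best_r, best_setting = r, setting
        return best_setting, best_r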
The Meteor Metric
Flexible alignment:
Scoring features:
• Precision/Recall contribution (insertions, deletions)
• Fragmentation penalty (reordering)
• Content/function word contribution
• Flexible match weights
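A stripped-down sketch of Meteor's scoring function using exact matches only (real Meteor also matches stems, synonyms, and paraphrases, solves a minimum-fragmentation alignment, and weights content vs. function words; the parameter defaults below are illustrative, not Meteor's tuned values):

    def simple_meteor(hypothesis, reference, alpha=0.85, beta=1.5, gamma=0.45):
        """Weighted P/R harmonic mean, discounted by a fragmentation penalty
        computed from the number of contiguous chunks of matched words."""
        hyp, ref = hypothesis.split(), reference.split()
        # Greedy exact matching: each hyp word takes the first unused ref word.
        used = [False] * len(ref)
        pairs = []  # (hyp_index, ref_index)
        for i, word in enumerate(hyp):
            for j, rword in enumerate(ref):
                if not used[j] and word == rword:
                    used[j] = True
                    pairs.append((i, j))
                    break
        m = len(pairs)
        if m == 0:
            return 0.0
        precision, recall = m / len(hyp), m / len(ref)
        fmean = precision * recall / (alpha * precision + (1 - alpha) * recall)
        # A new chunk starts whenever consecutive matches are not adjacent
        # in both the hypothesis and the reference.
        chunks = 1 + sum(1 for (i1, j1), (i2, j2) in zip(pairs, pairs[1:])
                         if i2 != i1 + 1 or j2 != j1 + 1)
        penalty = gamma * (chunks / m) ** beta
        return (1 - penalty) * fmean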
MT Post-Editing Results
    r        4-pt   BLEU   TER    Meteor   Meteor (oracle)
    4-point  –      0.32   0.28   0.33     0.35
    HTER     0.49   0.26   0.24   0.27     0.34
Metric correlation with post-editing scores
MT Post-Editing Experiment
Additional experiment: translation usability
Divide translations into two groups:
• Suitable for post-editing (1-2)
• Not suitable for post-editing (3-4)
Examine metric score distribution of each group
Assess metric ability to distinguish between usable and non-usable translations
Unfair advantage: reference translations
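One simple way to quantify the separation the following histograms show is the accuracy of the best single score cutoff; a sketch (my framing, not a statistic reported in the talk):

    def best_threshold_accuracy(scores, usable):
        """Best accuracy achievable by classifying a translation as usable
        (rating 1-2) when its metric score is at least some cutoff t."""
        n = len(scores)
        # Majority-class baseline covers the "no useful cutoff" case.
        best = max(sum(usable), n - sum(usable)) / n
        for t in sorted(set(scores)):
            correct = sum((s >= t) == u for s, u in zip(scores, usable))
            best = max(best, correct / n)
        return best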
Usability Experiment Results
[Histograms: number of sentences by BLEU score (left) and by oracle Meteor score (right), split into usable and non-usable translations.]
Larger Data Set
Are our results skewed by the small size of the data set (180 sentences)?
WMT12 Quality Estimation Task:
1832 English-to-Spanish MT outputs
HTER scores and 5-point multiple-expert ratings
Run usability experiment with this data
WMT 2012 Quality Estimation Task Data
[Histograms: number of sentences by BLEU score (left) and by oracle Meteor score (right) on the WMT 2012 Quality Estimation data, split into usable and non-usable translations.]
Usability vs HTER
How well do experts and HTER agree?
[Histograms: number of sentences by HTER, split into usable and non-usable translations; left: Kent State data, right: WMT 2012 data.]
Usability vs HTER (WMT12)
[Scatter plot: expert rating (1–5) against HTER (0–100) for the WMT12 data.]
Conclusions
MT for post-editing utility is a significantly different task from MT for adequacy.
Current MT tools under-perform when predicting post-editing usability.
Even metrics that use post-editing information (HTER) don't match expert assessments.
To improve post-editing usability, we need better data, better metrics, and better MT systems.
www.transcenter.info
Challenges in Predicting Machine Translation Utility for Human Post-Editors
Michael Denkowski and Alon Lavie
Language Technologies Institute, Carnegie Mellon University
October 29, 2012