EAMT Presentation by Welocalize Olga Beregovaya May 2015

Page 1

What We Want, What We Need, What We Can’t Do Without

The Enterprise User Perspective on MT Technology & Things Around It

Olga Beregovaya, VP, Language Tools

Page 2

ONE THING HAS BECOME ALL THINGS; EVERYTHING

• Global Economy, Multicultural Transactions

• Multiple Demand-Driven Content Scenarios

• Multiple Data Sources + Formats

Page 3

27,500 words - 25,000 emoji lattices

“…and what is the use of a book,” thought Alice, “without pictures or conversations?”

?

Page 4

“The limits of my language mean the limits of my world.” ― Ludwig Wittgenstein

“Never send a human to do a machine’s job.” ― Agent Smith

Page 5

• Areas of utmost interest and importance from a major Language Service Provider’s perspective, or “a week in the life of a Language Technology group”

• Engine output quality: what actually is “output quality”? Are we “stuck”? What will the next breakthrough be?

• Domain adaptation: what can we do when there is neither data nor budget to create it?

• Supporting “raw” publishing scenarios (UGC, support, MT to be consumed by other applications): there will be no human to fix it. Or will there?

• Metadata: an enemy or an ally?

• Collaboration: how can we make it interesting for everyone?

AND WE’LL TALK ABOUT…

Page 6

• What do translators appreciate?

• What do translators struggle most with?

• Fluency VS. Accuracy?

• Final output quality?

WINS & CHALLENGES

Page 7

CONTENT DRIVING QUALITY DECISIONS

LEVELS OF POST-EDITING

FULL POST-EDITING
Content type: Content that meets certain “impact” criteria: visibility, number of clicks, “shelf life”
Strategy: Post-edit to human translation levels; correct for terminology, grammar, fluency, style and voice

MEDIUM POST-EDITING
Content type: Content that meets certain “impact” criteria: visibility, number of clicks, “shelf life”
Strategy: Human translation level requirement, but with flexible style and fluency allowances

LIGHT POST-EDITING
Content type: Emphasis on quick turnaround and/or large volumes
Strategy: Sliding scale depending on the content purpose

Page 8

TRAVEL

Source (Spanish): Si queréis viajar en grupo, sóis al menos 6 personas y deseáis ir en otras fechas diferentes a las propuestas, preguntarnos porque lo podemos organizar ;-)
Transcreation: If there are at least 6 of you wishing to travel together, but on different dates than the ones offered, why don’t you ask us? We can arrange it. ;- )
Translation: If you want to travel in a group, you are at least 6 people and you wish to travel on different dates other than the proposed ones, let us know because we can organize it.
Full post-editing: If you want to travel in a group, you are at least 6 people and you wish to travel on different dates other than the proposed ones, let us know because we can organize it.
Light post-editing: If you want to travel in groups , are at least 6 people and wish to go on dates other than the proposed, ask us because we can organize ;- )
Raw MT: If you want to travel in groups , sóis at least 6 people and wish to go on dates other than the proposals, ask why we can organize ;- )

FINANCE

Source (Spanish): A la hora de generar las cuotas de amortización es posible utilizar dos porcentajes distintos; el fiscal o el de mercado que la empresa establezca.
Transcreation: At the time of generating amortization fees, one might use two different percentages: Fiscal or Market established by the company.
Translation: When generating the amortization fees, it is possible to use two different percentages; the fiscal percentage and the company established market one.
Full post-editing: When generating the amortization fees, it is possible to use two different percentages; the fiscal percentage and the company established market one.
Light post-editing: When generating the amortization fees is possible to use two different percentages; the fiscal or the market rate that the company stated.
Raw MT: When generating the repayment is possible to use two different rates; the prosecutor or the market that the company stated.

LEGAL

Source (Spanish): Para el supuesto de que haya contratado el servicio de actualizaciones relativo al programa software objeto de licencia, usted podrá actualizar el mismo durante los periodos de vigencia que tenga contratado el servicio.
Transcreation: You can update the licensed software during the period stated in your contract.
Translation: In case you have signed up for the update service related to the licensed software, you will be able to update the software during the validity periods stated in the contract.
Full post-editing: In case you have signed up for the update service related to the licensed software, you will be able to update the software during the validity periods stated in the contract.
Light post-editing: Because you engaged the update service related to the licensed software program, you can update it during periods of validity of the service contracted
Raw MT: For the assumption that engaged the update service software on the licensed program, you can update it during periods of validity has contracted the

CONSUMING MT — QUALITY SCENARIOS

Page 9

THE POST-EDITOR PRODUCES: Publishable quality

The post-editor is responsible for ensuring that the client’s quality requirements and style guide are met.

The post-editor is expected to adhere to the client’s style guide preferences with regard to:

• Infinitive / Imperative

• Passive / Impassive

• Formal / Informal

• Different Styles for Headers, Lists, Tables

• Special Handling of UI Options (Bilingual, English, Target?)

• Converting All the Measurements Based On the Local Conventions

+ Disambiguate Terminology

+ Correct all the grammatical errors

Page 10

THE POST-EDITOR RECEIVES:

ERROR TYPE GERMAN FRENCH JAPANESE RUSSIAN CHINESE SPANISH ITALIAN BRAZILIAN

WRONG TERMINOLOGY 6.46 4.93 13.63 5.00 6.20 9.63 3.78 1.13

WRONG SPELLING 2.00 0.86 0.88 0.13 0.30 1.13 0.56 1.27

SOURCE NOT TRANSLATED 6.38 5.36 3.88 5.13 3.60 2.50 1.22 1.73

COMPLIANCE WITH CLIENT SPECS 2.46 0.86 3.00 2.13 0.70 0.63 0.44 2.60

LITERAL TRANSLATION 7.85 8.64 5.00 4.00 9.40 5.38 7.67 7.93

TEXT/INFO ADDED 2.69 1.36 2.13 1.25 0.80 1.88 0.44 0.80

CAPITALIZATION 2.69 3.43 0.00 2.63 0.50 1.75 3.33 2.60

WRONG WORD FORM 6.77 7.79 0.13 9.88 0.60 6.75 3.67 6.75

WRONG PART OF SPEECH 2.62 3.21 2.00 1.88 0.60 2.13 3.67 1.33

PUNCTUATION 4.46 3.00 0.75 3.38 4.10 2.13 1.22 3.53

SENTENCE STRUCTURE 12.54 10.00 14.25 8.00 13.00 5.38 6.11 3.67

TAGS + MARK-UP 1.23 0.14 0.13 0.50 0.20 0.38 0.44 0.20

LOCALE ADAPTATION 0.46 0.29 0.75 0.63 0.20 0.75 0.44 0.13

SPACING 0.92 0.36 2.25 1.25 4.00 0.50 0.33 0.40

OTHER 1.92 1.50 1.88 0.13 0.50 0.13 1.44 0.27

TOTAL ERRORS 61.46 51.71 50.63 45.88 44.70 41.00 34.78 32.53

Page 11

Most time-consuming issues that translators need to fix are:

• Sentence structure (word order)

• MT output too literal

• Wrong terminology

• Word form disagreements

• Source term left untranslated

OR, IN A NUTSHELL…
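The list above can be cross-checked against the error table on the previous slide. The sketch below sums each error type across the eight languages. This is an unweighted sum that ignores differences in sample size and in how long each fix takes, so it is only a rough proxy for effort; with the table's numbers, the top five come out as sentence structure, literal translation, wrong terminology, wrong word form and source not translated, in the same order as the list.

```python
# Error counts per language, copied from the "THE POST-EDITOR RECEIVES" table.
# Column order: German, French, Japanese, Russian, Chinese, Spanish, Italian, Brazilian.
ERRORS = {
    "Wrong terminology":            [6.46, 4.93, 13.63, 5.00, 6.20, 9.63, 3.78, 1.13],
    "Wrong spelling":               [2.00, 0.86, 0.88, 0.13, 0.30, 1.13, 0.56, 1.27],
    "Source not translated":        [6.38, 5.36, 3.88, 5.13, 3.60, 2.50, 1.22, 1.73],
    "Compliance with client specs": [2.46, 0.86, 3.00, 2.13, 0.70, 0.63, 0.44, 2.60],
    "Literal translation":          [7.85, 8.64, 5.00, 4.00, 9.40, 5.38, 7.67, 7.93],
    "Text/info added":              [2.69, 1.36, 2.13, 1.25, 0.80, 1.88, 0.44, 0.80],
    "Capitalization":               [2.69, 3.43, 0.00, 2.63, 0.50, 1.75, 3.33, 2.60],
    "Wrong word form":              [6.77, 7.79, 0.13, 9.88, 0.60, 6.75, 3.67, 6.75],
    "Wrong part of speech":         [2.62, 3.21, 2.00, 1.88, 0.60, 2.13, 3.67, 1.33],
    "Punctuation":                  [4.46, 3.00, 0.75, 3.38, 4.10, 2.13, 1.22, 3.53],
    "Sentence structure":           [12.54, 10.00, 14.25, 8.00, 13.00, 5.38, 6.11, 3.67],
    "Tags + mark-up":               [1.23, 0.14, 0.13, 0.50, 0.20, 0.38, 0.44, 0.20],
    "Locale adaptation":            [0.46, 0.29, 0.75, 0.63, 0.20, 0.75, 0.44, 0.13],
    "Spacing":                      [0.92, 0.36, 2.25, 1.25, 4.00, 0.50, 0.33, 0.40],
    "Other":                        [1.92, 1.50, 1.88, 0.13, 0.50, 0.13, 1.44, 0.27],
}


def rank_error_types(errors: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Sort error types by their unweighted sum across all languages, largest first."""
    totals = {name: sum(counts) for name, counts in errors.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, total in rank_error_types(ERRORS)[:5]:
        print(f"{name:<22} {total:6.2f}")
```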

Page 12

TOP 6 ON THE TRANSLATORS’ LOVE-IT LIST

1. Source of inspiration: reduces thinking and translation choice time

2. Provides reference: very useful to translators new to a specific domain

3. Reduces typing and lookup time by handling repetitive terminology and structures well

4. …thereby takes away the more monotonous efforts of translation

5. Post-editors notice improvements over time, and appreciate it more if they “co-own” the engine

6. MT output can be funny

LOL!

Page 13

TOP 3 ON THE TRANSLATORS’ S*#!T-LIST

1. Wrong sentence structure
• Major impact on the post-editing effort (Spanish and Portuguese produce the fewest errors)
• Japanese has the highest error rate and the lowest productivity gains (supported by the cognitive effort error ranking research)

2. Wrong and inconsistent terminology
• Very time-consuming to check and fix terminology, on top of the issues that already come from fuzzy matches
• A major problem for new products where the terminology is not settled yet
• Inconsistent output for UI references

3. Correcting MT to an agreed standard (= quality expectations)
• A challenging concept for post-editors in the beginning: they think they should edit less if the quality is bad

Page 14

FEEDBACK LOOP

Source text: Single-phase options range from 1.4kW to 7.7kW while three-phase PDUs, packed with output receptacles, range from 8.6kW to 21.6kW.
MT output: Single-fase 7.7kW Opties variëren van 1.4kW om en driefasige PDU's, boordevol Output-aansluitingen, variëren van 8,6 kW tot 21.6kW.
Post-edited output: 1,4 kW ... 7,7 kW ... 21,6 kW
Specific errors/changes made: Numbers and measurement units are not converted properly and no spaces are inserted by the MT engine (3 out of 4 occurrences; strangely, 1 is correct).

MT output: • Biedt maximaal 24 TB <fmt id="1" tooltip="SUPERSCRIPT" endtooltip="SUPERSCRIPT"> 2 </fmt> maximale capaciteit per-uitbreidingsbehuizing toe te voegen.
Post-edited output: • Biedt een maximale capaciteit van 24 TB<fmt id="1" tooltip="SUPERSCRIPT" endtooltip="SUPERSCRIPT">2</fmt> per uitbreidingsbehuizing.
Specific errors/changes made: No space should be inserted before or after a superscript number (in this case a "2"): ...>2<... and not: > 2 <

Source text: <fmt id="1" tooltip="b" endtooltip="b">Interface Speed:</fmt> 6 Gb/s SAS
MT output: <fmt id="1" tooltip="b" endtooltip="b"> Interfacesnelheid: 6 </fmt> Gb/s SAS
Specific errors/changes made: The number is inserted before the tag and should be after the tag.

Source text: <fmt id="1" tooltip="b" endtooltip="b">Intermixed Drive Capacities:</fmt> Yes
MT output: <fmt id="1" tooltip="b" endtooltip="b"> Intermixed Capaciteit van de schijven: Ja </fmt>
Post-edited output: ...</fmt> Ja
Specific errors/changes made: The string is inserted before the tag and should be after the tag (and again, spaces are inserted before and after the tags).

Source text: A new feature — DR Rapid Data Access — adds tighter integration with backup software applications, starting with Symantec OpenStorage-enabled backup applications.
MT output: Een nieuwe functie - DR-Rapid Data Access - voegt strakkere integratie met back-uptoepassingen, beginnend met Symantec OpenStorage geschikte back-uptoepassingen.
Post-edited output: ... — DR Rapid Data Access — ...
Specific errors/changes made: Please ensure any special characters like — (ChrW(151)) are preserved when inserting a TM proposal, and not replaced by a normal hyphen (ChrW(45)).

Can these errors be learned and corrected automatically? Can we simplify or omit the “feedback loop”?
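Some of the feedback above is regular enough to be handled by post-processing rules instead of being fed back by hand. A minimal sketch, assuming English-to-Dutch output that uses the `<fmt>` inline markup shown above; the rule set and all function names are hypothetical, the rules are hand-written rather than learned, and the bold-tag placement cases are not covered.

```python
import re

# Hypothetical post-processing rules for English->Dutch MT output,
# written from the post-editor feedback above (rule-based, not learned).

SUPERSCRIPT_OPEN = r'<fmt[^>]*tooltip="SUPERSCRIPT"[^>]*>'
EM_DASH = "\u2014"


def fix_decimal_units(text: str) -> str:
    """'1.4kW' -> '1,4 kW': Dutch decimal comma plus a space before the unit."""
    return re.sub(r"(\d+)\.(\d+)\s*(kW|TB)\b", r"\1,\2 \3", text)


def fix_superscript_spacing(text: str) -> str:
    """Remove spaces the engine inserted around a superscript tag pair."""
    # Space in front of the opening tag: "TB <fmt ...>" -> "TB<fmt ...>"
    text = re.sub(r"\s+(" + SUPERSCRIPT_OPEN + ")", r"\1", text)
    # Space just inside the opening tag: "<fmt ...> 2" -> "<fmt ...>2"
    text = re.sub("(" + SUPERSCRIPT_OPEN + r")\s+", r"\1", text)
    # Space before the closing tag: "2 </fmt>" -> "2</fmt>"
    text = re.sub("(" + SUPERSCRIPT_OPEN + r"[^<]*?)\s+(</fmt>)", r"\1\2", text)
    return text


def restore_em_dashes(source: str, target: str) -> str:
    """If the source used em dashes, turn spaced hyphens in the target back into them."""
    if EM_DASH not in source:
        return target
    return target.replace(" - ", f" {EM_DASH} ")


def post_process(source: str, target: str) -> str:
    """Apply all rules in a fixed order."""
    target = fix_decimal_units(target)
    target = fix_superscript_spacing(target)
    return restore_em_dashes(source, target)
```

Rules like these are cheap to run, but each one encodes a single client- and language-specific convention, which is exactly why the question of learning such corrections automatically remains open.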

Page 15

• How much more can we squeeze out of phrase-based SMT systems?

• Factored models?

• Deep syntactic/semantic structures?

• Have a closer look at rule-based systems?

• Deep Learning?

TRANSLATION QUALITY

Page 16

QUALITY DEGRADATION WITH POST-EDITING?

Page 17

POST-EDITING QUALITY RESULTS

No fails in one of our 28-language PE programs, thanks to correct terminology choices and few, consistent errors.

Page 18

DOMAIN ADAPTATION

• How much can we get out of minimal amounts of data? A little more data? Mixed-domain data?

• Forcing dictionaries: fluency vs. adequacy? How can we seamlessly integrate client/user dictionaries into standard SMT workflows?

• How often should we retrain?

• Does dynamic/interactive/“live” retraining help solve the domain relevance problem?
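One way to force dictionary terms in a phrase-based SMT workflow is to pre-annotate the input with the required translations. The Moses decoder, for example, accepts inline XML markup carrying a `translation` attribute when run with its `-xml-input` option; in exclusive mode only the forced translation is used, in inclusive mode it competes with the phrase table, which is one concrete form of the fluency vs. adequacy trade-off above. The sketch only produces that kind of markup: the tag name `term`, the whitespace tokenization and the glossary are hypothetical, and terms with attached punctuation ("address,") will not match.

```python
from xml.sax.saxutils import escape, quoteattr


def force_terms(sentence: str, glossary: dict[str, str]) -> str:
    """Wrap glossary terms in Moses-style XML markup; all other text is XML-escaped."""
    tokens = sentence.split()
    # Try longer (multi-word) terms before shorter ones; skip empty entries.
    terms = [t for t in sorted(glossary, key=lambda t: len(t.split()), reverse=True) if t.split()]
    out = []
    i = 0
    while i < len(tokens):
        for term in terms:
            term_tokens = term.lower().split()
            n = len(term_tokens)
            # Case-insensitive match of the term against the tokens starting at i.
            if [t.lower() for t in tokens[i:i + n]] == term_tokens:
                matched = " ".join(tokens[i:i + n])
                out.append(f"<term translation={quoteattr(glossary[term])}>{escape(matched)}</term>")
                i += n
                break
        else:  # no glossary term starts at token i
            out.append(escape(tokens[i]))
            i += 1
    return " ".join(out)
```

Escaping the untagged tokens matters because, once XML input is enabled, a bare `&` or `<` in the source would otherwise be read as markup.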

Page 19

“History is filled with brilliant people who wanted to fix things and just made them worse.”

― Chuck Palahniuk

Page 20

CONTENT EXPLOSION

Page 21

HOW USEFUL IS MT FOR UGC?

• We performed an evaluation after normalization and domain customization of the SMT engines.

• Between 54% and 96% of travel reviews scored between 3 and 5 on the Utility scale.

Page 22

WHY DO WE CARE?

BACKPACKER WEBSITE REVIEWS
Translation purpose: youthful, 5 locales, cheap
Approach: MT + normalization; paid crowd with basic instructions on “do’s + don’ts”; the crowd can be a mix of translators / customers / …

LUXURY HOTEL REVIEWS
Translation purpose: attract high-end clientele in 1 particular target market
Approach: possibly MT with full PE, but possibly professional HT / transcreation

TECHNICAL FORUM
Translation purpose: save cost on user support, as many locales as possible
Approach: MT + “accuracy check” PE; crowd of technical users who are savvy on the product; linguistic errors are OK

Page 23

Global Commerce

Global Consumer

Pandora’s Box of Brand Names and Geographic Locations

Page 24

SOURCE CONTENT GONE WILD

• Short forms (nite (night), sayin (saying), gr8 (great))

• Acronyms (lol (laugh out loud), iirc (if I remember correctly))

• Typing errors/misspellings (wouls (would), rediculous (ridiculous))

• Punctuation omissions/errors (im (I’m), dont (don’t))

• Non-dictionary slang (that was well mint (that was very good))

• Wordplay (that was soooooo great (that was so great))

• Censor avoidance (sh1t, f***)

• Emoticons (:) (smileys), <3 (heart))

• Foreign words used intentionally (al dente, bon voyage)

(Jiang et al., 2012; Clark & Araki, 2011)
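Several of the categories above can be undone by a normalization step before the text reaches an engine trained on edited prose. A minimal dictionary-and-rules sketch, assuming a small hypothetical lookup table built only from the examples on this slide; a real normalizer needs a far larger, domain-tuned lexicon, and it should leave emoticons and intentional foreign words alone, which this one does only because it rewrites alphanumeric tokens that are in its table or contain long character runs.

```python
import re

# Hypothetical lookup table built from the slide's examples only.
SHORT_FORMS = {
    "nite": "night", "sayin": "saying", "gr8": "great",
    "wouls": "would", "rediculous": "ridiculous",
    "im": "I'm", "dont": "don't",
}


def squeeze_repeats(word: str) -> str:
    """Collapse runs of 3+ identical characters ("soooooo" -> "so").

    Crude: it would also turn "goooood" into "god", so real systems usually
    check the candidates against a dictionary before choosing one.
    """
    return re.sub(r"(\w)\1{2,}", r"\1", word)


def normalize(text: str) -> str:
    """Normalize alphanumeric tokens only; punctuation and emoticons pass through."""
    def fix(match: re.Match) -> str:
        word = squeeze_repeats(match.group(0))
        return SHORT_FORMS.get(word.lower(), word)

    return re.sub(r"[A-Za-z0-9]+", fix, text)
```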

Page 25

TRAVEL USER REVIEWS

Non-native writers, typos, grammar errors, two authors with completely different styles & opinions, idioms don’t make sense…

this is our source text.

Page 26

NORMALIZING TRAVEL CONTENT

Any other normalization techniques?

Page 27

ONLINE RETAILER GLOBAL LISTINGS

Page 28

BEHIND THE SCENES

Proper Name Ant Farm

Source: Basic MIDI Applications (Keyboard Magazine Library for Electronic Musicians), Ca
MT output: Grundlegende MIDI-Anwendungen (Tastatur Magazin Library für elektronische Musiker), Ca

Source: Analog Way smart cut 2 seamless video & computer switcher - hi res scaled output
MT output: Analoge Weise smart cut 2 nahtlose Video & Computer-Umschalter - Hallo Res skalierte Ausgabe

Source: TOO FAST TOP RAT BABY FANG VAMPIRE LIPS LEOPARD ROCKABILLY PINUP USA M IRON FIST
MT output: ZU SCHNELL TOP RATTE BABY FANG VAMPIR LIPPEN LEOPARD ROCKABILLY PINUP GIRL USA M IRON FIST

Source: YRU Youth Rise Up Kreep Platform Stacked Leopard Animal Suede Spike Studded Pump
MT output: YRU Jugend Aufstieg Kreep Plattform gestapelt Leopard Tier Wildleder Spike beschlagene Pumpe
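The listings above show what happens when brand and product names reach the engine unprotected: “Analog Way” becomes “Analoge Weise”. A common mitigation is to mask known names with placeholders before translation and restore them afterwards. This is a minimal sketch assuming a list of protected names is available and that the engine passes placeholder tokens such as `__DNT0__` through untouched; both the placeholder format and the function names are hypothetical, and placeholder survival has to be verified for each engine.

```python
import re


def mask_protected(text: str, protected: list[str]) -> tuple[str, dict[str, str]]:
    """Replace protected names with placeholders; return the masked text and the mapping."""
    mapping: dict[str, str] = {}
    # Longest names first, so a longer name is not broken up by a shorter one it contains.
    for i, name in enumerate(sorted(protected, key=len, reverse=True)):
        placeholder = f"__DNT{i}__"
        pattern = re.compile(r"\b" + re.escape(name) + r"\b")
        text, count = pattern.subn(placeholder, text)
        if count:
            mapping[placeholder] = name
    return text, mapping


def unmask(text: str, mapping: dict[str, str]) -> str:
    """Put the original names back into the translated text."""
    for placeholder, name in mapping.items():
        text = text.replace(placeholder, name)
    return text
```

Masking removes the name from the sentence the engine sees, so gender, case or agreement that depends on it can come out wrong; that is the “without breaking the flow of the target sentence” problem raised later in the deck.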

Page 29

DO WE WANT TO MESS WITH THIS THING?

• Words missing in the target / extra words

• Terms are translated with different capitalization within the same message

• Incorrect positive / negative translation

• Lack of fluency

• Mix of formal and informal forms of address

• Wrong translation for the context

Page 30

WHAT IS “QUALITY” FOR A GERMAN TOWER CRANE OPERATOR?

• Do I close or not close the valve before re-pressurizing?

• Do you mean a wheel or the pulley?

Page 31

Punctuation and numbers:

• Handling locale-specific punctuation: “Security Systems” to 「セキュリティ システム」

• Slashes: space or no space? (on/off to вкл / выкл)

• Number formatting, e.g. trailing characters: 33 % or 33%?

• Target language/locale numbering conventions: 44,500 to 44.500

• Intelligently matching punctuation, e.g. removing unmatched quotes and parentheses

• Keep source capitalization or replace it with target language capitalization conventions?

• Translate or transliterate addresses based on target language conventions

• OOV handling: leave in the source language or transliterate (and flag for the user?)

• Handling DoNotTranslate without breaking the flow of the target sentence

• Handling acronyms: does it expand? Which part of it goes in the parentheses?

• Recognizing groups as proper names (Herr Vogel is not a bird!)

A HUGE ONE: Preserving the negative or positive meaning (“Do remove this part” vs. “Do not remove this part”); handling standalone negation in the source with affixed negation in the target

AN EQUALLY HUGE ONE: language identification, at the file level and even more so at the sub-file/sentence level

UGC SHOPPING LIST
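Some items on this list are mechanical enough to sketch. This one converts en-US-formatted numbers (the 44,500 and 33% examples above) to a target-locale convention. The `CONVENTIONS` table is a hypothetical two-entry stand-in for real locale data such as Unicode CLDR, and the regular expression is deliberately naive: it will also rewrite version numbers, dates and anything else that merely looks like a number.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class NumberConvention:
    thousands: str       # grouping separator
    decimal: str         # decimal separator
    percent_space: bool  # True -> "33 %", False -> "33%"


# Illustrative values only; production code should take these from CLDR locale data.
CONVENTIONS = {
    "en-US": NumberConvention(",", ".", False),
    "de-DE": NumberConvention(".", ",", True),
}

# An en-US-formatted number: optional thousands groups, optional decimals,
# optional percent sign with or without a preceding space.
EN_NUMBER = re.compile(r"(\d{1,3}(?:,\d{3})+|\d+)(?:\.(\d+))?(\s?%)?")


def localize_numbers(text: str, locale: str) -> str:
    """Rewrite every en-US-style number in `text` using the target locale's convention."""
    conv = CONVENTIONS[locale]

    def convert(m: re.Match) -> str:
        out = m.group(1).replace(",", conv.thousands)
        if m.group(2) is not None:
            out += conv.decimal + m.group(2)
        if m.group(3):
            # A no-break space keeps "33 %" from wrapping across lines.
            out += "\u00a0%" if conv.percent_space else "%"
        return out

    return EN_NUMBER.sub(convert, text)
```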

Page 32

WORKFLOW, FORMATTING + METADATA

Page 33

QUIZ: COUNT THE INTEGRATION

Page 34

‘Segment’ vs. ‘Sentence’

A segment can be many things: a sentence, part of a sentence, a single word. So if the engine is integrated at the “segment level”, ellipsis, anaphora and other context-related features will not be taken into account.

It doesn’t take much to break segmentation: a line break, a carriage return or anything ambiguous will do it, causing damage on both the training and the runtime side.

WHAT IS A ‘SEGMENT’?
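To illustrate how little it takes to break segmentation: a line-oriented filter turns a soft line break into a segment boundary, so the engine receives two half-sentences. A minimal sketch, assuming single line breaks inside a paragraph are soft wraps and blank lines separate paragraphs; both function names are hypothetical, and the sentence splitter is a naive punctuation rule that will mis-split abbreviations such as “e.g.”.

```python
import re


def segment_by_lines(text: str) -> list[str]:
    """What a line-oriented filter effectively does: every line break ends a segment."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def segment_sentences(text: str) -> list[str]:
    """Treat single line breaks as soft wraps and blank lines as paragraph breaks,
    then split after sentence-final punctuation followed by whitespace."""
    segments = []
    for paragraph in re.split(r"\n\s*\n", text):
        flat = " ".join(paragraph.split())  # unwrap soft line breaks
        segments.extend(s for s in re.split(r"(?<=[.!?])\s+", flat) if s)
    return segments
```

With the input `"To lift your sending limit, please confirm\nyour email address. Then add a card."`, the first function yields a dangling “please confirm” segment, while the second yields the two intended sentences.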

Page 35

“A camel is a horse designed by a committee.”

― Sir Alexander Arnold Constantine Issigonis

Page 36

LOCALIZATION TAG PLACEMENT

This is what a plain-text engine will do:

To become verified and lift your sending limit, please confirm your email address, then add a credit or prepaid card to your account and {30} {31} {32} {33} {34} {35}confirm{36} {37} {38} it.{39}.

{30}Para hacerse verificado y levantar su límite de envío, por favor confirme su dirección de correo electrónico, luego añada un crédito o tarjeta de prepago a su cuenta de y confírmelo.{31}{32}{33}{34}{35}{36}{37}{38}{39}

Page 37

This is a<ph id="1" x="&lt;b&gt;">{1}</ph>test<ph id="1" x="&lt;/b&gt;">{2}</ph>

Dies ist ein <ph id="1" x="&lt;b&gt;">{1}</ph>Test<ph id="1" x="&lt;/b&gt;">{2}</ph>.

AND THIS IS WHAT’S NEEDED

Page 38

TAG PROJECTION TECHNIQUES?

• We’ll be happy to consume more information, but then please expose more information

• Walls, zones, pre-processing, post-processing – can we do more?
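One widely used answer to the tag projection question is to carry tags across through a word alignment: each tagged source span is mapped to the smallest target span that covers its aligned words. The sketch assumes the alignment is already available (for example from the decoder or an external aligner), that spans are inclusive source-token indices listed outer-first, and that tags with no aligned target word fall back to the end of the segment, which is roughly what the plain-text engine did in the example two slides back. All names are hypothetical.

```python
from typing import List, Optional, Tuple

Alignment = List[Tuple[int, int]]  # (source token index, target token index) pairs


def project_span(start: int, end: int, alignment: Alignment) -> Optional[Tuple[int, int]]:
    """Map an inclusive source span to the smallest target span covering its aligned words."""
    targets = [t for s, t in alignment if start <= s <= end]
    if not targets:
        return None
    return min(targets), max(targets)


def project_tags(tgt_tokens: List[str], tagged_spans, alignment: Alignment) -> str:
    """tagged_spans: (start, end, open_tag, close_tag) over source tokens, outer spans first."""
    prefix = [""] * len(tgt_tokens)
    suffix = [""] * len(tgt_tokens)
    unplaced = []
    for start, end, open_tag, close_tag in tagged_spans:
        projected = project_span(start, end, alignment)
        if projected is None:
            unplaced.append(open_tag + close_tag)
            continue
        lo, hi = projected
        prefix[lo] += open_tag               # outer spans come first, so they stay outermost
        suffix[hi] = close_tag + suffix[hi]  # inner closing tags go before outer ones
    text = " ".join(p + tok + s for p, tok, s in zip(prefix, tgt_tokens, suffix))
    # Fallback: tags whose words have no aligned target word go to the end of the segment.
    return text + "".join(unplaced)
```

The quality of the result is bounded by the alignment, which is why exposing more information (alignments, tag types, zones) from the engine side matters.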

Page 39

AN ENEMY OR AN ALLY?

• Domain, sub-domain, product

• Timestamps: deprecate TUs in the training data?

• XLIFF metadata fields that carry information about specific terms

• UI strings and other variables markup

• Annotation fields

Can localization metadata be helpful for MT?
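For the timestamp question, one option short of deleting old TUs is to down-weight them by age and filter by domain. A minimal sketch, assuming each TU carries creation-date and domain metadata; the `TU` class, the one-year half-life and the 0.25 cut-off are illustrative choices, not recommendations.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class TU:
    """Hypothetical translation unit carrying the metadata discussed above."""
    source: str
    target: str
    created: date
    domain: str


def tu_weight(tu: TU, today: date, half_life_days: float = 365.0) -> float:
    """Exponential decay: a TU one half-life old counts half as much as a new one."""
    age_days = max((today - tu.created).days, 0)
    return 0.5 ** (age_days / half_life_days)


def select_training_tus(tus, today: date, domain: str, min_weight: float = 0.25):
    """Keep in-domain TUs whose age-based weight is at or above the cut-off."""
    selected = []
    for tu in tus:
        if tu.domain != domain:
            continue
        weight = tu_weight(tu, today)
        if weight >= min_weight:
            selected.append((tu, weight))
    return selected
```

The returned weights can be used directly where the training pipeline supports instance weighting, or simply to decide which TUs to keep.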

Page 40

WE WANT TO SHARE

• Human evaluation and ranking

• Source/MT/Edits corpora (for experimentation only)

• Productivity data per segment (side by side with PE distance and other metrics), thanks to iOmegaT

• Database of correlations between automated scoring, human ranking and PE effort and time

• Data on the correlation between specific errors and translator preference: can it help translator-focused confidence scoring?

We have A LOT of "field" data:

Page 41

DATA

Statistics from internal database

Page 42

CORRELATION RESULTS

Adequacy & Fluency versus Productivity Delta

Productivity and Fluency across all locales, with a cumulative Pearson’s r of 0.77: a very strong correlation

Productivity and Adequacy across all locales, with a cumulative Pearson’s r of 0.71: a very strong correlation

According to our data, human evaluations are stronger predictors of post-editing productivity gains than automatic metrics, including PE distance.
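The deck does not define “PE distance”. A common formulation is the edit distance between the raw MT output and its post-edited version, normalized by length. The sketch below assumes character-level Levenshtein distance divided by the length of the longer string, so 0.0 means untouched and 1.0 means fully rewritten; word-level measures such as TER, which also counts block shifts, are common alternatives.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def pe_distance(mt_output: str, post_edited: str) -> float:
    """Edit distance normalized by the longer string: 0.0 = untouched, 1.0 = fully rewritten."""
    longest = max(len(mt_output), len(post_edited))
    return 0.0 if longest == 0 else levenshtein(mt_output, post_edited) / longest
```

For the Dutch example earlier in the deck, turning “1.4kW” into “1,4 kW” takes one substitution and one insertion, giving a PE distance of 2/6.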

Page 43

CORRELATION RESULTS

Automatic Metrics versus Productivity Delta

Productivity delta and BLEU, with a cumulative Pearson’s r of 0.24: a weak positive relationship

With a Pearson’s r of -0.436: as PE distance increases, indicating greater effort from the post-editor, productivity declines; a strong negative relationship
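The r values on these two slides are Pearson correlation coefficients. For reference, a dependency-free sketch of the basic coefficient for paired per-project values (for example a metric score and the productivity delta). The deck does not say how per-locale values were pooled into a “cumulative” r, so this covers only the plain computation.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(ss_x * ss_y)
```

On Python 3.10 and later, `statistics.correlation` computes the same quantity.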

Page 44

• More transparency in the workings of the engine and training

• Faster systems, shorter turnaround on large systems

• More “wizards” for training and deployment

• Easier testing methodologies without full deployments

• More standardized scoring and comparison metrics

• Predictive analysis of quality: confidence and utility scores

• Normalization integrated into workflow and standardized

• Industry-wide proper name and title library

• Better transliteration standards

• Morphologically aware terminology choices

• More research on post-editing environments:
1. How to display source/target
2. How to display multiple suggestions
3. Autocomplete
4. Better ways to calculate productivity improvements with post-editing

• More interoperability, so translators can stay in the CAT tool they prefer

• Simplified workflows connecting MT engines and other tools

Dear Santa,

Page 45

SHALL WE?

— Napoleon Hill

Page 46

Thank you.