Beyond Post-Editing: The Work of the eBay MTLS

Beyond Post-Editing: How the eBay MT Language Specialists Reinvent the Linguist’s Role
November 2016
Jose Luis Bonilla Sánchez, eBay MTLS Supervisor


This presentation is about…

Me
Machine Translation
- Different views
- A brief history
- MT at eBay
The MTLS
- Their place in L10n
- Tasks
- Profile
The Future

Who am I?

My Journey

[Figure: career map drawn as three transit lines (Spain line, Netherlands line, Silicon Valley line). Stations: Granada University (BA in Translation and Interpreting), TLT Madrid, ISP Amsterdam, EFI, eBay 1, Apple, Monster, eBay 2. Roles along the way: Translator & PjM, LQA Engineer, Senior LS / Lead Translator, Knowledge Engineer, MTLS Supervisor.]

The Views on MT

The Nightmare Scenario

How I see it

A Little History

The “MTree”

Rule-Based MT
Statistical MT
- Word-Based
- Phrase-Based
Neural MT

Rule-Based MT

The RBMT Workflow

We Write The Rules

[Diagram: Source Text → Morphological, Syntactic and Lexicographic Analysis → Translation → Target Text]
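To make “we write the rules” concrete, here is a minimal sketch of a toy English-to-Spanish rule-based translator. The lexicon and the single reordering rule are hypothetical illustrations, not taken from any real RBMT system. It only covers dictionary lookup and one syntactic rule (adjective + noun becomes noun + adjective) and skips morphology entirely, so even gender agreement (rojo/roja) would already need more hand-written rules.

```python
# Hypothetical mini-lexicon: English word -> (Spanish word, part of speech).
LEXICON = {
    "my": ("mi", "DET"),
    "the": ("el", "DET"),
    "red": ("rojo", "ADJ"),
    "big": ("grande", "ADJ"),
    "car": ("coche", "NOUN"),
}


def translate(sentence: str) -> str:
    """Translate word by word, then apply one hand-written reordering rule."""
    # Lexical step: look up each word; unknown words are copied through unchanged.
    tokens = [LEXICON.get(word, (word, "UNK")) for word in sentence.lower().split()]
    output = []
    i = 0
    while i < len(tokens):
        # Syntactic rule: English ADJ + NOUN -> Spanish NOUN + ADJ.
        if i + 1 < len(tokens) and tokens[i][1] == "ADJ" and tokens[i + 1][1] == "NOUN":
            output.extend([tokens[i + 1][0], tokens[i][0]])
            i += 2
        else:
            output.append(tokens[i][0])
            i += 1
    return " ".join(output)


print(translate("my red car"))   # mi coche rojo
print(translate("the big car"))  # el coche grande
```

Every new word, construction or language pair means another entry or rule written by hand, which is the “too laborious” limit listed below.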

The Limits

- Too laborious

- Too unique

- Hard limits

Statistical Machine Translation: Cracking the Code

How to Crack the Code

[Diagram: Data → Training → Translation Model + Language Model; Text (input) → Translation (search for best possible translation) → Text (output)]

Forget linguistics: let’s look for statistical patterns in bilingual texts.
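In its classic textbook form (the noisy-channel model, which later phrase-based systems generalize into a weighted combination of several features), the SMT diagram corresponds to the following decision rule, with s the source sentence and t a candidate target sentence:

```latex
\hat{t} = \arg\max_{t} P(t \mid s)
        = \arg\max_{t} \underbrace{P(s \mid t)}_{\text{translation model}} \, \underbrace{P(t)}_{\text{language model}}
```

The second form follows from Bayes’ rule; P(s) is dropped because it is the same for every candidate t, and the arg max is the “search for best possible translation” step.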

How?

It’s All about the Patterns

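[Figure: aligned English and German texts, e.g. “My car is red.” / “Mein Auto ist rot.”; across the texts, “car” is decoded as “Auto” most of the time and as “Wagen” occasionally, which yields this translation table:]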

src -> trg | prob

car -> Auto | 0.9

car -> Wagen | 0.1

The Translation Model finds similarities (patterns) between source and target languages.
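The probabilities in the translation table can be read as relative frequencies. A minimal sketch, assuming hypothetical alignment counts (“car” aligned 9 times with “Auto” and once with “Wagen”); real SMT systems first have to learn the word alignments themselves, typically with EM-based alignment models, rather than receiving them as input.

```python
from collections import Counter

# Hypothetical word alignments harvested from a bilingual corpus.
aligned_pairs = [("car", "Auto")] * 9 + [("car", "Wagen")]


def translation_table(pairs):
    """Estimate P(target | source) as count(source, target) / count(source)."""
    pair_counts = Counter(pairs)
    source_counts = Counter(src for src, _ in pairs)
    return {(src, trg): n / source_counts[src] for (src, trg), n in pair_counts.items()}


for (src, trg), prob in translation_table(aligned_pairs).items():
    print(f"{src} -> {trg} | {prob:.1f}")
```

Run as is, this prints `car -> Auto | 0.9` and `car -> Wagen | 0.1`, the same rows as the slide.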

…But you still need a “proofreader”

The Language Model makes it sound “natural”.

English text:
My car is red.
My car drives fast
You drive my car
I drive my car

N-gram count
my | 4
car | 4
is | 1
… | …
my car | 4
… | …
drive my car | 2
… | …
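A minimal sketch that reproduces the n-gram counts from the four sentences, plus an unsmoothed maximum-likelihood bigram probability showing how such counts make “my car” more “natural” than “car my”. The tokenization (lowercase, letters and apostrophes only) is an assumption; production language models also apply smoothing so that unseen n-grams do not get zero probability.

```python
import re
from collections import Counter

corpus = ["My car is red.", "My car drives fast", "You drive my car", "I drive my car"]


def ngram_counts(sentences, max_n=3):
    """Count every 1- to max_n-word sequence in the lowercased sentences."""
    counts = Counter()
    for sentence in sentences:
        words = re.findall(r"[a-z']+", sentence.lower())
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                counts[" ".join(words[i:i + n])] += 1
    return counts


def bigram_prob(counts, first, second):
    """Unsmoothed maximum-likelihood estimate of P(second | first)."""
    if counts[first] == 0:
        return 0.0
    return counts[f"{first} {second}"] / counts[first]


counts = ngram_counts(corpus)
for gram in ["my", "car", "is", "my car", "drive my car"]:
    print(gram, counts[gram])
print(bigram_prob(counts, "my", "car"), bigram_prob(counts, "car", "my"))
```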

Statistical MT Limits

- OOV (out of vocabulary) words: often “out of domain”

- Idioms:

- Word order problems

Neural Machine Translation

What is Neural Machine Translation?

A particular application of Neural Networks

[Diagram: Neural Networks → MT, Self-Driving Cars, Script Recognition, Price Prediction, etc.]

Some Definitions

AI: A branch of computer science dealing with the simulation of intelligent behavior in computers.

Machine Learning: A type of AI that provides computers with the ability to learn without being explicitly programmed.

Neural Networks: An ML approach consisting of a large number of simple, highly interconnected processing elements (artificial neurons) in an architecture inspired by the structure of the cerebral cortex of the brain.

How does it work?

Source words are converted to numbers and combined (encoded) into a numerical representation of the whole sentence, which is then decoded into the target language.

Two parts: Encoder and Decoder

A Closer Look

[Figure: a small network with an input layer, an intermediate (hidden) layer and an output layer; every connection carries a weight, and each node passes the weighted sum of its inputs through an activation function.]
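A minimal sketch of one forward pass through such a network, assuming a sigmoid activation function and two inputs of 1. The weights are our reconstruction, not confirmed by the slide; they were chosen so that the hidden layer outputs 0.73, 0.79 and 0.69, numbers that appeared in the original slide’s figure. Training (adjusting the weights to reduce the error against a desired output) is not shown.

```python
import math


def sigmoid(x: float) -> float:
    """Activation function squashing any weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights):
    """weights[i][j] is the weight from input i to node j; each node sums, then activates."""
    n_nodes = len(weights[0])
    sums = [sum(inputs[i] * weights[i][j] for i in range(len(inputs))) for j in range(n_nodes)]
    return [sigmoid(s) for s in sums]


# Assumed weights (hypothetical reconstruction).
W_HIDDEN = [[0.8, 0.4, 0.3],   # from input 1 to the three hidden nodes
            [0.2, 0.9, 0.5]]   # from input 2 to the three hidden nodes
W_OUTPUT = [[0.3], [0.5], [0.9]]  # from each hidden node to the single output node

hidden = layer([1, 1], W_HIDDEN)   # weighted sums 1.0, 1.3 and 0.8 before activation
output = layer(hidden, W_OUTPUT)
print([round(h, 2) for h in hidden], round(output[0], 2))
```

In NMT the same mechanics apply at a much larger scale: words become vectors of numbers, and layers of weighted sums and activations turn them into the encoder’s sentence representation.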

Neural MT has great potential

Vector values keep track of long-distance connections between words (as opposed to SMT’s n-grams, which only look at a few neighboring words)

Will it be a game changer for translators? We’ll get back to this.

MT AT EBAY


Who we are


“The world’s marketplace, where the world goes to shop, sell, and give.”

eBay by the Numbers
- $2.2B revenue in Q2 2016
- $20.1B GMV in Q2 2016
- 165M global active buyers
- 56% international revenue
(Q3 2016 data)
- $9.4B mobile volume
- 337M app downloads

TRUE GLOBAL COMMERCE
- 56% of eBay’s business is international
- 95% of commercial sellers engage in exporting
- 13 localized languages
- 30+ countries with an eBay site

Why eBay needs MT

[Chart: content types plotted by word volume (1k, 10k, 100k, 1M) against time to market (from “No rush” to “ASAP (MT-ready)”): Legal, Marketing, Help / User Documentation, SW UI, Member Communication (e-mail, forums), eBay Seller Listings.]

The Time-to-Market Issue

Use Cases for MT at eBay

MT at eBay. Linguist’s Perspective.

• Search Queries (eBay MT, automatic)
• Item Titles (eBay MT, automatic)
• Item Descriptions (on demand)
• Product Descriptions (eBay MT, coming up)
• Product Reviews (eBay MT, coming up)


Challenges for MT at eBay

1. Variety of context: ~12K categories on ebay.com


Challenges for MT at eBay

2. User-generated content:

• Spelling errors/typos/mixed languages: sansung samsug samsumg samung amsung samnsung smsung samsuns …

• Syntax: Chattanooga Intelect Xt Vectra 2 Channel Emg Stim Chiropractic Physical Therapy

• Improper, broken English: ull buy em rii nah thru paypal (you will buy them right now through PayPal)

• Ambiguous brand names: Green Apple iPhone 6 = Manzana verde iPhone 6?
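Spelling variants like the “samsung” examples are a typical target for fuzzy string matching. A minimal sketch using Python’s standard `difflib` module; the brand list and the 0.8 similarity cutoff are assumptions for illustration, not eBay’s actual approach.

```python
import difflib

# Hypothetical list of known brand names (lowercased).
KNOWN_BRANDS = ["samsung", "sony", "apple", "nokia"]


def normalize_brand(token: str, cutoff: float = 0.8) -> str:
    """Return the closest known brand if it is similar enough, else the token unchanged."""
    matches = difflib.get_close_matches(token.lower(), KNOWN_BRANDS, n=1, cutoff=cutoff)
    return matches[0] if matches else token


variants = ["sansung", "samsug", "samsumg", "samung", "amsung", "samnsung", "smsung", "samsuns"]
print({v: normalize_brand(v) for v in variants})
print(normalize_brand("dress"))
```

A cutoff set too low would start “correcting” legitimate words into brand names, so the threshold would need tuning against real listings. Fuzzy matching also does nothing for the last challenge: “Green Apple” is spelled correctly and still ambiguous.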


ENTER THE MTLS

The MTLS by the Numbers

2013: date of team creation as part of eBay’s MT initiative

69

Linguists based in the US and Germany

Languages supported: US English, UK English, French, German, Italian, Russian, Brazilian-Portuguese, and Latin American / European Spanish

We are a Hybrid Team

[Diagram: the MTLS as a hybrid of the MT Science Team and L10n]

WHAT DO WE DO?

MTLS: Not Your Regular Linguist


Vendor Review: Workflow
- Training data: Raw MT output → Vendor post-edits → MTLS review → Data fed into the engine
- Testing data: Source text → Vendor translates → MTLS review → Data used for reference

Vendor Review: Scale
- We need to process very large volumes: 4.5M words in 2016 (estimate).

Massive Volumes x Limited Resources = Inventiveness

Our guiding principle: Adding Value
- Automation (with open-source tools)
- Integrating QA upstream
- High-value QA: intelligent sampling
- Error pattern detection
- Targeted terminology
- Scalability (modular guidelines, trainings)

Examples: Patterns

We use regular expressions to locate errors:
- Plurals: cantos?, cases?, bab(y|ies)
- Replacing accents: câmera, camera > c.mera
- Gender agreement: nov(o|a)
- Synonyms: celular appears in 1,332 queries; the translation contains cell in 635, phone in 655 and mobile in 474, while 56 (only 4%) contain none of (cell|phone|mobile)
- Units of measurement: the source contains a digit + “in” and the translation is not there (5 in <> 5 pol)
- Detecting acronyms: [A-Z]{2,4}
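These patterns can be run directly with Python’s `re` module. A minimal sketch: the regular expressions come from the slide, while the word boundaries (`\b`), the “pol” check for the Portuguese target and the function names are our own assumptions.

```python
import re

# Patterns from the slide; the \b word boundaries are an added assumption.
PLURAL = re.compile(r"\b(cantos?|cases?|bab(?:y|ies))\b", re.IGNORECASE)
ACCENT = re.compile(r"\bc.mera\b", re.IGNORECASE)    # "." matches "â" as well as "a"
GENDER = re.compile(r"\bnov(?:o|a)\b", re.IGNORECASE)
ACRONYM = re.compile(r"\b[A-Z]{2,4}\b")
UNIT_SOURCE = re.compile(r"\d+\s*in\b")              # e.g. "5 in" in the English source
UNIT_TARGET = re.compile(r"\d+\s*pol\b")             # e.g. "5 pol" in the Portuguese target


def missing_unit(source: str, target: str) -> bool:
    """Flag a segment whose source has '<number> in' but whose target has no '<number> pol'."""
    return bool(UNIT_SOURCE.search(source)) and not UNIT_TARGET.search(target)


def share_without_synonym(total_queries: int, without_any: int) -> float:
    """Percentage of queries whose translation contains none of the expected synonyms."""
    return 100 * without_any / total_queries


print(PLURAL.findall("Two cases for baby shoes"))        # ['cases', 'baby']
print(ACRONYM.findall("NWT White Lace Dress Size XX"))   # ['NWT', 'XX']
print(bool(ACCENT.search("câmera")), bool(ACCENT.search("camera")))
print(missing_unit("Tablet 7 in", "Tablet 7 pol"), missing_unit("Tablet 7 in", "Tablet 7"))
print(round(share_without_synonym(1332, 56), 1))         # 4.2
```

Hits are candidates for review rather than confirmed errors: `missing_unit` would also flag phrases such as “2 in 1”, and the acronym pattern also catches sizes such as XX.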

Examples: High-value Terms

Specialized acronyms (NWT, BNWT, NOB…)

Ambiguous brand names

Polysemous words

Linguistic QA

We perform Linguistic QA on MT systems. We add value by improving the most strategic asset: queries.
- Mistranslated queries = bad search results = fewer sales
- We check top unique queries

Example

Russian shopper’s query: подарок 8 марта
- Literal MT translation: March 8 gift (122 matches)
- Corrected MT translation: Mother’s Day Gift (105,000 matches)
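Checking “top unique queries” starts with ranking distinct queries by frequency. A minimal sketch, assuming the query log is available as a plain list of strings (a hypothetical format) and that lowercasing plus collapsing spaces is enough normalization.

```python
from collections import Counter


def top_unique_queries(queries, n):
    """Return the n most frequent distinct queries, after lowercasing and collapsing spaces."""
    normalized = (" ".join(query.lower().split()) for query in queries)
    return Counter(normalized).most_common(n)


# Hypothetical query log.
log = ["подарок 8 марта", "Подарок 8  марта", "iphone 6 case", "подарок 8 марта"]
print(top_unique_queries(log, 2))
```

Reviewing the head of this list first puts linguist time where a mistranslation costs the most search traffic.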

Human Judgement: Ranking and Rating MT Systems
- Ranking: comparing the quality of two or more MT systems
- Rating: assigning a quality score to the output of an MT system
- Sometimes combined.

Just like with post-editing, the actual evaluation work is sent to vendors. We add value by QA’ing our vendors’ results (intra-annotator and inter-annotator agreement).
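The talk does not say which agreement statistic is used; Cohen’s kappa is one common choice for two annotators, because it discounts the agreement expected by chance. A minimal sketch with made-up ratings:

```python
from collections import Counter


def cohens_kappa(a, b):
    """Cohen's kappa for two annotators' labels on the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: probability both pick the same label independently.
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # convention: both annotators used one and the same label throughout
    return (observed - expected) / (1 - expected)


# Made-up 1-3 quality ratings from two vendor annotators on five segments.
rater_1 = [1, 2, 3, 3, 2]
rater_2 = [1, 2, 3, 2, 2]
print(round(cohens_kappa(rater_1, rater_2), 4))
```

For intra-annotator agreement, the same function can compare one annotator’s two passes over the same items. Plain kappa treats ratings as unordered categories; for ordinal scales a weighted kappa may be more appropriate.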

NER: Providing QA for Semantic Annotation

Named Entity Recognition (NER) is the process of tagging words as semantic entities that will be used to improve MT performance.

Example: tagging an eBay listing title, reviewed by the MTLS to ensure quality. The tags are used to identify:
- Brands
- The main item in the listing
- Important aspects of the item (color, material, texture, etc.)

[Figure: annotation of a listing in the Pottery & China category (item 380990996167), “Herend Hungary Handpainted Porcelain QUEEN VICTORIA Leaf Dish Flowers Butterfly”, with the tag sequence b g m as as su t su/ su under its words.]

We add value by providing targeted vendor QA in two stages:

1) Sample the vendor’s work at regular intervals

2) Target tokens (words) likely to cause problems. For example, we filter tokens that are:
- Tagged with multiple labels (e.g. 7 times with “a”, 4 with “g”, etc.)
- Tagged only sometimes
- Polysemous
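The first two filters in stage 2 can be computed from the annotations themselves. A minimal sketch, assuming the annotations are available as hypothetical (token, label) pairs with `None` for tokens left untagged; the label letters are placeholders. Polysemy cannot be detected from counts alone, so that filter stays with the linguist.

```python
from collections import Counter, defaultdict


def suspicious_tokens(annotations):
    """annotations: iterable of (token, label) pairs; label is None if the token was not tagged."""
    by_token = defaultdict(Counter)
    for token, label in annotations:
        by_token[token.lower()][label] += 1
    multi_label, sometimes_tagged = {}, {}
    for token, counts in by_token.items():
        tagged = {label: c for label, c in counts.items() if label is not None}
        if len(tagged) > 1:            # same token, different labels
            multi_label[token] = tagged
        if tagged and None in counts:  # tagged in some rows, untagged in others
            sometimes_tagged[token] = counts[None]
    return multi_label, sometimes_tagged


# Hypothetical annotations; "b", "a" and "m" are placeholder labels.
rows = [("Apple", "b"), ("apple", "b"), ("apple", "a"),
        ("green", "a"), ("green", None), ("dress", "m")]
print(suspicious_tokens(rows))
```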

Innovation

A real-life problem

1) eBay listings have about 20,000 frequent acronyms (NOB, NWT, etc.).

2) The MT engine used to create training data doesn’t know most of them, so it just inserts them in the target text “as is”. Example: NWT White Lace Dress Size XX → NWT Vestido Encaje Blanco Talla XX

3) This means our vendor post-editors have to spend a long time researching.

4) Researching the equivalent acronym for each language would take too long.

What do you do?
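Before deciding what to do, it helps to size the problem. A minimal sketch, not necessarily the approach taken at eBay, that counts acronyms an MT engine copied unchanged from source to target so that the most frequent ones can be researched first. It reuses the `[A-Z]{2,4}` pattern from the patterns slide; the example pairs other than the NWT dress are hypothetical.

```python
import re
from collections import Counter

ACRONYM = re.compile(r"\b[A-Z]{2,4}\b")


def passed_through(pairs):
    """Count acronyms that appear unchanged in both a source title and its MT output."""
    counts = Counter()
    for source, target in pairs:
        target_acronyms = set(ACRONYM.findall(target))
        for acronym in ACRONYM.findall(source):
            if acronym in target_acronyms:
                counts[acronym] += 1
    return counts


pairs = [
    ("NWT White Lace Dress Size XX", "NWT Vestido Encaje Blanco Talla XX"),
    ("BNWT Red Dress", "BNWT Vestido Rojo"),            # hypothetical
    ("NWT Leather Shoes", "Zapatos de Cuero NWT"),      # hypothetical
]
print(passed_through(pairs).most_common())
```

The output also lists XX, a size rather than an acronym, which shows why a linguist still has to filter the list before any research starts.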

THE MTLS PROFILE

What makes a good MTLS?

The Human Side of MT

Translator skills

- Linguistic knowledge: command of source and target language grammar and style

- Cultural knowledge: at ease in two worlds (US and target language)

Post-editor skills

- Adaptability to different translation quality requirements

- Speed: To process MT’s vast amounts of output

MTLS-specific skills

- Analytical mind: can detect patterns

- Excellent communicator: interfaces with the MT Science Team and with vendors, and “translates” between both

- Versatile: our MTLS have to perform many kinds of tasks (ranking, rating, semantic annotation)

- Process improver: analyzes the QA process to improve it

- Constantly learns to find new applications for their work

A Particular Set of Skills

THE FUTURE OF LINGUISTS IN MT: NEURAL MT AND BEYOND

Neural MT is Different…

Statistical MT is a white-box technology, built from separate components that can be inspected individually:
- Translation Model
- Language Model
- Alignment
- Others

Neural MT is a black box.

…But our Role Stays the Same

The Machine Learning Flow will always have Linguist-shaped gaps

…Put Another Way

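[Timeline: the translator’s role across 1900-1980, 1980-1990, 1990-2007, 2007-2015 and 2015-…, shaped by successive technologies (PCs and word processors; TMs and TDs; the Internet; MT; data science), ending with the question of what comes next: MTLS? Language Data Specialist? Linguistic Trainer?]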

…Future Tasks for Linguists in MT

- Core linguistic work: review and regular LQA

- MT work: human judgement, large-scale QA

- Data science: continue and expand semantic annotation services beyond named entities (name-value pairs, polysemous words…)

- Innovation: identify quality gaps and provide data sets to fix them (e.g. profanities, idioms, etc.)


…Want to Know More?

Join the conversation: eBay MT Language Specialist Series (https://www.linkedin.com/groups/7011515): more than 40 articles on MT from a translator’s perspective

Visit us at the eBay Tech Blog (http://www.ebaytechblog.com/category/machine-translation/)

…Or Just Write Me a Letter

[email protected]

Q&A