Beyond Post-Editing: The Work of the eBay MTLS

Beyond Post-Editing How the eBay MT Language Specialists Reinvent the Linguist’s Role November 2016 Jose Luis Bonilla Sánchez, eBay MTLS Supervisor

Transcript of Beyond Post-Editing: The Work of the eBay MTLS

Page 1: Beyond Post-Editing: The Work of the eBay MTLS

Beyond Post-Editing: How the eBay MT Language Specialists Reinvent the Linguist’s Role

November 2016

Jose Luis Bonilla Sánchez, eBay MTLS Supervisor

Page 2: Beyond Post-Editing: The Work of the eBay MTLS

This presentation is about…

Me
Machine Translation
- Different views
- A brief history
- MT at eBay
The MTLS
- Their place in L10n
- Tasks
- Profile
The Future

Page 3: Beyond Post-Editing: The Work of the eBay MTLS

Who am I?

Page 4: Beyond Post-Editing: The Work of the eBay MTLS

My Journey

[Figure: career path drawn as a metro map with three lines: Silicon Valley Line, Spain Line, Netherlands Line.]

Stations: Granada University, TLT Madrid, ISP Amsterdam, EFI, Apple, Monster, eBay 1, eBay 2

Roles: BA (Translation, Interpreting), Translator & PjM, LQA Engineer, Knowledge Engineer, LQA Engineer, Senior LS, Lead Translator, MTLS Supervisor

Page 5: Beyond Post-Editing: The Work of the eBay MTLS

The Views on MT

Page 6: Beyond Post-Editing: The Work of the eBay MTLS

The Nightmare Scenario

Page 7: Beyond Post-Editing: The Work of the eBay MTLS

How I see it

Page 8: Beyond Post-Editing: The Work of the eBay MTLS

A Little History

Page 9: Beyond Post-Editing: The Work of the eBay MTLS

The “MTree”

Rule-Based MT
Statistical MT
- Word-Based
- Phrase-Based
Neural MT

Page 10: Beyond Post-Editing: The Work of the eBay MTLS

Rule-Based MT

Page 11: Beyond Post-Editing: The Work of the eBay MTLS

The RBMT Workflow

We Write The Rules

[Diagram: Source Text -> Translation (Lexicographic Analysis, Syntactic Analysis, Morphological Analysis) -> Target Text]

Page 12: Beyond Post-Editing: The Work of the eBay MTLS

The Limits

- Too laborious

- Too unique

- Hard limits

Page 13: Beyond Post-Editing: The Work of the eBay MTLS

Statistical Machine Translation: Cracking the Code

Page 14: Beyond Post-Editing: The Work of the eBay MTLS

How to Crack the Code

Forget linguistics – let’s look for statistical patterns in bilingual texts.

How?

[Diagram: Data -> Training -> Translation Model and Language Model. Text (input) -> Translation (search for best possible translation) -> Text (output).]

Page 15: Beyond Post-Editing: The Work of the eBay MTLS

It’s All about the Patterns

[Figure: occurrences of “car” in an English text are matched with occurrences of “Auto” (and, less often, “Wagen”) in a German text.]

src -> trg | prob
car -> Auto | 0.9
car -> Wagen | 0.1

decode: Mein Auto ist rot. -> My car is red.

The Translation Model finds similarities (patterns) between source and target languages.
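The probabilities on this slide can be read as relative frequencies. The sketch below is a minimal illustration, assuming the word alignments are already given as (source, target) pairs; real SMT systems have to learn those alignments from the bilingual text itself. The function name `translation_table` and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def translation_table(aligned_pairs):
    """Estimate P(target | source) as a relative frequency.

    aligned_pairs: iterable of (source_word, target_word) tuples.
    Assumption: the alignment is given; it is not learned here.
    """
    counts = defaultdict(Counter)
    for src, trg in aligned_pairs:
        counts[src][trg] += 1

    table = {}
    for src, trg_counts in counts.items():
        total = sum(trg_counts.values())
        table[src] = {trg: c / total for trg, c in trg_counts.items()}
    return table


# Toy data: "car" aligned with "Auto" nine times and with "Wagen" once.
pairs = [("car", "Auto")] * 9 + [("car", "Wagen")]
table = translation_table(pairs)

print("src -> trg | prob")
for trg, prob in sorted(table["car"].items(), key=lambda kv: -kv[1]):
    print(f"car -> {trg} | {prob:.1f}")
```

With these toy counts the table reproduces the 0.9 / 0.1 split shown on the slide.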

Page 16: Beyond Post-Editing: The Work of the eBay MTLS

…But you still need a “proofreader”

The Language Model makes it sound “natural”.

English text:
My car is red.
My car drives fast
You drive my car
I drive my car

N-gram | count
my | 4
car | 4
is | 1
… | …
my car | 4
… | …
drive my car | 2
… | …
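The n-gram counts above can be reproduced with a few lines of code. This is a minimal sketch, assuming lowercase whitespace tokenization with trailing punctuation stripped; the function name `ngram_counts` is hypothetical, and a real language model would add smoothing and probabilities on top of the raw counts.

```python
from collections import Counter


def ngram_counts(sentences, max_n=3):
    """Count all n-grams of length 1..max_n in a list of sentences."""
    counts = Counter()
    for sentence in sentences:
        # Assumed tokenization: split on whitespace, strip punctuation, lowercase.
        tokens = [t.strip(".,!?").lower() for t in sentence.split()]
        tokens = [t for t in tokens if t]
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    return counts


corpus = [
    "My car is red.",
    "My car drives fast",
    "You drive my car",
    "I drive my car",
]
counts = ngram_counts(corpus)

for ngram in ("my", "car", "is", "my car", "drive my car"):
    print(f"{ngram} | {counts[ngram]}")
```

An n-gram that never occurs in the corpus gets a count of zero, which is one reason unseen word sequences are hard for this kind of model.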

Page 17: Beyond Post-Editing: The Work of the eBay MTLS

Statistical MT Limits

- OOV (out of vocabulary) words: often “out of domain”

- Idioms:

- Word order problems

Page 18: Beyond Post-Editing: The Work of the eBay MTLS

Neural Machine Translation

Page 19: Beyond Post-Editing: The Work of the eBay MTLS

What is Neural Machine Translation?

A particular application of Neural Networks

Neural Networks:
- MT
- Self-Driving Cars
- Script Recognition
- Price Prediction
- Etc.

Page 20: Beyond Post-Editing: The Work of the eBay MTLS

Some Definitions

AI: A branch of computer science dealing with the simulation of intelligent behavior in computers.

Machine Learning: A type of AI that provides computers with the ability to learn without being explicitly programmed.

Neural Networks: An ML approach consisting of a large number of simple, highly interconnected processing elements (artificial neurons) in an architecture inspired by the structure of the cerebral cortex of the brain.

Page 21: Beyond Post-Editing: The Work of the eBay MTLS

How does it work?

2 Parts: Encoder and Decoder

Source words are converted to numbers and added up (encoded) to produce a final score for the whole sentence, which is then decoded to the target.

Page 22: Beyond Post-Editing: The Work of the eBay MTLS

A Closer Look

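[Figure: a small feed-forward network with an INPUT LAYER, an INTERMEDIATE (HIDDEN) LAYER, and an OUTPUT LAYER, connected by weights and an activation function.]

Values shown in the figure: 1, 1, 0.5, 0.9, 1.3, 0.79, 0.4, 0, 0.8, 0.2, 0.3, 0.9, 0.5, 1, 0.73, 0.8, 0.69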
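The step from the input layer to the hidden layer can be sketched as a weighted sum followed by an activation function. The code below is an illustration only: the sigmoid activation and the grouping of the weights into three hidden neurons are assumptions, chosen because they reproduce the sums (1, 1.3, 0.8) and the activations (0.73, 0.79, 0.69) that appear among the values on this slide.

```python
import math


def sigmoid(x):
    """Logistic activation: squashes any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(inputs, weights):
    """Compute one layer: weighted sums, then the activation of each sum.

    weights[j] holds the weights feeding neuron j, one per input.
    """
    sums = [sum(w * x for w, x in zip(row, inputs)) for row in weights]
    return sums, [sigmoid(s) for s in sums]


inputs = [1.0, 1.0]
# Assumed grouping of the slide's weights into three hidden neurons.
hidden_weights = [
    [0.8, 0.2],
    [0.4, 0.9],
    [0.3, 0.5],
]

sums, activations = layer_forward(inputs, hidden_weights)
print([round(s, 2) for s in sums])
print([round(a, 2) for a in activations])
```

The output layer repeats the same operation on the hidden activations. Training consists of adjusting the weights until the output matches the expected value.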

Page 23: Beyond Post-Editing: The Work of the eBay MTLS

Neural MT has great potential

Vector values keep track of long connections (as opposed to SMT’s n-grams)

Will it be a game changer for translators? We’ll get back to this.

Page 24: Beyond Post-Editing: The Work of the eBay MTLS

MT AT EBAY

Page 25: Beyond Post-Editing: The Work of the eBay MTLS

Who we are

“The world’s marketplace, where the world goes to shop, sell, and give.”

Page 26: Beyond Post-Editing: The Work of the eBay MTLS

eBay by the Numbers

$2.2B: Revenue in Q2 2016
$20.1B: GMV in Q2 2016
165M: Global Active Buyers
56%: International revenue
$9.4B: Mobile Volume
337M: App downloads

Q3 2016 data

Page 27: Beyond Post-Editing: The Work of the eBay MTLS

TRUE GLOBAL COMMERCE

56% of eBay’s business is international
95% of commercial sellers engage in exporting
Localized languages: 13
Countries with an eBay site: +30

Page 28: Beyond Post-Editing: The Work of the eBay MTLS

Why eBay needs MT

The Time-to-Market Issue

[Chart: content types plotted by Word Volume (1k, 10k, 100k, 1M) against Time to Market (from “No rush” to “Asap (MT-ready)”). Content types: Legal, Marketing, Help / User Documentation, SW UI, Member Communication (e-mail, Forums), eBay Seller Listings.]

Page 29: Beyond Post-Editing: The Work of the eBay MTLS

Use Cases for MT at eBay

• Search Queries (eBay MT, automatic)
• Item Titles (eBay MT, automatic)
• Item Descriptions (on demand)
• Product Descriptions (eBay MT, coming up)
• Product Reviews (eBay MT, coming up)

Page 30: Beyond Post-Editing: The Work of the eBay MTLS

Challenges for MT at eBay

1. Variety of context: ~12K categories on ebay.com

Page 31: Beyond Post-Editing: The Work of the eBay MTLS


334

Challenges for MT at eBay

2. User-generated content:

• Spelling errors/typos/mixed languages: ansung samsug samsumg samung amsung samnsung smsung samsuns …
• Syntax: Chattanooga Intelect Xt Vectra 2 Channel Emg Stim Chiropractic Physical Therapy
• Improper, broken English: “ull buy em rii nah thru paypal” = “you will buy them right now through PayPal”
• Ambiguous brand names: Green Apple iPhone 6 = Manzana verde iPhone 6?

Page 32: Beyond Post-Editing: The Work of the eBay MTLS

ENTER THE MTLS

Page 33: Beyond Post-Editing: The Work of the eBay MTLS

The MTLS by the Numbers

2013: Date of team creation as part of eBay’s MT initiative
6: Linguists based in the US and Germany
9: Languages supported: US English, UK English, French, German, Italian, Russian, Brazilian Portuguese, and Latin American / European Spanish

Page 34: Beyond Post-Editing: The Work of the eBay MTLS

We are a Hybrid Team

MT Science Team

L10n

MTLS

Page 35: Beyond Post-Editing: The Work of the eBay MTLS

WHAT DO WE DO?

Page 36: Beyond Post-Editing: The Work of the eBay MTLS

MTLS ≠ Your Regular Linguist

Page 37: Beyond Post-Editing: The Work of the eBay MTLS

Vendor Review: Workflow

Training data: Raw MT output -> Vendor post-edits -> MTLS review -> Data fed into the engine

Testing data: Source text -> Vendor translates -> MTLS review -> Data used for reference

Page 38: Beyond Post-Editing: The Work of the eBay MTLS

Vendor Review: Scale

- We need to process very large volumes.

4.5M words in 2016 (estimate)

Page 39: Beyond Post-Editing: The Work of the eBay MTLS

Massive Volumes x Limited Resources = Inventiveness

Our guiding principle: Adding Value

Automation (with OS tools)

Integrating QA Upstream

High-value QA: Intelligent sampling

Error pattern detection

Targeted terminology

Scalability (modular guidelines, trainings)

Page 40: Beyond Post-Editing: The Work of the eBay MTLS

Examples: Patterns

We use Regular Expressions to locate errors:

- Plurals: cantos?, cases?, bab(y|ies)
- Replacing accents: câmera, camera > c.mera
- Gender agreements: nov(o|a)
- Synonyms: celular – 1332 queries – (cell|phone|mobile): cell – 635, phone – 655, mobile – 474, does not contain any – 56 (only 4%)
- Units of measurement: contains a digit + “in” and the translation is not there – 5 in <> 5 pol
- Detecting acronyms: [A-Z]{2,4}
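The patterns on this slide map directly onto a regex library. The sketch below is an illustration, not the team's actual tooling: the word boundaries (`\b`), the helper names `missing_expected` and `flag_rate`, and the sample segments are all assumptions added here.

```python
import re

# Patterns adapted from the slide; \b word boundaries are an added assumption.
PLURAL = re.compile(r"\bbab(y|ies)\b", re.IGNORECASE)
ACCENT = re.compile(r"\bc.mera\b", re.IGNORECASE)    # matches câmera and camera
GENDER = re.compile(r"\bnov(o|a)\b", re.IGNORECASE)
ACRONYM = re.compile(r"\b[A-Z]{2,4}\b")
CELULAR = re.compile(r"\bcelular\b", re.IGNORECASE)
SYNONYMS = re.compile(r"\b(cell|phone|mobile)\b", re.IGNORECASE)
INCHES = re.compile(r"\b\d+\s*in\b", re.IGNORECASE)
POLEGADAS = re.compile(r"\b\d+\s*pol\b", re.IGNORECASE)


def missing_expected(source, target, source_pattern, target_pattern):
    """True if the source matches but the expected target pattern is absent."""
    return bool(source_pattern.search(source)) and not target_pattern.search(target)


def flag_rate(pairs, source_pattern, target_pattern):
    """Share of relevant segments whose translation lacks the expected pattern."""
    relevant = [(s, t) for s, t in pairs if source_pattern.search(s)]
    if not relevant:
        return 0.0
    flagged = sum(1 for s, t in relevant if not target_pattern.search(t))
    return flagged / len(relevant)


# Hypothetical query/translation pairs, for illustration only.
sample = [
    ("capa celular", "cell phone cover"),
    ("celular novo", "new mobile"),
    ("celular usado", "used handset"),
    ("vestido", "dress"),
]
print(flag_rate(sample, CELULAR, SYNONYMS))
print(missing_expected("TV 5 in", "TV 5", INCHES, POLEGADAS))
print(ACRONYM.findall("NWT White Lace Dress Size XX"))
```

A check like `flag_rate` is what produces figures such as "does not contain any – 56 (only 4%)": it narrows thousands of segments down to the few that deserve a human look.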

Page 41: Beyond Post-Editing: The Work of the eBay MTLS

Examples: High-value Terms

Specialized acronyms (NWT, BNWT, NOB…)

Ambiguous brand names

Polysemous words

Page 42: Beyond Post-Editing: The Work of the eBay MTLS

Linguistic QA

We add value by improving the most strategic asset: queries

Mistranslated queries = bad search results = fewer sales

We perform Linguistic QA on MT systems.

We check top unique queries

Example

Russian Shopper’s Query: подарок 8 марта
Literal MT translation: March 8 gift (122 matches)
Corrected MT translation: Mother’s Day Gift (105,000 matches)

Page 43: Beyond Post-Editing: The Work of the eBay MTLS

Human Judgement: Ranking and Rating MT Systems

Ranking: Comparing the quality of 2 or more MT systems

Rating: Assigning a quality score to the output of an MT system

Sometimes combined.

Page 44: Beyond Post-Editing: The Work of the eBay MTLS

Human Judgement: Ranking and Rating MT Systems

Just like with post-editing, the actual evaluation work is sent to Vendors.

We add value by QA’ing our vendors’ results (intra-annotator, inter-annotator agreement).
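One common way to quantify annotator agreement is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The slides do not say which metric the team uses, so the sketch below is an assumption; the function name `cohen_kappa` and the sample ratings are hypothetical. The same function covers both cases: two different annotators (inter-annotator) or the same annotator rating the same items twice (intra-annotator).

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")

    n = len(labels_a)
    # Observed agreement: share of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: probability that both pick the same label at random,
    # given how often each of them uses each label.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if expected == 1.0:
        return 1.0  # both used one single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings of four MT outputs by two vendor annotators.
annotator_1 = ["good", "good", "bad", "bad"]
annotator_2 = ["good", "bad", "bad", "bad"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 2))
```

A kappa close to 1 indicates consistent judgements, while a value near 0 means the annotators agree no more often than chance, which is a signal to revisit the guidelines or the vendor's training.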

Page 45: Beyond Post-Editing: The Work of the eBay MTLS

NER: Providing QA for Semantic Annotation

Named Entity Recognition (NER) is the process of tagging words as semantic entities that will be used to improve MT performance.

- Used to identify:
- Brands
- Main item in the listing
- Important aspects of the item (color, material, texture, etc.)

Reviewed by MTLS to ensure quality

Example – tagging an eBay listing title:

Pottery & China 380990996167 eBay Google Herend Hungary Handpainted Porcelain QUEEN VICTORIA Leaf Dish Flowers Butterfly

b g m as as su t su/ su

Page 46: Beyond Post-Editing: The Work of the eBay MTLS

NER: Providing QA for Semantic Annotation

We add value by providing targeted vendor QA in 2 stages:

1) Sample vendor’s work at regular intervals

2) Target tokens (words) likely to cause problems. E.g. we filter tokens by:

- Tagged with multiple labels (e.g. 7 times with “a”, 4 with “g”, etc)

- Tagged only sometimes

- That are polysemous
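The first two filters in stage 2 can be sketched as a pass over the vendor's annotations. This is an illustration under assumptions: annotations are taken to be (token, label) pairs with `None` marking an untagged occurrence, the function name `suspicious_tokens` is hypothetical, and the label letters are treated as opaque strings. Polysemy is not covered, since it would need an external word list.

```python
from collections import Counter, defaultdict


def suspicious_tokens(annotations):
    """Return tokens whose tagging is inconsistent across occurrences.

    annotations: iterable of (token, label) pairs; label None = untagged.
    Result: {token: (reasons, label_counts)}.
    """
    label_counts = defaultdict(Counter)
    for token, label in annotations:
        label_counts[token.lower()][label] += 1

    report = {}
    for token, counts in label_counts.items():
        tagged_labels = [label for label in counts if label is not None]
        reasons = []
        if len(tagged_labels) > 1:
            reasons.append("multiple labels")
        if tagged_labels and None in counts:
            reasons.append("tagged only sometimes")
        if reasons:
            report[token] = (reasons, dict(counts))
    return report


# Hypothetical annotations, for illustration only.
annotations = [
    ("Apple", "b"), ("apple", "a"), ("Apple", "b"),
    ("Dish", "m"), ("dish", None),
    ("Herend", "b"),
]
report = suspicious_tokens(annotations)
for token, (reasons, counts) in report.items():
    print(token, reasons, counts)
```

Consistently tagged tokens drop out of the report, so reviewer time goes only to the tokens most likely to hide an annotation error.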

Page 47: Beyond Post-Editing: The Work of the eBay MTLS

Innovation

A real-life problem

1) eBay listings have about 20,000 frequent acronyms (NOB, NWT, etc.).

2) The MT engine used to create training data doesn’t know most of them, so it just inserts them in the target text “as is”.

NWT White Lace Dress Size XX -> NWT Vestido Encaje Blanco Talla XX

3) This means our vendor post-editors have to spend a long time researching.

4) Researching the equivalent acronym for each language would take too long.

What do you do?

Page 48: Beyond Post-Editing: The Work of the eBay MTLS

THE MTLS PROFILE

Page 49: Beyond Post-Editing: The Work of the eBay MTLS

What makes a good MTLS?

Page 50: Beyond Post-Editing: The Work of the eBay MTLS

The Human Side of MT

Translator skills

- Linguistic knowledge: command of source and target language grammar and style

- Cultural knowledge: at ease in two worlds (US and target language)

Post-editor skills

- Adaptability to different translation quality requirements

- Speed: To process MT’s vast amounts of output

MTLS-specific skills

- Analytical mind: can detect patterns

- Excellent communicator: interfaces with the MT Science Team and with vendors, and “translates” between both

- Versatile: our MTLS have to perform many kinds of tasks (ranking, rating, semantic annotation)

- Process improver: analyzes the QA process to improve it, and constantly learns in order to find new applications for their work

A Particular Set of Skills

Page 51: Beyond Post-Editing: The Work of the eBay MTLS

THE FUTURE OF LINGUISTS IN MT: NEURAL MT AND BEYOND

Page 52: Beyond Post-Editing: The Work of the eBay MTLS

Neural MT is Different…

Statistical MT is a White Box technology:
- Translation Model
- Language Model
- Alignment
- Others

Neural MT is a Black Box

Page 53: Beyond Post-Editing: The Work of the eBay MTLS

…But our Role Stays the Same

The Machine Learning Flow will always have Linguist-shaped gaps

Page 54: Beyond Post-Editing: The Work of the eBay MTLS

…Put Another Way

[Timeline: 1900-1980, 1980-1990, 1990-2007, 2007-2015, 2015-…]

Technologies along the timeline: PCs, Word Processors; TMs, TDs; Internet; MT; Data Science

Role: Translator, evolving into: MTLS? Language Data Specialist? Linguistic Trainer?

Page 55: Beyond Post-Editing: The Work of the eBay MTLS

…Future Tasks for Linguists in MT

- Core Linguistic Work: Review & regular LQA

- MT work: Human Judgement, Large-scale QA

- Data Science: Continue and expand semantic annotation services beyond Named Entities (name-value pairs, polysemous words…)

- Innovation: Identify quality gaps and provide data sets to fix them (e.g. profanities, idioms, etc.)

Page 56: Beyond Post-Editing: The Work of the eBay MTLS

…Want to Know More?

Join the conversation: eBay MT Language Specialist Series (https://www.linkedin.com/groups/7011515): >40 articles on MT from a translator’s perspective

Page 57: Beyond Post-Editing: The Work of the eBay MTLS

…Want to Know More?

Visit us at the eBay Tech Blog (http://www.ebaytechblog.com/category/machine-translation/)

Page 58: Beyond Post-Editing: The Work of the eBay MTLS

…Or Just Write Me a Letter

[email protected]

Page 59: Beyond Post-Editing: The Work of the eBay MTLS

Q&A