Translation by Collaboration among Monolingual Users Benjamin B. Bederson www.cs.umd.edu/~bederson @bederson Computer Science Department Human-Computer Interaction Lab Institute for Advanced Computer Studies iSchool University of Maryland


Programmer User Social Participant

Computational Participant

Human Computation

Things HUMANS can do

Things COMPUTERS can do

Translation

Photo tagging

Face recognition

Human detection

Speech recognition

Text analysis

Planning

Human Computation Taxonomy

Social Computing

Data Mining

Collective Intelligence

Crowdsourcing

Human Computation

The problem of translation

Languages on Internet by Population

Year   English   Chinese   Spanish   Japanese   the rest
2000   52%       5%        5%        9%         29%
2005   32%       21%       8%        8%         31%
2009   28%       23%       8%        5%         37%

Source: Global Reach, Internet World Stats

A real-world problem

International Children’s Digital Library

www.childrenslibrary.org

A real-world problem: ICDL

Now:
– ~5,000 books
– 55 languages
– Some translations in a few languages
– 3,000 volunteer translators
– 100K unique visitors/month

Goal:
– 10,000 books
– 100 languages
– Every book in every language!


The space of solutions

Machine Translation (MT)

Large volume, cheap, and fast, but unreliable quality

Professional Translators

High quality, but slow and expensive (even for common language pairs)

Amateur Translators

Online Labor Markets

The key idea

Translation with the Crowd

Wikipedia: 900 translators vs. 1,200,000 contributors

Translate with the Monolingual Crowd

[Figure: the solution space plotted by Quality against Speed / Affordability, with four regions: Machine Translation; Professional Bilingual Human Participation; Amateur Bilingual Human Participation; Monolingual Human Participation]

Monolingual collaboration

Source Language side: Original Sentence
Target Language side: Translation Candidate (produced by MT)

Crowd tasks, target side:
1 Vote
2 Identify translation errors
3 Create new translation candidates

Crowd tasks, source side:
1 Vote
2 Explain errors
3 Paraphrase source sentence

New candidates and explanations pass between the two sides through MT and word alignment, and the cycle repeats …


Pierre says: En général, on s'entend bien, tous les deux. (lit. In general, we get along together, the two of us.)

[MT]

Mary sees: In general, it means well, both.
Mary edits into: In general, it is about both of us.

[MT]

Pierre sees: En général, Il est à la fois de nous.
Pierre edits into (enrichment): En général, nous nous entendons bien. (lit. In general, we get along well.)

[MT]

Mary sees: In general, we get along fine.
Mary edits into: In general, we are good friends.

[MT]

Pierre sees: En général, nous sommes de bons amis. (lit. In general, we are good friends.)
Pierre proposes to stop with current translation.
Mary agrees to stop with current translation.
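The Pierre and Mary exchange can be read as a loop that alternates target-side and source-side edits, with MT carrying each edit across the language barrier. The following is a minimal sketch of that reading, not the MonoTrans2 implementation: `run_exchange` and its parameters are hypothetical names, MT and both workers are replaced by lookup tables holding only the sentences from the slides, and the stopping rule (the source side leaves the back-translation unchanged) is an assumption standing in for "proposes to stop" and "agrees to stop".

```python
from typing import Callable, List, Tuple

def run_exchange(
    original: str,
    mt_to_target: Callable[[str], str],
    mt_to_source: Callable[[str], str],
    target_edit: Callable[[str], str],
    source_edit: Callable[[str], str],
    max_rounds: int = 5,
) -> Tuple[str, List[Tuple[str, str]]]:
    """Alternate target-side and source-side edits until the source side
    sees a back-translation it leaves unchanged (assumed stopping rule)."""
    log: List[Tuple[str, str]] = []
    seen_by_target = mt_to_target(original)
    target_text = seen_by_target
    for _ in range(max_rounds):
        log.append(("target sees", seen_by_target))
        target_text = target_edit(seen_by_target)      # target worker repairs fluency
        log.append(("target edits", target_text))
        seen_by_source = mt_to_source(target_text)     # back-translation for the source side
        log.append(("source sees", seen_by_source))
        source_text = source_edit(seen_by_source)      # source worker repairs meaning
        if source_text == seen_by_source:
            break                                      # nothing left to fix: stop
        log.append(("source edits", source_text))
        seen_by_target = mt_to_target(source_text)
    return target_text, log

# Canned stand-ins for MT and for the two workers, using only the slide sentences.
ORIGINAL = "En général, on s'entend bien, tous les deux."
MT_TO_TARGET = {
    ORIGINAL: "In general, it means well, both.",
    "En général, nous nous entendons bien.": "In general, we get along fine.",
}
MT_TO_SOURCE = {
    "In general, it is about both of us.": "En général, Il est à la fois de nous.",
    "In general, we are good friends.": "En général, nous sommes de bons amis.",
}
MARY = {
    "In general, it means well, both.": "In general, it is about both of us.",
    "In general, we get along fine.": "In general, we are good friends.",
}
PIERRE = {
    "En général, Il est à la fois de nous.": "En général, nous nous entendons bien.",
}

final, log = run_exchange(
    ORIGINAL,
    mt_to_target=lambda s: MT_TO_TARGET[s],
    mt_to_source=lambda s: MT_TO_SOURCE[s],
    target_edit=lambda s: MARY.get(s, s),     # unknown text is left unchanged
    source_edit=lambda s: PIERRE.get(s, s),   # unknown text is left unchanged
)
for who, text in log:
    print(f"{who}: {text}")
print("final:", final)
```

In a live system the four callables would be an MT service and crowd-facing task interfaces; the lookup tables here only replay the slide example.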

Target Side - Vote

Target Side - Identify Errors

Target Side - Edit Translations

Source Side – Explain Errors

Source Side – Vote & Confirm

What we’ve accomplished so far

Experiment 1
• 60 Spanish / 22 German speakers
• ICDL volunteers
• Worked on
  – 4 Spanish books => German
  – 1 German book => Spanish

TranslateTheWorld.org

Evaluation
• 2 German-Spanish bilingual evaluators
• Fluency and adequacy: 5-point score
• Compared Google Translate and MonoTrans2

Results - Fluency

[Bar chart: number of sentences (axis 0 to 150) at each fluency score from 1 to 5, Google vs. MonoTrans2]

Results - Accuracy

[Bar chart: number of sentences (axis 0 to 150) at each accuracy score from 1 to 5, Google vs. MonoTrans2]

Punchline

                               Google   MonoTrans2
Sentences with fluency = 5     21       112
Sentences with accuracy = 5    17       118
Sentences where BOTH = 5       17       110

Sentences for which both bilingual evaluators agree score = 5

(N=162 sentences worked on in the experiment)

Straight MT: 10% of sentences ready for prime time

MonoTrans2: 68% of sentences ready for prime time
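The two headline percentages follow from the "BOTH = 5" counts and N=162. A short check, assuming "ready for prime time" means a sentence scored 5 on both measures; the variable and function names are illustrative only:

```python
# Counts taken from the "Sentences where BOTH = 5" row; N from the slide.
N_SENTENCES = 162
both_top = {"Google": 17, "MonoTrans2": 110}

def share(count: int, total: int) -> int:
    """Percentage of sentences, rounded to the nearest whole percent."""
    return round(100 * count / total)

shares = {system: share(count, N_SENTENCES) for system, count in both_top.items()}
print(shares)  # {'Google': 10, 'MonoTrans2': 68}
```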

Experiment 2

• An alternative use case for crowdsourced translation…

Fanmi mwen nan Kafou, 24 Cote Plage, 41A bezwen manje ak dlo
My family in Carrefour, 24 Cote Plage, 41A needs food and water

Moun kwense nan Sakre Kè nan Pòtoprens
People trapped in Sacred Heart Church, PauP

Ti ekipman Lopital General genyen yo paka minm fè 24 è
General Hospital has less than 24 hrs. supplies

Fanm gen tranche pou fè yon pitit nan Delmas 31
Undergoing children delivery Delmas 31

Munro, Robert. 2010. Crowdsourced translation for emergency response and beyond. NSF Workshop on crowdsourcing and translation, University of Maryland.

TranslateTheWorld.org

Fluency Distribution

Adequacy Distribution

Punchline

                               Google     MonoTrans2
Sentences with fluency = 5     1 (1%)     22 (30%)
Sentences with adequacy = 5    11 (14%)   29 (38%)
Sentences where BOTH = 5       0 (0%)     14 (18%)

Sentences for which both bilingual evaluators agree score = 5

(N=76 sentences completed)

Straight MT: 0% of sentences preserve all the meaning

MonoTrans2: 38% of sentences preserve all the meaning
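The counting rule behind these rows ("both bilingual evaluators agree score = 5") can be sketched as a small tally. The ratings below are made-up sample data, not the experiment's, and `Rating` and `tally` are hypothetical names; the sketch also assumes that the BOTH row requires a 5 from both evaluators on both measures.

```python
from typing import Dict, List, NamedTuple, Tuple

class Rating(NamedTuple):
    fluency: int   # 1 to 5
    adequacy: int  # 1 to 5

# Hypothetical sample: one (evaluator A, evaluator B) pair per sentence.
ratings: List[Tuple[Rating, Rating]] = [
    (Rating(5, 5), Rating(5, 5)),
    (Rating(5, 4), Rating(5, 5)),
    (Rating(3, 5), Rating(4, 5)),
    (Rating(2, 3), Rating(5, 5)),
]

def tally(pairs: List[Tuple[Rating, Rating]], top: int = 5) -> Dict[str, int]:
    """Count sentences on which both evaluators gave the top score."""
    fluent = sum(a.fluency == top and b.fluency == top for a, b in pairs)
    adequate = sum(a.adequacy == top and b.adequacy == top for a, b in pairs)
    both = sum(a == Rating(top, top) and b == Rating(top, top) for a, b in pairs)
    return {"fluency": fluent, "adequacy": adequate, "both": both}

counts = tally(ratings)
print(counts)  # {'fluency': 2, 'adequacy': 2, 'both': 1}
```

Requiring agreement from both evaluators is a strict criterion: a sentence rated 5 by one evaluator and 4 by the other does not count.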

Scaling Up

Live for one week:
• 137,000 page views
• 1,900 task submissions
• 19 secs per task
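As a rough sense of scale, the task figures imply about ten hours of contributed effort in that week. This is a back-of-the-envelope estimate that assumes 19 seconds is the average time per submitted task:

```python
# Figures from the slide; treating 19 s as a per-task average is an assumption.
task_submissions = 1_900
seconds_per_task = 19

total_seconds = task_submissions * seconds_per_task
total_hours = total_seconds / 3600
print(f"{total_seconds} seconds, about {total_hours:.1f} hours")
```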

Example

Copying is the sincerest form of flattery…

Toward a more general architecture

Joining forces with Chris Callison-Burch, Johns Hopkins University

Take-aways

• By combining
  – machine translation technology
  – human-computer interfaces
  – crowdsourcing
  it is possible to achieve accurate translation without bilingual human expertise.

Participating Students:

Chang Hu, CS Ph.D. student

Alex Quinn, CS Ph.D. student

Vlad Eidelman, CS Ph.D. student

Yakov Kronrod, Linguistics Ph.D. student

Olivia Buzek, CS/Linguistics undergrad

New Paradigms…

Human Comp.

Comp. Ling.

HCI

TranslateTheWorld.org

Philip Resnik, Professor, Linguistics and Institute for Advanced Computer Studies