From CasMaCat to SEECAT: Patterns of Interaction in Advanced Computer-Assisted Translation

From CasMaCat to SEECAT: Patterns of Interaction in Advanced Computer-Assisted Translation. Michael Carl, CRITT, Copenhagen Business School. Moscow, April 2014.

description

Slides for a talk by Michael Carl, associate professor at Copenhagen Business School, in which he presented the latest developments in machine translation, in particular the CasMaCat system, which uses interactive methods of working with the user.

Transcript of From CasMaCat to SEECAT: Patterns of Interaction in Advanced Computer-Assisted Translation

Page 1: From CasMaCat to SEECAT: Patterns of Interaction in Advanced Computer-Assisted Translation

From CasMaCat to SEECAT

Patterns of Interaction in Advanced Computer-Assisted Translation

Michael Carl

CRITT, Copenhagen Business School

Moscow, April, 2014

Page 2

Overview

Post-editing Patterns in CasMaCat

Prototype-I: From scratch translation vs. PE

Prototype-II: IMT and advanced PE

Activity Patterns in Post-editing

SEECAT summer project 2013

Extend CasMaCat prototype with speech and gaze input

Page 3

Source Text Window

Light bulbs and Camera

Target Text Window

Page 4

Translation Progression Graphs

Page 5

CASMACAT Prototype-I (2012)

Page 6

Experiment 1: Prototype-I

Time saving: PEMT vs. translation from scratch

Domain: newspaper article

Languages: EN → ES

1) Target Segments empty: from-scratch translation

2) Target Segments filled with pre-translated MT output

Moses, trained on news texts

Average time saving of 25% for PEMT
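
As a rough illustration of how a figure such as the 25% saving could be computed, the sketch below compares mean per-segment production times in the two conditions. The function name `time_saving` and the sample durations are hypothetical; they are not data from the experiment, and the study may have aggregated times differently.

```python
from statistics import mean

def time_saving(scratch_secs, pemt_secs):
    """Relative time saving of PEMT over from-scratch translation (0..1)."""
    scratch = mean(scratch_secs)
    pemt = mean(pemt_secs)
    return (scratch - pemt) / scratch

# Hypothetical per-segment durations in seconds (not taken from the study).
scratch_secs = [120.0, 80.0, 100.0]
pemt_secs = [90.0, 60.0, 75.0]

saving = time_saving(scratch_secs, pemt_secs)
print(f"Time saving for PEMT: {saving:.0%}")
```

A positive value means post-editing was faster on average; a negative value would mean it was slower.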

Page 7

Productivity per Participant (Elming, Winther-Balling & Carl, 2014)

Translating (grey) vs. post-editing (black), in words per hour

Page 8

Production time ratio (vertical) vs. post-editing keystroke ratio (horizontal) (Elming, Winther-Balling & Carl, 2014)

Page 9

Gaze Point Distribution Across Windows: Percentage Spent on Each Window (Elming, Winther-Balling & Carl, forthcoming)

Translation (grey), post-editing (black)
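
A per-window gaze distribution of this kind could be derived from a stream of gaze samples that have already been assigned to a window. The sketch below is only a minimal illustration: the labels "ST" and "TT", the function name `gaze_share`, and the sample data are assumptions, not the logging format of the CASMACAT workbench.

```python
from collections import Counter

def gaze_share(samples):
    """Percentage of gaze samples that fall on each window."""
    counts = Counter(samples)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {window: 100.0 * n / total for window, n in counts.items()}

# Hypothetical gaze samples, one window label per sample
# ("ST" = source text window, "TT" = target text window).
samples = ["ST", "TT", "TT", "ST", "TT", "TT", "TT", "ST"]

shares = gaze_share(samples)
for window, pct in sorted(shares.items()):
    print(f"{window}: {pct:.1f}%")
```

Counting samples approximates time spent only if the eye tracker samples at a fixed rate; with fixation data, fixation durations would be summed instead.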

Page 10

Translation production (1) (Winther-Balling & Carl, 2014)

Translation task: from-scratch translation (almost) always takes longer than post-editing.

Inefficiency: the more keystrokes are produced, the longer it takes to produce the translation.

Alternating processing: shifting attention frequently between different areas (TT, ST, keyboard) is time-consuming.

Page 11

Translation production (2) (Winther-Balling & Carl, 2014)

Average word frequency: lower word frequency results in longer production time; this tendency is more pronounced for student translators.

Number of different possible translations: high translation ambiguity has a slow-down effect only in post-editing.

Alignment crossing: crossing distance has significant effects only for post-editing German and Spanish.

Page 12

CASMACAT Prototype-II (2013)

Page 13

Usage of advanced IMT

Nine Post-editors

Three datasets (3,000 words each)

Three different CASMACAT configurations:

1) Traditional post-editing (without IMT)

2) Post-editing using IMT

3) Post-editing using advanced IMT (featuring: word/cursor alignment & prediction length control).

Page 14
Page 15
Page 16

Style 1: Read Target - Check Source

Page 17

Style 2: Read Source - Check Target

Page 18

Style 3: Monolingual Post-Editing

Page 19

Style 4: Read Target - Check Source - Consult previous Segment

Page 20

Qualitative Assessment of PE-Styles

Page 21

Post-editing with CASMACAT-II Progression Graph

Page 22

Style 2: Read Source - Check Target

Page 23

Style 3: Monolingual Post-editing

Page 24

Style 4: Read Target - Check Source - Consult previous Segment

Page 25

SEECAT http://bridge.cbs.dk/prototype2/seecat_speech/

Page 26

Speaking your translation

More than 4 times quicker (Brown et al., 1994)

Up to 6 times quicker (Dragsted et al., 2011)

Using Dragon speech recognition

44% faster if ASR error rate < 4% (Desilets et al., 2008)

Based on estimation

None of the studies used a GUI or gaze data

Page 27

SEECAT - Speech & Eye-Tracking Enabled CAT

Use speech input as a post-editing tool to make translators more efficient.

Use an eye tracker to synchronize reading and speaking with the MT output and to position the input cursor.

Demonstrate an increase in translation throughput when post-editing with speech input, compared with a system without speech input.
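
One simple way to position an input cursor from gaze is to snap each fixation to the nearest word of the MT output. The sketch below illustrates that idea only; it is not SEECAT's implementation, and the `WordBox` type, the function name `cursor_from_fixation`, and the coordinates are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class WordBox:
    text: str
    offset: int  # character offset of the word within the segment
    x: float     # screen position of the word centre, in pixels
    y: float

def cursor_from_fixation(words, fx, fy):
    """Return the character offset of the word closest to the fixation point."""
    nearest = min(words, key=lambda w: (w.x - fx) ** 2 + (w.y - fy) ** 2)
    return nearest.offset

# Hypothetical layout of a three-word MT segment on one line.
words = [
    WordBox("la", 0, 20.0, 10.0),
    WordBox("casa", 3, 60.0, 10.0),
    WordBox("blanca", 8, 120.0, 10.0),
]

# A fixation slightly right of and below "casa" selects offset 3.
offset = cursor_from_fixation(words, 65.0, 12.0)
print(offset)
```

A real system would likely smooth over several fixations before moving the cursor, since single gaze samples are noisy.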

Page 28

CASMACAT PEMT

Page 29

CASMACAT PEMT

Page 30

SEECAT: Workbench

SPANISH typing + speech

SPANISH typing + speech - 100% accurate

HINDI speech - with inaccuracies

GIVE IT A TRY:

http://bridge.cbs.dk/prototype2/seecat_speech/

ASR: English, Hindi, Spanish

Page 31

PRE-PILOT EXPERIMENTS (I)

Subjects: 2 participants

Text type: tourism domain (6 texts - 10 segments).

Language pair: English to Spanish

Dependent variable: TIME (productivity gain)

Tasks:

i. Translation from scratch through typing (only keyboard)

ii. Translation from scratch through ASR (only speech)

iii. Post-editing through typing (only keyboard)

iv. Post-editing through ASR (only speech)

v. Translation from scratch through typing + ASR

vi. Post-editing through typing + ASR

Page 32
Page 33

PRE-PILOT EXPERIMENTS (III)

TIME per segment (seconds) – Participant 01

Page 34

PRE-PILOT EXPERIMENTS (IV)

TIME per segment (seconds) – Participant 02

Page 35

Conclusions

Translation Process Research is an active field of research

Multi-modal input can help to improve productivity both in translation and post-editing.

Further experimentation is needed to:

Understand cognitive processes

Provide better support for translators

Maximize productivity and quality with less effort

Page 36
Page 37

References

Jesus Gonzalez-Rubio, Daniel Ortiz, Jose Miguel Benedí, Francisco Casacuberta. "Interactive Machine Translation using Hierarchical Translation Models". Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP13). October 18-21, 2013, Seattle, USA.

Vicent Alabau, Ragnar Bonk, Christian Buck, Michael Carl, Francisco Casacuberta, Mercedes Garcia-Martinez, Jesus Gonzalez, Philipp Koehn, Luis Leiva, Bartolome Mesa-Lao, Daniel Ortiz, Herve Saint-Amand, German Sanchis, Chara Tsoukala. "CASMACAT: An Open Source Workbench for Advanced Computer Aided Translation". The Prague Bulletin of Mathematical Linguistics, Number 100, October 2013, pages 101-112.

Jakob Elming, Michael Carl, Laura Winther Balling. "Investigating User Behaviour in Post-editing and Translation Using the CASMACAT Workbench". In Expertise in Post-editing: Processes, Technology and Applications, edited by Sharon O'Brien, Michael Simard, Lucia Specia, Michael Carl and Laura Winther Balling. Cambridge Scholars Publishing.
