From CasMaCat to SEECAT: Patterns of Interaction in Advanced Computer-Assisted Translation

Michael Carl

CRITT, Copenhagen Business School

Moscow, April, 2014

Overview

Post-editing Patterns in CasMaCat

Prototype-I: From scratch translation vs. PE

Prototype-II: IMT and advanced PE

Activity Patterns in Post-editing

SEECAT summer project 2013

Extend CasMaCat prototype with speech and gaze input

Source Text Window

Light bulbs and Camera

Target Text Window

Translation Progression Graphs

CASMACAT Prototype-I (2012)

Experiment 1: Prototype-I

Time saving: PEMT vs. translation from scratch

Domain: newspaper article

Languages: EN → ES

1) Target Segments empty: from-scratch translation

2) Target Segments filled with pre-translated MT output

Moses, trained on news texts

Average time saving of 25% for PEMT
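The 25% figure can be read as a relative reduction in production time. A minimal sketch of that arithmetic follows, assuming hypothetical task durations and word counts; the function names and numbers are illustrative and are not taken from the experiment.

```python
def words_per_hour(words: int, seconds: float) -> float:
    """Productivity as words produced per hour of task time."""
    return words * 3600.0 / seconds


def time_saving(scratch_seconds: float, pe_seconds: float) -> float:
    """Relative time saved by post-editing compared with from-scratch translation."""
    return (scratch_seconds - pe_seconds) / scratch_seconds


# Hypothetical example values, not data from the study.
scratch = 1200.0  # seconds to translate a text from scratch
pemt = 900.0      # seconds to post-edit a text of the same length
print(f"time saving: {time_saving(scratch, pemt):.0%}")
print(f"from scratch: {words_per_hour(300, scratch):.0f} words/hour")
print(f"post-editing: {words_per_hour(300, pemt):.0f} words/hour")
```

With these made-up inputs, post-editing takes three quarters of the from-scratch time, which corresponds to a 25% saving.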

translating (grey), post-editing (black), in words per hour

Productivity per Participant (Elming, Winther-Balling & Carl, 2014)

Production time ratio (vertical) vs. post-editing keystroke ratio (horizontal) (Elming, Winther-Balling & Carl, 2014)

translation (grey), post-editing (black)

Gaze point distribution across windows: percentage spent on each window (Elming, Winther-Balling & Carl, forthcoming)
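One way such a per-window distribution could be computed is sketched below. It assumes a hypothetical fixation log of (window, duration) pairs and measures the share of total fixation time; the original analysis may instead have counted gaze points, and the data format here is an assumption.

```python
from collections import defaultdict


def gaze_share_per_window(fixations):
    """Return the percentage of total fixation time spent on each window.

    `fixations` is an iterable of (window, duration_ms) pairs.
    """
    totals = defaultdict(float)
    for window, duration_ms in fixations:
        totals[window] += duration_ms
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {w: 100.0 * t / grand_total for w, t in totals.items()}


# Hypothetical fixation log: window label and duration in milliseconds.
sample = [("source", 200), ("target", 450), ("source", 150), ("target", 200)]
print(gaze_share_per_window(sample))
```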

Translation production (1) (Winther-Balling & Carl, 2014)

Translation task: from-scratch translation (almost) always takes longer than post-editing.

Inefficiency: the more keystrokes are produced, the longer it takes to produce the translation.

Alternating processing: shifting attention frequently between different areas (TT, ST, keyboard) is time-consuming.
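Attention alternation can be quantified by counting how often consecutive logged events fall in different areas. The sketch below assumes a hypothetical, time-ordered list of area labels per event; it is an illustration, not the measure used in the cited study.

```python
def count_attention_shifts(areas):
    """Count how often consecutive events fall in different areas."""
    return sum(1 for prev, cur in zip(areas, areas[1:]) if prev != cur)


# Hypothetical sequence of attended areas (ST = source text, TT = target text).
sequence = ["ST", "ST", "TT", "keyboard", "TT", "TT", "ST"]
print(count_attention_shifts(sequence))
```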

Translation production (2) (Winther-Balling & Carl, 2014)

Average word frequency: lower word frequency results in slower production time; this tendency is more pronounced for student translators.

Number of different possible translations: high translation ambiguity has a slow-down effect only in post-editing.

Alignment crossing: crossing distance has significant effects only for post-editing German and Spanish.

CASMACAT Prototype-II (2013)

Usage of advanced IMT

Nine Post-editors

Three datasets (3,000 words each)

Three different CASMACAT configurations:

1) Traditional post-editing (without IMT)

2) Post-editing using IMT

3) Post-editing using advanced IMT (featuring: word/cursor alignment & prediction length control).

Style 1: Read Target - Check Source

Style 2: Read Source - Check Target

Style 3: Monolingual Post-Editing

Style 4: Read Target - Check Source - Consult previous Segment

Qualitative Assessment of PE-Styles

Post-editing with CASMACAT-II Progression Graph

SEECAT http://bridge.cbs.dk/prototype2/seecat_speech/

Speaking your translation

More than 4 times quicker (Brown et al., 1994)

Up to 6 times quicker (Dragsted et al., 2011)

Using Dragon speech

44% faster if ASR error rate < 4% (Desilets et al., 2008)

Based on estimation

None of the studies used a GUI or gaze data

SEECAT - Speech & Eye-Tracking Enabled CAT

Use speech input as a post-editing tool in order to enhance efficiency for language translators.

Use an eye tracker to synchronize reading and speaking with the MT output and to position the input cursor.

Demonstrate increase in translation throughput using speech input for post-editing over a system without speech input.

CASMACAT PEMT

SEECAT: Workbench

SPANISH typing + speech

SPANISH typing + speech - 100% accurate

HINDI speech - with inaccuracies

GIVE IT A TRY:

http://bridge.cbs.dk/prototype2/seecat_speech/

ASR: English, Hindi, Spanish

PRE-PILOT EXPERIMENTS (I)

Subjects: 2 participants

Text type: tourism domain (6 texts - 10 segments).

Language pair: English to Spanish

Dependent variable: TIME (productivity gain)

Tasks:

i. Translation from scratch through typing (only keyboard)

ii. Translation from scratch through ASR (only speech)

iii. Post-editing through typing (only keyboard)

iv. Post-editing through ASR (only speech)

v. Translation from scratch through typing + ASR

vi. Post-editing through typing + ASR

PRE-PILOT EXPERIMENTS (III)

TIME per segment (seconds) – Participant 01

PRE-PILOT EXPERIMENTS (IV)

TIME per segment (seconds) – Participant 02

Conclusions

Translation Process Research is an active field of research

Multi-modal input can help to improve productivity both in translation and post-editing.

Further experimentation is needed to:

Understand cognitive processes

Provide better support for translators

Maximize productivity and quality with less effort

References

Jesus Gonzalez-Rubio, Daniel Ortiz, Jose Miguel Benedí, Francisco Casacuberta: "Interactive Machine Translation using Hierarchical Translation Models", Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP13), October 18-21, 2013, Seattle, USA.

Vicent Alabau, Ragnar Bonk, Christian Buck, Michael Carl, Francisco Casacuberta, Mercedes Garcia-Martinez, Jesus Gonzalez, Philipp Koehn, Luis Leiva, Bartolome Mesa-Lao, Daniel Ortiz, Herve Saint-Amand, German Sanchis, Chara Tsoukala: "CASMACAT: An Open Source Workbench for Advanced Computer Aided Translation", The Prague Bulletin of Mathematical Linguistics, Number 100, October 2013, pages 101-112.

Jakob Elming, Michael Carl, Laura Winther Balling: "Investigating User Behaviour in Post-editing and Translation Using the CASMACAT Workbench", in Expertise in Post-editing: Processes, Technology and Applications, edited by Sharon O'Brien, Michel Simard, Lucia Specia, Michael Carl and Laura Winther Balling. Cambridge Scholars Publishing.
