From CasMaCat to SEECAT: Patterns of Interaction in Advanced Computer-Assisted Translation
From CasMaCat to SEECAT
Patterns of Interaction in
Advanced Computer Assisted Translation
Michael Carl
CRITT, Copenhagen Business School
Moscow, April, 2014
Overview
Post-editing Patterns in CasMaCat
Prototype-I: From scratch translation vs. PE
Prototype-II: IMT and advanced PE
Activity Patterns in Post-editing
SEECAT summer project 2013
Extend CasMaCat prototype with speech and gaze input
Source Text Window
Light bulbs and Camera
Target Text Window
Translation Progression Graphs
CASMACAT Prototype-I (2012)
Experiment 1: Prototype-I
Time saving: PEMT vs. translation from scratch
Domain: newspaper article
Languages: EN → ES
1) Target Segments empty: from-scratch translation
2) Target Segments filled with pre-translated MT output
Moses, trained on news texts
Average time saving of 25% for PEMT
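A minimal sketch of how a relative time saving and a words-per-hour figure can be computed from task times. The durations and word count below are made-up illustration values, not data from the experiment, and the function names are hypothetical.

```python
def words_per_hour(words: int, seconds: float) -> float:
    """Productivity as words produced per hour of task time."""
    return words / seconds * 3600.0


def time_saving(scratch_seconds: float, pe_seconds: float) -> float:
    """Relative time saved by post-editing compared with from-scratch translation."""
    return (scratch_seconds - pe_seconds) / scratch_seconds


# Hypothetical values for one 150-word text (not taken from the study).
scratch_s, pe_s, n_words = 800.0, 600.0, 150

saving = time_saving(scratch_s, pe_s)
print(f"time saving: {saving:.0%}")
print(f"from scratch: {words_per_hour(n_words, scratch_s):.0f} words/hour")
print(f"post-editing: {words_per_hour(n_words, pe_s):.0f} words/hour")
```

With these illustrative inputs, post-editing takes three quarters of the from-scratch time, which corresponds to a 25% saving.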
translating (grey), post-editing (black), in words per hour
Productivity per Participant (Elming, Winther-Balling & Carl, 2014)
Production time ratio (vertical) vs. post-editing keystroke ratio (horizontal) (Elming, Winther-Balling & Carl, 2014)
translation (grey), post-editing (black)
Gaze Point Distribution Across Windows: Percentage Spent on Each Window (Elming, Winther-Balling & Carl, forthcoming)
Translation production (1) (Winther-Balling & Carl, 2014)
Translation task: from-scratch translation
(almost) always takes longer than post-editing.
Inefficiency: the more keystrokes are
produced the longer it takes to produce the
translation.
Alternating processing: shifting attention
frequently between different areas (TT, ST,
keyboard) is time-consuming.
Translation production (2) (Winther-Balling & Carl, 2014)
Average word frequency: lower word
frequency results in slower production time;
this tendency is more pronounced for student
translators.
Number of different possible translations:
high translation ambiguity has a slow-down
effect only in post-editing.
Alignment crossing: crossing distance has
significant effects only for post-editing
German and Spanish.
CASMACAT Prototype-II (2013)
Usage of advanced IMT
Nine Post-editors
Three datasets (3,000 words each)
Three different CASMACAT configurations:
1) Traditional post-editing (without IMT)
2) Post-editing using IMT
3) Post-editing using advanced IMT (featuring: word/cursor alignment & prediction length control).
Style 1: Read Target - Check Source
Style 2: Read Source - Check Target
Style 3: Monolingual Post-Editing
Style 4: Read Target - Check Source - Consult previous Segment
Qualitative Assessment of PE-Styles
Post-editing with CASMACAT-II Progression Graph
Style 2: Read Source - Check Target
Style 3: Monolingual Post-editing
Style 4: Read Target - Check Source - Consult previous Segment
SEECAT http://bridge.cbs.dk/prototype2/seecat_speech/
Speaking your translation
More than 4 times quicker (Brown et al., 1994)
Up to 6 times (Dragsted et al., 2011)
Using Dragon speech
44% faster if ASR error rate < 4% (Desilets et al., 2008)
Based on estimation
None of the studies used a GUI or gaze data
SEECAT - Speech & Eye-Tracking Enabled CAT
Use speech input as a post-editing tool in order to enhance efficiency for language translators.
Use an eye tracker to synchronize reading and speaking with the MT output and to position the input cursor.
Demonstrate increase in translation throughput using speech input for post-editing over a system without speech input.
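One way gaze-based cursor positioning could work is to snap the current gaze point to the nearest word in the target window. The sketch below is an assumption made for illustration, not SEECAT's actual implementation: the `WordBox` type, the pixel coordinates, and the nearest-centre rule are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WordBox:
    """Screen rectangle of one target-text word (hypothetical layout data)."""
    index: int
    text: str
    x: float       # left edge, in pixels
    y: float       # top edge, in pixels
    width: float
    height: float


def nearest_word(boxes, gaze_x, gaze_y):
    """Return the word whose centre is closest to the gaze point."""
    def distance_sq(box):
        centre_x = box.x + box.width / 2
        centre_y = box.y + box.height / 2
        return (centre_x - gaze_x) ** 2 + (centre_y - gaze_y) ** 2
    return min(boxes, key=distance_sq)


# Made-up layout for a three-word MT output on a single line.
boxes = [
    WordBox(0, "la", 0, 0, 20, 16),
    WordBox(1, "casa", 25, 0, 40, 16),
    WordBox(2, "blanca", 70, 0, 60, 16),
]
target = nearest_word(boxes, gaze_x=50, gaze_y=10)
print(f"place input cursor at word {target.index}: {target.text}")
```

A real system would also need to smooth noisy gaze samples over time; that step is left out here.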
CASMACAT PEMT
SEECAT: Workbench
SPANISH typing + speech
SPANISH typing + speech - 100% accurate
HINDI speech - with inaccuracies
GIVE IT A TRY:
http://bridge.cbs.dk/prototype2/seecat_speech/
ASR: English, Hindi, Spanish
PRE-PILOT EXPERIMENTS (I)
Subjects: 2 participants
Text type: tourism domain (6 texts - 10 segments).
Language pair: English to Spanish
Dependent variable: TIME (productivity gain)
Tasks:
i. Translation from scratch through typing (only keyboard)
ii. Translation from scratch through ASR (only speech)
iii. Post-editing through typing (only keyboard)
iv. Post-editing through ASR (only speech)
v. Translation from scratch through typing + ASR
vi. Post-editing through typing + ASR
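The six tasks cross two task types with three input modes. A minimal sketch of that design, with per-segment time averaged per condition, is given here. The condition labels are shorthand chosen for this example and the timing records are invented; neither comes from the pre-pilot data.

```python
from itertools import product
from statistics import mean

TASKS = ("from scratch", "post-editing")
INPUT_MODES = ("typing", "ASR", "typing + ASR")

# 2 task types x 3 input modes = 6 experimental conditions.
CONDITIONS = list(product(TASKS, INPUT_MODES))


def mean_time_per_condition(records):
    """Average TIME per condition.

    records: iterable of (task, input_mode, seconds) tuples, one per segment.
    Conditions without any record are left out of the result.
    """
    times = {condition: [] for condition in CONDITIONS}
    for task, mode, seconds in records:
        times[(task, mode)].append(seconds)
    return {cond: mean(vals) for cond, vals in times.items() if vals}


# Invented per-segment timings, for illustration only.
records = [
    ("from scratch", "typing", 60.0),
    ("from scratch", "typing", 80.0),
    ("post-editing", "ASR", 40.0),
]
summary = mean_time_per_condition(records)
for (task, mode), seconds in summary.items():
    print(f"{task} / {mode}: {seconds:.1f} s per segment")
```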
PRE-PILOT EXPERIMENTS (III)
TIME per segment (seconds) – Participant 01
PRE-PILOT EXPERIMENTS (IV)
TIME per segment (seconds) – Participant 02
Conclusions
Translation Process Research is an active field of
research
Multi-modal input can help to improve productivity
both in translation and post-editing.
Further experimentation is needed to:
Understand cognitive processes
Provide better support for translators
Maximize productivity, and quality, with less effort
References
Jesus Gonzalez-Rubio, Daniel Ortiz, Jose Miguel Benedí, Francisco Casacuberta:
"Interactive Machine Translation using Hierarchical Translation Models", Proceedings
of the Conference on Empirical Methods in Natural Language Processing
(EMNLP13), October 18-21, 2013, Seattle, USA.
Vicent Alabau, Ragnar Bonk, Christian Buck, Michael Carl, Francisco Casacuberta,
Mercedes Garcia-Martinez, Jesus Gonzalez, Philipp Koehn, Luis Leiva, Bartolome
Mesa-Lao, Daniel Ortiz, Herve Saint-Amand, German Sanchis, Chara Tsoukala:
"CASMACAT: An Open Source Workbench for Advanced Computer Aided
Translation", The Prague Bulletin of Mathematical Linguistics, Number 100, October
2013, pages 101-112.
Jakob Elming, Michael Carl, Laura Winther Balling: "Investigating User
Behaviour in Post-editing and Translation Using the CASMACAT Workbench", in
Expertise in Post-editing: Processes, Technology and Applications, edited by
Sharon O'Brien, Michel Simard, Lucia Specia, Michael Carl and Laura Winther
Balling. Cambridge Scholars Publishing.