Exploding the Myth: the gerund in machine translation

Nora Aranberri

Background

• Nora Aranberri
  – PhD student at CTTS (Dublin City University)
  – Funded by Enterprise Ireland and Symantec (Innovation Partnerships Programme)
• Symantec
  – Software publisher
  – Localisation requirements
    • Translation: rule-based machine translation system (Systran)
    • Documentation authoring: controlled language (CL checker: acrocheck™)
  – Project: CL checker rule refinement

The Myth

• Sources: translators, post-editors, scholars
  – Considered a translation issue for MT due to its ambiguity
    • Bernth & McCord, 2000; Bernth & Gdaniec, 2001
  – Addressed by CLs
    • Adriaens & Schreurs, 1992; Wells Akis & Sisson, 2003; O’Brien, 2003; Roturier, 2004

“The gerund is handled badly by MT systems and should be avoided”

What is a gerund?

• An -ing word can be a gerund, a participle, or part of a continuous tense; the form is the same in all three cases
• Examples
  – GERUND: Steps for auditing SQL Server instances.
  – PARTICIPLE: When the job completes, BACKINT saves a copy of the Backup Exec restore logs for auditing purposes.
  – CONTINUOUS TENSE: Server is auditing and logging.
• Conclusion: gerunds and participles can be difficult for MT to differentiate.

Methodology: creating the corpus

• Initial corpus
  – Risk management components texts
  – 494,618 words
  – Uncontrolled
• Structure of study
  – Preposition or subordinate conjunction + -ing
• Extraction of relevant segments
  – acrocheck™: CL checker asked to flag the patterns of the structure
    • IN + VBG|NN|JJ “-ing”
  – 1,857 sentences isolated
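The extraction pattern can be illustrated with a minimal Python sketch. It assumes the text has already been tokenised and tagged with Penn Treebank-style tags (IN, VBG, NN, JJ). The function name `flag_prep_ing` and the hand-tagged sample are hypothetical; this does not reproduce acrocheck's actual rule syntax.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, POS tag)

# Tags a tagger may assign to an -ing word (assumed Penn Treebank-style).
ING_TAGS = {"VBG", "NN", "JJ"}


def flag_prep_ing(tagged: List[Token]) -> List[Tuple[str, str]]:
    """Return (IN word, -ing word) pairs matching IN + VBG|NN|JJ "-ing"."""
    hits = []
    # Walk over adjacent token pairs.
    for (word, tag), (nxt, nxt_tag) in zip(tagged, tagged[1:]):
        if tag == "IN" and nxt_tag in ING_TAGS and nxt.lower().endswith("ing"):
            hits.append((word.lower(), nxt.lower()))
    return hits


# Hand-tagged sample (hypothetical tagging of the gerund example sentence).
sample = [
    ("Steps", "NNS"), ("for", "IN"), ("auditing", "VBG"),
    ("SQL", "NNP"), ("Server", "NNP"), ("instances", "NNS"), (".", "."),
]
print(flag_prep_ing(sample))
```

Because the pattern accepts NN and JJ as well as VBG, it also catches -ing words that a tagger has labelled as nouns or adjectives, at the cost of some false positives.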

Methodology: translation

• Machine translation into the target languages
  – MT used: Systran Server 5.05
  – Dictionaries
    • No specific dictionaries created for the project
    • Systran in-built computer science dictionary applied
  – Languages
    • Source language: English
    • Target languages: Spanish, French, German and Japanese

Methodology: evaluation (1)

• Evaluators

– one evaluator per target language only

– native speakers of the target languages

– translators / MA students with experience in MT

• Evaluation format

Methodology: evaluation (2)

• Analysis of the relevant structure only

• Questions:

– Q1: Is the structure correct?

– Q2: Is the error due to misinterpretation of the source or to poor generation of the target?

• Both are “yes/no” questions.

Results: prepositions / subordinate conjunctions

Structure          Examples
by + ing                377
for + ing               339
when + ing              256
before + ing            163
after + ing             122
about + ing              96
on + ing                 89
without + ing            75
of + ing                 71
from + ing               68
while + ing              54
in + ing                 36
if + ing                 19
rather than + ing        14
such as + ing            13
TOTAL                  1857

Results: correctness by target language (Spanish, French, German, Japanese)

                             Spanish             French              German              Japanese
Structure          Examples  Correct  Incorrect  Correct  Incorrect  Correct  Incorrect  Correct  Incorrect
by + ing                377      351         26      358         19      364         13      301         76
for + ing               339      243         96      284         55      262         77      224        115
when + ing              256      205         51        2        254      213         43      161         95
before + ing            163      145         18      146         17      145         18      134         29
after + ing             122      107         15      117          5      114          8      108         14
about + ing              96       82         14       82         14       88          8       88          8
on + ing                 89       38         51       80          9       58         31       29         60
without + ing            75       47         28       65         10       71          4       66          9
of + ing                 71       65          6       65          6       60         11       57         14
from + ing               68       30         38       31         37       24         44       33         35
while + ing              54        3         51       45          9       27         27       44         10
in + ing                 36       27          9        9         27       23         13        9         27
if + ing                 19       15          4       10          9       17          2       17          2
rather than + ing        14        0         14        0         14        0         14        1         13
such as + ing            13        9          4        9          4        9          4        8          5
TOTAL                  1857     1393        464     1341        516     1514        343     1303        554
%                             75.01%     24.99%   72.21%     27.79%   81.53%     18.47%   70.17%     29.83%
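The percentage row can be reproduced from the TOTAL row. The following is a small sketch for checking the arithmetic, not part of the original study; the names `TOTAL_EXAMPLES`, `CORRECT` and `success_rate` are made up for illustration.

```python
# Totals copied from the TOTAL row of the correctness table.
TOTAL_EXAMPLES = 1857
CORRECT = {"Spanish": 1393, "French": 1341, "German": 1514, "Japanese": 1303}


def success_rate(correct: int, total: int = TOTAL_EXAMPLES) -> float:
    """Percentage of examples judged correct, rounded to two decimals."""
    return round(100 * correct / total, 2)


rates = {lang: success_rate(n) for lang, n in CORRECT.items()}
for lang, pct in rates.items():
    print(f"{lang}: {pct:.2f}% correct, {100 - pct:.2f}% incorrect")
```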

Results: correlation of problematic structures

[Chart: values for "for", "when", "from", "on", "while" and "by", grouped by target language (Spanish, French, German, Japanese); vertical axis from 0 to 80]

• The most problematic structures seem to strongly correlate across languages

• Top 6 prep/conj account for >65% of errors
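The ">65%" figure can be checked against the "incorrect" columns of the correctness tables. The sketch below sums the six structures per language and divides by the TOTAL row for incorrect examples; the variable and function names are assumptions introduced here.

```python
TOP6 = ["for", "when", "from", "on", "while", "by"]

# "Incorrect" counts per structure, copied from the correctness tables.
INCORRECT = {
    "Spanish":  {"for": 96, "when": 51, "from": 38, "on": 51, "while": 51, "by": 26},
    "French":   {"for": 55, "when": 254, "from": 37, "on": 9, "while": 9, "by": 19},
    "German":   {"for": 77, "when": 43, "from": 44, "on": 31, "while": 27, "by": 13},
    "Japanese": {"for": 115, "when": 95, "from": 35, "on": 60, "while": 10, "by": 76},
}
# TOTAL row of the "incorrect" columns.
TOTAL_INCORRECT = {"Spanish": 464, "French": 516, "German": 343, "Japanese": 554}


def top6_share(lang: str) -> float:
    """Percentage of a language's incorrect examples covered by the six structures."""
    return 100 * sum(INCORRECT[lang][p] for p in TOP6) / TOTAL_INCORRECT[lang]


shares = {lang: top6_share(lang) for lang in INCORRECT}
for lang, pct in shares.items():
    print(f"{lang}: {pct:.1f}% of errors")
```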

Analysis and generation errors

Source = source-error, Target = target-error.

                             Spanish         French          German          Japanese
Structure          Examples  Source  Target  Source  Target  Source  Target  Source  Target
by + ing                377       4      27      10      13       4       9      16      58
for + ing               339      37     120      37      55      33      47      30      82
when + ing              256      13      49       0     256      10      38       3      93
before + ing            163       4      27       4      17       4      14       8      22
after + ing             122       5      12       5       5       1       7       4      11
about + ing              96       7      51      10      13       5       3       4       1
on + ing                 89       3      51       0       9       1      30       2      57
without + ing            75       3      26       2       8       2       2       1       8
of + ing                 71       4       4       3       7       4       8       7      11
from + ing               68       5      36       1      37       1      43       8      33
while + ing              54       2      50       2       8       3      26       0      10
in + ing                 36       5       7       6      27       2      13      12      18
if + ing                 19       1       3       1       9       2       0       0       2
rather than + ing        14       0      14       0      14       0      14       0      13
such as + ing            13       3       8       1       4       2       2       3       2
TOTAL                  1857     106     523      83     514      85     267      98     459
%                             0.60%   0.63%   0.54%   0.74%   0.61%   0.72%   0.60%   0.72%

Source and target error distribution

• Target errors seem to be more frequent than source errors in all four languages

• The prepositions/conjunctions with the highest error rates that are common to 3 or 4 target languages cover 43-54% of source errors and 48-59% of target errors (69% of target errors for French)

Source = source-error, Target = target-error.

              Spanish         French          German          Japanese
              Source  Target  Source  Target  Source  Target  Source  Target
for + ing         37     120      37      55      33      47      30      82
when + ing        13      49       0     256      10      38       3      93
from + ing         5      36       1      37       1      43       8      33
on + ing           3      51       0       9       1      30       2      57
SUM               58     256      38     357      45     158      43     265
Total            106     523      83     514      85     267      98     459
%              54.72   48.95   45.78   69.45   52.94   59.18   43.88   57.73
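The SUM and % rows follow directly from the four rows above them. The sketch below recomputes them for checking; the names `ERRORS`, `TOTALS` and `share` are hypothetical. Computed values can differ from the slide in the last decimal place because of rounding.

```python
# Source and target error counts for "for", "when", "from", "on" (in that order).
ERRORS = {
    "Spanish":  {"source": [37, 13, 5, 3], "target": [120, 49, 36, 51]},
    "French":   {"source": [37, 0, 1, 0],  "target": [55, 256, 37, 9]},
    "German":   {"source": [33, 10, 1, 1], "target": [47, 38, 43, 30]},
    "Japanese": {"source": [30, 3, 8, 2],  "target": [82, 93, 33, 57]},
}
# "Total" row: all source and target errors per language.
TOTALS = {
    "Spanish":  {"source": 106, "target": 523},
    "French":   {"source": 83,  "target": 514},
    "German":   {"source": 85,  "target": 267},
    "Japanese": {"source": 98,  "target": 459},
}


def share(lang: str, kind: str) -> float:
    """Percentage of a language's source or target errors due to the four structures."""
    return 100 * sum(ERRORS[lang][kind]) / TOTALS[lang][kind]


for lang in ERRORS:
    print(f"{lang}: source {share(lang, 'source'):.2f}%, target {share(lang, 'target'):.2f}%")
```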

Conclusions

• Overall success rate of roughly 70-82% across the four languages

• Target language generation errors are more frequent than errors due to misinterpretation of the source.

• Great diversity of prepositions/subordinate conjunctions, with varying frequencies.

• Strong correlation of results across languages.

Next steps

• Further evaluations to consolidate results
  – 4 evaluators per language
  – Present sentences to the evaluators out of alphabetical order by preposition/conjunction
  – Note the results for the French “when”.
• Make these findings available to the writing teams
• Take our prominent issues
  – Source issues: controlled language or pre-processing
    • Formulate more specific rules in acrocheck to handle the most problematic structures/prepositions and reduce false positives
    • Standardise structures with low frequencies
  – Target issues: post-processing or MT improvements

References

• Adriaens, G. and Schreurs, D. (1992) ‘From COGRAM to ALCOGRAM: Toward a Controlled English Grammar Checker’, 14th International Conference on Computational Linguistics, COLING-92, Nantes, France, 23-28 August, 1992, 595-601.

• Bernth, A. and Gdaniec, C. (2001) ‘MTranslatability’, Machine Translation 16: 175-218.

• Bernth, A. and McCord, M. (2000) ‘The Effect of Source Analysis on Translation Confidence’, in White, J. S., ed., Envisioning Machine Translation in the Information Future: 4th Conference of the Association for Machine Translation in the Americas, AMTA 2000, Cuernavaca, Mexico, 10-14 October, 2000, Springer: Berlin, 89-99.

• O’Brien, S. (2003) ‘Controlling Controlled English: An Analysis of Several Controlled Language Rule Sets’, in Proceedings of the 4th Controlled Language Applications Workshop (CLAW 2003), Dublin, Ireland, 15-17 May, 2003, 105-114.

• Roturier, J. (2004) ‘Assessing a set of Controlled Language rules: Can they improve the performance of commercial Machine Translation systems?’, in ASLIB Conference Proceedings, Translating and the Computer 26, London, 18-19 November, 2004, 1-14.

• Wells Akis, J. and Sisson, R. (2003) ‘Authoring translation-ready documents: is software the answer?’, in Proceedings of the 21st annual international conference on Documentation, SIGDOC 2003, San Francisco, CA, USA, October 12-15, 2003, 38-44.

Thank you!

e-mail: nora.aranberrimonasterioATdcu.ie