Proceedings of the 21st Annual Conference of the European Association for Machine Translation, 28–30 May 2018, Universitat d’Alacant, Alacant, Spain. Edited by Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez, Miquel Esplà-Gomis, Maja Popović, Celia Rico, André Martins, Joachim Van den Bogaert, and Mikel L. Forcada. Organised by research group




The papers published in this proceedings are, unless indicated otherwise, covered by the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 International (CC-BY-ND 3.0). You may copy, distribute, and transmit the work, provided that you attribute it (authorship, proceedings, publisher) in the manner specified by the author(s) or licensor(s), and that you do not use it for commercial purposes. The full text of the licence may be found at https://creativecommons.org/licenses/by-nc-nd/3.0/deed.en.

© 2018 The authors

ISBN: 978-84-09-01901-4


mtrain: A Convenience Tool for Machine Translation

Samuel Läubli* and Mathias Müller* and Beat Horat and Martin Volk

{laeubli,mmueller,horat,volk}@cl.uzh.ch

Institute of Computational Linguistics, University of Zurich

Abstract

We present mtrain, a convenience tool for machine translation. It wraps existing machine translation libraries and scripts to ease their use. mtrain is written purely in Python 3, well-documented, and freely available.¹

Machine translation libraries usually focus on core model training, while data preparation and automatic evaluation are left to the user. This presents a barrier to experimental reproducibility, rapid prototyping, and entry to the field from neighbouring disciplines. In the spirit of the Experimental Management System for Moses (Koehn, 2010), our tool is meant to automate these tasks.

mtrain is designed to handle most aspects of a machine translation experiment: it manages preprocessing, model training, and automatic evaluation. Preprocessing involves automatically splitting a data set into training, validation, and test sets; tokenization; casing; byte-pair encoding; and normalization. On top of these standard preprocessing steps, mtrain can also deal with inline XML markup and intelligently transfer XML tags to translations (Müller, 2017).
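To make two of these preprocessing steps concrete, the following minimal Python 3 sketch splits a list of sentence pairs into training, validation, and test sets, and masks inline XML tags with placeholder tokens that can be restored after translation. It is an illustration only and not mtrain's actual code or API: the function names (split_corpus, mask_tags, unmask_tags), the placeholder format, and the seed default are all assumptions made here, and the masking shown is a simplified stand-in for the markup strategies discussed by Müller (2017).

```python
import random
import re


def split_corpus(pairs, n_valid, n_test, seed=1):
    """Shuffle sentence pairs, then cut off test and validation sets.

    Hypothetical helper for illustration; not part of mtrain.
    """
    if n_valid + n_test > len(pairs):
        raise ValueError("not enough sentence pairs for the requested split")
    shuffled = list(pairs)
    # A fixed seed keeps the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    test = shuffled[:n_test]
    valid = shuffled[n_test:n_test + n_valid]
    train = shuffled[n_test + n_valid:]
    return train, valid, test


# Matches a single opening, closing, or self-closing tag.
TAG = re.compile(r"<[^<>]+>")


def mask_tags(segment):
    """Replace each XML tag with a numbered placeholder token."""
    mapping = {}

    def _mask(match):
        placeholder = "__xml_{}__".format(len(mapping))
        mapping[placeholder] = match.group(0)
        return placeholder

    return TAG.sub(_mask, segment), mapping


def unmask_tags(segment, mapping):
    """Put the original tags back in place of their placeholders."""
    for placeholder, tag in mapping.items():
        segment = segment.replace(placeholder, tag)
    return segment


# Usage with toy data (all values are made up for the example).
pairs = [("source {}".format(i), "target {}".format(i)) for i in range(10)]
train, valid, test = split_corpus(pairs, n_valid=2, n_test=2)
masked, tags = mask_tags("Press <b>Start</b> to continue.")
restored = unmask_tags(masked, tags)
```

In this sketch the placeholders survive translation only if the system copies them through unchanged; a real pipeline would also have to handle placeholders that are dropped or reordered.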

Our tool provides training automation for statistical phrase-based models with Moses (Koehn et al., 2007) and neural RNN encoder-decoder models with Nematus (Sennrich et al., 2017). After training, mtrain offers automatic evaluation of translation quality. It outputs the well-known BLEU, TER, and METEOR metrics (Clark et al., 2011). Given a folder that contains trained models, the separate component mtrans can be used to translate from files or standard input.

All steps can be configured with config files or command line options, but default settings already lead to functional baseline systems, making it easier for inexperienced users to use the tool. Going forward, we consider wrapping additional machine translation libraries that are native Python 3, such as Sockeye (Hieber et al., 2017).

© 2018 The authors. This article is licensed under a Creative Commons 3.0 licence, no derivative works, attribution, CC-BY-ND.

¹ https://github.com/ZurichNLP/mtrain

* Equal contribution.
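As a rough illustration of what the BLEU score reported in the evaluation step measures, the sketch below computes a simplified corpus-level BLEU in plain Python 3. It is not how mtrain obtains its scores. The assumptions made here are: whitespace tokenization, exactly one reference per hypothesis, n-grams up to length 4, and no smoothing. The function names ngrams and corpus_bleu are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus-level BLEU on a 0-100 scale.

    One reference per hypothesis, whitespace tokens, no smoothing.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tokens, ref_tokens = hyp.split(), ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tokens, n)
            ref_counts = ngrams(ref_tokens, n)
            # Counter intersection clips each n-gram count to the
            # number of times it occurs in the reference.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())
    # Without smoothing, a single zero precision makes the score zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(
        math.log(m / t) for m, t in zip(matches, totals)
    ) / max_n
    # The brevity penalty punishes output shorter than the reference.
    if hyp_len >= ref_len:
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1.0 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)


# Usage with a toy hypothesis and reference.
score = corpus_bleu(
    ["the cat sat on the mat"],
    ["the cat sat on the red mat"],
)
```

Scores from a sketch like this are not comparable with published numbers, since tokenization and smoothing choices change BLEU noticeably; established evaluation tools should be used for reported results.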

References

Clark, Jonathan H, Chris Dyer, Alon Lavie, and Noah A Smith. 2011. Better hypothesis testing for statistical machine translation: Controlling for optimizer instability. In Proceedings of ACL, pages 176–181.

Hieber, Felix, Tobias Domhan, Michael Denkowski, David Vilar, Artem Sokolov, Ann Clifton, and Matt Post. 2017. Sockeye: a toolkit for neural machine translation. arXiv preprint, arXiv:1712.05690.

Koehn, Philipp, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, et al. 2007. Moses: Open source toolkit for statistical machine translation. In Proceedings of ACL, pages 177–180.

Koehn, Philipp. 2010. An experimental management system. The Prague Bulletin of Mathematical Linguistics, 94:87–96.

Müller, Mathias. 2017. Treatment of markup in statistical machine translation. In Proceedings of the Third Workshop on Discourse in Machine Translation, pages 36–46.

Sennrich, Rico, Orhan Firat, Kyunghyun Cho, Alexandra Birch, Barry Haddow, Julian Hitschler, Marcin Junczys-Dowmunt, Samuel Läubli, Antonio Valerio Miceli Barone, Jozef Mokry, and Maria Nadejde. 2017. Nematus: a toolkit for neural machine translation. In Proceedings of EACL, pages 65–68.

Pérez-Ortiz, Sánchez-Martínez, Esplà-Gomis, Popović, Rico, Martins, Van den Bogaert, Forcada (eds.) Proceedings of the 21st Annual Conference of the European Association for Machine Translation, p. 357. Alacant, Spain, May 2018.