Moses, past, present, future
Hieu Hoang, XRCE 2013
Timeline
2002 Pharaoh decoder, precursor to Moses
2005 Replacement for Pharaoh
2006 JHU Workshop extends Moses significantly
since late 2006 Funding by EU projects EuroMatrix, EuroMatrixPlus
2012 MosesCore
What is Moses? Common Misconceptions
• Only the decoder
• Only for Linux
• Difficult to use
• Unreliable
• Only phrase-based
• No sparse features
• Developed by one person
• Slow
Only the decoder
– replacement for Pharaoh
• Training
• Tuning
• Decoder
• Other
– XML server, phrase-table pruning/filtering, domain adaptation, experiment management system
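Phrase-table filtering, one of the "other" tools above, keeps only the entries a particular test set can actually use. The sketch below assumes the plain-text phrase-table layout `source ||| target ||| scores ...` and a maximum phrase length of 7; the helper names and the scores in the demo are made up, and this is not the script Moses ships.

```python
"""Sketch: keep only phrase-table entries whose source side occurs in a test set.

Hypothetical helper names; assumes the text format 'src ||| tgt ||| scores ...'.
"""
from typing import Iterable, Iterator, List, Set


def source_ngrams(sentences: Iterable[str], max_len: int = 7) -> Set[str]:
    """Collect every contiguous n-gram (1 <= n <= max_len) of the tokenized input."""
    grams: Set[str] = set()
    for sentence in sentences:
        toks = sentence.split()
        for i in range(len(toks)):
            for j in range(i + 1, min(i + max_len, len(toks)) + 1):
                grams.add(" ".join(toks[i:j]))
    return grams


def filter_phrase_table(lines: Iterable[str], needed: Set[str]) -> Iterator[str]:
    """Yield entries whose source phrase (the first ' ||| ' field) is in `needed`."""
    for line in lines:
        source = line.split(" ||| ", 1)[0].strip()
        if source in needed:
            yield line


if __name__ == "__main__":
    table: List[str] = [  # made-up scores
        "das Haus ||| the house ||| 0.8 0.6 0.7 0.5",
        "das ||| the ||| 0.5 0.4 0.6 0.3",
        "der Baum ||| the tree ||| 0.9 0.7 0.8 0.6",
    ]
    needed = source_ngrams(["das Haus ist klein"])
    for entry in filter_phrase_table(table, needed):
        print(entry)  # "der Baum" is dropped: it never occurs in the test input
```

Filtering by source n-grams is safe because the decoder can only ever apply phrases whose source side matches a span of the input.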
Only works on Linux
• Tested on
– Windows 7 (32-bit) with Cygwin 6.1
– Mac OS X 10.7 with MacPorts
– Ubuntu 12.10, 32 and 64-bit
– Debian 6.0, 32 and 64-bit
– Fedora 17, 32 and 64-bit
– openSUSE 12.2, 32 and 64-bit
• Project files for
– Visual Studio
– Eclipse on Linux and Mac OS X
Difficult to use
• Easier compilation and installation
– Boost bjam
– No installation required
• Binaries available for
– Linux
– Mac
– Windows/Cygwin
– Moses + friends: IRSTLM, GIZA++ and MGIZA
• Ready-made models trained on Europarl
Unreliable
• Monitor check-ins
• Unit tests
• More regression tests
• Nightly tests
– Run end-to-end training
– http://www.statmt.org/moses/cruise/
• Tested on all major OSes
• Train Europarl models
– Phrase-based, hierarchical, factored
– 8 language pairs
– http://www.statmt.org/moses/RELEASE-1.0/models/
Only phrase-based model
– replacement for Pharaoh
– extension of Pharaoh
• From the beginning
– Factored models
– Lattice and confusion network input
– Multiple LMs, multiple phrase-tables
• Since 2009
– Hierarchical model
– Syntactic models
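Confusion network input lets the decoder take several competing input hypotheses at once (for example from a speech recognizer). One way to picture a confusion network is as a sequence of columns, each holding alternative words with probabilities. The sketch below assumes that representation and uses `*EPS*` for an empty alternative (our assumption about the symbol); the real decoder searches all paths jointly rather than fixing the most probable one first, so `best_path` only illustrates the data structure.

```python
"""Sketch: a confusion network as columns of (word, probability) alternatives.

EPS marks an empty slot; the exact symbol is an assumption of this sketch.
"""
import math
from typing import List, Tuple

Column = List[Tuple[str, float]]
EPS = "*EPS*"


def num_paths(cn: List[Column]) -> int:
    """Each choice of one alternative per column is a distinct input sentence."""
    return math.prod(len(column) for column in cn)


def best_path(cn: List[Column]) -> Tuple[List[str], float]:
    """Columns are independent, so the most probable path is the per-column argmax."""
    words: List[str] = []
    prob = 1.0
    for column in cn:
        word, p = max(column, key=lambda alt: alt[1])
        prob *= p
        if word != EPS:  # an epsilon choice contributes no word
            words.append(word)
    return words, prob


if __name__ == "__main__":
    # e.g. recognizer output that was unsure about two positions
    cn = [
        [("the", 0.9), ("a", 0.1)],
        [("house", 0.6), ("horse", 0.4)],
        [(EPS, 0.7), ("is", 0.3)],
    ]
    print(num_paths(cn))  # 8
    print(best_path(cn))
```

Even this three-column network encodes eight inputs, which is why translating the network directly is cheaper than decoding each path separately.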
No Sparse Features
• Large number of sparse features
– 1+ million
– Sparse AND dense features
• Available sparse features
– Target Bigram, Target Ngram, Source Word Deletion, Sparse Phrase Table, Phrase Boundary, Phrase Length, Phrase Pair, Target Word Insertion, Global Lexical Model
• Different tuning
– MERT
– MIRA
– Batch MIRA (Cherry & Foster, 2012)
– PRO (Hopkins and May, 2011)
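The slide lists sparse feature templates such as Target Bigram without saying how they are computed. The sketch below assumes a common formulation: one indicator feature per distinct target bigram, scored in the same linear model as the dense features, with a feature that has no learned weight contributing zero. The feature names (`tb_...`, `lm`, `tm`) and all weights are made up.

```python
"""Sketch: a sparse 'target bigram' feature and a linear model mixing dense and sparse features.

Feature names and weights are made up for illustration.
"""
from collections import Counter
from typing import Dict, List


def target_bigram_features(target: List[str]) -> Counter:
    """Fire one indicator feature per target bigram, padding with sentence boundaries."""
    toks = ["<s>"] + target + ["</s>"]
    return Counter(f"tb_{a}_{b}" for a, b in zip(toks, toks[1:]))


def linear_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Dot product; a feature with no learned weight contributes nothing."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


if __name__ == "__main__":
    dense = {"lm": -4.0, "tm": -2.0}                   # a handful of dense features
    sparse = target_bigram_features(["the", "house"])  # one name per bigram seen
    weights = {"lm": 0.5, "tm": 0.2, "tb_the_house": 0.1}
    print(sorted(sparse))
    print(linear_score({**dense, **sparse}, weights))
```

Storing features and weights as name-keyed dictionaries is what makes "1+ million" features practical: each hypothesis only touches the few names it actually fires.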
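PRO (Hopkins and May, 2011), one of the tuning methods listed, recasts tuning as ranking: sample pairs of candidates from an n-best list, keep pairs whose metric scores clearly differ, and train a linear classifier on their feature differences. This sketch handles a single toy n-best list, substitutes a plain perceptron for the classifier, and uses sampling constants (gamma=5000, alpha=0.05, xi=50) that follow the paper's settings as best we recall; treat them as assumptions. A real tuner pools examples over the whole development set and alternates with re-decoding.

```python
"""Sketch of PRO-style tuning: pairwise ranking examples plus a linear classifier.

A perceptron stands in for the classifier; sampling constants are assumptions.
"""
import random
from typing import List, Tuple

Candidate = Tuple[List[float], float]  # (feature vector, sentence-level metric score)
Example = Tuple[List[float], int]


def sample_pairs(nbest: List[Candidate], rng: random.Random,
                 gamma: int = 5000, alpha: float = 0.05, xi: int = 50) -> List[Example]:
    """Sample gamma pairs, drop near-ties (|metric diff| <= alpha), keep the xi most different."""
    pairs = []
    for _ in range(gamma):
        a, b = rng.choice(nbest), rng.choice(nbest)
        if abs(a[1] - b[1]) > alpha:
            pairs.append((abs(a[1] - b[1]), a, b))
    pairs.sort(key=lambda p: p[0], reverse=True)
    examples: List[Example] = []
    for _, a, b in pairs[:xi]:
        diff = [fa - fb for fa, fb in zip(a[0], b[0])]
        label = 1 if a[1] > b[1] else -1
        examples.append((diff, label))                 # a vs b
        examples.append(([-d for d in diff], -label))  # b vs a keeps classes balanced
    return examples


def train_perceptron(examples: List[Example], dim: int, epochs: int = 20) -> List[float]:
    """Learn w so that sign(w . (f(a) - f(b))) predicts which candidate is better."""
    w = [0.0] * dim
    for _ in range(epochs):
        for x, y in examples:
            if y * sum(wi * v for wi, v in zip(w, x)) <= 0:  # misranked pair
                w = [wi + y * v for wi, v in zip(w, x)]
    return w


if __name__ == "__main__":
    nbest = [([1.0, 0.0], 0.9), ([0.5, 0.2], 0.5), ([0.0, 0.5], 0.1)]  # toy n-best list
    w = train_perceptron(sample_pairs(nbest, random.Random(0)), dim=2)
    print(w)
```

The learned weights rank the toy candidates in the same order as their metric scores, which is the property PRO optimizes for.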
Developed by one person
• ANYONE can contribute
– 50 contributors
[Figure: ‘git blame’ of Moses repository, percentage per contributor (axis 0–40%): Kenneth Heafield, Hieu Hoang, phkoehn, Ondrej Bojar, Barry Haddow, sanmarf, Tetsuo Kiso, Eva Hasler, Rico Sennrich, wlin12, nicolabertoldi, eherbst, Ales Tamchyna, Colin Cherry, Matous Machacek, Phil Williams]
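The slide does not say how its ‘git blame’ chart was computed. One plausible way, sketched below, is to count the `author <name>` header that `git blame --line-porcelain` repeats for every blamed line; the function names and the whole-repository aggregation are our own assumptions, and binary files or unusual encodings are not handled.

```python
"""Sketch: per-author share of lines from `git blame --line-porcelain` output."""
import subprocess
from collections import Counter
from typing import Dict


def author_shares(porcelain: str) -> Dict[str, float]:
    """Percentage of blamed lines per author, largest first."""
    counts = Counter(
        line[len("author "):]
        for line in porcelain.splitlines()
        # matches only the 'author' header: author-mail/-time/-tz lack the space,
        # and blamed source lines start with a tab
        if line.startswith("author ")
    )
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.most_common()} if total else {}


def blame(path: str) -> str:
    """Run git blame on one tracked file (requires git on PATH, inside the repository)."""
    return subprocess.run(
        ["git", "blame", "--line-porcelain", "--", path],
        capture_output=True, text=True, check=True,
    ).stdout


def repo_shares() -> Dict[str, float]:
    """Aggregate over all tracked files."""
    files = subprocess.run(
        ["git", "ls-files"], capture_output=True, text=True, check=True
    ).stdout.splitlines()
    return author_shares("".join(blame(path) for path in files))


if __name__ == "__main__":
    sample = ("abc 1 1 2\nauthor Ken\nauthor-mail <ken@example.org>\n\tline one\n"
              "abc 2 2\nauthor Ken\n\tline two\n"
              "def 1 3 1\nauthor Hieu\n\tline three\n")
    print(author_shares(sample))
```

Counting lines rather than commits measures how much surviving code each person wrote, which favours authors of large, long-lived components.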
Slow (decoding)
• thanks to Ken (Kenneth Heafield)!!
Slow (training)
• Multithreaded
• Reduced disk IO
– compress intermediate files
• Reduced disk space requirement

| Time (mins) | 1 core | 2 cores | 4 cores | 8 cores | Size (MB) |
|---|---|---|---|---|---|
| Phrase-based | 60 | 47 (79%) | 37 (63%) | 33 (56%) | 893 |
| Hierarchical | 1030 | 677 (65%) | 473 (45%) | 375 (36%) | 8300 |
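The training table gives minutes per core count. The short script below recomputes relative time, speedup and parallel efficiency from those minutes; because it works from the rounded minutes, some percentages come out slightly different from the slide's (e.g. 47/60 is 78.3%, not the 79% shown).

```python
"""Recompute relative time, speedup and parallel efficiency from the training table."""
from typing import Dict, List, Tuple

TIMINGS: Dict[str, Dict[int, int]] = {  # minutes, copied from the table
    "Phrase-based": {1: 60, 2: 47, 4: 37, 8: 33},
    "Hierarchical": {1: 1030, 2: 677, 4: 473, 8: 375},
}


def scaling(times: Dict[int, int]) -> List[Tuple[int, float, float, float]]:
    """Return (cores, % of 1-core time, speedup, efficiency) for each core count."""
    base = times[1]
    rows = []
    for cores, minutes in sorted(times.items()):
        speedup = base / minutes
        rows.append((cores, 100.0 * minutes / base, speedup, speedup / cores))
    return rows


for model, times in TIMINGS.items():
    for cores, rel, speedup, eff in scaling(times):
        print(f"{model:12s} {cores} cores: {rel:5.1f}% of 1-core time, "
              f"speedup {speedup:4.2f}x, efficiency {eff:4.0%}")
```

At 8 cores this works out to roughly 1.8x for phrase-based and 2.7x for hierarchical training, so the much longer hierarchical pipeline benefits more from the extra threads.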
What is Moses? Common Misconceptions
• Only the decoder → Decoding, training, tuning, server
• Only for Linux → Windows, Linux, Mac
• Difficult to use → Easier compilation and installation
• Unreliable → Multi-stage testing
• Only phrase-based → Hierarchical and syntax models
• No sparse features → Sparse AND dense features
• Developed by one person → Everyone
• Slow → Fastest decoder, multithreaded training, less IO
Future priorities
• Code cleanup
• MT applications
– Computer-Aided Translation
– Speech-to-speech
• Incremental training
• Better translation
– smaller models
– bigger data
– faster training and decoding
Code cleanup
• Framework for feature functions
– Easier to add new feature functions
• Cleanup
– Refactor
– Delete old code
– Documentation
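Moses itself is written in C++; the Python sketch below only illustrates what a framework that makes it "easier to add new feature functions" can look like: a common base class plus a registry, so a new feature is one subclass and one config entry. All class names, the `{name: weight}` config format and the sign convention of the word-count penalty are hypothetical.

```python
"""Sketch: a plug-in framework for feature functions (hypothetical names throughout)."""
from abc import ABC, abstractmethod
from typing import Dict, List, Type

REGISTRY: Dict[str, Type["FeatureFunction"]] = {}


def register(cls: Type["FeatureFunction"]) -> Type["FeatureFunction"]:
    """Class decorator: make a feature available by name, e.g. from a config file."""
    REGISTRY[cls.__name__] = cls
    return cls


class FeatureFunction(ABC):
    def __init__(self, weight: float) -> None:
        self.weight = weight

    @abstractmethod
    def evaluate(self, source: List[str], target: List[str]) -> float:
        """Return the (unweighted) feature value for one translation hypothesis."""


@register
class WordCountPenalty(FeatureFunction):
    def evaluate(self, source: List[str], target: List[str]) -> float:
        return -float(len(target))  # one penalty unit per output word


@register
class LengthRatio(FeatureFunction):
    def evaluate(self, source: List[str], target: List[str]) -> float:
        return len(target) / max(len(source), 1)


def load_features(config: Dict[str, float]) -> List[FeatureFunction]:
    """Instantiate features from a {name: weight} mapping."""
    return [REGISTRY[name](weight) for name, weight in config.items()]


def total_score(features: List[FeatureFunction], source: List[str], target: List[str]) -> float:
    """Weighted sum of all registered feature values."""
    return sum(ff.weight * ff.evaluate(source, target) for ff in features)


if __name__ == "__main__":
    features = load_features({"WordCountPenalty": 0.5, "LengthRatio": 1.0})
    print(total_score(features, "das Haus".split(), "the house".split()))
```

With this design the decoder core never changes when a feature is added; it only iterates over whatever the registry and config provide.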
MT Applications
• Computer-Aided Translation
– integration with front-ends
– better use of user-feedback information
MT Applications
• Speech-to-speech
– ambiguous input: lattices and confusion networks
– translate prosody: factored word representation
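Moses's factored input attaches factors to each token with a `|` separator (for example `house|NN`). A prosody factor is not something the slide shows concretely, so the sketch below assumes a hypothetical third factor holding an accent label (`acc` or `-`) and shows how such a representation could be parsed and projected onto a single factor.

```python
"""Sketch: parse factored tokens 'surface|POS|prosody' (the prosody factor is hypothetical)."""
from typing import List, NamedTuple


class Token(NamedTuple):
    surface: str
    pos: str
    prosody: str  # hypothetical label: "acc" = pitch accent, "-" = none


def parse_factored(line: str) -> List[Token]:
    """Split on whitespace, then on '|'; every token must carry all three factors."""
    tokens: List[Token] = []
    for item in line.split():
        factors = item.split("|")
        if len(factors) != len(Token._fields):
            raise ValueError(f"expected {len(Token._fields)} factors in {item!r}")
        tokens.append(Token(*factors))
    return tokens


def project(tokens: List[Token], factor: str) -> str:
    """Return one factor as a plain sentence, e.g. the surface words or the accent tier."""
    return " ".join(getattr(t, factor) for t in tokens)


if __name__ == "__main__":
    toks = parse_factored("I|PRP|- did|VBD|acc n't|RB|- say|VB|- that|DT|acc")
    print(project(toks, "surface"))
    print(project(toks, "prosody"))
```

Keeping prosody as just another factor means it can be translated or generated with the same machinery as POS tags or lemmas.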
Incremental Training
• Incremental word alignment
• Dynamic suffix array
• Phrase-table update
• Better integration with the rest of Moses
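A phrase-table update does not have to mean retraining: the standard relative-frequency scores can be kept as counts and updated in place. The sketch below assumes new phrase pairs have already been aligned and extracted (the incremental word alignment step is out of scope) and covers only the two relative-frequency scores, not lexical weights; the class name is hypothetical.

```python
"""Sketch: incrementally updated relative-frequency phrase translation probabilities."""
from collections import Counter
from typing import Iterable, Tuple


class IncrementalPhraseTable:
    def __init__(self) -> None:
        self.pair_counts: Counter = Counter()  # c(s, t)
        self.src_counts: Counter = Counter()   # c(s)
        self.tgt_counts: Counter = Counter()   # c(t)

    def update(self, phrase_pairs: Iterable[Tuple[str, str]]) -> None:
        """Add newly extracted phrase pairs; no retraining from scratch."""
        for s, t in phrase_pairs:
            self.pair_counts[(s, t)] += 1
            self.src_counts[s] += 1
            self.tgt_counts[t] += 1

    def p_t_given_s(self, s: str, t: str) -> float:
        """phi(t|s) = c(s, t) / c(s)."""
        return self.pair_counts[(s, t)] / self.src_counts[s] if self.src_counts[s] else 0.0

    def p_s_given_t(self, s: str, t: str) -> float:
        """phi(s|t) = c(s, t) / c(t)."""
        return self.pair_counts[(s, t)] / self.tgt_counts[t] if self.tgt_counts[t] else 0.0


if __name__ == "__main__":
    pt = IncrementalPhraseTable()
    pt.update([("Haus", "house"), ("Haus", "house"), ("Haus", "home")])
    print(pt.p_t_given_s("Haus", "house"))  # 2/3 before the update
    pt.update([("Haus", "home")])           # e.g. a post-edited sentence arrives
    print(pt.p_t_given_s("Haus", "house"))  # 2/4 after it
```

Because probabilities are derived from counts at lookup time, each update costs only a few counter increments.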
Smaller files
• Smaller binary
– phrase-tables
– language models
• Mobile devices
• Fits into memory → faster decoding!
• Efficient data structures
– suffix arrays
– compressed file formats
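Suffix arrays let a system find every occurrence of a source phrase in the training corpus on demand, instead of storing a precomputed phrase table. Below is a minimal static sketch over a token list: the naive construction is for illustration only, function names are our own, and a dynamic suffix array that supports insertions (as listed under incremental training) is not shown.

```python
"""Sketch: a static suffix array over a tokenized corpus, with phrase lookup."""
from typing import List


def build_suffix_array(tokens: List[str]) -> List[int]:
    """Start positions of all suffixes, sorted lexicographically.

    Naive O(n^2 log n) construction for illustration; real systems use faster algorithms.
    """
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def find_phrase(tokens: List[str], sa: List[int], phrase: List[str]) -> List[int]:
    """Return corpus positions where `phrase` starts, via two binary searches."""
    m = len(phrase)

    def prefix(k: int) -> List[str]:
        return tokens[sa[k]:sa[k] + m]  # first m tokens of the k-th smallest suffix

    lo, hi = 0, len(sa)
    while lo < hi:  # lower bound: first suffix whose prefix is >= phrase
        mid = (lo + hi) // 2
        if prefix(mid) < phrase:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # upper bound: first suffix whose prefix is > phrase
        mid = (lo + hi) // 2
        if prefix(mid) <= phrase:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


if __name__ == "__main__":
    corpus = "the house is small and the house is old".split()
    sa = build_suffix_array(corpus)
    print(find_phrase(corpus, sa, ["the", "house"]))  # [0, 5]
```

The index is a single integer array the size of the corpus, which is one reason suffix arrays help models fit into memory.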
Better Translations
• Consistently beat phrase-based models for every language pair

| Language pair | Phrase-based | Hierarchical |
|---|---|---|
| en-es | 24.81 | 24.20 |
| es-en | 23.01 | 22.37 |
| en-cs | 11.04 | 10.93 |
| cs-en | 15.72 | 15.68 |
| en-de | 11.87 | 11.62 |
| de-en | 15.75 | 15.53 |
| en-fr | 22.84 | 22.28 |
| fr-en | 25.08 | 24.37 |
| zh-en | 27.46 | 23.91 |
| ar-en | 47.90 | 46.56 |
The End