MMT – Modern, Next Generation Machine Translation, Achim Ruopp (TAUS)
![Page 2: MMT – Modern, Next Generation Machine Translation, Achim Ruopp (TAUS)](https://reader031.fdocuments.in/reader031/viewer/2022022202/587bab151a28ab81758b6d87/html5/thumbnails/2.jpg)
MMT Project
Horizon 2020 Innovation Action
3M € funding
3 years: 2015-2017
Goal:
deliver a large-scale commercial online machine
translation service based on a new open-source distributed
architecture.
This project has received funding from the European Union's Horizon 2020
research and innovation programme under grant agreement No 645487.
![Page 3: MMT – Modern, Next Generation Machine Translation, Achim Ruopp (TAUS)](https://reader031.fdocuments.in/reader031/viewer/2022022202/587bab151a28ab81758b6d87/html5/thumbnails/3.jpg)
MMT Team
Business Research
Special thanks to Marcello Federico (FBK) and Ulrich Germann
(University of Edinburgh) for many of the slides!
![Page 4: MMT – Modern, Next Generation Machine Translation, Achim Ruopp (TAUS)](https://reader031.fdocuments.in/reader031/viewer/2022022202/587bab151a28ab81758b6d87/html5/thumbnails/4.jpg)
Setting up MT for CAT today
1. Select TMs
2. Collect extra data
3. Train and evaluate engine
4. Doesn’t work? Go back to 2.
5. Analyse/process input documents
6. Apply MT on fake TM
7. Import TMs in CAT tool
8. Start translating
9. Adapt engine to new data - go back to 3.
10. New project? Go back to 1.
![Page 5: MMT – Modern, Next Generation Machine Translation, Achim Ruopp (TAUS)](https://reader031.fdocuments.in/reader031/viewer/2022022202/587bab151a28ab81758b6d87/html5/thumbnails/5.jpg)
The MMT way
1. Drag & drop your private TMs
2. Connect your CAT with a key
3. Start translating!
![Page 6: MMT – Modern, Next Generation Machine Translation, Achim Ruopp (TAUS)](https://reader031.fdocuments.in/reader031/viewer/2022022202/587bab151a28ab81758b6d87/html5/thumbnails/6.jpg)
Modern MT in a nutshell
Zero training time
Manages context
Learns from users
Scales with data and users
![Page 7: MMT – Modern, Next Generation Machine Translation, Achim Ruopp (TAUS)](https://reader031.fdocuments.in/reader031/viewer/2022022202/587bab151a28ab81758b6d87/html5/thumbnails/7.jpg)
Prototype (April 2016) - Fast training
![Page 8: MMT – Modern, Next Generation Machine Translation, Achim Ruopp (TAUS)](https://reader031.fdocuments.in/reader031/viewer/2022022202/587bab151a28ab81758b6d87/html5/thumbnails/8.jpg)
Context aware translation
SENTENCE: party
CONTEXT: We are going out.
TRANSLATION: fête
CONTEXT: We approved the law
TRANSLATION: parti
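The idea on this slide can be illustrated with a toy sketch. It is not the MMT implementation: the two domains, their sample texts, the tiny lexicon and the cosine-similarity scoring are all assumptions made up for illustration. It only shows how a context sentence could select between two translations of the same word.

```python
import math
import re
from collections import Counter

# Hypothetical toy data: two "domains", each with a sample text and a
# preferred French translation for the ambiguous English word "party".
DOMAINS = {
    "social": {
        "text": "we are going out tonight to dance and celebrate with friends",
        "lexicon": {"party": "fête"},
    },
    "politics": {
        "text": "we approved the law after the parliament vote on the bill",
        "lexicon": {"party": "parti"},
    },
}


def bag_of_words(text):
    """Lower-case the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def translate(word, context):
    """Pick the translation from the domain most similar to the context."""
    ctx = bag_of_words(context)
    scores = {
        name: cosine(ctx, bag_of_words(domain["text"]))
        for name, domain in DOMAINS.items()
    }
    best = max(scores, key=scores.get)
    return DOMAINS[best]["lexicon"][word], scores


if __name__ == "__main__":
    for context in ("We are going out.", "We approved the law"):
        translation, scores = translate("party", context)
        print(context, "->", translation, scores)
```

A real system would score the context against large training corpora rather than one sentence per domain; the sketch only makes the selection step concrete.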
![Page 9: MMT – Modern, Next Generation Machine Translation, Achim Ruopp (TAUS)](https://reader031.fdocuments.in/reader031/viewer/2022022202/587bab151a28ab81758b6d87/html5/thumbnails/9.jpg)
Prototype (March 2016)
![Page 10: MMT – Modern, Next Generation Machine Translation, Achim Ruopp (TAUS)](https://reader031.fdocuments.in/reader031/viewer/2022022202/587bab151a28ab81758b6d87/html5/thumbnails/10.jpg)
MS Translator Hub vs Modern MT
![Page 11: MMT – Modern, Next Generation Machine Translation, Achim Ruopp (TAUS)](https://reader031.fdocuments.in/reader031/viewer/2022022202/587bab151a28ab81758b6d87/html5/thumbnails/11.jpg)
MMT vs. Moses core language processing
● More supported languages
● Faster processing
● Simpler to use
● Tags and XML management
● Localisation of expressions
![Page 12: MMT – Modern, Next Generation Machine Translation, Achim Ruopp (TAUS)](https://reader031.fdocuments.in/reader031/viewer/2022022202/587bab151a28ab81758b6d87/html5/thumbnails/12.jpg)
REST API
GET /translate?q=party&context=We+approved+the+law
{
  "translation": "parti",
  "context": [
    { "id": "europarl", "score": 0.10343984 }, …
  ]
}
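A minimal client-side sketch of the request shown on this slide, using only the Python standard library. The base URL is a hypothetical placeholder, and the response shape (top-level "translation" and "context" keys) is assumed from the slide fragment rather than taken from API documentation. The sketch builds the query string and parses a sample body without making a network call.

```python
import json
from urllib.parse import urlencode

# Hypothetical base URL; the slide does not say where the service is hosted.
BASE_URL = "http://localhost:8045"

# Sample body modelled on the slide; the real response may contain more fields.
SAMPLE_BODY = (
    '{"translation": "parti", '
    '"context": [{"id": "europarl", "score": 0.10343984}]}'
)


def build_translate_url(text, context, base_url=BASE_URL):
    """Build the GET URL with the 'q' and 'context' parameters from the slide."""
    query = urlencode({"q": text, "context": context})
    return f"{base_url}/translate?{query}"


def parse_response(body):
    """Return the translation and the context entries sorted by score, highest first."""
    data = json.loads(body)
    domains = sorted(
        data.get("context", []), key=lambda d: d["score"], reverse=True
    )
    return data["translation"], domains


if __name__ == "__main__":
    print(build_translate_url("party", "We approved the law"))
    translation, domains = parse_response(SAMPLE_BODY)
    print(translation, [(d["id"], d["score"]) for d in domains])
```

To call a running server, the URL returned by `build_translate_url` could be passed to `urllib.request.urlopen` and the decoded body handed to `parse_response`.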
![Page 13: MMT – Modern, Next Generation Machine Translation, Achim Ruopp (TAUS)](https://reader031.fdocuments.in/reader031/viewer/2022022202/587bab151a28ab81758b6d87/html5/thumbnails/13.jpg)
MMT Architecture
![Page 14: MMT – Modern, Next Generation Machine Translation, Achim Ruopp (TAUS)](https://reader031.fdocuments.in/reader031/viewer/2022022202/587bab151a28ab81758b6d87/html5/thumbnails/14.jpg)
MMT Data Pooling
Partners’ repositories:
MyMemory (Translated)
Data Cloud (TAUS)
Volume pooled for the English-Italian prototypes:
ca. 785M words & 423M segments in total
![Page 15: MMT – Modern, Next Generation Machine Translation, Achim Ruopp (TAUS)](https://reader031.fdocuments.in/reader031/viewer/2022022202/587bab151a28ab81758b6d87/html5/thumbnails/15.jpg)
MMT Data Collection from CommonCrawl
commoncrawl.org – US-based non-profit
“CommonCrawl is a 501(c)(3) non-profit organization
dedicated to providing a copy of the internet to internet
researchers, companies and individuals at no cost for the
purpose of research and analysis.”
On average 1.5 billion unique URLs per crawl
vs. an estimated 50 billion pages in Google index and 20
billion pages in Microsoft Bing index
What can be considered the “surface web” vs. the “deep
web”?
Two questions
1. What language are these pages in?
2. Which pages are translations of each other?
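One common heuristic for the second question is URL matching: pages whose URLs differ only in a language marker are treated as candidate translations. The sketch below is an assumption for illustration, not a description of the MMT data collection pipeline. The regular expression, the function name and the example URLs are hypothetical, and real candidates would still need content-based verification.

```python
import re
from collections import defaultdict

# Matches a language code used as a whole path segment, e.g. /en/ or /it/.
LANG_SEGMENT = re.compile(r"/(en|it)(?=/|$)")


def pair_candidates(urls):
    """Group URLs that are identical apart from an en/it path segment."""
    groups = defaultdict(dict)
    for url in urls:
        match = LANG_SEGMENT.search(url)
        if not match:
            continue  # no language marker, nothing to pair on
        # Replace the language segment with a wildcard to form the group key.
        key = url[: match.start()] + "/*" + url[match.end():]
        groups[key][match.group(1)] = url
    return sorted(
        (g["en"], g["it"]) for g in groups.values() if "en" in g and "it" in g
    )


if __name__ == "__main__":
    crawl = [
        "http://example.com/en/about",
        "http://example.com/it/about",
        "http://example.com/en/news",
        "http://example.org/it/contact",
    ]
    for en_url, it_url in pair_candidates(crawl):
        print(en_url, "<->", it_url)
```

The first question (language identification) is usually answered from page content rather than from the URL, so the language code in the path serves only as a pairing hint here.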
![Page 16: MMT – Modern, Next Generation Machine Translation, Achim Ruopp (TAUS)](https://reader031.fdocuments.in/reader031/viewer/2022022202/587bab151a28ab81758b6d87/html5/thumbnails/16.jpg)
Monolingual Data Including English
![Page 17: MMT – Modern, Next Generation Machine Translation, Achim Ruopp (TAUS)](https://reader031.fdocuments.in/reader031/viewer/2022022202/587bab151a28ab81758b6d87/html5/thumbnails/17.jpg)
Monolingual Data Excluding English
![Page 18: MMT – Modern, Next Generation Machine Translation, Achim Ruopp (TAUS)](https://reader031.fdocuments.in/reader031/viewer/2022022202/587bab151a28ab81758b6d87/html5/thumbnails/18.jpg)
Parallel Data Projections from en→it
![Page 19: MMT – Modern, Next Generation Machine Translation, Achim Ruopp (TAUS)](https://reader031.fdocuments.in/reader031/viewer/2022022202/587bab151a28ab81758b6d87/html5/thumbnails/19.jpg)
MMT is Open Source
LGPL/Apache licences
new core technology
github.com/ModernMT/MMT
soon: github.com/ModernMT/DataCollection
email me if you are interested
![Page 20: MMT – Modern, Next Generation Machine Translation, Achim Ruopp (TAUS)](https://reader031.fdocuments.in/reader031/viewer/2022022202/587bab151a28ab81758b6d87/html5/thumbnails/20.jpg)
Roadmap
2015 Q1: development started
2016 Q2: first alpha release. 10 langs, fast training, context aware, distributed
2016 Q4: first beta release. 45 langs, incremental learning
2017 Q4: final release, enterprise ready
![Page 21: MMT – Modern, Next Generation Machine Translation, Achim Ruopp (TAUS)](https://reader031.fdocuments.in/reader031/viewer/2022022202/587bab151a28ab81758b6d87/html5/thumbnails/21.jpg)
This slide may not be used or copied without permission from TAUS
THANK YOU!