performance demanded in high end markets
performance demanded in low end markets
sustaining technology
disruptive technology
New uses for Machine Translation
Multilingual customer support
Social Media monitoring
Applications enabled by Big Data
Internet of Everything /Internet of Things
Speech-to-Speech translation
Questions
What is your experience with MT?
1. Quality Metrics
2. Cost reduction
3. Impact on Delivery Times
4. Feedback on quality
5. Your Feelings
Google/Bing Translator vs. tauyou
Advantages
Big(ger) data
State-of-the-art technology
Learning curve
Disadvantages
Black-box
Confidentiality
Control
Costs of Machine Translation
Internal development – people and time
Free tools – Google + Bing
DIY solutions
Traditional pricing model
tauyou managed solution
Revenue from Machine Translation
Translation as a Service
Private Machine Translation Portal
MT of internal communication (flat rate)
….
and many others!
Questions
1. Where do you provide value now?
2. Where do you think the value will be?
3. How important is confidentiality?
4. Do you care about control?
5. How much could you invest in MT?
(time, people, money)
6. When will your solution be available?
Some Languages Sorted
From EN into
1) FR, ES, PT, IT
2) DE, NL, HE, DA, NO, SV
3) ZH, JA, RU
4) KO, AR, TR, HI
On Domain Quality
Who is willing to pay?
Where does your revenue come from?
What are your key skills?
What domains achieve good quality?
… Quality Order of your domains ...
Questions
1. What is your main motivation?
2. Can you try more than 1 domain?
3. Can you train at least 2 language pairs?
4. Can you pilot several MT vendors?
5. What are your expectations?
Data acquisition
OPUS corpora
http://opus.lingfil.uu.se/
WMT workshops
e.g. http://www.statmt.org/wmt16/
Multilingual websites
TAUS
Corpora building
Related vs. unrelated materials
Percentage of out-of-domain
Does mono-lingual data help?
Corpora extension with linguistic processing
Ad-hoc corpus for file translation
The more, the better?
Data cleaning
Clean translation memories
Length, punctuation, terminology, …
Inconsistencies, repetitions, ...
Segment splitting
Optimize weight of most frequent n-grams
Validate their translations
Add out-of-domain data (optimization)
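
The cleaning checks listed above (length, punctuation, repetitions) can be sketched as a simple filter over translation memory segment pairs. This is a minimal illustration, not a description of any vendor's pipeline: the function name `clean_tm` and the thresholds `max_ratio` and `max_len` are assumptions chosen for the example, and real values depend on the language pair.

```python
def clean_tm(pairs, max_ratio=3.0, max_len=100):
    """Return the (source, target) pairs that pass basic sanity checks.

    The thresholds are illustrative assumptions, not recommended values.
    """
    seen = set()
    kept = []
    for src, tgt in pairs:
        src, tgt = src.strip(), tgt.strip()
        if not src or not tgt:
            continue  # one side is empty
        n_src, n_tgt = len(src.split()), len(tgt.split())
        if n_src > max_len or n_tgt > max_len:
            continue  # overly long segment
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue  # suspicious length ratio, likely misaligned
        if (src[-1] in ".?!") != (tgt[-1] in ".?!"):
            continue  # final punctuation does not match
        if (src, tgt) in seen:
            continue  # exact repetition
        seen.add((src, tgt))
        kept.append((src, tgt))
    return kept
```

The punctuation check assumes both languages use the same sentence-final marks, which does not hold for every pair (for example ZH or JA).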
Training strategies
One single system with all TMs
+ glossaries
+ linguistic processing input/output
+ forbidden words lists
Layered approach
Generic → domain → subdomain → client
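
One way a forbidden words list can be applied is as a check on MT output, flagging segments that contain a banned term. The sketch below assumes whole-word, case-insensitive matching; the function name `find_forbidden` and the sample terms are hypothetical, and how such lists are actually fed into training is not specified here.

```python
import re

def find_forbidden(text, forbidden):
    """Return the forbidden terms that occur in text as whole words."""
    hits = []
    for term in forbidden:
        # re.escape keeps terms with special characters literal
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            hits.append(term)
    return hits
```

Flagged segments could then be routed to a reviewer or corrected by a post-processing rule.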
Model optimization
Filter the translation tables
Remove the garbage + tune weights
Optimize language models
Adapt them to the translation purpose
Tune parameters correctly
Tune set, test set, optimization parameters
Improve tokenization, recasing, ...
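
Filtering a translation table can be sketched as pruning low-probability entries and keeping only the best few candidates per source phrase. The example assumes a Moses-style text format with fields separated by `|||` and treats the third score as the direct translation probability; both points are assumptions to verify against the toolkit in use, and `min_prob` and `top_n` are illustrative values.

```python
from collections import defaultdict

def filter_phrase_table(lines, min_prob=0.01, top_n=2, prob_index=2):
    """Keep the top_n entries per source phrase with probability >= min_prob.

    prob_index selects which score to use; index 2 is an assumption
    about the score order in the table.
    """
    by_source = defaultdict(list)
    for line in lines:
        fields = [f.strip() for f in line.split("|||")]
        if len(fields) < 3:
            continue  # malformed line
        scores = fields[2].split()
        try:
            prob = float(scores[prob_index])
        except (IndexError, ValueError):
            continue  # missing or non-numeric score
        if prob >= min_prob:
            by_source[fields[0]].append((prob, line))
    kept = []
    for entries in by_source.values():
        best = sorted(entries, key=lambda item: item[0], reverse=True)[:top_n]
        kept.extend(line for _, line in best)
    return kept
```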
Workflow integration
Use MT as a secondary TM
Bilingual pre-translated translation files
CAT tool integration
Differentiated workflow
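
Using MT as a secondary TM can be sketched as follows: take the best translation memory match when it is close enough, and fall back to MT otherwise. The function `pretranslate`, the `mt_translate` callback and the 0.75 threshold are hypothetical, and `difflib.SequenceMatcher` is only a stand-in for the fuzzy-match score a CAT tool would compute.

```python
from difflib import SequenceMatcher

def pretranslate(source, tm, mt_translate, threshold=0.75):
    """Return (translation, origin), where origin is "TM" or "MT".

    tm maps source segments to target segments; mt_translate is any
    callable that translates a string (an assumption of this sketch).
    """
    best_score, best_target = 0.0, None
    for tm_source, tm_target in tm.items():
        score = SequenceMatcher(None, source, tm_source).ratio()
        if score > best_score:
            best_score, best_target = score, tm_target
    if best_target is not None and best_score >= threshold:
        return best_target, "TM"
    return mt_translate(source), "MT"
```

Keeping the origin label alongside each segment is what allows a differentiated workflow, for example a different review step or rate for MT segments.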
Continuous improvement
Qualitative
Use updated TMs in new trainings
Immediate (incremental) retraining
Rule-based automatic post-editing
Selective pre- and/or post-processing
Source content optimization
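
Rule-based automatic post-editing can be sketched as an ordered list of regular-expression substitutions applied to the MT output. The three rules below are illustrative assumptions for an English target; they are language-specific (French, for instance, puts a space before some punctuation marks) and would normally be derived from recurring post-editor corrections.

```python
import re

# (pattern, replacement) pairs, applied in order; all are example rules
RULES = [
    (re.compile(r"\s+([,.;:!?])"), r"\1"),                # no space before punctuation
    (re.compile(r" {2,}"), " "),                          # collapse repeated spaces
    (re.compile(r"\b(\w+) \1\b", re.IGNORECASE), r"\1"),  # remove a doubled word
]

def post_edit(text, rules=RULES):
    """Apply each substitution rule in turn and trim the result."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text.strip()
```

The doubled-word rule will also change legitimate repetitions such as "that that", so rules like this are best enabled selectively.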
Linguistic processing notes
In the source and/or target language
Grammar checking
Entity detection
Proper nouns, alphanumeric words, ...
Compound word splitting
Sentence reordering
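
Detection of alphanumeric words can be sketched as a pre- and post-processing pair: mask such tokens with placeholders before translation and restore them afterwards. The placeholder format `__ENTn__` and the function names are hypothetical, and the sketch assumes the MT engine passes placeholders through unchanged.

```python
import re

# A word containing at least one digit and at least one ASCII letter
ALNUM = re.compile(r"\b(?=\w*\d)(?=\w*[A-Za-z])\w+\b")

def protect(text):
    """Replace alphanumeric words with numbered placeholders."""
    entities = []

    def mask(match):
        entities.append(match.group(0))
        return f"__ENT{len(entities) - 1}__"

    return ALNUM.sub(mask, text), entities

def restore(text, entities):
    """Put the original tokens back in place of the placeholders."""
    for i, entity in enumerate(entities):
        text = text.replace(f"__ENT{i}__", entity)
    return text
```

Hyphenated codes are not covered by this pattern and would need an extended rule.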
The Post-editor profile
Do the skills needed differ from those for translation?
Post-editing guidelines
Full vs. light post-editing
http://www.slideshare.net/TAUS/taus-mt-postediting-guidelines
Compensation
Quality Metrics
SMT metrics: BLEU, NIST
Feedback from translators
Translation time vs. Post-editing time
Word Error Rate (WER) or Edit Distance
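
Word Error Rate can be computed as the word-level edit distance between a reference and the MT output, divided by the reference length. In a post-editing setting the post-edited segment can serve as the reference, which is an assumption of this sketch; whitespace tokenization is also a simplification.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def wer(reference, hypothesis):
    """Word Error Rate: word edits divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)
```

For example, `wer("open the file now", "open a file")` counts one substitution and one deletion against four reference words.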
Cost reduction