Optimized Big Data Approach to Machine Translation @ TAUS
Diego Bartolome, @diegobartolome
The volume of available multilingual data has exploded. One option is to build machine translation systems from previously translated segments, using as much data as possible. This works in many cases, but it is well known in the market that the cleaner the data, the better the results in terms of productivity, cost, and even quality. tauyou has taken an additional step by optimizing the translation engines on a per-document basis, which has proven to provide a significant quality increase in the machine translation output. This approach, combined with joint source content optimization and summarization, leads to significant savings in multilingual communications.
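The description does not say how the per-document optimization is carried out. As a rough illustration only, one common way to picture it is data selection: rank the available translation memory segments by their similarity to the document to be translated and keep the closest ones for tuning. The sketch below assumes a bag-of-words cosine similarity and a toy translation memory; the function names, the `top_k` parameter, and the sample sentences are all hypothetical and are not taken from tauyou's system.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep runs of ASCII letters (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_segments(document, translation_memory, top_k=2):
    """Return up to top_k (source, target) pairs whose source side is closest to the document."""
    doc_vec = Counter(tokenize(document))
    scored = [
        (cosine(doc_vec, Counter(tokenize(src))), src, tgt)
        for src, tgt in translation_memory
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    # Segments with no word overlap at all are dropped.
    return [(src, tgt) for score, src, tgt in scored[:top_k] if score > 0]


# Hypothetical toy translation memory (English -> Spanish), for illustration only.
tm = [
    ("The engine must be restarted after the update.",
     "El motor debe reiniciarse tras la actualización."),
    ("Please sign the contract before Friday.",
     "Por favor, firme el contrato antes del viernes."),
    ("Restart the engine and check the oil level.",
     "Reinicie el motor y compruebe el nivel de aceite."),
]
document = "Check the oil level, then restart the engine."
selected = select_segments(document, tm)
for src, tgt in selected:
    print(src, "->", tgt)
```

With this toy data the two engine-related segments are kept and the contract sentence is left out. A real system would need a stronger similarity measure (stop-word handling, weighting, or language-model scores), since frequent words such as "the" dominate this simple score.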
Transcript of Optimized Big Data Approach to Machine Translation @ TAUS
Data for Machine Translation
Some techniques might work...
Baseline engines
In-domain + out-of-domain balance
Domain-specific engines
But ...
Big Data Approach
Unilingual Texts
Small Data
Glossaries/Dictionaries
Translation Memories
...
External Data
Average Improvement
+21%
Sample Results
Important topics
Supervised Data Classification
Data Clustering
Parameter optimization
Key Performance Indicators (KPIs)
Predictive MT quality estimation
Measure + Measure + Measure
Not everything that can be counted counts, and not everything that counts can be counted.
William Bruce Cameron