Optimized Big Data Approach to Machine Translation @ TAUS

Diego Bartolome @diegobartolome

description

The volume of available multilingual data has exploded. One option is to build machine translation systems from previously translated segments, using as much data as possible. This works in many cases, but it is well known in the market that the cleaner the data, the better the results in terms of productivity, cost, and even quality. tauyou has taken an additional step by optimizing the translation engines on a per-document basis, which has proven to provide a significant quality increase in the machine translation output. Combined with joint source content optimization and summarization, this approach leads to significant savings in multilingual communications.
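
The description does not say how the per-document optimization is done. One plausible ingredient is selecting, for each incoming document, the previously translated segments that look most like it, and adapting the engine on those. The sketch below illustrates only that selection step, using term-frequency cosine similarity. The tokenizer, the function names, the `top_k` parameter, and the sample translation memory are all assumptions made for illustration, not tauyou's method.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-word characters (a deliberately crude tokenizer)."""
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def select_segments(document, memory, top_k=2):
    """Return up to top_k (source, target) pairs whose source side is most similar to the document."""
    doc_tf = Counter(tokenize(document))
    scored = [(cosine(doc_tf, Counter(tokenize(src))), src, tgt) for src, tgt in memory]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(src, tgt) for score, src, tgt in scored[:top_k] if score > 0]


# Hypothetical translation memory (English -> Spanish), for illustration only.
memory = [
    ("Press the power button to start the printer.",
     "Pulse el botón de encendido para iniciar la impresora."),
    ("The contract is valid for two years.",
     "El contrato es válido durante dos años."),
    ("Replace the printer cartridge when the light blinks.",
     "Sustituya el cartucho de la impresora cuando la luz parpadee."),
]
document = "How to replace the cartridge and restart the printer."
selected = select_segments(document, memory)
for src, tgt in selected:
    print(src, "->", tgt)
```

With this toy memory, the two printer-related segments rank above the contract segment. A production system would likely use a stronger similarity measure and far more data; the point here is only that the training or tuning set changes with each document.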

Transcript of Optimized Big Data Approach to Machine Translation @ TAUS

Page 1: Optimized Big Data Approach to Machine Translation @ TAUS

Optimized Big Data Approach to Machine Translation

Diego Bartolome @diegobartolome

Page 3: Optimized Big Data Approach to Machine Translation @ TAUS

Data for Machine Translation

Page 4: Optimized Big Data Approach to Machine Translation @ TAUS

Some techniques might work...

Baseline engines

In-domain + out-of-domain balance

Domain-specific engines
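
The slide names the in-domain and out-of-domain balance without giving a recipe. As a minimal sketch of one common way to control that balance, the code below samples a training set with a chosen in-domain share, oversampling the (usually smaller) in-domain corpus. The function name, the `in_domain_share` and `size` parameters, and the toy corpora are assumptions for illustration only.

```python
import random


def mix_corpora(in_domain, out_of_domain, in_domain_share=0.5, size=1000, seed=0):
    """Build a training sample in which in_domain_share of the segments are in-domain.

    Sampling is done with replacement, so a small in-domain corpus is oversampled.
    """
    rng = random.Random(seed)
    n_in = round(size * in_domain_share)
    sample = rng.choices(in_domain, k=n_in) + rng.choices(out_of_domain, k=size - n_in)
    rng.shuffle(sample)
    return sample


# Toy corpora: 10 in-domain segments versus 1000 out-of-domain segments.
in_domain = [("in", i) for i in range(10)]
out_of_domain = [("out", i) for i in range(1000)]
mixed = mix_corpora(in_domain, out_of_domain, in_domain_share=0.3, size=200)
print(sum(1 for label, _ in mixed if label == "in"), "in-domain segments out of", len(mixed))
```

The right share is an empirical question for each engine and domain, which fits the "might work" framing of the slide.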

Page 5: Optimized Big Data Approach to Machine Translation @ TAUS

But ...

Baseline engines

In-domain + out-of-domain balance

Domain-specific engines

Page 7: Optimized Big Data Approach to Machine Translation @ TAUS

Big Data Approach

Unilingual Texts

Small Data

Glossaries/Dictionaries

Translation Memories

...

External Data

Page 8: Optimized Big Data Approach to Machine Translation @ TAUS

Average Improvement

+21%

Page 9: Optimized Big Data Approach to Machine Translation @ TAUS

Sample Results

Page 10: Optimized Big Data Approach to Machine Translation @ TAUS

Important topics

Supervised Data Classification

Data Clustering

Parameter optimization

Key Performance Indicators (KPIs)

Predictive MT quality estimation

Measure + Measure + Measure
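
The slide does not say which KPIs are tracked. As one hedged example of the "measure" advice, the sketch below computes a word-level edit rate between raw MT output and its post-edited version: the number of word insertions, deletions, and substitutions divided by the length of the post-edited text. It is a simplification in the spirit of TER (no shift operations), and the function names are assumptions for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between two token lists (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, token_a in enumerate(a, start=1):
        current = [i]
        for j, token_b in enumerate(b, start=1):
            cost = 0 if token_a == token_b else 1
            current.append(min(previous[j] + 1,          # delete token_a
                               current[j - 1] + 1,       # insert token_b
                               previous[j - 1] + cost))  # match or substitute
        previous = current
    return previous[-1]


def post_edit_rate(mt_output, post_edited):
    """Word-level edits needed to turn the MT output into the post-edited text, per post-edited word."""
    mt_tokens, pe_tokens = mt_output.split(), post_edited.split()
    if not pe_tokens:
        return 0.0 if not mt_tokens else 1.0
    return edit_distance(mt_tokens, pe_tokens) / len(pe_tokens)


def average_rate(pairs):
    """Mean post-edit rate over (mt_output, post_edited) pairs."""
    return sum(post_edit_rate(mt, pe) for mt, pe in pairs) / len(pairs)


# Hypothetical segment pairs, for illustration only.
pairs = [
    ("press the button", "press the button"),
    ("the printer is start", "the printer starts"),
]
print(round(average_rate(pairs), 3))
```

Tracked per document or per engine over time, a rate like this gives a simple number to compare before and after an engine change; lower means less post-editing effort.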

Page 12: Optimized Big Data Approach to Machine Translation @ TAUS

Not everything that can be counted counts, and not everything that counts can be counted.

William Bruce Cameron