TAUS MT Showcase, Beyond Data, John Tinsley, Iconic Translation Machines

Wednesday, 4 June

Beyond Data: Delivering Machine Translation with Subject Matter Expertise

John Tinsley, Iconic Translation Machines

TAUS Machine Translation Showcase 2014, Dublin (Ireland)

The research within the project MosesCore leading to these results has received funding from the European Union 7th Framework Programme, grant agreement no 288487


Beyond Data: Delivering Machine Translation with Subject Matter Expertise

John Tinsley, Director / Co-Founder

TAUS MT Showcase. 4th June 2014, Dublin

We provide Machine Translation solutions with Subject Matter Expertise

We do this using Linguistic Engineering

An “ensemble” MT architecture

The world’s first and only patent-specific MT system that’s ready to go

What is Linguistic Engineering?

[Diagram labels: Input, Pre-processing, Data Engineering, Training Data, Post-processing, Output]

Patents: an MT nightmare

L is an organic group selected from -CH2-(OCH2CH2)n-, -CO-NR'-, with R'=H or C1-C4 alkyl group; n=0-8; Y=F, CF3 …

maximum stress of 1.2 to 3.5 N/mm<2> and a maximum elongation of 700 to 1,300% at 0[deg.] C.

Long Sentences

Technical constructions

Largest single document: 249,322 words

Longest Sentence: 1,417 words


Data Engineering + Linguistic Engineering: an “ensemble” architecture

[Diagram labels: Input, Training Data, Output. Engines shown: Moses (three instances), RBMT. Components:]

•  Chinese pre-ordering rules
•  Statistical post-editing
•  Spanish med-device entity recognizer
•  Multi-output combination
•  Korean pharma tokenizer
•  Patent input classifier
•  Client TM/terminology (optional)
•  Japanese script normalisation
•  German compounding rules
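The ensemble idea on this slide (pre-processing rules feeding several engines, whose outputs are combined and then post-edited) can be illustrated with a minimal Python sketch. This is not Iconic's implementation: every function and class name below is hypothetical, the "engines" are toy string transformations standing in for Moses or RBMT systems, and the combination rule (keep the longest candidate) is only a placeholder for a real multi-output combination method.

```python
from typing import Callable, List

Step = Callable[[str], str]  # any text-to-text transformation


def normalise_whitespace(text: str) -> str:
    """Hypothetical pre-processing rule (stand-in for e.g. script normalisation)."""
    return " ".join(text.split())


def toy_engine_upper(text: str) -> str:
    """Hypothetical stand-in for one MT engine (e.g. a Moses instance)."""
    return text.upper()


def toy_engine_title(text: str) -> str:
    """Hypothetical stand-in for a second engine (e.g. an RBMT system)."""
    return text.title()


def pick_longest(candidates: List[str]) -> str:
    """Toy multi-output combination: keep the longest candidate.

    On ties, max() returns the first candidate in the list.
    """
    return max(candidates, key=len)


def add_full_stop(text: str) -> str:
    """Hypothetical post-processing rule (stand-in for statistical post-editing)."""
    return text if text.endswith(".") else text + "."


class EnsemblePipeline:
    """Chains pre-processing, several engines, combination and post-processing."""

    def __init__(self, pre: List[Step], engines: List[Step],
                 combine: Callable[[List[str]], str], post: List[Step]) -> None:
        self.pre = pre
        self.engines = engines
        self.combine = combine
        self.post = post

    def translate(self, text: str) -> str:
        # 1. Apply each pre-processing rule in order.
        for step in self.pre:
            text = step(text)
        # 2. Send the same pre-processed input to every engine.
        candidates = [engine(text) for engine in self.engines]
        # 3. Combine the candidate outputs into a single output.
        output = self.combine(candidates)
        # 4. Apply each post-processing rule in order.
        for step in self.post:
            output = step(output)
        return output


pipeline = EnsemblePipeline(
    pre=[normalise_whitespace],
    engines=[toy_engine_upper, toy_engine_title],
    combine=pick_longest,
    post=[add_full_stop],
)
print(pipeline.translate("  a   patent  claim "))
```

Keeping each rule as a plain function makes it easy to swap in language- or domain-specific steps (a tokenizer for one language pair, an entity recognizer for another) without touching the pipeline itself, which is one plausible reading of the diagram.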

If you don’t understand it, you can’t translate it

MT with Subject Matter Expertise

“Allopurinol-induced serious cutaneous adverse reactions (SCAR), including Steven Johnson’s syndrome (SJS) and toxic epidermal necrolysis (TEN), are associated with a genetic marker, the HLA-B*5801 allele.”

“IPTranslator is perfect for someone who needs to search [patents] across multiple languages and is useful in the case of both patentability and infringement searches.”

– Aalt van de Kuilen, Global Head of Patent Information, Abbott

Machine Translation for Patents

What is the value for users?

Specialist solutions deliver more useable outcomes for the user

Post-editing = Increased productivity

For information purposes = Extract more meaning

Multilingual search = Retrieve more relevant results

De-risking the machine translation proposition

What is the value for users?

Typical Prerequisites: Data + Time + €€€ = ???

Our Prerequisites: No data needed + Systems are ready to go + No upfront cost = Evaluate immediately

Customisation. Refinement.

» Incorporation of user feedback
» Incremental training with post-edits
» Tuning for specific input types

Iconic in practice

client case study

Business Need

Machine Translation technology for the legal industry

Iconic had a domain-specific MT solution for that industry

Process (1)

Translation samples required for initial evaluation

Delivered immediately and initial results were positive

Process (2)

Integrate Iconic with GlobalSight for productivity pilot

“The complexities and unforeseen but inevitable surprises of MT integration in large scale production processes were handled both competently and efficiently.”

Performance

>20% productivity increase for translator post-editing Iconic output

“Iconic delivered measurable productivity gains from the outset”
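The slide does not say how the productivity increase was measured. One common way to express such a figure is the relative change in translator throughput (words per hour) between translating from scratch and post-editing MT output. The sketch below assumes that definition; the function name and the sample figures are made up for illustration and are not taken from the case study.

```python
def productivity_gain(baseline_wph: float, postedit_wph: float) -> float:
    """Relative throughput change when post-editing instead of translating from scratch.

    Both arguments are words per hour; the result is a fraction (0.2 means 20%).
    """
    if baseline_wph <= 0:
        raise ValueError("baseline_wph must be positive")
    return (postedit_wph - baseline_wph) / baseline_wph


# Hypothetical figures, not from the case study.
gain = productivity_gain(baseline_wph=500.0, postedit_wph=610.0)
print(f"Productivity gain: {gain:.0%}")
```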

Looking forward

•  Ongoing improvement through feedback from translators
•  Ongoing improvement through the incorporation of post-edits
•  More than 4 million words translated to date for Asian languages
•  Periodic roll-out of new languages over time

Iconic in practice

Need: short-term solution to provide on-demand translation through a web search interface

Process: integrate directly through Iconic API and evaluate quality and throughput concurrently

Outcomes: in 5 months of production for English-Portuguese alone, we processed:

•  15,526 translation requests
•  14,606,374 words
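To put the outcome figures in perspective, the averages they imply can be worked out directly. The short calculation below uses only the three numbers on the slide (requests, words, months); the variable names are illustrative, and the monthly figures assume volume was spread evenly over the five months, which the slide does not state.

```python
# Figures from the slide: English-Portuguese, 5 months of production.
requests = 15_526
words = 14_606_374
months = 5

# Derived averages (assumes an even spread across the period).
avg_words_per_request = words / requests
words_per_month = words / months
requests_per_month = requests / months

print(f"Average words per request: {avg_words_per_request:,.1f}")
print(f"Average words per month:   {words_per_month:,.1f}")
print(f"Average requests per month: {requests_per_month:,.1f}")
```

This works out to roughly 941 words per request and about 2.9 million words per month, which suggests the requests were mostly full documents rather than short snippets.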

Take home messages…

All content is not created equal

We cannot afford to be dogmatic when it comes to MT

Domain specific MT is about more than just data

Know your subject matter!

Thank You! @IconicTrans