Mirai Translate - TAUS Tokyo 2015
Transcript of Mirai Translate - TAUS Tokyo 2015
© 2015 Mirai Translate, Inc. All rights reserved.
Mirai Translate, Inc.
“Impossible only means that you have still screwed up the solution.”
-Mick Etoh
Number of Inbound Visitors in 2014
13,413,567
JPY 2,030,500,000,000 (EUR 15,522,200,000)
Translation Total Addressable Market (2014)
USD 2.1B
MT market USD 10M
Unforeseen Challenges Ahead
Figure: translation solutions plotted by Translation Speed (1/cost) against Quality.
Labels: LSP Solutions, IP, Publication, Reports, CAT, Speech Translator, Google Translate, Web, Crowd Sourcing Solutions, SOHO, MT + Post-Editing Solutions, MT Real-Time Solutions.
Unforeseen New Market Frontier: a new domain opened up by improved performance.
72% of Japanese don’t speak English.
Vision: To realize a society in which everyone can interact freely across language barriers through machine translation technology, and thereby contribute to the invigoration of, and innovation in, business.
Mirai Translate, Inc.
Mirai Translate as Joint Venture
• Mobile Platform Leader
• ASR & MT Solution Provider
• Multilingual Enterprise MT Developer
• NLP and MT Technology Leader
• Multilingual SMT Technology Leader
Technology Transfer
Our Competence
• Multiple Translation Engines from Systran and NICT
• MT Training Tools from Systran
• NLP Tools: Named Entity Extraction, Pre-Ordering,…
• NL Data Assets: Corpus from Systran and NTT DOCOMO + JPN Ontology Dictionary
• Strong Technical Team: Experience in AWS, Data Mining, and MT, working toward our own original MT systems.
Siri
Big-Data, Big-Server, and Fat-Pipe Solution
“Shabette-Concier” Voice agent service
• Launched Mar. 1, 2012
• Over 40 services integrated
• Including chat
• 10 million users
Shabette = Voice
Concier = Concierge
“How may I help you?”
Touch the Concier.
“Tell me how to make a pizza.”
View a list of pizza recipes.
You can check a detailed pizza recipe.
“Tell me Italian restaurants nearby.”
View a list of Italian restaurants.
You can check detailed information about each restaurant.
Touch the Concier.
Q: “What is the height of Mt. Fuji?”
A: “3,776 m!”
Q: “When will the Tokyo Olympic Games be held?”
A: “They will be held in 2020.”
Basic Architecture 2010
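Figure: voice input goes to Fuetrek Voice Recognition, which produces text. DOCOMO Task Recognition takes that text and retrieves contents from the Service Providers’ DB. The resulting text goes to Text to Speech. Both recognition stages are logged.
Annotations: Fat-Pipe, Big-Servers.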
Mirai Architecture 2015
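Figure: voice input goes to Fuetrek Voice Recognition, which produces text. The Mirai MT Engines take that text and draw contents from the Client Dictionary and Corpus DB. The resulting text goes to Text to Speech. Both the recognition and MT stages are logged.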
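The flow on this slide can be sketched as a chain of three stages with logging between them. This is only an illustrative sketch: the class name, the stand-in stage functions, and the one-entry client dictionary are all hypothetical and do not represent the Fuetrek or Mirai Translate APIs.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SpeechTranslationPipeline:
    """Chains voice recognition, MT, and text to speech, logging each hand-off."""
    recognize: Callable[[bytes], str]    # voice -> source text
    translate: Callable[[str], str]      # source text -> target text
    synthesize: Callable[[str], bytes]   # target text -> voice
    log: List[str] = field(default_factory=list)

    def run(self, audio: bytes) -> bytes:
        source_text = self.recognize(audio)
        self.log.append(f"asr: {source_text}")
        target_text = self.translate(source_text)
        self.log.append(f"mt: {target_text}")
        return self.synthesize(target_text)


# Hypothetical stand-ins so the sketch runs without any real engine.
CLIENT_DICTIONARY = {"hello": "こんにちは"}


def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_mt(text: str) -> str:
    # Look the segment up in the client dictionary; pass it through otherwise.
    return CLIENT_DICTIONARY.get(text.lower(), text)


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = SpeechTranslationPipeline(fake_asr, fake_mt, fake_tts)
print(pipeline.run(b"Hello").decode("utf-8"))
print(pipeline.log)
```

Passing the stages in as callables keeps each engine replaceable, which matches the slide's change from task recognition in 2010 to MT engines in 2015.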
We are Cloud Natives
who believe our cloud solution is scalable and safer!
Figure: system components.
SYSTRAN Hybrid Architecture
SYSTRAN 7 Hybrid Engine, a Commercial SMT Engine
Figure: source text flows to translation through the stages and resources below.
• Processing stages: Pre-Filter (Formatting, Normalization, Segmentation); Entity Recognition; Translation Memory and User Dictionary Match; Rules-Based MT; Statistical Post-Edition; Statistical MT; Post-Processing (Formatting, Normalization); Post-Filter
• Linguistic resources: Main Dictionaries, Linguistic Rules, Bilingual User Dictionaries, User Entities, Source Normalization Dictionaries, Target Normalization Dictionaries, Spell Check, Homographs, Translation Memories
• Statistical models: Bilingual Translation Models, Source Language Models, Target Language Models
• Training inputs: Monolingual Source Corpus, Target Monolingual Corpus, Bilingual Corpus or Translation Memories
• Training processes: Source Adaptation, Self-Training, Bilingual Terminology Extraction
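One stage named in this architecture, the translation memory match that runs before any MT engine, can be illustrated with a fuzzy lookup. This is a sketch under stated assumptions, not SYSTRAN's implementation: the function names, the 0.85 threshold, and the use of Python's `difflib.SequenceMatcher` as the similarity measure are all choices made here for illustration.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, Optional, Tuple


def tm_lookup(segment: str, memory: Dict[str, str],
              threshold: float = 0.85) -> Optional[Tuple[str, float]]:
    """Return (stored translation, similarity) for the closest stored source
    segment, or None when nothing reaches the threshold."""
    query = segment.strip().lower()
    best_score, best_target = 0.0, None
    for source, target in memory.items():
        score = SequenceMatcher(None, query, source.strip().lower()).ratio()
        if score > best_score:
            best_score, best_target = score, target
    if best_target is None or best_score < threshold:
        return None
    return best_target, best_score


def translate_segment(segment: str, memory: Dict[str, str],
                      mt: Callable[[str], str]) -> str:
    """Prefer a translation memory hit; fall back to the MT engine otherwise."""
    hit = tm_lookup(segment, memory)
    return hit[0] if hit is not None else mt(segment)


memory = {"Good morning.": "おはようございます。"}
print(translate_segment("good morning.", memory, mt=lambda s: f"<MT:{s}>"))
print(translate_segment("xyz", memory, mt=lambda s: f"<MT:{s}>"))
```

A production system would index the memory rather than scan it linearly; the scan keeps the sketch short.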
NTT Technology for JPN <-> EN
Figure: “He saw a cat with a long tail” reordered into Japanese word order, with the post-positional particles が and を inserted.
Source English → pre-ordered English → Japanese
this is Keiko Tanaka . → this _va0 Keiko Tanaka is . → 田中 恵子 と 申し ます
i used to jog every morning . → i _va0 every morning jog to used . → 毎朝 ジョギング し た もの です 。
she was wearing a sweater and high heels . → she _va0 sweater and high heels _va2 wearing was . → セーター を 着 て 、 ハイヒール を はい て い まし た 。
Post-Positional Particles
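The pre-ordering step shown in these examples can be illustrated with a toy function that moves the verb chain to the end of the clause, drops articles, and inserts particle placeholders. This is not NTT's method: a real system would work from a syntactic parse, whereas here the clause is split by hand. The `Clause` type and the reading of `_va0` and `_va2` as subject and object particle slots are assumptions.

```python
from dataclasses import dataclass, field
from typing import List

ARTICLES = {"a", "an", "the"}
SUBJECT_MARK = "_va0"  # assumed: slot for a subject/topic particle
OBJECT_MARK = "_va2"   # assumed: slot for an object particle


@dataclass
class Clause:
    """A hand-segmented English clause (hypothetical structure)."""
    subject: List[str]
    verbs: List[str]                                     # verb chain, English order
    obj: List[str] = field(default_factory=list)         # direct object
    complement: List[str] = field(default_factory=list)  # copula complement
    adverbial: List[str] = field(default_factory=list)


def drop_articles(tokens: List[str]) -> List[str]:
    return [t for t in tokens if t.lower() not in ARTICLES]


def preorder(clause: Clause) -> List[str]:
    """Emit tokens in head-final order: subject, adverbial, object, verbs reversed."""
    out = drop_articles(clause.subject) + [SUBJECT_MARK]
    out += drop_articles(clause.adverbial)
    if clause.obj:
        out += drop_articles(clause.obj) + [OBJECT_MARK]
    out += drop_articles(clause.complement)
    out += list(reversed(clause.verbs))
    return out + ["."]


examples = [
    Clause(["this"], ["is"], complement=["Keiko", "Tanaka"]),
    Clause(["i"], ["used", "to", "jog"], adverbial=["every", "morning"]),
    Clause(["she"], ["was", "wearing"],
           obj=["a", "sweater", "and", "high", "heels"]),
]
for clause in examples:
    print(" ".join(preorder(clause)))
```

The point of pre-ordering is that the reordered English is close to Japanese word order, so a statistical engine has far less long-distance reordering to learn.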
Corpus is the king, but it must be decent and well-structured.
Not only Size (Coverage) but also Fitness.
Figure: Ideal Corpus Data. Labels: Written Language Corpus Variation, Spoken Language Corpus Variation, Generic Corpus, Commerce, Patent Application, Finance, Travel, Public Patents.
SYSTRAN Training Server ‒ Main components
• Corpus Manager
  • Mono/bilingual corpus
  • Txt, html, doc, docx, rtf, xlsx, pptx, pdf, tmx
  • Virtual file management (aggregation, split)
  • Content Management Database (TU: Translation Units)
• Training Manager
  • Baseline Evaluation (Quality metrics: GTM, BLEU, TER)
  • Hybrid Model Training (SPE: Statistical Post-Edition)
  • Statistical Model Training (SMT: Statistical Machine Translation)
• Dictionary creation (UD) with bilingual terminology extraction
• Dictionary validation (UD) against a bilingual corpus (TMX)
• Translation Memory creation (TM) with document aligner
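BLEU, one of the baseline quality metrics listed above, can be sketched in a few lines. This is a minimal sentence-level version with a single reference and no smoothing, written for illustration. It is not the SYSTRAN Training Server's implementation, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(hypothesis: List[str], reference: List[str], max_n: int = 4) -> float:
    """Geometric mean of clipped n-gram precisions times a brevity penalty."""
    if not hypothesis:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hypothesis, n)
        ref_counts = ngrams(reference, n)
        total = sum(hyp_counts.values())
        # Clip each hypothesis n-gram count by its count in the reference.
        matched = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        if total == 0 or matched == 0:
            return 0.0  # without smoothing, any empty precision zeroes the score
        log_precision_sum += math.log(matched / total)
    # Penalize hypotheses shorter than the reference.
    if len(hypothesis) >= len(reference):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(reference) / len(hypothesis))
    return brevity_penalty * math.exp(log_precision_sum / max_n)


reference = "the cat sat on the mat".split()
print(bleu(reference, reference))
print(bleu("the cat sat on mat".split(), reference, max_n=2))
```

Corpus-level BLEU, as normally reported, pools n-gram counts over all segments before taking the precisions, which avoids the zero scores that short single sentences produce here.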
Training Methodology
Collect Data → Run Training → Evaluate → Publish to Pilot/Production
• Collect training data
  • Define the domain
  • Collect bilingual corpus (translation memories, documents and translations)
  • Collect monolingual corpus (text, content relevant to the domain)
  • Collect terminology if any (bilingual dictionaries, glossaries)
• Run initial training
• Evaluate
• Perform incremental cycles
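The incremental cycle above can be sketched as a loop that adds one batch of data per cycle, retrains, evaluates, and keeps a new model only when its score improves. The function name, the `train` and `evaluate` callables, and the `min_gain` parameter are hypothetical placeholders, not part of any SYSTRAN tool.

```python
from typing import Callable, List, Sequence, Tuple


def run_training_cycles(
    batches: Sequence[List[str]],
    train: Callable[[List[str]], object],
    evaluate: Callable[[object], float],
    min_gain: float = 0.0,
) -> Tuple[object, float, int]:
    """Return (best model, its score, cycle in which it was produced)."""
    corpus: List[str] = []
    best_model, best_score, best_cycle = None, float("-inf"), 0
    for cycle, batch in enumerate(batches, start=1):
        corpus = corpus + batch          # collect data
        candidate = train(corpus)        # run training
        score = evaluate(candidate)      # evaluate
        if score > best_score + min_gain:
            # Only an improved model becomes the publishing candidate.
            best_model, best_score, best_cycle = candidate, score, cycle
    return best_model, best_score, best_cycle


# Toy stand-ins: the "model" is the corpus size, and quality saturates at 3 segments.
model, score, cycle = run_training_cycles(
    batches=[["a", "b"], ["c"], ["d"]],
    train=lambda corpus: len(corpus),
    evaluate=lambda m: min(m, 3) / 3,
)
print(model, score, cycle)
```

In the toy run the third batch brings no gain, so the model from the second cycle remains the one to publish.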
V.S.
Our Business Targets
• 3 main markets

Multilingual Communication
• Business case: Help and secure information communication
• Usages & Applications: Collaboration Tools, Intranet Translation Portal, Web & Mobile Apps, Customer Service Portal
• Customers: Corporations & Public Organizations

Big Data by HPC
• Business case: Detect critical information within large scale foreign data
• Usages & Applications: Market Intelligence, Cyber-security, Forensic & eDiscovery Apps, Text Mining & Analytics
• Customers: Defense & Securities & Legal Organizations

Localization
• Business case: Reduce costs and timelines for translation projects
• Usages & Applications: Multilingual Web Site, Technical Translation Project, Translation Workflow Integration
• Customers: Translation Agencies & Corporations
Multilingual MT: JP, EN, CN, KR + ASEAN
Enterprise Solutions
Consumer Services
We are an engineering company…
MT APIs TMS
“It always seems impossible until it’s done.” - Nelson Mandela
As part of the Tomorrow television series produced by CBS for MIT's Centennial in 1961
Their dreams are coming true.
Mirai Translate, Inc.
@mickbean