Speech-to-Speech Infrastructure Based on UIMA
© 2003 IBM Corporation | SpeechTek 2007 | August 20, 2007
Jan Kleindienst, Ph.D. (on behalf of TC_STAR partners)
Manager, Conversational Interactions and Architectures, IBM Prague
Jan Kleindienst Speech-to-Speech Infrastructure Based on UIMA
Speech Tek 2007 August 20, 2007
Overview
Challenges
Approach
The Resulting Infrastructure
Use Cases
Conclusion
What is a speech-to-speech system?
A speech-to-speech (S2S) system translates spoken input in a source language into speech in a target language
Speech-to-speech systems typically consist of three main processing blocks:
– Transcription
– Translation
– Synthesis
Processing chain: ASR (transcription) -> MT (translation) -> TTS (synthesis)
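The three blocks can be pictured as functions applied in sequence to one shared record, each reading what the previous block produced. The following Python sketch is illustrative only. `asr`, `mt`, `tts`, and `run_chain` are hypothetical names, and the stubs return canned output instead of calling real engines.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Stage = Callable[[Record], Record]

def asr(rec: Record) -> Record:
    # Hypothetical stub: a real ASR engine would decode rec["pcm"] into source-language text.
    rec["source_text"] = "good morning"
    return rec

def mt(rec: Record) -> Record:
    # Hypothetical stub: a toy word-by-word lexicon stands in for a real MT engine.
    lexicon = {"good": "buenos", "morning": "días"}
    rec["target_text"] = " ".join(lexicon.get(w, w) for w in rec["source_text"].split())
    return rec

def tts(rec: Record) -> Record:
    # Hypothetical stub: a real TTS engine would synthesize target-language audio here.
    rec["target_audio"] = rec["target_text"].encode("utf-8")
    return rec

def run_chain(stages: List[Stage], rec: Record) -> Record:
    # Each stage reads what earlier stages wrote and adds its own output.
    for stage in stages:
        rec = stage(rec)
    return rec

result = run_chain([asr, mt, tts], {"pcm": b"\x00\x01"})
print(result["target_text"])  # buenos días
```

Because every stage has the same signature, any stage can be swapped for another engine without touching the rest of the chain.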
Challenges
TC_STAR project, 2004-2007, www.tc-star.org
Create an open technological infrastructure to support effective delivery of scientific results from the speech-to-speech research community
Online distributed speech-to-speech infrastructure for automatic performance evaluation of end-to-end systems as well as individual components
Open technological framework based on the open-source Unstructured Information Management Architecture (UIMA)
Key Challenge: Support Online System Combinations and Automatic Evaluations
Partners to connect: UPC, LIMSI, ELDA, ITC-Irst, UKA, RWTH, IBM
Approach: pick an infrastructure that…
…specifies a common data format understood by all speech-to-speech components
…has well-defined APIs that let the engines pass data in and read it out
…transparently takes care of network and local connectivity options
…requires only minimal coding to plug proprietary engines into the infrastructure
UIMA Component Model:
– Common MUMA Type System (common data format)
– initialize(), process(), destroy(), … (well-defined APIs)
– Java/C++/… local calls or SOAP and Vinci (network and local connectivity)
– Concept of UIMA Annotators (minimal coding)
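The component model rests on two ideas: a fixed lifecycle (initialize, process, destroy) and a caller that does not care whether the component runs locally or remotely. The real UIMA SDK defines its own Java and C++ interfaces. The Python sketch below only illustrates the two ideas. `Annotator`, `UpperCaser`, and `SerializingProxy` are hypothetical names, and the JSON round trip is an assumed stand-in for a SOAP or Vinci transport.

```python
import json
from abc import ABC, abstractmethod
from typing import Any, Dict

Cas = Dict[str, Any]

class Annotator(ABC):
    """Hypothetical lifecycle mirroring the slide's initialize()/process()/destroy()."""

    def initialize(self, config: Dict[str, Any]) -> None:
        self.config = dict(config)

    @abstractmethod
    def process(self, cas: Cas) -> None:
        """Read inputs from the shared data structure and write outputs back into it."""

    def destroy(self) -> None:
        self.config = {}

class UpperCaser(Annotator):
    # Toy annotator standing in for a real engine.
    def process(self, cas: Cas) -> None:
        cas["cased_text"] = cas["source_text"].upper()

class SerializingProxy(Annotator):
    """Hypothetical stand-in for a remote binding: same interface for the caller, but the
    data crosses a JSON serialization boundary in each direction, as it would on the wire."""

    def __init__(self, target: Annotator) -> None:
        self.target = target

    def initialize(self, config: Dict[str, Any]) -> None:
        self.target.initialize(config)

    def process(self, cas: Cas) -> None:
        remote_cas = json.loads(json.dumps(cas))        # request: marshal and unmarshal
        self.target.process(remote_cas)
        cas.update(json.loads(json.dumps(remote_cas)))  # reply: marshal and unmarshal

    def destroy(self) -> None:
        self.target.destroy()

outputs = []
for annotator in (UpperCaser(), SerializingProxy(UpperCaser())):
    annotator.initialize({"lang": "en"})
    cas: Cas = {"source_text": "hello uima"}
    annotator.process(cas)
    annotator.destroy()
    outputs.append(cas["cased_text"])
print(outputs)  # ['HELLO UIMA', 'HELLO UIMA']
```

The calling loop is identical for the local annotator and the proxied one. This is the transparency the requirement "takes care of network and local connectivity" asks for.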
Unstructured Information Management Architecture (UIMA)
What is UIMA?
– In business terms: the analysis bridge between unstructured and structured information
– In technical terms: an infrastructure for integrating and running all kinds of data-driven engines and managing their data, including support for monitoring
Key features
– UIMA is an emerging standard for text and media processing
– The UIMA SDK is open source under the Apache license
– The UIMA infrastructure supports interoperability between platforms, with component interfacing via Java, C++, Python, Perl, and remote/networked services
– Simple XML-based integration with the UIMA APIs
– Distributed data exchange that supports complex data structures
[Figure: Unstructured information (inefficient search) -> Analysis bridge -> Structured information (efficient search)]
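In UIMA, the "analysis bridge" is realized by keeping the original artifact (the Sofa, or subject of analysis) intact and attaching typed annotations that point into it by begin/end offsets. This is what lets complex structures be layered on top of raw text. The toy Python sketch below imitates that stand-off model. It is not the UIMA CAS API. `ToyCas`, `Annotation`, `tokenize`, and the type names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass(frozen=True)
class Annotation:
    type_name: str  # hypothetical type names such as "Token" or "Sentence"
    begin: int      # start offset into the subject of analysis (inclusive)
    end: int        # end offset (exclusive)

@dataclass
class ToyCas:
    sofa: str  # the unstructured text being analyzed
    annotations: List[Annotation] = field(default_factory=list)

    def add(self, type_name: str, begin: int, end: int) -> None:
        self.annotations.append(Annotation(type_name, begin, end))

    def covered_text(self, a: Annotation) -> str:
        return self.sofa[a.begin:a.end]

    def select(self, type_name: str) -> List[Annotation]:
        # Return annotations of one type in document order.
        return sorted((a for a in self.annotations if a.type_name == type_name),
                      key=lambda a: (a.begin, a.end))

def tokenize(cas: ToyCas) -> None:
    # Toy whitespace tokenizer: records offsets rather than copying substrings.
    pos = 0
    for word in cas.sofa.split():
        start = cas.sofa.index(word, pos)
        cas.add("Token", start, start + len(word))
        pos = start + len(word)

cas = ToyCas("the session is open")
tokenize(cas)
cas.add("Sentence", 0, len(cas.sofa))
print([cas.covered_text(t) for t in cas.select("Token")])  # ['the', 'session', 'is', 'open']
```

Because annotations only reference offsets, later components can add overlapping structures (sentences, punctuation, casing) without rewriting the underlying text.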
How to make components UIMA-pluggable?
Step 1: Implement the required Annotator interface: the initialize() and process() methods
Step 2: Write a component descriptor XML file for configuration and lifecycle
Step 3: Define the input and output data structures in the type system
[Figure: a proprietary engine wrapped by wrapper code as a UIMA Annotator; CAS data and metadata flow in and out; configuration comes from the component descriptor]
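The three steps can be sketched together. Wrapper code exposes a process() call (step 1), reads its configuration from a descriptor (step 2), and checks the declared input and output types (step 3) before and after calling an unchanged proprietary engine. The descriptor format here is a hypothetical simplification. The real UIMA descriptor schema is different and richer. `WrappedAnnotator`, `legacy_punctuate`, and the type names are also assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

# Hypothetical, simplified descriptor. The real UIMA descriptor schema is different and richer.
DESCRIPTOR = """
<component name="ToyPunctuator">
  <parameter name="mark" value="."/>
  <input type="source_text"/>
  <output type="punctuated_text"/>
</component>
"""

def legacy_punctuate(text: str, mark: str) -> str:
    # Stand-in for a proprietary engine with its own calling convention.
    return text.rstrip() + mark

class WrappedAnnotator:
    """Wrapper code: adapts an engine to a process(cas) call using descriptor metadata."""

    def __init__(self, descriptor_xml: str, engine: Callable[[str, str], str]) -> None:
        root = ET.fromstring(descriptor_xml.strip())
        self.name = root.get("name")
        self.params: Dict[str, str] = {p.get("name"): p.get("value") for p in root.iter("parameter")}
        self.inputs: List[str] = [e.get("type") for e in root.iter("input")]
        self.outputs: List[str] = [e.get("type") for e in root.iter("output")]
        self.engine = engine

    def process(self, cas: Dict[str, str]) -> None:
        # Check the declared inputs before calling the engine.
        missing = [t for t in self.inputs if t not in cas]
        if missing:
            raise KeyError(f"{self.name}: missing inputs {missing}")
        cas[self.outputs[0]] = self.engine(cas[self.inputs[0]], self.params["mark"])

annotator = WrappedAnnotator(DESCRIPTOR, legacy_punctuate)
cas = {"source_text": "the session is open "}
annotator.process(cas)
print(cas["punctuated_text"])  # the session is open.
```

The engine itself is untouched. Only the small wrapper knows about the shared data structure, which keeps the integration cost per partner low.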
TC_STAR Speech-to-Speech Evaluation Infrastructure
[Figure: a Collection Processing Engine downloads evaluation input data over HTTP and passes a CAS through the ASR, SLT, TTS, and Evaluation annotators (each an engine behind wrapper code and the Annotator API, located via the Vinci Name Service). The CAS accumulates content at each stage: pcm -> pcm + source text -> pcm + source text + target text -> + target audio -> + evaluation. Evaluation results and reports are uploaded.]
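The evaluation flow can be sketched as a loop that creates a fresh CAS per input document, runs the annotator chain over it so the CAS accumulates pcm, source text, target text, target audio, and an evaluation score, and then collects one report per document. In this Python sketch the stubs and `run_cpe` are hypothetical. The evaluation stage computes word error rate on the ASR output only as an example metric, because the slide does not say which metrics each evaluation annotator uses.

```python
from typing import Any, Callable, Dict, List

Cas = Dict[str, Any]

def word_error_rate(reference: str, hypothesis: str) -> float:
    # Word-level edit distance (substitutions + deletions + insertions) / reference length.
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,               # deletion
                         cur[j - 1] + 1,            # insertion
                         prev[j - 1] + (r != h))    # substitution (or match at cost 0)
        prev = cur
    return prev[-1] / max(len(ref), 1)

# Hypothetical stubs standing in for the remote ASR, SLT, and TTS annotators.
FAKE_TRANSCRIPTS = {"d1": "the session was open", "d2": "thank you"}

def asr(cas: Cas) -> None:
    cas["source_text"] = FAKE_TRANSCRIPTS[cas["doc_id"]]

def slt(cas: Cas) -> None:
    cas["target_text"] = "[es] " + cas["source_text"]

def tts(cas: Cas) -> None:
    cas["target_audio"] = cas["target_text"].encode("utf-8")

def evaluate(cas: Cas) -> None:
    cas["wer"] = word_error_rate(cas["reference"], cas["source_text"])

def run_cpe(collection: List[Cas], annotators: List[Callable[[Cas], None]]) -> List[Cas]:
    reports = []
    for item in collection:
        cas = dict(item)  # a fresh CAS per document
        for annotate in annotators:
            annotate(cas)
        reports.append({"doc_id": cas["doc_id"], "wer": cas["wer"], "fields": sorted(cas)})
    return reports

collection = [
    {"doc_id": "d1", "pcm": b"\x00", "reference": "the session is open"},
    {"doc_id": "d2", "pcm": b"\x00", "reference": "thank you"},
]
reports = run_cpe(collection, [asr, slt, tts, evaluate])
print([(r["doc_id"], r["wer"]) for r in reports])  # [('d1', 0.25), ('d2', 0.0)]
```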
TC_STAR Speech-to-Speech Pan-European Deployment
[Figure: UIMA and other annotators (ASR, ASR ROVER, Punctuator, SLT, TTS, EvalSLT) hosted at the partner sites UPC, LIMSI, ELDA, ITC-Irst, UKA, RWTH, and IBM, registered with a Vinci name server and driven by a CPE, with a Control Web Server and a Data Web Server for download and upload]
Profile 1: ASR->SLT->TTS->EVAL (with ASR ROVER)
Profile 2: ASR->SLT->TTS->EVAL in a different setup
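Profile 1 combines several partners' ASR outputs with ROVER (Recognizer Output Voting Error Reduction). Full ROVER first aligns the hypotheses into a word transition network and then votes in each slot, optionally weighting votes by confidence scores. The Python sketch below covers only the voting step. It assumes the hypotheses are already aligned to equal length, with a placeholder for "no word". `rover_vote` and `NULL` are hypothetical names, and the tie-breaking rule is an assumption.

```python
from collections import Counter
from typing import List

NULL = "@"  # placeholder meaning "no word in this slot" (a deletion)

def rover_vote(aligned: List[List[str]]) -> List[str]:
    """Simplified ROVER voting: frequency-only majority vote per aligned slot.
    Ties are broken in favour of the system listed first."""
    if len({len(h) for h in aligned}) != 1:
        raise ValueError("hypotheses must be aligned to the same length")
    output = []
    for slot in zip(*aligned):
        counts = Counter(slot)
        best = max(counts.values())
        winner = next(w for w in slot if counts[w] == best)  # first system wins ties
        if winner != NULL:
            output.append(winner)
    return output

systems = [
    ["the", "session", "is", "open", NULL],
    ["the", "session", "was", "open", "now"],
    ["a", "session", "is", "open", NULL],
]
print(" ".join(rover_vote(systems)))  # the session is open
```

Each system makes a different error, but the vote recovers the correct word in every slot. This is the motivation for combining recognizers hosted at different sites.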
UIMA Web Control Console
Distributed Logging and Monitoring
AJAX infrastructure
Current user and status
Annotator combination in use for the experiment
Experiment ID and the set of input data
Links to graphical speech-to-speech evaluation results
UIMA Web Control Console
Processing engine
Path of completed processing
Engine where the data are currently processed
Indication of the active engine
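A view showing the path of completed processing and the currently active engine can be derived from structured events that annotators log as they start and finish. The sketch below uses Python's standard `logging` module, where a custom `Handler` receives each record and the `extra` argument attaches fields to it. `ConsoleStatusHandler`, the "started"/"finished" messages, and the `experiment_id`/`engine` fields are hypothetical. In a distributed deployment the events would arrive over the network (for example via the standard library's `SocketHandler` or `HTTPHandler`), which this sketch does not show.

```python
import logging
from typing import Dict, List, Optional

class ConsoleStatusHandler(logging.Handler):
    """Hypothetical aggregator: turns 'started'/'finished' events logged by annotators
    into a per-experiment status (completed path and active engine)."""

    def __init__(self) -> None:
        super().__init__()
        self.completed: Dict[str, List[str]] = {}
        self.active: Dict[str, Optional[str]] = {}

    def emit(self, record: logging.LogRecord) -> None:
        exp = getattr(record, "experiment_id", None)
        engine = getattr(record, "engine", None)
        if exp is None or engine is None:
            return  # not a status event; ignore it
        event = record.getMessage()
        if event == "started":
            self.active[exp] = engine
        elif event == "finished":
            self.completed.setdefault(exp, []).append(engine)
            if self.active.get(exp) == engine:
                self.active[exp] = None

log = logging.getLogger("s2s.console.demo")
log.setLevel(logging.INFO)
status = ConsoleStatusHandler()
log.addHandler(status)

for engine in ("ASR", "SLT"):
    log.info("started", extra={"experiment_id": "exp-1", "engine": engine})
    log.info("finished", extra={"experiment_id": "exp-1", "engine": engine})
log.info("started", extra={"experiment_id": "exp-1", "engine": "TTS"})

print(status.completed["exp-1"], status.active["exp-1"])  # ['ASR', 'SLT'] TTS
```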
Lessons learned
Pain in placing machines on public IPs
– Firewall configuration for all participating machines, local IT people ;-)
Need to support a variety of Linux distributions to host UIMA …
– Partially eliminated by the UIMA school development warm-up
Variety of programming languages for writing Annotators
– Java, C++, Perl, Python, …
Broad requirements on the common type system
– Punctuation, casing, lattices
Support for individual, secure download/upload of data from the data server
– Authentication, HTTPS, firewall rules
Web console for controlling the evaluation lifecycle
– Concept of profiles, experiment IDs, monitoring
Remote logging and debugging
– Distributed logging capabilities, logging to the Web console
Reliability of components and networks
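The slide does not say how TC_STAR handled unreliable components. One common mitigation for transient network failures between remote annotators is to retry with exponential backoff. The Python sketch below is that generic technique, not the project's method. `call_with_retry` and `FlakyService` are hypothetical, and the default attempt count and delays are arbitrary assumptions.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")

def call_with_retry(call: Callable[[], T],
                    attempts: int = 4,
                    base_delay: float = 0.5,
                    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry a flaky remote call, waiting base_delay, 2*base_delay, 4*base_delay, ... between tries."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")

class FlakyService:
    """Hypothetical remote annotator that drops its first two requests."""

    def __init__(self) -> None:
        self.calls = 0

    def translate(self, text: str) -> str:
        self.calls += 1
        if self.calls <= 2:
            raise ConnectionError("connection reset")
        return text.upper()

delays = []
svc = FlakyService()
result = call_with_retry(lambda: svc.translate("hola"), sleep=delays.append)
print(result, svc.calls, delays)  # HOLA 3 [0.5, 1.0]
```

Retrying is only safe for idempotent requests. An annotator whose process() call has side effects (for example, uploading results) would need deduplication on the server side as well.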
Speech-to-Speech Showcases
UIMA S2S Evaluation Web Portal: The video demonstrates how S2S portal users (e.g., S2S researchers) set up, test, and evaluate speech-to-speech chains consisting of individual text and media processing components such as ASR, machine translation, and TTS. These components, called Annotators in UIMA jargon, are exported as Web services on the public Internet and glued together by UIMA. More than 15 annotators are currently exported by IBM and by EU institutes and universities.
http://www.tc-star.org/Demo/ibm/web_console_batch.swf
UIMA S2S Translation Video Console: The individual Web service components can be assembled online into remote services that provide direct value to citizens. We show a video console that translates from English to Spanish (EU parliamentary domain). Note that the three Web services involved (ASR, MT, and TTS) are hosted at three different sites hundreds of kilometers apart and glued together by UIMA.
http://www.tc-star.org/Demo/ibm/video_console_near_real_time.swf
Conclusion
First-of-a-kind online multi-partner speech-to-speech system demonstrated on UIMA (Jun 06-May 07)
Remote speech-to-speech components dynamically combined via the UIMA infrastructure to support different combinations, e.g., ROVER
– Annotators hosted on public IPs at the partners' sites
– The framework controlled via the UIMA Web AJAX infrastructure
The open infrastructure is used to automatically set-up and evaluate individual components as well as end-to-end systems
Designed to support various use cases from research experiments to technology showcasing