Speech-to-Speech Infrastructure Based on UIMA

Jan Kleindienst, Ph.D. (on behalf of TC_STAR partners), Manager, Conversational Interactions and Architectures, IBM Prague

SpeechTEK 2007, August 20, 2007

Overview

Challenges

Approach

The Resulting Infrastructure

Use Cases

Conclusion

What is a speech-to-speech system?

A speech-to-speech (S2S) system translates spoken input in a source language into speech in a target language

Speech-to-speech systems typically consist of three main processing blocks:

– Transcription (ASR, automatic speech recognition)

– Translation (MT, machine translation)

– Synthesis (TTS, text-to-speech)
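The three processing blocks can be sketched as a simple function chain. This is only an illustration: the stub functions `transcribe`, `translate`, and `synthesize` and the toy word table are assumptions made for this example, not engines from the project.

```python
# Hypothetical stand-ins for real engines; a real system would call an
# actual recognizer, translator, and synthesizer here.

def transcribe(audio: bytes) -> str:
    """ASR stub: treat the audio bytes as UTF-8 text that was 'spoken'."""
    return audio.decode("utf-8")


def translate(source_text: str) -> str:
    """MT stub: word-by-word lookup in a toy English-to-Spanish table."""
    table = {"good": "buenos", "morning": "días"}
    words = source_text.lower().split()
    return " ".join(table.get(word, word) for word in words)


def synthesize(target_text: str) -> bytes:
    """TTS stub: return the text as bytes in place of a waveform."""
    return target_text.encode("utf-8")


def speech_to_speech(audio: bytes) -> bytes:
    """Chain the three blocks: transcription, translation, synthesis."""
    return synthesize(translate(transcribe(audio)))


result = speech_to_speech(b"Good morning")
print(result.decode("utf-8"))  # buenos días
```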

Challenges

TC_STAR Project, 2004-2007, www.tc-star.org

Create an open technological infrastructure to support effective delivery of scientific results from the speech-to-speech research community

Online distributed speech-to-speech infrastructure for automatic performance evaluation of end-to-end systems as well as individual components

Open technological framework based on the open-source Unstructured Information Management Architecture (UIMA)

Key Challenge: Support Online System Combinations and Automatic Evaluations

[Figure: partner sites ITC-Irst, RWTH, IBM]

Approach: Pick an infrastructure that…

…specifies a common data format understood by all speech-to-speech components

…has well-defined APIs that let the engines pass data in and read it out

…transparently takes care of network and local connectivity options

…requires only minimal coding to plug proprietary engines into the infrastructure

UIMA Component Model:

– Common data format: Common MUMA Type System

– APIs: initialize(), process(), destroy(), …

– Connectivity: Java/C++/… local calls or SOAP and Vinci

– Minimal coding: Concept of UIMA Annotators
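A minimal sketch of the idea behind the lifecycle methods and transparent connectivity, under stated assumptions: the `Annotator` base class, the `RemoteProxy`, and the dictionary used as a CAS are invented for illustration and are not the UIMA API. The "network" is only a JSON round trip standing in for SOAP or Vinci.

```python
import json
from abc import ABC, abstractmethod


class Annotator(ABC):
    """Lifecycle named on the slide: initialize(), process(), destroy()."""

    def initialize(self) -> None:
        pass

    @abstractmethod
    def process(self, cas: dict) -> dict:
        ...

    def destroy(self) -> None:
        pass


class UppercaseAnnotator(Annotator):
    """Toy local engine: adds an upper-cased copy of the text."""

    def process(self, cas: dict) -> dict:
        return {**cas, "upper": cas["text"].upper()}


class RemoteProxy(Annotator):
    """Stands in for a networked annotator (assumption: JSON replaces the wire)."""

    def __init__(self, target: Annotator):
        self._target = target

    def process(self, cas: dict) -> dict:
        request = json.dumps(cas)                  # serialize as if sending
        reply = self._target.process(json.loads(request))
        return json.loads(json.dumps(reply))       # deserialize the response


def run(annotator: Annotator, items: list) -> list:
    """The caller sees the same lifecycle for local and remote annotators."""
    annotator.initialize()
    try:
        return [annotator.process(item) for item in items]
    finally:
        annotator.destroy()


local = run(UppercaseAnnotator(), [{"text": "hello"}])
remote = run(RemoteProxy(UppercaseAnnotator()), [{"text": "hello"}])
print(local == remote)  # True
```

The point of the design is that `run` does not need to know where the engine lives; only the proxy changes.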

Unstructured Information Management Architecture (UIMA)

What is UIMA?

– In business terms => the analysis bridge between unstructured and structured information

– In technical terms => an infrastructure for integrating, processing, and managing all kinds of data-driven engines, including support for monitoring

Key features

– UIMA is an emerging standard for text and media processing

– The UIMA SDK is open source under the Apache license

– The UIMA infrastructure supports interoperability between platforms, with component interfacing via Java, C++, Python, Perl, and remote/networked services

– Offers simple XML-based integration with UIMA APIs

– Distributed data exchange that supports complex data structures

[Figure: Unstructured Information (inefficient search) -> Analysis Bridge -> Structured Information (efficient search)]

How to make components UIMA-pluggable?

Step 1: Implement the required Annotator interface => initialize() & process() methods

Step 2: Specify the Component Descriptor XML file for configuration and lifecycle

Step 3: Define the input and output data structures in the Type System
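The three steps can be sketched in miniature as follows. Everything here is hypothetical: `ProprietaryEngine`, `TranslatorAnnotator`, the type names, and the dictionary descriptor (used in place of the XML file) are assumptions for illustration, not UIMA classes or formats.

```python
# Step 3 (assumed names): the types this component reads and writes.
TYPE_SYSTEM = {"SourceText": str, "TargetText": str}

# Step 2: a descriptor, shown as a dict instead of the XML file.
DESCRIPTOR = {
    "name": "ToyTranslator",
    "inputs": ["SourceText"],
    "outputs": ["TargetText"],
    "parameters": {"glossary": {"hello": "hola", "world": "mundo"}},
}


class ProprietaryEngine:
    """Hypothetical engine with its own, non-UIMA API."""

    def __init__(self, glossary: dict):
        self.glossary = glossary

    def run(self, sentence: str) -> str:
        return " ".join(self.glossary.get(w, w) for w in sentence.split())


class TranslatorAnnotator:
    """Step 1: wrapper code exposing initialize() and process()."""

    def __init__(self, descriptor: dict):
        self.descriptor = descriptor
        self.engine = None

    def initialize(self) -> None:
        # Configuration comes from the descriptor, not from the wrapper.
        glossary = self.descriptor["parameters"]["glossary"]
        self.engine = ProprietaryEngine(glossary)

    def process(self, cas: dict) -> dict:
        # Check declared inputs against the type system before running.
        for name in self.descriptor["inputs"]:
            if not isinstance(cas.get(name), TYPE_SYSTEM[name]):
                raise TypeError(f"CAS is missing a valid {name}")
        cas["TargetText"] = self.engine.run(cas["SourceText"])
        return cas


annotator = TranslatorAnnotator(DESCRIPTOR)
annotator.initialize()
cas = annotator.process({"SourceText": "hello world"})
print(cas["TargetText"])  # hola mundo
```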

[Figure: proprietary engine, wrapper code, UIMA Annotator, CAS data and CAS meta-data, component descriptor]

TC_STAR Speech-to-Speech Evaluation Infrastructure

[Figure: evaluation data input over HTTP, Download, a Collection Processing Engine passing CASes through engines wrapped by wrapper code and the Annotator API, Vinci Name Service, Evaluation, Upload, evaluation data results, evaluation reports]

CAS contents along the chain:

– pcm

– pcm, source text

– pcm, source text, target text

– pcm, source text, target text, target audio

– pcm, source text, target text, target audio, evaluation
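The way each stage adds one field to the CAS (pcm, source text, target text, target audio, evaluation) can be sketched as follows. The stage functions are stubs invented for this example, and a plain dictionary stands in for the CAS.

```python
# Each stub stage reads what earlier stages wrote and adds one new field.

def asr(cas: dict) -> None:
    cas["source text"] = cas["pcm"].decode("utf-8")      # stub recognizer


def slt(cas: dict) -> None:
    cas["target text"] = cas["source text"].upper()      # stub translator


def tts(cas: dict) -> None:
    cas["target audio"] = cas["target text"].encode("utf-8")  # stub synthesizer


def evaluate(cas: dict) -> None:
    # Stub metric: counts target words instead of scoring quality.
    cas["evaluation"] = {"target_words": len(cas["target text"].split())}


CHAIN = [asr, slt, tts, evaluate]


def run_chain(pcm: bytes):
    """Pass one CAS through the chain and record its fields after each stage."""
    cas = {"pcm": pcm}
    history = [list(cas)]
    for stage in CHAIN:
        stage(cas)
        history.append(list(cas))
    return cas, history


cas, history = run_chain(b"hello world")
for fields in history:
    print(", ".join(fields))
```

Because earlier fields are never removed, the evaluation stage can still see the original input alongside every intermediate result.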

TC_STAR Speech-to-Speech pan-European deployment

[Figure: deployment across partner sites UPC, ITC-Irst, RWTH, and IBM. Components shown: Data Web Server, Control Web Server, Vinci name server, Download, Upload, EvalSLT, Punctuator, ASR ROVER. Legend: Annotator, UIMA/other]

Profile 1: ASR -> SLT -> TTS -> EVAL (with ASR ROVER)

Profile 2: ASR -> SLT -> TTS -> EVAL in a different setup
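Profile 1 uses ASR ROVER, which combines the outputs of several recognizers by voting. The sketch below is heavily simplified: real ROVER first aligns the hypotheses into a word transition network, whereas this example assumes the hypotheses are already aligned word for word and have equal length. The function name `vote` and the sample sentences are invented for illustration.

```python
from collections import Counter


def vote(hypotheses: list) -> list:
    """Pick the most frequent word in each slot of pre-aligned hypotheses."""
    if len({len(h) for h in hypotheses}) != 1:
        raise ValueError("hypotheses must be pre-aligned to equal length")
    combined = []
    for slot in zip(*hypotheses):
        # most_common(1) returns the word with the highest count in this slot.
        word, _count = Counter(slot).most_common(1)[0]
        combined.append(word)
    return combined


# Three hypothetical recognizer outputs for the same utterance.
hyps = [
    "the house is green".split(),
    "the mouse is green".split(),
    "the house is greed".split(),
]
print(" ".join(vote(hyps)))  # the house is green
```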

UIMA Web Control Console

Distributed Logging and Monitoring

AJAX infrastructure

Current user and status

Annotator combination in use for the experiment

Experiment ID, and the set of input data

Links to graphical speech-to-speech evaluation results

UIMA Web Control Console

Processing engine

Path of completed processing

Engine where the data are currently being processed

Indication of active engine

Lessons learned…

Pain in placing machines on public IPs

– Firewall configuration for all participating machines, local IT people ;-)

Need to support a variety of Linux distributions to host UIMA …

– Partially eliminated by UIMA school development warm-up

Variety of programming languages for writing Annotators

– Java, C++, Perl, Python, …

Broad requirements on the Common Type System

– Punctuation, casing, lattices

Support for individual secure data download/upload on the data server

– Authentication, HTTPS, firewall rules

Web console for controlling the evaluation lifecycle

– Concept of profiles, experiment IDs, monitoring

Remote logging and debugging

– Distributed logging capabilities, logging to Web console

Reliability of components and networks

Speech-to-Speech Showcases

UIMA S2S Evaluation Web Portal

The video demonstrates how S2S portal users (e.g. S2S researchers) set up, test, and evaluate speech-to-speech chains consisting of individual text and media processing components such as ASR, machine translation, and TTS. These components, called Annotators in UIMA jargon, are exported as Web services on the public Internet and glued together by UIMA. More than 15 annotators are currently exported by IBM and by EU institutes and universities.

http://www.tc-star.org/Demo/ibm/web_console_batch.swf

UIMA S2S Translation Video Console

The individual Web service components can be assembled online into remote services that provide direct value to citizens. We show a video console that translates from English to Spanish (EU parliamentary domain). Note that the three Web services involved (ASR, MT, TTS) are hosted at three different sites hundreds of kilometers apart and glued together by UIMA.

http://www.tc-star.org/Demo/ibm/video_console_near_real_time.swf

Conclusion

First-of-a-kind online multi-partner speech-to-speech system demonstrated on UIMA (Jun 06-May 07)

Remote speech-to-speech components dynamically combined via the UIMA infrastructure to support different combinations, e.g. ROVER

– Annotators hosted on public IPs at partners' sites

– The framework controlled via UIMA Web AJAX infrastructure

The open infrastructure is used to automatically set up and evaluate individual components as well as end-to-end systems

Designed to support various use cases from research experiments to technology showcasing
