SyNTHEMA Speech & Language Technologies: State of the Art from an Industrial Perspective

Carlo Aliprandi


Synthema srl


Company Profile

Based in Pisa (Italy), SyNTHEMA is a high-technology SME established in 1993 by computer scientists from the IBM Research Center. Since then the company has grown into a leading provider of language and semantic solutions, with state-of-the-art technologies for applications such as Enterprise Search, Audio & Text Mining, Technology Watch, Competitive Intelligence, Speech Recognition, Respeaking and Speech Analytics.

Building on strong IT research and development, SyNTHEMA has pioneered a number of innovative applications and solutions, used daily by a large number of users for productivity tasks in different markets and industries, including Homeland Security, Intelligence and Law Enforcement, Public Administration and Government, Healthcare and Media.


Structure and activities

Semantic Technology

Translation Technology

Speech Technology

• 30 People (20 IT, 10 Localisation Services)


Natural language

Sources: Ethnologue; Netz-Tipp.De (http://www.netz-tipp.de/languages.html)

Language technologies: some examples

WRITTEN LANGUAGE
• Machine Translation
• Semantics
• Natural-language search
• Information Retrieval
• Question Answering

SPOKEN LANGUAGE
• Speech Recognition – Speech to Text
• Respeaking
• Automatic Transcription
• Assisted Subtitling
• Spoken Language Understanding
• Dialogue management (avatars, ..)


Semantics

The Italian market offers state-of-the-art technology for:
• Lemmatisation
• POS Tagging
• MultiWord Detection (MWD)
• Named Entity Recognition (NER)
• Parsing (dependency – constituency)
• Word Sense Disambiguation (WSD)
• Sentiment Analysis (SA)
• Semantic Role Labeling (SRL)

Languages:


Semantics

• Is it a cool topic?
– Microsoft Bing – Powerset (linguistic processor)
– Google – Applied Semantics (ontology, or knowledge base of concepts and their relationships, coupled with a linguistic processing engine)
– Google Squared (structures the unstructured data on web pages)
– Hakia (meaning-based search engine, ontology and semantic lexicon, ontological parser)
– WolframAlpha
  + computational knowledge engine, distilled and revised knowledge, NL query, rich visualisation
  - knowledge engineering, language dependent
– IBM Watson (Jeopardy!)

• While we wait for the killer app, there is a latent demand for “Semantic Search”


Speech Technology

The Italian market offers state-of-the-art technology for:

• Automatic Speech Recognition

• Automatic Transcription

• Dialogue Systems

• Speech Analytics

Languages:


The evolution of Dictation

• 1st generation: 1990-2000, application of ASR products to respeaking
– Players (technology for CSR):
  • IBM ViaVoice, Dragon DNS, L&H Xspeech, Philips FreeSpeech, Kurzweil, Nuance, Loquendo and others (more than 10 tools!!), plugged into existing subtitling solutions
– Technology benefits:
  • Speaker dependent, great accuracy and large accent coverage
  • Large vocabularies available (LVSR)
  • Good accuracy, up to 95-97%
  • Good throughput (up to 170 wpm)
– Some technology limitations:
  • SR mainly designed for dictation
  • SR available for ‘general’ domains / main languages
  • Partial coverage of specific domains (news, politics, economy, gossip…)
  • Difficulty dealing with out-of-vocabulary (OOV) words
  • Error correction (live and deferred)
  • Improvement of language models
– Main operating benefits:
  • The technology allows fast training of new (untrained) staff
  • The technology is affordable, with no need for huge investments
  • It fits pre-recorded and close-to-live programs well
– Main operating limitations:
  • Typically supports a single operator (the respeaker)
  • The respeaker, working alone, faces a challenging task with a heavy cognitive overload
  • Hardly fits live programs (talk-shows, interviews…)
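Accuracy figures such as the 95-97% quoted for first-generation dictation products are commonly reported as word accuracy, that is, one minus the word error rate (WER). The slides do not say how the figures were measured, so the Python sketch below only illustrates that assumption. The function names `word_error_rate` and `word_accuracy` are hypothetical and do not come from any of the products listed.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j]: edit distance between the reference words seen so far
    # and the first j hypothesis words (dynamic programming, two rows).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word accuracy under the assumption accuracy = 1 - WER."""
    return 1.0 - word_error_rate(reference, hypothesis)
```

Under this definition, 95-97% accuracy corresponds to roughly 3 to 5 misrecognised words per 100 reference words. Insertions can push WER above 1, so word accuracy computed this way can be negative on very poor output.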


The evolution of Dictation

• 2nd generation: 2000-2010
– Global players: Nuance DNS, Philips SpeechMagic, IBM ViaVoice
– Technology benefits:
  • Speaker dependent, great accuracy and large accent coverage
  • Large vocabularies available (LVSR)
  • Good accuracy, up to 97-99%
  • Good throughput (up to 170 wpm)
– Technology limitations overcome:
  • SR mainly designed for dictation -> adaptation to different speech (conversational speech); reduced training time (30’ -> 5’)
  • SR available for ‘general’ domains -> development of specific topics (news, politics, …)
  • Difficulty dealing with out-of-vocabulary (OOV) words -> pre-analysis of similar texts/scripts; live management (editing + insertion) of OOV words
  • Error correction (live and deferred) -> live: dual-operator systems (respeaker + corrector)
  • Improvement of language models -> respeaked speech and aligned scripts are saved, so error correction improves the language models (lettuce vs. let us)
– Operating benefits:
  • Fits ‘major’ live programs (news, sport events)
– Main operating limitations:
  • The respeaker still faces a cognitive overload
  • Does not completely fit specific kinds of live programs (chat magazines, talk-shows, major political debates, ..)
  • Subtitles are introduced with some delay (5-7’ acceptable)
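The pre-analysis of similar texts or scripts mentioned above as a remedy for out-of-vocabulary words can be sketched as a simple vocabulary check. The slides do not describe how any vendor implements it, so this is only a minimal illustration. The function name `find_oov_words`, the tokenisation rule and the idea of a plain word-list vocabulary are all assumptions.

```python
import re
from collections import Counter


def find_oov_words(script_text, vocabulary):
    """Return (word, count) pairs for script words missing from the vocabulary.

    Assumes the recogniser vocabulary is available as a plain collection of
    words; real engines may expose it differently or not at all.
    """
    vocab = {word.lower() for word in vocabulary}
    # Runs of letters, optionally joined by apostrophes or hyphens.
    tokens = re.findall(r"[^\W\d_]+(?:['’-][^\W\d_]+)*", script_text.lower())
    counts = Counter(token for token in tokens if token not in vocab)
    # Most frequent candidates first, so an operator can add them before going live.
    return counts.most_common()
```

For example, with the vocabulary `{"the", "minister", "visited", "today"}` and the script "The minister visited Pisa today. Pisa welcomed the minister.", the function returns `[("pisa", 2), ("welcomed", 1)]`. These are the words a respeaker or corrector would want to add before the broadcast.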


The present (and future) of Dictation

• 3rd generation: 2010-2015
– Global player technology for CSR:
  • Nuance DNS (and no others!!)
– Emerging providers of new professional technology for SR:
  • New ASR engines for (batch and live) transcription
  • Speaker-independent systems (Nuance Dictate, IBM Attila, …)
  • SR engines for smartphones and cloud services (Google Speech, Apple, Facebook, …)
– Newly emerging interests and applications:
  • Audio alignment and segmentation
  • Audio annotation and indexing for cross-media search
  • Media monitoring

ASR from an Industry perspective

• Needs?
– ASR has several limitations: it was designed for dictation applications and therefore performs too poorly in specific tasks such as subtitling.
– Language coverage may be limited: commercial systems have been developed to target the main language markets (e.g. English, Spanish, French, German, ..) and are not available for many languages and dialects.
– Domain coverage may be limited: commercial systems have been developed to target general, generic topics.

• Limitations
– Data: the resources (raw data, tagged data, models) needed to build ASR technology are not available for several languages.
– Needs differ from a market perspective.


SAVAS

• Is ASR good enough for an application task like subtitling?
• Is an IT provider (academia or R&D) sufficient to fulfil market needs (improving operations, new offerings, ..)?
• Reporting is different (vs. respeaking):
– Not real time
– Typically verbatim (or close to it)
– Different audience
– No persistence or visualization constraints (colors, formatting, audio descriptors, …)
• Dictation has proved to be a valid alternative for subtitling, taking over from traditional reporting methods
• Traditional reporting methods, like fast keyboarding and stenotyping, were adopted early
• SAVAS brings together broadcasters, subtitling companies, universities and companies involved in the media, accessibility and LVCSR industries


Speech Recognition

• Dictation
– Dictation is the interactive composition of text
– Medical reports, court and parliamentary proceedings
• Transcription
– Transcription is transforming speech into text (batch or online)
• Dialogue
– CRM, device control, navigation, call routing
• Multimedia Mining
– Audio2text; Text2Audio

http://128.65.123.18/ent-it-up/index.php


– Q&A

Thank you

Courtesy of
