Human – Network Voice Interface in A Wireless Era

Information–related Activities, Applications and Services in Future Network Era

• Multi–media, Multi–lingual, Multi–functionalities
• Cross–cultures, Cross–domains, Cross–regions
• Integrating All Knowledge Systems and Information–related Activities and Services Globally
• Multiple User Terminals
– telephone set, hand set, PDA, vehicular electronics, home appliance, personal computer, etc.

Future Integrated Networks
• Real–time Information
– weather, traffic
– flight schedule
– stock price
– sports scores
• Electronic Commerce
– virtual banking
– on–line transactions
– on–line investments
• Knowledge Archives
– digital libraries
– virtual museums
• Intelligent Working Environment
– e–mail processors
– intelligent agents
– teleconferencing
– distant learning
• Private Services
– personal notebook
– business databases
– home appliances
– network entertainments

Wireless Access of Global Multi–media Information

• At Any Time, from Anywhere
• As Handset Size Shrinks While Required Functionalities Grow and the User Environment Changes, Voice Interface will be Useful for all User Terminals
• Examples
– voice retrieval, voice browser, voice portal, voice web
– spoken dialogue based access to intelligent agents

Scenario for Network Information Access

[Figure: scenario diagram. Labels: speech and speech information; text information; Spoken Dialogue; Text-to-speech Synthesis; Information Retrieval; Internet; Private Services/Databases/Applications; Public Services/Information/Knowledge; text, image, video, speech, …]

Convergence of PSTN and Internet

• PSTN (for Voice) and Internet (for Data and Multi-media Contents) are Converging
• Driving Force for the Convergence
– “anywhere, any time” of wireless services
– voice provides the most convenient and natural interaction interface
– attractive contents over the Internet
– contents (human information) are why the Internet is attractive, while voice directly carries human information
– Speech-enabled Access of Web-based Applications

[Figure: network diagram. Labels: handsets, telephones, Internet, servers]

Voice Interface for Human-network Interaction

– huge volumes of data disseminated across the globe by optical fiber networks
– any time, from anywhere by wireless terminals
– vehicular electronics, PDA, handset, home appliance, etc.: new platforms accessing the global network information/services
– traditional keyboard/mouse no longer adequate: size shrinkage, different user environment, etc., while desired functionalities/human–network interactions are increasing
– voice interface will be one of the few most important, natural, user-friendly, and attractive interfaces
– examples: voice retrieval, voice browser, voice portal, voice web, voice–based web–user interaction, voice–based web tools/application interfaces, etc.
– voice interface is the only major “missing link” in the “semi–mature” technology chain

Core Technologies / Functionalities for Voice Interface

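Speech Recognition as a pattern recognition problem

[Figure: block diagram. The unknown speech signal x(t) goes through Feature Extraction to give the feature vector sequence X. Training speech y(t) goes through Feature Extraction to give the Reference Patterns Y. Pattern Matching compares X with Y, and Decision Making produces the output word W.]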
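The three blocks of this diagram can be illustrated with a small sketch. It is only a toy under stated assumptions: the slide does not say which features or which matching method are used, so the frame log-energy features, the dynamic time warping (DTW) distance, the function names, and the two-word vocabulary below are all hypothetical choices made for illustration.

```python
import math

def extract_features(signal, frame_len=4):
    """Feature Extraction (toy): one log-energy value per non-overlapping frame."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        feats.append(math.log(energy + 1e-10))  # small offset avoids log(0)
    return feats

def dtw_distance(x, y):
    """Pattern Matching (assumed method): DTW distance between two 1-D sequences."""
    inf = float("inf")
    d = [[inf] * (len(y) + 1) for _ in range(len(x) + 1)]
    d[0][0] = 0.0
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            cost = abs(x[i - 1] - y[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[len(x)][len(y)]

def recognize(signal, reference_patterns):
    """Decision Making: return the word W whose reference pattern Y is closest to X."""
    x = extract_features(signal)
    return min(reference_patterns,
               key=lambda word: dtw_distance(x, reference_patterns[word]))

# Training speech y(t) for a hypothetical two-word vocabulary (synthetic samples).
training_speech = {
    "yes": [1.0] * 8 + [0.1] * 8,   # loud, then quiet
    "no":  [0.1] * 8 + [1.0] * 8,   # quiet, then loud
}
reference_patterns = {w: extract_features(y) for w, y in training_speech.items()}

# Unknown speech x(t): same shape as "yes" but with a different duration.
unknown_speech = [0.9] * 12 + [0.12] * 8
output_word = recognize(unknown_speech, reference_patterns)
print(output_word)
```

DTW is used here only because the unknown and reference sequences differ in length; real systems use richer features and statistical models.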

Basic Approach for Large Vocabulary Speech Recognition

• A Simplified Block Diagram

[Figure: block diagram. Input Speech → Front-end Signal Processing → Feature Vectors → Linguistic Decoding and Search Algorithm → Output Sentence. Speech Corpora → Acoustic Model Training → Acoustic Models. Text Corpora and Lexical Knowledge-base → Language Model Construction → Language Model. The Acoustic Models, Lexicon, Language Model, and Grammar feed the decoder.]

• Example Input Sentence: this is speech
• Acoustic Models: (th-ih-s-ih-z-s-p-iy-ch)
• Lexicon: (th-ih-s) → this, (ih-z) → is, (s-p-iy-ch) → speech
• Language Model: (this) – (is) – (speech)
– P(this is speech) = P(this) P(is | this) P(speech | this is)
– P(w_i | w_{i-1}): bi-gram language model
– P(w_i | w_{i-1}, w_{i-2}): tri-gram language model, etc.
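The bi-gram approximation from the language model item can be sketched as follows. This is a minimal illustration, assuming maximum-likelihood counts with no smoothing; the three-sentence corpus, the `<s>` sentence-start marker, and the function names are hypothetical and not taken from the slides.

```python
from collections import Counter

def train_bigram(sentences):
    """Count bigrams and their left contexts, with <s> marking the sentence start."""
    contexts, bigrams = Counter(), Counter()
    for words in sentences:
        tokens = ["<s>"] + words
        contexts.update(tokens[:-1])           # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams

def sentence_probability(words, contexts, bigrams):
    """P(w_1 ... w_n) approximated as the product of P(w_i | w_{i-1})."""
    tokens = ["<s>"] + words
    prob = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        if contexts[prev] == 0:
            return 0.0                          # unseen context, no smoothing here
        prob *= bigrams[(prev, cur)] / contexts[prev]
    return prob

# Hypothetical toy text corpus.
corpus = [
    ["this", "is", "speech"],
    ["this", "is", "text"],
    ["that", "is", "speech"],
]
contexts, bigrams = train_bigram(corpus)

# P(this | <s>) * P(is | this) * P(speech | is) = 2/3 * 1 * 2/3
p = sentence_probability(["this", "is", "speech"], contexts, bigrams)
print(p)
```

Without smoothing, any sentence containing an unseen word pair gets probability zero, which is why practical language models add some form of smoothing or back-off.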

Speech Recognition Technologies, Applications and Problems

• Word Recognition

– voice command/instructions

• Keyword Spotting

– identifying the keywords out of a pre-defined keyword set from input voice utterances

• Large Vocabulary Continuous Speech Recognition

– entering longer texts

– remote dictation

• Speaker Dependent/Independent/Adaptive

• Acoustic Reception/Background Noise/Channel Distortion

• Read/Spontaneous/Conversational Speech

Text-to-speech Synthesis

[Figure: block diagram. Input Text → Text Analysis and Letter-to-sound Conversion → Prosody Generation → Signal Processing and Concatenation → Output Speech Signal. Supporting resources: Lexicon and Rules, Prosodic Model, Voice Unit Database.]

• Transforming any input text into corresponding speech signals
• E-mail/Web page reading
• Prosodic modeling
• Basic voice units/rule-based, non-uniform units/corpus-based

Speaker Verification

[Figure: block diagram. Input speech → Feature Extraction → Verification (using Speaker Models) → yes/no]

• Verifying the speaker as claimed
• Applications requiring verification
• Text dependent/independent
• Integrated with other verification schemes
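The yes/no decision in speaker verification can be sketched as a score compared against a threshold. The example below is only an assumption-laden illustration: the slides do not specify the speaker model or the score, so the mean-vector models, the Euclidean distance, the threshold value, and the speaker names are all hypothetical.

```python
import math

def verify(features, claimed_id, speaker_models, threshold=1.0):
    """Accept the claimed identity if the utterance lies close to that speaker's model.

    features: list of feature vectors extracted from the input speech.
    speaker_models: maps a speaker id to a mean feature vector (toy model).
    """
    model = speaker_models[claimed_id]
    # Average the feature vectors over time, one mean per dimension.
    mean = [sum(col) / len(col) for col in zip(*features)]
    distance = math.dist(mean, model)   # Euclidean distance, Python 3.8+
    return distance <= threshold

# Hypothetical enrolled speaker models.
speaker_models = {"alice": [1.0, 2.0], "bob": [4.0, 0.5]}

# Hypothetical feature vectors from one input utterance.
input_features = [[0.9, 2.1], [1.1, 1.9], [1.0, 2.2]]

accepted_as_alice = verify(input_features, "alice", speaker_models)
accepted_as_bob = verify(input_features, "bob", speaker_models)
print(accepted_as_alice, accepted_as_bob)
```

The threshold trades false acceptances against false rejections, so in practice it would be tuned to the application rather than fixed as above.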

Information Retrieval Including Voice

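• Text Documents/Instructions
• Speech Documents/Instructions
• Voice Personal Notebook/Private Database

[Figure: a speech instruction or a text instruction (example: “I would like to find news about the formation of the new government”) is matched against text documents and speech documents (example: “President-elect Chen Shui-bian this morning…”)]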

Multi-lingual Functionalities

• Code-Switching Problem
– English words/phrases inserted in spoken Chinese sentences: 人人都用 Computers，家家都上 Internet (“Everyone uses computers, every household goes on the Internet”)
– the whole sentence switched to English: 準備好了嗎？ Let’s go! (“Are you ready? Let’s go!”)
• Cross-language Network Information Processing
– globalized network with multi-lingual content/users
– cross-language network information processing with spoken Chinese language input as an example
• Chinese Dialects/Accents
– Taiwanese, Cantonese, Shanghainese, etc.
– hundreds of Chinese dialects
– code-switching problem: dialects mixed with Mandarin (or plus English)
– Mandarin with a variety of strong accents
• Language Dependent/Independent Technologies

Spoken Dialogue Systems

• Almost all human-network interactions can be made by spoken dialogue
• Speech understanding
• System/user/mixed initiatives
• Reliability/efficiency, dialogue modeling/flow control

[Figure: block diagram. Input Speech → Speech Recognition and Understanding → User’s Intention → Dialogue Manager (with Discourse Context) → Response to the user → Sentence Generation and Speech Synthesis → Output Speech. Other labels: Databases, Dialogue Server, Internet, Networks.]
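The role of the dialogue manager and the discourse context can be sketched with a tiny slot-filling loop. This is a hypothetical illustration, not the system in the slides: the flight-query task, the keyword-based understanding step, the slot names, and the prompts are all assumptions, and the input is taken to be text already produced by a speech recognizer.

```python
def understand(utterance, known_cities):
    """Toy speech understanding: extract slot values by keyword matching."""
    words = utterance.lower().split()
    intention = {}
    for i, word in enumerate(words[:-1]):
        nxt = words[i + 1]
        if word == "from" and nxt in known_cities:
            intention["origin"] = nxt
        elif word == "to" and nxt in known_cities:
            intention["destination"] = nxt
    return intention

def dialogue_turn(utterance, context, known_cities):
    """Dialogue manager: merge the user's intention into the discourse context,
    then either ask for a missing slot (system initiative) or answer."""
    context.update(understand(utterance, known_cities))
    prompts = (
        ("origin", "Where are you leaving from?"),
        ("destination", "Where are you going to?"),
    )
    for slot, prompt in prompts:
        if slot not in context:
            return prompt
    return f"Looking up flights from {context['origin']} to {context['destination']}."

known_cities = {"taipei", "tokyo"}
context = {}   # discourse context carried across turns

reply_1 = dialogue_turn("I want to fly to Tokyo", context, known_cities)
reply_2 = dialogue_turn("from Taipei", context, known_cities)
print(reply_1)
print(reply_2)
```

Keeping the slots in a context that persists across turns is what lets the second, incomplete utterance be interpreted correctly.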
