Interactive Systems Technical Design

ISTD 2003, Audio / Speech Interactive Systems Technical Design Seminar work: Audio / Speech Ville-Mikko Rautio Timo Salminen Vesa Hyvönen


Transcript of Interactive Systems Technical Design

Page 1: Interactive Systems Technical Design

ISTD 2003, Audio / Speech

Interactive Systems Technical Design

Seminar work: Audio / Speech

Ville-Mikko Rautio
Timo Salminen
Vesa Hyvönen

Page 2: Interactive Systems Technical Design


Introduction

• When gathering information about the surrounding environment, hearing is one of the basic human senses. Using audio and speech as an alternative input and output method can therefore contribute a lot to the user experience in interactive systems and make it more natural.

Page 3: Interactive Systems Technical Design


Motivation

• When building interactive systems, the user interface should behave according to the expectations users have formed from their experiences of the real world.
• Generally, user interfaces today are mainly based on keyboard and screen. Feedback from the system is given almost only in visual form. In computer-based systems, a much better user experience can be achieved by also offering information through other basic senses such as hearing, taste, touch and smell.

Page 4: Interactive Systems Technical Design


Implementation

• Basically two components: audio playback and speech/audio recognition.
• Design issues:
  • Audio can be speech / non-speech
  • Who are you designing for?
    • Different users – different abilities
    • Blind, old and disabled people
    • Human diversity – physical, perceptual, cultural and intellectual differences
  • Mobile computing
    • Limited input, limited output, slow processor, small memory, limited battery life, slow network connection
  • Communication protocol
  • Speech recognition causes major problems
    • Accuracy
    • Usage in critical systems?
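
The accuracy issue listed above is commonly quantified as word error rate (WER): the word-level edit distance between what was said and what was recognised, divided by the number of words actually spoken. The slides do not name a metric, so treating WER as the measure is an assumption, and the function name and sample utterances below are hypothetical. A minimal sketch:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # reference word dropped
                            curr[j - 1] + 1,      # extra word inserted
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one of four spoken words is misrecognised.
print(word_error_rate("shut down the lights", "shut down the light"))
```

A single wrong word in a short command already gives a high error rate, which is one way to see why the slides question the use of speech recognition in critical systems.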

Page 5: Interactive Systems Technical Design


Applications

• MIT Media Lab – Nomadic Radio: Wearable Audio Computing
  • A client-server based messaging infrastructure
  • Utilizes spatialized audio, speech synthesis and recognition
  • Hourly news broadcasts, voice mail, email, calendar reminders, weather forecasts and stock reports are delivered
• HP Labs – SpeechBot
  • A search engine that uses speech recognition to index audio & video content hosted and played from other websites
  • http://speechbot.research.compaq.com/

Page 6: Interactive Systems Technical Design


Nomadic Radio Network Architecture

Page 7: Interactive Systems Technical Design


Page 8: Interactive Systems Technical Design


Strengths / Advantages

• Data input possible without keyboard
  • Mobile devices
• Excellent for hands-busy / eyes-busy situations

Page 9: Interactive Systems Technical Design


Strengths / Advantages

• People with visual or other disabilities
• Natural way for humans to interface with the environment
• Increases the bandwidth of communication
• Devices with limited screen – need for an additional output method
• Technology available now

Page 10: Interactive Systems Technical Design


Limitations / Weaknesses

• Input is error prone, especially in noisy environments
• Vocabulary size in recognition – controlling objects and things is limited
• Communication protocol needed
  • “Computer! Shut down the lights!”
  • Can lead to an unnatural experience
  • How to tell the user what the communication protocol is like:
    • Explicit – tell exactly what to say (“Welcome to library, say “XXX” to ...”)
    • Implicit – open ended, potential for errors (“Welcome to library, what would you like to do….”)
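
The explicit protocol described above can be sketched as a small fixed vocabulary that is both announced to the user and used to interpret the recognised text. This is only an illustration under stated assumptions: the command words, their descriptions and the function names are hypothetical and do not come from the slides, and the recogniser is assumed to deliver plain text.

```python
# Hypothetical command vocabulary for a library service.
COMMANDS = {
    "search": "search the catalogue",
    "renew": "renew a loan",
    "hours": "hear the opening hours",
}


def explicit_prompt() -> str:
    """Explicit protocol: tell the user exactly which words can be said."""
    options = ", ".join(
        f'say "{word}" to {action}' for word, action in COMMANDS.items()
    )
    return f"Welcome to library, {options}."


def interpret(utterance: str):
    """Return the command word if exactly one is present, else None."""
    words = [w.strip(".,!?\"'") for w in utterance.lower().split()]
    matches = {w for w in words if w in COMMANDS}
    # Zero matches (out of vocabulary) or several matches (ambiguous)
    # are both rejected so that the system can re-prompt.
    return next(iter(matches)) if len(matches) == 1 else None


print(explicit_prompt())
print(interpret("Renew."))
print(interpret("I would like to find a book"))
```

The last call shows the cost of the small vocabulary: an open-ended reply of the kind an implicit prompt invites is rejected, because none of its words belongs to the announced command set.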

Page 11: Interactive Systems Technical Design


Limitations / Weaknesses

• Speech output sounds unnatural
• Asymmetrical
  • Speech input is faster than typing
  • Speech output is slower than reading
• Feedback & latency
  • User needs to know if recognition was successful
  • Is the system processing data or waiting for input?
  • Time taken to recognise an utterance
  • Pauses
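
One way to address the feedback questions above is to model the dialogue as a small state machine in which every transition produces a cue for the user, so it is always clear whether the system is waiting, listening or processing. The states, event names and cue texts below are hypothetical, chosen only to illustrate the idea:

```python
from enum import Enum


class State(Enum):
    WAITING = "waiting for input"
    LISTENING = "listening"
    PROCESSING = "processing"


# (current state, event) -> (next state, feedback cue for the user)
TRANSITIONS = {
    (State.WAITING, "speech_start"): (State.LISTENING, "tone: listening"),
    (State.LISTENING, "speech_end"): (State.PROCESSING, "tone: processing"),
    (State.PROCESSING, "recognised"): (State.WAITING, "speech: confirm command"),
    (State.PROCESSING, "rejected"): (State.WAITING, "speech: please repeat"),
}


def step(state: State, event: str):
    """Advance the dialogue; unknown events keep the state and give no cue."""
    return TRANSITIONS.get((state, event), (state, None))


# Walk through one successful utterance and print the cue at each step.
state = State.WAITING
for event in ["speech_start", "speech_end", "recognised"]:
    state, cue = step(state, event)
    print(f"{event}: {cue}, now {state.value}")
```

Because the "recognised" and "rejected" outcomes give different cues, the user learns whether recognition succeeded, and the "processing" tone covers the time taken to recognise the utterance.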

Page 12: Interactive Systems Technical Design


Selected Industrial Players

• IBM
  • Conversational Biometrics
    • Combines multiple verification sources such as voice biometrics with spoken knowledge
  • Embedded ViaVoice
    • IBM speech technology for mobile devices
    • Command and control (C&C)
    • Text-to-Speech (TTS)
• Sony
  • SDR-4X
    • Prototype of an entertainment robot using multi-modal human interaction technology
    • Individual person detection by the tone of voice
    • Continuous speech recognition and unknown vocabulary acquisition
    • Speech synthesis and singing voice production

Page 13: Interactive Systems Technical Design


SDR-4X

Page 14: Interactive Systems Technical Design


Selected International Research Groups and Projects

• The MBROLA Project
  • Develops a speech engine which synthesizes written text for many different languages
  • Speech engine core freely available!
  • http://tcts.fpms.ac.be/synthesis/mbrola.html
• Stanford University – Interactive Workspaces
  • Goal is to create an interactive space where you can work collaboratively using natural gestures
  • http://iwork.stanford.edu/
• Speech Interface Group, MIT Media Laboratory
  • Major player, numerous projects
  • Example: Nomadic Radio: Wearable Audio Computing
  • http://web.media.mit.edu/~nitin/NomadicRadio/

Page 15: Interactive Systems Technical Design


Selected International Research Groups and Projects

• MIT, Project Oxygen
  • Pervasive, human-centered computing
  • Integrated software system that will reside in the public domain
  • Speech and vision provide the main modes of interaction in Oxygen
  • Multilingual systems support dialog among participants speaking different languages
  • The SpeechBuilder utility supports development of spoken interfaces
  • http://oxygen.lcs.mit.edu/Overview.html

Page 16: Interactive Systems Technical Design


Page 17: Interactive Systems Technical Design


Selected Finnish Research Groups and Projects

• VTT, Interactive Intelligent Electronics (IIE)
  • User interface technologies for future home environments, The Smart-Its Project, Beyond the GUI, …
  • http://www.vtt.fi/ele/projects/iie/index.htm
• Helsinki University of Technology, Neural Network Research Centre
  • Adaptive Natural Language Processing
  • http://www.cis.hut.fi/projects/natlang/
• University of Tampere, Speech-based and Pervasive Interaction Group
  • USIX-Interact, Dumas, Mobile User Interfaces, …
  • http://www.cs.uta.fi/research/hci/spi/

Page 18: Interactive Systems Technical Design


Companies and Research Groups in Oulu

• MediaTeam Oulu, Language and Audio Technology
  • CBIR – Content Based Information Retrieval
  • Filling of the Semantic Gap in the Retrieval of Audio and Video Recordings
  • Multiparametric prosodic analysis of phonetic and phonological correlates of emotions
  • Vikings
  • http://www.mediateam.oulu.fi/research/lat/?lang=en

Page 19: Interactive Systems Technical Design


Future Developments

• Multimodality
• Multilingual, natural speech interaction
• Emotional state
• Biometrics