CP SC 881 Spoken Language Systems

CP SC 881

Spoken Language Systems

2 of 23

Auditory User Interfaces

Welcome to SLS

Syllabus

Introduction

3 of 23

Good Design (our goal!)

“Every designer wants to build a high-quality interactive system that is admired by colleagues, celebrated by users, circulated widely, and imitated frequently.” (Shneiderman, 1992, p.7)

…and anything goes!…

4 of 23

Auditory User Interfaces

An auditory user interface (AUI) is an interface that relies primarily or exclusively on audio, including speech and sound, for interaction. (Weinschenk & Barker, 2000)

5 of 23

Auditory User Interfaces

Natural Language/Speech User Interfaces: Conversation is natural.

Multimodal User Interfaces: Combine voice, text, graphics, gestures, keypad, stylus, etc. into one interface.

6 of 23

Multimodal User Interfaces

Simultaneous Multimodality: Multiple modes at the same time, e.g. voice and visual.

Sequential Multimodality: Uses multiple modes sequentially and seamlessly.

7 of 23

But What Makes a Good VUI?

Functionality
Speed & efficiency
Reliability, security, data integrity
Standardization, consistency
USABILITY!

8 of 23

Closer to Fine: A Philosophy

…The human user of any system is the focus of the design process. Planning and implementation is done with the user in mind, and the system is made to fit the user, not the other way around….

Bruce Walker, Georgia Institute of Technology

9 of 23

How Do You Know It’s Good?!

Usability Test and Evaluation

Human Factors in Speech

11 of 23

Human Factors in Speech

High Error Rates

Speech recognition: background noise, intonation, pitch, volume

Grammars: missing words, size limitations

“When speech recognition becomes genuinely reliable, this will cause another big change in operating systems.” (Bill Gates, The Road Ahead 1995)

12 of 23

Human Factors in Speech

Unpredictable Errors

Grammars: sound-alike words (Austin-Boston), missing words, grammar size limitations

Note: We do not like using unpredictable machines.
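The "missing words" problem can be illustrated with a minimal sketch. It assumes a toy grammar given as a flat set of known words; the `GRAMMAR` contents and the `out_of_grammar_words` helper are hypothetical and only show why an utterance outside the grammar fails in a way the user cannot predict.

```python
import re

# Hypothetical toy grammar: the only words this recognizer is assumed to know.
GRAMMAR = {"fly", "from", "to", "austin", "boston", "denver"}


def out_of_grammar_words(utterance, grammar=GRAMMAR):
    """Return the words in the utterance that the grammar does not cover."""
    words = re.findall(r"[a-z']+", utterance.lower())
    return [w for w in words if w not in grammar]


# Fully covered utterance: nothing is missing.
print(out_of_grammar_words("Fly from Austin to Boston"))

# The user cannot know in advance that these words are unsupported.
print(out_of_grammar_words("I'd like to fly to Clemson"))
```

A real recognizer works on audio rather than text, so an uncovered word is usually misrecognized as some in-grammar word instead of being reported as missing.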

13 of 23

Human Factors in Speech

User Expectations

Novice users have high expectations of computers and speech.

Natural Language: Novices expect to say “anything” to the machine, e.g. Star Trek.

Spoken language differs from written language, e.g. “ums” and “uhs” appear in spoken language.

14 of 23

Human Factors in Speech

Memory

Speech-only systems can be taxing on human memory, e.g. large telephone menu systems.

Miller: short-term memory holds about 7 plus or minus 2 items.
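One way to apply the memory limit to a telephone menu is to split a long option list into short prompts. The sketch below is an illustration only: the `chunk_menu` helper, the option names, and the choice of 5 items per prompt (the low end of 7 plus or minus 2) are all assumptions, not a rule from the course.

```python
def chunk_menu(options, max_per_prompt=5):
    """Split a long list of menu options into prompts of limited size."""
    if max_per_prompt < 1:
        raise ValueError("max_per_prompt must be at least 1")
    return [
        options[i:i + max_per_prompt]
        for i in range(0, len(options), max_per_prompt)
    ]


# Hypothetical menu with more options than a caller can comfortably hold in memory.
OPTIONS = [
    "balance", "payments", "transfers", "cards", "loans", "mortgages",
    "investments", "insurance", "fraud", "address change", "hours", "agent",
]

for number, prompt in enumerate(chunk_menu(OPTIONS), start=1):
    print(f"Prompt {number}: {', '.join(prompt)}")
```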

Definitions and Terms

16 of 23

Speech Recognition

Refers to the technologies that enable computing devices to identify the words spoken by a human voice.

Example: “List all the Clemson University orders.”

17 of 23

Speech Recognition

Continuous Recognition: Allows a user to speak to the system in an everyday manner without using specific, learned commands.

Discrete Recognition: Recognizes a limited vocabulary of individual words and phrases spoken by a person.

18 of 23

Speech Recognition

Word Spotting: Recognizes predefined words or phrases. Used by discrete recognition applications.

“Computer, I want to surf the Web”

“Hey, I would like to surf the Web”
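The two utterances above differ in wording but share the phrase "surf the Web". A minimal sketch of word spotting follows, working on text rather than audio for simplicity; the `spot_phrases` helper and the phrase list are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def spot_phrases(utterance, phrases):
    """Return the predefined phrases that occur as word sequences in the utterance."""
    words = tokenize(utterance)
    found = []
    for phrase in phrases:
        target = tokenize(phrase)
        n = len(target)
        # Slide a window of n words across the utterance and compare.
        if n and any(words[i:i + n] == target for i in range(len(words) - n + 1)):
            found.append(phrase)
    return found


# Hypothetical list of phrases the application listens for.
PHRASES = ["surf the web", "check email"]

print(spot_phrases("Computer, I want to surf the Web", PHRASES))
print(spot_phrases("Hey, I would like to surf the Web", PHRASES))
```

Everything around the spotted phrase is ignored, which is why both utterances trigger the same action.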

19 of 23

Speech Recognition

Voice Verification or Speaker Identification

Voice verification is the science of verifying a person's identity on the basis of their voice characteristics.

Unique features of a person's voice are digitized and compared with the individual's pre-recorded "voiceprint" sample stored in the database for identity verification.

It is different from speech recognition because the technology does not recognize the spoken word itself.
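The comparison step can be sketched as a similarity test between a feature vector from the new sample and the stored voiceprint. This is a simplified illustration: the feature values, the `VOICEPRINTS` table, the use of cosine similarity, and the 0.95 threshold are all assumptions, and real systems use much richer acoustic features and scoring.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical database of enrolled voiceprints (made-up feature values).
VOICEPRINTS = {"james": [0.9, 0.1, 0.4, 0.7]}


def verify(claimed_id, features, threshold=0.95):
    """Accept the claimed identity only if the sample is close to the voiceprint."""
    enrolled = VOICEPRINTS.get(claimed_id)
    if enrolled is None:
        return False
    return cosine_similarity(features, enrolled) >= threshold


print(verify("james", [0.88, 0.12, 0.41, 0.69]))  # sample close to the voiceprint
print(verify("james", [0.1, 0.9, 0.7, 0.2]))      # sample from a different voice
```

Note that nothing in the comparison looks at which words were spoken, only at how the voice sounds.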

20 of 23

Speech Synthesis

Refers to the technologies that enable computing devices to output simulated human speech.

Example: “James, here are the Clemson University orders.”

21 of 23

Speech Synthesis

Formant Synthesis: Uses a set of phonological rules to control an audio waveform that simulates human speech.

Sounds like a robot, very synthetic, but getting better.
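A rough sketch of the idea: a pulse train standing in for the vocal cords is passed through a cascade of second-order resonators, one per formant. The formant frequencies and bandwidths below are illustrative values for an "ah"-like vowel, not measured data, and the function names are hypothetical. A full rule-based synthesizer would vary these parameters over time according to its phonological rules; here they stay fixed.

```python
import math

SAMPLE_RATE = 16000  # samples per second (assumed)


def resonator(signal, freq_hz, bandwidth_hz, sample_rate=SAMPLE_RATE):
    """Second-order digital resonator: y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = 2.0 * math.exp(-math.pi * bandwidth_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def synthesize_vowel(formants, pitch_hz=100, duration_s=0.2, sample_rate=SAMPLE_RATE):
    """Filter an impulse train at the given pitch through one resonator per formant."""
    n = int(duration_s * sample_rate)
    period = int(sample_rate / pitch_hz)
    signal = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    for freq, bandwidth in formants:
        signal = resonator(signal, freq, bandwidth, sample_rate)
    return signal


# Illustrative (frequency, bandwidth) pairs in Hz for an "ah"-like vowel.
AH_FORMANTS = [(700, 130), (1200, 70), (2600, 160)]

samples = synthesize_vowel(AH_FORMANTS)
print(len(samples), "samples generated")
```

The perfectly regular pulse train and fixed formants are part of why this kind of output sounds robotic.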

22 of 23

Speech Synthesis

Concatenated Synthesis: Uses computer assembly of recorded voice sounds to create meaningful speech output.

Sounds very human; most people can’t tell the difference.
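The assembly step can be sketched as looking up a recording for each unit and joining the samples end to end. In this toy version the units are whole words and the "recordings" are tiny made-up sample lists; `RECORDED_UNITS` and `concatenate_units` are hypothetical names, and real systems also smooth the joins between units.

```python
# Hypothetical inventory of recorded units (made-up sample values).
RECORDED_UNITS = {
    "here": [0.1, 0.3, 0.2],
    "are": [0.4, 0.1],
    "the": [0.2, 0.2],
    "orders": [0.5, 0.3, 0.1, 0.0],
}

SILENCE = [0.0]  # short gap inserted between units


def concatenate_units(words, units=RECORDED_UNITS, gap=SILENCE):
    """Join the recordings for each word into one output waveform."""
    output = []
    for i, word in enumerate(words):
        if word not in units:
            raise KeyError(f"no recording for {word!r}")
        if i > 0:
            output.extend(gap)
        output.extend(units[word])
    return output


waveform = concatenate_units(["here", "are", "the", "orders"])
print(len(waveform), "samples in the assembled output")
```

A word with no recording cannot be spoken at all, which is the main limitation of this approach compared with formant synthesis.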

23 of 23

Uses of Speech Technologies

Interactive Voice Response Systems
Call centers
Medical, Legal, Business, Commercial, Warehouse
Handheld Devices
Toys and Education
Automobile Industry
Universal Access (visually/physically impaired)