Micai 13 contextualized practical speech

Post on 04-Jul-2015



Talk given in MICAI 2013: Practical Speech Recognition for Contextualized Service Robots


Practical Speech Recognition for Contextualized Service Robots

Departamento de Ciencias de la Computación
Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas

Universidad Nacional Autónoma de México

http://golem.iimas.unam.mx/

Ivan Meza, Caleb Rascón and Luis Pineda

Grupo Golem

Service robots
● Our future butlers
● They are task oriented

○ Clean up a room
○ Play a game

● Interaction with spoken language
● They work in noisy environments
● Microphone is not close to the speaker
● Poor speech recognition

Proposal
● Improve the system in four aspects

● Contextualized recogniser

● Prompting strategies

● Recovery strategies

● Audio calibration

I. Contextualized recognition

● Use specific language models for the given expectations

■ YES: yes, okay, all right
■ NO: no, don’t, do not

■ NAVIGATE: go to the kitchen, go to the living room, go to the bedroom
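The idea of one small language model per expectation can be sketched as follows. This is only an illustration: the phrase lists are taken from the slide, while the names `LANGUAGE_MODELS`, `active_vocabulary` and `interpret`, and the exact-match logic, are assumptions and not the authors' ASR module.

```python
# Hypothetical sketch: each dialogue expectation gets its own small phrase list.
# Phrases come from the slide; names and matching logic are assumptions.
LANGUAGE_MODELS = {
    "YES": ["yes", "okay", "all right"],
    "NO": ["no", "don't", "do not"],
    "NAVIGATE": [
        "go to the kitchen",
        "go to the living room",
        "go to the bedroom",
    ],
}


def active_vocabulary(expectations):
    """Return the phrases the recogniser should accept right now."""
    phrases = []
    for name in expectations:
        phrases.extend(LANGUAGE_MODELS[name])
    return phrases


def interpret(hypothesis, expectations):
    """Return the expectation whose model contains the hypothesis, else None."""
    text = hypothesis.strip().lower()
    for name in expectations:
        if text in LANGUAGE_MODELS[name]:
            return name
    return None
```

When the robot has just asked a yes/no question, only the YES and NO models would be active, so a navigation command is not among the accepted phrases at that point in the dialogue.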

ASR module

II. Prompting strategies

● Let the user know when to speak

■ Beep sound

● Speaker volume monitor

■ Could you speak louder or softer
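A volume monitor of this kind could be approximated by measuring the RMS level of the captured audio and comparing it against two thresholds. The sketch below is an assumption-laden illustration: the function names and the threshold values (-45 and -10 dB relative to full scale) are hypothetical and are not taken from the talk.

```python
import math


def rms_dbfs(samples):
    """RMS level of float samples in [-1.0, 1.0], in dB relative to full scale."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")
    return 10.0 * math.log10(mean_square)


def volume_prompt(samples, too_soft_db=-45.0, too_loud_db=-10.0):
    """Return a prompt for the user, or None if the level is acceptable.

    Both thresholds are illustrative assumptions, not measured values.
    """
    level = rms_dbfs(samples)
    if level < too_soft_db:
        return "could you speak louder?"
    if level > too_loud_db:
        return "could you speak softer?"
    return None
```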

III. Recovery strategy

● Let the user know when something went wrong

■ could you repeat?
■ i can’t hear you well, could you repeat
■ sorry, i’m a little deaf
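One way to wire these prompts into a dialogue loop is to trigger them whenever recognition returns nothing or a low-confidence result. The sketch below assumes the recogniser exposes a confidence score in [0, 1]; the threshold of 0.5, the escalation by number of failed attempts, and the function name are all hypothetical.

```python
# Prompts from the slide; the selection policy below is an assumption.
RECOVERY_PROMPTS = [
    "could you repeat?",
    "i can't hear you well, could you repeat",
    "sorry, i'm a little deaf",
]


def recovery_prompt(hypothesis, confidence, failed_attempts, min_confidence=0.5):
    """Return a recovery prompt, or None when the hypothesis is accepted."""
    if hypothesis and confidence >= min_confidence:
        return None
    # Move to the next prompt on each consecutive failure, then stay on the last.
    index = min(failed_attempts, len(RECOVERY_PROMPTS) - 1)
    return RECOVERY_PROMPTS[index]
```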

IV. Calibration of audio setting

● Hardware
■ 1 directional microphone
■ 1 USB interface with 4 channels
■ 2 speakers

● Calibration of SNR in situ
■ Background noise at -58 dB
■ SNR set to 20 dB
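The two calibration figures can be related with simple decibel arithmetic. The sketch below assumes that the signal and noise levels are measured on the same dB scale, so that SNR is their difference; under that assumption a noise floor of -58 dB and an SNR of 20 dB correspond to speech arriving at about -38 dB. The function names are hypothetical.

```python
def snr_db(signal_level_db, noise_level_db):
    """SNR in dB, assuming both levels are on the same dB scale."""
    return signal_level_db - noise_level_db


def required_speech_level_db(noise_level_db, target_snr_db):
    """Speech level needed to reach the target SNR over the given noise floor."""
    return noise_level_db + target_snr_db


def meets_target(signal_level_db, noise_level_db, target_snr_db):
    """True when the measured speech level reaches the target SNR."""
    return snr_db(signal_level_db, noise_level_db) >= target_snr_db
```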

Corpus evaluation

● Logs from the robot performing RoboCup tasks
■ 2 years of interactions in the lab and in competition
■ 1,439 utterances
■ 2,472 tokens
■ 120 types
■ 11 tasks
■ 9 of 11 tasks are contextualized
■ 14 language models

Contextualized recognition
We measure WER (the lower the better)

● With a unique LM for all tasks: 53.84%

● With task-based LM: 28.28%

● With contextualized: 23.42%

17.2% relative error reduction (contextualized vs. task-based LM)
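For reference, WER is the word-level edit distance between the reference and the recognised text, divided by the number of reference words, and a relative error reduction compares two WER values. The sketch below is a generic implementation of both, not the authors' evaluation script; the function names are assumptions.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the distance between the processed reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def relative_error_reduction(baseline_wer, improved_wer):
    """Relative reduction in percent, going from baseline to improved WER."""
    return (baseline_wer - improved_wer) / baseline_wer * 100.0
```

With the figures above, `relative_error_reduction(28.28, 23.42)` evaluates to about 17.2, matching the reduction from the task-based LM to the contextualized one.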

Beep sound

● 79 utterances were recorded without the beep sound

■ Without beeps 55.86%

■ With beeps 39.75%

■ With beeps full 53.72%

30% to 4% relative error reduction

Usage of SoundLoc System
● We measure usage

■ 174 times could have been triggered

■ 21 soft speech

■ 4 louder

14.36% of the time

Recovery strategy
● We measure usage

■ 504 times could have been triggered

■ 85 times activated

16.87% of the time

Conclusions

● Each of these strategies improves performance by a small amount

● Together they allow practical speech recognition on a service robot

Thank you

● Questions?