Characterizing and Processing Robot-Directed Speech

Paulina Varchavskaia • Paul Fitzpatrick • Cynthia Breazeal MIT AI Lab Humanoid Robotics Group

Transcript of Characterizing and Processing Robot-Directed Speech

Page 1: Characterizing and Processing  Robot-Directed Speech

Characterizing and Processing

Robot-Directed Speech

Paulina Varchavskaia • Paul Fitzpatrick • Cynthia Breazeal MIT AI Lab Humanoid Robotics Group

Page 2: Characterizing and Processing  Robot-Directed Speech
Page 3: Characterizing and Processing  Robot-Directed Speech
Page 4: Characterizing and Processing  Robot-Directed Speech
Page 5: Characterizing and Processing  Robot-Directed Speech
Page 6: Characterizing and Processing  Robot-Directed Speech
Page 7: Characterizing and Processing  Robot-Directed Speech
Page 8: Characterizing and Processing  Robot-Directed Speech
Page 9: Characterizing and Processing  Robot-Directed Speech

Speech Recognition

What speech does the robot evoke?

How can we process that speech?

– Support a growing vocabulary

– Without hurting recognition accuracy

Page 10: Characterizing and Processing  Robot-Directed Speech

Baseline System

Extending Vocabulary

Page 11: Characterizing and Processing  Robot-Directed Speech

Baseline System

“Magic word” to introduce vocabulary

Separation of introduction from use

Robot confused by unknown/filler words

Page 12: Characterizing and Processing  Robot-Directed Speech

Study: Characterizing Speech

13 children

5-10 years old

20-minute sessions

Some prompting (Turkle et al.)

Page 13: Characterizing and Processing  Robot-Directed Speech
Page 14: Characterizing and Processing  Robot-Directed Speech

[Figure: Number of Utterances. x-axis: children (cumulative %); y-axis: number of utterances per minute (approx).]

Page 15: Characterizing and Processing  Robot-Directed Speech

[Figure: Single Word Utterances. x-axis: children (cumulative %); y-axis: single word utterances (%).]

Page 16: Characterizing and Processing  Robot-Directed Speech

[Figure: Utterances with Enunciation. x-axis: children (cumulative %); y-axis: utterances with enunciation (%).]

Page 17: Characterizing and Processing  Robot-Directed Speech

Cooperative Speech Exists

Some clean word-learning scenarios

– “Tiffany”

Can we use them as leverage?

– “My name is Tiffany”

What recognition rates are possible?

Page 18: Characterizing and Processing  Robot-Directed Speech

Study: Processing Speech

“LCSINFO” domain (Glass, Weinstein)

Provide some initial vocabulary

Build language model without transcripts

Page 19: Characterizing and Processing  Robot-Directed Speech

Language Model

[Diagram: parallel branches word1, word2, ..., wordn]

Page 20: Characterizing and Processing  Robot-Directed Speech

Out Of Vocabulary Model

[Diagram: parallel branches word1, word2, ..., wordn, plus an OOV branch]

Page 21: Characterizing and Processing  Robot-Directed Speech

Out Of Vocabulary Model

[Diagram: the OOV branch expands into phone1, phone2, ..., phonen]

(Bazzi)
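
The idea of an OOV branch built from phones can be illustrated with a toy example. The sketch below is not the recognizer's actual model, which works probabilistically inside the search; it assumes a fixed per-phone penalty (`oov_penalty`, a made-up parameter) and a tiny hand-written lexicon with illustrative pronunciations. It finds the cheapest way to cover a phone sequence with known words, falling back to the OOV path one phone at a time.

```python
from typing import Dict, List, Tuple

OOV = "<OOV>"  # label for spans covered by the phone-level fallback


def segment(phones: List[str],
            lexicon: Dict[str, List[str]],
            oov_penalty: float = 1.0) -> List[Tuple[str, List[str]]]:
    """Cover `phones` with lexicon words (cost 0) or OOV phones (cost each)."""
    n = len(phones)
    best = [float("inf")] * (n + 1)   # best[j]: cheapest cost to cover phones[:j]
    back = [None] * (n + 1)           # back[j]: (start index, word or None)
    best[0] = 0.0
    for i in range(n):
        # Word branches: an exact pronunciation match costs nothing.
        for word, pron in lexicon.items():
            j = i + len(pron)
            if j <= n and phones[i:j] == pron and best[i] < best[j]:
                best[j], back[j] = best[i], (i, word)
        # OOV branch: consume a single phone and pay the penalty.
        if best[i] + oov_penalty < best[i + 1]:
            best[i + 1], back[i + 1] = best[i] + oov_penalty, (i, None)
    # Trace back from the end, then merge adjacent OOV phones into fragments.
    pieces = []
    j = n
    while j > 0:
        i, word = back[j]
        pieces.append((word, phones[i:j]))
        j = i
    pieces.reverse()
    result: List[Tuple[str, List[str]]] = []
    for word, span in pieces:
        if word is None and result and result[-1][0] == OOV:
            result[-1][1].extend(span)
        elif word is None:
            result.append((OOV, list(span)))
        else:
            result.append((word, list(span)))
    return result


# Illustrative pronunciations only; they are not taken from the slides.
toy_lexicon = {"phone": ["f", "ow", "n"], "email": ["iy", "m", "ey", "l"]}
print(segment("f ow n n ah m b er".split(), toy_lexicon))
```

With this toy lexicon, the known word is matched and the remaining phones are grouped into a single OOV fragment, which is the kind of fragment a later step could propose as a new lexicon entry.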

Page 22: Characterizing and Processing  Robot-Directed Speech

Hypothesized transcript

N-Best hypotheses

Run recognizer

Extract OOV fragments

Identify competition

Identify rarely-used additions

Remove from lexicon

Add to lexicon

Update lexicon, baseforms

Update Language Model
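
One pass of the add/remove cycle above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the system's implementation: the function name `update_lexicon`, the thresholds `min_count` and `min_usage`, and the representation of entries as space-separated phone strings are all hypothetical, and the "identify competition" step is omitted because the slides do not say how it works.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def update_lexicon(base: Set[str],
                   added: Set[str],
                   oov_fragments: Iterable[str],
                   usage: Dict[str, int],
                   min_count: int = 2,
                   min_usage: int = 1) -> Set[str]:
    """Return the new set of learned entries after one pass.

    base          -- initial vocabulary, never removed
    added         -- entries learned in earlier passes
    oov_fragments -- OOV fragments extracted from this pass's hypotheses
    usage         -- how often each learned entry appeared in hypotheses
    """
    counts = Counter(oov_fragments)
    # Add to lexicon: fragments that recur often enough (assumed criterion).
    promote = {f for f, c in counts.items() if c >= min_count and f not in base}
    # Remove from lexicon: earlier additions the recognizer rarely uses.
    prune = {w for w in added if usage.get(w, 0) < min_usage}
    return (added - prune) | promote


base_vocab = {"email", "phone", "room", "office", "address"}
learned = {"g r uw p", "z z"}
fragments = ["n ah m b er", "n ah m b er", "p l iy z"]
print(update_lexicon(base_vocab, learned, fragments, usage={"g r uw p": 4}))
```

After such a pass, the language model would be rebuilt over the base vocabulary plus the surviving learned entries, and the recognizer run again.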

Page 23: Characterizing and Processing  Robot-Directed Speech

[Figure: Keyword Error Rate (%) versus coverage (%). Curves: baseline performance; performance after clustering.]

Page 24: Characterizing and Processing  Robot-Directed Speech

Qualitative Results

Given: 1600 utterances

“email, phone, room, office, address”

Finds:

1. n ah m b er
2. w eh r ih z
3. w ah t ih z
4. t eh l m iy
5. k ix n y uw
6. p l iy z
7. ae ng k y uw
8. n ow
9. hh aw ax b aw
10. g r uw p

Page 25: Characterizing and Processing  Robot-Directed Speech

Conclusions

What about acoustic models?

Idiosyncratic vocabulary is important

Pick a good name for your robot!

Page 26: Characterizing and Processing  Robot-Directed Speech

Acknowledgements

MIT Initiative on Technology and Self

MIT Spoken Language Systems group

DARPA

NTT