SpeechTEK 2009: Optimizing Speech Recognizer Rejection Thresholds

Optimizing speech recognizer rejection thresholds. Dan Burnett, Director of Speech Technologies, Voxeo. August 24, 2009

Description

At SpeechTEK 2009 in New York on August 24, 2009, Dr. Daniel C. Burnett, Director of Speech Technologies at Voxeo, spoke on optimizing speech recognizer rejection thresholds. Abstract: This session will explain ASR (automatic speech recognizer) confidence rejection thresholds: what they are, where they come from, and their criticality to your ASR-enabled IVR. We describe the steps necessary to optimize this important threshold value throughout your application, covering transcription, the importance of grammar coverage, and an explanation of terms such as the Equal Error Rate. This session is ideal for those ready to take their ASR-enabled IVR tuning to the next level.

Transcript of SpeechTEK 2009: Optimizing Speech Recognizer Rejection Thresholds

Page 1: SpeechTEK 2009: Optimizing Speech Recognizer Rejection Thresholds

Optimizing speech recognizer rejection thresholds

Dan Burnett

Director of Speech Technologies, Voxeo

August 24, 2009

Page 2

Why this talk?

• Sometimes we forget the basics, which are:

• Recognizers are not perfect

• They can be optimized in a straightforward manner

• The simplest optimization is the rejection threshold

Page 3

The Goal

• End user goal: optimal experience

• Our goal: determine the user experience for each possible rejection threshold, then choose the optimum threshold

• Must compare true classification of an audio sample against the ASR engine’s classification

Page 4

True classifications

• Assume human-level recognition

• App should still distinguish (i.e. possibly behave differently) among the following cases:

Case                                                         | Possible behavior
No speech in audio sample (nospeech)                         | Mention that you didn’t hear anything and ask for repeat
Speech, but not intelligible (unintelligible)                | Ask for repeat
Intelligible speech, but not in app grammar (out-of-grammar) | Encourage in-grammar speech
Intelligible speech, and within app grammar (in-grammar)     | Respond to what person said

Page 5

ASR Engine Classifications

• Silence/nospeech (nospeech)

• Reject (rejected)

• Recognize (recognized)

Page 6

Crossing these two . . .

Rows: true classification. Columns: ASR classification.

True \ ASR     | nospeech                      | rejected            | recognized
nospeech       | Correct classification        | Improperly rejected | Incorrect
unintelligible | Improperly treated as silence | Correct behavior    | Assume incorrect
out-of-grammar | Improperly treated as silence | Correct behavior    | Incorrect
in-grammar     | Improperly treated as silence | Improperly rejected | Either correct or incorrect
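One way to make the crossing table concrete is to encode it as a lookup function. This is an illustrative Python sketch, not something from the talk; the function name `outcome` and the `hypothesis_correct` flag (whether a recognized in-grammar result matches the transcription) are hypothetical. Each cell maps to "correct" or to one of the three error types named on the following slides.

```python
from typing import Optional

# Class labels follow the slides; this encoding is a hypothetical sketch.
TRUE_CLASSES = ("nospeech", "unintelligible", "out-of-grammar", "in-grammar")
ASR_CLASSES = ("nospeech", "rejected", "recognized")


def outcome(true_class: str, asr_class: str,
            hypothesis_correct: Optional[bool] = None) -> str:
    """Map one (true, ASR) cell of the crossing table to an outcome.

    Returns "correct", "missilence", "misrejection", or "misrecognition".
    hypothesis_correct is only needed for recognized in-grammar speech,
    the one cell that can be "either correct or incorrect".
    """
    if true_class not in TRUE_CLASSES or asr_class not in ASR_CLASSES:
        raise ValueError(f"unknown class pair: {true_class!r}, {asr_class!r}")

    if asr_class == "nospeech":
        # Calling anything other than true silence "silence" is a missilence.
        return "correct" if true_class == "nospeech" else "missilence"

    if asr_class == "rejected":
        # Rejecting unintelligible or out-of-grammar speech is correct behavior;
        # rejecting silence or in-grammar speech is improper.
        if true_class in ("unintelligible", "out-of-grammar"):
            return "correct"
        return "misrejection"

    # asr_class == "recognized"
    if true_class == "in-grammar":
        if hypothesis_correct is None:
            raise ValueError("recognized in-grammar speech needs hypothesis_correct")
        return "correct" if hypothesis_correct else "misrecognition"
    # Recognizing silence, unintelligible, or out-of-grammar speech
    # (the "incorrect" / "assume incorrect" cells) is a misrecognition.
    return "misrecognition"
```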

Page 7

Crossing these two . . .

Same table as on Page 6, annotated “Misrecognitions”: the recognized column, i.e., recognizing silence, unintelligible speech, or out-of-grammar speech, plus incorrect recognitions of in-grammar speech.

Page 8

Crossing these two . . .

Same table as on Page 6, annotated “Misrejections”: the two “Improperly rejected” cells (nospeech and in-grammar audio that the engine rejected).

Page 9

Crossing these two . . .

Same table as on Page 6, annotated “Missilences”: the three “Improperly treated as silence” cells (unintelligible, out-of-grammar, and in-grammar speech that the engine classified as nospeech).

Page 11

Three types of errors

• Missilences -- called silence, but wasn’t

• Misrejections -- rejected inappropriately

• Misrecognitions -- recognized inappropriately or incorrectly

So how do we evaluate these?

Page 12

Evaluating errors

1. Evaluation data set

2. Try every rejection threshold value

3. Plot errors as function of threshold

4. Select optimal value for your app

Page 13

1. Evaluation data set(s)

• Data selection

• Must be representative (“every nth call”)

• Ideally at least 100 recordings per grammar path for good confidence in results

• Transcription

• Goal is to compare against recognition results, so no punctuation, coughs, etc. needed in transcription itself (but good to have in separate comments)
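Both data-selection rules can be sketched in a few lines of Python. This assumes a hypothetical transcription convention in which non-speech events are written in square brackets (e.g. "[cough]"); `normalize`, `matches`, and `every_nth` are illustrative names, and your engine's result format may call for different normalization rules.

```python
import re
from typing import List, Sequence, TypeVar

T = TypeVar("T")

# Hypothetical convention: non-speech events such as coughs are written in
# square brackets, e.g. "[cough]". They are stripped so only words are
# compared (keep them in a separate comment field instead).
_NOISE_MARKER = re.compile(r"\[[^\]]*\]")
# Anything that is not a word character, whitespace, or an apostrophe.
_PUNCTUATION = re.compile(r"[^\w\s']")


def normalize(text: str) -> str:
    """Lowercase, drop bracketed noise markers and punctuation, collapse spaces."""
    text = _NOISE_MARKER.sub(" ", text.lower())
    text = _PUNCTUATION.sub(" ", text)
    return " ".join(text.split())


def matches(transcription: str, recognized_text: str) -> bool:
    """True if the recognizer's words equal the transcribed words."""
    return normalize(transcription) == normalize(recognized_text)


def every_nth(calls: Sequence[T], n: int) -> List[T]:
    """Systematic sample: every nth call, rather than hand-picked calls."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return list(calls[::n])
```

For example, `matches("Check my balance, please. [cough]", "check my balance please")` is true, because the cough marker and punctuation are removed before comparison.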

Page 14

2. Try every rejection threshold value

• Run recognizer in batch mode with rejection threshold of 0 (i.e., no rejection). Remember to collect confidence scores!

• Then, for each threshold from 0 to 100

• Calculate number of misrecognitions, misrejections, and missilences
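The sweep can be sketched in Python, assuming each evaluation sample records its true classification, whether the threshold-0 batch run returned nospeech, its confidence score on a 0 to 100 scale, and whether the recognized text matched the transcription. The `Sample` fields and function names are hypothetical, and treating a score strictly below the threshold as rejected is an assumption; check how your engine compares scores with its threshold.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Sample:
    """One evaluation utterance (field names are hypothetical)."""
    true_class: str                   # "nospeech", "unintelligible", "out-of-grammar", "in-grammar"
    heard_speech: bool                # False if the engine returned nospeech
    confidence: int = 0               # 0-100 score from the threshold-0 batch run
    hypothesis_correct: bool = False  # recognized text matches the transcription


def asr_class(s: Sample, threshold: int) -> str:
    """Re-derive the engine's decision at a threshold (assumes reject if score < threshold)."""
    if not s.heard_speech:
        return "nospeech"
    return "rejected" if s.confidence < threshold else "recognized"


def error_type(s: Sample, threshold: int) -> Optional[str]:
    """Return the error type for this sample at this threshold, or None if correct."""
    a = asr_class(s, threshold)
    if a == "nospeech":
        return None if s.true_class == "nospeech" else "missilence"
    if a == "rejected":
        return None if s.true_class in ("unintelligible", "out-of-grammar") else "misrejection"
    if s.true_class == "in-grammar" and s.hypothesis_correct:
        return None
    return "misrecognition"


def sweep(samples: List[Sample]) -> Dict[int, Dict[str, int]]:
    """Count each error type for every integer threshold from 0 to 100."""
    results: Dict[int, Dict[str, int]] = {}
    for t in range(101):
        counts = {"misrecognition": 0, "misrejection": 0, "missilence": 0}
        for s in samples:
            e = error_type(s, t)
            if e is not None:
                counts[e] += 1
        results[t] = counts
    return results
```

Because the recognizer runs only once, every per-threshold decision is recomputed from the stored scores; no second recognition pass is needed.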

Page 15

3. Plot errors

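[Figure: error counts plotted against rejection threshold (0 to 100), with curves labeled “Missilences”, Misrecognitions, and “Misrejections”; the Equal Error Rate is marked on the plot.]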
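Given per-threshold counts, the Equal Error Rate point can be located as the threshold where the misrecognition and misrejection counts are closest. This is a minimal sketch on made-up counts (`equal_error_threshold` is a hypothetical name). It compares raw counts, as the slides do; a rate-based EER divides each curve by its own population and can land at a different threshold.

```python
from typing import Dict

Counts = Dict[int, Dict[str, int]]


def equal_error_threshold(counts: Counts) -> int:
    """Threshold where misrecognitions and misrejections are closest to equal.

    Uses raw counts from one data set; ties go to the lowest threshold.
    """
    return min(
        counts,
        key=lambda t: (abs(counts[t]["misrecognition"] - counts[t]["misrejection"]), t),
    )


# Made-up counts at a few thresholds, for illustration only.
example: Counts = {
    0:   {"misrecognition": 12, "misrejection": 0,  "missilence": 3},
    25:  {"misrecognition": 6,  "misrejection": 1,  "missilence": 3},
    50:  {"misrecognition": 4,  "misrejection": 4,  "missilence": 3},
    75:  {"misrecognition": 2,  "misrejection": 9,  "missilence": 3},
    100: {"misrecognition": 0,  "misrejection": 15, "missilence": 3},
}
```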

Page 16

3. Plot errors

[Figure: the summed error curve (“Sum”) plotted against rejection threshold (0 to 100), with the Minimum Total Error marked.]
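Finding the minimum of the summed curve is a one-line search over a table of per-threshold counts. The counts below are invented for illustration and `min_total_error_threshold` is a hypothetical name. On these numbers misrecognitions and misrejections are equal at threshold 50, but the total is smallest at 25.

```python
from typing import Dict

Counts = Dict[int, Dict[str, int]]


def min_total_error_threshold(counts: Counts) -> int:
    """Threshold with the smallest sum of all error counts (ties -> lowest threshold)."""
    return min(counts, key=lambda t: (sum(counts[t].values()), t))


# Made-up counts at a few thresholds, for illustration only.
example: Counts = {
    0:   {"misrecognition": 12, "misrejection": 0,  "missilence": 3},  # total 15
    25:  {"misrecognition": 6,  "misrejection": 1,  "missilence": 3},  # total 10
    50:  {"misrecognition": 4,  "misrejection": 4,  "missilence": 3},  # total 11
    75:  {"misrecognition": 2,  "misrejection": 9,  "missilence": 3},  # total 14
    100: {"misrecognition": 0,  "misrejection": 15, "missilence": 3},  # total 18
}
```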

Page 21

4. Select optimal value

• Equal-error-rate: not necessarily the optimum

• Minimum of the sum: good starting point, great for comparing across engines (on same data set only!!)

• Optimal: depends on your app; some errors may be more critical than others

• Question: if missilences are not affected by the threshold, why did I include them?
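When some errors are more critical than others, one option is to weight each error type by an application-specific cost and minimize the weighted sum. A minimal sketch: `weighted_optimum` is a hypothetical name, the weights are placeholders you would set for your own app, and the counts are made up.

```python
from typing import Dict

Counts = Dict[int, Dict[str, int]]


def weighted_optimum(counts: Counts, weights: Dict[str, float]) -> int:
    """Threshold minimizing the cost-weighted error total (ties -> lowest threshold)."""
    def cost(t: int) -> float:
        return sum(weights[kind] * n for kind, n in counts[t].items())
    return min(counts, key=lambda t: (cost(t), t))


# Made-up counts at a few thresholds, for illustration only.
example: Counts = {
    0:   {"misrecognition": 12, "misrejection": 0,  "missilence": 3},
    25:  {"misrecognition": 6,  "misrejection": 1,  "missilence": 3},
    50:  {"misrecognition": 4,  "misrejection": 4,  "missilence": 3},
    75:  {"misrecognition": 2,  "misrejection": 9,  "missilence": 3},
    100: {"misrecognition": 0,  "misrejection": 15, "missilence": 3},
}
```

With equal weights this reduces to the minimum of the sum (threshold 25 on these counts); doubling the weight on misrecognitions moves the chosen threshold to 50.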

Page 22

Further optimizations

• Move OOG (out-of-grammar) utterances into the IG (in-grammar) category if semantically correct (“You bet” -> “yes”)

• Consider additional threshold for confirmation

• Optimize endpointer parameters (affects missilences and/or “too much speech”)
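An additional confirmation threshold turns the single accept/reject decision into three bands. A minimal sketch with hypothetical parameter names: scores below `reject_below` are rejected, scores between the two thresholds trigger a confirmation question, and higher scores are accepted outright.

```python
def dialog_action(confidence: int, reject_below: int, confirm_below: int) -> str:
    """Two-threshold policy: "reject", "confirm", or "accept" a recognition result.

    reject_below is the ordinary rejection threshold; confirm_below is the
    additional confirmation threshold. Both names are hypothetical.
    """
    if not 0 <= reject_below <= confirm_below <= 100:
        raise ValueError("need 0 <= reject_below <= confirm_below <= 100")
    if confidence < reject_below:
        return "reject"   # handle as a rejection: reprompt the caller
    if confidence < confirm_below:
        return "confirm"  # e.g. ask "Did you say ...?"
    return "accept"       # act on the result directly
```

Setting `confirm_below` equal to `reject_below` disables confirmation and recovers the single-threshold behavior.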
