SpeechTEK 2009: Optimizing Speech Recognizer Rejection Thresholds


At SpeechTEK 2009 in New York on August 24, 2009, Dr. Daniel C Burnett, Director of Speech Technologies at Voxeo, spoke on optimizing speech recognizer rejection thresholds. Abstract: This session will explain ASR (automatic speech recognizer) confidence rejection thresholds: what they are, where they come from, and their criticality to your ASR-enabled IVR. We describe the steps necessary to optimize this important threshold value throughout your application, covering transcription, the importance of grammar coverage, and an explanation of terms such as the Equal Error Rate. This session is ideal for those ready to take their ASR-enabled IVR tuning to the next level.

Transcript of SpeechTEK 2009: Optimizing Speech Recognizer Rejection Thresholds

Optimizing speech recognizer rejection thresholds

Dan Burnett
Director of Speech Technologies, Voxeo
August 24, 2009

Why this talk?

• Sometimes we forget the basics, which are:

• Recognizers are not perfect

• They can be optimized in a straightforward manner

• The simplest optimization is the rejection threshold

The Goal

• End user goal: optimal experience

• Our Goal: determine user experience for each possible rejection threshold, then choose optimum threshold

• Must compare true classification of an audio sample against the ASR engine’s classification

True classifications

• Assume human-level recognition

• App should still distinguish (i.e., possibly behave differently) among the following cases:

| Case | Possible behavior |
|---|---|
| No speech in audio sample (nospeech) | Mention that you didn’t hear anything and ask for a repeat |
| Speech, but not intelligible (unintelligible) | Ask for a repeat |
| Intelligible speech, but not in app grammar (out-of-grammar) | Encourage in-grammar speech |
| Intelligible speech, and within app grammar (in-grammar) | Respond to what the person said |

ASR Engine Classifications

• Silence/nospeech (nospeech)

• Reject (rejected)

• Recognize (recognized)

Crossing these two . . .

| True \ ASR | nospeech | rejected | recognized |
|---|---|---|---|
| nospeech | Correct classification | Improperly rejected | Incorrect |
| unintelligible | Improperly treated as silence | Correct behavior | Assume incorrect |
| out-of-grammar | Improperly treated as silence | Correct behavior | Incorrect |
| in-grammar | Improperly treated as silence | Improperly rejected | Either correct or incorrect |

Errors in the nospeech column are “missilences”, errors in the rejected column are “misrejections”, and errors in the recognized column are misrecognitions.


Three types of errors

• Missilences -- called silence, but wasn’t

• Misrejections -- rejected inappropriately

• Misrecognitions -- recognized inappropriately or incorrectly
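The crossing table and the three error types can be sketched as a small classification helper. This is an illustrative sketch, not code from the talk; the label names and the `hypothesis_matches` flag are assumptions.

```python
# Hypothetical sketch: classify one (true_label, asr_result) pair into
# correct / misrecognition / misrejection / missilence, following the
# crossing table above. Label names are illustrative, not from the talk.

def classify_outcome(true_label, asr_result, hypothesis_matches=False):
    """Return 'correct', 'misrecognition', 'misrejection', or 'missilence'.

    true_label : 'nospeech' | 'unintelligible' | 'out-of-grammar' | 'in-grammar'
    asr_result : 'nospeech' | 'rejected' | 'recognized'
    hypothesis_matches : True if the recognized string matches the transcription
    """
    if asr_result == "nospeech":
        # Calling it silence is only right when there really was no speech.
        return "correct" if true_label == "nospeech" else "missilence"
    if asr_result == "rejected":
        # Rejecting unintelligible or out-of-grammar speech is the desired
        # behavior; rejecting silence or in-grammar speech is not.
        if true_label in ("unintelligible", "out-of-grammar"):
            return "correct"
        return "misrejection"
    # asr_result == "recognized"
    if true_label == "in-grammar":
        # Correct only when the recognized string matches the transcription.
        return "correct" if hypothesis_matches else "misrecognition"
    # Recognizing silence or out-of-grammar speech is an error; for
    # unintelligible input the table says "assume incorrect".
    return "misrecognition"
```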


So how do we evaluate these?

Evaluating errors

1. Evaluation data set

2. Try every rejection threshold value

3. Plot errors as function of threshold

4. Select optimal value for your app

1. Evaluation data set(s)

• Data selection

  • Must be representative (“every nth call”)

  • Ideally at least 100 recordings per grammar path for good confidence in results

• Transcription

  • Goal is to compare against recognition results, so no punctuation, coughs, etc. needed in the transcription itself (but good to have in separate comments)
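The “every nth call” selection is straightforward to script. A minimal sketch, assuming call identifiers are available in chronological order (the names below are illustrative):

```python
# Hypothetical sketch of "every nth call" sampling for the evaluation set.
# The call-id format is an assumption, not from the talk.

def sample_every_nth(call_ids, n):
    """Return every nth call id, preserving chronological order."""
    return call_ids[::n]

calls = [f"call_{i:04d}" for i in range(1000)]
eval_set = sample_every_nth(calls, 10)  # 100 representative recordings
```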

2. Try every rejection threshold value

• Run recognizer in batch mode with rejection threshold of 0 (i.e., no rejection)

• Remember to collect confidence scores!

• Then, for each threshold from 0 to 100

• Calculate number of misrecognitions, misrejections, and missilences
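The sweep above only needs one batch recognition run: with the threshold at 0, every candidate threshold can be re-scored offline from the saved confidence scores. A minimal sketch, where the record fields (`true`, `asr`, `confidence`, `match`) are assumptions rather than anything the talk specifies:

```python
# Hypothetical sketch of step 2: run the recognizer once with threshold 0,
# keep each utterance's confidence score, then re-score every threshold
# offline. Record fields are assumptions, not from the talk.

def sweep_thresholds(results):
    """results: list of dicts with keys
         'true'       : 'nospeech'|'unintelligible'|'out-of-grammar'|'in-grammar'
         'asr'        : 'nospeech' or 'recognized' (from the threshold-0 run)
         'confidence' : 0-100 score for recognized utterances
         'match'      : True if the recognized string matches the transcription
       Returns {threshold: (misrecognitions, misrejections, missilences)}.
    """
    errors = {}
    for t in range(101):
        misrec = misrej = missil = 0
        for r in results:
            if r["asr"] == "nospeech":
                if r["true"] != "nospeech":
                    missil += 1            # real speech called silence
                continue
            # Apply the candidate rejection threshold after the fact.
            if r["confidence"] < t:        # would have been rejected
                if r["true"] not in ("unintelligible", "out-of-grammar"):
                    misrej += 1            # good speech (or silence) rejected
            else:                          # would have been recognized
                if not (r["true"] == "in-grammar" and r["match"]):
                    misrec += 1            # wrong or inappropriate recognition
        errors[t] = (misrec, misrej, missil)
    return errors
```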

3. Plot errors

[Chart: misrecognitions, “misrejections”, and “missilences” plotted as functions of the rejection threshold (0–100); the point where the misrecognition and misrejection curves cross is the Equal Error Rate.]

3. Plot errors

[Chart: the sum of the three error curves plotted against the rejection threshold (0–100); its lowest point is the Minimum Total Error.]
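Both operating points shown in the plots can be read directly off the per-threshold counts. A minimal sketch, assuming counts in the shape produced in step 2; it locates the Equal Error Rate as the threshold where misrecognitions and misrejections are closest, and the Minimum Total Error as the threshold with the lowest sum:

```python
# Hypothetical sketch: find the Equal Error Rate and the Minimum Total
# Error from per-threshold error counts. The input shape is an assumption.

def find_operating_points(errors):
    """errors: {threshold: (misrecognitions, misrejections, missilences)}
    Returns (equal_error_threshold, minimum_total_error_threshold)."""
    # EER: where the misrecognition and misrejection curves cross
    # (i.e., their counts are closest).
    eer_t = min(errors, key=lambda t: abs(errors[t][0] - errors[t][1]))
    # Minimum total error: lowest sum of all three error types.
    min_sum_t = min(errors, key=lambda t: sum(errors[t]))
    return eer_t, min_sum_t
```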


4. Select optimal value

• Equal-error-rate: not necessarily the optimum

• Minimum of the sum: good starting point, great for comparing across engines (on same data set only!!)

• Optimal: depends on your app; some errors may be more critical than others

• Question: if missilences are not affected by the threshold, why did I include them?
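When some errors matter more than others, the “optimal” point is the threshold minimizing a weighted cost rather than the raw sum. A minimal sketch; the weights are illustrative, not values from the talk:

```python
# Hypothetical sketch: pick the threshold minimizing a weighted error cost.
# Weights are illustrative; a real app would set them from business impact
# (e.g., a misrecognition that triggers a wrong action may cost more than
# a misrejection that merely reprompts).

def pick_threshold(errors, w_misrec=2.0, w_misrej=1.0, w_missil=1.0):
    """errors: {threshold: (misrecognitions, misrejections, missilences)}."""
    def cost(t):
        misrec, misrej, missil = errors[t]
        return w_misrec * misrec + w_misrej * misrej + w_missil * missil
    return min(errors, key=cost)
```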

Further optimizations

• Move OOG into IG category if semantically correct (“You bet” -> “yes”)

• Consider additional threshold for confirmation

• Optimize endpointer parameters (affects missilences and/or “too much speech”)
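The first of these optimizations, remapping semantically correct out-of-grammar answers into the in-grammar category before scoring, can be sketched as a simple lookup. The synonym table below is illustrative:

```python
# Hypothetical sketch: remap out-of-grammar phrases that are semantically
# in-grammar ("You bet" -> "yes") before counting errors. The synonym
# table is illustrative, not from the talk.

SEMANTIC_MAP = {
    "you bet": "yes",
    "yeah": "yes",
    "nope": "no",
}

def normalize(transcription):
    """Map semantically equivalent OOG phrases onto their in-grammar form."""
    return SEMANTIC_MAP.get(transcription.strip().lower(), transcription)
```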
