Voxeo Summit 2010: Best Practices in Speech Technology
State of the Art/Best Practices in Speech Technology
Dan Burnett, Director of Speech Technologies
© Voxeo Corporation
Why speech?
Ma Ma
Vok Say Oh
Speech is the natural human interface
15% of world population has a personal computer
Greater than 60% of world population has a mobile phone
What is communication?
You (Your speech-enabled IVR)
Your Customer
Communication is natural?
But for IVRs . . .
You (Your untuned speech-enabled IVR)
Your Customer
So why do we tune?
For better communication, which leads to
More satisfied customers
Shorter call durations
What can we tune?
Your untuned speech-enabled IVR
What we say – prompts
Goal: naturally reduce variability in caller's responses
Because: predictability simplifies grammars and increases recognition accuracy
Prompt tuning
Vocabulary
• Use the words your customers use
• For sales, say “sales”; For billing, say “billing”; ...
• Are you calling to learn more about our products, to fix a problem with your bill, or …
Keep in mind
• Speech allows your customer to describe things THEIR way rather than to use your internal company description
• Make it easier for them to do that!
Prompt tuning
Prompt specificity
• General: “What would you like?”
• More specific: “Which department would you like?”
• Precise: “Would you like A, B, C, or something else?”
Keep in mind
• The caller will often use the exact words YOU use
Ever heard this before?
For Sales, press 1
For Billing, press 2
For option I can't remember, press 3
For another option I can't remember, press 4
For yet another option I can't remember, press 5
For more of the same, press 6
Blah blah, press 7
For help with this menu, press 8
To hear these options again, press 9
Prompt tuning
Prompt length
• Keep it short: less than a few sentences total, only one of which asks for input
• Or: provide pauses (at least one second long) for interruption
Keep in mind
• Speech communication is only natural if it's not drawn out
• Primacy and recency effects
What we listen for – grammars
Goal: Cover everything they are likely to say, and nothing more
Because: Accuracy in grammar coverage directly affects recognition accuracy
Grammar tuning
Cover everything they say
• Pre- and post- phrases such as please, I would like, and thank you
• Synonyms such as (for yes/no) yeah, sure, absolutely not
Keep in mind
• Recognizers can only hear it if it's in the grammar
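One way to picture coverage is to expand a small set of core words with optional pre- and post-phrases and map every resulting utterance back to a single meaning. The sketch below is only an illustration: the phrase lists and the names `expand_grammar` and `interpret` are hypothetical, and a production grammar would be written in the ASR engine's own grammar format and built from transcribed caller utterances.

```python
from itertools import product

# Hypothetical phrase lists for a yes/no question (assumptions for this sketch).
PRE_PHRASES = ["", "i said"]
POST_PHRASES = ["", "please", "thank you"]
SYNONYMS = {
    "yes": ["yes", "yeah", "sure", "absolutely"],
    "no": ["no", "nope", "absolutely not"],
}


def expand_grammar(synonyms, pre_phrases, post_phrases):
    """Map every covered utterance to the meaning the application needs."""
    phrases = {}
    for meaning, words in synonyms.items():
        for pre, word, post in product(pre_phrases, words, post_phrases):
            # Skip empty parts so optional phrases leave no stray spaces.
            utterance = " ".join(part for part in (pre, word, post) if part)
            phrases[utterance] = meaning
    return phrases


GRAMMAR = expand_grammar(SYNONYMS, PRE_PHRASES, POST_PHRASES)


def interpret(utterance):
    """Return the meaning, or None when the utterance is out of grammar."""
    return GRAMMAR.get(utterance.lower().strip())


print(interpret("Yeah please"))
print(interpret("absolutely not thank you"))
print(interpret("maybe"))
```

Anything the lists do not produce, such as "maybe" here, is out of grammar and cannot be recognized, which is the point of the slide.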
Grammar tuning
Include only what they say
• Write grammars that don't overgenerate
• If matching numbers/digits, only include valid strings if at all possible
Keep in mind
• Every unnecessary grammar phrase is a potential misrecognition
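As a rough sketch of the digits advice, the grammar below is built from a list of valid values instead of accepting any four digits. The extension list and the helper names `spoken_form` and `build_digit_grammar` are hypothetical; the same idea applies to whatever list of valid strings the application actually has.

```python
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def spoken_form(digits):
    """Spell a digit string the way a caller might say it, one digit at a time."""
    return " ".join(DIGIT_WORDS[int(d)] for d in digits)


def build_digit_grammar(valid_strings):
    """Grammar containing only the strings the application can accept."""
    return {spoken_form(s): s for s in valid_strings}


# Hypothetical list of valid 4-digit extensions (an assumption for this sketch).
VALID_EXTENSIONS = ["1020", "1021", "2200", "4415"]

tight_grammar = build_digit_grammar(VALID_EXTENSIONS)
loose_size = 10 ** 4  # size of a grammar that accepts any four digits

print(len(tight_grammar), "phrases instead of", loose_size)
print(tight_grammar.get("two two zero zero"))
print(tight_grammar.get("two two zero one"))
```

The tight grammar cannot return "2201" at all, so that particular misrecognition is ruled out before the recognizer runs.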
How we listen – parameter optimization
Goal: Optimize recognizer parameter settings
Because: Better accuracy, of course!
Parameter optimization – which parameters?
Rejection threshold
Endpointer settings (sensitivity)
Large grammar parameters
Rejection threshold – what is it?
[Figure: Misrecognitions and False Rejections plotted against Rejection Threshold, axis from 0 to 100]
Cutoff value for the recognizer confidence below which the speaker's utterance will be rejected
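In code, the definition above amounts to a single comparison. This is a minimal sketch that assumes confidence is reported on the 0 to 100 scale shown on the slide; some engines report 0.0 to 1.0 instead. The function name `apply_rejection_threshold` is hypothetical.

```python
def apply_rejection_threshold(hypothesis, confidence, threshold):
    """Return the hypothesis, or None if confidence is below the cutoff.

    confidence and threshold are assumed to share one scale (0 to 100 here).
    """
    if confidence < threshold:
        return None  # rejected: the application should reprompt
    return hypothesis


print(apply_rejection_threshold("billing", 62, 50))
print(apply_rejection_threshold("billing", 38, 50))
```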
Rejection threshold – total error
[Figure: Misrecognitions and False Rejections plotted against Rejection Threshold, axis from 0 to 100]
Rejection threshold – comparison
[Figure: curves for ASR Engine A and ASR Engine B plotted against Rejection Threshold (0 to 100), with the optimal thresholds marked]
Rejection threshold – another comparison
[Figure: curves for ASR Engine A and ASR Engine B plotted against Rejection Threshold (0 to 100), with the optimal thresholds marked]
Parameter optimization
Rejection threshold
• Generally largest impact on accuracy
• Optimum varies across recognition engines
• Optimum varies by set of active grammars
Keep in mind
• Optimizing the rejection threshold is the SINGLE MOST IMPORTANT parameter tuning you can do
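A simple way to look for the optimum is to sweep candidate thresholds over transcribed test utterances and keep the one with the lowest total error, counting misrecognitions plus false rejections. The sketch below assumes each test result is a (confidence, was_correct) pair for an in-grammar utterance on a 0 to 100 scale. The data and the names `error_counts` and `best_threshold` are hypothetical, and out-of-grammar utterances are ignored for brevity.

```python
def error_counts(results, threshold):
    """Count (misrecognitions, false_rejections) at a given threshold."""
    # Misrecognition: a wrong hypothesis that was accepted.
    misrecognitions = sum(1 for conf, correct in results
                          if conf >= threshold and not correct)
    # False rejection: a correct hypothesis that was rejected.
    false_rejections = sum(1 for conf, correct in results
                           if conf < threshold and correct)
    return misrecognitions, false_rejections


def best_threshold(results, candidates=range(0, 101)):
    """Return the lowest candidate threshold that minimizes total error."""
    def total_error(threshold):
        return sum(error_counts(results, threshold))
    return min(candidates, key=total_error)


# Hypothetical tuning data: (recognizer confidence, hypothesis was correct).
RESULTS = [(92, True), (85, True), (77, True), (64, True), (58, False),
           (55, True), (41, False), (33, False), (30, True), (12, False)]

threshold = best_threshold(RESULTS)
print(threshold, error_counts(RESULTS, threshold))
```

Because the optimum varies by engine and by the set of active grammars, a sweep like this would be repeated for each combination rather than run once.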
Endpointer sensitivity
You (Your hard-of-hearing speech-enabled IVR)
Your Customer
Parameter optimization
Endpointer sensitivity
• Second-largest impact on accuracy
• Unnecessarily high and low sensitivity are both bad
• Optimum should be set once, checked annually
Keep in mind
• If the recognizer can't hear you, it can't understand what you say
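To see why both extremes hurt, consider a toy energy-based endpointer. This is not how any particular ASR engine works; it is a sketch that assumes one energy value per audio frame, with a lower `energy_threshold` standing in for higher sensitivity. The frame values and the function `detect_speech` are hypothetical.

```python
def detect_speech(frame_energies, energy_threshold, min_frames=3):
    """Return (start, end) of the first speech segment, or None.

    A frame counts as speech when its energy is at or above
    energy_threshold; end is exclusive. Runs shorter than min_frames
    are treated as noise and skipped.
    """
    start = None
    for i, energy in enumerate(frame_energies):
        if energy >= energy_threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_frames:
                return (start, i)
            start = None  # too short to be speech
    if start is not None and len(frame_energies) - start >= min_frames:
        return (start, len(frame_energies))
    return None


# Hypothetical per-frame energies: line noise, an utterance, more noise.
FRAMES = [2, 3, 2, 9, 11, 10, 12, 8, 3, 2, 3]

print(detect_speech(FRAMES, energy_threshold=1))   # too sensitive
print(detect_speech(FRAMES, energy_threshold=6))   # reasonable
print(detect_speech(FRAMES, energy_threshold=20))  # not sensitive enough
```

With the threshold at 1 the line noise is swallowed into the utterance, at 6 only the utterance frames are kept, and at 20 the caller is never heard at all.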
Parameter optimization
Large grammar parameters
• Typically need to be adjusted if grammar has more than 5000 entries
• Typically consumes more memory and/or CPU
• Vary by ASR engine, so ask
Keep in mind
• If your grammar has many options, your recognizer needs to “think” more than the default settings usually allow
Summary – Keep in mind
Speech allows your customer to describe things THEIR way rather than to use your internal company description – make it easy for them!
The caller will often use the exact words YOU use
Speech communication is only natural if it's not drawn out
Recognizers can only hear it if it's in the grammar
Every unnecessary grammar phrase is a potential misrecognition
Optimizing the rejection threshold is the SINGLE MOST IMPORTANT parameter tuning you can do
If the recognizer can't hear you, it can't understand what you say
If your grammar has many options, your recognizer needs to “think” more than the default settings usually allow
For help
State of the Art/Best Practices in Speech Technology
Dan Burnett, Director of Speech Technologies
[email protected]