Beyond Usability: Measuring Speech Application Success
Silke Witt-Ehsani, PhD
VP, VUI Design Center
TuVox
S P E E C H W I T H I N R E A C H
Outline
What is Success?
Success Criteria
Success Metrics
Putting it all together: A health check methodology
Success vs Design: How they affect each other
Case studies
Success Criteria: i.e. What is “success”?
Common criteria:
Are callers transferred to the correct destination?
How many callers are being helped?
How do callers like my speech applications?
What is the system recognition accuracy?
Different questions (Success Criteria) require different answers (Success Metrics)
How do we do that?
Success Metrics: Subjective vs Objective
Subjective
• Usability study
• Whole call recordings
• Individual caller feedback
Objective = Application Statistics
• Automation rates
• Containment rates
• Non-cooperative caller rate
Success Metrics: Business vs Technical
Business Metrics for Business Users:
• Routing Accuracy
• Agent Transfers
• Customer Satisfaction
Technical Users:
• need detailed application performance on dialog state level
• grammar coverage
• NoMatch, NoInput
• need ability to drill down
More Transfers out of application = higher call center cost
Higher Routing Accuracy = Fewer Agent-to-agent transfers
Business stakeholders care about the bottom line impact of several application and speech events
Common Business Metrics
Containment rate = “keep caller hostage in the system”
Automation rate = “offer complete functionality…”
Successful routing = “get the caller to the right expert”
Average call duration
And many, many more ….
Application Health Check - Business
The 3 main elements of a Business Health Check are:
1. Custom defined success rate
2. Non-cooperative caller rate
3. Agent Transfer rate
• Transfer due to explicit caller request
• Transfer due to errors (both speech and system)
• Transfer by design (i.e. correctly routed calls)
Example Success Metric: Routing Accuracy
Definition: Confirmed routed calls (calls reaching an end destination) over all calls
Useful metric when using:
Skills-based routing
Routing application with N routing points
[Chart: % Routing Accuracy by application size. Categories: ~150 routing points, ~50 routing points, 4 routing points. Values shown: 68.3%, 77%, 85%.]
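The routing accuracy definition above can be sketched in a few lines of Python. The function name and the call counts are hypothetical, chosen only to illustrate the ratio; they are not taken from any of the applications in the chart.

```python
def routing_accuracy(confirmed_routed_calls: int, total_calls: int) -> float:
    """Return confirmed routed calls as a percentage of all calls."""
    if total_calls <= 0:
        raise ValueError("total_calls must be positive")
    if not 0 <= confirmed_routed_calls <= total_calls:
        raise ValueError("confirmed_routed_calls must be between 0 and total_calls")
    return 100.0 * confirmed_routed_calls / total_calls


# Hypothetical counts, for illustration only.
accuracy = routing_accuracy(confirmed_routed_calls=770, total_calls=1000)
print(f"Routing accuracy: {accuracy:.1f}%")
```

Note that the denominator is all calls, not only the calls that were routed, so hang-ups and failed calls lower the metric.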
Example: Non-cooperative Callers
Possible reasons:
Degree of caller acceptance of system
Non-application-related causes, such as wrong number, child crying etc.
Definition: The non-cooperative caller rate is the percentage of all callers who immediately hang up or request an agent without ever interacting with the application
Expected range: 5-10% of call volume
[Chart: % Non-cooperative Callers. Applications: Open-ended Router, Directed Dialog Technical Support. Values shown: 6.3%, 8.6%.]
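As a minimal sketch of the definition above, the snippet below counts callers who leave or ask for an agent before completing any dialog turn, then checks the result against the expected 5-10% range. The `Call` record and its field names are assumptions made for illustration; real call logs will look different.

```python
from dataclasses import dataclass


@dataclass
class Call:
    turns_completed: int  # hypothetical: number of prompts the caller answered
    ended_by: str         # hypothetical: "hangup", "agent_request" or "completed"


def non_cooperative_rate(calls: list[Call]) -> float:
    """Percentage of calls that ended by hang-up or agent request with no interaction."""
    if not calls:
        raise ValueError("calls must not be empty")
    non_cooperative = sum(
        1
        for call in calls
        if call.turns_completed == 0 and call.ended_by in ("hangup", "agent_request")
    )
    return 100.0 * non_cooperative / len(calls)


def in_expected_range(rate: float, low: float = 5.0, high: float = 10.0) -> bool:
    """Check the rate against the 5-10% range quoted on the slide."""
    return low <= rate <= high


# Hypothetical sample of 100 calls, for illustration only.
sample = (
    [Call(0, "hangup")] * 6
    + [Call(0, "agent_request")] * 2
    + [Call(3, "completed")] * 92
)
rate = non_cooperative_rate(sample)
print(f"{rate:.1f}% non-cooperative, within expected range: {in_expected_range(rate)}")
```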
Example: Agent Transfers
Applications tend to have many different types of agent transfers.
Main categories:
Customer zero-ing out
Routing to an agent based on caller information is a “Designed Transfer”
Routing due to some logic in the application is a “Necessary Transfer”
Agent Transfers have an immediate impact on call center cost
Definition: % Agent transfers of all calls
[Chart: % Agent Requests, example from a Telecommunications Company. Values shown: 45%, 4.7%.]
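One way to keep the transfer categories apart when reporting is sketched below. The category labels and the per-call outcome list are hypothetical; the only thing taken from the slide is that each rate is expressed as a percentage of all calls.

```python
from collections import Counter
from typing import Optional

# Hypothetical labels for the three categories named above.
TRANSFER_CATEGORIES = ("caller_request", "designed", "necessary")


def transfer_breakdown(call_outcomes: list[Optional[str]]) -> dict[str, float]:
    """Return each transfer category, and all transfers, as a percentage of all calls.

    call_outcomes holds one entry per call: None if the call was not
    transferred, otherwise one of TRANSFER_CATEGORIES.
    """
    total = len(call_outcomes)
    if total == 0:
        raise ValueError("call_outcomes must not be empty")
    counts = Counter(outcome for outcome in call_outcomes if outcome is not None)
    unknown = set(counts) - set(TRANSFER_CATEGORIES)
    if unknown:
        raise ValueError(f"unknown transfer categories: {sorted(unknown)}")
    breakdown = {cat: 100.0 * counts[cat] / total for cat in TRANSFER_CATEGORIES}
    breakdown["all_transfers"] = 100.0 * sum(counts.values()) / total
    return breakdown


# Hypothetical outcomes for 100 calls, for illustration only.
outcomes = [None] * 60 + ["caller_request"] * 5 + ["designed"] * 30 + ["necessary"] * 5
for name, pct in transfer_breakdown(outcomes).items():
    print(f"{name}: {pct:.1f}%")
```

Reporting the categories separately matters because a designed transfer is a success for a routing application, while a caller zeroing out is usually not.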
Baseline and Trending
Numbers are relative; they only have meaning in context
When defining success metrics,
1. create a baseline
2. then compare to that.
Potential Baselines:
previous IVR touch-tone application
Go-live Performance
[Chart: Customers finding speech easier or much easier than IVR. Phases: Usability, Go-live, Tuning 1. Values shown: 52%, 66%, 76%.]
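Comparing against a baseline can be sketched as a simple difference in percentage points. The example assumes that the three chart values (52%, 66%, 76%) appear in the same order as the phase labels (Usability, Go-live, Tuning 1) and that the first phase serves as the baseline; the extracted slide does not confirm either assumption.

```python
def change_vs_baseline(baseline: float, measurements: dict[str, float]) -> dict[str, float]:
    """Return each measurement's difference from the baseline, in percentage points."""
    return {label: value - baseline for label, value in measurements.items()}


# Assumed mapping of chart values to phases (see the note above).
baseline_pct = 52.0  # Usability study, taken here as the baseline
later_phases = {"Go-live": 66.0, "Tuning 1": 76.0}

for phase, delta in change_vs_baseline(baseline_pct, later_phases).items():
    print(f"{phase}: {delta:+.1f} points vs baseline")
```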
Application Health Check - Technical
Purpose of hotspot analysis: Identify areas where the application is performing sub-optimally
Hotspot analysis should be done for each dialog state
Important: Hotspot analysis gives the “where” of issues, not the “why”!
Framework for Technical Health Check
TuVox Hotspot analysis = Integrated view of:
% Hang-up ( %H )
% Final NoInput ( %NI )
% Final NoMatch ( %NM )
% Transfer Requests ( %TR )
State Exit Count = # of calls * ( %H + %NI + %NM + %TR )
Rule of Thumb:
These numbers are a first-order approximation:
Sort by highest state exit count
Review one by one in context, e.g. a high hang-up rate may simply mean the state is a logical end point
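The state exit count formula and the "sort by highest exit count" rule can be sketched as follows. The `StateStats` class, the state names and all numbers are hypothetical; the rates are stored as fractions (0.05 for 5%) so the product yields a call count.

```python
from dataclasses import dataclass


@dataclass
class StateStats:
    name: str
    calls: int               # number of calls reaching this dialog state
    hangup: float            # %H as a fraction
    final_no_input: float    # %NI as a fraction
    final_no_match: float    # %NM as a fraction
    transfer_request: float  # %TR as a fraction

    @property
    def exit_count(self) -> float:
        """State Exit Count = # of calls * (%H + %NI + %NM + %TR)."""
        return self.calls * (
            self.hangup + self.final_no_input + self.final_no_match + self.transfer_request
        )


def rank_hotspots(states: list[StateStats]) -> list[StateStats]:
    """Sort dialog states by exit count, highest first."""
    return sorted(states, key=lambda s: s.exit_count, reverse=True)


# Hypothetical dialog states, for illustration only.
states = [
    StateStats("MainMenu", 10000, 0.02, 0.01, 0.03, 0.04),
    StateStats("GetAccountNumber", 6000, 0.05, 0.04, 0.08, 0.03),
    StateStats("ConfirmYesNo", 4000, 0.01, 0.00, 0.01, 0.00),
]
for state in rank_hotspots(states):
    print(f"{state.name}: {state.exit_count:.0f} exits")
```

The ranking only says where callers leave; each state at the top of the list still has to be reviewed in context.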
Hotspot Analysis Example
Prompt ID | Prompt Text | # Hits | # 2nd No Match | # 2nd No Input | # User Hangup | Total Exit Number
STTransferTS#124 | Would you like to hear that website again | 8993 | 181 | 205 | 6091 | 6477
STTransferSS#11 | Please hold while I get someone who can help you. | 2894 | 0 | 0 | 2894 | 2894
NTGetQueue#302 | Please say yes or no. | 21573 | 180 | 0 | 143 | 323
NTDisBilling#9 | Which do you need help with a bill a service charge, a purchase or something else. | 2211 | 217 | 25 | 81 | 323
NTFinder#301 | Please say yes or no. | 3711 | 121 | 0 | 102 | 223
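The table can be checked and re-ranked with a short script. The counts are copied from the rows above; the per-state exit rate (exits divided by hits) is not part of the original slide and is added here only as one possible way to put the raw counts in context.

```python
# (prompt_id, hits, second_no_match, second_no_input, user_hangup, total_exit)
rows = [
    ("STTransferTS#124", 8993, 181, 205, 6091, 6477),
    ("STTransferSS#11", 2894, 0, 0, 2894, 2894),
    ("NTGetQueue#302", 21573, 180, 0, 143, 323),
    ("NTDisBilling#9", 2211, 217, 25, 81, 323),
    ("NTFinder#301", 3711, 121, 0, 102, 223),
]

# The total exit number is the sum of the three exit columns.
for prompt_id, hits, no_match, no_input, hangup, total_exit in rows:
    assert no_match + no_input + hangup == total_exit, prompt_id


def exit_rate(row: tuple) -> float:
    """Exits as a fraction of hits (an added normalization, not in the slide)."""
    return row[5] / row[1]


ranked_by_rate = sorted(rows, key=exit_rate, reverse=True)
for row in ranked_by_rate:
    print(f"{row[0]}: {row[5]} exits, {100 * exit_rate(row):.1f}% of hits")
```

Two states tie at 323 exits, yet NTGetQueue#302 sees almost ten times as many hits as NTDisBilling#9, so the same count represents a much smaller share of its traffic. The "Please hold" state exits on every call, which is expected for a state that ends the dialog.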
Success Criteria and Design
Success and Design are tightly linked
Success determines the design
Design influences success
[Diagram: “Success Metric” and “Design” linked in both directions.]
[Flowchart: example loan dialog design. Steps shown: Authentication; Look up all loans for this caller; Does caller have a line of credit? (yes/no); Does caller have more than 1 loan? (yes/no); Caller selects from list of loans; Loan Menu: Balance, More loan details, Make loan payment.]
Case Study 1: Airline application
Customer requirement: 64% Success
Success definition: “For 64% of the callers entering the application, their ticket reservation record has to be retrieved from the back-end”
Design consequences:
Ensure via prompting that callers have their record identifier number before entering the application
Make it hard to get to an agent, i.e. multiple retries
Explain what the record identifier is
[Chart: Look-up Success (axis 0% to 80%) at Go-live, Tuning 1 and Tuning 2.]
Design tailored to success criteria but at the expense of ease of use and caller experience
Case Study 2: Travel Application
Hotspot analysis identifies too many exits at a main menu
Observation: One menu option is much more common than the other 5 choices
Old Design: Menu with 6 options
New Design: Yes/no question followed by a menu
Impact on Application Performance:
Turn failure rate = Decreased by 39%
Opt-out rate to the call center = Decreased by 44%
[Chart: Turn Failure Rate and Opt-out Rate, Menu-style vs Question-style design (axis 0 to 25).]
Case Study 3: HighTech Routing Application
3 success criteria:
Average call handling less than 30 secs
High customer satisfaction
4 queues to route to, but many different call reasons
Influence of these criteria on the design:
Only 1 reprompt instead of the standard 2 attempts
No traditional error prompting a la ‘sorry I didn’t get that’
Natural language open-ended prompting with high-coverage grammar
Summary
Define Application Success Criteria
Based on that, define success metrics
Use trending and baseline to put data in context
Success Criteria and Design are highly interlinked, i.e. success criteria determine the design
The design influences how targeted success metrics can be met