A Scalable Reinforcement Learning Approach to Error Handling in Spoken Language Interfaces
Dan Bohus
www.cs.cmu.edu/
[email protected]
Computer Science Department
Carnegie Mellon University
Pittsburgh, PA 15217
problem
spoken language interfaces lack robustness when faced with understanding errors.
more concretely …
S: What city are you leaving from?
U: Urbana Champaign [OKAY IN THAT SAME PAY]
S: Sorry, I'm not sure I understood what you said. What city are you leaving from?
U: Urbana Champaign [FOR MINUTE SINCE HEY]
S: I'm still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
U: Chicago [CHICAGO]
S: traveling from Chicago. Where would you like to go?
U: Huntsville [SEOUL]
S: traveling to Seoul. What day did you need to travel?
U: no no I'm traveling to Birmingham [THE TRAVELING TO BERLIN P_M]
S: traveling in the afternoon. Okay, what day would you be departing Chicago?
U: /uh/ the tenth of august [AT THE TENTH OF AUGUST]
S: a flight tomorrow… I have a Korean Airlines flight departing Chicago at 1:40pm, arrives Seoul at 5pm the next day. Do you want to take that?
U: my destination is Birmingham [FLIGHT DESTINATION MR WEEKEND]
S: traveling on Saturday, August 12th … I have a flight departing Chicago at 1:40pm, arrives Seoul at ………
problem source
stems mostly from speech recognition
spans most domains and interaction types
exacerbated by operating conditions:
  spontaneous speech
  medium / large vocabularies
  large, varied, and changing user populations
speech recognition impact
typical word-error-rates:
  10-20% for natives (novice users)
  40% and above for non-native users
significant negative impact on performance [Walker, Sanders]
[plot: task success vs. word-error-rate]
approaches for increasing robustness
fix recognition
gracefully handle errors through interaction:
  detect the problems
  develop a set of recovery strategies
  know how to choose between them (policy)
a closer look : RL in spoken dialog systems : current challenges : RL for error handling
outline
a closer look at the problem
RL in spoken dialog systems
current challenges
a proposed RL approach for error handling
non- and misunderstandings

the sample dialog above contains both error types:
  NON-understanding: the system fails to obtain any interpretation of the input
    (e.g. "Urbana Champaign" heard as [OKAY IN THAT SAME PAY], answered with "Sorry, I'm not sure I understood what you said")
  MIS-understanding: the system obtains an incorrect interpretation
    (e.g. "Huntsville" heard as [SEOUL] and accepted as "traveling to Seoul")
six not-so-easy pieces
detection
  misunderstandings: recognition or semantic confidence scores
  non-understandings: typically trivial [some exceptions may apply]
strategies
  misunderstandings:
    explicit confirmation ("Did you say 10am?")
    implicit confirmation ("Starting at 10am… until what time?")
    accept, reject
  non-understandings:
    "Sorry, I didn't catch that… Can you repeat that?"
    "Can you rephrase that?"
    "You can say something like 'at 10 a.m.'"
    [MoveOn]
policy
  misunderstandings: confidence threshold model
    (0 … reject … explicit … implicit … accept … 1)
  non-understandings: handcrafted heuristics
    (first notify, then ask repeat, then give help, then give up)
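As a minimal sketch, the confidence threshold model for misunderstandings can be written as a mapping from a confidence score to a strategy. The function name and the specific threshold values below are illustrative assumptions, not figures from the talk:

```python
def choose_strategy(confidence, reject_th=0.3, explicit_th=0.6, implicit_th=0.85):
    """Threshold model for handling potential misunderstandings:
    0 | reject | explicit confirm | implicit confirm | accept | 1.
    Threshold values are illustrative assumptions, not from the talk."""
    if confidence < reject_th:
        return "reject"
    if confidence < explicit_th:
        return "explicit_confirm"
    if confidence < implicit_th:
        return "implicit_confirm"
    return "accept"
```

The handcrafted heuristics for non-understandings would be a separate escalation counter over the same kind of interface.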
outline
a closer look at the problem
RL in spoken dialog systems
current challenges
a proposed RL approach for error handling
spoken dialog system architecture
[architecture diagram: Speech Recognition → Language Understanding → Dialog Manager ↔ Domain Back-end; Dialog Manager → Language Generation → Speech Synthesis]
reinforcement learning in dialog systems
[architecture diagram: Speech Recognition → Language Understanding → Dialog Manager ↔ Domain Back-end; Dialog Manager → Language Generation → Speech Synthesis]
the dialog manager receives noisy semantic input and produces actions (semantic output)
debate over design choices → learn choices using reinforcement learning
an agent interacting with an environment:
  noisy inputs
  temporal / sequential aspect
  task success / failure
NJFun
“Optimizing Dialog Management with Reinforcement Learning: Experiments with the NJFun System”
[Singh, Litman, Kearns, Walker]
provides information about “fun things to do in New Jersey”
slot-filling dialog: type-of-activity, location, time
provides information from a database
NJFun as an MDP
define state-space
define action-space
define reward structure
collect data for training & learn policy
evaluate learned policy
NJFun as an MDP: state-space
internal system state: 14 variables
state for RL → vector of 7 variables:
  greet: has the system greeted the user
  attribute: which attribute the system is currently querying
  confidence: recognition confidence level (binned)
  value: has a value been obtained for the current attribute
  tries: how many times the current attribute was asked
  grammar: whether a non-restrictive or restrictive grammar was used
  history: was there any trouble on previous attributes
62 different states
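The 7-variable RL state can be pictured as a small record, sketched here in Python. The class name and the field encodings (0/1 flags, small integer bins) are assumptions for illustration, not the paper's exact representation:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NJFunState:
    """One state of the NJFun MDP. Field encodings (0/1 flags, small
    integer bins) are illustrative assumptions."""
    greet: int       # has the system greeted the user (0/1)
    attribute: int   # which attribute is currently being queried
    confidence: int  # binned recognition confidence level
    value: int       # value obtained for current attribute (0/1)
    tries: int       # how many times the current attribute was asked
    grammar: int     # restrictive (1) or non-restrictive (0) grammar
    history: int     # trouble on previous attributes (0/1)
```

Binning the variables this way is what keeps the reachable state-space down to 62 states.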
NJFun as an MDP: actions & rewards
type of initiative (3 types): system initiative, mixed initiative, user initiative
confirmation strategy (2 types): explicit confirmation, no confirmation
the resulting MDP has only 2 action choices per state
reward: binary task success
NJFun as an MDP: learning a policy
training data: 311 complete dialogs collected using an exploratory policy
learned the policy using value iteration:
  begin with user initiative
  back off to mixed or system initiative when re-asking for an attribute
  the specific type of back-off differs across attributes
  confirm when confidence is low
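Value iteration itself can be sketched generically for a tabular MDP. This is a textbook sketch under assumed inputs `P` and `R`; NJFun's actual transition and reward estimates are not reproduced here:

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-6):
    """Tabular value iteration for a finite MDP.

    P: list of transition matrices, P[a][s, s2] = Pr(s2 | s, a)
    R: reward matrix, R[s, a] = expected immediate reward
    Returns the optimal value function and a greedy policy.
    """
    n_states, n_actions = R.shape
    V = np.zeros(n_states)
    while True:
        # Q[s, a] = R[s, a] + gamma * sum_s2 P[a][s, s2] * V[s2]
        Q = np.stack([R[:, a] + gamma * P[a] @ V for a in range(n_actions)], axis=1)
        V_new = Q.max(axis=1)
        if np.abs(V_new - V).max() < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new
```

With only 62 states and 2 actions per state, the NJFun-scale problem is trivially tractable for this kind of sweep; the hard part is estimating P and R from 311 dialogs.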
NJFun as an MDP: evaluation
evaluated the learned policy on 124 test dialogs
task success rate: 52% → 64%
weak task completion: 1.72 → 2.18
subjective evaluation: no significant improvements, but a move-to-the-mean effect
learned policy better than hand-crafted policies
  (policies comparatively evaluated on the learned MDP)
outline
a closer look at the problem
RL in spoken dialog systems
current challenges
a proposed RL approach for error handling
challenge 1: scalability
contrast NJFun with RoomLine:
  conference room reservation and scheduling
  mixed-initiative, task-oriented interaction
  the system obtains a list of rooms matching initial constraints,
  then negotiates with the user to identify the room that best matches their needs
  37 concepts (slots), 25 questions that can be asked
another example: LARRI
a full-blown MDP is intractable; not clear how to do state abstraction
challenge 2: reusability
underlying MDP is system-specific
MDP design still requires a lot of human expertise
new MDP for each system
new training & new evaluation
are we really saving time & expertise?
maybe we're asking for too much?
addressing the scalability problem
approach 1: user models / simulations
  costly to obtain real data → simulate
  simplistic simulators [Eckert, Levin]
  more complex, task-specific simulators [Scheffler & Young]
  real-world evaluation becomes paramount
approach 2: value function approximation
  data-driven state abstraction / state aggregation [Denecke]
outline
a closer look at the problem
RL in spoken dialog systems
current challenges
a proposed RL approach for error handling
reinforcement learning in dialog systems
[same architecture diagram: semantic input → Dialog Manager → actions / semantic output]
Focus RL only on the difficult decisions!
task-decoupled approach
decouple:
  error handling decisions → use reinforcement learning
  domain-specific dialog control decisions → use your favorite DM framework
advantages:
  reduces the size of the learning problem
  favors reusability of learned policies
  lessens system authoring effort
RavenClaw
[architecture diagram]
Dialogue Task (Specification): a tree of dialog agents; for RoomLine:
  RoomLine
    Login: Welcome, AskRegistered, AskName, GreetUser (concepts: registered, user_name)
    GetQuery: DateTime, Location, Properties (Network, Projector, Whiteboard) (concept: query)
    GetResults (concept: results)
    DiscussResults
Domain-Independent Dialogue Engine:
  Dialogue Stack (e.g. RoomLine > Login > AskRegistered)
  Expectation Agenda, e.g.:
    registered: [No] -> false, [Yes] -> true
    user_name: [UserName]
    query.date_time: [DateTime], query.location: [Location], query.network: [Network]
  Error Handling Decision Process: error indicators -> strategies (e.g. ExplicitConfirm)
decision process architecture
[architecture diagram: the task tree (RoomLine → Login → Welcome, AskRegistered, AskName, GreetUser; concepts: user_name, registered) with one Concept-MDP per concept and one Topic-MDP per topic; each MDP proposes an action (No Action, Explicit Confirm, …) and a Gating Mechanism selects which proposal to execute]
independence assumption between the MDPs
small-size models
parameters can be tied across models
accommodates dynamic task generation
favors reusability of policies
initial policies can be easily handcrafted
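A minimal sketch of the gating step, assuming each concept- or topic-MDP hands the gate a proposed action. The "first non-trivial proposal wins" rule and the identifier names are illustrative assumptions; the talk does not specify how the gate arbitrates:

```python
def gating_mechanism(proposals):
    """Sketch of the gating mechanism: each concept- or topic-MDP
    independently proposes an error-handling action (independence
    assumption), and the gate forwards one of them to the dialog engine.
    Taking the first non-trivial proposal is an illustrative
    tie-breaking rule, not one specified in the talk."""
    for owner, action in proposals:
        if action != "no_action":
            return owner, action
    return None, "no_action"
```

The independence assumption is what lets each MDP be small and its parameters tied across concepts, at the cost of the gate having to arbitrate between locally-optimal proposals.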
reward structure & learning
option 1: global, post-gate rewards
  [diagram: MDPs → Gating Mechanism → Action; one reward signal after the gate]
  an atypical, multi-agent reinforcement learning setting
option 2: local rewards
  [diagram: each MDP receives its own reward]
  multiple, standard RL problems
  risk of solving the local problems, but not the global one
rewards can be based on any dialogue performance metric
conclusion
reinforcement learning – very appealing approach for dialog control
in practical systems, scalability is a big issue
how to leverage the knowledge we have?
  state-space design
  solutions that account for / handle sparse data
  bounds on policies
  hierarchical models
thank you!
Structure of Individual MDPs
[diagram: per-concept MDP; states 0 (no value obtained yet), HC, MC, LC (confidence bins); from state 0 the only action is NoAct; from HC, MC, and LC the available actions are ExplConf, ImplConf, and NoAct]

Concept MDPs
  state-space: belief indicators
  action-space: concept-scoped system actions
Topic MDPs
  state-space: non-understanding, dialogue-on-track indicators
  action-space: non-understanding actions, topic-level actions
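The per-concept structure on this slide can be read off as a small state-to-actions table. The identifier names below are illustrative renderings of the slide's labels (NoAct, ExplConf, ImplConf), and the high/medium/low-confidence reading of HC/MC/LC is an interpretation:

```python
# State/action structure of an individual concept MDP, as in the diagram:
# "0" = no value obtained yet; HC / MC / LC = confidence bins for the
# current concept value. Action names are renderings of the slide's labels.
CONCEPT_MDP_ACTIONS = {
    "0":  ["no_action"],
    "HC": ["explicit_confirm", "implicit_confirm", "no_action"],
    "MC": ["explicit_confirm", "implicit_confirm", "no_action"],
    "LC": ["explicit_confirm", "implicit_confirm", "no_action"],
}
```

Because every concept MDP shares this same tiny structure, policies (and their parameters) can be tied across concepts and reused across systems, as the talk argues.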