Still Talking to Machines (Cognitively Speaking)
Steve Young
Machine Intelligence Laboratory, Information Engineering Division
Cambridge University Engineering Department, Cambridge, UK
Interspeech Plenary September 2010 © Steve Young
Outline of Talk
A brief historical perspective
Cognitive User Interfaces
Statistical Dialogue Modelling
Scaling to the Real World
System Architecture
Some Examples and Results
Conclusions and future work.
Why Talk to Machines?
• it should be an easy and efficient way of finding out information and controlling behaviour
• sometimes it is the only way
  o hands-busy, e.g. surgeon, driver, package handler
  o no internet and no call-centres, e.g. areas of the 3rd world
  o very small devices
• one day it might be fun, cf. Project Natal – Milo
VODIS - circa 1985
• Natural language / mixed-initiative train-timetable inquiry service
• 150-word DTW connected speech recognition

[System diagram: Logos speech recogniser → frame-based dialogue manager → DecTalk synthesiser, with recognition grammars driving the recogniser; hardware: PDP11/45 + 8 x 8086 processors, 128k memory, 2 x 5Mb disk.]
Collaboration between BT, Logica and Cambridge U.
Some desirable properties of a Spoken Dialogue System
• able to support reasoning and inference
  o interpret noisy inputs and resolve ambiguities in context
• able to plan under uncertainty
  o clearly defined communicative goals
  o performance quantified as rewards
  o plans optimized to maximize rewards
• able to adapt on-line
  o robust to speaker (accent, vocabulary, behaviour, ...)
  o robust to environment (noise, location, ...)
• able to learn from experience
  o progressively optimize models and plans over time

⇒ Cognitive User Interface
S. Young (2010). "Cognitive User Interfaces." Signal Processing Magazine 27(3)
Essential Ingredients of a Cognitive User Interface (CUI)
• Explicit representation of uncertainty using a probability model over dialogue states e.g. using Bayesian networks
• Inputs regarded as observations used to update the posterior state probabilities via inference
• Responses defined by plans which map internal states to actions
• The system’s design objectives defined by rewards associated with specific state/action pairs
• Plans optimized via reinforcement learning
• Model parameters estimated via supervised learning and/or optimized via reinforcement learning
⇒ Partially Observable Markov Decision Process (POMDP)
A Framework for Statistical Dialogue Management
[Block diagram: the user's speech is passed to speech understanding, producing observation o_t; a model of the distribution of dialogue states s_t (distribution parameters λ) is updated to give the belief b_t = P(s_t | o_t, b_{t-1}; λ); the policy π(a_t | b_t, θ) (policy parameters θ) selects action a_t, which response generation delivers back to the user; the reward function r scores each turn, giving total reward R = Σ_t r(b_t, a_t).]
Belief Tracking aka Belief Monitoring
Belief is updated following each new user input:

  b_t(s_t) = η · P(o_t | s_t) · Σ_{s_{t-1}} P(s_t | s_{t-1}, a_{t-1}) · b_{t-1}(s_{t-1})

However, the state space is huge and this equation is intractable for practical systems. So we approximate:
• track just the N most likely states → Hidden Information State system (HIS)
• factorise the state space and ignore all but the major conditional dependencies → Graphical Model system (GMS, aka BUDS)
S. Young (2010). "The Hidden Information State Model" Computer Speech and Language 24(2)
B. Thomson (2010). "Bayesian update of dialogue state" Computer Speech and Language 24(4)
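The exact update above can be written out directly for a toy goal with only two values; a minimal Python sketch in which every number is invented for illustration (real systems cannot enumerate the state space like this, hence HIS and BUDS):

```python
# Exact POMDP belief update over a toy two-value goal.
def belief_update(belief, obs, trans, obs_model):
    """b_t(s) ∝ P(o_t | s) · Σ_s' P(s | s') · b_{t-1}(s'); action omitted for brevity."""
    states = list(belief)
    unnorm = {}
    for s in states:
        predicted = sum(trans[sp][s] * belief[sp] for sp in states)
        unnorm[s] = obs_model[s].get(obs, 0.0) * predicted
    z = sum(unnorm.values()) or 1.0          # guard against all-zero evidence
    return {s: p / z for s, p in unnorm.items()}

# Goal is food = french or chinese; goals rarely change between turns.
trans = {"french": {"french": 0.95, "chinese": 0.05},
         "chinese": {"french": 0.05, "chinese": 0.95}}
obs_model = {"french": {"inform(food=french)": 0.8, "inform(food=chinese)": 0.2},
             "chinese": {"inform(food=french)": 0.2, "inform(food=chinese)": 0.8}}

b0 = {"french": 0.5, "chinese": 0.5}
b1 = belief_update(b0, "inform(food=french)", trans, obs_model)
```

An 80%-confident observation moves the uniform prior to b1 = {french: 0.8, chinese: 0.2}; a second consistent observation would sharpen it further.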
Dialogue State
[Dynamic Bayesian network for the tourist information domain (type = bar, restaurant; food = French, Chinese, none): for each concept there are nodes for the user's goal g, the user act u, the dialogue history h, and the observation o at time t. The goal nodes capture user behaviour and memory, the observation nodes capture recognition/understanding errors, and the network is unrolled to the next time slice t+1.]
J. Williams (2007). ”POMDPs for Spoken Dialog Systems." Computer Speech and Language 21(2)
Dialogue Model Parameters
(ignoring history nodes for simplicity)
[Two time slices of the network, showing the o, g and u nodes for type and food at times t and t+1, with conditional probability tables:]

p(u|g):                g = French   Chinese   None
  u = French              0.7        0         0
  u = Chinese             0          0.7       0
  u = NoMention           0.3        0.3       1.0

p(o|u):                u = French   Chinese   NoMention
  o = French              0.8        0.2       0
  o = Chinese             0.2        0.8       0
  o = NoMention           0          0         1.0
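Using the two tables above, the posterior over the food goal after the recogniser reports "French" can be computed by marginalising over the true user act. A sketch, assuming a uniform prior over goals (the prior is my assumption; the tables are from the slide):

```python
p_u_given_g = {  # p(u | g): what the user says given their goal (from the slide)
    "french":  {"french": 0.7, "chinese": 0.0, "nomention": 0.3},
    "chinese": {"french": 0.0, "chinese": 0.7, "nomention": 0.3},
    "none":    {"french": 0.0, "chinese": 0.0, "nomention": 1.0},
}
p_o_given_u = {  # p(o | u): what is recognised given what was said (from the slide)
    "french":    {"french": 0.8, "chinese": 0.2, "nomention": 0.0},
    "chinese":   {"french": 0.2, "chinese": 0.8, "nomention": 0.0},
    "nomention": {"french": 0.0, "chinese": 0.0, "nomention": 1.0},
}

def posterior_goal(obs, prior):
    """P(g|o) ∝ Σ_u p(o|u) · p(u|g) · P(g), marginalising the true user act u."""
    unnorm = {g: sum(p_o_given_u[u][obs] * p_u_given_g[g][u] for u in p_o_given_u)
                 * prior[g]
              for g in prior}
    z = sum(unnorm.values())
    return {g: v / z for g, v in unnorm.items()}

prior = {"french": 1 / 3, "chinese": 1 / 3, "none": 1 / 3}
post = posterior_goal("french", prior)   # the recogniser reports "French"
```

With these tables the posterior is {french: 0.8, chinese: 0.2, none: 0.0}: the NoMention row makes a "French" observation impossible under goal None, so that goal is ruled out.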
Belief Monitoring (Tracking)
[Belief histograms over type (Bar, Restaurant) and food (French, Chinese, none) at t=1 and t=2. The user input inform(food=french) {0.9} concentrates belief on food = French; the system asks confirm(food=french), and the user's affirm() {0.9} at t=2 sharpens the belief further.]
Belief Monitoring (Tracking)
[Same histograms for an N-best input. At t=1 the input is ambiguous, inform(type=bar, food=french) {0.6} versus inform(type=restaurant, food=french) {0.3}, so belief splits over type while concentrating on food = French. The system asks confirm(type=restaurant, food=french), and the user's affirm() {0.9} at t=2 shifts belief towards restaurant.]
Belief Monitoring (Tracking)
[Same histograms showing evidence accumulating across turns. At t=1 inform(type=bar) {0.4} leaves type uncertain, so the system asks select(type=bar, type=restaurant); at t=2 a second inform(type=bar) {0.4} combines with the earlier evidence and belief in type = bar grows.]
Choosing the next action – the Policy
[Diagram: the full belief over g_type and g_food is quantised into a compact summary-space feature vector (e.g. type → 0 0 0 1, food → 0 1 0 0); the policy vector maps this to a distribution over all possible summary actions (inform, select, confirm, etc.); a summary action is sampled (here a = select) and mapped back into the full action space, e.g. select(type=bar, type=restaurant).]
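The quantise → policy → map-back pipeline can be sketched as follows. The feature encoding, softmax parameterisation, and action names here are illustrative stand-ins, not the actual system's representation:

```python
import math
import random

SUMMARY_ACTIONS = ["inform", "select", "confirm", "request"]

def summarise(belief):
    """Quantise each slot's marginal down to its two largest probabilities."""
    feats = []
    for slot in sorted(belief):
        top2 = sorted(belief[slot].values(), reverse=True)[:2]
        feats.extend(round(p, 1) for p in top2)   # coarse quantisation
    return tuple(feats)

def policy(feats, theta):
    """Stochastic policy π(a | b; θ): softmax over summary actions."""
    scores = [theta.get((feats, a), 0.0) for a in SUMMARY_ACTIONS]
    m = max(scores)                                # stabilise the softmax
    weights = [math.exp(s - m) for s in scores]
    return random.choices(SUMMARY_ACTIONS, weights=weights)[0]

def to_full_action(summary_act, belief):
    """Map a summary action back to full space via the most likely slot values."""
    return summary_act, {slot: max(marg, key=marg.get)
                         for slot, marg in belief.items()}

belief = {"type": {"bar": 0.4, "restaurant": 0.4, "none": 0.2},
          "food": {"french": 0.9, "chinese": 0.1}}
act, args = to_full_action(policy(summarise(belief), theta={}), belief)
```

With an empty θ the policy is uniform over summary actions; learning (next slide) adjusts θ so that high-reward actions become more probable in each summary state.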
Policy Optimization
Policy parameters θ are chosen to maximize the expected reward J(θ). Natural gradient ascent works well:

  ∇̃_θ J(θ) = F(θ)^{-1} ∇_θ J(θ),  where F(θ) is the Fisher information matrix

The gradient is estimated by sampling dialogues and, in practice, the Fisher information matrix does not need to be explicitly computed. This is the Natural Actor-Critic algorithm.
J. Peters and S. Schaal (2008). "Natural Actor-Critic." Neurocomputing 71(7-9)
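As a much-simplified stand-in for Natural Actor-Critic, the shape of the sample-and-ascend loop can be illustrated with plain episodic REINFORCE on a one-step "dialogue"; NAC follows the same loop but ascends the natural gradient, implicitly applying F(θ)^{-1}. Nothing here is the actual CUED implementation:

```python
import math
import random

def softmax_probs(theta):
    m = max(theta)                          # stabilise the softmax
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]

def sample_episode(theta):
    """One-step dialogue: action 0 succeeds (reward 99), action 1 fails (-1)."""
    a = random.choices([0, 1], weights=softmax_probs(theta))[0]
    return a, (99.0 if a == 0 else -1.0)    # reward = 100·success − 1 turn

def reinforce(theta, lr=0.01, episodes=2000):
    for _ in range(episodes):
        a, r = sample_episode(theta)
        probs = softmax_probs(theta)
        for i in range(len(theta)):
            grad_log_pi = (1.0 if i == a else 0.0) - probs[i]
            theta[i] += lr * r * grad_log_pi   # ascend J(θ) = E[R]
    return theta

random.seed(0)
theta = reinforce([0.0, 0.0])   # the policy learns to prefer action 0
```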
Dialogue Model Parameter Optimization
The approximation of the belief distribution via feature vectors prevents differentiating the policy with respect to the dialogue model parameters λ.

However, a trick can be used: assume that the parameters λ are drawn from a prior p(λ | α) which is differentiable with respect to α. Then optimize the reward with respect to α and sample p(λ | α) to get λ. This is the Natural Belief-Critic algorithm.

It is also possible to do maximum-likelihood model parameter estimation using Expectation Propagation.
F. Jurcicek (2010). "Natural Belief-Critic" Interspeech 2010
B. Thomson (2010). "Parameter learning for POMDP spoken dialogue models." SLT 2010
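The prior-sampling trick can be sketched in one dimension. Everything below is invented for illustration: λ is a single scalar, the prior is Gaussian, and the "dialogues" are replaced by a reward landscape peaking at λ = 0.7 (the real algorithm works with the dialogue model's multinomial parameters):

```python
import random

def simulated_reward(lam):
    """Hypothetical stand-in for running dialogues with model parameter λ."""
    return -abs(lam - 0.7)

def optimise_prior(alpha=0.0, sigma=0.1, lr=0.002, steps=5000):
    """Ascend expected reward w.r.t. the prior parameter α, never w.r.t. λ itself."""
    baseline = 0.0
    for _ in range(steps):
        lam = random.gauss(alpha, sigma)        # sample λ ~ p(λ | α)
        r = simulated_reward(lam)
        baseline += 0.05 * (r - baseline)       # running-mean baseline cuts variance
        # score-function gradient: ∇_α log N(λ; α, σ²) = (λ − α) / σ²
        alpha += lr * (r - baseline) * (lam - alpha) / sigma ** 2
    return alpha

random.seed(1)
alpha = optimise_prior()   # drifts towards the high-reward region near 0.7
```

The key point mirrors the slide: the reward is never differentiated with respect to λ, only with respect to the prior parameter α, with λ obtained by sampling.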
Performance Comparison in Simulated TownInfo Domain
[Chart comparing four configurations: handcrafted model and handcrafted policy; handcrafted model and trained policy; trained model and trained policy; trained model and handcrafted policy. Reward = 100 for success − 1 for each turn taken.]
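The evaluation reward used here is simple to state as code:

```python
# Reward = 100 for a successful dialogue, minus 1 for each turn taken.
def dialogue_reward(success: bool, n_turns: int) -> int:
    return (100 if success else 0) - n_turns
```

So a 5-turn successful dialogue scores 95, and the policy is pushed both towards task success and towards short dialogues.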
Scaling up to Real World Problems
Several of the key ideas have already been covered:
• compact representation of dialogue state, e.g. HIS, BUDS
• mapping belief states into summary states via quantisation, feature vectors, etc.
• mapping actions in summary space back into the full space

But inference itself is also a problem …
CamInfo Ontology
Complex dialogue state:
• many concepts
• many values per concept
• multiple nodes per concept
Belief Propagation Times
[Plot of belief-propagation time against network branching factor, comparing standard loopy belief propagation (LBP), LBP with grouping, and LBP with grouping and a constant probability of change.]
B. Thomson (2010). "Bayesian update of dialogue state" Computer Speech and Language 24(4)
Architecture of the Cambridge Statistical SDS
Run-time mode:

[Pipeline diagram: user speech y → speech recognition → word hypotheses p(w|y) → semantic decoder → dialogue-act hypotheses p(v|y) → dialogue manager (HIS or BUDS) → system dialogue act a → message generator → message p(m|a) → speech synthesiser → output speech p(x|a); corpus data is used to train the components.]
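The run-time pipeline can be sketched as a chain of stubbed components; every function, hypothesis list, and return value below is a hypothetical placeholder, not the actual Cambridge system API:

```python
def recognise(speech):
    """Speech y → N-best word hypotheses p(w|y)."""
    return [("cheap french food", 0.7), ("chief trench mood", 0.3)]

def decode(word_hyps):
    """Word hypotheses → dialogue-act hypotheses p(v|y)."""
    return [("inform(food=french, pricerange=cheap)", 0.7)]

def dialogue_manager(act_hyps, belief):
    """Update the belief from the act hypotheses and choose a system act a."""
    belief["last_input"] = act_hyps
    return "confirm(food=french)", belief

def generate(system_act):
    """Dialogue act a → message p(m|a)."""
    return "Did you say French food?"

def synthesise(message):
    """Message → output speech p(x|a)."""
    return f"<audio: {message}>"

belief = {}
sys_act, belief = dialogue_manager(decode(recognise("<user audio>")), belief)
reply = synthesise(generate(sys_act))
```

The point of the architecture is that every arrow carries a distribution over hypotheses, not a single best guess, so uncertainty survives all the way to the dialogue manager.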
Architecture of the Cambridge Statistical SDS
Training mode:

[Diagram: for training, the real user is replaced by a user simulator plus an error model, both built from corpus data; the dialogue manager (HIS or BUDS) exchanges dialogue acts a and noisy act hypotheses p(v|y) directly with the simulator.]
CMU Let’s Go Spoken Dialogue Challenge
Organised by the Dialog Research Center, CMU. See http://www.dialrc.org/sdc/
• Telephone-based spoken dialog system to provide bus schedule information for the City of Pittsburgh, PA (USA).
• Based on existing system with real users.
• Two stage evaluation process
1. Control Test with recruited subjects given specific known tasks
2. Live Test with competing implementations switched according to a daily schedule
• Full results to be presented at a special session at SLT
Let’s Go 2010 Control Test Results

[Scatter plot of task success against word error rate (WER) for all qualifying systems, with a predicted-success-rate trend line. Average success = 64.8%, average WER = 42.4%. System X: 65% success, 42% WER; System Y: 75% success, 34% WER; System Z: 89% success, 33% WER.]
B. Thomson "Bayesian Update of State for the Let's Go Spoken Dialogue Challenge.” SLT 2010.
CamInfo Demo
Conclusions
End-to-end statistical dialogue systems can be built and are competitive
Core is a POMDP-based dialogue manager which provides an explicit representation of uncertainty with the following benefits
o robust to recognition errors
o objective measure of goodness via reward function
o ability to optimize performance against objectives
o reduced development costs – no hand-tuning, no complex design processes, easily ported to new applications
o natural dialogue – say anything, any time
Still much to do
o faster learning, off-policy learning, long term adaptation, dynamic ontologies, multi-modal input/output
Perhaps talking to machines is within reach ….
Credits
EU FP7 Project: Computational Learning in Adaptive Systems for Spoken Conversation
Spoken Dialogue Management using Partially Observable Markov Decision Processes
Past and Present Members of the CUED Dialogue Systems Group
Milica Gasic, Filip Jurcicek, Simon Keizer, Fabrice Lefevre,
Francois Mairesse, Jorge Prombonas, Jost Schatzmann,
Matt Stuttle, Blaise Thomson, Karl Weilhammer, Jason Williams,
Hui Ye, Kai Yu