
Introduction to statistical spoken dialogue systems

Milica Gasic

Dialogue Systems Group, Cambridge University Engineering Department

October 28, 2016

1 / 32

In this lecture...

Architecture of a spoken dialogue system

Turn-taking in dialogue

Dialogue acts

Modular architecture of a dialogue system

2 / 32

What is a spoken dialogue system?

- A spoken dialogue system is a computer system that enables human-computer interaction where the primary input is speech.

- Speech does not need to be the only input: we can also interact with machines using touch, gesture or facial expressions; such systems are multi-modal dialogue systems.

3 / 32

Examples from popular culture

4 / 32

Dialogue and AI

- Turing test: are we talking to a machine or a human?

- Chat-bots, e.g. ELIZA

- Goal-oriented dialogue - there is one goal or several goals that must be fulfilled during the conversation, e.g. the Medical Bayesian Kiosk

5 / 32

Personal assistants

- The most commonly used dialogue systems are personal assistants such as Siri, Cortana, Google Now and Alexa.

- These are server-based and accessed via a range of devices: smartphones, tablets, laptops, watches and specialist devices such as the Amazon Echo (Alexa).

6 / 32

Properties

What constitutes a spoken dialogue system?

- Being able to understand the user

- Being able to decide what to say back

- Being able to conduct a conversation beyond simple voice commands or question answering

7 / 32

Limited domain spoken dialogue systems

ontology - a database that defines the properties of the entities that a dialogue system can talk about (a minimal sketch follows below)

system-initiative vs user-initiative - who takes the initiative in the dialogue:

- System (system-initiative): Hello. Please tell me your date of birth using the six-digit format.

- System (user-initiative): Hello, how may I help you?
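For a limited-domain system, the ontology can be as simple as a mapping from slots to their allowed values. The following is a minimal sketch, assuming a hypothetical restaurant domain in Python; the slot and value names are illustrative only.

    # A hypothetical restaurant-domain ontology: each informable slot maps to
    # the set of values the dialogue system is allowed to talk about.
    RESTAURANT_ONTOLOGY = {
        "type": {"restaurant", "hotel"},
        "food": {"Italian", "Korean", "Thai", "Chinese"},
        "area": {"centre", "north", "south", "east", "west"},
        "price": {"cheap", "moderate", "expensive"},
    }

    def is_valid(slot: str, value: str) -> bool:
        """Check whether a slot-value pair is licensed by the ontology."""
        return value in RESTAURANT_ONTOLOGY.get(slot, set())

    print(is_valid("food", "Korean"))   # True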

8 / 32

Turn-taking in dialogue – Who speaks when?

Dialogue can be described in terms of system and user turns

- System: How may I help you?

- User: I'm looking for a restaurant

- System: What kind of food would you like?

- ...

Turn-taking can be more complex and is characterised by

barge-ins - System: How may I... User: I'm looking for a restaurant

back channels - User: I'm looking for a restaurant [System: uh-huh] in the centre of town

9 / 32

Turn-taking in dialogue – Multi-party dialogue

- A dialogue system can be built to operate with multiple users and can also be situated in space.

- Example: a robot giving directions

- In this case a complex attention mechanism is needed to determine who speaks when. For that, both spoken and visual input can be used.

10 / 32

Dialogue acts

One simple dialogue act formalism would consist of

dialogue act type - encodes the system's or the user's intention in a (part of a) dialogue turn

semantic slots and values - further describe the entities from the ontology that a dialogue turn refers to

Example: "Is there um maybe a cheap place in the centre of town please?" maps to inform(price=cheap, area=centre), where inform is the dialogue act type and price=cheap, area=centre are the semantic slots and values.
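To make the formalism concrete, a dialogue act can be represented as an act type plus a dictionary of slot-value pairs. This is a minimal sketch in Python; the class and field names are illustrative and not taken from any particular toolkit.

    from dataclasses import dataclass, field

    @dataclass
    class DialogueAct:
        """A dialogue act: an act type plus semantic slot-value pairs."""
        act_type: str                       # e.g. "inform", "request", "confirm"
        slots: dict = field(default_factory=dict)

        def __str__(self):
            args = ", ".join(f"{s}={v}" for s, v in self.slots.items())
            return f"{self.act_type}({args})"

    # "Is there um maybe a cheap place in the centre of town please?"
    act = DialogueAct("inform", {"price": "cheap", "area": "centre"})
    print(act)   # inform(price=cheap, area=centre)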

11 / 32

Dialogue acts

The dialogue act formalism describes the meaning encoded in each dialogue turn [Traum, 2000].

- Relation to the ontology

- Encodes the intention of the speaker

- Relation to logic

- Context

- Partial information from ASR (primitive dialogue acts)

12 / 32

Traditional approach to dialogue systems – Call flow

[Call-flow diagram: a hand-crafted decision tree whose branches cover values such as restaurant / hotel, Chinese, three and centre]

13 / 32

Node processing in a call flow

[Diagram: node processing branches on the recognition confidence score (low / medium / high), e.g. low → no, high → yes]

14 / 32

Part of a deployed call-flow [Paek and Pieraccini, 2008]

15 / 32

Problems

What breaks dialogue systems?

- Speech recognition errors

- Not keeping track of what happened previously

- Need to hand-craft a large number of rules

- Poor decisions

- The user's request is not supported

- ...

16 / 32

Modular architecture of a dialogue system

[Pipeline diagram] waveform → Speech recognition → text → Semantic decoding → dialogue acts → Dialogue management → Natural language generation → Speech synthesis, with every component drawing on the Ontology.

Example: "I'm looking for a restaurant" is decoded as inform(type=restaurant); the system act request(food) is realised as "What kind of food do you have in mind?"
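As a rough illustration of the modular decomposition, the pipeline can be written as a chain of component functions. The stubs below are hypothetical placeholders, not an existing API; they only show how data flows through one system turn.

    # Hypothetical component stubs (placeholders, not a real API).
    def recognise(waveform):    # speech recognition: waveform -> text
        ...

    def decode(text):           # semantic decoding: text -> user dialogue act
        ...

    def manage(user_act):       # dialogue management: user act -> system act
        ...

    def generate(system_act):   # natural language generation: system act -> text
        ...

    def synthesise(text):       # speech synthesis: text -> waveform
        ...

    def dialogue_turn(waveform):
        """One user turn through the modular pipeline."""
        return synthesise(generate(manage(decode(recognise(waveform)))))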

17 / 32

Machine learning in spoken dialogue systems

Data

- Dialogues

- Labelled user intents

- Transcribed speech

Model

- Regression

- Classification

- Markov decision process

- Neural networks

Predictions

- What the user wants

- What is the best response

- How to formulate the response

18 / 32

Architecture of a statistical dialogue system

[Pipeline diagram] The same components as before (Speech recognition, Semantic decoding, Dialogue management, Natural language generation, Speech synthesis, all connected to the Ontology), except that the recogniser now outputs a distribution over text hypotheses and the semantic decoder a distribution over dialogue acts.

19 / 32

Automatic speech recognition for dialogue

Provide alternative recognition results:

- N-best list

- Confusion network

- Lattice

Figure 1: Confusion network for "I'm looking for an inexpensive place", with competing word hypotheses such as I / I'm and an inexpensive / a expensive.
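A confusion network can be stored as a sequence of time slots, each holding competing word hypotheses with posterior probabilities. A minimal sketch for the example above, with made-up probabilities:

    # Each slot is a list of (word, posterior) alternatives; the numbers are illustrative.
    confusion_network = [
        [("I'm", 0.6), ("I", 0.4)],
        [("", 0.7), ("am", 0.3)],            # "" marks a possible deletion
        [("looking", 1.0)],
        [("for", 1.0)],
        [("an inexpensive", 0.55), ("a expensive", 0.45)],
        [("place", 1.0)],
    ]

    def consensus(cn):
        """Pick the highest-posterior word in every slot (the consensus hypothesis)."""
        best = (max(slot, key=lambda wp: wp[1])[0] for slot in cn)
        return " ".join(word for word in best if word)

    print(consensus(confusion_network))      # I'm looking for an inexpensive place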

20 / 32

Automatic speech recognition for dialogue systems

Recognise when the user has started speaking

- Key-word spotter running on a smartphone - always listening [Chen et al., 2015]

- Requirements: low memory footprint, low computational cost and high precision

Recognise when the user has stopped speaking

- This is studied in the broader context of voice activity detection
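Detecting the end of the user's turn is usually done with a voice activity detector. The following is only a toy energy-threshold sketch (the frame length, threshold and end-of-utterance rule are arbitrary assumptions); deployed systems use much more robust models.

    import numpy as np

    def speech_frames(samples: np.ndarray, frame_len: int = 400,
                      threshold: float = 1e-3) -> np.ndarray:
        """Toy VAD: mark a frame as speech if its mean energy exceeds a fixed threshold."""
        n_frames = len(samples) // frame_len
        frames = samples[:n_frames * frame_len].reshape(n_frames, frame_len)
        return (frames ** 2).mean(axis=1) > threshold

    # End-of-utterance heuristic: declare that the user has stopped speaking after
    # a run of, say, 20 consecutive non-speech frames (~0.5 s at 16 kHz).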

21 / 32

Acoustic modelling for dialogue systems

- Spoken dialogue systems are meant to be used everywhere: on a busy street, in a noisy car

- Advantage: the conversation spans several turns, so it is possible to perform adaptation in the first turn to improve future interactions

- Advantage: the same speaker throughout the dialogue

22 / 32

Language modelling for dialogue systems

- The vocabulary in limited-domain dialogue systems is small, so the language model can be trained with in-domain data

- A general-purpose language model can be combined with the in-domain language model to provide better recognition results and also to deal with out-of-domain requests.
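One common way to combine the two language models (the slide does not say which method is used, so treat this as an assumption) is linear interpolation of their probabilities, with the weight tuned on held-out in-domain data:

    def interpolated_lm(p_in_domain: float, p_general: float, lam: float = 0.7) -> float:
        """Linear interpolation of an in-domain and a general-purpose language model:
        P(w | h) = lam * P_in(w | h) + (1 - lam) * P_gen(w | h)."""
        return lam * p_in_domain + (1.0 - lam) * p_general

    # An in-domain word such as "restaurant" draws most of its probability mass
    # from the in-domain model.
    print(interpolated_lm(p_in_domain=0.02, p_general=0.0005))   # 0.01415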

23 / 32

Semantic decoding for dialogue systems

- Extract meaning from the user utterance:

- "Do they serve Korean food" → confirm(food=Korean)

- "Can you repeat that please" → repeat()

- "Hi I want to find an Italian restaurant" → hello(type=restaurant, food=Italian)

- "I want a different restaurant" → reqalts()

24 / 32

Statistical semantic decoding

[Diagram] Example: "Italian restaurant please"

- Delexicalise using the ontology: "Italian restaurant please" → "<tagged-food-value> restaurant please" (together with variants such as "<tagged-food-value> restaurant" and "restaurant please").

- Count n-grams in the delexicalised utterances to form a feature vector.

- Query the SVMs: one classifier for the dialogue act type and one per slot (food=<tagged-food-value>, area=<tagged-area-value>, price=<tagged-price-value>), each returning a probability (e.g. inform() 0.9, food 0.9, area 0.1, price 0.2, request() 0.1).

- Use the ontology to produce valid dialogue acts and renormalise the distributions, e.g. inform(food=Italian) 0.85, request() 0.15.
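A rough sketch of the delexicalise-and-classify idea is given below. The small ontology, slot names and feature extractor are illustrative assumptions, and a placeholder comment stands in for the SVM classifiers trained in the real system.

    from collections import Counter

    ontology = {"food": {"Italian", "Korean", "Thai"},
                "area": {"centre", "north", "south"},
                "price": {"cheap", "expensive"}}

    def delexicalise(utterance, ontology):
        """Replace ontology values with slot tags, e.g. 'Italian' -> '<tagged-food-value>'."""
        tokens = []
        for word in utterance.split():
            tag = next((f"<tagged-{slot}-value>" for slot, values in ontology.items()
                        if word in values), word)
            tokens.append(tag)
        return " ".join(tokens)

    def ngram_features(text):
        """Count unigrams and bigrams of the delexicalised utterance as sparse features."""
        words = text.split()
        feats = Counter(words)
        feats.update(zip(words, words[1:]))
        return feats

    delexed = delexicalise("Italian restaurant please", ontology)
    features = ngram_features(delexed)    # features of '<tagged-food-value> restaurant please'
    # In the full system these features are scored by one SVM per dialogue act type and
    # one SVM per slot; the ontology maps the tags back to values (food=Italian) and the
    # resulting dialogue act scores are renormalised into a distribution.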

25 / 32

Dialogue management

- Maintain a belief about what the user said: dialogue states / user intent

- Choose the best answer

[Diagram] Example: the semantic input inform(type=restaurant, food=Thai), decoded from "I'm looking for a Thai restaurant.", updates the dialogue state (user intent) over slots such as type and food; the chosen system action (response) is request(area).
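Belief tracking maintains a distribution over possible slot values and updates it from the (noisy) distribution over dialogue acts at every turn. The update rule and numbers below are a deliberately simplified illustration, not the statistical tracker used in practice.

    def update_belief(belief, observed):
        """Simplified per-slot belief update: accumulate evidence for each value
        from the semantic decoder's probabilities, then renormalise."""
        updated = dict(belief)
        for value, p_obs in observed.items():
            prior = updated.get(value, 0.0)
            updated[value] = prior + p_obs * (1.0 - prior)
        total = sum(updated.values())
        return {v: p / total for v, p in updated.items()}

    # Turn 1: the decoder is fairly confident the user asked for Thai food.
    belief_food = update_belief({}, {"Thai": 0.7, "Korean": 0.1})
    print(belief_food)   # {'Thai': 0.875, 'Korean': 0.125}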

26 / 32

Belief tracking and policy optimisation

[Pipeline diagram] The same architecture as before (waveform → distribution over text hypotheses → distribution over dialogue acts), but the dialogue manager is now split into two components: belief tracking, trained with supervised learning, and policy optimisation, trained with reinforcement learning.

27 / 32

Reinforcement learning for dialogue

[Diagram] The dialogue manager acts as a reinforcement learning agent: at each turn t it receives an observation o_t and a reward r_t, maintains a belief state b_t, and takes an action a_t.

Q-function: $Q^\pi(b, a) = \mathbb{E}_\pi\left[\sum_{k=t}^{T} r_k \mid b_t = b, a_t = a\right]$

Policy: $\pi(b) = a$
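As a toy illustration of how such a Q-function might be learned, the sketch below performs one-step temporal-difference updates over a small discrete set of belief states and actions. Statistical dialogue managers use more sophisticated algorithms (e.g. Gaussian process or neural approximations), so this is only a conceptual sketch with made-up actions and hyperparameters.

    import random
    from collections import defaultdict

    ACTIONS = ["request(area)", "confirm(food)", "inform(restaurant)"]
    Q = defaultdict(float)                 # Q[(belief_state, action)] -> value estimate
    alpha, gamma, epsilon = 0.1, 0.99, 0.1

    def choose_action(belief_state):
        """Epsilon-greedy policy: pi(b) = argmax_a Q(b, a), with occasional exploration."""
        if random.random() < epsilon:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: Q[(belief_state, a)])

    def q_update(b, a, reward, b_next, terminal):
        """One-step TD update towards reward plus discounted future value."""
        future = 0.0 if terminal else gamma * max(Q[(b_next, a2)] for a2 in ACTIONS)
        Q[(b, a)] += alpha * (reward + future - Q[(b, a)])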

28 / 32

Natural language generation for dialogue systems

- Generate natural text from the semantic representation of the system action:

- request(area) → "What part of town are you looking for?"

- select(pricerange=expensive, pricerange=cheap) → "Are you looking for something cheap or expensive?"

- confirm(food=Korean) → "Did you say Korean food?"

- inform(name="Little Seoul", food=Korean, area=centre) → "Little Seoul is a nice Korean restaurant in the centre."
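A minimal template-based realiser is one simple way of doing this mapping; the slide does not specify the method used, so the templates and function below are purely illustrative.

    TEMPLATES = {
        "request(area)": "What part of town are you looking for?",
        "confirm(food)": "Did you say {food} food?",
        "inform(name,food,area)": "{name} is a nice {food} restaurant in the {area}.",
    }

    def realise(act_type, slots):
        """Fill a hand-written template chosen by act type and slot names."""
        key = f"{act_type}({','.join(slots)})"
        return TEMPLATES[key].format(**slots)

    print(realise("confirm", {"food": "Korean"}))
    # Did you say Korean food?
    print(realise("inform", {"name": "Little Seoul", "food": "Korean", "area": "centre"}))
    # Little Seoul is a nice Korean restaurant in the centre.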

29 / 32

Speech synthesis for dialogue systems

- In a dialogue system the context is available from the dialogue manager.

- A text-to-speech system can make use of this context to produce more natural and expressive speech [Yu et al., 2010].

30 / 32

Summary

- Goal-directed limited-domain dialogue systems

- Turn-taking mechanism between system and user

- Dialogue act formalism for conveying meaning

- Limitations of the traditional approach

- Architecture of statistical dialogue systems

- Role of each component in a spoken dialogue system

31 / 32

References

Chen, G., Parada, C., and Sainath, T. (2015). Query-by-example keyword spotting using long short-term memory networks. In Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on, pages 5236–5240.

Paek, T. and Pieraccini, R. (2008). Automating spoken dialogue management design using machine learning: An industry perspective. Speech Communication, 50(8-9):716–729.

Traum, D. R. (2000). 20 questions on dialogue act taxonomies. Journal of Semantics, 17:7–30.

Yu, K., Zen, H., Mairesse, F., and Young, S. (2010). Context adaptive training with factorized decision trees for HMM-based speech synthesis. In Proceedings of Interspeech.

32 / 32