Harm van Seijen, Research Scientist, Maluuba at MLconf SF 2016

Using Deep Reinforcement Learning for Dialogue Systems
Harm van Seijen, Research Scientist, Montréal, Canada


spoken dialogue system

pipeline: user input → natural language understanding → user act → state tracker → dialogue state → policy manager → system act → natural language generation → system response (components trained on data)

example turn:
- user input: "Hi, do you know a good Indian restaurant"
- user act: inform(food="Indian")
- system act: request(price_range)
- system response: "Sure. What price range are you thinking of?"
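The mapping the policy manager performs can be sketched in a few lines. This is a hypothetical, rule-based stand-in (all names illustrative) just to show the state-to-act interface; the talk's point is that this component should be learned rather than hand-written:

```python
# Hypothetical sketch of the policy manager's interface: it maps a
# dialogue state (here, a dict of known slots) to a system act.
def policy_manager(dialogue_state):
    """Map a dialogue state to a (act_type, argument) system act."""
    if "food" in dialogue_state and "price_range" not in dialogue_state:
        return ("request", "price_range")   # ask for the missing slot
    if "food" in dialogue_state and "price_range" in dialogue_state:
        return ("inform", "restaurant")     # enough info to recommend
    return ("request", "food")              # nothing known yet

# after user act inform(food="Indian"), the tracked state is:
state = {"food": "Indian"}
act = policy_manager(state)   # -> ("request", "price_range")
```

A hand-written rule table like this quickly becomes unmanageable for realistic domains, which motivates learning the mapping instead.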


The central question: how to train the policy manager?


outline
1. what is reinforcement learning
2. solution strategies for RL
3. applying RL to dialogue systems

what is reinforcement learning

Reinforcement Learning is a data-driven approach towards learning behaviour.

machine learning comprises:
- unsupervised learning
- supervised learning
- reinforcement learning

reinforcement learning + deep learning = deep reinforcement learning

RL vs supervised learning

behaviour: a function that maps environment states to actions

supervised learning: hard to specify the function, but easy to identify the correct output
example: recognizing cats in images; f maps an image to cat / no cat

reinforcement learning: hard to specify the function, hard to identify the correct output, but easy to specify the behaviour goal

example: double inverted pendulum
- state: θ1, θ2, ω1, ω2
- action: clockwise/counter-clockwise torque on the top joint
- goal: balance the pendulum upright
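The state/action/goal framing above implies a standard interaction loop: observe a state, pick an action, receive a reward. A minimal sketch, using a hypothetical toy balancing environment in place of a real pendulum simulator (`ToyBalanceEnv` and its dynamics are illustrative, not from the talk):

```python
import random

# Hypothetical one-dimensional balancing task standing in for the pendulum:
# the state is a deviation from upright, actions apply torque left/right,
# and reward is given for every step the system stays near upright.
class ToyBalanceEnv:
    def reset(self):
        self.angle = random.uniform(-0.5, 0.5)  # deviation from upright
        return self.angle

    def step(self, action):
        # action: +1 (clockwise torque) or -1 (counter-clockwise torque)
        self.angle += 0.1 * action + random.uniform(-0.02, 0.02)
        reward = 1.0 if abs(self.angle) < 1.0 else 0.0  # reward for staying up
        done = abs(self.angle) >= 1.0                   # fell over
        return self.angle, reward, done

env = ToyBalanceEnv()
state = env.reset()
total_reward = 0.0
for _ in range(100):
    action = -1 if state > 0 else 1          # naive hand-coded policy
    state, reward, done = env.step(action)
    total_reward += reward
    if done:
        break
```

An RL agent would replace the hand-coded policy with one learned from the reward signal alone.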

advantages of RL:
- does not require knowledge of a good policy
- does not require labelled data
- online learning: adaptation to environment changes

challenges of RL:
- requires lots of data
- the sample distribution changes during learning
- samples are not i.i.d.

outline
1. what is reinforcement learning
2. solution strategies for RL
3. applying RL to dialogue systems

definitions

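The equations on these slides are not reproduced in the transcript; the standard textbook definitions they cover are, in the usual notation (assumed here, not necessarily the slides' exact formulation):

```latex
% return: discounted sum of future rewards
G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots
    = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}

% state-value function of policy \pi
v_\pi(s) = \mathbb{E}_\pi \left[ G_t \mid S_t = s \right]

% action-value function of policy \pi
q_\pi(s, a) = \mathbb{E}_\pi \left[ G_t \mid S_t = s, A_t = a \right]
```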

estimating the value function

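A classical way to estimate the value function from interaction is temporal-difference learning. A minimal sketch of TD(0) on a toy 5-state random walk (a standard illustration, not necessarily the talk's example): episodes start in the middle, step left or right at random, and pay reward 1 only on exiting to the right.

```python
import random

def td0_random_walk(episodes=5000, alpha=0.1, gamma=1.0, seed=0):
    """Estimate state values of the random policy with TD(0)."""
    rng = random.Random(seed)
    V = [0.0] * 5                      # value estimates for states 0..4
    for _ in range(episodes):
        s = 2                          # episodes start in the middle
        while True:
            s2 = s + rng.choice([-1, 1])   # random policy: step left/right
            if s2 < 0:                     # terminated left, reward 0
                V[s] += alpha * (0.0 - V[s])
                break
            if s2 > 4:                     # terminated right, reward 1
                V[s] += alpha * (1.0 - V[s])
                break
            # TD(0) update with reward 0 and bootstrapped next-state value
            V[s] += alpha * (gamma * V[s2] - V[s])
            s = s2
    return V

V = td0_random_walk()
# true values for this chain are 1/6, 2/6, 3/6, 4/6, 5/6
```

Note the bootstrapping: each estimate is updated toward a target built from another estimate, which is exactly what Q-learning also does for action values.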

finding the optimal policy

- policy evaluation: estimate the value function of the current policy
- policy improvement: make the policy greedy with respect to the value estimates

Q-learning: a classical RL algorithm that combines (partial) policy evaluation with (partial) policy improvement
update target: r + γ max_a′ Q(s′, a′)
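Tabular Q-learning fits in a few lines. A sketch on a tiny deterministic corridor (an illustrative toy problem, not from the talk): states 0..4, actions step left/right, reward +1 for reaching the goal state 4.

```python
import random

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(5) for a in (-1, 1)}
    for _ in range(episodes):
        s = 0
        while s != 4:
            # epsilon-greedy selection (the partial policy improvement);
            # ties are broken toward the right action for simplicity
            if rng.random() < epsilon:
                a = rng.choice([-1, 1])
            else:
                a = 1 if Q[(s, 1)] >= Q[(s, -1)] else -1
            s2 = min(max(s + a, 0), 4)          # deterministic transition
            # update target: r + gamma * max_a' Q(s', a'); the terminal
            # state has no future value, so its target is just the reward
            target = 1.0 if s2 == 4 else gamma * max(Q[(s2, -1)], Q[(s2, 1)])
            # partial policy evaluation: move Q(s, a) toward the target
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q

Q = q_learning()
```

After training, the greedy policy derived from Q moves right in every state, which is optimal here.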

deep reinforcement learning

The 2015 Nature paper from DeepMind introduced an RL method based on deep learning, called DQN.

main result: with the same network architecture, DQN learned to play a large number of Atari 2600 games effectively.


DQN characteristics:
- a variation on Q-learning that uses deep neural networks to approximate the Q function
- uses experience replay to deal with non-i.i.d. samples
- uses two networks (Q and Q′) to mitigate non-stationarity of the update targets
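The experience replay idea is simple to sketch. A minimal buffer (illustrative, not DeepMind's implementation): transitions are stored as they arrive and minibatches are drawn uniformly at random, which breaks the temporal correlation between consecutive samples.

```python
import random
from collections import deque

class ReplayBuffer:
    """Store (s, a, r, s', done) transitions; sample them i.i.d.-style."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions evicted

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        # uniform random sampling decorrelates the minibatch
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)

buf = ReplayBuffer(capacity=100)
for t in range(150):                       # 150 adds into a 100-slot buffer
    buf.add(t, 0, 0.0, t + 1, False)
batch = buf.sample(32)
```

In DQN, each sampled minibatch is used for one gradient step on the Q network, while the separate target network Q′ supplies the update targets.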

outline
1. what is reinforcement learning
2. solution strategies for RL
3. applying RL to dialogue systems

applying RL to dialogue systems

Training the dialogue manager requires a huge number of online samples; hence, a user simulator, trained on offline data, is used to train the dialogue manager.

[diagram: training loop in which the policy manager sends a system act to the user simulator; the simulator, trained on offline data, returns a dialogue act to the state tracker, which feeds the updated state back to the policy manager]
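The training loop in the diagram can be sketched with stub components (all names and behaviours here are hypothetical stand-ins, not the actual Maluuba system): the policy manager and the user simulator exchange acts turn by turn, and the resulting transitions are what an RL learner would train on.

```python
def user_simulator(system_act):
    """Stub simulator: answers any requested slot, else ends the dialogue."""
    if system_act[0] == "request":
        return ("inform", {system_act[1]: "some_value"})
    return ("bye", {})

def state_tracker(state, user_act):
    """Stub tracker: fold the user's informed slots into the state."""
    new_state = dict(state)
    if user_act[0] == "inform":
        new_state.update(user_act[1])
    return new_state

def policy_manager(state):
    """Stub policy: request missing slots, then make a recommendation."""
    for slot in ("food", "price_range"):
        if slot not in state:
            return ("request", slot)
    return ("inform", "restaurant")

state, transcript = {}, []
for _ in range(10):                          # one simulated dialogue
    system_act = policy_manager(state)
    transcript.append(system_act)
    if system_act[0] == "inform":            # dialogue goal reached
        break
    user_act = user_simulator(system_act)
    state = state_tracker(state, user_act)
```

In the real setup, the hand-written `policy_manager` stub is replaced by the RL-trained policy, and each (state, system act, reward, next state) tuple from such simulated dialogues becomes a training sample.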

deep RL for dialogue systems

- the exact state is not observed; hence, a belief state is used
- belief-state spaces are typically discretized into summary state spaces to make the task tractable
- deep RL can be applied directly to the belief-state space due to its strong generalization properties
- with pre-training, a deep RL method can become even more efficient

effect of pre-training

[figure: learning curves without pre-training vs. with pre-training, based on the DSTC2 dataset]

summary
- RL is a data-driven approach towards learning behaviour
- RL does not require knowledge of a good policy
- RL can be used for online learning
- combining RL with deep learning means that RL can be applied to much bigger problems
- constructing a good policy for a modern dialogue manager is a challenging task
- deep RL is the perfect candidate to address this challenge

Further reading:

“Reinforcement Learning: An Introduction” by Richard S. Sutton & Andrew G. Barto: https://webdocs.cs.ualberta.ca/~sutton/book/the-book.html

“Algorithms for Reinforcement Learning” by Csaba Szepesvári: https://sites.ualberta.ca/~szepesva/RLBook.html

“Policy Networks with Two-Stage Training for Dialogue Systems” by Mehdi Fatemi, Layla El Asri, Hannes Schulz, Jing He, Kaheer Suleman https://arxiv.org/abs/1606.03152

Code examples:

simple DQN example in Python: https://edersantana.github.io/articles/keras_rl/

tool for testing/developing RL algorithms: https://gym.openai.com/