neural dialog -...

53
Neural Dialog Shrimai Prabhumoye Alan W Black Speech Processing 11-[468]92

Transcript of neural dialog -...

Page 1: neural dialog - tts.speech.cs.cmu.edutts.speech.cs.cmu.edu/courses/11492/slides/neural_dialog.pdf · •From its word ID, we get a basic representation of a word through the one-hot

NeuralDialogShrimai Prabhumoye

AlanWBlackSpeechProcessing11-[468]92

Page 2: neural dialog - tts.speech.cs.cmu.edutts.speech.cs.cmu.edu/courses/11492/slides/neural_dialog.pdf · •From its word ID, we get a basic representation of a word through the one-hot

Review

• TaskOrientedSystems• Intents,slots,actionsandresponse

• Non-TaskOrientedSystems• Noagenda,forfun

• Buildingdialogsystems• RuleBasedSystems• Eliza

• RetrievalTechniques• Representations:TF-IDF,N-grams,wordsthemselves• SimilarityMeasures:Jaccard,cosine,euclidean distance• Limitations– fixedsetofresponses,novariationinresponse

Page 3: neural dialog - tts.speech.cs.cmu.edutts.speech.cs.cmu.edu/courses/11492/slides/neural_dialog.pdf · •From its word ID, we get a basic representation of a word through the one-hot

Review

• TaskOrientedSystems• Non-TaskOrientedSystems• Buildingdialogsystems• RetrievalTechniques

• Representation• WordVectors

• SimilarityMeasures• Limitations– fixedsetofresponses,novariationinresponse

• GenerativeModels

Page 4: neural dialog - tts.speech.cs.cmu.edutts.speech.cs.cmu.edu/courses/11492/slides/neural_dialog.pdf · •From its word ID, we get a basic representation of a word through the one-hot

Overview

• WordEmbeddings

• LanguageModelling

• RecurrentNeuralNetworks• SequencetoSequenceModels

• HowtoBuildDialogSystem

• IssuesandExamples

• Alexa-Prize

Page 5: neural dialog - tts.speech.cs.cmu.edutts.speech.cs.cmu.edu/courses/11492/slides/neural_dialog.pdf · •From its word ID, we get a basic representation of a word through the one-hot

NeuralDialog

• Wewanttomodel:

• Howtowerepresentsentence(𝑃 𝑟𝑒𝑠𝑝𝑜𝑛𝑠𝑒 , 𝑃 𝑖𝑛𝑝𝑢𝑡 ?)• Howtobuildalanguagemodel.• Howtorepresentswords(wordembeddings?)

𝑷 𝒓𝒆𝒔𝒑𝒐𝒏𝒔𝒆 𝒊𝒏𝒑𝒖𝒕)

Page 6: neural dialog - tts.speech.cs.cmu.edutts.speech.cs.cmu.edu/courses/11492/slides/neural_dialog.pdf · •From its word ID, we get a basic representation of a word through the one-hot

NaturalLanguageProcessing

• Typicalpreprocessingstepso FormvocabularyofwordsthatmapswordstoauniqueIDo Differentcriteriacanbeusedtoselectwhichwordsarepartof

thevocabulary(eg:thresholdfrequency)o Allwordsnotinthevocabularywillbemappedtoaspecial‘out-

of-vocabulary’• Typicalvocabularysizeswillvarybetween10,000and250,000

(Salakhutdinov,2017)

Page 7: neural dialog - tts.speech.cs.cmu.edutts.speech.cs.cmu.edu/courses/11492/slides/neural_dialog.pdf · •From its word ID, we get a basic representation of a word through the one-hot

PreprocessingTechniques

• Tokenization• “Iamagirl.”tokenizedto“I”,“am”,“a”,“girl”,“.”

• Lowercaseallwords• RemovingStopWords• Ex:“the”,“a”,“and”,etc

• FrequencyofWords• SetathresholdandmakeallwordsbelowthisfrequencyasUNK

• Add<START>and<EOS>tagatthebeginningandendofsentence.

(Salakhutdinov,2017)

Page 8: neural dialog - tts.speech.cs.cmu.edutts.speech.cs.cmu.edu/courses/11492/slides/neural_dialog.pdf · •From its word ID, we get a basic representation of a word through the one-hot

Vocabulary

Page 9: neural dialog - tts.speech.cs.cmu.edutts.speech.cs.cmu.edu/courses/11492/slides/neural_dialog.pdf · •From its word ID, we get a basic representation of a word through the one-hot

One-HotEncoding

• FromitswordID,wegetabasicrepresentationofawordthroughthe

one-hotencodingoftheID

• theone-hotvectorofanIDisavectorfilledwith0s,exceptfora1at

thepositionassociatedwiththeID

• ForvocabularysizeD=10,theone-hotvectorofwordIDw=4is:

𝑒 𝑤 = [0001000000]

(Salakhutdinov,2017)

Page 10: neural dialog - tts.speech.cs.cmu.edutts.speech.cs.cmu.edu/courses/11492/slides/neural_dialog.pdf · •From its word ID, we get a basic representation of a word through the one-hot

LimitationsofOne-HotEncoding

Page 11: neural dialog - tts.speech.cs.cmu.edutts.speech.cs.cmu.edu/courses/11492/slides/neural_dialog.pdf · •From its word ID, we get a basic representation of a word through the one-hot

LimitationsofOne-HotEncoding

• Aone-hotencodingmakesnoassumptionaboutwordsimilarity.o [“working”,“on”,“Friday”,“is”,“tiring”]doesnotappearinourtrainingset.

o [“working”,“on”,“Monday”,“is”,“tiring”]isinthetrainset.oWewanttomodel𝑃 “𝑡𝑖𝑟𝑖𝑛𝑔” “𝑤𝑜𝑟𝑘𝑖𝑛𝑔”, “𝑜𝑛”, “𝐹𝑟𝑖𝑑𝑎𝑦”, “𝑖𝑠”)oWordrepresentationof“Monday”and“Friday” aresimilarthengeneralize

(Salakhutdinov,2017)

Page 12: neural dialog - tts.speech.cs.cmu.edutts.speech.cs.cmu.edu/courses/11492/slides/neural_dialog.pdf · •From its word ID, we get a basic representation of a word through the one-hot

LimitationsofOne-HotEncoding

• Themajorproblemwiththeone-hotrepresentationisthatitisvery

high-dimensional

othedimensionalityofe(w)isthesizeofthevocabulary

oatypicalvocabularysizeis≈100,000

oawindowof10wordswouldcorrespondtoaninputvectorofat

least1,000,000units!

(Salakhutdinov,2017)

Page 13: neural dialog - tts.speech.cs.cmu.edutts.speech.cs.cmu.edu/courses/11492/slides/neural_dialog.pdf · •From its word ID, we get a basic representation of a word through the one-hot

ContinuousRepresentationofWords

• Eachwordwisassociatedwithareal-valuedvectorC(w)• Typicalsizeofword– embeddingis300ormore.

(Salakhutdinov,2017)

Page 14: neural dialog - tts.speech.cs.cmu.edutts.speech.cs.cmu.edu/courses/11492/slides/neural_dialog.pdf · •From its word ID, we get a basic representation of a word through the one-hot

ContinuousRepresentationofWords

• Wewouldlikethedistance||C(w)-C(w’)||toreflectmeaningfulsimilarities betweenwords

(Salakhutdinov,2017)

Page 15: neural dialog - tts.speech.cs.cmu.edutts.speech.cs.cmu.edu/courses/11492/slides/neural_dialog.pdf · •From its word ID, we get a basic representation of a word through the one-hot

LanguageModeling

• Alanguagemodelallowsustopredicttheprobabilityofobserving

the sentence(inagivendataset)as:

𝑃 𝑥G, … , 𝑥I = J𝑃 𝑥K 𝑥G, … , 𝑥KLG)I

KMG

• Herelengthofsentenceisn.• Builda languagemodel usingaRecurrentNeuralNetwork.

Page 16: neural dialog - tts.speech.cs.cmu.edutts.speech.cs.cmu.edu/courses/11492/slides/neural_dialog.pdf · •From its word ID, we get a basic representation of a word through the one-hot

WordEmbeddings fromLanguageModels

(Neubig,2017)

Page 17: neural dialog - tts.speech.cs.cmu.edutts.speech.cs.cmu.edu/courses/11492/slides/neural_dialog.pdf · •From its word ID, we get a basic representation of a word through the one-hot

ContinuousBagofWords(CBOW)

• Predictwordbasedonsumofsurroundingembeddings

(Neubig,2017)

Page 18: neural dialog - tts.speech.cs.cmu.edutts.speech.cs.cmu.edu/courses/11492/slides/neural_dialog.pdf · •From its word ID, we get a basic representation of a word through the one-hot

Skip-gram

• usethecurrentwordtopredictthesurroundingwindowofcontext

words

(Neubig,2017)

Page 19: neural dialog - tts.speech.cs.cmu.edutts.speech.cs.cmu.edu/courses/11492/slides/neural_dialog.pdf · •From its word ID, we get a basic representation of a word through the one-hot

BERT(BidirectionalEncoderRepresentationsfromTransformers)

• BERTisamethodofpretraining languagerepresentations

• Data:Wikipedia(2.5Bwords)+BookCorpus (800Mwords)

• Maskoutk%oftheinputwords,andthenpredictthemaskedwords

• WordEmbeddingSize:768

Page 20: neural dialog - tts.speech.cs.cmu.edutts.speech.cs.cmu.edu/courses/11492/slides/neural_dialog.pdf · •From its word ID, we get a basic representation of a word through the one-hot

UseofWordEmbeddings

• torepresentasentence

• asinputtoaneuralnetwork

• tounderstandpropertiesofwords

• Partofspeech

• Dotwowordsmeanthesamething?

• semanticrelation(is-a,part-of,went-to-school-at)?

Page 21: neural dialog - tts.speech.cs.cmu.edutts.speech.cs.cmu.edu/courses/11492/slides/neural_dialog.pdf · •From its word ID, we get a basic representation of a word through the one-hot

NLPandSequentialData

• NLPisfullofsequentialdata

• Charactersinwords

• Wordsinsentences

• Sentencesindiscourse

• …

(Neubig,2017)

Page 22: neural dialog - tts.speech.cs.cmu.edutts.speech.cs.cmu.edu/courses/11492/slides/neural_dialog.pdf · •From its word ID, we get a basic representation of a word through the one-hot

Long-distanceDependenciesinLanguage

• Agreementinnumber,gender,etc.

• He doesnothaveverymuchconfidenceinhimself.

• She doesnothaveverymuchconfidenceinherself.

• Selectional preference

• Thereign haslastedaslongasthelifeofthequeen.

• Therain haslastedaslongasthelifeoftheclouds.

(Neubig,2017)

Page 23: neural dialog - tts.speech.cs.cmu.edutts.speech.cs.cmu.edu/courses/11492/slides/neural_dialog.pdf · •From its word ID, we get a basic representation of a word through the one-hot

RecurrentNeuralNetworks

• Toolstorememberinformation

(Neubig,2017)

FeedForwardNN RecurrentNN

Page 24: neural dialog - tts.speech.cs.cmu.edutts.speech.cs.cmu.edu/courses/11492/slides/neural_dialog.pdf · •From its word ID, we get a basic representation of a word through the one-hot

UnrollinginTime

• Whatdoesprocessingasequencelooklike?

I hate this movie

RNN

predict

label

RNN

predict

label

RNN

predict

label

predict

label

RNN

(Neubig,2017)

Page 25: neural dialog - tts.speech.cs.cmu.edutts.speech.cs.cmu.edu/courses/11492/slides/neural_dialog.pdf · •From its word ID, we get a basic representation of a word through the one-hot

TrainingRNNsI hate this movie

RNN

predict

Prediction1

RNN

predict

Prediction2

RNN

predict

Prediction3

predict

Prediction4

RNN

Label1 Label2 Label3 Label4

Loss1 Loss2 Loss3 Loss4

sum totalloss (Neubig,2017)

Page 26: neural dialog - tts.speech.cs.cmu.edutts.speech.cs.cmu.edu/courses/11492/slides/neural_dialog.pdf · •From its word ID, we get a basic representation of a word through the one-hot

WhatcanRNNsdo

• Representasentence

• Readwholesentence,makeaprediction

• Representacontextwithinasentence

• Readcontextupuntilthatpoint

(Neubig,2017)

Page 27: neural dialog - tts.speech.cs.cmu.edutts.speech.cs.cmu.edu/courses/11492/slides/neural_dialog.pdf · •From its word ID, we get a basic representation of a word through the one-hot

Representingasentence

• ℎO istherepresentationofthesentence

• ℎO istherepresentationoftheprobabilityofobserving“Ihatethismovie”

I hate this movie

RNN RNN RNN RNN

ℎP ℎG ℎQ ℎR ℎO

(Neubig,2017)

Page 28: neural dialog - tts.speech.cs.cmu.edutts.speech.cs.cmu.edu/courses/11492/slides/neural_dialog.pdf · •From its word ID, we get a basic representation of a word through the one-hot

LanguageModelingusingRNN

(Neubig,2017)

I hate this movie<start>

RNN RNN RNN RNN RNN

predict

hate

predict

I

predict

this

predict

movie

predict

<end>

Page 29: neural dialog - tts.speech.cs.cmu.edutts.speech.cs.cmu.edu/courses/11492/slides/neural_dialog.pdf · •From its word ID, we get a basic representation of a word through the one-hot

Bidirectional-RNNs

• Asimpleextension,runtheRNNinbothdirections

(Neubig,2017)

Page 30: neural dialog - tts.speech.cs.cmu.edutts.speech.cs.cmu.edu/courses/11492/slides/neural_dialog.pdf · •From its word ID, we get a basic representation of a word through the one-hot

Bidirectional-RNNs

• Asimpleextension,runtheRNNinbothdirections

(Neubig,2017)Prediction1

Page 31: neural dialog - tts.speech.cs.cmu.edutts.speech.cs.cmu.edu/courses/11492/slides/neural_dialog.pdf · •From its word ID, we get a basic representation of a word through the one-hot

Bidirectional-RNNs

• Asimpleextension,runtheRNNinbothdirections

(Neubig,2017)Prediction1 Prediction2

Page 32: neural dialog - tts.speech.cs.cmu.edutts.speech.cs.cmu.edu/courses/11492/slides/neural_dialog.pdf · •From its word ID, we get a basic representation of a word through the one-hot

Bidirectional-RNNs

• Asimpleextension,runtheRNNinbothdirections

(Neubig,2017)Prediction1 Prediction2 Prediction3 Prediction4

Page 33: neural dialog - tts.speech.cs.cmu.edutts.speech.cs.cmu.edu/courses/11492/slides/neural_dialog.pdf · •From its word ID, we get a basic representation of a word through the one-hot

RecurrentNeuralNetworks

• TheideabehindRNNsistomakeuseofsequentialinformation.

Page 34: neural dialog - tts.speech.cs.cmu.edutts.speech.cs.cmu.edu/courses/11492/slides/neural_dialog.pdf · •From its word ID, we get a basic representation of a word through the one-hot

RecurrentNeuralNetworks

• 𝑥S istheinputattimestept• 𝑥S isthewordembedding• 𝑠S isthehiddenrepresentationattimestept

𝑠S = 𝑓 𝑈𝑥S +𝑊𝑠SLG𝑜S = 𝑠𝑜𝑓𝑡𝑚𝑎𝑥(𝑉𝑠S)

• Note: U,V,Wareshared acrossalltimesteps

Page 35: neural dialog - tts.speech.cs.cmu.edutts.speech.cs.cmu.edu/courses/11492/slides/neural_dialog.pdf · •From its word ID, we get a basic representation of a word through the one-hot

RNNProblemsandAlternatives

• Vanishinggradients

• Gradientsdecreaseastheygetpushedback

• Sol:LongShort-termMemory(Hochreiter andSchmidhuber 1997)(Neubig,2017)

Page 36: neural dialog - tts.speech.cs.cmu.edutts.speech.cs.cmu.edu/courses/11492/slides/neural_dialog.pdf · •From its word ID, we get a basic representation of a word through the one-hot

RNNStrengthsandWeaknesses

• RNNs,particularlydeepRNNs/LSTMs,arequitepowerfulandflexible

• Buttheyrequirealotofdata

• Alsohavetroublewithweakerrorsignalspassedbackfromtheend

ofthesentence

Page 37: neural dialog - tts.speech.cs.cmu.edutts.speech.cs.cmu.edu/courses/11492/slides/neural_dialog.pdf · •From its word ID, we get a basic representation of a word through the one-hot

BuildChatbots

• Wewanttomodel𝑃 𝑟𝑒𝑠𝑝𝑜𝑛𝑠𝑒 𝑖𝑛𝑝𝑢𝑡_𝑠𝑒𝑛𝑡𝑒𝑛𝑐𝑒)

• Welearnthowtobuildwordembeddings

• Welearnthowtobuildalanguagemodel

• Welearnthowtorepresentasentence.

• Wewanttogetarepresentationoftheinput_sentence andthen

generatetheresponseconditionedontheinput.

Page 38: neural dialog - tts.speech.cs.cmu.edutts.speech.cs.cmu.edu/courses/11492/slides/neural_dialog.pdf · •From its word ID, we get a basic representation of a word through the one-hot

ConditionalLanguageModels

• LanguageModel

𝑃 𝑋 = J𝑃 𝑥K|𝑥G, … , 𝑥KLG

I

KMG

• ConditionalLanguageModel

𝑃 𝑌 𝑋 = J𝑃 𝑦 |𝑋, 𝑦G, … , 𝑦 LG

a

`MG

(Neubig,2017)

contextnextword

context

Addedcontext

Page 39: neural dialog - tts.speech.cs.cmu.edutts.speech.cs.cmu.edu/courses/11492/slides/neural_dialog.pdf · •From its word ID, we get a basic representation of a word through the one-hot

ConditionalLanguageModel(Sutskever etal.2014)

(Neubig,2017)

Page 40: neural dialog - tts.speech.cs.cmu.edutts.speech.cs.cmu.edu/courses/11492/slides/neural_dialog.pdf · •From its word ID, we get a basic representation of a word through the one-hot

Howtopasshiddenstate?

• Initializedecoderw/encoder(Sutskever etal.2014)

• Transform(canbedifferentdimensions)

• Inputateverytimestep(Kalchbrenner &Blunsom 2013)

(Neubig,2017)

Page 41: neural dialog - tts.speech.cs.cmu.edutts.speech.cs.cmu.edu/courses/11492/slides/neural_dialog.pdf · •From its word ID, we get a basic representation of a word through the one-hot

SequencetoSequenceModels

Page 42: neural dialog - tts.speech.cs.cmu.edutts.speech.cs.cmu.edu/courses/11492/slides/neural_dialog.pdf · •From its word ID, we get a basic representation of a word through the one-hot

ConstraintsofNeuralModels

Backchanneling

Long-termconversationplanning

Context

Engagement

Page 43: neural dialog - tts.speech.cs.cmu.edutts.speech.cs.cmu.edu/courses/11492/slides/neural_dialog.pdf · •From its word ID, we get a basic representation of a word through the one-hot

ConstraintsofNeuralModels

Constraints

Gesture

GazeLaughter

Backchanneling

Long-termconversationplanning

Context

Engagement

Page 44: neural dialog - tts.speech.cs.cmu.edutts.speech.cs.cmu.edu/courses/11492/slides/neural_dialog.pdf · •From its word ID, we get a basic representation of a word through the one-hot

ExamplesofNeuralChatbots

Page 45: neural dialog - tts.speech.cs.cmu.edutts.speech.cs.cmu.edu/courses/11492/slides/neural_dialog.pdf · •From its word ID, we get a basic representation of a word through the one-hot

Tay

Page 46: neural dialog - tts.speech.cs.cmu.edutts.speech.cs.cmu.edu/courses/11492/slides/neural_dialog.pdf · •From its word ID, we get a basic representation of a word through the one-hot

Zo

Page 47: neural dialog - tts.speech.cs.cmu.edutts.speech.cs.cmu.edu/courses/11492/slides/neural_dialog.pdf · •From its word ID, we get a basic representation of a word through the one-hot

Xiaoice

• https://www.youtube.com/watch?v=dg-x1WuGhuI

Page 48: neural dialog - tts.speech.cs.cmu.edutts.speech.cs.cmu.edu/courses/11492/slides/neural_dialog.pdf · •From its word ID, we get a basic representation of a word through the one-hot

AlexaPrizeChallenge

• Challenge:Buildachatbotthatengages theusersfor20mins.• Sponsored12UniversityTeamswith$100k.• CMUMagnusandCMURuby.• Systemsaremulticomponent

oCombinationsoftask/non-taskoHand-writtenandstatistical/neuralmodels

• ItsaboutengagingresearchersoHavingmorePhDstudentsdodialogoGivingaccessfordeveloperstousersoCollectingdata:whatdouserssay

Page 49: neural dialog - tts.speech.cs.cmu.edutts.speech.cs.cmu.edu/courses/11492/slides/neural_dialog.pdf · •From its word ID, we get a basic representation of a word through the one-hot

CMUMagnus

• Highaveragenumberofturns

• AverageRating

• Topics:Movies,Sports,Travel,GoT

• Usershadlongerconversationsbutdidnotenjoytheconversation.oIdentifywhenuserisfrustrated orwantstochangetopic.

oIdentifywhattheuserwouldliketotalkabout(intent).

• Detecting“Abusive”remarksandrespondingappropriately

Page 50: neural dialog - tts.speech.cs.cmu.edutts.speech.cs.cmu.edu/courses/11492/slides/neural_dialog.pdf · •From its word ID, we get a basic representation of a word through the one-hot

Summary

• Howtorepresentwordsincontinuousspace.• WhatareRNNsandhowtousethemtorepresentasentence.• Sequencetosequencemodelsfor𝑃 𝑟𝑒𝑠𝑝𝑜𝑛𝑠𝑒 𝑖𝑛𝑝𝑢𝑡_𝑠𝑒𝑛𝑡𝑒𝑛𝑐𝑒)• Issuesinneuralmodel• IssueswithLivesystem!

Page 51: neural dialog - tts.speech.cs.cmu.edutts.speech.cs.cmu.edu/courses/11492/slides/neural_dialog.pdf · •From its word ID, we get a basic representation of a word through the one-hot

References

• http://www.phontron.com/class/nn4nlp2017/assets/slides/nn4nlp-03-wordemb.pdf• http://www.phontron.com/class/nn4nlp2017/assets/slides/nn4nlp-06-rnn.pdf• http://www.phontron.com/class/nn4nlp2017/assets/slides/nn4nlp-08-condlm.pdf• https://www.cs.cmu.edu/~rsalakhu/10707/Lectures/Lecture_Language_2019.pdf• http://www.phontron.com/class/mtandseq2seq2017/mt-spring2017.chapter6.pdf

Page 52: neural dialog - tts.speech.cs.cmu.edutts.speech.cs.cmu.edu/courses/11492/slides/neural_dialog.pdf · •From its word ID, we get a basic representation of a word through the one-hot

References

• http://www.wildml.com/2016/04/deep-learning-for-chatbots-part-1-introduction/• http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/• http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-2-implementing-a-language-model-rnn-with-python-numpy-and-theano/• http://www.wildml.com/2016/07/deep-learning-for-chatbots-2-retrieval-based-model-tensorflow/• https://nlp.stanford.edu/seminar/details/jdevlin.pdf

Page 53: neural dialog - tts.speech.cs.cmu.edutts.speech.cs.cmu.edu/courses/11492/slides/neural_dialog.pdf · •From its word ID, we get a basic representation of a word through the one-hot

RNNtorepresentasentence

RNN

Embedding

how

𝒔𝟎 RNN

Embedding

are

𝒔𝟏RNN

Embedding

you

𝒔𝟐RNN

Embedding

?

𝒔𝟑𝒔𝟒

• 𝑠O istherepresentationoftheentiresentence• 𝑠O istherepresentationofprobabilityofobserving“howareyou?”