TTIC 31190: Natural Language Processing
Kevin Gimpel, Winter 2016
Lecture 11: Recurrent and Convolutional Neural Networks in NLP
Announcements
• Assignment 3 assigned yesterday, due Feb. 29
• project proposal due Tuesday, Feb. 16
• midterm on Thursday, Feb. 18
Roadmap
• classification
• words
• lexical semantics
• language modeling
• sequence labeling
• neural network methods in NLP
• syntax and syntactic parsing
• semantic compositionality
• semantic parsing
• unsupervised learning
• machine translation and other applications
2-transformation (1-layer) network
• we'll call this a "2-transformation" neural network, or a "1-layer" neural network
• an input vector
• a score vector
• one hidden vector ("hidden layer")
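The slides' diagram of this architecture is omitted here; as a rough NumPy sketch of the two transformations (the sizes and the tanh nonlinearity are illustrative assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_hid, n_labels = 6, 4, 3          # illustrative sizes
W1 = rng.normal(size=(d_hid, d_in))      # first transformation
b1 = np.zeros(d_hid)
W2 = rng.normal(size=(n_labels, d_hid))  # second transformation
b2 = np.zeros(n_labels)

def forward(x):
    """Two transformations: input -> hidden layer -> vector of label scores."""
    h = np.tanh(W1 @ x + b1)             # one hidden vector ("hidden layer")
    return W2 @ h + b2                   # score vector, one entry per label

x = rng.normal(size=d_in)                # input vector
scores = forward(x)
```

Two weight matrices (two transformations) with one nonlinearity in between, hence "1-layer".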
[diagram: network mapping an input to a vector of label scores for the example tweet "ikr smh he asked fir yo last name so he can add u on fb lololol", with a POS tag (intj, pronoun, verb, prep, det, adj, noun, proper noun, other) shown above each word]
Neural Networks for Twitter Part-of-Speech Tagging
adj = adjective, prep = preposition, intj = interjection
• in Assignment 3, you'll build a neural network classifier to predict a word's POS tag based on its context
• e.g., predict the tag of yo given its context
• what should the input x be?
  – it has to be independent of the label
  – it has to be a fixed-length vector
• one answer: concatenate the word vector for yo with the word vector for fir
• when using word vectors as part of the input, we can also treat them as more parameters to be learned!
• this is called "updating" or "fine-tuning" the vectors (since they are initialized using something like word2vec)
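A minimal sketch of what fine-tuning means: because the embedding row enters the forward computation, the gradient reaches it, and we update the row itself. The toy vocabulary, sizes, and linear scorer below are illustrative assumptions, not the assignment's model:

```python
import numpy as np

# toy embedding table; in practice the rows are initialized from word2vec
vocab = {"yo": 0, "fir": 1, "name": 2}
E = np.random.default_rng(0).normal(size=(3, 4))

w = np.ones(4)                 # toy classifier weight vector
i = vocab["yo"]
before = E[i].copy()

score = w @ E[i]               # forward pass uses the embedding row
grad_Ei = w                    # d(score)/d(E[i]): gradient w.r.t. the word vector
E[i] -= 0.1 * grad_Ei          # "fine-tuning": the word vector itself is updated
```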
• let's use the center word + two words to the right: vector for yo, vector for last, vector for name
• if name is to the right of yo, then yo is probably a form of your
• but our x above uses separate dimensions for each position!
  – i.e., name is two words to the right
  – what if name is one word to the right?
Features and Filters
• we could use a feature that returns 1 if name is to the right of the center word, but that does not use the word's embedding
• how do we include a feature like "a word similar to name appears somewhere to the right of the center word"?
• rather than always specifying relative position and embedding, we want to add filters that look for words like name anywhere in the window (or sentence!)
Filters
• for now, think of a filter as a vector in the word vector space
• the filter matches a particular region of the space
• "match" = "has high dot product with"
Convolution
• convolutional neural networks use a bunch of such filters
• each filter is matched against (dot product computed with) each word in the entire context window or sentence
• e.g., a single filter is a vector of the same length as word vectors
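A tiny sketch of this matching step: one filter, dotted with each word vector in the window, yields one value per position. The random vectors below just stand in for real embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
# stand-in word vectors for the window "yo last name" (values are illustrative)
words = rng.normal(size=(3, d))
f = rng.normal(size=d)       # one filter: a vector in the word vector space

# dot product of the filter with each word vector, one entry per position
feature_map = words @ f
```

A high entry in `feature_map` means the filter "matched" the word at that position.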
the result is a "feature map": it has an entry for each word position in the context window/sentence
Pooling
how do we convert this into a fixed-length vector? use pooling:
• max-pooling: returns the maximum value in the feature map
• average pooling: returns the average of the values in the feature map
then, this single filter produces a single feature value (the output of some kind of pooling). in practice, we use many filters of many different lengths (e.g., n-grams rather than words).
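Putting the pieces together, a sketch of the full filter-and-pool pipeline: each filter is slid over the sentence, its feature map is pooled to a single value, and many filters of several n-gram widths give a fixed-length sentence vector. The sentence length, widths, and filter counts are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_words = 5, 8
sent = rng.normal(size=(n_words, d))     # one stand-in word vector per position

def conv_pool(sent, filt, width, pool="max"):
    """Slide an n-gram filter over the sentence, then pool the feature map."""
    fmap = np.array([
        np.dot(sent[i:i + width].ravel(), filt)   # match filter at position i
        for i in range(len(sent) - width + 1)
    ])
    return fmap.max() if pool == "max" else fmap.mean()

# many filters of different lengths: 4 unigram, 3 bigram, 3 trigram filters
widths = [1] * 4 + [2] * 3 + [3] * 3
filters = [rng.normal(size=w * d) for w in widths]

# each filter contributes one pooled feature value
sent_vec = np.array([conv_pool(sent, f, w) for f, w in zip(filters, widths)])
```

`sent_vec` has one entry per filter (10 here), regardless of sentence length.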
Convolutional Neural Networks
• convolutional neural networks (convnets or CNNs) use filters that are "convolved with" (matched against all positions of) the input
• informally, think of convolution as "perform the same operation everywhere on the input in some systematic order"
• "convolutional layer" = set of filters that are convolved with the input vector (whether x or a hidden vector)
• could be followed by more convolutional layers, or by a type of pooling
• often used in NLP to convert a sentence into a feature vector
Disclaimer
• these diagrams are often useful for helping us understand and communicate neural network architectures
• but they rarely have any sort of formal semantics (unlike graphical models)
• they are more like cartoons
Long Short-Term Memory RNNs (gateless)
Experiment: text classification
• Stanford Sentiment Treebank
• binary classification (positive/negative)
• 25-dim word vectors
• 50-dim cell/hidden vectors
• classification layer on final hidden vector
• AdaGrad, 10 epochs, mini-batch size 10
• early stopping on dev set

accuracy (gateless): 80.6
All Gates
configuration                  acc.
gateless                       80.6
output gates                   81.9
input gates                    84.4
input & output gates           84.6
forget gates                   82.1
input & forget gates           84.1
forget & output gates          82.6
input, forget, output gates    85.3
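For reference, the standard LSTM recurrence; the input, forget, and output gates ablated in the table above are $i_t$, $f_t$, and $o_t$. This is the usual textbook formulation, and the slides' exact parameterization may differ slightly:

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)}\\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)}\\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate cell)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell vector)}\\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden vector)}
\end{aligned}
```

Removing a gate corresponds to fixing it to a vector of ones; removing all three gives the "gateless" variant in the tables.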
Backward & Bidirectional LSTMs
bidirectional: if shallow, just use forward and backward LSTMs in parallel, concatenate the final two hidden vectors, feed to softmax
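A sketch of that shallow bidirectional combination, using a plain tanh recurrence as a stand-in for the LSTM cell (and, for brevity, sharing parameters between the two directions, which a real bidirectional model would not do):

```python
import numpy as np

rng = np.random.default_rng(0)
d, h = 4, 3
W = rng.normal(size=(h, d))              # input-to-hidden weights
U = rng.normal(size=(h, h))              # hidden-to-hidden weights
sent = rng.normal(size=(6, d))           # 6 stand-in word vectors

def run(seq):
    """Plain tanh recurrence (stand-in for the LSTM cell) over a sequence."""
    ht = np.zeros(h)
    for x in seq:
        ht = np.tanh(W @ x + U @ ht)
    return ht                            # final hidden vector

fwd = run(sent)                          # forward LSTM: left-to-right
bwd = run(sent[::-1])                    # backward LSTM: right-to-left
feats = np.concatenate([fwd, bwd])       # fed to the softmax classification layer
```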
configuration                  forward  backward  bidirectional
gateless                       80.6     80.3      81.5
output gates                   81.9     83.7      82.6
input gates                    84.4     82.9      83.9
forget gates                   82.1     83.4      83.1
input, forget, output gates    85.3     85.9      85.1
Deep LSTM (2-layer)
[diagram: layer 2 stacked on layer 1]

configuration                                  acc.
gateless, shallow (50)                         80.6
gateless, deep (30, 30)                        80.8
input, forget, output gates, shallow (50)      85.3
input, forget, output gates, deep (30, 30)     ~85
Deep Bidirectional LSTMs
concatenate hidden vectors of forward & backward LSTMs; connect each entry to the forward and backward hidden vectors in the next layer