Transcript of TTIC 31190: Natural Language Processing
TTIC 31190: Natural Language Processing
Kevin Gimpel, Winter 2016
Lecture 13: Dependency Syntax/Parsing & Review for Midterm
Announcement
• project proposal due today
• email me to set up a 15-minute meeting next week to discuss your project proposal
• times posted on course webpage
• let me know if none of those work for you
Announcement
• midterm is Thursday, room #530
• closed-book, but you can bring an 8.5x11 sheet (though I don't think you'll need to)
• we will start at 10:35am, finish at 11:50am
Roadmap
• classification
• words
• lexical semantics
• language modeling
• sequence labeling
• neural network methods in NLP
• syntax and syntactic parsing
• semantic compositionality
• semantic parsing
• unsupervised learning
• machine translation and other applications
What is Syntax?
• rules, principles, and processes that govern the sentence structure of a language
• can differ widely among languages
• but every language has systematic structural principles
Constituent Parse (Bracketing/Tree)
(S (NP the man) (VP walked (PP to (NP the park))))
[Tree diagram over "the man walked to the park", with S, NP, VP, PP above the preterminals DT NN VBD IN DT NN]
Key: S = sentence, NP = noun phrase, VP = verb phrase, PP = prepositional phrase, DT = determiner, NN = noun, VBD = verb (past tense), IN = preposition
Constituent Parse (Bracketing/Tree)
(S (NP the man) (VP walked (PP to (NP the park))))
[Same tree diagram, annotated: S, NP, VP, PP are nonterminals; DT NN VBD IN DT NN are preterminals; the words themselves are terminals]
Penn Treebank Nonterminals
Probabilistic Context-Free Grammar (PCFG)
• assign probabilities to rewrite rules:
  NP → DT NN    0.5
  NP → NNS      0.3
  NP → NP PP    0.2

  NN → man      0.01
  NN → park     0.0004
  NN → walk     0.002
  NN → ...
• given a treebank, estimate these probabilities using MLE ("count and normalize")
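The count-and-normalize step can be sketched in a few lines. The toy "treebank" below is made up for illustration (each tree is represented simply as its list of rewrites):

```python
from collections import Counter

# toy "treebank": each tree is given as its list of (lhs, rhs) rewrites
# (trees and rules here are illustrative, not from a real corpus)
trees = [
    [("S", ("NP", "VP")), ("NP", ("DT", "NN")), ("NN", ("man",))],
    [("S", ("NP", "VP")), ("NP", ("NNS",)), ("NN", ("park",))],
    [("S", ("NP", "VP")), ("NP", ("DT", "NN")), ("NN", ("man",))],
]

def estimate_pcfg(trees):
    """MLE by count-and-normalize: P(lhs -> rhs) = count(lhs -> rhs) / count(lhs)."""
    rule_counts, lhs_counts = Counter(), Counter()
    for tree in trees:
        for lhs, rhs in tree:
            rule_counts[(lhs, rhs)] += 1
            lhs_counts[lhs] += 1
    return {rule: c / lhs_counts[rule[0]] for rule, c in rule_counts.items()}

probs = estimate_pcfg(trees)
```

By construction, the probabilities of all rules sharing a left-hand side sum to 1.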
How well does a PCFG work?
• a PCFG learned from the Penn Treebank with MLE gets about 73% F1 score
• state-of-the-art parsers are around 92%
• simple modifications can improve PCFGs:
  – smoothing
  – tree transformations (selective flattening)
  – parent annotation
Parent Annotation
VP → V NP PP   becomes   VP^S → V NP^VP PP^VP
• adds more information, but also fragments counts, making parameter estimates noisier (since we're just using MLE)
How well does a PCFG work?
• a PCFG learned from the Penn Treebank with MLE gets about 73% F1 score
• state-of-the-art parsers are around 92%
• simple modifications can improve PCFGs:
  – smoothing
  – tree transformations (selective flattening)
  – parent annotation
  – lexicalization
Collins (1997)
Lexicalized PCFGs
• nonterminals are decorated with the head word of the subtree
Lexicalization
• this adds a lot more rules!
• many more parameters to estimate → smoothing becomes much more important
  – e.g., the right-hand side of a rule might be factored into several steps
• but it's worth it, because head words are really useful for constituent parsing
Results (Collins, 1997)
Head Rules
• how are heads decided?
• most researchers use deterministic head rules (Magerman/Collins)
• for a PCFG rule A → B1 ... BN, these head rules say which of B1 ... BN is the head of the rule
• examples:
  S → NP VP
  VP → VBD NP PP
  NP → DT JJ NN
Head Annotation
(from Noah Smith)
Lexical Head Annotation
(from Noah Smith)
Lexical Head Annotation → Dependencies
• remove nonlexical parts
(from Noah Smith)
Dependencies
• merge redundant nodes
(from Noah Smith)
[Side-by-side figure: constituent parse vs. dependency parse of the same sentence]
[Side-by-side figure: constituent parse vs. labeled dependency parse, with arc labels nsubj, det, dobj, prep, pobj, det]
Key: nsubj = "nominal subject", dobj = "direct object", prep = "preposition modifier", pobj = "object of preposition", det = "determiner"
• the labeled dependency parse captures some semantic relationships
• how (unlabeled) dependency trees are typically drawn:
  – the root of the tree is represented by $ (the "wall symbol")
  – arrows are drawn entirely above (or below) the sentence
  – arrows are directed from child to parent (or from parent to child); you will see both in practice, so don't get confused!

  source:    $ konnten sie es übersetzen ?
  reference: $ could you translate it ?
  ($ is the "wall" symbol)
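A common in-memory representation of such a tree is a head-index array, with index 0 reserved for the $ wall symbol. The head indices below are illustrative assumptions for the slide's example, not taken from a treebank:

```python
# sentence from the slide; heads[i] is the index of token i's parent,
# with 0 denoting the $ wall symbol (illustrative attachment choices)
words = ["$", "could", "you", "translate", "it", "?"]
heads = [None, 0, 3, 1, 3, 3]

def children(heads, i):
    """Indices of tokens whose head is token i."""
    return [j for j in range(1, len(heads)) if heads[j] == i]

def is_tree(heads):
    """Every non-wall token must reach the wall (index 0) with no cycles."""
    for j in range(1, len(heads)):
        seen, i = set(), j
        while i != 0:
            if i in seen:
                return False
            seen.add(i)
            i = heads[i]
    return True
```

The same array works for projective and nonprojective trees alike; projectivity is a property of how the arcs nest, not of the representation.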
Crossing Dependencies
• if dependencies cross ("nonprojective"), the tree no longer corresponds to a PCFG
(from Noah Smith)
Projective vs. Nonprojective Dependency Parsing
• English dependency treebanks are mostly projective
  – but when focusing more on semantic relationships, trees often become more nonprojective
• some (relatively) free word order languages, like Czech, are fairly nonprojective
• nonprojective parsing can be formulated as a minimum spanning tree problem
• projective parsing cannot
Dependency Parsing
• several widely-used algorithms
• different guarantees, but similar performance in practice
• graph-based:
  – dynamic programming (Eisner, 1997)
  – minimum spanning tree (McDonald et al., 2005)
• transition-based:
  – shift-reduce (Nivre, inter alia)
Dependency Parsers
• Stanford parser
• TurboParser
• Joakim Nivre's MaltParser
• Ryan McDonald's MSTParser
• and many others, for many non-English languages
Complexity Comparison
• constituent parsing: O(G n^3)
  – parsing complexity depends on grammar structure ("grammar constant" G)
  – since it has lots of nonterminal-only rules at the top of the tree, there are many rule probabilities to estimate
• dependency parsing: O(n^3)
  – operates directly on words, so parsing complexity has no grammar constant
  – features are designed on possible dependencies (pairs of words) and larger structures
  – transition-based parsing algorithms are O(n), though not optimal; also, nonprojective parsing is faster
Applications of Dependency Parsing
• widely used for NLP tasks because:
  – faster than constituent parsing
  – captures more semantic information
• text classification (features on dependencies)
• syntax-based machine translation
• relation extraction
  – e.g., extract the relation between Sam Smith and AITech: "Sam Smith was named new CEO of AITech."
  – use the dependency path between Sam Smith and AITech:
    Smith → named, named ← CEO, CEO ← of, of ← AITech
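Given a head-index array, the relation-extraction path above can be recovered by walking from each entity up to their lowest common ancestor. The head indices below are an illustrative analysis of the slide's sentence, not output from a real parser:

```python
def path_to_root(heads, i):
    """Indices from token i up to the wall (index 0), following head pointers."""
    path = [i]
    while path[-1] != 0:
        path.append(heads[path[-1]])
    return path

def dependency_path(heads, words, a, b):
    """Words on the path from token a up to the lowest common ancestor, then down to b."""
    up, down = path_to_root(heads, a), path_to_root(heads, b)
    common = set(up) & set(down)
    lca = next(i for i in up if i in common)      # first shared ancestor on a's path
    up_part = up[: up.index(lca) + 1]             # a ... lca (going up)
    down_part = list(reversed(down[: down.index(lca)]))  # lca's child ... b (going down)
    return [words[i] for i in up_part + down_part]

# illustrative head indices for the slide's example (0 = wall/root)
words = ["$", "Sam", "Smith", "was", "named", "new", "CEO", "of", "AITech", "."]
heads = [None, 2, 4, 4, 0, 6, 4, 6, 7, 4]
```

With these heads, `dependency_path(heads, words, 2, 8)` yields the path Smith, named, CEO, of, AITech, matching the arrows on the slide.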
Summary: two types of grammars
• phrase structure / constituent grammars
  – inspired mostly by Chomsky and others
  – only appropriate for certain languages (e.g., English)
• dependency grammars
  – closer to a semantic representation; some have made this more explicit
  – problematic for certain syntactic structures (e.g., conjunctions, nesting of noun phrases, etc.)
• both are widely used in NLP
• you can find constituent parsers and dependency parsers for several languages online
Review
Modeling, Inference, Learning
• Modeling: How do we assign a score to an (x, y) pair using parameters?
  modeling: define a score function score(x, y; θ)
Modeling, Inference, Learning
• Inference: How do we efficiently search over the space of all labels?
  inference: solve argmax_y score(x, y; θ)
  modeling: define a score function
Modeling, Inference, Learning
• Learning: How do we choose θ?
  learning: choose θ
  inference: solve argmax_y score(x, y; θ)
  modeling: define a score function
Applications
Applications of our Classification Framework
text classification:
  x: "the hulk is an anger fueled monster with incredible strength and resistance to damage."  →  y: objective
  x: "in trying to be daring and original, it comes off as only occasionally satirical and never fresh."  →  y: subjective
  output space = {objective, subjective}
Applications of our Classification Framework
word sense classifier for bass:
  x: "he's a bass in the choir."  →  y: bass3
  x: "our bass is line-caught from the Atlantic."  →  y: bass4
  output space = {bass1, bass2, ..., bass8}
Applications of our Classification Framework
skip-gram model as a classifier:
  x: agriculture  →  y: <s>
  x: agriculture  →  y: is
  x: agriculture  →  y: the
  output space = V (the entire vocabulary)
corpus (English Wikipedia):
  "agriculture is the traditional mainstay of the cambodian economy."
  "but benares has been destroyed by an earthquake."
  ...
Part-of-Speech Tagging
  Some/determiner questioned/verb(past) if/prep. Tim/proper Cook/proper 's/poss. first/adj. product/noun
  would/modal be/verb a/det. breakaway/adjective hit/noun for/prep. Apple/proper ./punc.
Simplest kind of structured prediction: sequence labeling
Named Entity Recognition
  Some/O questioned/O if/O Tim/B-PERSON Cook/I-PERSON 's/O first/O product/O
  would/O be/O a/O breakaway/O hit/O for/O Apple/B-ORGANIZATION ./O
B = "begin", I = "inside", O = "outside"
Formulating segmentation tasks as sequence labeling via B-I-O labeling
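Decoding BIO tags back into entity spans is a small piece of bookkeeping; a minimal sketch (the span format and the choice to tolerate a stray I- tag are my own conventions):

```python
def bio_to_spans(tags):
    """Convert a BIO tag sequence into (label, start, end_exclusive) spans."""
    spans, start, label = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last span
        if tag == "O" or tag.startswith("B-"):
            if start is not None:
                spans.append((label, start, i))
                start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        elif tag.startswith("I-") and start is None:
            start, label = i, tag[2:]  # tolerate I- without a preceding B-

    return spans

tags = ["O", "O", "O", "B-PERSON", "I-PERSON", "O", "O", "O"]
spans = bio_to_spans(tags)
```

On the slide's first line this recovers a single PERSON span covering "Tim Cook".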
Applications of our Classifier Framework so far
• text classification: input x = a sentence; output y = gold standard label for x; output space = pre-defined, small label set (e.g., {positive, negative}); size of output space = 2 to 10
• word sense disambiguation: input x = an instance of a particular word (e.g., bass) with its context; output y = gold standard word sense of x; output space = pre-defined sense inventory from WordNet; size = 2 to 30 for bass
• learning skip-gram word embeddings: input x = an instance of a word in a corpus; output y = a word in the context of x in a corpus; output space = vocabulary; size = |V|
• part-of-speech tagging: input x = a sentence; output y = gold standard part-of-speech tags for x; output space = all possible part-of-speech tag sequences with the same length as x; size = |P|^|x|
Applications of Classifier Framework (continued)
• named entity recognition: input x = a sentence; output y = gold standard named entity labels for x (BIO tags); output space = all possible BIO label sequences with the same length as x; size = |P|^|x|
• constituent parsing: input x = a sentence; output y = gold standard constituent parse (labeled bracketing) of x; output space = all possible labeled bracketings of x; size = exponential in length of x (Catalan number)
• dependency parsing: input x = a sentence; output y = gold standard dependency parse (labeled directed spanning tree) of x; output space = all possible labeled directed spanning trees of x; size = exponential in length of x
• each application draws from particular linguistic concepts and must address different kinds of linguistic ambiguity/variability:
  – word sense: sense granularity, relationships among senses, word sense ambiguity
  – word vectors: distributional properties, sense ambiguity, different kinds of similarity
  – part-of-speech: tag granularity, tag ambiguity
  – parsing: constituent/dependency relationships, attachment & coordination ambiguities
Modeling
Model Families
• linear models
  – lots of freedom in defining features, though feature engineering is required for best performance
  – learning uses optimization of a loss function
  – one can (try to) interpret learned feature weights
• stochastic/generative models
  – linear models with simple "features" (counts of events)
  – learning is easy: count & normalize (but smoothing is needed)
  – easy to generate samples
• neural networks
  – can usually get away with less feature engineering
  – learning uses optimization of a loss function
  – hard to interpret (though we try!), but often works best
Special Case of Linear Models: Stochastic/Generative Models
• n-gram language models: tasks = language modeling (for MT, ASR, etc.); context expansion = increase n
• hidden Markov models: tasks = part-of-speech tagging, named entity recognition, word clustering; context expansion = increase order of HMM (e.g., bigram HMM → trigram HMM)
• probabilistic context-free grammars: tasks = constituent parsing; context expansion = increase size of rules, e.g., flattening, parent annotation, etc.
• all use MLE + smoothing (though probably different kinds of smoothing)
• all assign probability to sentences (some assign probability jointly to pairs of <sentence, something else>)
• all have the same trade-off between increasing "context" (feature size) and needing more data / better smoothing
Feature Engineering for Text Classification
• two features (defined by formulas on the slide, not reproduced in this transcript)
• what should the weights be?
Higher-Order Binary Feature Templates
• unigram binary template
• bigram binary template
• trigram binary features ...
Unigram Count Features
• a "count" feature returns the count of a particular word in the text
• unigram count feature template: one feature per vocabulary word, whose value is that word's count in the text
Feature Count Cutoffs
• problem: some features are extremely rare
• solution: only keep features that appear at least k times in the training data
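The unigram count template and a count cutoff together fit in a few lines. This is a minimal sketch with whitespace tokenization and made-up documents; real pipelines would tokenize more carefully:

```python
from collections import Counter

def unigram_counts(text):
    """Unigram count feature template: one feature per word, valued by its count."""
    return Counter(text.split())

def apply_cutoff(training_docs, k):
    """Keep only features appearing at least k times across the training data."""
    totals = Counter()
    for doc in training_docs:
        totals.update(unigram_counts(doc))
    return {w for w, c in totals.items() if c >= k}

docs = ["the man walked to the park", "the park was empty"]
kept = apply_cutoff(docs, 2)  # features occurring at least twice
```

Binary templates are the same idea with the count replaced by an indicator (count > 0).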
2-Transformation (1-Layer) Network
• we'll call this a "2-transformation" neural network, or a "1-layer" neural network
• the input is a vector x; the output is a vector of label scores
• one hidden vector ("hidden layer") in between
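A minimal sketch of the two transformations, in plain Python with tiny made-up weights. The tanh nonlinearity is an assumption; the transcript does not record which nonlinearity the slide used:

```python
import math

def affine(W, b, v):
    """Compute W v + b for a list-of-lists matrix W, bias b, and vector v."""
    return [sum(wi * vi for wi, vi in zip(row, v)) + bi for row, bi in zip(W, b)]

def one_layer_scores(x, W1, b1, W2, b2):
    """Two transformations with a nonlinearity in between:
    hidden = tanh(W1 x + b1); scores = W2 hidden + b2 (one score per label)."""
    hidden = [math.tanh(z) for z in affine(W1, b1, x)]
    return affine(W2, b2, hidden)

# illustrative shapes: 3-dim input, 2-dim hidden vector, 2 labels
x = [1.0, -1.0, 0.5]
W1 = [[0.2, 0.1, 0.0], [0.0, -0.3, 0.4]]
b1 = [0.0, 0.1]
W2 = [[1.0, -1.0], [-1.0, 1.0]]
b2 = [0.0, 0.0]
scores = one_layer_scores(x, W1, b1, W2, b2)
```

Counting the input-to-hidden and hidden-to-scores maps gives the "2 transformations"; the single hidden vector gives the "1 layer".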
1-Layer Neural Network for Sentiment Classification
Neural Networks for Twitter Part-of-Speech Tagging
  ikr/intj smh/other he/pronoun asked/verb fir/prep yo/det last/adj name/noun so/prep he/pronoun can/verb ...
• let's use the center word + two words to the right: vectors for "yo", "last", "name"
• if "name" is to the right of "yo", then "yo" is probably a form of "your"
• but our x above uses separate dimensions for each position!
  – i.e., "name" is two words to the right
  – what if "name" is one word to the right?
Convolution
  (vectors for "yo", "last", "name")
• the result is a "feature map", with an entry for each word position in the context window/sentence
Pooling
  (vectors for "yo", "last", "name")
• the "feature map" has an entry for each word position in the context window/sentence
• how do we convert this into a fixed-length vector? use pooling:
  – max-pooling: returns the maximum value in the feature map
  – average pooling: returns the average of the values in the feature map
• then, this single filter produces a single feature value (the output of some kind of pooling); in practice, we use many filters of many different lengths (e.g., over n-grams rather than single words)
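The filter-then-pool pipeline can be sketched directly. The 2-dimensional word vectors and the filter weights below are made up for illustration; the filter here spans a window of two adjacent words:

```python
def conv_feature_map(vectors, filt):
    """Dot the filter with the concatenation of each window of word vectors;
    the result has one entry per window position (the 'feature map')."""
    width = len(filt) // len(vectors[0])  # window size in words
    fmap = []
    for i in range(len(vectors) - width + 1):
        window = [v for vec in vectors[i : i + width] for v in vec]
        fmap.append(sum(f * w for f, w in zip(filt, window)))
    return fmap

def max_pool(fmap):
    return max(fmap)

def avg_pool(fmap):
    return sum(fmap) / len(fmap)

# toy 2-dim vectors for "yo", "last", "name" and a width-2 filter
vectors = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
filt = [1.0, 0.0, 0.0, 1.0]
fmap = conv_feature_map(vectors, filt)
```

Pooling is what removes the position-sensitivity discussed two slides back: the filter fires wherever its pattern appears, and the pool keeps only how strongly it fired.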
Convolutional Neural Networks
• convolutional neural networks (convnets or CNNs) use filters that are "convolved with" (matched against all positions of) the input
• think of convolution as "perform the same operation everywhere on the input in some systematic order"
• "convolutional layer" = a set of filters that are convolved with the input vector (whether x or a hidden vector)
• could be followed by more convolutional layers, or by a type of pooling
• often used in NLP to convert a sentence into a feature vector
Recurrent Neural Networks
[Diagram: a "hidden vector" is carried across positions of the sequence]
Long Short-Term Memory (LSTM) Recurrent Neural Networks
Backward & Bidirectional LSTMs
• bidirectional: if shallow, just use forward and backward LSTMs in parallel, concatenate the final two hidden vectors, and feed to a softmax
Deep LSTM (2-layer)
[Diagram: layer 1 feeds into layer 2]
Recursive Neural Networks for NLP
• first, run a constituent parser on the sentence
• convert the constituent tree to a binary tree (each rewrite has exactly two children)
• construct a vector for the sentence recursively at each rewrite ("split point")
Learning
Cost Functions
• a cost function scores an output against a gold standard: cost(y, y′)
• it should reflect the evaluation metric for your task
• usual conventions: cost(y, y) = 0, and cost(y, y′) ≥ 0
• for classification, what cost should we use?
Empirical Risk Minimization (Vapnik et al.)
• replace the expectation with a sum over examples
Empirical Risk Minimization (Vapnik et al.)
• replace the expectation with a sum over examples
• problem: NP-hard even for binary classification with linear models
Empirical Risk Minimization with Surrogate Loss Functions
• given training data: {(x_i, y_i)}, where each y_i is a label
• we want to solve: min_θ Σ_i loss(x_i, y_i, θ)
• there are many possible loss functions to consider
Loss Functions
• cost ("0-1") loss: intractable, but underlies "direct error minimization"
• perceptron loss: used in the perceptron algorithm (Rosenblatt, 1958)
• hinge loss: used in support vector machines and other large-margin algorithms
• log loss: used in logistic regression, conditional random fields, and maximum entropy models
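For a linear model over a small candidate set, the three tractable losses can be computed directly. The feature dictionaries, weights, and two-label setup below are made-up illustrations; the loss definitions follow the standard forms (perceptron: best score minus gold score; hinge: cost-augmented best minus gold; log: negative log-probability of the gold label):

```python
import math

def score(theta, feats):
    return sum(theta.get(f, 0.0) * v for f, v in feats.items())

def perceptron_loss(theta, x_feats, gold):
    """max over y of score(x, y) minus score(x, gold)."""
    best = max(score(theta, f) for f in x_feats.values())
    return best - score(theta, x_feats[gold])

def hinge_loss(theta, x_feats, gold, cost):
    """max over y of [score(x, y) + cost(gold, y)] minus score(x, gold)."""
    best = max(score(theta, f) + cost(gold, y) for y, f in x_feats.items())
    return best - score(theta, x_feats[gold])

def log_loss(theta, x_feats, gold):
    """Negative log-probability of the gold label under a log-linear model."""
    scores = {y: score(theta, f) for y, f in x_feats.items()}
    logz = math.log(sum(math.exp(s) for s in scores.values()))
    return logz - scores[gold]

# toy binary example: feature dict for each candidate label y of one input x
x_feats = {"pos": {"w:good&y=pos": 1.0}, "neg": {"w:good&y=neg": 1.0}}
theta = {"w:good&y=pos": 2.0, "w:good&y=neg": 0.0}

def cost01(gold, y):
    return 0.0 if y == gold else 1.0
```

With these weights the gold label "pos" wins by a margin larger than the 0-1 cost, so perceptron and hinge losses are both zero, while log loss stays slightly positive.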
(Sub)gradients of Losses for Linear Models
entry j of the (sub)gradient of each loss, for a linear model with features f:
• cost ("0-1"): not subdifferentiable in general
• perceptron: f_j(x, ŷ) − f_j(x, y), where ŷ is the highest-scoring output
• hinge: f_j(x, ŷ) − f_j(x, y), where ŷ is the highest-scoring output under score + cost (cost-augmented inference)
• log: E_{p_θ(y′|x)}[f_j(x, y′)] − f_j(x, y)
• the first term in the log-loss gradient is an expectation of the feature value with respect to the distribution over y (where the distribution is defined by θ); alternative notation: E_{y′ ~ p_θ(·|x)}[f_j(x, y′)]
Visualization
[Bar charts over five possible outputs, showing for each output its score, its cost, and score + cost, with the gold standard output marked]
perceptron loss: max_{y′} score(x, y′; θ) − score(x, y; θ)
[Bar charts: scores of the five possible outputs, with the gold standard marked]
effect of learning: the gold standard will have the highest score
hinge loss: max_{y′} [score(x, y′; θ) + cost(y, y′)] − score(x, y; θ)
[Bar charts: score + cost of the five possible outputs, with the gold standard marked]
effect of learning: the score of the gold standard will be higher than score + cost of all others
Regularized Empirical Risk Minimization
• given training data: {(x_i, y_i)}, where each y_i is a label
• we want to solve: min_θ Σ_i loss(x_i, y_i, θ) + λ R(θ)
  – R(θ) is the regularization term; λ is the regularization strength
Regularization Terms
• most common: penalize large parameter values
• intuition: large parameters might be instances of overfitting
• examples:
  – L2 regularization: R(θ) = ||θ||_2^2 (also called Tikhonov regularization or ridge regression)
  – L1 regularization: R(θ) = ||θ||_1 (also called basis pursuit or LASSO)
Dropout
• popular regularization method for neural networks
• randomly "drop out" (set to zero) some of the vector entries in the layers
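A minimal sketch of a dropout mask. The 1/(1−rate) rescaling of survivors ("inverted dropout") is one common variant, not necessarily the one on the slide:

```python
import random

def dropout(vector, rate, rng=random):
    """Zero each entry independently with probability `rate`; scale survivors by
    1/(1-rate) so the expected value of each entry is unchanged (inverted dropout)."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in vector]

random.seed(0)
h = [1.0] * 10           # a hidden vector
dropped = dropout(h, 0.5)  # roughly half the entries zeroed, survivors doubled
```

At test time the mask is simply not applied; with inverted dropout no extra test-time rescaling is needed.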
Inference
Exponentially-Large Search Problems
inference: solve argmax_y score(x, y; θ)
• when the output is a sequence or tree, this argmax requires iterating over an exponentially-large set
Learning requires solving exponentially-hard problems too!
• for the perceptron, hinge, and log losses, entry j of the (sub)gradient for a linear model contains an argmax or an expectation over outputs; computing each of these terms requires iterating through every possible output
Dynamic Programming (DP)
• what is dynamic programming?
  – a family of algorithms that break problems into smaller pieces and reuse solutions for those pieces
  – only applicable when the problem has certain properties (optimal substructure and overlapping sub-problems)
• in this class, we use DP to iterate over exponentially-large output spaces in polynomial time
• we focus on a particular type of DP algorithm: memoization
Implementing DP Algorithms
• even if your goal is to compute a sum or a max, focus first on counting mode (count the number of unique outputs for an input)
• memoization = recursion + saving/reusing solutions
  – start by defining recursive equations
  – "memoize" by creating a table to store all intermediate results from the recursive equations, and use them when requested
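Counting mode plus memoization can be illustrated on a problem from earlier in the lecture: counting the binary bracketings of an n-word sentence (the Catalan number mentioned for constituent parsing). A minimal sketch:

```python
def count_bracketings(n, memo=None):
    """Number of binary bracketings of a span of n words, via
    recursion + memoization: C(n) = sum over split points k of C(k) * C(n - k)."""
    if memo is None:
        memo = {}
    if n <= 1:
        return 1  # base case: a single word has one "bracketing"
    if n not in memo:
        memo[n] = sum(count_bracketings(k, memo) * count_bracketings(n - k, memo)
                      for k in range(1, n))
    return memo[n]
```

Without the memo table the recursion is exponential; with it, each span size is computed once, so the count for n words takes O(n^2) work. Replacing sum with max (and counts with rule probabilities) turns the same skeleton into a parser.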
Inference in HMMs
• since the output is a sequence, this argmax requires iterating over an exponentially-large set
• last week we talked about using dynamic programming (DP) to solve these problems
• for HMMs (and other sequence models), the DP algorithm for solving this is called the Viterbi algorithm
Viterbi Algorithm
• recursive equations + memoization:
  base case: V(1, y) returns the probability of the sequence starting with label y for the first word
  recursive case: V(m, y) computes the probability of the max-probability label sequence that ends with label y at position m
  the final value is in: max over y of V(|x|, y)
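The recursive equations translate directly into a table-filling implementation. The toy HMM below (labels, transition, and emission probabilities) is made up for illustration; a real implementation would work in log-space to avoid underflow:

```python
def viterbi(words, labels, p_init, p_trans, p_emit):
    """HMM Viterbi: V[0][y] = p_init[y] * p_emit[y][word_0];
    V[m][y] = max over y' of V[m-1][y'] * p_trans[y'][y] * p_emit[y][word_m].
    Backpointers recover the argmax label sequence."""
    V = [{y: p_init[y] * p_emit[y].get(words[0], 0.0) for y in labels}]
    back = [{}]
    for m in range(1, len(words)):
        V.append({})
        back.append({})
        for y in labels:
            prev = max(labels, key=lambda yp: V[m - 1][yp] * p_trans[yp][y])
            V[m][y] = V[m - 1][prev] * p_trans[prev][y] * p_emit[y].get(words[m], 0.0)
            back[m][y] = prev
    best = max(labels, key=lambda y: V[-1][y])
    path = [best]
    for m in range(len(words) - 1, 0, -1):
        path.append(back[m][path[-1]])
    return list(reversed(path))

# toy HMM with two labels (all probabilities are made up)
labels = ["D", "N"]
p_init = {"D": 0.9, "N": 0.1}
p_trans = {"D": {"D": 0.1, "N": 0.9}, "N": {"D": 0.5, "N": 0.5}}
p_emit = {"D": {"the": 1.0}, "N": {"park": 1.0}}
```

The V table is exactly the memoization table whose size gives the space complexity on the next slide.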
Viterbi Algorithm
• space and time complexity? can be read off from the recursive equations:
• space complexity: size of the memoization table, which is the # of unique indices of the recursive equations
• so, space complexity is O(|x| |L|), where |x| is the length of the sentence and |L| is the number of labels
Viterbi Algorithm
• time complexity: size of the memoization table × the complexity of computing each entry
• each entry requires iterating through the labels
• so, time complexity is O(|x| |L| · |L|) = O(|x| |L|^2)
Feature Locality
• feature locality: how "big" are your features?
• when designing efficient inference algorithms (whether with DP or other methods), we need to be mindful of this
• features can be arbitrarily big in terms of the input, but not in terms of the output!
• the features in HMMs are small in both the input and output sequences (only two pieces at a time)