Natural Language Processing with Deep Learning · CS224N/Ling284 · Christopher Manning
Natural Language Processing with Deep Learning
CS224N/Ling284
Christopher Manning and Richard Socher
Lecture 6: Dependency Parsing
Lecture Plan
1. Syntactic Structure: Constituency and Dependency
2. Dependency Grammar
3. Research highlight
4. Transition-based dependency parsing
5. Neural dependency parsing
Reminders/comments:
• Assignment 1 due tonight
• Assignment 2 out today
• Final project discussion: come meet with us
• Sorry that we've been kind of overloaded for office hours
1. Two views of linguistic structure: Constituency = phrase structure grammar = context-free grammars (CFGs)

Phrase structure organizes words into nested constituents.

[Figure: example constituents: the cat, a dog; large, barking, cuddly; in a crate, on the table, by the door]
Two views of linguistic structure: Dependency structure

• Dependency structure shows which words depend on (modify or are arguments of) which other words.

Look for the large barking dog by the door in a crate
Ambiguity: PP attachments

Scientists study whales from space
Attachment ambiguities

• A key parsing decision is how we 'attach' various constituents
• PPs, adverbial or participial phrases, infinitives, coordinations, etc.
• Catalan numbers: Cn = (2n)! / [(n+1)! n!]
• An exponentially growing series, which arises in many tree-like contexts:
  • E.g., the number of possible triangulations of a polygon with n+2 sides
  • Turns up in triangulation of probabilistic graphical models…
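The formula above is easy to evaluate exactly; a quick sketch showing how fast the number of tree shapes grows:

```python
from math import comb

def catalan(n: int) -> int:
    # C_n = (2n)! / ((n+1)! n!), i.e. C(2n, n) / (n + 1)
    return comb(2 * n, n) // (n + 1)

# Grows roughly like 4**n / n**1.5, so the number of candidate
# attachments explodes quickly with sentence length:
print([catalan(n) for n in range(8)])  # [1, 1, 2, 5, 14, 42, 132, 429]
```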
The rise of annotated data: Universal Dependencies treebanks

[Universal Dependencies: http://universaldependencies.org/; cf. Marcus et al. 1993, The Penn Treebank, Computational Linguistics]
The rise of annotated data

Starting off, building a treebank seems a lot slower and less useful than building a grammar

But a treebank gives us many things
• Reusability of the labor
  • Many parsers, part-of-speech taggers, etc. can be built on it
  • Valuable resource for linguistics
• Broad coverage, not just a few intuitions
• Frequencies and distributional information
• A way to evaluate systems
2. Dependency Grammar and Dependency Structure

Dependency syntax postulates that syntactic structure consists of relations between lexical items, normally binary asymmetric relations ("arrows") called dependencies
[Figure: dependency tree for "Bills on ports and immigration were submitted by Senator Brownback, Republican of Kansas"]
The arrows are commonly typed with the name of grammatical relations (subject, prepositional object, apposition, etc.)

[Figure: the same tree with typed arcs: nsubj:pass, aux, obl, case, nmod, conj, cc, flat, appos]
The arrow connects a head (governor, superior, regent) with a dependent (modifier, inferior, subordinate)

Usually, dependencies form a tree (connected, acyclic, single-head)
Pāṇini's grammar (c. 5th century BCE)

[Image: birch bark MS from Kashmir of the Rupavatra, Wellcome Images L0032691, CC BY 4.0: http://wellcomeimages.org/indexplus/image/L0032691.html]
Dependency Grammar/Parsing History

• The idea of dependency structure goes back a long way
  • To Pāṇini's grammar (c. 5th century BCE)
  • Basic approach of 1st millennium Arabic grammarians
• Constituency/context-free grammar is a new-fangled invention
  • A 20th century invention (R. S. Wells, 1947)
• Modern dependency work is often linked to the work of L. Tesnière (1959)
  • Was the dominant approach in the "East" (Russia, China, …)
  • Good for freer word order languages
• Among the earliest kinds of parsers in NLP, even in the US:
  • David Hays, one of the founders of U.S. computational linguistics, built an early (first?) dependency parser (Hays 1962)
Dependency Grammar and Dependency Structure

• Some people draw the arrows one way; some the other way!
  • Tesnière had them point from head to dependent…
• Usually add a fake ROOT so every word is a dependent of precisely 1 other node

ROOT Discussion of the outstanding issues was completed.
Dependency Conditioning Preferences

What are the sources of information for dependency parsing?
1. Bilexical affinities: [discussion → issues] is plausible
2. Dependency distance: mostly with nearby words
3. Intervening material: dependencies rarely span intervening verbs or punctuation
4. Valency of heads: how many dependents on which side are usual for a head?

ROOT Discussion of the outstanding issues was completed.
Dependency Parsing

• A sentence is parsed by choosing for each word what other word (including ROOT) it is a dependent of
• Usually some constraints:
  • Only one word is a dependent of ROOT
  • Don't want cycles A → B, B → A
• This makes the dependencies a tree
• The final issue is whether arrows can cross (be non-projective) or not

ROOT I 'll give a talk tomorrow on bootstrapping
Methods of Dependency Parsing

1. Dynamic programming
   Eisner (1996) gives a clever algorithm with complexity O(n³), by producing parse items with heads at the ends rather than in the middle
2. Graph algorithms
   You create a Minimum Spanning Tree for a sentence
   McDonald et al.'s (2005) MSTParser scores dependencies independently using an ML classifier (he uses MIRA, for online learning, but it can be something else)
3. Constraint Satisfaction
   Edges are eliminated that don't satisfy hard constraints. Karlsson (1990), etc.
4. "Transition-based parsing" or "deterministic dependency parsing"
   Greedy choice of attachments guided by good machine learning classifiers
   MaltParser (Nivre et al. 2008). Has proven highly effective.
Improving Distributional Similarity with Lessons Learned from Word Embeddings
Omer Levy, Yoav Goldberg, Ido Dagan
Presented by: Ajay Sohmshetty
Word Representation Methods

Count-based distributional models
● SVD (Singular Value Decomposition)
● PPMI (Positive Pointwise Mutual Information)

Neural network-based models
● SGNS (Skip-Gram Negative Sampling) / CBOW
● GloVe
Conventional wisdom: neural network-based models > count-based models
Levy et al.: hyperparameters and system design choices matter more than the embedding algorithms themselves.
Hyperparameters in Skip-Gram

• Unigram distribution smoothing exponent
• Number of negative samples

→ These can be transferred over to the count-based variants.
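The smoothing exponent can be illustrated with a small sketch (made-up counts, not from the paper): raising unigram counts to a power below 1 flattens the negative-sampling distribution.

```python
import numpy as np

def smoothed_unigram(counts, alpha=0.75):
    """Negative-sampling distribution p(w) proportional to count(w)**alpha.

    alpha = 1 is the raw unigram distribution; alpha = 0.75 flattens it,
    giving rare words somewhat more probability mass."""
    weights = np.asarray(counts, dtype=float) ** alpha
    return weights / weights.sum()

counts = [100, 10, 1]        # hypothetical word counts
p = smoothed_unigram(counts)
print(p.round(3))            # the rare word gets more mass than its raw 1/111
```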
Context Distribution Smoothing
Shifted PMI
All Transferable Hyperparameters

Hyperparameter                    Explored Values        Applicable Methods
Preprocessing:
  Window                          2, 5, 10               All
  Dynamic Context Window          None, with             All
  Subsampling                     None, dirty, clean     All
  Deleting Rare Words             None, with             All
Association metric:
  Shifted PMI                     1, 5, 15               PPMI, SVD, SGNS
  Context Distribution Smoothing  1, 0.75                PPMI, SVD, SGNS
Postprocessing:
  Adding Context Vectors          Only w, w+c            SVD, SGNS, GloVe
  Eigenvalue Weighting            0, 0.5, 1              SVD
  Vector Normalization            None, row, col, both   All
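Shifted PMI is one association metric that transfers from SGNS to counting methods: subtract log k (mirroring SGNS's k negative samples) and clip at zero. A minimal sketch with a made-up co-occurrence matrix:

```python
import numpy as np

counts = np.array([[10.0, 0.0, 2.0],     # hypothetical word-context counts
                   [1.0, 5.0, 0.0]])
total = counts.sum()
p_wc = counts / total
p_w = counts.sum(axis=1, keepdims=True) / total
p_c = counts.sum(axis=0, keepdims=True) / total

with np.errstate(divide="ignore"):       # log(0) = -inf for unseen pairs
    pmi = np.log(p_wc / (p_w * p_c))

k = 5                                    # number of negative samples
sppmi = np.maximum(pmi - np.log(k), 0.0) # shifted positive PMI
print(sppmi.shape)  # (2, 3)
```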
Results: Word Similarity Tasks / Analogy Tasks
Key Takeaways
- This paper challenges the conventional wisdom that neural network-based models are superior to count-based models.
- While model design is important, hyperparameters are also KEY for achieving reasonable results. Don't discount their importance!
- Challenge the status quo!
4. Greedy transition-based parsing [Nivre 2003]

• A simple form of greedy discriminative dependency parser
• The parser does a sequence of bottom-up actions
  • Roughly like "shift" or "reduce" in a shift-reduce parser, but the "reduce" actions are specialized to create dependencies with head on left or right
• The parser has:
  • a stack σ, written with top to the right
    • which starts with the ROOT symbol
  • a buffer β, written with top to the left
    • which starts with the input sentence
  • a set of dependency arcs A
    • which starts off empty
  • a set of actions
Basic transition-based dependency parser

Start:  σ = [ROOT], β = w1, …, wn, A = ∅
1. Shift        σ, wi|β, A  ⇒  σ|wi, β, A
2. Left-Arc_r   σ|wi|wj, β, A  ⇒  σ|wj, β, A ∪ {r(wj, wi)}
3. Right-Arc_r  σ|wi|wj, β, A  ⇒  σ|wi, β, A ∪ {r(wi, wj)}
Finish: σ = [w], β = ∅
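The three transitions can be sketched directly in Python (a minimal illustration, not MaltParser's implementation), with the stack top at the right end of the list and the buffer front at the left:

```python
def shift(stack, buffer, arcs):
    # σ, wi|β, A  ⇒  σ|wi, β, A
    stack.append(buffer.pop(0))

def left_arc(stack, buffer, arcs, r):
    # σ|wi|wj, β, A  ⇒  σ|wj, β, A ∪ {r(wj, wi)}:
    # second-from-top becomes a dependent of the top
    wj = stack.pop()
    wi = stack.pop()
    arcs.add((r, wj, wi))      # (relation, head, dependent)
    stack.append(wj)

def right_arc(stack, buffer, arcs, r):
    # σ|wi|wj, β, A  ⇒  σ|wi, β, A ∪ {r(wi, wj)}:
    # top becomes a dependent of second-from-top
    wj = stack.pop()
    wi = stack.pop()
    arcs.add((r, wi, wj))
    stack.append(wi)

# The "I ate fish" derivation:
stack, buffer, arcs = ["ROOT"], ["I", "ate", "fish"], set()
shift(stack, buffer, arcs)              # [ROOT I]        | ate fish
shift(stack, buffer, arcs)              # [ROOT I ate]    | fish
left_arc(stack, buffer, arcs, "nsubj")  # [ROOT ate]      | fish
shift(stack, buffer, arcs)              # [ROOT ate fish] |
right_arc(stack, buffer, arcs, "obj")   # [ROOT ate]      |
right_arc(stack, buffer, arcs, "root")  # [ROOT]          |
print(sorted(arcs))
# [('nsubj', 'ate', 'I'), ('obj', 'ate', 'fish'), ('root', 'ROOT', 'ate')]
```

A real parser would choose the action at each step with a classifier; here the correct (oracle) sequence is hard-coded.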
Arc-standard transition-based parser (there are other transition schemes…)
Analysis of "I ate fish"

[root]           | I ate fish     Start
[root I]         | ate fish       Shift
[root I ate]     | fish           Shift
Arc-standard transition-based parser
Analysis of "I ate fish" (continued)

[root I ate]    | fish   Left-Arc    A += nsubj(ate → I)     ⇒  [root ate]      | fish
[root ate]      | fish   Shift                               ⇒  [root ate fish] |
[root ate fish] |        Right-Arc   A += obj(ate → fish)    ⇒  [root ate]      |
[root ate]      |        Right-Arc   A += root([root] → ate) ⇒  [root]          |   Finish
MaltParser [Nivre and Hall 2005]

• We have left to explain how we choose the next action
  • Each action is predicted by a discriminative classifier (often SVM; can be a perceptron or maxent classifier) over each legal move
  • Max of 3 untyped choices; max of |R| × 2 + 1 when typed
  • Features: top of stack word, POS; first in buffer word, POS; etc.
• There is NO search (in the simplest form)
  • But you can profitably do a beam search if you wish (slower but better)
• It provides VERY fast linear time parsing
• The model's accuracy is slightly below the best dependency parsers, but
• It provides fast, close to state of the art parsing performance
Feature Representation

Feature templates: usually a combination of 1 to 3 elements from the configuration.

Indicator features: binary, sparse; dim = 10^6 to 10^7
e.g., 0 0 0 1 0 0 1 0 0 0 1 0 …
Evaluation of Dependency Parsing: (labeled) dependency accuracy

ROOT She saw the video lecture
  0   1   2   3    4     5

Gold                      Parsed
1 2 She     nsubj         1 2 She     nsubj
2 0 saw     root          2 0 saw     root
3 5 the     det           3 4 the     det
4 5 video   nn            4 5 video   nsubj
5 2 lecture obj           5 2 lecture ccomp

Acc = # correct deps / # of deps
UAS = 4/5 = 80%
LAS = 2/5 = 40%
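The computation behind these numbers, as a quick Python check (tuples are (index, head, word, label)):

```python
# Gold and parsed dependencies for "ROOT She saw the video lecture"
gold = [(1, 2, "She", "nsubj"), (2, 0, "saw", "root"), (3, 5, "the", "det"),
        (4, 5, "video", "nn"), (5, 2, "lecture", "obj")]
parsed = [(1, 2, "She", "nsubj"), (2, 0, "saw", "root"), (3, 4, "the", "det"),
          (4, 5, "video", "nsubj"), (5, 2, "lecture", "ccomp")]

# UAS counts correct heads; LAS requires correct head and label
uas = sum(g[1] == p[1] for g, p in zip(gold, parsed)) / len(gold)
las = sum(g[1] == p[1] and g[3] == p[3] for g, p in zip(gold, parsed)) / len(gold)
print(uas, las)  # 0.8 0.4
```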
Dependency paths identify semantic relations, e.g., for protein interaction

[Erkan et al. EMNLP 07, Fundel et al. 2007, etc.]

KaiC ←nsubj interacts nmod:with→ SasA
KaiC ←nsubj interacts nmod:with→ SasA conj:and→ KaiA
KaiC ←nsubj interacts nmod:with→ SasA conj:and→ KaiB

[Figure: dependency parse of "The results demonstrated that KaiC interacts rhythmically with SasA, KaiA, and KaiB"]
Projectivity

• Dependencies parallel to a CFG tree must be projective
  • There must not be any crossing dependency arcs when the words are laid out in their linear order, with all arcs above the words
• But dependency theory normally does allow non-projective structures to account for displaced constituents
  • You can't easily get the semantics of certain constructions right without these non-projective dependencies

Who did Bill buy the coffee from yesterday?
Handling non-projectivity

• The arc-standard algorithm we presented only builds projective dependency trees
• Possible directions to head:
  1. Just declare defeat on non-projective arcs
  2. Use a dependency formalism which only admits projective representations (a CFG doesn't represent such structures…)
  3. Use a postprocessor to a projective dependency parsing algorithm to identify and resolve non-projective links
  4. Add extra transitions that can model at least most non-projective structures (e.g., add an extra SWAP transition, cf. bubble sort)
  5. Move to a parsing mechanism that does not use or require any constraints on projectivity (e.g., the graph-based MSTParser)
5. Why train a neural dependency parser? Indicator Features Revisited

• Problem #1: sparse
• Problem #2: incomplete
• Problem #3: expensive computation

More than 95% of parsing time is consumed by feature computation.

Our approach: learn a dense and compact feature representation
(dense, dim = ~1000, e.g., 0.1 0.9 −0.2 0.3 −0.1 −0.5 …)
A neural dependency parser [Chen and Manning 2014]

• English parsing to Stanford Dependencies:
  • Unlabeled attachment score (UAS) = head
  • Labeled attachment score (LAS) = head and label

Parser       UAS    LAS    sent./s
MaltParser   89.8   87.2   469
MSTParser    91.4   88.1   10
TurboParser  92.3*  89.6*  8
C & M 2014   92.0   89.7   654
Distributed Representations

• We represent each word as a d-dimensional dense vector (i.e., word embedding)
  • Similar words are expected to have close vectors.
• Meanwhile, part-of-speech tags (POS) and dependency labels are also represented as d-dimensional vectors.
  • The smaller discrete sets also exhibit many semantic similarities.

NNS (plural noun) should be close to NN (singular noun).
num (numerical modifier) should be close to amod (adjective modifier).

[Figure: embedding space in which come/go and were/was/is cluster together]
Extracting Tokens and then vector representations from configuration

• We extract a set of tokens based on the stack/buffer positions:

  position  s1    s2   b1       lc(s1)  rc(s1)  lc(s2)  rc(s2)
  word      good  has  control  ∅       ∅       He      ∅
  POS       JJ    VBZ  NN       ∅       ∅       PRP     ∅
  dep.      ∅     ∅    ∅        ∅       ∅       nsubj   ∅

• We convert them to vector embeddings and concatenate them
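The lookup-and-concatenate step can be sketched with random stand-in vectors (a toy illustration of the configuration above, not trained embeddings; the 50-dimensional size is an assumption):

```python
import numpy as np

d = 50
rng = np.random.default_rng(0)
# Random stand-ins for learned embedding tables
word_emb = {w: rng.standard_normal(d)
            for w in ["good", "has", "control", "He", "<NULL>"]}
pos_emb = {t: rng.standard_normal(d)
           for t in ["JJ", "VBZ", "NN", "PRP", "<NULL>"]}
dep_emb = {l: rng.standard_normal(d) for l in ["nsubj", "<NULL>"]}

# Tokens extracted at s1, s2, b1, lc(s1), rc(s1), lc(s2), rc(s2)
words = ["good", "has", "control", "<NULL>", "<NULL>", "He", "<NULL>"]
tags = ["JJ", "VBZ", "NN", "<NULL>", "<NULL>", "PRP", "<NULL>"]
deps = ["<NULL>", "<NULL>", "<NULL>", "<NULL>", "<NULL>", "nsubj", "<NULL>"]

x = np.concatenate([word_emb[w] for w in words]
                   + [pos_emb[t] for t in tags]
                   + [dep_emb[l] for l in deps])
print(x.shape)  # (1050,): 21 tokens, 50 dimensions each
```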
Model Architecture

Input layer x: lookup + concat
Hidden layer h: h = ReLU(Wx + b1)
Output layer y: y = softmax(Uh + b2)

Softmax probabilities out; the cross-entropy error is back-propagated all the way to the embeddings.
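The forward pass is a few lines of NumPy; a sketch with assumed toy sizes (random weights, 200 hidden units, 3 untyped actions):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    e = np.exp(z - z.max())          # shift for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
x = rng.standard_normal(1050)        # input layer: lookup + concat
W = 0.01 * rng.standard_normal((200, 1050))
b1 = np.zeros(200)
U = 0.01 * rng.standard_normal((3, 200))
b2 = np.zeros(3)

h = relu(W @ x + b1)                 # hidden layer
y = softmax(U @ h + b2)              # probabilities over transitions
print(y.shape)  # (3,)
```

Training would pick the transition with the highest probability at test time and back-propagate cross-entropy loss through U, W, and the embeddings.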
Non-linearities between layers: why they're needed

• For logistic regression: map to probabilities
• Here: function approximation, e.g., for regression or classification
• Without non-linearities, deep neural networks can't do anything more than a linear transform
  • Extra layers could just be compiled down into a single linear transform
• People use various non-linearities
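The "compiled down" point can be checked numerically: two stacked linear layers (with illustrative random matrices) equal one linear layer.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3))
W2 = rng.standard_normal((2, 4))
x = rng.standard_normal(3)

two_layers = W2 @ (W1 @ x)    # "deep" network with no non-linearity
one_layer = (W2 @ W1) @ x     # a single equivalent linear transform
print(np.allclose(two_layers, one_layer))  # True
```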
Non-linearities: what's used

logistic ("sigmoid"), tanh

tanh is just a rescaled and shifted sigmoid:
tanh(z) = 2 logistic(2z) − 1

tanh is often used and often performs better for deep nets
• Its output is symmetric around 0
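A quick numerical check of the identity on the slide:

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-3.0, 3.0, 7)
# tanh is a rescaled, shifted sigmoid: tanh(z) = 2*logistic(2z) - 1
print(np.allclose(np.tanh(z), 2 * logistic(2 * z) - 1))  # True
```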
Non-linearities: what's used

hard tanh, softsign, linear rectifier (ReLU)

softsign(z) = z / (1 + |z|)
rect(z) = max(z, 0)

• hard tanh is similar to but computationally cheaper than tanh, and saturates hard
• [Glorot and Bengio, AISTATS 2010, 2011] discuss softsign and rectifier
• The rectified linear unit is now mega-common; it transfers the linear signal if active
Dependency parsing for sentence structure

Neural networks can accurately determine the structure of sentences, supporting interpretation

Chen and Manning (2014) was the first simple, successful neural dependency parser

The dense representations let it outperform other greedy parsers in both accuracy and speed
Further developments in transition-based neural dependency parsing

This work was further developed and improved by others, including in particular at Google
• Bigger, deeper networks with better tuned hyperparameters
• Beam search
• Global, conditional random field (CRF)-style inference over the decision sequence

Leading to SyntaxNet and the Parsey McParseFace model
https://research.googleblog.com/2016/05/announcing-syntaxnet-worlds-most.html

Method                UAS     LAS (PTB WSJ SD 3.3)
Chen & Manning 2014   92.0    89.7
Weiss et al. 2015     93.99   92.05
Andor et al. 2016     94.61   92.79