CS378: Natural Language Processing
Lecture 10: Seq 3 / Syntax I
Greg Durrett
Announcements
‣ Midterm: list of topics next week. Covers content up to March 7
‣ A2 due today
‣ CRFs will NOT be on the midterm, a couple other topics too
‣ A3 out tomorrow
Today
‣ Conditional random fields
‣ Named entity recognition
‣ Syntax and constituency parsing
CRFs and NER
Named Entity Recognition
Barack Obama will travel to Hangzhou today for the G20 meeting.
[PERSON: Barack Obama; LOC: Hangzhou; ORG: G20]
B-PER I-PER O O O B-LOC O O O B-ORG O O
‣ Frame as a sequence problem with a BIO tagset: begin, inside, outside
‣ Why might an HMM not do so well here?
‣ Lots of O's, so tags aren't as informative about context
‣ Need sub-word features on unknown words
‣ CRFs are discriminative models that will solve these problems
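The BIO framing above can be sketched as a small conversion routine. The `(start, end, label)` span format (end exclusive) is an illustrative assumption, not something from the slides:

```python
def spans_to_bio(tokens, spans):
    """Convert labeled entity spans to a BIO tag sequence.

    spans: list of (start, end, label) with end exclusive.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = "B-" + label          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = "I-" + label          # continuation tokens
    return tags

tokens = "Barack Obama will travel to Hangzhou today for the G20 meeting .".split()
spans = [(0, 2, "PER"), (5, 6, "LOC"), (9, 10, "ORG")]
print(spans_to_bio(tokens, spans))
# ['B-PER', 'I-PER', 'O', 'O', 'O', 'B-LOC', 'O', 'O', 'O', 'B-ORG', 'O', 'O']
```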
Conditional Random Fields
‣ HMMs are expressible as Bayes nets (factor graphs):
[factor graph: chain y1 → y2 → … → yn, each yi emitting xi]
‣ This reflects the following decomposition:
P(y, x) = P(y_1) P(x_1 | y_1) P(y_2 | y_1) P(x_2 | y_2) …
‣ Locally normalized model: each factor is a probability distribution that normalizes
Conditional Random Fields
‣ CRFs: discriminative models with the following globally-normalized form:
P(y | x) = ∏_k exp(φ_k(x, y)) / ∑_{y'} ∏_k exp(φ_k(x, y'))
‣ Each φ_k is any real-valued scoring function of its arguments
‣ The denominator is the normalizer Z
‣ How do we max over y? Requires considering an exponential number of sequences in general
‣ HMMs: P(y, x) = P(y_1) P(x_1 | y_1) P(y_2 | y_1) P(x_2 | y_2) …
‣ Naive Bayes : logistic regression :: HMMs : CRFs
  (local vs. global normalization ↔ generative vs. discriminative)
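The globally-normalized form can be illustrated by brute force on a toy example: score a label sequence with arbitrary real-valued φ functions, then divide by Z, the sum over every possible label sequence. All potential values below are made-up toy numbers, and the enumeration is exactly the exponential cost that motivates dynamic programming:

```python
import itertools
import math

LABELS = ["O", "B", "I"]

# Toy log-potentials (illustrative made-up values, not from the slides)
PHI_T = {("O", "B"): 1.0, ("B", "I"): 1.5}   # transition scores
PHI_E = {("B", "Hangzhou"): 2.0}             # emission scores

def score(y, x):
    """Sum of transition and emission scores phi_k for one label sequence."""
    s = sum(PHI_T.get((y[i - 1], y[i]), 0.0) for i in range(1, len(y)))
    s += sum(PHI_E.get((yi, xi), 0.0) for yi, xi in zip(y, x))
    return s

def prob(y, x):
    """P(y|x) = exp(score(y, x)) / Z, with Z summed over ALL label sequences."""
    Z = sum(math.exp(score(yp, x))
            for yp in itertools.product(LABELS, repeat=len(x)))
    return math.exp(score(y, x)) / Z

x = ["to", "Hangzhou"]
total = sum(prob(y, x) for y in itertools.product(LABELS, repeat=len(x)))
# total is 1.0: globally normalized over all 3^2 = 9 label sequences
```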
Sequential CRFs
[factor graph: chain of label nodes y1 … yn over observations x1 … xn, with transition factors φ_t between adjacent labels, emission factors φ_e linking each y_i to x_i, and an initial factor φ_o on y1]
P(y | x) ∝ ∏_k exp(φ_k(x, y))
‣ Specializing to initial, transition, and emission factors:
P(y | x) ∝ exp(φ_o(y_1)) ∏_{i=2}^{n} exp(φ_t(y_{i-1}, y_i)) ∏_{i=1}^{n} exp(φ_e(x_i, y_i))
‣ HMMs: P(y, x) = P(y_1) P(x_1 | y_1) P(y_2 | y_1) P(x_2 | y_2) …
‣ CRFs: P(y | x) ∝ exp(φ_o(y_1)) ∏_{i=2}^{n} exp(φ_t(y_{i-1}, y_i)) ∏_{i=1}^{n} exp(φ_e(x_i, y_i))
Sequential CRFs
[factor graph repeated: chain over y1 … yn with factors φ_o, φ_t, φ_e]
‣ We condition on x, so every factor can depend on all of x:
∏_{i=1}^{n} exp(φ_e(y_i, i, x))
‣ The token index i lets us look at the current word
‣ y can't depend arbitrarily on x in a generative model
Sequential CRFs
‣ Don't include an initial distribution; it can be baked into the other factors
P(y | x) = (1/Z) ∏_{i=2}^{n} exp(φ_t(y_{i-1}, y_i)) ∏_{i=1}^{n} exp(φ_e(y_i, i, x))
Sequential CRFs: Feature Functions
P(y | x) = (1/Z) ∏_{i=2}^{n} exp(φ_t(y_{i-1}, y_i)) ∏_{i=1}^{n} exp(φ_e(y_i, i, x))
‣ Phis can be almost anything! Here we use linear functions of sparse features:
φ_t(y_{i-1}, y_i) = w⊤ f_t(y_{i-1}, y_i)        φ_e(y_i, i, x) = w⊤ f_e(y_i, i, x)
P(y | x) ∝ exp w⊤ [ ∑_{i=2}^{n} f_t(y_{i-1}, y_i) + ∑_{i=1}^{n} f_e(y_i, i, x) ]
‣ Looks like our single-weight-vector multiclass logistic regression model
Basic Features for NER
Barack Obama will travel to Hangzhou today for the G20 meeting.
P(y | x) ∝ exp w⊤ [ ∑_{i=2}^{n} f_t(y_{i-1}, y_i) + ∑_{i=1}^{n} f_e(y_i, i, x) ]
Transitions: f_t(y_{i-1}, y_i) = Ind[y_{i-1} & y_i] = Ind[O & B-LOC]
Emissions: f_e(y_6, 6, x) = Ind[B-LOC & Current word = Hangzhou], Ind[B-LOC & Prev word = to]
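The emission indicator features above can be sketched as a feature extractor. The string encoding of feature names is an illustrative assumption; any scheme that maps each active indicator to a weight index works:

```python
def emission_features(tag, i, tokens):
    """Sparse indicator features f_e(y_i, i, x): each string names one
    active feature conjoining the tag with a property of the input."""
    feats = [f"{tag}&curr={tokens[i]}"]          # Ind[tag & current word]
    if i > 0:
        feats.append(f"{tag}&prev={tokens[i-1]}")  # Ind[tag & previous word]
    return feats

tokens = "Barack Obama will travel to Hangzhou today for the G20 meeting .".split()
print(emission_features("B-LOC", 5, tokens))
# ['B-LOC&curr=Hangzhou', 'B-LOC&prev=to']
```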
CRFs Outline
‣ Model: P(y | x) = (1/Z) ∏_{i=2}^{n} exp(φ_t(y_{i-1}, y_i)) ∏_{i=1}^{n} exp(φ_e(y_i, i, x))
          P(y | x) ∝ exp w⊤ [ ∑_{i=2}^{n} f_t(y_{i-1}, y_i) + ∑_{i=1}^{n} f_e(y_i, i, x) ]
‣ Inference: argmax P(y | x) from Viterbi
‣ Learning: requires running forward-backward (the sum-product analogue of Viterbi) to compute posterior probabilities P(y_i | x) at each step i
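The Viterbi inference step can be sketched in pure Python. The dict-based representation of the φ_t and φ_e log-potentials is an assumption for illustration (missing entries score 0):

```python
def viterbi(tags, trans, emit, n):
    """Find argmax_y of the sequential CRF score.

    trans[(y_prev, y)] holds phi_t values; emit[(y, i)] holds phi_e values.
    """
    delta = {y: emit.get((y, 0), 0.0) for y in tags}   # best prefix score per tag
    back = [{} for _ in range(n)]                      # backpointers
    for i in range(1, n):
        new_delta = {}
        for y in tags:
            best_prev = max(tags, key=lambda yp: delta[yp] + trans.get((yp, y), 0.0))
            back[i][y] = best_prev
            new_delta[y] = (delta[best_prev]
                            + trans.get((best_prev, y), 0.0)
                            + emit.get((y, i), 0.0))
        delta = new_delta
    # follow backpointers from the best final tag
    y = max(tags, key=lambda t: delta[t])
    seq = [y]
    for i in range(n - 1, 0, -1):
        y = back[i][y]
        seq.append(y)
    return seq[::-1]
```

This runs in O(n T²) for T tags, versus the Tⁿ cost of enumerating all sequences.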
Features for NER
‣ Context features (can't use in HMM!)
  ‣ Words before/after
  ‣ Tags before/after
‣ Word features (can use in HMM)
  ‣ Capitalization
  ‣ Word shape
  ‣ Prefixes/suffixes
  ‣ Lexical indicators
‣ Gazetteers (e.g., Leicestershire, Boston)
‣ Word clusters
Apple released a new version…
According to the New York Times…
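One of the word features above, word shape, can be sketched as a simple character-class collapsing. The exact scheme varies by system; this runs-collapsed variant is one common choice, shown as an assumption:

```python
import re

def word_shape(word):
    """Collapse runs of uppercase, lowercase, and digit characters
    into X, x, and d respectively; other characters pass through."""
    shape = re.sub(r"[A-Z]+", "X", word)
    shape = re.sub(r"[a-z]+", "x", shape)
    shape = re.sub(r"[0-9]+", "d", shape)
    return shape

print(word_shape("Hangzhou"))   # Xx
print(word_shape("G20"))        # Xd
```

Shapes like these let the model generalize from seen capitalized words to unknown ones.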
Evaluating NER
Barack Obama will travel to Hangzhou today for the G20 meeting.
B-PER I-PER O O O B-LOC O O O B-ORG O O
‣ Predicting all Os still gets 66% accuracy on this example!
‣ What we really want to know: how many named entity chunk predictions did we get right?
‣ Precision: of the ones we predicted, how many are right?
‣ Recall: of the gold named entities, how many did we find?
‣ F-measure: harmonic mean of these two
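Chunk-level precision, recall, and F1 can be sketched as follows. This is a simplified scorer assuming well-formed BIO input, not the official CoNLL evaluation script:

```python
def bio_to_chunks(tags):
    """Extract (start, end, label) chunks from a BIO tag sequence."""
    chunks, start = set(), None
    for i, tag in enumerate(tags + ["O"]):          # sentinel flushes last chunk
        if start is not None and not tag.startswith("I-"):
            chunks.add((start, i, tags[start][2:]))
            start = None
        if tag.startswith("B-"):
            start = i
    return chunks

def chunk_f1(gold_tags, pred_tags):
    """Chunk-level F1: a predicted chunk counts only if its span AND label
    both exactly match a gold chunk."""
    gold, pred = bio_to_chunks(gold_tags), bio_to_chunks(pred_tags)
    tp = len(gold & pred)
    prec = tp / len(pred) if pred else 0.0
    rec = tp / len(gold) if gold else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0
```

Note that the all-O baseline with 66% token accuracy gets 0 F1: it predicts no chunks, so precision and recall are both zero.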
NER
‣ CRF with lexical features can get around 85 F1 on this problem
‣ Other pieces of information that many systems capture
‣ World knowledge:
The delegation met the president at the airport, Tanjug said.  (Tanjug: ORG? PER?)
Nonlocal Features
The delegation met the president at the airport, Tanjug said.
The news agency Tanjug reported on the outcome of the meeting.
‣ More complex factor graph structures can let you capture this, or just decode sentences in order and use features on previous sentences
Finkel and Manning (2008), Ratinov and Roth (2009)
How well do NER systems do?
‣ Ratinov and Roth (2009)
‣ Lample et al. (2016)
‣ BiLSTM-CRF + ELMo, Peters et al. (2018): 92.2 F1
Takeaways
‣ CRFs are structured feature-based models
‣ Efficient to do inference and learning using dynamic programs
‣ Looks like logistic regression, but requires more effort to implement
Constituency Parsing
Syntax
‣ Study of word order and how words form sentences
‣ Why do we care about syntax?
‣ Recognize verb-argument structures (who is doing what to whom?)
‣ Multiple interpretations of words (noun or verb? Fed raises… example)
‣ Higher level of abstraction beyond words: some languages are SVO, some are VSO, some are SOV; parsing can canonicalize
Constituency Parsing
‣ Tree-structured syntactic analyses of sentences
‣ Common things: noun phrases, verb phrases, prepositional phrases
‣ Bottom layer is POS tags
‣ Examples will be in English. Constituency makes sense for a lot of languages but not all
[Tree example labels: sentential complement (whole embedded sentence), adverbial phrase]
Constituency Parsing: Examples
Challenges
‣ PP attachment
‣ If we do no annotation, these trees differ only in one rule: VP → VP PP vs. NP → NP PP
‣ Parse will go one way or the other, regardless of words
‣ Lexicalization allows us to be sensitive to specific words
(same parse as "the cake with some icing")
Challenges
‣ NP internal structure: tags + depth of analysis
Constituency
‣ How do we know what the constituents are?
‣ Constituency tests:
‣ Substitution by proform (e.g., pronoun)
‣ Clefting (It was with a spoon that…)
‣ Answer ellipsis (What did they eat? the cake) (How? with a spoon)
‣ Sometimes constituency is not clear, e.g., coordination: she went to and bought food at the store
Context-Free Grammars, CKY
Survey
‣ 1. The pace of the first few lectures (naive Bayes, logistic regression, perceptron, etc.) was [too fast / too slow / just right]
‣ 2. The pace of the last few lectures (tagging, Viterbi, parsing) was [too fast / too slow / just right]
‣ 3. The homeworks overall are [too hard / too easy / just right]
‣ 4. I would prefer A3 be due on [Friday March 8 / Monday March 11] (midterm is on Thursday, March 14)
‣ 5. Other comments (likes/dislikes)