CS 6120/CS4120: Natural Language Processing · 2017-11-20 · CS 6120/CS4120: Natural Language...
![Page 1: CS 6120/CS4120: Natural Language Processing · 2017-11-20 · CS 6120/CS4120: Natural Language Processing Instructor: Prof. Lu Wang ... •How many calories are there in two slices](https://reader033.fdocuments.in/reader033/viewer/2022042212/5eb59d8ae2bff764543f06af/html5/thumbnails/1.jpg)
CS 6120/CS4120: Natural Language Processing
Instructor: Prof. Lu Wang
College of Computer and Information Science
Northeastern University
Webpage: www.ccs.neu.edu/home/luwang
![Page 2: CS 6120/CS4120: Natural Language Processing · 2017-11-20 · CS 6120/CS4120: Natural Language Processing Instructor: Prof. Lu Wang ... •How many calories are there in two slices](https://reader033.fdocuments.in/reader033/viewer/2022042212/5eb59d8ae2bff764543f06af/html5/thumbnails/2.jpg)
Question Answering
![Page 3: CS 6120/CS4120: Natural Language Processing · 2017-11-20 · CS 6120/CS4120: Natural Language Processing Instructor: Prof. Lu Wang ... •How many calories are there in two slices](https://reader033.fdocuments.in/reader033/viewer/2022042212/5eb59d8ae2bff764543f06af/html5/thumbnails/3.jpg)
Question Answering
Question: What do worms eat?
Potential answers:
• Worms eat grass
• Grass is eaten by worms
• Birds eat worms
• Horses with worms eat grass
[Figure: dependency trees for the question and for each potential answer]
One of the oldest NLP tasks (punched card systems in 1961)
Simmons, Klein, McConlogue. 1964. Indexing and Dependency Logic for Answering English Questions. American Documentation 15:30, 196-204
![Page 4: CS 6120/CS4120: Natural Language Processing · 2017-11-20 · CS 6120/CS4120: Natural Language Processing Instructor: Prof. Lu Wang ... •How many calories are there in two slices](https://reader033.fdocuments.in/reader033/viewer/2022042212/5eb59d8ae2bff764543f06af/html5/thumbnails/4.jpg)
Question Answering: IBM’s Watson
• Won Jeopardy on February 16, 2011!
WILLIAM WILKINSON’S “AN ACCOUNT OF THE PRINCIPALITIES OF WALLACHIA AND MOLDOVIA” INSPIRED THIS AUTHOR’S MOST FAMOUS NOVEL
Bram Stoker
![Page 5: CS 6120/CS4120: Natural Language Processing · 2017-11-20 · CS 6120/CS4120: Natural Language Processing Instructor: Prof. Lu Wang ... •How many calories are there in two slices](https://reader033.fdocuments.in/reader033/viewer/2022042212/5eb59d8ae2bff764543f06af/html5/thumbnails/5.jpg)
Apple’s Siri
![Page 6: CS 6120/CS4120: Natural Language Processing · 2017-11-20 · CS 6120/CS4120: Natural Language Processing Instructor: Prof. Lu Wang ... •How many calories are there in two slices](https://reader033.fdocuments.in/reader033/viewer/2022042212/5eb59d8ae2bff764543f06af/html5/thumbnails/6.jpg)
WolframAlpha
![Page 7: CS 6120/CS4120: Natural Language Processing · 2017-11-20 · CS 6120/CS4120: Natural Language Processing Instructor: Prof. Lu Wang ... •How many calories are there in two slices](https://reader033.fdocuments.in/reader033/viewer/2022042212/5eb59d8ae2bff764543f06af/html5/thumbnails/7.jpg)
Types of Questions in Modern Systems
• Factoid questions
  • Who wrote “The Universal Declaration of Human Rights”?
  • How many calories are there in two slices of apple pie?
  • What is the average age of the onset of autism?
  • Where is Apple Computer based?
• Complex (narrative) questions:
  • In children with an acute febrile illness, what is the efficacy of acetaminophen in reducing fever?
  • What do scholars think about Jefferson’s position on dealing with pirates?
![Page 8: CS 6120/CS4120: Natural Language Processing · 2017-11-20 · CS 6120/CS4120: Natural Language Processing Instructor: Prof. Lu Wang ... •How many calories are there in two slices](https://reader033.fdocuments.in/reader033/viewer/2022042212/5eb59d8ae2bff764543f06af/html5/thumbnails/8.jpg)
Commercial systems: mainly factoid questions
Where is the Louvre Museum located? In Paris, France
What’s the abbreviation for limited partnership? L.P.
What are the names of Odin’s ravens? Huginn and Muninn
What currency is used in China? The yuan
What kind of nuts are used in marzipan? almonds
What instrument does Max Roach play? drums
What is the telephone number for Stanford University? 650-723-2300
![Page 9: CS 6120/CS4120: Natural Language Processing · 2017-11-20 · CS 6120/CS4120: Natural Language Processing Instructor: Prof. Lu Wang ... •How many calories are there in two slices](https://reader033.fdocuments.in/reader033/viewer/2022042212/5eb59d8ae2bff764543f06af/html5/thumbnails/9.jpg)
Paradigms for QA
• Information Retrieval (IR)-based approaches
  • TREC; IBM Watson; Google
• Knowledge-based and Hybrid approaches
  • IBM Watson; Apple Siri; Wolfram Alpha; True Knowledge Evi
![Page 10: CS 6120/CS4120: Natural Language Processing · 2017-11-20 · CS 6120/CS4120: Natural Language Processing Instructor: Prof. Lu Wang ... •How many calories are there in two slices](https://reader033.fdocuments.in/reader033/viewer/2022042212/5eb59d8ae2bff764543f06af/html5/thumbnails/10.jpg)
Many questions can already be answered by web search
![Page 11: CS 6120/CS4120: Natural Language Processing · 2017-11-20 · CS 6120/CS4120: Natural Language Processing Instructor: Prof. Lu Wang ... •How many calories are there in two slices](https://reader033.fdocuments.in/reader033/viewer/2022042212/5eb59d8ae2bff764543f06af/html5/thumbnails/11.jpg)
IR-based Question Answering
![Page 12: CS 6120/CS4120: Natural Language Processing · 2017-11-20 · CS 6120/CS4120: Natural Language Processing Instructor: Prof. Lu Wang ... •How many calories are there in two slices](https://reader033.fdocuments.in/reader033/viewer/2022042212/5eb59d8ae2bff764543f06af/html5/thumbnails/12.jpg)
IR-based Factoid QA
[Figure: pipeline diagram. Question → Question Processing (Query Formulation, Answer Type Detection) → Document Retrieval over the indexed documents → relevant docs → Passage Retrieval → passages → Answer Processing → Answer]
![Page 13: CS 6120/CS4120: Natural Language Processing · 2017-11-20 · CS 6120/CS4120: Natural Language Processing Instructor: Prof. Lu Wang ... •How many calories are there in two slices](https://reader033.fdocuments.in/reader033/viewer/2022042212/5eb59d8ae2bff764543f06af/html5/thumbnails/13.jpg)
IR-based Factoid QA
• QUESTION PROCESSING
  • Detect question type, answer type, focus, relations
  • Formulate queries to send to a search engine
• PASSAGE RETRIEVAL
  • Retrieve ranked documents
  • Break into suitable passages and rerank
• ANSWER PROCESSING
  • Extract candidate answers
  • Rank candidates
    • using evidence from the text and external sources
![Page 14: CS 6120/CS4120: Natural Language Processing · 2017-11-20 · CS 6120/CS4120: Natural Language Processing Instructor: Prof. Lu Wang ... •How many calories are there in two slices](https://reader033.fdocuments.in/reader033/viewer/2022042212/5eb59d8ae2bff764543f06af/html5/thumbnails/14.jpg)
Knowledge-based approaches (Siri)
• Build a semantic representation of the query
  • Times, dates, locations, entities, numeric quantities
• Map from this semantics to query structured data or resources
  • Geospatial databases
  • Ontologies (Wikipedia infoboxes, DBpedia, WordNet, Yago)
  • Restaurant review sources and reservation services
  • Scientific databases
![Page 15: CS 6120/CS4120: Natural Language Processing · 2017-11-20 · CS 6120/CS4120: Natural Language Processing Instructor: Prof. Lu Wang ... •How many calories are there in two slices](https://reader033.fdocuments.in/reader033/viewer/2022042212/5eb59d8ae2bff764543f06af/html5/thumbnails/15.jpg)
Hybrid approaches (IBM Watson)
• Build a shallow semantic representation of the query
• Generate answer candidates using IR methods
  • Augmented with ontologies and semi-structured data
• Score each candidate using richer knowledge sources
  • Geospatial databases
  • Temporal reasoning
  • Taxonomical classification
![Page 16: CS 6120/CS4120: Natural Language Processing · 2017-11-20 · CS 6120/CS4120: Natural Language Processing Instructor: Prof. Lu Wang ... •How many calories are there in two slices](https://reader033.fdocuments.in/reader033/viewer/2022042212/5eb59d8ae2bff764543f06af/html5/thumbnails/16.jpg)
Answer Types and Query Formulation
![Page 17: CS 6120/CS4120: Natural Language Processing · 2017-11-20 · CS 6120/CS4120: Natural Language Processing Instructor: Prof. Lu Wang ... •How many calories are there in two slices](https://reader033.fdocuments.in/reader033/viewer/2022042212/5eb59d8ae2bff764543f06af/html5/thumbnails/17.jpg)
Factoid Q/A
[Figure: pipeline diagram. Question → Question Processing (Query Formulation, Answer Type Detection) → Document Retrieval over the indexed documents → relevant docs → Passage Retrieval → passages → Answer Processing → Answer]
![Page 18: CS 6120/CS4120: Natural Language Processing · 2017-11-20 · CS 6120/CS4120: Natural Language Processing Instructor: Prof. Lu Wang ... •How many calories are there in two slices](https://reader033.fdocuments.in/reader033/viewer/2022042212/5eb59d8ae2bff764543f06af/html5/thumbnails/18.jpg)
Question Processing
Things to extract from the question
• Answer Type Detection
  • Decide the named entity type (person, place) of the answer
• Query Formulation
  • Choose query keywords for the IR system
• Question Type classification
  • Is this a definition question, a math question, a list question?
• Focus Detection
  • Find the question words that are replaced by the answer
• Relation Extraction
  • Find relations between entities in the question
![Page 19: CS 6120/CS4120: Natural Language Processing · 2017-11-20 · CS 6120/CS4120: Natural Language Processing Instructor: Prof. Lu Wang ... •How many calories are there in two slices](https://reader033.fdocuments.in/reader033/viewer/2022042212/5eb59d8ae2bff764543f06af/html5/thumbnails/19.jpg)
Question Processing
Jeopardy!: They’re the two states you could be reentering if you’re crossing Florida’s northern border
• Answer Type: US state
• Query: two states, border, Florida, north
• Focus: the two states
• Relations: borders(Florida, ?x, north)
![Page 20: CS 6120/CS4120: Natural Language Processing · 2017-11-20 · CS 6120/CS4120: Natural Language Processing Instructor: Prof. Lu Wang ... •How many calories are there in two slices](https://reader033.fdocuments.in/reader033/viewer/2022042212/5eb59d8ae2bff764543f06af/html5/thumbnails/20.jpg)
Answer Type Detection: Named Entities
• Who founded Virgin Airlines?
  • PERSON
• What Canadian city has the largest population?
  • CITY
![Page 21: CS 6120/CS4120: Natural Language Processing · 2017-11-20 · CS 6120/CS4120: Natural Language Processing Instructor: Prof. Lu Wang ... •How many calories are there in two slices](https://reader033.fdocuments.in/reader033/viewer/2022042212/5eb59d8ae2bff764543f06af/html5/thumbnails/21.jpg)
Answer Type Taxonomy
• 6 coarse classes
  • ABBREVIATION, ENTITY, DESCRIPTION, HUMAN, LOCATION, NUMERIC
• 50 finer classes
  • LOCATION: city, country, mountain…
  • HUMAN: group, individual, title, description
  • ENTITY: animal, body, color, currency…
Xin Li, Dan Roth. 2002. Learning Question Classifiers. COLING'02
![Page 22: CS 6120/CS4120: Natural Language Processing · 2017-11-20 · CS 6120/CS4120: Natural Language Processing Instructor: Prof. Lu Wang ... •How many calories are there in two slices](https://reader033.fdocuments.in/reader033/viewer/2022042212/5eb59d8ae2bff764543f06af/html5/thumbnails/22.jpg)
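Part of Li & Roth’s Answer Type Taxonomy
[Figure: taxonomy tree; coarse classes with some of their finer classes]
• LOCATION: country, city, state
• NUMERIC: date, percent, money, size, distance
• ENTITY: food, currency, animal
• HUMAN: individual, title, group
• ABBREVIATION: abbreviation, expression
• DESCRIPTION: definition, reason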
![Page 23: CS 6120/CS4120: Natural Language Processing · 2017-11-20 · CS 6120/CS4120: Natural Language Processing Instructor: Prof. Lu Wang ... •How many calories are there in two slices](https://reader033.fdocuments.in/reader033/viewer/2022042212/5eb59d8ae2bff764543f06af/html5/thumbnails/23.jpg)
Answer Types
![Page 24: CS 6120/CS4120: Natural Language Processing · 2017-11-20 · CS 6120/CS4120: Natural Language Processing Instructor: Prof. Lu Wang ... •How many calories are there in two slices](https://reader033.fdocuments.in/reader033/viewer/2022042212/5eb59d8ae2bff764543f06af/html5/thumbnails/24.jpg)
More Answer Types
![Page 25: CS 6120/CS4120: Natural Language Processing · 2017-11-20 · CS 6120/CS4120: Natural Language Processing Instructor: Prof. Lu Wang ... •How many calories are there in two slices](https://reader033.fdocuments.in/reader033/viewer/2022042212/5eb59d8ae2bff764543f06af/html5/thumbnails/25.jpg)
Answer types in Jeopardy
• 2500 answer types in 20,000 Jeopardy question sample
• The most frequent 200 answer types cover < 50% of data
• The 40 most frequent Jeopardy answer types: he, country, city, man, film, state, she, author, group, here, company, president, capital, star, novel, character, woman, river, island, king, song, part, series, sport, singer, actor, play, team, show, actress, animal, presidential, composer, musical, nation, book, title, leader, game
Ferrucci et al. 2010. Building Watson: An Overview of the DeepQA Project. AI Magazine. Fall 2010. 59-79.
![Page 26: CS 6120/CS4120: Natural Language Processing · 2017-11-20 · CS 6120/CS4120: Natural Language Processing Instructor: Prof. Lu Wang ... •How many calories are there in two slices](https://reader033.fdocuments.in/reader033/viewer/2022042212/5eb59d8ae2bff764543f06af/html5/thumbnails/26.jpg)
Answer Type Detection
• Hand-written rules
• Machine Learning
• Hybrids
![Page 27: CS 6120/CS4120: Natural Language Processing · 2017-11-20 · CS 6120/CS4120: Natural Language Processing Instructor: Prof. Lu Wang ... •How many calories are there in two slices](https://reader033.fdocuments.in/reader033/viewer/2022042212/5eb59d8ae2bff764543f06af/html5/thumbnails/27.jpg)
Answer Type Detection
• Regular expression-based rules can get some cases:
  • Who {is|was|are|were} PERSON
  • PERSON (YEAR – YEAR)
• Other rules use the question headword (the headword of the first noun phrase after the wh-word):
  • Which city in China has the largest number of foreign financial companies?
  • What is the state flower of California?
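A minimal sketch of the two rule styles above, in Python. The rule table, the `HEADWORD_TYPES` mapping (including the `PLANT` label), and the word lists are assumptions made for illustration. The headword finder is a crude stand-in for a real parser: it skips auxiliaries and determiners after the wh-word, then takes the last word before the first preposition or verb in a small stop list.

```python
import re

# Assumed, illustrative rule table: (pattern, answer type).
WH_RULES = [
    (re.compile(r"^who\s+(is|was|are|were)\b", re.IGNORECASE), "PERSON"),
]

# Assumed mapping from question headword to answer type.
HEADWORD_TYPES = {"city": "CITY", "flower": "PLANT"}

SKIP = {"is", "are", "was", "were", "the", "a", "an"}
STOP = {"of", "in", "on", "at", "for", "has", "have", "had", "does", "do", "did"}


def question_headword(question):
    """Approximate the head of the first noun phrase after the wh-word."""
    tokens = re.findall(r"[A-Za-z]+", question.lower())
    chunk = []
    for tok in tokens[1:]:           # tokens[0] is the wh-word
        if tok in SKIP and not chunk:
            continue                 # skip leading auxiliaries and determiners
        if tok in STOP:
            break                    # the noun phrase ends here
        chunk.append(tok)
    return chunk[-1] if chunk else None


def answer_type(question):
    """Try the regular-expression rules first, then fall back to the headword."""
    for pattern, label in WH_RULES:
        if pattern.search(question.strip()):
            return label
    return HEADWORD_TYPES.get(question_headword(question), "UNKNOWN")


print(answer_type("What is the state flower of California?"))
```

A real system would replace `question_headword` with the output of a syntactic parser; the heuristic here only works for short, simply phrased questions.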
![Page 28: CS 6120/CS4120: Natural Language Processing · 2017-11-20 · CS 6120/CS4120: Natural Language Processing Instructor: Prof. Lu Wang ... •How many calories are there in two slices](https://reader033.fdocuments.in/reader033/viewer/2022042212/5eb59d8ae2bff764543f06af/html5/thumbnails/28.jpg)
Answer Type Detection
• Most often, we treat the problem as machine learning classification
  • Define a taxonomy of question types
  • Annotate training data for each question type
  • Train classifiers for each question class using a rich set of features.
    • features include those hand-written rules!
![Page 29: CS 6120/CS4120: Natural Language Processing · 2017-11-20 · CS 6120/CS4120: Natural Language Processing Instructor: Prof. Lu Wang ... •How many calories are there in two slices](https://reader033.fdocuments.in/reader033/viewer/2022042212/5eb59d8ae2bff764543f06af/html5/thumbnails/29.jpg)
Features for Answer Type Detection
• Question words and phrases
• Part-of-speech tags
• Parse features (headwords)
• Named Entities
• Semantically related words
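As a rough illustration of the first feature family in the list above (question words and phrases), the sketch below turns a question into a sparse feature dictionary that a classifier could consume. The feature-name prefixes (`wh=`, `w=`, `bi=`) are assumptions of this example. Part-of-speech, parse, named-entity, and semantic features are left out because they need external taggers.

```python
import re


def question_features(question):
    """Sparse binary features: wh-word, unigrams, and bigrams of the question."""
    tokens = re.findall(r"[a-z0-9']+", question.lower())
    feats = {}
    if tokens:
        feats["wh=" + tokens[0]] = 1          # the question word itself
    for tok in tokens:
        feats["w=" + tok] = 1                 # unigram features
    for left, right in zip(tokens, tokens[1:]):
        feats["bi=" + left + "_" + right] = 1  # bigram (phrase) features
    return feats


print(sorted(question_features("Who founded Virgin Airlines?")))
```

The output of a hand-written rule (for example, the answer type guessed by a regular expression) can be added to the same dictionary as one more feature.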
![Page 30: CS 6120/CS4120: Natural Language Processing · 2017-11-20 · CS 6120/CS4120: Natural Language Processing Instructor: Prof. Lu Wang ... •How many calories are there in two slices](https://reader033.fdocuments.in/reader033/viewer/2022042212/5eb59d8ae2bff764543f06af/html5/thumbnails/30.jpg)
Factoid Q/A
[Figure: pipeline diagram. Question → Question Processing (Query Formulation, Answer Type Detection) → Document Retrieval over the indexed documents → relevant docs → Passage Retrieval → passages → Answer Processing → Answer]
![Page 31: CS 6120/CS4120: Natural Language Processing · 2017-11-20 · CS 6120/CS4120: Natural Language Processing Instructor: Prof. Lu Wang ... •How many calories are there in two slices](https://reader033.fdocuments.in/reader033/viewer/2022042212/5eb59d8ae2bff764543f06af/html5/thumbnails/31.jpg)
Keyword Selection Algorithm
1. Select all non-stop words in quotations
2. Select all NNP words in recognized named entities
3. Select all complex nominals with their adjectival modifiers
4. Select all other complex nominals
5. Select all nouns with their adjectival modifiers
6. Select all other nouns
7. Select all verbs
8. Select all adverbs
9. Select the question focus word (skipped in all previous steps)
10. Select all other words
Dan Moldovan, Sanda Harabagiu, Marius Paşca, Rada Mihalcea, Richard Goodrum, Roxana Girju and Vasile Rus. 1999. Proceedings of TREC-8.
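The priority ordering in the list above can be sketched in Python as follows. This is a simplified illustration, not the published system: the input is assumed to be already tagged as `(word, POS tag, inside-quotes flag)` triples, the stop-word list is a toy one, and the complex-nominal, adjectival-modifier, and focus-word steps (3 to 5 and 9) are omitted because they need a parser. Only the resulting keyword order is meant to be compared with the lecture's examples.

```python
STOPWORDS = {"the", "a", "an", "in", "of", "his", "her", "who", "what"}


def step_for(word, pos, in_quotes):
    """Return the (simplified) selection step at which this word is picked up."""
    if in_quotes and word.lower() not in STOPWORDS:
        return 1                      # non-stop words in quotations
    if pos == "NNP":
        return 2                      # proper nouns (assumed to be in named entities)
    if pos.startswith("NN"):
        return 6                      # other nouns
    if pos.startswith("VB"):
        return 7                      # verbs
    if pos.startswith("RB"):
        return 8                      # adverbs
    return 10                         # everything else


def select_keywords(tagged, max_step=8):
    """Keep words selected at or before max_step, highest priority first."""
    scored = [(step_for(w, p, q), w) for (w, p, q) in tagged]
    kept = [(s, w) for (s, w) in scored if s <= max_step]
    kept.sort(key=lambda pair: pair[0])   # stable: ties keep sentence order
    return [w for _, w in kept]


# Hand-tagged version of: Who coined the term "cyberspace" in his novel "Neuromancer"?
QUESTION = [
    ("Who", "WP", False), ("coined", "VBD", False), ("the", "DT", False),
    ("term", "NN", False), ("cyberspace", "NN", True), ("in", "IN", False),
    ("his", "PRP$", False), ("novel", "NN", False), ("Neuromancer", "NNP", True),
]

print(select_keywords(QUESTION))
```

Lowering `max_step` gives a shorter, more precise query; raising it adds lower-priority words when the first query returns too few documents.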
![Page 32: CS 6120/CS4120: Natural Language Processing · 2017-11-20 · CS 6120/CS4120: Natural Language Processing Instructor: Prof. Lu Wang ... •How many calories are there in two slices](https://reader033.fdocuments.in/reader033/viewer/2022042212/5eb59d8ae2bff764543f06af/html5/thumbnails/32.jpg)
Choosing keywords from the query
Who coined the term “cyberspace” in his novel “Neuromancer”?
cyberspace/1 Neuromancer/1 term/4 novel/4 coined/7
Slide from Mihai Surdeanu
![Page 33: CS 6120/CS4120: Natural Language Processing · 2017-11-20 · CS 6120/CS4120: Natural Language Processing Instructor: Prof. Lu Wang ... •How many calories are there in two slices](https://reader033.fdocuments.in/reader033/viewer/2022042212/5eb59d8ae2bff764543f06af/html5/thumbnails/33.jpg)
Passage Retrieval and Answer Extraction
![Page 34: CS 6120/CS4120: Natural Language Processing · 2017-11-20 · CS 6120/CS4120: Natural Language Processing Instructor: Prof. Lu Wang ... •How many calories are there in two slices](https://reader033.fdocuments.in/reader033/viewer/2022042212/5eb59d8ae2bff764543f06af/html5/thumbnails/34.jpg)
Factoid Q/A
[Figure: pipeline diagram. Question → Question Processing (Query Formulation, Answer Type Detection) → Document Retrieval over the indexed documents → relevant docs → Passage Retrieval → passages → Answer Processing → Answer]
![Page 35: CS 6120/CS4120: Natural Language Processing · 2017-11-20 · CS 6120/CS4120: Natural Language Processing Instructor: Prof. Lu Wang ... •How many calories are there in two slices](https://reader033.fdocuments.in/reader033/viewer/2022042212/5eb59d8ae2bff764543f06af/html5/thumbnails/35.jpg)
Passage Retrieval
• Step 1: IR engine retrieves documents using query terms
• Step 2: Segment the documents into shorter units
  • something like paragraphs
• Step 3: Passage ranking
  • Use answer type to help rerank passages
![Page 36: CS 6120/CS4120: Natural Language Processing · 2017-11-20 · CS 6120/CS4120: Natural Language Processing Instructor: Prof. Lu Wang ... •How many calories are there in two slices](https://reader033.fdocuments.in/reader033/viewer/2022042212/5eb59d8ae2bff764543f06af/html5/thumbnails/36.jpg)
Features for Passage Ranking
• Number of Named Entities of the right type in passage
• Number of query words in passage
• Number of question N-grams also in passage
• Proximity of query keywords to each other in passage
• Longest sequence of question words
• Rank of the document containing passage
Either in rule-based classifiers or with supervised machine learning
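A few of the passage-ranking features above can be computed with plain string processing, as in the Python sketch below. The function and feature names are assumptions of this example. Bigrams stand in for question N-grams, and the "longest sequence" feature is read here as the longest run of consecutive passage tokens that are all query keywords. The named-entity and proximity features are omitted because they need an NER tagger and a distance definition the slides do not specify.

```python
import re


def tokens(text):
    """Lowercased alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def passage_features(question_keywords, passage, doc_rank):
    q = [k.lower() for k in question_keywords]
    p = tokens(passage)
    qset = set(q)

    # Number of distinct query words that appear in the passage.
    query_words = sum(1 for w in qset if w in p)

    # Number of question bigrams that also appear in the passage.
    shared_bigrams = len(set(zip(q, q[1:])) & set(zip(p, p[1:])))

    # Longest run of consecutive passage tokens that are query keywords.
    longest = run = 0
    for w in p:
        run = run + 1 if w in qset else 0
        longest = max(longest, run)

    return {
        "query_words": query_words,
        "shared_bigrams": shared_bigrams,
        "longest_keyword_run": longest,
        "doc_rank": doc_rank,
    }


print(passage_features(
    ["queen", "victoria", "second", "son"],
    "Alfred, the second son of Queen Victoria and Prince Albert",
    doc_rank=1,
))
```

These values could feed either a hand-written scoring rule or a supervised ranker, as the list above notes.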
![Page 37: CS 6120/CS4120: Natural Language Processing · 2017-11-20 · CS 6120/CS4120: Natural Language Processing Instructor: Prof. Lu Wang ... •How many calories are there in two slices](https://reader033.fdocuments.in/reader033/viewer/2022042212/5eb59d8ae2bff764543f06af/html5/thumbnails/37.jpg)
Factoid Q/A
[Figure: pipeline diagram. Question → Question Processing (Query Formulation, Answer Type Detection) → Document Retrieval over the indexed documents → relevant docs → Passage Retrieval → passages → Answer Processing → Answer]
![Page 38: CS 6120/CS4120: Natural Language Processing · 2017-11-20 · CS 6120/CS4120: Natural Language Processing Instructor: Prof. Lu Wang ... •How many calories are there in two slices](https://reader033.fdocuments.in/reader033/viewer/2022042212/5eb59d8ae2bff764543f06af/html5/thumbnails/38.jpg)
Answer Extraction
• Run an answer-type named-entity tagger on the passages
  • Each answer type requires a named-entity tagger that detects it
  • If answer type is CITY, tagger has to tag CITY
  • Can be full NER, simple regular expressions, or hybrid
• Return the string with the right type:
  • Who is the prime minister of India (PERSON)
    Manmohan Singh, Prime Minister of India, had told left leaders that the deal would not be renegotiated.
  • How tall is Mt. Everest? (LENGTH)
    The official height of Mount Everest is 29035 feet
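For answer types with a regular surface form, the "simple regular expressions" option above can be sketched as below. The `TAGGERS` table and its two patterns are assumptions made for illustration. Types such as PERSON or CITY would normally need a trained NER tagger and are not covered.

```python
import re

# Assumed, illustrative per-type taggers built from regular expressions.
TAGGERS = {
    "LENGTH": re.compile(
        r"\b\d[\d,]*(?:\.\d+)?\s*(?:feet|foot|ft|meters|metres|miles|km)\b",
        re.IGNORECASE,
    ),
    "YEAR": re.compile(r"\b(?:1[0-9]{3}|20[0-9]{2})\b"),
}


def extract_candidates(answer_type, passage):
    """Return every string in the passage that the tagger for answer_type matches."""
    tagger = TAGGERS.get(answer_type)
    if tagger is None:
        raise ValueError("no tagger for answer type: " + answer_type)
    return [m.group(0) for m in tagger.finditer(passage)]


print(extract_candidates(
    "LENGTH", "The official height of Mount Everest is 29035 feet"
))
```

When a passage yields more than one match, the candidates still have to be ranked, which is the next step in the pipeline.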
![Page 39: CS 6120/CS4120: Natural Language Processing · 2017-11-20 · CS 6120/CS4120: Natural Language Processing Instructor: Prof. Lu Wang ... •How many calories are there in two slices](https://reader033.fdocuments.in/reader033/viewer/2022042212/5eb59d8ae2bff764543f06af/html5/thumbnails/39.jpg)
Ranking Candidate Answers
• But what if there are multiple candidate answers!
Q: Who was Queen Victoria’s second son?
• Answer Type: Person
• Passage: The Marie biscuit is named after Marie Alexandrovna, the daughter of Czar Alexander II of Russia and wife of Alfred, the second son of Queen Victoria and Prince Albert
![Page 40: CS 6120/CS4120: Natural Language Processing · 2017-11-20 · CS 6120/CS4120: Natural Language Processing Instructor: Prof. Lu Wang ... •How many calories are there in two slices](https://reader033.fdocuments.in/reader033/viewer/2022042212/5eb59d8ae2bff764543f06af/html5/thumbnails/40.jpg)
![Page 41: CS 6120/CS4120: Natural Language Processing · 2017-11-20 · CS 6120/CS4120: Natural Language Processing Instructor: Prof. Lu Wang ... •How many calories are there in two slices](https://reader033.fdocuments.in/reader033/viewer/2022042212/5eb59d8ae2bff764543f06af/html5/thumbnails/41.jpg)
Use machine learning: Features for ranking candidate answers
• Answer type match: Candidate contains a phrase with the correct answer type.
• Pattern match: Regular expression pattern matches the candidate.
• Question keywords: # of question keywords in the candidate.
• Keyword distance: Distance in words between the candidate and query keywords.
• Novelty factor: A word in the candidate is not in the query.
• Apposition features: The candidate is an appositive to question terms.
• Punctuation location: The candidate is immediately followed by a comma, period, quotation marks, semicolon, or exclamation mark.
• Sequences of question terms: The length of the longest sequence of question terms that occurs in the candidate answer.
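Four of the features above (question keywords, keyword distance, novelty factor, punctuation location) need only tokenization, so they can be sketched directly. The function names and the exact distance definition are assumptions of this example: distance is counted in tokens, punctuation included, from the candidate to the nearest query keyword outside it.

```python
import re

PUNCT = {",", ".", ";", "!", '"', "”"}


def tokenize(text):
    """Split into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def find_span(toks, cand):
    """Return (start, end) of the first occurrence of cand in toks, or None."""
    n = len(cand)
    for i in range(len(toks) - n + 1):
        if toks[i:i + n] == cand:
            return i, i + n
    return None


def candidate_features(candidate, passage, keywords):
    toks = tokenize(passage)
    cand = tokenize(candidate)
    span = find_span(toks, cand)
    if span is None:
        raise ValueError("candidate not found in passage")
    start, end = span
    kw = {k.lower() for k in keywords}
    # Positions of query keywords outside the candidate span.
    outside = [i for i, t in enumerate(toks)
               if t.lower() in kw and not (start <= i < end)]
    distance = min(
        ((start - i) if i < start else (i - end + 1)) for i in outside
    ) if outside else None
    return {
        "question_keywords": sum(1 for t in cand if t.lower() in kw),
        "novelty": any(t.lower() not in kw for t in cand),
        "followed_by_punct": end < len(toks) and toks[end] in PUNCT,
        "keyword_distance": distance,
    }


PASSAGE = "wife of Alfred, the second son of Queen Victoria and Prince Albert"
KEYWORDS = ["queen", "victoria", "second", "son"]
print(candidate_features("Alfred", PASSAGE, KEYWORDS))
```

On this fragment, the novelty and punctuation features separate "Alfred" from "Victoria" and "Albert", which is the kind of evidence a learned ranker would weigh together with the apposition and answer-type features.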
![Page 42: CS 6120/CS4120: Natural Language Processing · 2017-11-20 · CS 6120/CS4120: Natural Language Processing Instructor: Prof. Lu Wang ... •How many calories are there in two slices](https://reader033.fdocuments.in/reader033/viewer/2022042212/5eb59d8ae2bff764543f06af/html5/thumbnails/42.jpg)
Candidate Answer scoring in IBM Watson
• Each candidate answer gets scores from >50 components
  • (from unstructured text, semi-structured text, triple stores)
  • logical form (parse) match between question and candidate
  • passage source reliability
  • geospatial location
    • California is “southwest of Montana”
  • temporal relationships
  • taxonomic classification
![Page 43: CS 6120/CS4120: Natural Language Processing · 2017-11-20 · CS 6120/CS4120: Natural Language Processing Instructor: Prof. Lu Wang ... •How many calories are there in two slices](https://reader033.fdocuments.in/reader033/viewer/2022042212/5eb59d8ae2bff764543f06af/html5/thumbnails/43.jpg)
Common Evaluation Metrics
1. Accuracy (does answer match gold-labeled answer?)
2. Mean Reciprocal Rank
  • For each query return a ranked list of M candidate answers.
  • Query score is 1/Rank of the first correct answer
    • If first answer is correct: 1
    • else if second answer is correct: ½
    • else if third answer is correct: ⅓, etc.
    • Score is 0 if none of the M answers are correct
  • Take the mean over all N queries

$$\mathrm{MRR} = \frac{\sum_{i=1}^{N} \frac{1}{\mathrm{rank}_i}}{N}$$
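The Mean Reciprocal Rank definition above translates directly into code. In this sketch each query is represented as a pair of a ranked candidate list and a set of gold answers; that representation and the function names are assumptions of the example.

```python
def reciprocal_rank(ranked, gold):
    """1/rank of the first correct answer, or 0 if none of the answers is correct."""
    for rank, answer in enumerate(ranked, start=1):
        if answer in gold:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Mean of the per-query reciprocal ranks over all N queries."""
    if not runs:
        raise ValueError("need at least one query")
    return sum(reciprocal_rank(ranked, gold) for ranked, gold in runs) / len(runs)


runs = [
    (["a", "b"], {"a"}),        # first answer correct  -> 1
    (["x", "y", "z"], {"y"}),   # second answer correct -> 1/2
    (["p", "q"], {"r"}),        # no correct answer     -> 0
]
print(mean_reciprocal_rank(runs))
```

With these three queries the per-query scores are 1, ½ and 0, so the mean is (1 + ½ + 0) / 3 = 0.5.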
![Page 44: CS 6120/CS4120: Natural Language Processing · 2017-11-20 · CS 6120/CS4120: Natural Language Processing Instructor: Prof. Lu Wang ... •How many calories are there in two slices](https://reader033.fdocuments.in/reader033/viewer/2022042212/5eb59d8ae2bff764543f06af/html5/thumbnails/44.jpg)
Knowledge in QA
![Page 45: CS 6120/CS4120: Natural Language Processing · 2017-11-20 · CS 6120/CS4120: Natural Language Processing Instructor: Prof. Lu Wang ... •How many calories are there in two slices](https://reader033.fdocuments.in/reader033/viewer/2022042212/5eb59d8ae2bff764543f06af/html5/thumbnails/45.jpg)
Relation Extraction
• Answers: Databases of Relations
  • born-in(“Emma Goldman”, “June 27 1869”)
  • author-of(“Cao Xue Qin”, “Dream of the Red Chamber”)
  • Draw from Wikipedia infoboxes, DBpedia, FreeBase, etc.
• Questions: Extracting Relations in Questions
  Whose granddaughter starred in E.T.?
  (acted-in ?x “E.T.”)
  (granddaughter-of ?x ?y)
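Once the question has been reduced to the two relations above, answering it amounts to joining them on `?x` and returning `?y`. The sketch below does this over a tiny in-memory triple list. The triples are a hand-written toy database for illustration, not an extract from DBpedia or Freebase, and the helper names are assumptions.

```python
# Toy triple store: (relation, subject, object).
TRIPLES = [
    ("acted-in", "Drew Barrymore", "E.T."),
    ("acted-in", "Henry Thomas", "E.T."),
    ("granddaughter-of", "Drew Barrymore", "John Barrymore"),
]


def subjects(relation, obj):
    """All ?x such that (relation ?x obj) is in the store."""
    return [s for (r, s, o) in TRIPLES if r == relation and o == obj]


def objects(relation, subj):
    """All ?y such that (relation subj ?y) is in the store."""
    return [o for (r, s, o) in TRIPLES if r == relation and s == subj]


def whose_granddaughter_starred_in(film):
    """Join (acted-in ?x film) with (granddaughter-of ?x ?y) and return every ?y."""
    answers = []
    for actor in subjects("acted-in", film):
        answers.extend(objects("granddaughter-of", actor))
    return answers


print(whose_granddaughter_starred_in("E.T."))
```

Henry Thomas matches the first relation but has no `granddaughter-of` triple, so the join drops him and only one binding for `?y` remains.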
![Page 46: CS 6120/CS4120: Natural Language Processing · 2017-11-20 · CS 6120/CS4120: Natural Language Processing Instructor: Prof. Lu Wang ... •How many calories are there in two slices](https://reader033.fdocuments.in/reader033/viewer/2022042212/5eb59d8ae2bff764543f06af/html5/thumbnails/46.jpg)
Temporal Reasoning
• Relation databases
  • (and obituaries, biographical dictionaries, etc.)
• IBM Watson
  “In 1594 he took a job as a tax collector in Andalusia”
  Candidates:
  • Thoreau is a bad answer (born in 1817)
  • Cervantes is possible (was alive in 1594)
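The temporal check in the example above can be sketched as a filter over candidate answers: a person is only a plausible answer if they were alive in the year the clue mentions. The `LIFESPANS` table is a hand-entered stand-in for a relation database, and the function name is an assumption. Candidates with no known lifespan are kept, since they cannot be ruled out.

```python
# Hand-entered (birth year, death year) pairs standing in for a relation database.
LIFESPANS = {
    "Cervantes": (1547, 1616),
    "Thoreau": (1817, 1862),
}


def plausible_candidates(candidates, event_year):
    """Drop candidates whose known lifespan does not include event_year."""
    keep = []
    for name in candidates:
        span = LIFESPANS.get(name)
        if span is None:
            keep.append(name)        # unknown dates: cannot be ruled out
            continue
        born, died = span
        if born <= event_year <= died:
            keep.append(name)
    return keep


print(plausible_candidates(["Thoreau", "Cervantes"], 1594))
```

In a system like the one described, this kind of check would more likely contribute one score among many than act as a hard filter.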
![Page 47: CS 6120/CS4120: Natural Language Processing · 2017-11-20 · CS 6120/CS4120: Natural Language Processing Instructor: Prof. Lu Wang ... •How many calories are there in two slices](https://reader033.fdocuments.in/reader033/viewer/2022042212/5eb59d8ae2bff764543f06af/html5/thumbnails/47.jpg)
Geospatial knowledge (containment, directionality, borders)
• Beijing is a good answer for “Asian city”
• California is “southwest of Montana”
• geonames.org:
![Page 48: CS 6120/CS4120: Natural Language Processing · 2017-11-20 · CS 6120/CS4120: Natural Language Processing Instructor: Prof. Lu Wang ... •How many calories are there in two slices](https://reader033.fdocuments.in/reader033/viewer/2022042212/5eb59d8ae2bff764543f06af/html5/thumbnails/48.jpg)
Context and Conversation in Virtual Assistants like Siri
• Coreference helps resolve ambiguities
  U: “Book a table at Il Fornaio at 7:00 with my mom”
  U: “Also send her an email reminder”
• Clarification questions:
  U: “Chicago pizza”
  S: “Did you mean pizza restaurants in Chicago or Chicago-style pizza?”