Entity and Aspect Extraction for Organizing News Comments · analysis and lexical database search....
Transcript of Entity and Aspect Extraction for Organizing News Comments · analysis and lexical database search....
EntityandAspectExtractionforOrganizingNewsComments
RadityoEkoPrasojo,MounaKacimi&WernerNuttEMCLWorkshop,11–12February2016
CommentsinNewsWebsiteTypically, commentsarelistedbasedondate-timeandreplyrelation.
Problem:difficultytocatchtheflowofthediscussionsandtounderstandtheirmainpointsofagreementanddisagreement.
Example:whyisindependencegood/bad fortheScots?Willtheireconomy beaffected?
EMCLWorkshop2016 Prasojo,Kacimi,&Nutt- EntityandAspectExtractionforOrganizingNewsComments
2
Thereisaneedfororganizingcomments
to help users to:(1) have a better understandingof the viewpoints related to each topic(2) facilitate the participation in discussions and thus increase the
chance of acquiring new viewpoints
by clustering comments containing similar discussions:
• they talk about the same entities
• they argue about the same aspects of those entities.EMCLWorkshop2016 Prasojo,Kacimi,&Nutt- EntityandAspectExtractionfor
OrganizingNewsComments3
Contributions
• Improvementsonstate-of-the-artunsupervisedentityextractiontools(Zemanta,NERD,AIDAYago)• Addressedissues:noisesandlowcoverage(duetocoreferences)
• Introduced aspectextractioninnewsdomain• Previously:aspectsonlyonproductreviewdomain(Zhang&Liu,2014)
EMCLWorkshop2016 Prasojo,Kacimi,&Nutt- EntityandAspectExtractionforOrganizingNewsComments
4
EntityextractionBaseline:UnsupervisedTools
• Improvedbyapplying:• Entityfiltering• Namenormalization• EntitysearchonKB• Coreference resolution
EMCLWorkshop2016 Prasojo,Kacimi,&Nutt- EntityandAspectExtractionforOrganizingNewsComments
5
Tools:DbpediaStanfordCoreNLP
“Don't be afraid of Rasmussen or NATO, this is none of their business. If he ends up sending USS Ronald Reagan aircraft carrier to the coast of Scotland, then he should have done the same to Crimea.”
<http://dbpedia.org/resource/NATO>
<http://dbpedia.org/resource/Scotland><http://dbpedia.org/resource/USS_Ronald_Reagan>
“Don't be afraid of Rasmussen or NATO,this is none of their business. If he ends up sending USS Ronald Reagan aircraft carrier to the coast of Scotland, then he should have done the same to Crimea.”
<http://dbpedia.org/resource/Crimea>
<http://dbpedia.org/resource/Aircraft_Carrier>
“Don't be afraid of Rasmussen or NATO,this is none of their business. If he ends up sending USS Ronald Reagan aircraft carrier to the coast of Scotland, then he should have done the same to Crimea.”
EMCLWorkshop2016 Prasojo,Kacimi,&Nutt- EntityandAspectExtractionforOrganizingNewsComments
6
rdf:type ofNATO?à←dbo:PopulatedPlace
Anentity isaninstanceofsomewell-definedclass
rdf:type?à
←dbo:Shiprdf:type?à←dbo:Location
rdf:type?à←owl:Thing
EntityFiltering
“Don't be afraid of Rasmussen or NATO,this is none of their business. If he ends up sending USS Ronald Reagan aircraft carrier to the coast of Scotland, then he should have done the same to Crimea.”
EMCLWorkshop2016 Prasojo,Kacimi,&Nutt- EntityandAspectExtractionforOrganizingNewsComments
7
Weremoveentities thatdon’thaverdf:type otherthanowl:Thing andowl:Class
rdf:type ofNATO?à←dbo:PopulatedPlace
Anentity isaninstanceofsomewell-definedclass
rdf:type?à
←dbo:Shiprdf:type?à←dbo:Location
Risk:lowerrecall
rdf:type?à←owl:Thing
EntityFiltering
EMCLWorkshop2016 Prasojo,Kacimi,&Nutt- EntityandAspectExtractionforOrganizingNewsComments
8
Anentitymayappearusingnon-propernames(alias)
“Don't be afraid of Rasmussen or NATO, this is none of their business. If he ends up sending USS Ronald Reagan aircraft carrier to the coast of Scotland, then he should have done the same to Crimea.”
Lorem ips um dolor s it amet, consecte turadipisc ing el i t. Ut s usc ipi t lacus ac bibendum mol lis. Pe llenteas dsadas que t incitUnited States
Lorem ips um dolor s it amet, consecte turadipisc ing el i t. Ut s usc ipi t lacus ac bibendum mol lis. Pe llentesquetinc idunt vu lputate ligula a e ffic itur. Aliquam era tv olutpat. Al iquam sedenim et tortor
fringi l la asdasdasdasdas das dqwrqweqweqwr w qeqw eAnders Fogh Rasmussen.
Maecenas vehicula urnaeget m etus imperdie t com modo.
ScotlandRejectsIndependenceinReferendumLorem ipsumdolorsitamet,consectetur adipiscing elit.Ut suscipit lacusacbibendum mollis.Pellentesque tincidunt vulputate ligulaaefficitur.Aliquam erat
…………foaf:givenName,foaf:surname?à
←{‘Anders’, ‘Fogh’, ‘Rasmussen’}
dbpedia-owl:wikiPageRedirects of?à←{‘US’,‘USA’, ‘U.S.’,‘U.S.A.’,…}
NameNormalization
EMCLWorkshop2016 Prasojo,Kacimi,&Nutt- EntityandAspectExtractionforOrganizingNewsComments
9
Anentitymayappearusingnon-propernames(alias)
“Don't be afraid of Rasmussen or NATO, this is none of their business. If he ends up sending USS Ronald Reagan aircraft carrier to the coast of Scotland, then he should have done the same to Crimea.”
Lorem ips um dolor s it amet, consecte turadipisc ing el i t. Ut s usc ipi t lacus ac bibendum mol lis. Pe llenteas dsadas que t incitUnited States
Lorem ips um dolor s it amet, consecte turadipisc ing el i t. Ut s usc ipi t lacus ac bibendum mol lis. Pe llentesquetinc idunt vu lputate ligula a e ffic itur. Aliquam era tv olutpat. Al iquam sedenim et tortor
fringi l la asdasdasdasdas das dqwrqweqweqwr wqeqwe Anders Fogh Rasmussen. Maecenas
v eh icu la urna ege tm etus im perdie t com modo.
ScotlandRejectsIndependenceinReferendumLorem ipsumdolorsitamet,consectetur adipiscing elit.Ut suscipit lacusacbibendum mollis.Pellentesque tincidunt vulputate ligulaaefficitur.Aliquam erat
…………foaf:givenName,foaf:surname?à
←{‘Anders’, ‘Fogh’, ‘Rasmussen’}
dbpedia-owl:wikiPageRedirects of?à←{‘US’,‘USA’, ‘U.S.’,‘U.S.A.’,…}
Risk:lowerprecision
NameNormalization
EMCLWorkshop2016 Prasojo,Kacimi,&Nutt- EntityandAspectExtractionforOrganizingNewsComments
10
Sometimes,mappingforanaliascannotbefound
AllrelatedentitiesofNATOandtheiraliasesà
←dbprop:leaderName dbpedia:Anders_Fogh_Rasmussen,{‘AndersFogh’, ‘Rasmussen’}
Stringcomparison“Don't be afraid of Rasmussen or NATO,
Context-RelatedEntitySearch
EMCLWorkshop2016 Prasojo,Kacimi,&Nutt- EntityandAspectExtractionforOrganizingNewsComments
11
Sometimes,mappingforanaliascannotbefound
AllrelatedentitiesofNATOandtheiraliasesà
←dbprop:leaderName dbpedia:Anders_Fogh_Rasmussen,{‘AndersFogh’, ‘Rasmussen’}
Stringcomparison
Risk:lowerprecision
“Don't be afraid of Rasmussen or NATO,
Context-RelatedEntitySearch
EMCLWorkshop2016 Prasojo,Kacimi,&Nutt- EntityandAspectExtractionforOrganizingNewsComments
12
“Don't be afraid of Rasmussen or NATO, this is none of their business. If he ends up sending USS Ronald Reagan aircraft carrier to the coast of Scotland, then he should have done the same to Crimea.”
StanfordCoreNLP
Coreference Resolution
EMCLWorkshop2016 Prasojo,Kacimi,&Nutt- EntityandAspectExtractionforOrganizingNewsComments
13
“Don't be afraid of Rasmussen or NATO, this is none of their business. If he ends up sending USS Ronald Reagan aircraft carrier to the coast of Scotland, then he should have done the same to Crimea.”
StanfordCoreNLP
Risk:lowerprecision
Coreference resolution
Coreference Resolution
EntityExtraction– ExperimentSetup
• 10newsarticlesthatuseDISQUS• 100commentshavingthehighestwordcounts• 5studentsasentity andaspect annotators• Annotateddataasgroundtruth
EMCLWorkshop2016 Prasojo,Kacimi,&Nutt- EntityandAspectExtractionforOrganizingNewsComments
14
EntityExtraction– ExperimentResults
EMCLWorkshop2016 Prasojo,Kacimi,&Nutt- EntityandAspectExtractionforOrganizingNewsComments
15
Aspects
• ofproductentities• Theaspectsofanentitye arethecomponents andattributes ofe.(Zhang&Liu,2014)
• ofentitiesonnews• “MoreScots woulddefinitelyhavevoted nothanyes”(voting - action)• “OrlandoBloomisagoodactor”(acting - skill)• “…it’stherightthe rightofScottishPeople”(right - possession)• Other:components,attributes,andmoods
Anaspect isallwhatisarguableaboutanentityEMCLWorkshop2016 Prasojo,Kacimi,&Nutt- EntityandAspectExtractionfor
OrganizingNewsComments16
TypesofAspects
• “…it’stherightthe rightofScottishPeople.”(explicit)
• “Tesco is large.”(implicit)
• “Scotlandisavery beautiful country.”(semi-implicit)
• “TheScotscan vote howevertheywant.” (semi-implicit)
EMCLWorkshop2016 Prasojo,Kacimi,&Nutt- EntityandAspectExtractionforOrganizingNewsComments
17
aspect::right
aspect::employer
aspect::beauty
aspect::voting
ExtractionofExplicitAspects:ExploitingDependency
“…it’stherightthe rightofScottishPeople.”
EMCLWorkshop2016 Prasojo,Kacimi,&Nutt- EntityandAspectExtractionforOrganizingNewsComments
18
StanfordCoreNLP Annotation
Extractallnounphrasesthathaveannmod:of,nmod:in,nmod:on,ornmod:atrelationtowardstheentity inthesentence.
PrepositionalDependency
• Specifically,thecoreference resolutionpart
EMCLWorkshop2016 Prasojo,Kacimi,&Nutt- EntityandAspectExtractionforOrganizingNewsComments
19
ExtractionofExplicitAspects,CombinedwithEntityExtraction
“Don't be afraid of Rasmussen or NATO, this is none of their business.”
possessivedependency
ExtractionofImplicitAspects:Adjective-to-aspectMappingIdeafrom(Zhang&Liu,2014)
“Tesco islarge”
Inothercomments,wefoundasaspectsofTesco,qualifiedas“large”:• employer(2x)• backoffice(1x)• callcenteroperation(1x)
Weconclude:mostprobably,theemployer aspectofTesco wasmeant.Resultisfurtherimprovedby(1)takingintoaccountfrequentcontextwordsand(2)lexicalrelationsofadjective.
EMCLWorkshop2016 Prasojo,Kacimi,&Nutt- EntityandAspectExtractionforOrganizingNewsComments
20
ExtractionofSemi-ImplicitAspects(1)
• Semi-implicitaspects:implicitaspectsthatdon’thavemapping.• “Scotlandisavery beautiful country.”
beautiful beautyWesearchforanounphrasethatisconnectedtotheentityandthewordindicatingtheaspectusingalexicalrelation.EMCLWorkshop2016 Prasojo,Kacimi,&Nutt- EntityandAspectExtractionfor
OrganizingNewsComments21
WordNet:attribute
StanfordCoreNLP Annotation
Lexicaldatabasesearch Weconsiderfollowinglexicalrelations:attribute>pertainym>participleofverb>derivationallyrelatedform(drf)>seealso
ExtractionofSemi-ImplicitAspects(2)
• Generallycanbeusedtoidentifyaspects fromverb,adjective,ornoun.• Otherexamples:
beautiful beautyinstrumental instrumentvote votinginexcusable excusable justifiable
justification justify• Iftherearemultiplepossibleaspects forasingleword,weuseWordNet::Similaritytodecideforthebestone.
EMCLWorkshop2016 Prasojo,Kacimi,&Nutt- EntityandAspectExtractionforOrganizingNewsComments
22
WordNet:attribute
WordNet:pertainym
WordNet:derivationally relatedform
WordNet:antonym WordNet:similar to
WordNet:drfWordNet:drf
AspectExtraction–ExperimentandResults
EMCLWorkshop2016 Prasojo,Kacimi,&Nutt- EntityandAspectExtractionforOrganizingNewsComments
23
Visualization
EMCLWorkshop2016 Prasojo,Kacimi,&Nutt- EntityandAspectExtractionforOrganizingNewsComments
24
REPrasojo,MKacimi,FDarari.IEEEInfoVis.Chicago,25-30October2015.
demoisnowavailableatorcaestra.inf.unibz.it
Conclusion• Generalcontribution:aframeworkfororganizingnewscomment usingentityextractionandaspectextraction.
• Ourentityextraction: unsupervisedtools+entityfiltering,namenormalization,entitysearch,andcoreference resolution.
• Weextractexplicit,implicit,andsemi-implicit aspectsusinggrammaranalysisandlexicaldatabasesearch.
• Experimentshowsimprovementonbothentity andaspect extractioncomparedtobaselinetechnique.
EMCLWorkshop2016 Prasojo,Kacimi,&Nutt- EntityandAspectExtractionforOrganizingNewsComments
25
FutureWorks
• Addressinglimitations:• Conceptextractions• Idiomsandothermetaphoricalexpressions• Difficultcoreference (e.g.demonstrativepronouns)• Experimentsonmore,variousdata
• Completethemissingpieces:• Sentimentanalysisfornewscomments
EMCLWorkshop2016 Prasojo,Kacimi,&Nutt- EntityandAspectExtractionforOrganizingNewsComments
26
• EuropeanMaster’sProgramme inComputationalLogic
• To-KnowProject
• SIGIRStudentTravelGrant
EMCLWorkshop2016 Prasojo,Kacimi,&Nutt- EntityandAspectExtractionforOrganizingNewsComments
27
Acknowledgements
Thankyou!
EMCLWorkshop2016 Prasojo,Kacimi,&Nutt- EntityandAspectExtractionforOrganizingNewsComments
28
Extraslides
EMCLWorkshop2016 Prasojo,Kacimi,&Nutt- EntityandAspectExtractionforOrganizingNewsComments
29
Anentity canbe…
aperson,alocation,anorganization,oranywell-definedconcept suchasnationalities,languages,orwars.
EMCLWorkshop2016 Prasojo,Kacimi,&Nutt- EntityandAspectExtractionforOrganizingNewsComments
30
Entity Extraction Tasks
1. Recognition– throughpropernames/rigiddesignators(Coates-Stephens,1992)(Thielen,1995)(Nadeau&Sekine,2006)
2. Disambiguation – bymappingtoarepresentationinaKB
EMCLWorkshop2016 Prasojo,Kacimi,&Nutt- EntityandAspectExtractionforOrganizingNewsComments
31
“Scotland can vote however it wants, it's the Scottish peoples right.”
“If he ends up sending USS Ronald Reagan aircraft carrier to the coast of Scotland, then he should have done the same to Crimea.”
<http://dbpedia.org/resource/USS_Ronald_Reagan_(CVN-76)>
<http://dbpedia.org/resource/Crimea><http://dbpedia.org/resource/Scotland>
SupervisedvsUnsupervisedApproachestoEntityExtraction
AspectofEE
Prominent tools
Recognitionability
Disambiguation ability
Running time
Supervised
StanfordNLP
dependsonthetrainingset
limited
fast
UnsupervisedZemanta,AlchemyAPI,NERD,AidaYAGO
domain-independent
providedbyKBs
slow
EMCLWorkshop2016 Prasojo,Kacimi,&Nutt- EntityandAspectExtractionforOrganizingNewsComments
32
Thereisaneedfororganizingcommentsto help users to:(1) have a better understandingof the viewpoints related to each topic(2) facilitate the participation in discussions and thus increase the
chance of acquiring new viewpoints
Our contribution is a comment organization framework:
EMCLWorkshop2016 Prasojo,Kacimi,&Nutt- EntityandAspectExtractionforOrganizingNewsComments
33
Entityextraction Aspectextraction
Sentimentanalysis
Visualizationandorganization
ExtractionofImplicitAspects:Adjective-to-aspectMappingIdeafrom(Zhang&Liu,2014)
“Tesco islarge”
Inothercomments,wefoundasaspectsofTesco,qualifiedas“large”:• employer(2x)• backoffice(1x)• callcenteroperation(1x)
Weconclude:mostprobably,theemployer aspectofTesco wasmeant.EMCLWorkshop2016 Prasojo,Kacimi,&Nutt- EntityandAspectExtractionfor
OrganizingNewsComments34
ExtractionofImplicitAspects:ContextWordsSupposenowweneedatiebreaker!
“Tesco islargeandthepayisgood” contextwords:{large,pay,company}
Inothercomments,wefoundasaspectsofTesco,qualifiedas“large”:• employer(2x)contextwords:{employer,large,salary,job,people}• backoffice(2x)contextwords:{back,office,large,headquarter,building}• callcenteroperation(1x)
Weusecontext-words difference,computedusingWordNet:Similarity.Weconclude:mostprobably,theemployer aspectofTesco wasmeant.EMCLWorkshop2016 Prasojo,Kacimi,&Nutt- EntityandAspectExtractionfor
OrganizingNewsComments35
rdf:type
contextwords:{employer,large,salary,job,people}contextwords:{back,office,large,area,building}
contextwords:{large,pay,company},company
ExtractionofImplicitAspects:ExperimentandResult(1)
EMCLWorkshop2016 Prasojo,Kacimi,&Nutt- EntityandAspectExtractionforOrganizingNewsComments
36
Adjective-to-aspectmappingData Precision RecallAllNews 76.87 26.12
withcontextwordsPrecision Recall88.65 30.03
ExtractionofImplicitAspects:LexicalMapping
“Tesco islarge”
Inadditionto“large”,countalsoaspectsqualifiedwithsimilaradjectives(“huge”,“big”,…)
Similarityismeasuredusing1. lexicalrelationship<synonym>and<similarto>inWordNet2. WordNet::Similarity
EMCLWorkshop2016 Prasojo,Kacimi,&Nutt- EntityandAspectExtractionforOrganizingNewsComments
37
ExtractionofImplicitAspects:ExperimentandResult(2)
EMCLWorkshop2016 Prasojo,Kacimi,&Nutt- EntityandAspectExtractionforOrganizingNewsComments
38
Lexicalmapping(1-step)Data Precision RecallAllNews 87.07 32.72
2-stepsPrecision Recall83.33 32.76
ExtractionofSemi-ImplicitAspects(2)
• Ifwedon’tfindtheaspectwithin1stepoflexicalsearch:• “OrlandoBloomisagoodactor”
act actoracting
Wesearchfortheclosestnountothepseudo-aspect.EMCLWorkshop2016 Prasojo,Kacimi,&Nutt- EntityandAspectExtractionfor
OrganizingNewsComments39
WordNet:derivationally relatedformLexicaldatabasesearch Tofind intermediatesynsets,weconsider
following lexicalrelations:synonym>antonym>similarto>derivationallyrelatedform(drf) >seealsoWordNet:derivationally relatedform
StanfordCoreNLP Annotation
FindingAspects
1. Findwordsthatrepresenttheaspectjob,beautiful,large,acted<noun>,<adjective>,or<verb>
2. Identifytheaspectjob,beauty,employer,acting
EMCLWorkshop2016 Prasojo,Kacimi,&Nutt- EntityandAspectExtractionforOrganizingNewsComments
40
Technique:grammaranalysisTool:StanfordCoreNLP
Technique:frequency-basedmapping (implicit)lexicaldatabasesearch(semi-implicit)
Tools:WordNet,DBpedia