Thumbs up? Sentiment Classification using Machine Learning Techniques
Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. EMNLP-2002, 79–86.
Kalaiselvan Panneerselvam
Course 3111: Seminar: Data Analytics I
Outline
• What is Sentiment Analysis
• Sentiment Classification in Movie Reviews
• Baseline Classifier
• Supervised Learning Process
• Framework
• Feature Extraction
• Classifiers
• Results
• References
Terms
• Sentiment
  • A thought, view, or attitude, especially one based mainly on emotion instead of reason
• Sentiment Analysis
  • aka opinion mining
  • use of natural language processing (NLP) and computational techniques to automate the extraction or classification of sentiment from typically unstructured text
Sentiment Analysis: What is it used for?
• Natural language and text processing to identify and extract subjective information
• Classifying the polarity of a given text as positive, negative or neutral
• In general: to discover how people feel about a particular topic
Sentiment Analysis: Who is it used by?
• Consumer information
  • Product reviews
• Marketing
  • Consumer attitudes
  • Trends
• Politics
  • Politicians want to know voters' views
  • Voters want to know politicians' stances and who else supports them
• Social
  • Find like-minded individuals or communities
Bing Shopping
Sentiment Classification in Movie Reviews
• Polarity detection:
  • Is an IMDb movie review positive or negative?
• Data: Polarity Data 2.0:
  • http://www.cs.cornell.edu/people/pabo/movie-review-data
  • Use reviews with a star or numerical rating as training and test data.
Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. EMNLP-2002, 79–86.
IMDb data in the Pang and Lee database

✓ when _star wars_ came out some twenty years ago, the image of traveling throughout the stars has become a commonplace image. […] when han solo goes light speed, the stars change to bright lines, going towards the viewer in lines that converge at an invisible point. cool. _october sky_ offers a much simpler image – that of a single white dot, traveling horizontally across the night sky. [...]

✗ "snake eyes" is the most aggravating kind of movie: the kind that shows so much potential then becomes unbelievably disappointing. it's not just because this is a brian depalma film, and since he's a great director and one who's films are always greeted with at least some fanfare. and it's not even because this was a film starring nicolas cage and since he gives a brauvara performance, this film is hardly worth his talents.
The Data
• Internet Movie Database (IMDb) archive
• Limited data to:
  • Reviews with an author rating
  • Positive and negative reviews (no neutral)
  • 19 positive, 19 negative reviews per author
• Interim dataset:
  • 752 negative reviews
  • 1301 positive reviews
  • 144 reviewers represented
• Final dataset: 700 positive, 700 negative (uniform distribution)
Prior Work
• Prior classification based on:
  • source / source style
  • genre
  • knowledge-based approaches
  • semantic orientation using text categorization
Positive or negative movie review?
• unbelievably disappointing
• Full of zany characters and richly applied satire, and some great plot twists
• this is the greatest screwball comedy ever filmed
• It was pathetic. The worst part about it was the boxing scenes.
Baseline (Human Classifier)
• Crafted word lists using independent CS grad students
• Positive vs. negative word count

Human 1
  Positive list: dazzling, brilliant, phenomenal, excellent, fantastic
  Negative list: suck, terrible, awful, unwatchable, hideous
  Accuracy: 58%   Ties: 75%

Human 2
  Positive list: gripping, mesmerizing, riveting, spectacular, cool, awesome, thrilling, badass, excellent, moving, exciting
  Negative list: bad, clichéd, sucks, boring, stupid, slow
  Accuracy: 64%   Ties: 39%

• Frequency counts (including test data)
• Hand-picked words

Human 3 + stats
  Positive list: love, wonderful, best, great, superb, still, beautiful
  Negative list: bad, worst, stupid, waste, boring, ?, !
  Accuracy: 69%   Ties: 16%

*Tie rate: percentage of documents where the two sentiments were rated equally likely.
Thumbs up? Sentiment Classification using Machine Learning Techniques
• The experiment above shows it is worthwhile to explore corpus-based techniques, rather than relying on prior intuitions, to select good indicator features and to perform sentiment classification in general.
Supervised Learning Process
Framework
(Diagram: Movie Reviews → Develop Features → Training Model (NB, ME, SVM) → Evaluate Results → Extract Insights)
Framework (continued…)
Sentiment Tokenization Issues
• Deal with HTML and XML markup
• Twitter mark-up (names, hashtags)
• Capitalization (preserve for words in all caps)
• Phone numbers, dates and emoticons
• Commonly used tokenizers:
  • Christopher Potts sentiment tokenizer
  • Brendan O'Connor twitter tokenizer
Extracting Features for Sentiment Classification
• How to handle negation
  • "I didn't like this movie" vs. "I really like this movie"
• Which words to use?
  • Only adjectives
  • All words
• All words turns out to work better, at least on this data
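A common way to handle negation (used by Pang et al., following Das and Chen) is to prefix every token between a negation word and the next punctuation mark with "NOT_". A minimal sketch, where the negation and punctuation word lists are illustrative assumptions rather than the authors' exact lists:

```python
# Hypothetical word lists; the paper does not publish its exact negation lexicon.
NEGATIONS = {"not", "no", "never", "isn't", "didn't", "doesn't", "can't"}
PUNCTUATION = {".", ",", ";", "!", "?"}

def tag_negation(tokens):
    """Prefix tokens in the scope of a negation with 'NOT_' until punctuation."""
    tagged, negating = [], False
    for tok in tokens:
        if tok in PUNCTUATION:
            negating = False          # punctuation ends the negation scope
            tagged.append(tok)
        elif negating:
            tagged.append("NOT_" + tok)
        else:
            tagged.append(tok)
            if tok.lower() in NEGATIONS:
                negating = True       # start negating following tokens
    return tagged
```

With this, "I didn't like this movie" yields `NOT_like`, a different feature from the `like` in "I really like this movie".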
Extracting Features (continued…)
• Features
  • Unigrams: a single word.
  • Feature frequency: how often a feature appears.
  • Feature presence: 1 only when a feature appears at all.
  • Bigrams: two consecutive words.
  • Parts of speech: tag each word with its POS.
  • Adjectives: only use the adjectives in the text.
  • Position: the position of a word in the text (first quarter, last quarter, or the middle half).
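The three simplest extractors above can be sketched in a few lines (function names are mine, not from the paper):

```python
from collections import Counter

def frequency_features(tokens, vocab):
    """Feature frequency: how often each vocabulary feature appears."""
    counts = Counter(tokens)
    return {f: counts.get(f, 0) for f in vocab}

def presence_features(tokens, vocab):
    """Feature presence: 1 if the feature appears at all, else 0."""
    present = set(tokens)
    return {f: int(f in present) for f in vocab}

def bigrams(tokens):
    """Bigrams: two consecutive words joined into a single feature."""
    return [tokens[i] + "_" + tokens[i + 1] for i in range(len(tokens) - 1)]
```

For "great great film", presence gives `great: 1` where frequency gives `great: 2`; the paper's result is that the presence variant works better for sentiment.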
Classifiers
Naïve Bayes Classifiers
Naïve Bayes:
Given a document d, assign the class c* = argmax_c P(c | d). Assuming all features f_i are conditionally independent given the class:

  P_NB(c | d) = ( P(c) · ∏_i P(f_i | c)^{n_i(d)} ) / P(d)

where n_i(d) is the number of times f_i occurs in document d.
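The formula above translates directly into a tiny add-one-smoothed Naïve Bayes classifier. This is a sketch, not the paper's implementation, and the toy training corpus in the test is invented:

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (tokens, label). Returns a classify(tokens) function
    implementing c* = argmax_c log P(c) + sum_i n_i(d) * log P(f_i | c),
    with add-one (Laplace) smoothing for P(f_i | c)."""
    priors = Counter()
    word_counts = defaultdict(Counter)
    totals = Counter()
    vocab = set()
    for tokens, label in docs:
        priors[label] += 1
        word_counts[label].update(tokens)
        totals[label] += len(tokens)
        vocab.update(tokens)
    n_docs, v = len(docs), len(vocab)

    def classify(tokens):
        # summing log P(f|c) once per token occurrence realizes the n_i(d) exponent
        scores = {
            c: math.log(priors[c] / n_docs)
               + sum(math.log((word_counts[c][f] + 1) / (totals[c] + v))
                     for f in tokens if f in vocab)
            for c in priors
        }
        return max(scores, key=scores.get)

    return classify
```

P(d) is a constant per document and can be dropped from the argmax, as done here.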
Example
(worked Naïve Bayes example; the figures on these slides were not preserved)
Maximum Entropy
• Maximum Entropy is a technique for learning probability distributions from data.
• "Don't assume anything about your probability distribution other than what you have observed."
• Always choose the most uniform distribution subject to the observed constraints.
Maximum Entropy
• Maximum Entropy:

  P_ME(c | d) = (1 / Z(d)) · exp( Σ_i λ_{i,c} F_{i,c}(d, c) )

where Z(d) is a normalization function and the λ_{i,c} are feature-weight parameters; a larger λ_{i,c} means f_i is considered a strong indicator for class c.
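A toy illustration of the scoring step once the λ weights are known (training the weights via iterative scaling is omitted, and the example weights in the test are invented):

```python
import math

def maxent_prob(features, lambdas, classes):
    """features: set of active features f_i in document d.
    lambdas: dict (feature, class) -> weight lambda_{i,c} (hypothetical toy weights).
    Returns P_ME(c|d) = exp(sum_i lambda_{i,c}) / Z(d) for each class."""
    scores = {c: math.exp(sum(lambdas.get((f, c), 0.0) for f in features))
              for c in classes}
    z = sum(scores.values())  # normalization function Z(d)
    return {c: s / z for c, s in scores.items()}
```

Note the contrast with Naïve Bayes: no independence assumption is made, and overlapping features are handled naturally by the weights.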
Support Vector Machine
• Find a hyperplane that maximizes the margin between the two categories.
Scenario 1
Scenario 2
Results
• Results for the different features:
  • Unigrams work better than the baseline
  • Presence is better than frequency
  • The bigram feature does not improve performance
  • Adjectives alone are poor
  • POS tags improve results slightly for NB and ME, but hurt SVM
  • Position also does not help
Insights
• SVM is found to be the most accurate.
  • Not comparable to topic-based categorization models.
• Simple unigram presence works best.
  • Presence > frequency, unlike topic-based classification.
• Uncovered the "thwarted expectations" narrative:
  • "Okay, I'm really ashamed of it, but I enjoyed it. I mean, I admit it's a really awful movie."
References
• B. Pang, L. Lee, and S. Vaithyanathan, "Thumbs up? Sentiment classification using machine learning techniques," in Proc. Conf. on Empirical Methods in Natural Language Processing (EMNLP), pp. 79–86, 2002.
• A. Esuli and F. Sebastiani, "SentiWordNet: A publicly available lexical resource for opinion mining," in Proc. of LREC 2006, 5th Conf. on Language Resources and Evaluation, 2006.
• E. Zhang and Y. Zhang, "UCSC on TREC 2006 blog opinion mining," TREC 2006 Blog Track, Opinion Retrieval Task.
• A. Devitt and K. Ahmad, "Sentiment polarity identification in financial news: A cohesion-based approach," ACL 2007.
• B. Pang and L. Lee, "A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts," in Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics, pp. 271–278, July 21–26, 2004.
Twitter as a Corpus for Sentiment Analysis and Opinion Mining
Alexander Pak, Patrick Paroubek
Université de Paris-Sud, LREC, 2010
Abstract
• Millions of users share opinions on microblogging sites
• Rich data source for opinion mining and sentiment analysis
• Focus on Twitter corpus collection for sentiment analysis
• The corpus is used to build a sentiment classifier
• Claims that the proposed techniques work better than previous methods
Introduction
• Microblogging has become very popular nowadays
• People generally write about their life, share opinions and discuss current issues
• Users post about products & services, and express political and religious views
• Twitter has an enormous number of text posts, and it grows every day
• The audience varies from regular users to celebrities, politicians and even presidents
• This can be used for marketing or social studies
• The authors performed the following steps:
  • Collected 300,000 tweets, divided into 3 sets: positive, negative and neutral
  • Performed linguistic analysis of the collected corpus
  • Built a sentiment classifier
  • Experimental evaluation
Corpus Collection
• Used the Twitter API to collect positive, negative and objective posts
• For positive and negative sentiments, used the emoticon approach:
  • Happy emoticons: ":-)", ":)", "=)", ":D" etc.
  • Sad emoticons: ":-(", ":(", "=(", ";(" etc.
• For objective posts, tweets were pulled from 44 popular news outlets such as the 'Washington Post' and 'New York Times'
• Assumption: due to the short character limit, it was assumed that an emoticon represents the sentiment of the entire tweet
• English was used for the research; however, the method can be easily adapted to other languages
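The emoticon-based noisy labeling can be sketched as follows (the emoticon lists mirror the slide; returning `None` when both kinds, or neither, appear is my assumption, not spelled out in the paper):

```python
HAPPY = (":-)", ":)", "=)", ":D")
SAD = (":-(", ":(", "=(", ";(")

def emoticon_label(tweet):
    """Noisy sentiment label for a tweet based on its emoticons;
    returns None when the label would be ambiguous or missing."""
    has_happy = any(e in tweet for e in HAPPY)
    has_sad = any(e in tweet for e in SAD)
    if has_happy and not has_sad:
        return "positive"
    if has_sad and not has_happy:
        return "negative"
    return None
```

The labels are noisy by construction, which is why the corpus needs to be large.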
Corpus Analysis
• The word frequency distribution follows Zipf's law
• This confirms a proper characteristic of the corpus
• Use TreeTagger (Schmid, 1994) for POS tagging
• Compared tag distributions between sets of texts
• Pairwise comparison of each tag done by calculating:

  P_T = (N1_T - N2_T) / (N1_T + N2_T)

  where Ni_T is the number of occurrences of tag T in set i
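The pairwise tag comparison above is straightforward to compute (function name and example counts are hypothetical):

```python
def tag_divergence(n1, n2):
    """P_T = (N1_T - N2_T) / (N1_T + N2_T) for each POS tag T,
    where n1[T] and n2[T] are occurrence counts of tag T in sets 1 and 2.
    Values near +1 or -1 mean the tag strongly favours one set; near 0, neither."""
    tags = set(n1) | set(n2)
    return {t: (n1.get(t, 0) - n2.get(t, 0)) / (n1.get(t, 0) + n2.get(t, 0))
            for t in tags if n1.get(t, 0) + n2.get(t, 0) > 0}
```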
• Compared two sets of data:
  i. Subjective vs. Objective
  ii. Negative vs. Positive
TreeTagger
• TreeTagger is a tool for annotating text with part-of-speech and lemma information

word        pos   lemma
The         DT    the
TreeTagger  NP    TreeTagger
is          VBZ   be
easy        JJ    easy
to          TO    to
use         VB    use
.           SENT  .
Corpus Analysis

Objective texts:
• Contain more common and proper nouns (NPS, NP, NNS)
• Verbs are usually in 3rd person (VBZ)
• Past participles (VBN) used
• Comparative adjectives (JJR) used for stating facts

Subjective texts:
• Contain more personal pronouns (PP, PP$)
• Verbs are in 1st or 2nd person (VBP)
• Use simple past tense (VBD)
• Superlative adjectives (JJS) used for expressing emotions

Positive texts:
• Superlative adverbs and possessive endings (POS)

Negative texts:
• Verbs in past tense (VBN, VBD)
Training Classifier
• Use a combination of the n-gram approach and POS tagging for sentiment analysis
• Experimented with unigrams, bigrams and trigrams to find the best settings for microblogging data
• Data preparation:
  i. Filtration: remove URLs, usernames and special words like 'RT'
  ii. Tokenization: split the text to create a bag of words
  iii. Remove stopwords: remove articles such as 'a', 'an', 'the' from the bag of words
  iv. Construct n-grams: ensure that a negation is attached to the following word and treated as a single token
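Steps i–iv might look like this in Python (the regular expressions and the "+" joiner for negations are my assumptions; the paper does not give code):

```python
import re

STOPWORDS = {"a", "an", "the"}

def preprocess(tweet, n=2):
    """Sketch of the four data-preparation steps, ending in n-gram construction."""
    # i. Filtration: remove URLs, @usernames and the special word 'RT'
    tweet = re.sub(r"https?://\S+|@\w+|\bRT\b", "", tweet)
    # ii. Tokenization: split into a bag of words
    tokens = re.findall(r"[\w']+", tweet.lower())
    # iii. Remove stopwords (articles)
    tokens = [t for t in tokens if t not in STOPWORDS]
    # iv. Attach each negation to the next word so they form a single token
    merged, skip = [], False
    for i, t in enumerate(tokens):
        if skip:
            skip = False
            continue
        if t in {"not", "no"} and i + 1 < len(tokens):
            merged.append(t + "+" + tokens[i + 1])
            skip = True
        else:
            merged.append(t)
    # Construct n-grams over the merged tokens
    return [" ".join(merged[i:i + n]) for i in range(len(merged) - n + 1)]
```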
• Used a Naïve Bayes classifier:

  P(s | M) = P(M | s) · P(s) / P(M)

  where s is a sentiment and M is a message
Training Classifier (continued)
• We train two Bayes classifiers: one with n-grams as features, another with the POS distribution
• Assumption: POS tags are conditionally independent of n-grams:

  P(M | s) = P(G | s) · P(T | s)

  where G is the set of n-grams representing a message and T is the set of POS tags
• Assumption: the n-grams and POS tags are also conditionally independent of each other, so for all g ∈ G and t ∈ T:

  P(G | s) = ∏_{g ∈ G} P(g | s),   P(T | s) = ∏_{t ∈ T} P(t | s)

• Finally, we take the log-likelihood of each sentiment:

  L(s) = Σ_{g ∈ G} log P(g | s) + Σ_{t ∈ T} log P(t | s)
Increasing Accuracy
• To increase accuracy, we calculate the entropy of each n-gram's probability distribution
• High entropy indicates that the distribution of the n-gram across the different sentiment datasets is close to uniform
• These n-grams can be discarded, as they do not contribute much to the classification
• We introduce another term, salience
• Salience takes a value between 0 and 1; n-grams with low values should be discarded
• The final equation remains the same, except that we discard n-grams with entropy higher than a threshold θ, or salience lower than a threshold θ
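A sketch of the two filtering measures. The entropy is standard Shannon entropy; the salience function shown is a single pairwise term, a simplification of the paper's full measure over all sentiment pairs:

```python
import math

def entropy(probs):
    """Shannon entropy of an n-gram's distribution across the sentiment sets;
    high entropy (near-uniform distribution) means the n-gram carries little signal."""
    return -sum(p * math.log(p, 2) for p in probs if p > 0)

def salience(p1, p2):
    """Salience of an n-gram given its probabilities in two sentiment sets
    (one pairwise term only; an assumption simplifying the paper's measure).
    Ranges from 0 (equally likely in both sets) to 1 (exclusive to one set)."""
    if max(p1, p2) == 0:
        return 0.0
    return 1 - min(p1, p2) / max(p1, p2)
```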
Model Validation
• The classifier was tested on a set of real hand-annotated Twitter posts:

Sentiment  Sample size
Positive   108
Negative   75
Neutral    33
Total      216
Results
• Best results are achieved when using bigrams
• Bigrams strike a good balance between coverage and the ability to capture sentiment patterns
• The model has high accuracy but a lower decision value
• The classifier is optimal for a sentiment search engine
• The F-measure is used to measure performance, with one change: β is kept at 0.5, which weights precision higher than recall:

  F = (1 + β²) · precision · recall / (β² · precision + recall)
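The weighted F-measure above computes directly (a sketch; the function name is mine):

```python
def f_measure(precision, recall, beta=0.5):
    """F-beta measure; beta = 0.5 weights precision more heavily than recall."""
    if precision == 0 and recall == 0:
        return 0.0
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
```

With β = 0.5, a precision-heavy system scores higher than a recall-heavy one with the same two numbers swapped.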
• Increasing the sample size improves performance, but only up to a certain point
• Salience discriminates common n-grams better than entropy
Conclusion & Future Work
• Microblogging is an attractive source of data for sentiment analysis and opinion mining
• Authors use syntactic structures to describe emotions or state facts
• Some POS tags may be strong indicators of emotional text
• The classifier is able to determine positive, negative and neutral sentiments of documents
• Plan to collect a multilingual corpus of Twitter data and compare the characteristics of the corpus across different languages
My Thoughts
• Part-of-speech tagging is an effective way to understand nuances of language and emotion
• The authors claimed that their technique is better than previous methods; however, they never compared it to previous methods anywhere in the paper
• Although conditional independence can help simplify calculations, it is too naïve an assumption
• Another implicit assumption is that people spell words properly on Twitter. Not true.
• Moreover, sarcasm is one of the big problems that cannot be handled by simple sentiment classifiers, and Twitter is full of sarcasm
• The test set is too small (216 data points) compared to the train set (300,000 data points!)
• As of 2018, the proposed future work has not been published by the main author
References
• G. Adda, J. Mariani, J. Lecomte, P. Paroubek, and M. Rajman. 1998. The GRACE French part-of-speech tagging evaluation task. In A. Rubio, N. Gallardo, R. Castro, and A. Tejada, editors, LREC, volume I, pages 433–441, Granada, May.
• Ethem Alpaydin. 2004. Introduction to Machine Learning (Adaptive Computation and Machine Learning). The MIT Press.
• Anthony J. Hayter. 2007. Probability and Statistics for Engineers and Scientists. Duxbury, Belmont, CA, USA.
• Kushal Dave, Steve Lawrence, and David M. Pennock. 2003. Mining the peanut gallery: opinion extraction and semantic classification of product reviews. In WWW '03: Proceedings of the 12th International Conference on World Wide Web, pages 519–528, New York, NY, USA. ACM.
• Alec Go, Lei Huang, and Richa Bhayani. 2009. Twitter sentiment analysis. Final Projects from CS224N for Spring 2008/2009 at The Stanford Natural Language Processing Group.
• Bernard J. Jansen, Mimi Zhang, Kate Sobel, and Abdur Chowdury. 2009. Micro-blogging as online word of mouth branding. In CHI EA '09: Proceedings of the 27th International Conference Extended Abstracts on Human Factors in Computing Systems, pages 3859–3864, New York, NY, USA. ACM.
• John D. Lafferty, Andrew McCallum, and Fernando C. N. Pereira. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In ICML '01: Proceedings of the Eighteenth International Conference on Machine Learning, pages 282–289, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.
• Christopher D. Manning and Hinrich Schütze. 1999. Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA, USA.
• Bo Pang and Lillian Lee. 2008. Opinion mining and sentiment analysis. Found. Trends Inf. Retr., 2(1-2):1–135.
• Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment classification using machine learning techniques. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 79–86.
• Ted Pedersen. 2000. A simple approach to building ensembles of naive Bayesian classifiers for word sense disambiguation. In Proceedings of the 1st North American Chapter of the Association for Computational Linguistics Conference, pages 63–69, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.
• Jonathon Read. 2005. Using emoticons to reduce dependency in machine learning techniques for sentiment classification. In ACL. The Association for Computer Linguistics.
• Helmut Schmid. 1994. Probabilistic part-of-speech tagging using decision trees. In Proceedings of the International Conference on New Methods in Language Processing, pages 44–49.
• Claude E. Shannon and Warren Weaver. 1963. A Mathematical Theory of Communication. University of Illinois Press, Champaign, IL, USA.
• Theresa Wilson, Janyce Wiebe, and Paul Hoffmann. 2005. Recognizing contextual polarity in phrase-level sentiment analysis. In HLT '05: Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, pages 347–354, Morristown, NJ, USA. Association for Computational Linguistics.
• Changhua Yang, Kevin Hsin-Yih Lin, and Hsin-Hsi Chen. 2007. Emotion classification using web blog corpora. In WI '07: Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, pages 275–278, Washington, DC, USA. IEEE Computer Society.
DEEP CONVOLUTIONAL NEURAL NETWORKS FOR
SENTIMENT ANALYSIS OF SHORT TEXTS
John Robert 278822
Seminar Data Analytics I
OUTLINE
1. Introduction
2. Convolutional neural network architecture
3. Training the network
4. Related works
5. Advantages of CharSCNN
6. Experiment
7. Model setup
8. Results
9. Conclusions
10. Advised improvement
11. References
INTRODUCTION
What is sentiment analysis?
Sentiment analysis, also known as opinion mining, is the process of determining the emotional tone of a series of words.

Why short text?
In the era of social media, we express our opinions with limited words.

The problem with short-text sentiment analysis
Short texts contain limited contextual information. Analyzing them requires strategies that combine the small text content with prior knowledge, and that go beyond bag-of-words by extracting information from the sentence/message in a more disciplined way.
What strategy was used?
• A deep convolutional neural network named Character to Sentence Convolutional Neural Network (CharSCNN), with two convolutional layers to extract relevant features from words and sentences of any size.
• Two corpora from two different domains: the Stanford Sentiment Treebank (SSTb), which contains sentences from movie reviews; and the Stanford Twitter Sentiment corpus (STS), which contains Twitter messages.
CONVOLUTIONAL NEURAL NETWORK
https://ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets/
INPUT - holds the raw input; here, the sequence of words in the sentence (rather than raw pixel values as in image ConvNets).
CONVOLUTIONAL LAYER - computes the output of neurons connected to local regions in the input, each computing a dot product between its weights and the small region it is connected to.
RELU LAYER - applies an elementwise activation function.
POOL LAYER - performs a downsampling operation along the spatial dimensions (width, height).
FULLY-CONNECTED LAYER - computes the class scores, where each of the 5 predictions corresponds to a class score.
CharSCNN computes a score for each sentiment label τ ∈ T.
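A toy, dependency-free sketch of the convolution → ReLU → max-pool sequence over word vectors (dimensions, weights and function names are made up for illustration, not the paper's implementation):

```python
def conv1d(embeddings, kernel):
    """Slide a k x d filter over a sequence of d-dimensional word vectors,
    taking the dot product at each position (one filter, stride 1)."""
    k, d = len(kernel), len(kernel[0])
    return [sum(kernel[i][j] * embeddings[s + i][j]
                for i in range(k) for j in range(d))
            for s in range(len(embeddings) - k + 1)]

def relu(xs):
    """Elementwise activation: max(0, x)."""
    return [max(0.0, x) for x in xs]

def max_pool(xs):
    """Max over time: one value per filter, independent of sentence length."""
    return max(xs)
```

The max-over-time pooling is what lets the network accept sentences of any size, as the slides note for CharSCNN's second convolutional layer.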
FIRST CONVOLUTIONAL LAYER
• The first layer of the network transforms words into real-valued feature vectors (embeddings) that capture morphological, syntactic and semantic information about the words
• We use a fixed-sized word vocabulary V^wrd
• We use a fixed-sized character vocabulary V^chr (words are composed of characters)
• Given a sentence of N words {w1, w2, …, wN}, every word wn is converted to a vector un = [r^wrd; r^wch]
• Each vector un is the concatenation of two sub-vectors: the word-level embedding r^wrd ∈ ℝ^{d_wrd} and the character-level embedding r^wch ∈ ℝ^{cl0_u} of wn
WORD-LEVEL EMBEDDINGS
http://www.joshuakim.io/understanding-how-convolutional-neural-network-cnn-perform-text-classification-with-word-embeddings/
• They are meant to capture syntactic (structure of sentences) and semantic (relationships between words, phrases, signs and symbols) information
• We convert a word wn into its word-level embedding r^wrd using the matrix-vector product r^wrd = W^wrd v^w
• where v^w is a one-hot vector of size |V^wrd|: value 1 at index w and zero in all other positions
• The matrix W^wrd is a parameter to be learned
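Since v^w is one-hot, the product W^wrd · v^w simply selects column w of W^wrd. A toy sketch (the matrix values are made up):

```python
def one_hot(index, size):
    """v_w: value 1 at the word's index, zero everywhere else."""
    v = [0.0] * size
    v[index] = 1.0
    return v

def embed(W, index):
    """r_wrd = W v_w: the matrix-vector product picks out column `index` of W.
    W has shape d_wrd x |V_wrd| (toy dimensions here)."""
    v = one_hot(index, len(W[0]))
    return [sum(W[i][j] * v[j] for j in range(len(v))) for i in range(len(W))]
```

In practice frameworks implement this as a direct lookup rather than a full matrix multiply, for exactly this reason.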
CHARACTER-LEVEL EMBEDDINGS
• Used to extract morphological (how words are formed) and shape information of words
• Given a word w composed of M characters {c1, c2, ..., cM}, we first transform each character cm into a character embedding r^chr_m
• Given a character c, its embedding r^chr is obtained by the matrix-vector product r^chr = W^chr v^c
• where v^c is a one-hot vector of size |V^chr|: value 1 at index c and zero in all other positions
• The input for the convolutional layer is the sequence of character embeddings {r^chr_1, r^chr_2, ..., r^chr_M}
• A convolutional approach is used for character-level feature extraction
SENTENCE-LEVEL REPRESENTATION
Second convolutional layer. Why use a convolutional layer for the sentence-level representation? Because extracting a sentence-wide feature set has two main problems: sentences have different sizes, and important information can appear at any position in the sentence.
• Given a sentence x with N words {w1, w2, ..., wN}, which have been converted to joint word-level and character-level embeddings {u1, u2, ..., uN}
• Extract a sentence-level representation r^sent_x
• This layer produces local features around each word in the sentence and then combines them using a max operation to create a fixed-sized feature vector for the sentence
TRAINING THE NETWORK
• The network is trained by minimizing a negative log-likelihood over the training set D
• Given a sentence x, the network with parameter set θ computes a score s_θ(x)_τ for each sentiment label τ ∈ T
• We apply a softmax operation over the scores of all tags τ ∈ T:

  p(τ | x, θ) = e^{s_θ(x)_τ} / Σ_{i ∈ T} e^{s_θ(x)_i}

• We use stochastic gradient descent (SGD) to minimize the negative log-likelihood with respect to θ:

  θ ↦ Σ_{(x,y) ∈ D} -log p(y | x, θ)

• where (x, y) corresponds to a sentence in the training corpus D and y represents its respective label
• The CharSCNN architecture was implemented using the Theano library
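The softmax and the negative log-likelihood objective can be sketched directly from the two formulas above (a plain-Python illustration, not the Theano implementation):

```python
import math

def softmax(scores):
    """p(tau|x,theta) = exp(s_theta(x)_tau) / sum_i exp(s_theta(x)_i)."""
    m = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def negative_log_likelihood(batch):
    """Sum of -log p(y|x,theta) over (scores, y) pairs: the quantity SGD minimizes."""
    return sum(-math.log(softmax(scores)[y]) for scores, y in batch)
```

Raising the score of the correct label lowers the loss, which is what the SGD updates achieve.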
RELATED WORKS
• In (Chrupala, 2013), the author proposes a simple recurrent network (SRN) to learn continuous vector representations for sequences of characters, and uses them as features in a conditional random field classifier to solve a character-level text segmentation and labeling task.
• In (Collobert et al., 2011), the authors use a convolutional network for the semantic role labeling task with the goal of avoiding excessive task-specific feature engineering. The authors use a similar network architecture for syntactic parsing.
ADVANTAGES OF CharSCNN
• It uses a feed-forward neural network instead of a recursive one, and it does not need any input about the syntactic structure of the sentence.
• The addition of one convolutional layer to extract character features.
• A key advantage of CharSCNN's approach to extracting character-level features is its flexibility.
EXPERIMENT
Datasets
• The movie review dataset used is the recently proposed Stanford Sentiment Treebank (SSTb), which includes fine-grained sentiment labels for 215,154 phrases in the parse trees of 11,855 sentences.
• Stanford Twitter Sentiment corpus (STS): the original training set contains 1.6 million tweets that were automatically labeled as positive/negative using emoticons as noisy labels.
UNSUPERVISED PRE-TRAINING OF WORD-LEVEL EMBEDDINGS
Why unsupervised learning of word-level embeddings?
• Because recent work has shown that large improvements in model accuracy can be obtained by performing unsupervised pre-training of word embeddings.
How was the unsupervised learning of word-level embeddings performed?
• We perform unsupervised learning of word-level embeddings using the word2vec tool, which implements the continuous bag-of-words and skip-gram architectures for computing vector representations of words.
Dataset for pre-training of word embeddings
• December 2013 snapshot of the English Wikipedia corpus as a source of unlabeled data

Processing the dataset
1. removal of paragraphs that are not in English
2. substitution of non-western characters by a special character
3. tokenization of the text using the tokenizer available with the Stanford POS Tagger
4. removal of sentences that are less than 20 characters long (including whitespace) or have fewer than 5 tokens
5. lowercase all words and substitute each numerical digit by a 0 (e.g., 1967 becomes 0000)
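Steps 4 and 5 of this cleanup can be sketched as follows (steps 1–3, including the Stanford tokenizer, are omitted; the regex is my assumption):

```python
import re

def clean_sentence(s):
    """Drop sentences shorter than 20 characters or with fewer than 5 tokens,
    then lowercase and map every digit to 0 (e.g., 1967 -> 0000)."""
    if len(s) < 20 or len(s.split()) < 5:
        return None                      # sentence filtered out (step 4)
    return re.sub(r"\d", "0", s.lower())  # lowercase + digit substitution (step 5)
```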
Process
• A word must occur at least 10 times in order to be included in the vocabulary, which resulted in a vocabulary of 870,214 entries
• To train the word-level embeddings we use word2vec's skip-gram method with a context window of size 9
• The training time for the English corpus is around 1 h 10 min using 12 threads on an Intel Xeon E5-2643 3.30 GHz machine
MODEL SETUP
• Development sets are used to tune the neural network hyper-parameters
• More time was spent tuning the learning rate than the other parameters, since it is the hyper-parameter with the largest impact on prediction performance
• The only two parameters with different values for the two datasets are the learning rate and the number of units in the convolutional layer that extracts sentence features
• The number of training epochs varies between five and ten for the two datasets
Neural Network Hyper-Parameters
• We use 4 threads on an Intel Xeon E5-2643 3.30 GHz machine.
• The Theano-based implementation of CharSCNN takes around 10 min. to complete one training epoch for the SSTb corpus with all phrases and five classes.
RESULTS FOR SSTb CORPUS
• We check whether using examples that are single phrases, in addition to complete sentences, can provide useful information for training
• "Phrases" indicates whether all phrases (yes) or only complete sentences (no) in the corpus are used for training
• The fine-grained column contains prediction results for the case where 5 sentiment classes (labels) are used (very negative, negative, neutral, positive, very positive)
NOTE
• CharSCNN and SCNN have very similar results in both fine-grained and binary sentiment prediction. These results suggest that character-level information is not very helpful for sentiment prediction on the SSTb corpus.
• Using phrases as training examples allows the classifier to learn more complex phenomena, since sentiment-labeled phrases give information about how words (phrases) combine to form the sentiment of phrases (sentences).
• Compared to RNTN, CharSCNN has the advantage of not needing the output of a syntactic parser when performing sentiment prediction.
RESULTS FOR STS CORPUS
• Character-level information has a greater impact for Twitter data
• Using unsupervised pre-training, CharSCNN provides an absolute accuracy improvement of 1.2 over SCNN
CONCLUSIONS
• A positive sentence (left) and its negation (right).
• The extracted features concentrate mainly around the main topic, "film", and the part of the phrase that indicates sentiment ("liked" and "didn't like").
• In the left chart, the word "liked" has a big impact on the set of extracted features.
• In the right chart, we can see that the impact of the word "like" is reduced because of the negation "didn't", which is responsible for a large part of the extracted features.
• The negative expression "incredibly dull" is responsible for the features extracted from the sentence on the left
• "definitely not dull", which is somewhat more positive, is responsible for the features extracted from the sentence in the chart at right
Why do we use negation?
• Negation is an important issue in sentiment analysis
ADVISED IMPROVEMENT
• Use three convolutional layers: the first layer for character-level embedding, the second for word-level embedding and the third for sentence-level embedding
• Perform unsupervised pre-training of character-level representations
REFERENCES
• https://www.brandwatch.com/blog/understanding-sentiment-analysis/
• https://deeplearning4j.org/word2vec.html
• http://cs231n.github.io/convolutional-networks/
• https://www.clickworker.com/2017/03/14/sentiment-analysis-what-is-it-for/
• https://ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets/
• https://adeshpande3.github.io/A-Beginner%27s-Guide-To-Understanding-Convolutional-Neural-Networks-Part-2/