Thumbs up? Sentiment Classification using Machine Learning Techniques
Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. EMNLP-2002, 79–86.
Kalaiselvan Panneerselvam
Course 3111: Seminar: Data Analytics I
Outline
• What is Sentiment Analysis
• Sentiment Classification in Movie Reviews
• Baseline Classifier
• Supervised Learning Process
• Framework
• Feature Extraction
• Classifiers
• Results
• References
Terms
• Sentiment
  • A thought, view, or attitude, especially one based mainly on emotion instead of reason
• Sentiment Analysis
  • aka opinion mining
  • use of natural language processing (NLP) and computational techniques to automate the extraction or classification of sentiment from typically unstructured text
Sentiment Analysis: What is it used for?
• Natural language and text processing to identify and extract subjective information
• Classifying the polarity of a given text as positive, negative or neutral
• In general: to discover how people feel about a particular topic
Sentiment Analysis: Who is it used by?
• Consumer information
  • Product reviews
• Marketing
  • Consumer attitudes
  • Trends
• Politics
  • Politicians want to know voters' views
  • Voters want to know politicians' stances and who else supports them
• Social
  • Find like-minded individuals or communities
Bing Shopping
Sentiment Classification in Movie Reviews
• Polarity detection:
  • Is an IMDb movie review positive or negative?
• Data: Polarity Data 2.0:
  • http://www.cs.cornell.edu/people/pabo/movie-review-data
  • Use reviews with a star or numerical rating as training and test data.
Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. EMNLP-2002, 79–86.
IMDb data in the Pang and Lee database

✓ when _star wars_ came out some twenty years ago, the image of traveling throughout the stars has become a commonplace image. […] when han solo goes light speed, the stars change to bright lines, going towards the viewer in lines that converge at an invisible point. cool. _october sky_ offers a much simpler image – that of a single white dot, traveling horizontally across the night sky. [...]

✗ "snake eyes" is the most aggravating kind of movie: the kind that shows so much potential then becomes unbelievably disappointing. it's not just because this is a brian depalma film, and since he's a great director and one who's films are always greeted with at least some fanfare. and it's not even because this was a film starring nicolas cage and since he gives a brauvara performance, this film is hardly worth his talents.
The Data
• Internet Movie Database (IMDb) archive
• Limited data to:
  • Reviews with an author rating
  • Positive and negative reviews (no neutral)
  • 19 positive, 19 negative reviews per author
• Interim dataset:
  • 752 negative reviews
  • 1301 positive reviews
  • 144 reviewers represented
• Final dataset: 700 positive, 700 negative (uniform distribution)
Prior Work
• Prior classification based on:
  • source / source style
  • genre
  • knowledge-based approaches
  • semantic orientation using text categorization
Positive or negative movie review?
• unbelievably disappointing
• Full of zany characters and richly applied satire, and some great plot twists
• this is the greatest screwball comedy ever filmed
• It was pathetic. The worst part about it was the boxing scenes.
Baseline (Human Classifier)
• Crafted word lists using independent CS grad students
• Positive vs. negative word count

Human 1
  Positive list: dazzling, brilliant, phenomenal, excellent, fantastic
  Negative list: suck, terrible, awful, unwatchable, hideous
  Accuracy: 58%   Ties: 75%

Human 2
  Positive list: gripping, mesmerizing, riveting, spectacular, cool, awesome, thrilling, badass, excellent, moving, exciting
  Negative list: bad, clichéd, sucks, boring, stupid, slow
  Accuracy: 64%   Ties: 39%

• Frequency counts (including test data)
• Hand-picked words

Human 3 + stats
  Positive list: love, wonderful, best, great, superb, still, beautiful
  Negative list: bad, worst, stupid, waste, boring, ?, !
  Accuracy: 69%   Ties: 16%

*Tie rate: percentage of documents where the two sentiments were rated equally likely.
Thumbs up? Sentiment Classification using Machine Learning Techniques
• The experiment above shows it is worthwhile to explore corpus-based techniques, rather than relying on prior intuitions, to select good indicator features and to perform sentiment classification in general.
Supervised Learning Process
Framework
(Diagram: Movie Reviews → Develop Features → Training Model (NB, ME, SVM) → Evaluate Results → Extract Insights)
Framework (continued…)
Sentiment Tokenization Issues
• Deal with HTML and XML markup
• Twitter mark-up (names, hashtags)
• Capitalization (preserve for words in all caps)
• Phone numbers, dates and emoticons
• Commonly used tokenizers:
  • Christopher Potts sentiment tokenizer
  • Brendan O'Connor twitter tokenizer
Extracting Features for Sentiment Classification
• How to handle negation
  • "I didn't like this movie" vs. "I really like this movie"
• Which words to use?
  • Only adjectives
  • All words
• All words turns out to work better, at least on this data
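A common way to handle negation (used by Pang et al., following Das and Chen) is to prefix every token between a negation word and the next punctuation mark with "NOT_". A minimal sketch, where the negation and punctuation word lists are illustrative assumptions rather than the authors' exact lists:

```python
# Hypothetical word lists; the paper does not publish its exact negation lexicon.
NEGATIONS = {"not", "no", "never", "isn't", "didn't", "doesn't", "can't"}
PUNCTUATION = {".", ",", ";", "!", "?"}

def tag_negation(tokens):
    """Prefix tokens in the scope of a negation with 'NOT_' until punctuation."""
    tagged, negating = [], False
    for tok in tokens:
        if tok in PUNCTUATION:
            negating = False          # punctuation ends the negation scope
            tagged.append(tok)
        elif negating:
            tagged.append("NOT_" + tok)
        else:
            tagged.append(tok)
            if tok.lower() in NEGATIONS:
                negating = True       # start negating following tokens
    return tagged
```

With this, "I didn't like this movie" yields `NOT_like`, a different feature from the `like` in "I really like this movie".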
Extracting Features (continued…)
• Features
  • Unigrams: a single word.
  • Feature frequency: how often a feature appears.
  • Feature presence: 1 only when a feature appears at all.
  • Bigrams: two consecutive words.
  • Parts of speech: tag each word with its POS.
  • Adjectives: only use the adjectives in the text.
  • Position: the position of a word in the text (first quarter, last quarter, or the middle half).
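The three simplest extractors above can be sketched in a few lines (function names are mine, not from the paper):

```python
from collections import Counter

def frequency_features(tokens, vocab):
    """Feature frequency: how often each vocabulary feature appears."""
    counts = Counter(tokens)
    return {f: counts.get(f, 0) for f in vocab}

def presence_features(tokens, vocab):
    """Feature presence: 1 if the feature appears at all, else 0."""
    present = set(tokens)
    return {f: int(f in present) for f in vocab}

def bigrams(tokens):
    """Bigrams: two consecutive words joined into a single feature."""
    return [tokens[i] + "_" + tokens[i + 1] for i in range(len(tokens) - 1)]
```

For "great great film", presence gives `great: 1` where frequency gives `great: 2`; the paper's result is that the presence variant works better for sentiment.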
Classifiers
Naïve Bayes Classifiers
Naïve Bayes:
Given a document d, assign the class c* = argmax_c P(c | d). Assuming all features f_i are conditionally independent given the class:

  P_NB(c | d) = ( P(c) · ∏_i P(f_i | c)^{n_i(d)} ) / P(d)

where n_i(d) is the number of times f_i occurs in document d.
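The formula above translates directly into a tiny add-one-smoothed Naïve Bayes classifier. This is a sketch, not the paper's implementation, and the toy training corpus in the test is invented:

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (tokens, label). Returns a classify(tokens) function
    implementing c* = argmax_c log P(c) + sum_i n_i(d) * log P(f_i | c),
    with add-one (Laplace) smoothing for P(f_i | c)."""
    priors = Counter()
    word_counts = defaultdict(Counter)
    totals = Counter()
    vocab = set()
    for tokens, label in docs:
        priors[label] += 1
        word_counts[label].update(tokens)
        totals[label] += len(tokens)
        vocab.update(tokens)
    n_docs, v = len(docs), len(vocab)

    def classify(tokens):
        # summing log P(f|c) once per token occurrence realizes the n_i(d) exponent
        scores = {
            c: math.log(priors[c] / n_docs)
               + sum(math.log((word_counts[c][f] + 1) / (totals[c] + v))
                     for f in tokens if f in vocab)
            for c in priors
        }
        return max(scores, key=scores.get)

    return classify
```

P(d) is a constant per document and can be dropped from the argmax, as done here.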
Example
(worked Naïve Bayes example; the figures on these slides were not preserved)
Maximum Entropy
• Maximum Entropy is a technique for learning probability distributions from data.
• "Don't assume anything about your probability distribution other than what you have observed."
• Always choose the most uniform distribution subject to the observed constraints.
Maximum Entropy
• Maximum Entropy:

  P_ME(c | d) = (1 / Z(d)) · exp( Σ_i λ_{i,c} F_{i,c}(d, c) )

where Z(d) is a normalization function and the λ_{i,c} are feature-weight parameters; a larger λ_{i,c} means f_i is considered a strong indicator for class c.
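A toy illustration of the scoring step once the λ weights are known (training the weights via iterative scaling is omitted, and the example weights in the test are invented):

```python
import math

def maxent_prob(features, lambdas, classes):
    """features: set of active features f_i in document d.
    lambdas: dict (feature, class) -> weight lambda_{i,c} (hypothetical toy weights).
    Returns P_ME(c|d) = exp(sum_i lambda_{i,c}) / Z(d) for each class."""
    scores = {c: math.exp(sum(lambdas.get((f, c), 0.0) for f in features))
              for c in classes}
    z = sum(scores.values())  # normalization function Z(d)
    return {c: s / z for c, s in scores.items()}
```

Note the contrast with Naïve Bayes: no independence assumption is made, and overlapping features are handled naturally by the weights.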
Support Vector Machine
• Find a hyperplane that maximizes the margin between the two categories.
Scenario 1
Scenario 2
Results
• Results for the different features:
  • Unigrams work better than the baseline
  • Presence is better than frequency
  • The bigram feature does not improve performance
  • Adjectives alone are poor
  • POS tags improve results slightly for NB and ME, but hurt SVM
  • Position also does not help
Insights
• SVM is found to be the most accurate.
  • Not comparable to topic-based categorization models.
• Simple unigram presence works best.
  • Presence > frequency, unlike topic-based classification.
• Uncovered the "thwarted expectations" narrative:
  • "Okay, I'm really ashamed of it, but I enjoyed it. I mean, I admit it's a really awful movie."
References
• B. Pang, L. Lee, and S. Vaithyanathan, "Thumbs up? Sentiment classification using machine learning techniques," in Proc. Conf. on Empirical Methods in Natural Language Processing (EMNLP), pp. 79–86, 2002.
• A. Esuli and F. Sebastiani, "SentiWordNet: A publicly available lexical resource for opinion mining," in Proc. of LREC 2006, 5th Conf. on Language Resources and Evaluation, 2006.
• E. Zhang and Y. Zhang, "UCSC on TREC 2006 blog opinion mining," TREC 2006 Blog Track, Opinion Retrieval Task.
• A. Devitt and K. Ahmad, "Sentiment polarity identification in financial news: A cohesion-based approach," ACL 2007.
• B. Pang and L. Lee, "A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts," in Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics, pp. 271–278, July 21–26, 2004.
Twitter as a Corpus for Sentiment Analysis and Opinion Mining
Alexander Pak, Patrick Paroubek
Université de Paris-Sud, LREC, 2010
Abstract
• Millions of users share opinions on microblogging sites
• Rich data source for opinion mining and sentiment analysis
• Focus on Twitter corpus collection for sentiment analysis
• The corpus is used to build a sentiment classifier
• Claims that the proposed techniques work better than previous methods
Introduction
• Microblogging has become very popular nowadays
• People generally write about their life, share opinions and discuss current issues
• Users post about products & services, and express political and religious views
• Twitter has an enormous number of text posts, and it grows every day
• The audience varies from regular users to celebrities, politicians and even presidents
• This can be used for marketing or social studies
• The authors performed the following steps:
  • Collected 300,000 tweets, divided into 3 sets: positive, negative and neutral
  • Performed linguistic analysis of the collected corpus
  • Built a sentiment classifier
  • Experimental evaluation
Corpus Collection
• Used the Twitter API to collect positive, negative and objective posts
• For positive and negative sentiments, used the emoticon approach:
  • Happy emoticons: ":-)", ":)", "=)", ":D" etc.
  • Sad emoticons: ":-(", ":(", "=(", ";(" etc.
• For objective posts, tweets were pulled from 44 popular news outlets such as the 'Washington Post' and 'New York Times'
• Assumption: due to the short character limit, it was assumed that an emoticon represents the sentiment of the entire tweet
• English was used for the research; however, the method can be easily adapted to other languages
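The emoticon-based noisy labeling can be sketched as follows (the emoticon lists mirror the slide; returning `None` when both kinds, or neither, appear is my assumption, not spelled out in the paper):

```python
HAPPY = (":-)", ":)", "=)", ":D")
SAD = (":-(", ":(", "=(", ";(")

def emoticon_label(tweet):
    """Noisy sentiment label for a tweet based on its emoticons;
    returns None when the label would be ambiguous or missing."""
    has_happy = any(e in tweet for e in HAPPY)
    has_sad = any(e in tweet for e in SAD)
    if has_happy and not has_sad:
        return "positive"
    if has_sad and not has_happy:
        return "negative"
    return None
```

The labels are noisy by construction, which is why the corpus needs to be large.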
Corpus Analysis
• The word frequency distribution follows Zipf's law
• This confirms a proper characteristic of the corpus
• Use TreeTagger (Schmid, 1994) for POS tagging
• Compared tag distributions between sets of texts
• Pairwise comparison of each tag done by calculating:

  P_T = (N1_T - N2_T) / (N1_T + N2_T)

  where Ni_T is the number of occurrences of tag T in set i
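The pairwise tag comparison above is straightforward to compute (function name and example counts are hypothetical):

```python
def tag_divergence(n1, n2):
    """P_T = (N1_T - N2_T) / (N1_T + N2_T) for each POS tag T,
    where n1[T] and n2[T] are occurrence counts of tag T in sets 1 and 2.
    Values near +1 or -1 mean the tag strongly favours one set; near 0, neither."""
    tags = set(n1) | set(n2)
    return {t: (n1.get(t, 0) - n2.get(t, 0)) / (n1.get(t, 0) + n2.get(t, 0))
            for t in tags if n1.get(t, 0) + n2.get(t, 0) > 0}
```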
• Compared two sets of data:
  i. Subjective vs. Objective
  ii. Negative vs. Positive
TreeTagger
• TreeTagger is a tool for annotating text with part-of-speech and lemma information

word        pos   lemma
The         DT    the
TreeTagger  NP    TreeTagger
is          VBZ   be
easy        JJ    easy
to          TO    to
use         VB    use
.           SENT  .
Corpus Analysis

Objective texts:
• Contain more common and proper nouns (NPS, NP, NNS)
• Verbs are usually in 3rd person (VBZ)
• Past participles (VBN) used
• Comparative adjectives (JJR) used for stating facts

Subjective texts:
• Contain more personal pronouns (PP, PP$)
• Verbs are in 1st or 2nd person (VBP)
• Use simple past tense (VBD)
• Superlative adjectives (JJS) used for expressing emotions

Positive texts:
• Superlative adverbs and possessive endings (POS)

Negative texts:
• Verbs in past tense (VBN, VBD)
Training Classifier
• Use a combination of the n-gram approach and POS tagging for sentiment analysis
• Experimented with unigrams, bigrams and trigrams to find the best settings for microblogging data
• Data preparation:
  i. Filtration: remove URLs, usernames and special words like 'RT'
  ii. Tokenization: split the text to create a bag of words
  iii. Remove stopwords: remove articles such as 'a', 'an', 'the' from the bag of words
  iv. Construct n-grams: ensure that a negation is attached to the following word and treated as a single token
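Steps i–iv might look like this in Python (the regular expressions and the "+" joiner for negations are my assumptions; the paper does not give code):

```python
import re

STOPWORDS = {"a", "an", "the"}

def preprocess(tweet, n=2):
    """Sketch of the four data-preparation steps, ending in n-gram construction."""
    # i. Filtration: remove URLs, @usernames and the special word 'RT'
    tweet = re.sub(r"https?://\S+|@\w+|\bRT\b", "", tweet)
    # ii. Tokenization: split into a bag of words
    tokens = re.findall(r"[\w']+", tweet.lower())
    # iii. Remove stopwords (articles)
    tokens = [t for t in tokens if t not in STOPWORDS]
    # iv. Attach each negation to the next word so they form a single token
    merged, skip = [], False
    for i, t in enumerate(tokens):
        if skip:
            skip = False
            continue
        if t in {"not", "no"} and i + 1 < len(tokens):
            merged.append(t + "+" + tokens[i + 1])
            skip = True
        else:
            merged.append(t)
    # Construct n-grams over the merged tokens
    return [" ".join(merged[i:i + n]) for i in range(len(merged) - n + 1)]
```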
• Used a Naïve Bayes classifier:

  P(s | M) = P(M | s) · P(s) / P(M)

  where s is a sentiment and M is a message
Training Classifier (continued)
• We train two Bayes classifiers: one with n-grams as features, another with the POS distribution
• Assumption: POS tags are conditionally independent of n-grams:

  P(M | s) = P(G | s) · P(T | s)

  where G is the set of n-grams representing a message and T is the set of POS tags
• Assumption: the n-grams and POS tags are also conditionally independent of each other, so for all g ∈ G and t ∈ T:

  P(G | s) = ∏_{g ∈ G} P(g | s),   P(T | s) = ∏_{t ∈ T} P(t | s)

• Finally, we take the log-likelihood of each sentiment:

  L(s) = Σ_{g ∈ G} log P(g | s) + Σ_{t ∈ T} log P(t | s)
Increasing Accuracy
• To increase accuracy, we calculate the entropy of each n-gram's probability distribution
• High entropy indicates that the distribution of the n-gram across the different sentiment datasets is close to uniform
• These n-grams can be discarded, as they do not contribute much to the classification
• We introduce another term, salience
• Salience takes a value between 0 and 1; n-grams with low values should be discarded
• The final equation remains the same, except that we discard n-grams with entropy higher than a threshold θ, or salience lower than a threshold θ
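A sketch of the two filtering measures. The entropy is standard Shannon entropy; the salience function shown is a single pairwise term, a simplification of the paper's full measure over all sentiment pairs:

```python
import math

def entropy(probs):
    """Shannon entropy of an n-gram's distribution across the sentiment sets;
    high entropy (near-uniform distribution) means the n-gram carries little signal."""
    return -sum(p * math.log(p, 2) for p in probs if p > 0)

def salience(p1, p2):
    """Salience of an n-gram given its probabilities in two sentiment sets
    (one pairwise term only; an assumption simplifying the paper's measure).
    Ranges from 0 (equally likely in both sets) to 1 (exclusive to one set)."""
    if max(p1, p2) == 0:
        return 0.0
    return 1 - min(p1, p2) / max(p1, p2)
```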
Model Validation
• The classifier was tested on a set of real hand-annotated Twitter posts:

Sentiment  Sample size
Positive   108
Negative   75
Neutral    33
Total      216
Results
• Best results are achieved when using bigrams
• Bigrams strike a good balance between coverage and the ability to capture sentiment patterns
• The model has high accuracy but a lower decision value
• The classifier is optimal for a sentiment search engine
• The F-measure is used to measure performance, with one change: β is kept at 0.5, which weights precision higher than recall:

  F = (1 + β²) · precision · recall / (β² · precision + recall)
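The weighted F-measure above computes directly (a sketch; the function name is mine):

```python
def f_measure(precision, recall, beta=0.5):
    """F-beta measure; beta = 0.5 weights precision more heavily than recall."""
    if precision == 0 and recall == 0:
        return 0.0
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
```

With β = 0.5, a precision-heavy system scores higher than a recall-heavy one with the same two numbers swapped.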
• Increasing the sample size improves performance, but only up to a certain point
• Salience discriminates common n-grams better than entropy
Conclusion & Future Work
• Microblogging is an attractive source of data for sentiment analysis and opinion mining
• Authors use syntactic structures to describe emotions or state facts
• Some POS tags may be strong indicators of emotional text
• The classifier is able to determine positive, negative and neutral sentiments of documents
• Plan to collect a multilingual corpus of Twitter data and compare the characteristics of the corpus across different languages
My Thoughts
• Part-of-speech tagging is an effective way to understand nuances of language and emotion
• The authors claimed that their technique is better than previous methods; however, they never compared it to previous methods anywhere in the paper
• Although conditional independence can help simplify calculations, it is too naïve an assumption
• Another implicit assumption is that people spell words properly on Twitter. Not true.
• Moreover, sarcasm is one of the big problems that cannot be handled by simple sentiment classifiers, and Twitter is full of sarcasm
• The test set is too small (216 data points) compared to the train set (300,000 data points!)
• As of 2018, the proposed future work has not been published by the main author
References
• G. Adda, J. Mariani, J. Lecomte, P. Paroubek, and M. Rajman. 1998. The GRACE French part-of-speech tagging evaluation task. In A. Rubio, N. Gallardo, R. Castro, and A. Tejada, editors, LREC, volume I, pages 433–441, Granada, May.
• Ethem Alpaydin. 2004. Introduction to Machine Learning (Adaptive Computation and Machine Learning). The MIT Press.
• Anthony J. Hayter. 2007. Probability and Statistics for Engineers and Scientists. Duxbury, Belmont, CA, USA.
• Kushal Dave, Steve Lawrence, and David M. Pennock. 2003. Mining the peanut gallery: opinion extraction and semantic classification of product reviews. In WWW '03: Proceedings of the 12th International Conference on World Wide Web, pages 519–528, New York, NY, USA. ACM.
• Alec Go, Lei Huang, and Richa Bhayani. 2009. Twitter sentiment analysis. Final Projects from CS224N for Spring 2008/2009 at The Stanford Natural Language Processing Group.
• Bernard J. Jansen, Mimi Zhang, Kate Sobel, and Abdur Chowdury. 2009. Micro-blogging as online word of mouth branding. In CHI EA '09: Proceedings of the 27th International Conference Extended Abstracts on Human Factors in Computing Systems, pages 3859–3864, New York, NY, USA. ACM.
• John D. Lafferty, Andrew McCallum, and Fernando C. N. Pereira. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In ICML '01: Proceedings of the Eighteenth International Conference on Machine Learning, pages 282–289, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.
• Christopher D. Manning and Hinrich Schütze. 1999. Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA, USA.
• Bo Pang and Lillian Lee. 2008. Opinion mining and sentiment analysis. Found. Trends Inf. Retr., 2(1-2):1–135.
• Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment classification using machine learning techniques. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 79–86.
• Ted Pedersen. 2000. A simple approach to building ensembles of naive Bayesian classifiers for word sense disambiguation. In Proceedings of the 1st North American Chapter of the Association for Computational Linguistics Conference, pages 63–69, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.
• Jonathon Read. 2005. Using emoticons to reduce dependency in machine learning techniques for sentiment classification. In ACL. The Association for Computer Linguistics.
• Helmut Schmid. 1994. Probabilistic part-of-speech tagging using decision trees. In Proceedings of the International Conference on New Methods in Language Processing, pages 44–49.
• Claude E. Shannon and Warren Weaver. 1963. A Mathematical Theory of Communication. University of Illinois Press, Champaign, IL, USA.
• Theresa Wilson, Janyce Wiebe, and Paul Hoffmann. 2005. Recognizing contextual polarity in phrase-level sentiment analysis. In HLT '05: Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, pages 347–354, Morristown, NJ, USA. Association for Computational Linguistics.
• Changhua Yang, Kevin Hsin-Yih Lin, and Hsin-Hsi Chen. 2007. Emotion classification using web blog corpora. In WI '07: Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, pages 275–278, Washington, DC, USA. IEEE Computer Society.
DEEP CONVOLUTIONAL NEURAL NETWORKS FOR
SENTIMENT ANALYSIS OF SHORT TEXTS
John Robert 278822
Seminar Data Analytics I
OUTLINE
1. Introduction
2. Convolutional neural network architecture
3. Training the network
4. Related works
5. Advantages of CharSCNN
6. Experiment
7. Model setup
8. Results
9. Conclusions
10. Advised improvement
11. References
INTRODUCTION
What is sentiment analysis?
Sentiment analysis, also known as opinion mining, is the process of determining the emotional tone of a series of words.

Why short text?
In the era of social media, we express our opinions with limited words.

The problem with short-text sentiment analysis
Short texts contain limited contextual information. Analyzing them requires strategies that combine the small text content with prior knowledge, and that go beyond bag-of-words by extracting information from the sentence/message in a more disciplined way.
What strategy was used?
• A deep convolutional neural network named Character to Sentence Convolutional Neural Network (CharSCNN), with two convolutional layers to extract relevant features from words and sentences of any size.
• Two corpora from two different domains: the Stanford Sentiment Treebank (SSTb), which contains sentences from movie reviews; and the Stanford Twitter Sentiment corpus (STS), which contains Twitter messages.
CONVOLUTIONAL NEURAL NETWORK
https://ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets/
INPUT - holds the raw input; here, the sequence of words in the sentence (rather than raw pixel values as in image ConvNets).
CONVOLUTIONAL LAYER - computes the output of neurons connected to local regions in the input, each computing a dot product between its weights and the small region it is connected to.
RELU LAYER - applies an elementwise activation function.
POOL LAYER - performs a downsampling operation along the spatial dimensions (width, height).
FULLY-CONNECTED LAYER - computes the class scores, where each of the 5 predictions corresponds to a class score.
CharSCNN computes a score for each sentiment label τ ∈ T.
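A toy, dependency-free sketch of the convolution → ReLU → max-pool sequence over word vectors (dimensions, weights and function names are made up for illustration, not the paper's implementation):

```python
def conv1d(embeddings, kernel):
    """Slide a k x d filter over a sequence of d-dimensional word vectors,
    taking the dot product at each position (one filter, stride 1)."""
    k, d = len(kernel), len(kernel[0])
    return [sum(kernel[i][j] * embeddings[s + i][j]
                for i in range(k) for j in range(d))
            for s in range(len(embeddings) - k + 1)]

def relu(xs):
    """Elementwise activation: max(0, x)."""
    return [max(0.0, x) for x in xs]

def max_pool(xs):
    """Max over time: one value per filter, independent of sentence length."""
    return max(xs)
```

The max-over-time pooling is what lets the network accept sentences of any size, as the slides note for CharSCNN's second convolutional layer.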
FIRST CONVOLUTIONAL LAYER
• The first layer of the network transforms words into real-valued feature vectors (embeddings) that capture morphological, syntactic and semantic information about the words
• We use a fixed-sized word vocabulary V^wrd
• We use a fixed-sized character vocabulary V^chr (words are composed of characters)
• Given a sentence of N words {w1, w2, …, wN}, every word wn is converted to a vector un = [r^wrd; r^wch]
• Each vector un is the concatenation of two sub-vectors: the word-level embedding r^wrd ∈ ℝ^{d_wrd} and the character-level embedding r^wch ∈ ℝ^{cl0_u} of wn
WORD-LEVEL EMBEDDINGS
http://www.joshuakim.io/understanding-how-convolutional-neural-network-cnn-perform-text-classification-with-word-embeddings/
• They are meant to capture syntactic (structure of sentences) and semantic (relationships between words, phrases, signs and symbols) information
• We convert a word wn into its word-level embedding r^wrd using the matrix-vector product r^wrd = W^wrd v^w
• where v^w is a one-hot vector of size |V^wrd|: value 1 at index w and zero in all other positions
• The matrix W^wrd is a parameter to be learned
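Since v^w is one-hot, the product W^wrd · v^w simply selects column w of W^wrd. A toy sketch (the matrix values are made up):

```python
def one_hot(index, size):
    """v_w: value 1 at the word's index, zero everywhere else."""
    v = [0.0] * size
    v[index] = 1.0
    return v

def embed(W, index):
    """r_wrd = W v_w: the matrix-vector product picks out column `index` of W.
    W has shape d_wrd x |V_wrd| (toy dimensions here)."""
    v = one_hot(index, len(W[0]))
    return [sum(W[i][j] * v[j] for j in range(len(v))) for i in range(len(W))]
```

In practice frameworks implement this as a direct lookup rather than a full matrix multiply, for exactly this reason.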
CHARACTER-LEVEL EMBEDDINGS
• Used to extract morphological (how words are formed) and shape information of words
• Given a word w composed of M characters {c1, c2, ..., cM}, we first transform each character cm into a character embedding r^chr_m
• Given a character c, its embedding r^chr is obtained by the matrix-vector product r^chr = W^chr v^c
• where v^c is a one-hot vector of size |V^chr|: value 1 at index c and zero in all other positions
• The input for the convolutional layer is the sequence of character embeddings {r^chr_1, r^chr_2, ..., r^chr_M}
• A convolutional approach is used for character-level feature extraction
SENTENCE-LEVEL REPRESENTATION
Second convolutional layer. Why use a convolutional layer for the sentence-level representation? Because extracting a sentence-wide feature set has two main problems: sentences have different sizes, and important information can appear at any position in the sentence.
• Given a sentence x with N words {w1, w2, ..., wN}, which have been converted to joint word-level and character-level embeddings {u1, u2, ..., uN}
• Extract a sentence-level representation r^sent_x
• This layer produces local features around each word in the sentence and then combines them using a max operation to create a fixed-sized feature vector for the sentence
TRAINING THE NETWORK
• The network is trained by minimizing a negative log-likelihood over the training set D
• Given a sentence x, the network with parameter set θ computes a score s_θ(x)_τ for each sentiment label τ ∈ T
• We apply a softmax operation over the scores of all tags τ ∈ T:

  p(τ | x, θ) = e^{s_θ(x)_τ} / Σ_{i ∈ T} e^{s_θ(x)_i}

• We use stochastic gradient descent (SGD) to minimize the negative log-likelihood with respect to θ:

  θ ↦ Σ_{(x,y) ∈ D} -log p(y | x, θ)

• where (x, y) corresponds to a sentence in the training corpus D and y represents its respective label
• The CharSCNN architecture was implemented using the Theano library
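The softmax and the negative log-likelihood objective can be sketched directly from the two formulas above (a plain-Python illustration, not the Theano implementation):

```python
import math

def softmax(scores):
    """p(tau|x,theta) = exp(s_theta(x)_tau) / sum_i exp(s_theta(x)_i)."""
    m = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def negative_log_likelihood(batch):
    """Sum of -log p(y|x,theta) over (scores, y) pairs: the quantity SGD minimizes."""
    return sum(-math.log(softmax(scores)[y]) for scores, y in batch)
```

Raising the score of the correct label lowers the loss, which is what the SGD updates achieve.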
RELATED WORKS
• In (Chrupala, 2013), the author proposes a simple recurrent network (SRN) to learn continuous vector representations for sequences of characters, and uses them as features in a conditional random field classifier to solve a character-level text segmentation and labeling task.
• In (Collobert et al., 2011), the authors use a convolutional network for the semantic role labeling task with the goal of avoiding excessive task-specific feature engineering. The authors use a similar network architecture for syntactic parsing.
ADVANTAGES OF CharSCNN
• It uses a feed-forward neural network instead of a recursive one, and it does not need any input about the syntactic structure of the sentence.
• The addition of one convolutional layer to extract character features.
• A key advantage of CharSCNN's approach to extracting character-level features is its flexibility.
EXPERIMENT
Datasets
• The movie review dataset used is the recently proposed Stanford Sentiment Treebank (SSTb), which includes fine-grained sentiment labels for 215,154 phrases in the parse trees of 11,855 sentences.
• Stanford Twitter Sentiment corpus (STS): the original training set contains 1.6 million tweets that were automatically labeled as positive/negative using emoticons as noisy labels.
UNSUPERVISED PRE-TRAINING OF WORD-LEVEL EMBEDDINGS
Why unsupervised learning of word-level embeddings?
• Because recent work has shown that large improvements in model accuracy can be obtained by performing unsupervised pre-training of word embeddings.
How was the unsupervised learning of word-level embeddings performed?
• We perform unsupervised learning of word-level embeddings using the word2vec tool, which implements the continuous bag-of-words and skip-gram architectures for computing vector representations of words.
Dataset for pre-training of word embeddings
• December 2013 snapshot of the English Wikipedia corpus as a source of unlabeled data

Processing the dataset
1. removal of paragraphs that are not in English
2. substitution of non-western characters by a special character
3. tokenization of the text using the tokenizer available with the Stanford POS Tagger
4. removal of sentences that are less than 20 characters long (including whitespace) or have fewer than 5 tokens
5. lowercase all words and substitute each numerical digit by a 0 (e.g., 1967 becomes 0000)
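Steps 4 and 5 of this cleanup can be sketched as follows (steps 1–3, including the Stanford tokenizer, are omitted; the regex is my assumption):

```python
import re

def clean_sentence(s):
    """Drop sentences shorter than 20 characters or with fewer than 5 tokens,
    then lowercase and map every digit to 0 (e.g., 1967 -> 0000)."""
    if len(s) < 20 or len(s.split()) < 5:
        return None                      # sentence filtered out (step 4)
    return re.sub(r"\d", "0", s.lower())  # lowercase + digit substitution (step 5)
```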
Process
• A word must occur at least 10 times in order to be included in the vocabulary, which resulted in a vocabulary of 870,214 entries
• To train the word-level embeddings we use word2vec's skip-gram method with a context window of size 9
• The training time for the English corpus is around 1 h 10 min using 12 threads on an Intel Xeon E5-2643 3.30 GHz machine
MODEL SETUP
• Development sets are used to tune the neural network hyper-parameters
• More time was spent tuning the learning rate than the other parameters, since it is the hyper-parameter with the largest impact on prediction performance
• The only two parameters with different values for the two datasets are the learning rate and the number of units in the convolutional layer that extracts sentence features
• The number of training epochs varies between five and ten for the two datasets
Neural Network Hyper-Parameters
• We use 4 threads on an Intel Xeon E5-2643 3.30 GHz machine.
• The Theano-based implementation of CharSCNN takes around 10 min. to complete one training epoch for the SSTb corpus with all phrases and five classes.
RESULTS FOR SSTb CORPUS
• We check whether using examples that are single phrases, in addition to complete sentences, can provide useful information for training
• "Phrases" indicates whether all phrases (yes) or only complete sentences (no) in the corpus are used for training
• The fine-grained column contains prediction results for the case where 5 sentiment classes (labels) are used (very negative, negative, neutral, positive, very positive)
NOTE
• CharSCNN and SCNN have very similar results in both fine-grained and binary sentiment prediction. These results suggest that character-level information is not very helpful for sentiment prediction on the SSTb corpus.
• Using phrases as training examples allows the classifier to learn more complex phenomena, since sentiment-labeled phrases give information about how words (phrases) combine to form the sentiment of phrases (sentences).
• Compared to RNTN, CharSCNN has the advantage of not needing the output of a syntactic parser when performing sentiment prediction.
RESULTS FOR STS CORPUS
• Character-level information has a greater impact for Twitter data
• Using unsupervised pre-training, CharSCNN provides an absolute accuracy improvement of 1.2 over SCNN
CONCLUSIONS
• A positive sentence (left) and its negation (right).
• The extracted features concentrate mainly around the main topic, "film", and the part of the phrase that indicates sentiment ("liked" and "didn't like").
• In the left chart, the word "liked" has a big impact on the set of extracted features.
• In the right chart, we can see that the impact of the word "like" is reduced because of the negation "didn't", which is responsible for a large part of the extracted features.
• The negative expression "incredibly dull" is responsible for the features extracted from the sentence on the left
• "definitely not dull", which is somewhat more positive, is responsible for the features extracted from the sentence in the chart at right
Why do we use negation?
• Negation is an important issue in sentiment analysis
ADVISED IMPROVEMENT
• Use three convolutional layers: the first layer for character-level embedding, the second for word-level embedding and the third for sentence-level embedding
• Perform unsupervised pre-training of character-level representations
REFERENCES
• https://www.brandwatch.com/blog/understanding-sentiment-analysis/
• https://deeplearning4j.org/word2vec.html
• http://cs231n.github.io/convolutional-networks/
• https://www.clickworker.com/2017/03/14/sentiment-analysis-what-is-it-for/
• https://ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets/
• https://adeshpande3.github.io/A-Beginner%27s-Guide-To-Understanding-Convolutional-Neural-Networks-Part-2/