Sentiment Analysis - sikaman.dyndns.org · Dan Jurafsky Posive or negave ... • Public sen1ment:...

83
Sentiment Analysis What is Sen+ment Analysis?

Transcript of Sentiment Analysis - sikaman.dyndns.org · Dan Jurafsky Posive or negave ... • Public sen1ment:...

Sentiment Analysis

WhatisSen+mentAnalysis?

DanJurafsky

Posi%veornega%vemoviereview?

•  unbelievablydisappoin+ng•  Fullofzanycharactersandrichlyappliedsa+re,andsome

greatplottwists•  thisisthegreatestscrewballcomedyeverfilmed•  Itwaspathe+c.Theworstpartaboutitwastheboxing

scenes.

2

DanJurafsky

GoogleProductSearch

•  a

3

DanJurafsky

BingShopping

•  a

4

DanJurafsky Twi;ersen%mentversusGallupPollofConsumerConfidence

Brendan O'Connor, Ramnath Balasubramanyan, Bryan R. Routledge, and Noah A. Smith. 2010. FromTweetstoPolls:LinkingTextSen+menttoPublicOpinionTimeSeries.InICWSM-2010

5

DanJurafsky

Twi;ersen%ment:

JohanBollen,HuinaMao,XiaojunZeng.2011.TwiYermoodpredictsthestockmarket,JournalofComputa+onalScience2:1,1-8.10.1016/j.jocs.2010.12.007.

6

DanJurafsky

7

DowJo

nes•  CALMpredicts

DJIA3dayslater

•  Atleastonecurrenthedgefundusesthisalgorithm

CALM

Bollenetal.(2011)

DanJurafsky

TargetSen%mentonTwi;er

•  TwiYerSen+mentApp•  AlecGo,RichaBhayani,LeiHuang.2009.

TwiYerSen+mentClassifica+onusingDistantSupervision

8

DanJurafsky

Sen%mentanalysishasmanyothernames

•  Opinionextrac+on•  Opinionmining•  Sen+mentmining•  Subjec+vityanalysis

9

DanJurafsky

Whysen%mentanalysis?

•  Movie:isthisreviewposi+veornega+ve?•  Products:whatdopeoplethinkaboutthenewiPhone?•  Publicsen1ment:howisconsumerconfidence?Isdespairincreasing?

•  Poli1cs:whatdopeoplethinkaboutthiscandidateorissue?•  Predic1on:predictelec+onoutcomesormarkettrendsfromsen+ment

10

DanJurafsky

SchererTypologyofAffec%veStates

•  Emo%on:brieforganicallysynchronized…evalua+onofamajorevent•  angry,sad,joyful,fearful,ashamed,proud,elated

•  Mood:diffusenon-causedlow-intensitylong-dura+onchangeinsubjec+vefeeling•  cheerful,gloomy,irritable,listless,depressed,buoyant

•  Interpersonalstances:affec+vestancetowardanotherpersoninaspecificinterac+on•  friendly,flirta1ous,distant,cold,warm,suppor1ve,contemptuous

•  AGtudes:enduring,affec+velycoloredbeliefs,disposi+onstowardsobjectsorpersons•  liking,loving,ha1ng,valuing,desiring

•  Personalitytraits:stablepersonalitydisposi+onsandtypicalbehaviortendencies•  nervous,anxious,reckless,morose,hos1le,jealous

11

DanJurafsky

SchererTypologyofAffec%veStates

•  Emo%on:brieforganicallysynchronized…evalua+onofamajorevent•  angry,sad,joyful,fearful,ashamed,proud,elated

•  Mood:diffusenon-causedlow-intensitylong-dura+onchangeinsubjec+vefeeling•  cheerful,gloomy,irritable,listless,depressed,buoyant

•  Interpersonalstances:affec+vestancetowardanotherpersoninaspecificinterac+on•  friendly,flirta1ous,distant,cold,warm,suppor1ve,contemptuous

•  AGtudes:enduring,affec%velycoloredbeliefs,disposi%onstowardsobjectsorpersons•  liking,loving,ha1ng,valuing,desiring

•  Personalitytraits:stablepersonalitydisposi+onsandtypicalbehaviortendencies•  nervous,anxious,reckless,morose,hos1le,jealous

12

DanJurafsky

Sen%mentAnalysis

•  Sen+mentanalysisisthedetec+onofaGtudes“enduring,affec+velycoloredbeliefs,disposi+onstowardsobjectsorpersons”1.   Holder(source)ofagtude2.   Target(aspect)ofagtude3.   Typeofagtude•  Fromasetoftypes

•  Like,love,hate,value,desire,etc.•  Or(morecommonly)simpleweightedpolarity:

•  posi1ve,nega1ve,neutral,togetherwithstrength4.   Textcontainingtheagtude•  Sentenceoren+redocument13

DanJurafsky

Sen%mentAnalysis

•  Simplesttask:•  Istheagtudeofthistextposi+veornega+ve?

•  Morecomplex:• Ranktheagtudeofthistextfrom1to5

•  Advanced:• Detectthetarget,source,orcomplexagtudetypes

14

DanJurafsky

Sen%mentAnalysis

•  Simplesttask:•  Istheagtudeofthistextposi+veornega+ve?

•  Morecomplex:• Ranktheagtudeofthistextfrom1to5

•  Advanced:• Detectthetarget,source,orcomplexagtudetypes

15

Sentiment Analysis

WhatisSen+mentAnalysis?

Sentiment Analysis

ABaselineAlgorithm

DanJurafsky

Sentiment Classification in Movie Reviews

•  Polaritydetec+on:•  IsanIMDBmoviereviewposi+veornega+ve?

•  Data:PolarityData2.0:•  hYp://www.cs.cornell.edu/people/pabo/movie-review-data

BoPang,LillianLee,andShivakumarVaithyanathan.2002.Thumbsup?Sen+mentClassifica+onusingMachineLearningTechniques.EMNLP-2002,79—86.BoPangandLillianLee.2004.ASen+mentalEduca+on:Sen+mentAnalysisUsingSubjec+vitySummariza+onBasedonMinimumCuts.ACL,271-278

18

DanJurafsky

IMDBdatainthePangandLeedatabase

when_starwars_cameoutsometwentyyearsago,theimageoftravelingthroughoutthestarshasbecomeacommonplaceimage.[…]whenhansologoeslightspeed,thestarschangetobrightlines,goingtowardstheviewerinlinesthatconvergeataninvisiblepoint.cool._octobersky_offersamuchsimplerimage–thatofasinglewhitedot,travelinghorizontallyacrossthenightsky.[...]

“snakeeyes”isthemostaggrava+ngkindofmovie:thekindthatshowssomuchpoten+althenbecomesunbelievablydisappoin+ng.it’snotjustbecausethisisabriandepalmafilm,andsincehe’sagreatdirectorandonewho’sfilmsarealwaysgreetedwithatleastsomefanfare.andit’snotevenbecausethiswasafilmstarringnicolascageandsincehegivesabrauvaraperformance,thisfilmishardlyworthhistalents.

✓ ✗

19

DanJurafsky

BaselineAlgorithm(adaptedfromPangandLee)

•  Tokeniza+on•  FeatureExtrac+on•  Classifica+onusingdifferentclassifiers

•  NaïveBayes•  MaxEnt•  SVM

20

DanJurafsky

Sen%mentTokeniza%onIssues

•  DealwithHTMLandXMLmarkup•  TwiYermark-up(names,hashtags)•  Capitaliza+on(preserveforwordsinallcaps)•  Phonenumbers,dates•  Emo+cons•  Usefulcode:

•  ChristopherPoYssen+menttokenizer•  BrendanO’ConnortwiYertokenizer21

[<>]? # optional hat/brow[:;=8] # eyes[\-o\*\']? # optional nose[\)\]\(\[dDpP/\:\}\{@\|\\] # mouth | #### reverse orientation[\)\]\(\[dDpP/\:\}\{@\|\\] # mouth[\-o\*\']? # optional nose[:;=8] # eyes[<>]? # optional hat/brow

PoYsemo+cons

DanJurafsky

Extrac%ngFeaturesforSen%mentClassifica%on

•  Howtohandlenega+on•  I didn’t like this movievs•  I really like this movie

•  Whichwordstouse?•  Onlyadjec+ves•  Allwords•  AllwordsturnsouttoworkbeYer,atleastonthisdata

22

DanJurafsky

Nega%on

AddNOT_toeverywordbetweennega+onandfollowingpunctua+on:

didn’t like this movie , but I

didn’t NOT_like NOT_this NOT_movie but I

Das,SanjivandMikeChen.2001.Yahoo!forAmazon:Extrac+ngmarketsen+mentfromstockmessageboards.InProceedingsoftheAsiaPacificFinanceAssocia+onAnnualConference(APFA).Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. EMNLP-2002, 79—86.

23

DanJurafsky

Reminder:NaïveBayes

24

P̂(w | c) = count(w,c)+1count(c)+ V

cNB = argmaxc j∈C

P(cj ) P(wi | cj )i∈positions∏

DanJurafsky

Binarized(Booleanfeature)Mul%nomialNaïveBayes

•  Intui+on:•  Forsen+ment(andprobablyforothertextclassifica+ondomains)•  WordoccurrencemaymaYermorethanwordfrequency•  Theoccurrenceofthewordfantas1ctellsusalot•  Thefactthatitoccurs5+mesmaynottellusmuchmore.

•  BooleanMul+nomialNaïveBayes•  Clipsallthewordcountsineachdocumentat1

25

DanJurafsky

BooleanMul%nomialNaïveBayes:Learning

•  CalculateP(cj)terms•  ForeachcjinCdo

docsj←alldocswithclass=cj

P(cj )←| docsj |

| total # documents|

P(wk | cj )←nk +α

n+α |Vocabulary |

•  Textj←singledoccontainingalldocsj•  ForeachwordwkinVocabulary

nk←#ofoccurrencesofwkinTextj

•  Fromtrainingcorpus,extractVocabulary

•  CalculateP(wk|cj)terms•  Removeduplicatesineachdoc:

•  Foreachwordtypewindocj•  Retainonlyasingleinstanceofw

26

DanJurafsky

BooleanMul%nomialNaïveBayesonatestdocumentd

27

•  Firstremoveallduplicatewordsfromd•  ThencomputeNBusingthesameequa+on:

cNB = argmaxc j∈C

P(cj ) P(wi | cj )i∈positions∏

DanJurafsky

Normalvs.BooleanMul%nomialNBNormal Doc Words ClassTraining 1 ChineseBeijingChinese c

2 ChineseChineseShanghai c3 ChineseMacao c4 TokyoJapanChinese j

Test 5 ChineseChineseChineseTokyoJapan ?

28

Boolean Doc Words ClassTraining 1 ChineseBeijing c

2 ChineseShanghai c3 ChineseMacao c4 TokyoJapanChinese j

Test 5 ChineseTokyoJapan ?

DanJurafsky

Binarized(Booleanfeature)Mul%nomialNaïveBayes

•  BinaryseemstoworkbeYerthanfullwordcounts•  ThisisnotthesameasMul+variateBernoulliNaïveBayes•  MBNBdoesn’tworkwellforsen+mentorothertexttasks

•  Otherpossibility:log(freq(w))29

B.Pang,L.Lee,andS.Vaithyanathan.2002.Thumbsup?Sen+mentClassifica+onusingMachineLearningTechniques.EMNLP-2002,79—86.V.Metsis,I.Androutsopoulos,G.Paliouras.2006.SpamFilteringwithNaiveBayes–WhichNaiveBayes?CEAS2006-ThirdConferenceonEmailandAn+-Spam.K.-M.Schneider.2004.Onwordfrequencyinforma+onandnega+veevidenceinNaiveBayestextclassifica+on.ICANLP,474-485.JDRennie,LShih,JTeevan.2003.Tacklingthepoorassump+onsofnaivebayestextclassifiers.ICML2003

DanJurafsky

Cross-Valida%on

•  Breakupdatainto10folds•  (Equalposi+veandnega+veinsideeachfold?)

•  Foreachfold•  Choosethefoldasatemporarytestset

•  Trainon9folds,computeperformanceonthetestfold

•  Reportaverageperformanceofthe10runs

TrainingTest

Test

Test

Test

Test

Training

Training Training

Training

Training

Iteration

1

2

3

4

530

DanJurafsky

OtherissuesinClassifica%on

•  MaxEntandSVMtendtodobeYerthanNaïveBayes

31

DanJurafsky

Gene%cAlgorithmsforbinarysen%ment

•  Asbefore,createavocabularyonnwords•  GArepresenta+onis:

•  Eachgeneisafloatinrange(0,1).Ciseitherposi+veornega+vesen+mentastheothervaluewithbe1-p(wi|c)

•  Useslide27topredictsen+mentclass.Fitnessiserrorinpredic+on

32

p(wi|c) p(w1|c) p(wn|c) … …

DanJurafsky

GAFitness

•  Ifdocumentdihascsi(+1forposi+veor0fornega+vesen+ment)andcpikpredictedbythekthpopula+onmember

•  Normalizedfitnessforkthpopula+onmemberformdocuments:

•  Here,wewouldbemaximizingfitness,Fk.Thisisinrange(0,1)33

Fk =1−1m

csi − cpiki=1

m

DanJurafsky Problems:Whatmakesreviewshardtoclassify?

•  Subtlety:•  PerfumereviewinPerfumes:theGuide:•  “Ifyouarereadingthisbecauseitisyourdarlingfragrance,pleasewearitathomeexclusively,andtapethewindowsshut.”

•  DorothyParkeronKatherineHepburn•  “Sherunsthegamutofemo+onsfromAtoB”

34

DanJurafsky

ThwartedExpecta%onsandOrderingEffects

•  “Thisfilmshouldbebrilliant.Itsoundslikeagreatplot,theactorsarefirstgrade,andthesuppor+ngcastisgoodaswell,andStalloneisaYemp+ngtodeliveragoodperformance.However,itcan’tholdup.”

•  WellasusualKeanuReevesisnothingspecial,butsurprisingly,theverytalentedLaurenceFishbourneisnotsogoodeither,Iwassurprised.

35

Sentiment Analysis

ABaselineAlgorithm

Sentiment Analysis

Sen+mentLexicons

DanJurafsky

TheGeneralInquirer

•  Homepage:hYp://www.wjh.harvard.edu/~inquirer•  ListofCategories:hYp://www.wjh.harvard.edu/~inquirer/homecat.htm

•  Spreadsheet:hYp://www.wjh.harvard.edu/~inquirer/inquirerbasic.xls•  Categories:

•  Posi+v(1915words)andNega+v(2291words)•  StrongvsWeak,Ac+vevsPassive,OverstatedversusUnderstated•  Pleasure,Pain,Virtue,Vice,Mo+va+on,Cogni+veOrienta+on,etc

•  FreeforResearchUse

PhilipJ.Stone,DexterCDunphy,MarshallS.Smith,DanielM.Ogilvie.1966.TheGeneralInquirer:AComputerApproachtoContentAnalysis.MITPress

38

DanJurafsky

LIWC(Linguis%cInquiryandWordCount)Pennebaker,J.W.,Booth,R.J.,&Francis,M.E.(2007).Linguis+cInquiryandWordCount:LIWC2007.Aus+n,TX

•  Homepage:hYp://www.liwc.net/•  2300words,>70classes•  Affec%veProcesses

•  nega+veemo+on(bad,weird,hate,problem,tough)•  posi+veemo+on(love,nice,sweet)

•  Cogni%veProcesses•  Tenta+ve(maybe,perhaps,guess),Inhibi+on(block,constraint)

•  Pronouns,Nega%on(no,never),Quan%fiers(few,many)•  $30or$90fee39

DanJurafsky

MPQASubjec%vityCuesLexicon

•  Homepage:hYp://www.cs.piY.edu/mpqa/subj_lexicon.html•  6885wordsfrom8221lemmas

•  2718posi+ve•  4912nega+ve

•  Eachwordannotatedforintensity(strong,weak)•  GNUGPL40

Theresa Wilson, Janyce Wiebe, and Paul Hoffmann (2005). Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis. Proc. of HLT-EMNLP-2005. Riloff and Wiebe (2003). Learning extraction patterns for subjective expressions. EMNLP-2003.

DanJurafsky

BingLiuOpinionLexicon

•  BingLiu'sPageonOpinionMining•  hYp://www.cs.uic.edu/~liub/FBS/opinion-lexicon-English.rar

•  6786words•  2006posi+ve•  4783nega+ve

41

MinqingHuandBingLiu.MiningandSummarizingCustomerReviews.ACMSIGKDD-2004.

DanJurafsky

Sen%WordNetStefanoBaccianella,AndreaEsuli,andFabrizioSebas+ani.2010SENTIWORDNET3.0:AnEnhancedLexicalResourceforSen+mentAnalysisandOpinionMining.LREC-2010

•  Homepage:hYp://sen+wordnet.is+.cnr.it/•  AllWordNetsynsetsautoma+callyannotatedfordegreesofposi+vity,

nega+vity,andneutrality/objec+veness•  [es+mable(J,3)]“maybecomputedores+mated”

Pos 0 Neg 0 Obj 1 •  [es+mable(J,1)]“deservingofrespectorhighregard”

Pos .75 Neg 0 Obj .25

42

DanJurafsky

Disagreementsbetweenpolaritylexicons

OpinionLexicon

GeneralInquirer

Sen%WordNet LIWC

MPQA 33/5402(0.6%) 49/2867(2%) 1127/4214(27%) 12/363(3%)

OpinionLexicon 32/2411(1%) 1004/3994(25%) 9/403(2%)

GeneralInquirer 520/2306(23%) 1/204(0.5%)

Sen%WordNet 174/694(25%)

LIWC

43

ChristopherPoYs,Sen+mentTutorial,2011

DanJurafsky

AnalyzingthepolarityofeachwordinIMDB

•  Howlikelyiseachwordtoappearineachsen+mentclass?•  Count(“bad”)in1-star,2-star,3-star,etc.•  Butcan’tuserawcounts:•  Instead,likelihood:•  Makethemcomparablebetweenwords

•  Scaledlikelihood:

PoYs,Christopher.2011.Onthenega+vityofnega+on.SALT20,636-659.

P(w | c) = f (w,c)f (w,c)

w∈c∑

P(w | c)P(w)

44

DanJurafsky

AnalyzingthepolarityofeachwordinIMDB

●●

●●

●●

●●

POS good (883,417 tokens)

1 2 3 4 5 6 7 8 9 10

0.080.10.12

● ● ● ● ●●

amazing (103,509 tokens)

1 2 3 4 5 6 7 8 9 10

0.05

0.17

0.28

●●

●●

great (648,110 tokens)

1 2 3 4 5 6 7 8 9 10

0.05

0.11

0.17

● ● ● ●●

awesome (47,142 tokens)

1 2 3 4 5 6 7 8 9 10

0.05

0.16

0.27

Pr(c|w)

Rating

● ● ● ●

●● ●

NEG good (20,447 tokens)

1 2 3 4 5 6 7 8 9 10

0.03

0.1

0.16● ●

●●

●● ● ●

depress(ed/ing) (18,498 tokens)

1 2 3 4 5 6 7 8 9 10

0.080.110.13

●● ●

bad (368,273 tokens)

1 2 3 4 5 6 7 8 9 10

0.04

0.12

0.21

●● ● ●

terrible (55,492 tokens)

1 2 3 4 5 6 7 8 9 10

0.03

0.16

0.28

Pr(c|w)

Rating

Scaled

likelihoo

dP(w|c)/P(w)

Scaled

likelihoo

dP(w|c)/P(w)

PoYs,Christopher.2011.Onthenega+vityofnega+on.SALT20,636-659.

45

DanJurafsky

Othersen%mentfeature:Logicalnega%on

•  Islogicalnega+on(no,not)associatedwithnega+vesen+ment?

•  PoYsexperiment:•  Countnega+on(not,n’t,no,never)inonlinereviews•  Regressagainstthereviewra+ng

PoYs,Christopher.2011.Onthenega+vityofnega+on.SALT20,636-659.

46

DanJurafsky Po;s2011Results:Morenega%oninnega%vesen%ment

a

Scaled

likelihoo

dP(w|c)/P(w)

47

Sentiment Analysis

Sen+mentLexicons

Sentiment Analysis

LearningSen+mentLexicons

DanJurafsky

Semi-supervisedlearningoflexicons

•  Useasmallamountofinforma+on•  Afewlabeledexamples•  Afewhand-builtpaYerns

•  Tobootstrapalexicon

50

DanJurafsky

HatzivassiloglouandMcKeownintui%onforiden%fyingwordpolarity

•  Adjec+vesconjoinedby“and”havesamepolarity•  Fairandlegi+mate,corruptandbrutal•  *fairandbrutal,*corruptandlegi+mate

•  Adjec+vesconjoinedby“but”donot•  fairbutbrutal

51

VasileiosHatzivassiloglouandKathleenR.McKeown.1997.Predic+ngtheSeman+cOrienta+onofAdjec+ves.ACL,174–181

DanJurafsky

Hatzivassiloglou&McKeown1997Step1

•  Labelseedsetof1336adjec+ves(all>20in21millionwordWSJcorpus)•  657posi+ve•  adequatecentralcleverfamousintelligentremarkablereputedsensi+veslenderthriving…

•  679nega+ve•  contagiousdrunkenignorantlankylistlessprimi+vestridenttroublesomeunresolvedunsuspec+ng…

52

DanJurafsky

Hatzivassiloglou&McKeown1997Step2

•  Expandseedsettoconjoinedadjec+ves

53

nice, helpful

nice, classy

DanJurafsky

Hatzivassiloglou&McKeown1997Step3

•  Supervisedclassifierassigns“polaritysimilarity”toeachwordpair,resul+ngingraph:

54

classy

nice

helpful

fair

brutal

irrational corrupt

DanJurafsky

Hatzivassiloglou&McKeown1997Step4

•  Clusteringforpar++oningthegraphintotwo

55

classy

nice

helpful

fair

brutal

irrational corrupt

+ -

DanJurafsky

Outputpolaritylexicon

•  Posi+ve•  bolddecisivedisturbinggenerousgoodhonestimportantlargematurepa+entpeacefulposi+veproudsounds+mula+ngstraigh�orwardstrangetalentedvigorouswiYy…

•  Nega+ve•  ambiguouscau+ouscynicalevasiveharmfulhypocri+calinefficientinsecureirra+onalirresponsibleminoroutspokenpleasantrecklessriskyselfishtediousunsupportedvulnerablewasteful…

56

DanJurafsky

Outputpolaritylexicon

•  Posi+ve•  bolddecisivedisturbinggenerousgoodhonestimportantlargematurepa+entpeacefulposi+veproudsounds+mula+ngstraigh�orwardstrangetalentedvigorouswiYy…

•  Nega+ve•  ambiguouscau%ouscynicalevasiveharmfulhypocri+calinefficientinsecureirra+onalirresponsibleminoroutspokenpleasantrecklessriskyselfishtediousunsupportedvulnerablewasteful…

57

DanJurafsky

TurneyAlgorithm

1.  Extractaphrasallexiconfromreviews2.  Learnpolarityofeachphrase3.  Rateareviewbytheaveragepolarityofitsphrases

58

Turney (2002): Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews

DanJurafsky

Extracttwo-wordphraseswithadjec%ves

FirstWord SecondWord ThirdWord(notextracted)

JJ NNorNNS anythingRB,RBR,RBS JJ NotNNnorNNSJJ JJ NotNNorNNSNNorNNS JJ NorNNnorNNSRB,RBR,orRBS VB,VBD,VBN,VBG anything59

DanJurafsky

Howtomeasurepolarityofaphrase?

•  Posi+vephrasesco-occurmorewith“excellent”•  Nega+vephrasesco-occurmorewith“poor”•  Buthowtomeasureco-occurrence?

60

DanJurafsky

PointwiseMutualInforma%on

•  Mutualinforma%onbetween2randomvariablesXandY•  Pointwisemutualinforma%on:

•  Howmuchmoredoeventsxandyco-occurthaniftheywereindependent?

I(X,Y ) = P(x, y)y∑

x∑ log2

P(x,y)P(x)P(y)

PMI(X,Y ) = log2P(x,y)P(x)P(y)

61

DanJurafsky

PointwiseMutualInforma%on

•  Pointwisemutualinforma%on:•  Howmuchmoredoeventsxandyco-occurthaniftheywereindependent?

•  PMIbetweentwowords:•  Howmuchmoredotwowordsco-occurthaniftheywereindependent?

PMI(word1,word2 ) = log2P(word1,word2)P(word1)P(word2)

PMI(X,Y ) = log2P(x,y)P(x)P(y)

62

DanJurafsky

HowtoEs%matePointwiseMutualInforma%on

• Querysearchengine(Altavista)• P(word)es+matedbyhits(word)/N• P(word1,word2)byhits(word1 NEAR word2)/N•  (MorecorrectlythebigramdenominatorshouldbekN,becausethereareatotalofNconsecu+vebigrams(word1,word2),butkNbigramsthatarekwordsapart,butwejustuseNontherestofthisslideandthenext.)

PMI(word1,word2 ) = log2

1Nhits(word1 NEAR word2)

1Nhits(word1) 1

Nhits(word2)

63

DanJurafsky

Doesphraseappearmorewith“poor”or“excellent”?

64

Polarity(phrase) = PMI(phrase,"excellent")−PMI(phrase,"poor")

= log2hits(phrase NEAR "excellent")hits("poor")hits(phrase NEAR "poor")hits("excellent")!

"#

$

%&

= log2hits(phrase NEAR "excellent")

hits(phrase)hits("excellent")hits(phrase)hits("poor")

hits(phrase NEAR "poor")

= log2

1N hits(phrase NEAR "excellent")1N hits(phrase) 1

N hits("excellent")− log2

1N hits(phrase NEAR "poor")1N hits(phrase) 1

N hits("poor")

DanJurafsky

Phrasesfromathumbs-upreview

65

Phrase POStags Polarity

onlineservice JJNN 2.8

onlineexperience JJNN 2.3

directdeposit JJNN 1.3

localbranch JJNN 0.42…

lowfees JJNNS 0.33

trueservice JJNN -0.73

otherbank JJNN -0.85

inconvenientlylocated JJNN -1.5

Average 0.32

DanJurafsky

Phrasesfromathumbs-downreview

66

Phrase POStags Polarity

directdeposits JJNNS 5.8

onlineweb JJNN 1.9

veryhandy RBJJ 1.4…

virtualmonopoly JJNN -2.0

lesserevil RBRJJ -2.3

otherproblems JJNNS -2.8

lowfunds JJNNS -6.8

unethicalprac+ces JJNNS -8.5

Average -1.2

DanJurafsky

ResultsofTurneyalgorithm

•  410reviewsfromEpinions•  170(41%)nega+ve•  240(59%)posi+ve

•  Majorityclassbaseline:59%•  Turneyalgorithm:74%

•  Phrasesratherthanwords•  Learnsdomain-specificinforma+on67

DanJurafsky

UsingWordNettolearnpolarity

•  WordNet:onlinethesaurus(coveredinlaterlecture).•  Createposi+ve(“good”)andnega+veseed-words(“terrible”)•  FindSynonymsandAntonyms

•  Posi+veSet:Addsynonymsofposi+vewords(“well”)andantonymsofnega+vewords

•  Nega+veSet:Addsynonymsofnega+vewords(“awful”)andantonymsofposi+vewords(”evil”)

•  Repeat,followingchainsofsynonyms•  Filter68

S.M.KimandE.Hovy.2004.Determiningthesen+mentofopinions.COLING2004M.HuandB.Liu.Miningandsummarizingcustomerreviews.InProceedingsofKDD,2004

DanJurafsky

SummaryonLearningLexicons

•  Advantages:•  Canbedomain-specific•  Canbemorerobust(morewords)

•  Intui+on•  Startwithaseedsetofwords(‘good’,‘poor’)•  Findotherwordsthathavesimilarpolarity:•  Using“and”and“but”•  Usingwordsthatoccurnearbyinthesamedocument•  UsingWordNetsynonymsandantonyms

•  Useseedsandsemi-supervisedlearningtoinducelexicons

69

Sentiment Analysis

LearningSen+mentLexicons

Sentiment Analysis

OtherSen+mentTasks

DanJurafsky

Findingsen%mentofasentence

•  ImportantforfindingaspectsoraYributes•  Targetofsen+ment

•  The food was great but the service was awful

72

DanJurafsky

Findingaspect/a;ribute/targetofsen%ment

•  Frequentphrases+rules•  Findallhighlyfrequentphrasesacrossreviews(“fish tacos”)•  Filterbyruleslike“occursrighta�ersen+mentword”•  “…great fish tacos”meansfish tacos alikelyaspect

Casino casino,buffet,pool,resort,bedsChildren’sBarber haircut,job,experience,kidsGreekRestaurant food,wine,service,appe+zer,lambDepartmentStore selec+on,department,sales,shop,clothing

M.HuandB.Liu.2004.Miningandsummarizingcustomerreviews.InProceedingsofKDD.S.Blair-Goldensohn,K.Hannan,R.McDonald,T.Neylon,G.Reis,andJ.Reynar.2008.BuildingaSen+mentSummarizerforLocalServiceReviews.WWWWorkshop.

73

DanJurafsky

Findingaspect/a;ribute/targetofsen%ment

•  Theaspectnamemaynotbeinthesentence•  Forrestaurants/hotels,aspectsarewell-understood•  Supervisedclassifica+on

•  Hand-labelasmallcorpusofrestaurantreviewsentenceswithaspect•  food,décor,service,value,NONE

•  Trainaclassifiertoassignanaspecttoasentence•  “Giventhissentence,istheaspectfood,décor,service,value,orNONE”

74

DanJurafsky

PuGngitalltogether:Findingsen%mentforaspects

75

ReviewsFinalSummary

Sentences&Phrases

Sentences&Phrases

Sentences&Phrases

Text Extractor

Sentiment Classifier

Aspect Extractor Aggregator

S.Blair-Goldensohn,K.Hannan,R.McDonald,T.Neylon,G.Reis,andJ.Reynar.2008.BuildingaSen+mentSummarizerforLocalServiceReviews.WWWWorkshop

DanJurafsky

ResultsofBlair-Goldensohnetal.methodRooms(3/5stars,41comments)

(+)Theroomwascleanandeverythingworkedfine–eventhewaterpressure...

(+)Wewentbecauseofthefreeroomandwaspleasantlypleased...

(-)…theworsthotelIhadeverstayedat...Service(3/5stars,31comments)

(+)Uponcheckingoutanothercouplewascheckingearlyduetoaproblem...

(+)Everysinglehotelstaffmembertreatedusgreatandansweredevery...

(-)ThefoodiscoldandtheservicegivesnewmeaningtoSLOW.

Dining(3/5stars,18comments)(+)ourfavoriteplacetostayinbiloxi.thefoodisgreatalsotheservice...(+)OfferoffreebuffetforjoiningthePlay76

DanJurafsky

Baselinemethodsassumeclasseshaveequalfrequencies!

•  Ifnotbalanced(commonintherealworld)•  can’tuseaccuraciesasanevalua+on•  needtouseF-scores

•  Severeimbalancingalsocandegradeclassifierperformance•  Twocommonsolu+ons:

1.  Resamplingintraining•  Randomundersampling

2.  Cost-sensi+velearning•  PenalizeSVMmoreformisclassifica+onoftherarething

77

DanJurafsky

Howtodealwith7stars?

1. Maptobinary2.  Uselinearorordinalregression• Orspecializedmodelslikemetriclabeling

78

Bo Pang and Lillian Lee. 2005. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. ACL, 115–124

DanJurafsky

SummaryonSen%ment

•  Generallymodeledasclassifica+onorregressiontask•  predictabinaryorordinallabel

•  Features:•  Nega+onisimportant•  Usingallwords(innaïvebayes)workswellforsometasks•  Findingsubsetsofwordsmayhelpinothertasks•  Hand-builtpolaritylexicons•  Useseedsandsemi-supervisedlearningtoinducelexicons79

DanJurafsky

SchererTypologyofAffec%veStates

•  Emo%on:brieforganicallysynchronized…evalua+onofamajorevent•  angry,sad,joyful,fearful,ashamed,proud,elated

•  Mood:diffusenon-causedlow-intensitylong-dura+onchangeinsubjec+vefeeling•  cheerful,gloomy,irritable,listless,depressed,buoyant

•  Interpersonalstances:affec+vestancetowardanotherpersoninaspecificinterac+on•  friendly,flirta1ous,distant,cold,warm,suppor1ve,contemptuous

•  AGtudes:enduring,affec+velycoloredbeliefs,disposi+onstowardsobjectsorpersons•  liking,loving,ha1ng,valuing,desiring

•  Personalitytraits:stablepersonalitydisposi+onsandtypicalbehaviortendencies•  nervous,anxious,reckless,morose,hos1le,jealous

80

DanJurafsky

Computa%onalworkonotheraffec%vestates

•  Emo%on:•  Detec+ngannoyedcallerstodialoguesystem•  Detec+ngconfused/frustratedversusconfidentstudents

•  Mood:•  Findingtrauma+zedordepressedwriters

•  Interpersonalstances:•  Detec+onofflirta+onorfriendlinessinconversa+ons

•  Personalitytraits:•  Detec+onofextroverts

81

DanJurafsky

Detec%onofFriendliness

•  Friendlyspeakersusecollabora+veconversa+onalstyle•  Laughter•  Lessuseofnega+veemo+onalwords•  Moresympathy•  That’s too bad I’m sorry to hear that

•  Moreagreement•  I think so too

•  Lesshedges•  kind of sort of a little …

82

Ranganath,Jurafsky,McFarland

Sentiment Analysis

OtherSen+mentTasks