Learning Sentiment Lexicons


Well, it's great that there are sentiment lexicons available, but for lots of purposes we'd like to build our own sentiment lexicons. The way we do this is often by semi-supervised learning. The idea of semi-supervised learning is that we have some small amount of information, maybe a few labeled examples or a few hand-built patterns, and from that set of data we'd like to bootstrap a complete lexicon. We might want to learn a lexicon, instead of taking an online lexicon, if we're looking at a particular domain that doesn't match the domain the lexicon was built for, or we're trying to do a particular task, or we just think that the online lexicons might not have enough words relevant to the topic we're looking at. One of the earliest ways of inducing this kind of sentiment lexicon was proposed in 1997 by Hatzivassiloglou and McKeown.

I'm going to show you this because, although the paper is old, its intuition is built into almost all modern ways of doing this semi-supervised learning of a lexicon. Their intuition is very simple: two adjectives conjoined by the word "and" probably have the same polarity, and if they're conjoined by the word "but", they probably don't. So if we see a very high frequency of "fair and legitimate" or "corrupt and brutal", then fair and legitimate probably have the same polarity, and we might suspect that "fair and brutal" is just less likely to occur on the web or in some corpus.

So if we see two words conjoined a lot by "and", the two are likely to have the same sentiment, but if we see two words linked by "but", they probably have different sentiments: "fair but brutal" is more likely to occur than "fair and brutal".
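To make that counting step concrete, here's a minimal Python sketch. The toy token list is an assumption for illustration; the real Hatzivassiloglou and McKeown system works over a large tagged corpus and restricts the pairs to adjectives.

```python
from collections import Counter

def conjunction_counts(tokens):
    """Tally word pairs joined by "and" (evidence of same polarity)
    and word pairs joined by "but" (evidence of opposite polarity)."""
    and_counts, but_counts = Counter(), Counter()
    for left, conj, right in zip(tokens, tokens[1:], tokens[2:]):
        if conj == "and":
            and_counts[(left, right)] += 1
        elif conj == "but":
            but_counts[(left, right)] += 1
    return and_counts, but_counts

# Toy corpus; a real run would use a large corpus and keep only adjectives.
tokens = "the ruling was fair and legitimate though harsh but fair".split()
and_counts, but_counts = conjunction_counts(tokens)
print(and_counts[("fair", "legitimate")], but_counts[("harsh", "fair")])  # 1 1
```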

Here's how they used this intuition. They first labeled by hand a seed set of 1,336 adjectives, with roughly equal numbers of positive adjectives, like central, clever, famous, and thriving, and negative adjectives, like ignorant and irrational. Now, from that seed set, they expand to any adjective that's conjoined with a word in their seed set. So, for example, we can go to Google, type in "was nice and", and look at what words occur next. What do we see after "was nice and"? We see "was nice and helpful", and that tells us that nice and helpful are likely to have the same

sentiment. And here we see "nice and classy", so that tells us that nice and classy might have the same sentiment. We can do that for all sorts of words that occur conjoined with our seed set, and we take any word that occurs frequently enough in the right conjunction with our seed-set words. From that we can build a classifier, and the job of the classifier is just to assign, for each pair of words, how similar the two words are. So we give the classifier the count of the two words occurring with an "and" in between them and the count with a "but" in between them, and the classifier can learn that nice and helpful are somewhat similar, nice and fair are very similar, and fair and corrupt are very dissimilar (marked on the slide in red with a dotted line), because "but" occurs more often between those two while "and" occurs more often between the others. So between any pair of words we can get a number that indicates

their similarity. And now we can just cluster the graph, using any kind of clustering algorithm, and group the words that tend to be linked together: helpful, nice, fair, and classy get linked together, and brutal, corrupt, and irrational get linked together, and then we have our output polarity lexicon. Here I've shown you an output polarity lexicon, and of course it'll have some mistakes in it. So, see if you can find the mistakes in this lexicon.
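As a rough illustration of this step: the paper itself trains a classifier on the conjunction counts and then clusters the resulting similarity graph; the simplified stand-in below just propagates the seed words' polarity over the and/but counts from the earlier sketch.

```python
def cluster_by_seed_propagation(and_counts, but_counts, pos_seeds, neg_seeds, rounds=5):
    """Turn and/but counts into a signed similarity (+1 ~ same polarity,
    -1 ~ opposite polarity), then spread seed polarities over the graph."""
    similarity = {}
    for pair in set(and_counts) | set(but_counts):
        a, b = and_counts.get(pair, 0), but_counts.get(pair, 0)
        similarity[pair] = (a - b) / (a + b)
    polarity = {w: 1.0 for w in pos_seeds}
    polarity.update({w: -1.0 for w in neg_seeds})
    for _ in range(rounds):  # labels hop one edge further each round
        for (w1, w2), sim in similarity.items():
            if w1 in polarity and w2 not in polarity:
                polarity[w2] = sim * polarity[w1]
            elif w2 in polarity and w1 not in polarity:
                polarity[w1] = sim * polarity[w2]
    return (sorted(w for w, s in polarity.items() if s > 0),
            sorted(w for w, s in polarity.items() if s < 0))
```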

Here are some of the mistakes: we have "disturbing" as a positive word, "strange" as a positive word, and "pleasant" as a negative word. So of course there are going to be errors in any of these kinds of automatic, semi-supervised algorithms. Now, while the Hatzivassiloglou and McKeown algorithm automatically finds the sentiment, the polarity, of individual words, it doesn't do well with phrases, and we'd often like to get phrases as well as words. The Turney algorithm is another way to bootstrap a lexicon in a semi-supervised way.

What it does is extract a bunch of phrases from reviews, learn the polarity of those phrases, and then take all the phrases that occur in a review, average their polarity, and use that to rate the review. So let's look at the stages in the Turney algorithm. First, we're going to extract every two-word phrase that has a particular sequence of part-of-speech tags.

We haven't talked about parts of speech yet, but JJ is the tag for adjectives, NN means noun, and NNS means plural noun. So if the first word is an adjective and the second word is a noun or plural noun, we'll extract that two-word phrase, whatever the third word is. Or if the first word is an adverb (RB means adverb) and the second word is an adjective and the third word is not a noun, we'll extract that phrase. The patterns, then, are adjective + noun, adverb + adjective, adjective + adjective, noun + adjective, and adverb + verb, each with a constraint on the third word. So we're going to run our part-of-speech tagger (we'll talk about that in a few lectures), assign each word a part of speech, adjective, noun, and so on, and then extract the two-word phrases that meet these criteria: the first word has one tag, the second word has another tag, and the third word has some constraints but isn't itself extracted.
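Here's a small sketch of that extraction step using NLTK's off-the-shelf tokenizer and tagger (an assumption for illustration; Turney used a different tagger, and tagger output on any given sentence can vary).

```python
import nltk  # assumes nltk plus its tokenizer and POS-tagger models are installed

# Turney's five two-word patterns as (first-word tags, second-word tags,
# tags the THIRD word must NOT have); the third word itself isn't extracted.
PATTERNS = [
    ({"JJ"}, {"NN", "NNS"}, set()),                              # adj + noun
    ({"RB", "RBR", "RBS"}, {"JJ"}, {"NN", "NNS"}),               # adv + adj
    ({"JJ"}, {"JJ"}, {"NN", "NNS"}),                             # adj + adj
    ({"NN", "NNS"}, {"JJ"}, {"NN", "NNS"}),                      # noun + adj
    ({"RB", "RBR", "RBS"}, {"VB", "VBD", "VBN", "VBG"}, set()),  # adv + verb
]

def extract_candidate_phrases(text):
    """Return the two-word phrases whose POS tags match one of the patterns."""
    tagged = nltk.pos_tag(nltk.word_tokenize(text))
    phrases = []
    for i in range(len(tagged) - 1):
        (w1, t1), (w2, t2) = tagged[i], tagged[i + 1]
        t3 = tagged[i + 2][1] if i + 2 < len(tagged) else ""
        if any(t1 in a and t2 in b and t3 not in c for a, b, c in PATTERNS):
            phrases.append(f"{w1} {w2}")
    return phrases

print(extract_candidate_phrases("The online service was unmatched, but the fees were unexpectedly high."))
```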

Now we look at all those phrases, and our goal is to measure, for each phrase we've extracted, its polarity: how positive or negative is this particular phrase? The intuition of the Turney algorithm is just like that of the Hatzivassiloglou and McKeown algorithm: think about co-occurrence. We say a phrase is positive if it co-occurs a lot, nearby, let's say on the web or in some large corpus, with the word "excellent". A negative phrase is likely to co-occur more often with a word like "poor". So how are we going to measure this co-occurrence?

The standard way to measure this kind of co-occurrence is pointwise mutual information, which is a variant of a standard information-theoretic measure called mutual information. The mutual information between two random variables is the sum, over all the values the two variables can take, of the joint probability of the two, times the log of the joint over the product of the individual probabilities.

Pointwise mutual information takes this intuition from mutual information and just computes a very simple ratio: the probability of the two events x and y divided by the product of their individual probabilities. What pointwise mutual information is asking, as a ratio, is how much more the two events x and y co-occur than if they were independent. If they were independent, they'd have these multiplied individual probabilities; how much more often does the joint occur than we'd expect from independence? That's the intuition of pointwise mutual information.
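Written out, the two definitions the lecture just stated in words are:

```latex
I(X;Y) \;=\; \sum_{x}\sum_{y} P(x,y)\,\log_2 \frac{P(x,y)}{P(x)\,P(y)}
\qquad\qquad
\mathrm{PMI}(x,y) \;=\; \log_2 \frac{P(x,y)}{P(x)\,P(y)}
```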

Looking at pointwise mutual information between two words, we ask: how much more do these two words, word1 and word2, co-occur than if they were independent? We take the ratio of the probability of the two words occurring together to the product of the probability of each word separately, and we take the log of that. How do we estimate this? Well, the way Turney did it originally was with the AltaVista search engine. We estimate the probability of a word just by how many hits we see for that word, and we estimate the joint probability of two words by how often word1 occurs near word2, using the search engine's NEAR operator, which just checks whether one word is near another. In each case, we normalize these counts to get real probabilities.

And by our definition from the previous slide, the pointwise mutual information is the joint probability, hits(word1 NEAR word2) over N², divided by hits(word1) over N times hits(word2) over N. The Ns cancel, and we end up simply with this: the number of times the words occur near each other, over the product of the number of times they occur individually, inside the log.
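A minimal sketch of that estimate in Python; the hit counts in the usage line are made-up numbers, and a real implementation would smooth the counts so a zero hit count can't make the log blow up.

```python
import math

def pmi_from_hits(hits_near, hits_w1, hits_w2):
    # P(w1 NEAR w2) ~ hits_near / N^2 and P(w) ~ hits_w / N,
    # so every factor of N cancels and only the raw counts remain.
    # The absolute value isn't meaningful on its own; the constants
    # drop out when we subtract two PMIs to get a polarity.
    return math.log2(hits_near / (hits_w1 * hits_w2))

print(pmi_from_hits(2_000, 50_000, 40_000))  # made-up hit counts
```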

So once we have this measure of how much a phrase co-occurs with the word "excellent" (or another positive word like "good"), we can determine the phrase's polarity. The equation for polarity in the Turney algorithm is just the pointwise mutual information of the phrase with the word "excellent", minus its pointwise mutual information with the word "poor". How much more does the phrase appear with "excellent" than with "poor", or vice versa? Now, a little algebra.

Here's the definition of the pointwise mutual information of the phrase with "excellent": how often the phrase occurs near the word "excellent", divided by the individual hit counts of the phrase and of "excellent"; then we subtract the same thing for "poor". By the definition of logs, we can bring things inside the log and turn the minus into a divide, and after some rearranging of terms, the hits(phrase) factors, one here and one there, cancel. So in the end we get our formula for the polarity of a phrase: how many times the phrase occurs near "excellent", times how many times "poor" occurs, over how many times the phrase occurs near "poor", times how many times "excellent" occurs.
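Putting the algebra together, the derivation lands at:

```latex
\mathrm{Polarity}(\mathit{phrase})
  = \mathrm{PMI}(\mathit{phrase},\text{``excellent''})
  - \mathrm{PMI}(\mathit{phrase},\text{``poor''})
  = \log_2 \frac{\mathrm{hits}(\mathit{phrase}\ \mathrm{NEAR}\ \text{``excellent''})\cdot
                 \mathrm{hits}(\text{``poor''})}
                {\mathrm{hits}(\mathit{phrase}\ \mathrm{NEAR}\ \text{``poor''})\cdot
                 \mathrm{hits}(\text{``excellent''})}
```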

The counts for "excellent" and "poor" are constants we can get once, and then for each phrase we compute just the two phrase-specific quantities. Using the Turney algorithm, here's a positive review, obviously, of a bank: a phrase like "online service", an adjective-noun phrase, is assigned polarity 2.8. Here's another one, "direct deposit", at 1.3. And here are some negative phrases with negative polarities, phrases that occurred more often near the word "poor" than near the word "excellent"; so "inconveniently located" has a negative polarity. On average, though (I've shown you just a subset), more of the positive

phrases occur in this review than negative phrases, and the average polarity is positive. So this is a thumbs-up review. In a thumbs-down review, you can see that phrases like "virtual monopoly" (negative polarity), "lesser evil" (negative polarity), and "other problems" (negative polarity) all occur more frequently and, on average, have a more negative polarity than the positive-polarity phrases that occur in the review, and we end up with a review that has an average negative polarity.
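A tiny sketch of this last averaging step; the 2.8 and 1.3 are the lecture's numbers, while the negative score is a made-up stand-in.

```python
def review_polarity(phrase_scores):
    """Average the polarities of the phrases extracted from one review;
    a positive average means thumbs up, a negative one thumbs down."""
    return sum(phrase_scores) / len(phrase_scores)

scores = {"online service": 2.8, "direct deposit": 1.3,
          "inconveniently located": -1.5}  # -1.5 is an illustrative value
verdict = "thumbs up" if review_polarity(list(scores.values())) > 0 else "thumbs down"
print(verdict)
```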

The Turney algorithm was evaluated on various kinds of review sites, and it does a better job than the majority-class baseline at predicting the polarity of a review. So what's important about the Turney algorithm is that it lets us learn phrases rather than just words, and what's true in general about these semi-supervised algorithms is that we can learn domain-specific information that might not be in some online dictionary.

Going back to the previous slide: if we're doing banking, a phrase like "direct deposit" or "virtual monopoly" might simply not be in an online polarity dictionary, so we need to use these semi-supervised algorithms to learn the polarity of those kinds of words. Finally, let me briefly mention a third algorithm for learning polarity. This one uses WordNet; again, that's the online thesaurus, which we'll talk about in detail later.

The intuition of using these online thesauruses is similar to what we've seen in the first two semi-supervised algorithms. We start with a positive seed set and a negative seed set; say we have words like "good" in the positive seed set and "terrible" in the negative seed set. Now we use the thesaurus, very simply, to find synonyms and antonyms. We add synonyms of positive words, like "well", and antonyms of negative words to the positive set. And to the negative set, which we started out with words like "terrible", we add synonyms of all those words (maybe "awful" is a synonym of "terrible") and antonyms of our positive words. And we just repeat, and the sets grow as we keep adding synonyms and antonyms. Each of the algorithms that makes use of WordNet for learning polarity then has various ways of filtering out bad examples or dealing with problems of word senses and things like that.
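Here's a minimal sketch of one expansion round using NLTK's WordNet interface (assuming the WordNet data has been downloaded); note the real systems add the sense-filtering the lecture mentions, which this sketch leaves out.

```python
from nltk.corpus import wordnet as wn  # assumes nltk.download('wordnet') has been run

def expand_once(seed_set, opposite_set):
    """One expansion round: add synonyms of this set's words, plus
    antonyms of the opposite set's words."""
    expanded = set(seed_set)
    for word in seed_set:
        for synset in wn.synsets(word):
            expanded.update(lemma.name() for lemma in synset.lemmas())
    for word in opposite_set:
        for synset in wn.synsets(word):
            for lemma in synset.lemmas():
                expanded.update(ant.name() for ant in lemma.antonyms())
    return expanded

positive, negative = {"good"}, {"terrible"}
for _ in range(2):  # repeat; the sets grow with each round
    positive, negative = expand_once(positive, negative), expand_once(negative, positive)
```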

So, in summary, learning lexicons can help us deal with domain-specific issues. In banking, we might have a phrase like "direct deposit" that's just not going to be in a standard online polarity lexicon. And it can make us more robust in general: as people start using new names, a new company name might not be in some training data, but we might be able to learn it from the web or from the data we're looking at. And, again, the intuition of all these algorithms is really the same. We start with some seed set, and we find other

words that have a similar polarity to that seed set in some semi-automatic way. The ways you've seen are: using words that are conjoined with "and" or "but"; using words that just occur nearby the seed words, like "poor" or "excellent"; or using thesaurus synonyms or antonyms.