Sentiment Analysis

Thumbs up? Sentiment Classification using Machine Learning Techniques - Bo Pang and Lillian Lee - Shivakumar Vaithyanathan

Transcript of Sentiment Analysis

Page 1: Sentiment Analysis

Thumbs up? Sentiment Classification using Machine Learning Techniques

- Bo Pang and Lillian Lee

- Shivakumar Vaithyanathan

Page 2: Sentiment Analysis

What is it??

• Input – raw text over some topic

• Output – opinion ( +ve, -ve or neutral )

• It is hard – why?

- The classifier must determine the opinion expressed by the text as a whole, not just identify the subject of the topic

- Let's understand the problem

Page 3: Sentiment Analysis

We know …

• Web – enormous amount of data

• Topical categorization – active research

Page 4: Sentiment Analysis

Rise of blogs, forums …

• Web 2.0 is commonly associated with web applications that facilitate interactive information sharing, interoperability, user-centered design, and collaboration on the World Wide Web – (source : Wikipedia)

Page 5: Sentiment Analysis

Why is it interesting?

• Represents the voice of a broader audience on a particular topic

• Example : product reviews, movie reviews, book reviews

• Important to business intelligence applications

- What do people (dis)like about the Nikon D40?

Page 6: Sentiment Analysis

What this paper does

• Examines the effectiveness of applying machine learning techniques to the sentiment classification problem

• Challenging – while topics are often identifiable by keywords alone, sentiment can be expressed in a more subtle manner.

Page 7: Sentiment Analysis

Dataset : Movie-Review Domain

Reason :

– Large online collection for reviews

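– Reviewers often summarize their overall sentiment with a machine-extractable rating indicator (such as a number of stars), so the data does not need to be hand-labeled for supervised learning

Corpus of 752 -ve and 1301 +ve reviews, with a total of 144 reviewers represented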

Page 8: Sentiment Analysis

Naïve approach

• Idea: people tend to use certain words to express strong sentiments; produce a list of such words and rely on it to classify text
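
The word-list idea above can be sketched in a few lines of Python. This is only an illustration: the two word lists and the function name `classify_by_word_list` are hypothetical choices made for the example, not lists taken from the paper.

```python
import re

# Hypothetical word lists, chosen only for illustration.
POSITIVE_WORDS = {"brilliant", "great", "love", "excellent"}
NEGATIVE_WORDS = {"bad", "boring", "awful", "stinks"}

def classify_by_word_list(text):
    """Count list words in the text and return '+ve', '-ve', or 'tie'."""
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(token in POSITIVE_WORDS for token in tokens)
    neg = sum(token in NEGATIVE_WORDS for token in tokens)
    if pos > neg:
        return "+ve"
    if neg > pos:
        return "-ve"
    return "tie"
```

A document containing no list words, or equally many from each list, comes back as a tie, which is one practical weakness of relying on a fixed list.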

Page 9: Sentiment Analysis

Machine Learning methods

• Let {f1, f2, …, fm} be a predefined set of m features that can appear in a document. Example : the word “still” or the bigram “really stinks”

• ni(d) – number of times fi occurs in document d

• Document vector(d) = (n1(d), n2(d), …, nm(d))
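
A minimal sketch of building the document vector from a fixed feature list follows. The tokenizer (a lowercase regular expression) and the function name `document_vector` are assumptions made for the example; the slides do not say how the text was tokenized.

```python
import re

def document_vector(text, features):
    """Return (n1(d), ..., nm(d)) for a fixed, ordered feature list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # Adjacent token pairs; this simple sketch lets bigrams span punctuation.
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    counts = {}
    for item in tokens + bigrams:
        counts[item] = counts.get(item, 0) + 1
    return tuple(counts.get(f, 0) for f in features)

features = ["still", "really stinks"]
vec = document_vector("It still really stinks. Still, I stayed.", features)
```

Here `vec` holds one count per feature, in the same order as `features`.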

Page 10: Sentiment Analysis

Naïve Bayes

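Assign to a given document d the class $c^* = \arg\max_c P(c \mid d)$

Naïve Bayes rule : $P_{NB}(c \mid d) = \dfrac{P(c)\,\prod_{i=1}^{m} P(f_i \mid c)^{n_i(d)}}{P(d)}$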
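
As a rough sketch of how such a classifier can be trained and applied, the code below estimates P(c) and P(fi | c) from tokenized documents with add-one smoothing and picks the class with the highest score. It works in log space and drops P(d), since P(d) is the same for every class. The function names and the toy data are assumptions made for illustration.

```python
import math
from collections import Counter

def train_naive_bayes(docs, labels):
    """Estimate log P(c) and add-one-smoothed log P(f | c) from token lists."""
    vocab = {tok for doc in docs for tok in doc}
    log_prior, log_likelihood = {}, {}
    for c in sorted(set(labels)):
        class_docs = [doc for doc, y in zip(docs, labels) if y == c]
        log_prior[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(tok for doc in class_docs for tok in doc)
        total = sum(counts.values()) + len(vocab)
        log_likelihood[c] = {
            tok: math.log((counts[tok] + 1) / total) for tok in vocab
        }
    return log_prior, log_likelihood

def predict(doc, log_prior, log_likelihood):
    """Return argmax over c of log P(c) + sum of log P(f | c) per token."""
    def score(c):
        # Each occurrence adds one term, so a feature seen n times counts n times.
        return log_prior[c] + sum(
            log_likelihood[c][tok] for tok in doc if tok in log_likelihood[c]
        )
    return max(log_prior, key=score)

# Toy training data, invented for the example.
docs = [["great", "fun"], ["great", "acting"], ["boring", "plot"], ["boring", "bad"]]
labels = ["+ve", "+ve", "-ve", "-ve"]
log_prior, log_likelihood = train_naive_bayes(docs, labels)
```

Tokens never seen in training are simply skipped at prediction time, which is one of several reasonable choices.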

Page 11: Sentiment Analysis

Maximum Entropy

• The idea is to make the fewest assumptions about the data while still being consistent with it
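
Maximum Entropy classifiers are usually written in exponential form: each class gets a score equal to the exponential of a weighted sum of its active features, and the scores are normalized to sum to one. The sketch below assumes that form and uses presence features; the weights are hypothetical values, and the training procedure that would normally fit them is left out.

```python
import math

def maxent_probabilities(present_features, weights, classes=("+ve", "-ve")):
    """P(c | d) proportional to exp(sum of weights[(f, c)] over features in d)."""
    scores = {
        c: math.exp(sum(weights.get((f, c), 0.0) for f in present_features))
        for c in classes
    }
    z = sum(scores.values())  # normalizer Z(d)
    return {c: s / z for c, s in scores.items()}

# Hypothetical weights: a large value marks a strong indicator for that class.
weights = {("great", "+ve"): 1.2, ("stinks", "-ve"): 2.0}
probs = maxent_probabilities({"great"}, weights)
```

With no active features every class gets the same score, so the model falls back to a uniform distribution, which matches the "fewest assumptions" idea.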

Page 12: Sentiment Analysis

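Support Vector Machines (SVM)

• SVMs are large-margin, non-probabilistic classifiers, in contrast to Naïve Bayes and Maximum Entropy

• Letting $c_j \in \{1, -1\}$ (corresponding to +ve, -ve) be the correct class of document $d_j$, the solution can be written as $\vec{w} = \sum_j \alpha_j c_j \vec{d}_j$, with $\alpha_j \ge 0$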
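
In the standard linear SVM formulation, the separating hyperplane is a weighted sum of training document vectors, with class labels of +1 and -1 and non-negative coefficients. The sketch below assumes that formulation. The coefficients `alphas` are hypothetical values supplied by hand, not the result of solving the optimization problem, and the function names are chosen for the example.

```python
def weight_vector(alphas, classes, docs):
    """w = sum over j of alpha_j * c_j * d_j, with c_j in {+1, -1}, alpha_j >= 0."""
    dim = len(docs[0])
    w = [0.0] * dim
    for alpha, c, d in zip(alphas, classes, docs):
        for i in range(dim):
            w[i] += alpha * c * d[i]
    return w

def classify(w, d, bias=0.0):
    """Return +1 or -1 according to the side of the hyperplane d falls on."""
    score = sum(wi * di for wi, di in zip(w, d)) + bias
    return 1 if score >= 0 else -1

docs = [(1, 0), (0, 1)]   # two toy document vectors
classes = [1, -1]         # +ve and -ve
alphas = [0.5, 0.5]       # hypothetical coefficients
w = weight_vector(alphas, classes, docs)
```

Documents whose coefficient is zero contribute nothing to `w`; only the remaining ones (the support vectors) shape the decision boundary.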

Page 13: Sentiment Analysis

Evaluations

• Randomly selected 700 positive and 700 negative sentiment documents

• Automatically removed rating indicators, extracted textual information from original HTML

• Added NOT_ to every word between a negation word (“not”, “isn’t”) and the first punctuation mark following it.
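
The negation-tagging step can be sketched as follows. The negation word list, the punctuation set, and the tokenizer are assumptions for the example; the slide names only “not” and “isn’t”.

```python
import re

NEGATION_WORDS = {"not", "isn't", "didn't", "no", "never"}  # assumed list
PUNCTUATION = {".", ",", ";", ":", "!", "?"}

def tag_negation(text):
    """Prefix NOT_ to each word between a negation word and the next punctuation."""
    text = text.lower().replace("\u2019", "'")  # normalize curly apostrophes
    tokens = re.findall(r"[a-z']+|[.,;:!?]", text)
    tagged, negating = [], False
    for tok in tokens:
        if tok in PUNCTUATION:
            negating = False          # punctuation ends the negated span
            tagged.append(tok)
        elif tok in NEGATION_WORDS:
            negating = True           # start (or continue) a negated span
            tagged.append(tok)
        elif negating:
            tagged.append("NOT_" + tok)
        else:
            tagged.append(tok)
    return tagged
```

For instance, in "I didn't like this movie, but I stayed." the words "like", "this" and "movie" receive the prefix, while everything after the comma is left alone.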

Page 14: Sentiment Analysis

Results

Page 15: Sentiment Analysis

Conclusion

• Unigram presence information turned out to be the most effective

• The superiority of presence information in comparison to feature frequency indicates a difference between sentiment and topic categorization.
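
To make the presence-versus-frequency distinction concrete: a frequency vector records how many times each feature occurs, while a presence vector records only whether it occurs at all. A minimal sketch, with a helper name chosen for the example:

```python
def to_presence(count_vector):
    """Turn feature counts into binary presence indicators."""
    return tuple(1 if n > 0 else 0 for n in count_vector)

frequency = (3, 0, 1)              # a made-up count vector
presence = to_presence(frequency)
```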