Comparative Experiments on Sentiment Classification for Online Product Reviews Hang Cui, Vibhu Mittal, and Mayur Datar AAAI 2006


Page 1:

Comparative Experiments on Sentiment Classification for Online Product Reviews

Hang Cui, Vibhu Mittal, and Mayur Datar

AAAI 2006

Page 2:

Introduction

A large amount of Web content is subjective and reflects people’s opinions.

Two focuses of their research:

Large-scale, real-world datasets.

Unigrams vs. n-grams.

Page 3:

Contributions

Conduct experiments on a corpus of over 200k online reviews with an average length of over 800 bytes.

Study the impact of higher-order n-grams (n >= 3).

Study multiple classification algorithms for processing large-scale data.

Page 4:

Previous Work

Pang, Lee and Vaithyanathan (2002): Thumbs up? Sentiment classification using machine learning techniques.

Naïve Bayes, Maximum Entropy, SVM (bigram), PSP (2005)

PA Algorithm, Language Model, Winnow classifier (Nigam and Hurst, 2004)

Page 5:

Classifiers - PA

Passive-Aggressive (PA) Algorithm Based Classifier: the new classifier should stay in close proximity to the current one (passive update) while achieving at least a unit margin on the most recent example (aggressive update).

Constrained optimization problem
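
As an illustration of the passive and aggressive updates, here is a minimal sketch of the standard hard-margin binary PA rule: pick the new weight vector w that minimizes 0.5 * ||w - w_t||^2 subject to y_t * (w . x_t) >= 1, which has the closed-form solution w_{t+1} = w_t + tau * y_t * x_t with tau = loss / ||x_t||^2. This is an assumption about the variant; the slide does not say which PA variant the authors used. The sparse-dict feature format and all function names are hypothetical.

```python
# Sketch of a binary Passive-Aggressive learner (hard-margin variant).
# Assumptions: features are sparse dicts {feature: value}, labels are +1 / -1.
# Function names are illustrative, not taken from the paper.

def dot(w, x):
    """Sparse dot product between weight dict w and feature dict x."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())

def pa_update(w, x, y):
    """Update w in place on one example and return the hinge loss before the update."""
    loss = max(0.0, 1.0 - y * dot(w, x))
    if loss > 0.0:
        # Aggressive step: move just far enough to reach a unit margin.
        sq_norm = sum(v * v for v in x.values())
        if sq_norm > 0.0:
            tau = loss / sq_norm
            for f, v in x.items():
                w[f] = w.get(f, 0.0) + tau * y * v
    # Passive step: when the loss is zero, w is left unchanged.
    return loss

def pa_train(examples, epochs=1):
    """Run the online update over (features, label) pairs and return the weights."""
    w = {}
    for _ in range(epochs):
        for x, y in examples:
            pa_update(w, x, y)
    return w
```

Because each example is processed once and then discarded, the same loop can consume reviews as a stream.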

Page 6:

Classifiers - PA

PA vs. SVM

PA follows an online learning pattern, which is attractive for Web applications.

PA has a theoretical loss bound.

10-fold cross-validation

Page 7:

Classifiers - LM

Language Modeling (LM) Based Classifier: a generative method that calculates the probability of generating a given word sequence.
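
To make the generative idea concrete, the sketch below trains one language model per class and assigns a review to the class whose model gives it the higher probability. It is deliberately simplified and rests on several assumptions: it uses unigrams rather than higher-order n-grams, add-one smoothing rather than Good-Turing, and equal class priors. All names are hypothetical.

```python
import math
from collections import Counter

# Sketch of a language-model classifier: one unigram model per class.
# Assumptions: unigram models, add-one smoothing, equal class priors.

def train_unigram_lm(docs):
    """Count tokens over a list of tokenized documents; return (counts, total)."""
    counts = Counter(tok for doc in docs for tok in doc)
    return counts, sum(counts.values())

def log_prob(doc, model, vocab_size):
    """Log-probability of a tokenized document under an add-one smoothed model."""
    counts, total = model
    return sum(math.log((counts[tok] + 1) / (total + vocab_size)) for tok in doc)

def classify(doc, pos_model, neg_model, vocab_size):
    """Pick the class whose model is more likely to have generated the document."""
    pos = log_prob(doc, pos_model, vocab_size)
    neg = log_prob(doc, neg_model, vocab_size)
    return "positive" if pos >= neg else "negative"

pos_docs = [["great", "camera"], ["great", "battery"]]
neg_docs = [["poor", "battery"], ["poor", "screen"]]
vocab = {tok for doc in pos_docs + neg_docs for tok in doc}
pos_model = train_unigram_lm(pos_docs)
neg_model = train_unigram_lm(neg_docs)
print(classify(["great", "screen"], pos_model, neg_model, len(vocab)))
```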

Page 8:

Classifiers - LM

Due to the limitations of training data, n-gram language modeling often suffers from data sparseness, which is addressed by smoothing.

Good-Turing estimation:
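
For reference, the textbook form of the Good-Turing estimate is given below. The notation is an assumption and may differ from what the authors used: r is the observed count of an n-gram x, n_r is the number of distinct n-grams seen exactly r times, and N is the total number of n-grams observed.

```latex
r^{*} = (r + 1)\,\frac{n_{r+1}}{n_{r}},
\qquad
P_{GT}(x) = \frac{r^{*}}{N},
\qquad
P_{GT}(\text{unseen}) = \frac{n_{1}}{N}
```

The adjusted count r* discounts observed n-grams, and the freed probability mass n_1 / N is shared among n-grams that never occurred in training.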

Page 9:

Classifiers - Winnow

Winnow learns a linear classifier from the bag-of-words representation of documents to predict the polarity of review x:

cw(x) = 0 or 1

Page 10:

Classifiers - Winnow

Training phase:

Calculate h(x).

If the review is positive but is predicted as negative, update fw where cw(x) = 1 by fw × 2.

If the review is negative but is predicted as positive, update fw where cw(x) = 1 by fw / 2.
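
The training rule can be sketched as follows. Several details are assumptions, since the slides do not spell them out: cw(x) is read as 1 when word w occurs in review x and 0 otherwise, h(x) is taken to be the sum of fw over the words present, every weight starts at 1, and a review is predicted positive when h(x) exceeds a threshold supplied by the caller. All names are hypothetical.

```python
# Sketch of Winnow training with multiplicative updates.
# Assumptions: weights start at 1.0, h(x) = sum of f_w over words present in x,
# and a review is predicted positive when h(x) > threshold.

def winnow_score(weights, words):
    """h(x): sum of the weights of the distinct words present in the review."""
    return sum(weights.get(w, 1.0) for w in set(words))

def winnow_train(reviews, threshold, epochs=1):
    """reviews is a list of (list_of_words, is_positive) pairs."""
    weights = {}
    for _ in range(epochs):
        for words, is_positive in reviews:
            present = set(words)
            predicted_positive = winnow_score(weights, present) > threshold
            if is_positive and not predicted_positive:
                # Promotion: double the weight of every word present.
                for w in present:
                    weights[w] = weights.get(w, 1.0) * 2.0
            elif not is_positive and predicted_positive:
                # Demotion: halve the weight of every word present.
                for w in present:
                    weights[w] = weights.get(w, 1.0) / 2.0
    return weights

reviews = [(["great", "camera"], True), (["poor", "camera"], False)]
weights = winnow_train(reviews, threshold=2.5)
```

Weights change only on mistakes, so words that appear in correctly classified reviews are left alone.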

Page 11:

N-grams as Linguistic Feature

N-gram in this paper: 1+2+3+…+N-gram

N is set to 6

Calculate χ² scores for each n-gram (term vs. class)

Take the top M ranked n-grams as features
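
The feature pipeline described above can be sketched in three steps: extract every n-gram of order 1 through N, score each against the class label with the χ² statistic for a 2x2 term-by-class contingency table, and keep the M highest-scoring terms. The document-frequency counting, the tie-breaking rule, and all function names are assumptions made for the example.

```python
from collections import Counter

# Sketch of n-gram extraction plus chi-square feature selection.
# Assumptions: counts are document frequencies, labels are booleans
# (True = positive), ties are broken alphabetically.

def extract_ngrams(tokens, max_n=6):
    """All n-grams of order 1..max_n, joined with spaces."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]

def chi_square(a, b, c, d):
    """Chi-square for a 2x2 table: a, b = positive/negative docs with the term;
    c, d = positive/negative docs without it."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0

def select_features(docs, labels, top_m, max_n=6):
    """Return the top_m n-grams ranked by chi-square score."""
    pos_total = sum(1 for label in labels if label)
    neg_total = len(labels) - pos_total
    pos_df, neg_df = Counter(), Counter()
    for tokens, label in zip(docs, labels):
        target = pos_df if label else neg_df
        target.update(set(extract_ngrams(tokens, max_n)))
    scores = {}
    for term in set(pos_df) | set(neg_df):
        a, b = pos_df[term], neg_df[term]
        scores[term] = chi_square(a, b, pos_total - a, neg_total - b)
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_m]
```

Words that occur equally often in both classes score zero and fall to the bottom of the ranking.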

Page 12:

Data Set

Electronic products (digital cameras, laptops, PDAs, MP3 players…) from Froogle (http://froogle.google.com)

Rating scale R = 5 or 10; reviews rated 1 and R are used for training, reviews rated 2 and R-1 for testing.
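
One way to read the split rule is sketched below. The slide does not state the polarity labels; treating ratings 1 and 2 as negative and R-1 and R as positive, and leaving the middle ratings unused, is an assumption. The function name is hypothetical.

```python
# Sketch of the train/test split by rating.
# Assumptions: ratings 1 and 2 are negative, R-1 and R are positive,
# and all other ratings are left out of both sets.

def assign_split(rating, scale):
    """Return (split, polarity) for a review, or None if the rating is unused."""
    if scale not in (5, 10):
        raise ValueError("scale must be 5 or 10")
    if rating == 1:
        return ("train", "negative")
    if rating == scale:
        return ("train", "positive")
    if rating == 2:
        return ("test", "negative")
    if rating == scale - 1:
        return ("test", "positive")
    return None
```

Under this reading the classifiers train on the most extreme ratings and are tested on slightly milder ones.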

Page 13:

Results

Page 14:

Results Discussion

High-order n-grams improve the performance of the classifiers, especially on the negative instances.

Discriminative models are more appropriate for sentiment classification than generative models (4% up).

Mixture makes the generative models confused.

Page 15:

Results Discussion

The performance of the PA classifier is not sensitive to the number of features.

Filtering out objective sentences does not show an obvious advantage for our data set. (Product category/movie reviews, filtering performance, testing rate level…)

Page 16:

Conclusion

Large-scale data set

Discriminative classifier + high-order n-grams performs comparatively better

Learning online is possible

Page 17:

Future Work

Better feature selection scheme (noisy n-grams)

Classification in different scales (Pang and Lee, 2005)