Comparative Experiments on Sentiment Classification for Online Product Reviews
Hang Cui, Vibhu Mittal,
and Mayur Datar
AAAI 2006
Introduction
A large amount of Web content is subjective and reflects people's opinions.
Two foci of this research: large-scale, real-world datasets, and unigrams vs. higher-order n-grams.
Contributions
Conduct experiments on a corpus of over 200k online reviews with an average length of over 800 bytes.
Study the impact of higher-order n-grams (n >= 3).
Study multiple classification algorithms for processing large-scale data.
Previous Work
Pang, Lee, and Vaithyanathan (2002), "Thumbs up? Sentiment classification using machine learning techniques": Naïve Bayes, Maximum Entropy, SVM (bigrams); PSP (Pang and Lee, 2005).
This paper compares the PA algorithm, a language-model classifier, and a Winnow classifier (Nigam and Hurst, 2004).
Classifiers - PA
Passive-Aggressive (PA) Algorithm Based Classifier: the new classifier should stay in close proximity to the current one (passive update) while achieving at least a unit margin on the most recent example (aggressive update).
Formulated as a constrained optimization problem.
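A minimal sketch of a PA update for binary classification, following the passive/aggressive intuition above (the closed-form step is the standard PA rule of Crammer et al.; the variable names and toy stream are illustrative, not from the paper):

```python
import numpy as np

def pa_update(w, x, y):
    """One passive-aggressive step for a label y in {-1, +1}.

    Solves: w_new = argmin_v 0.5 * ||v - w||^2
            s.t.   hinge loss max(0, 1 - y * v.x) = 0
    Closed form: w_new = w + tau * y * x, with tau = loss / ||x||^2.
    """
    loss = max(0.0, 1.0 - y * np.dot(w, x))  # zero loss -> passive, no change
    if loss > 0.0:
        tau = loss / np.dot(x, x)            # aggressive: restore the unit margin
        w = w + tau * y * x
    return w

# Online pass over a stream of (feature vector, label) pairs.
w = np.zeros(4)
stream = [(np.array([1.0, 0.0, 1.0, 0.0]), +1),
          (np.array([0.0, 1.0, 0.0, 1.0]), -1)]
for x, y in stream:
    w = pa_update(w, x, y)
```

Because each example is touched once and then discarded, this style of update is what makes PA a natural fit for the streaming Web data mentioned on the next slide.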
Classifiers - PA
PA vs. SVM: PA follows an online learning pattern, which is attractive for Web applications, and PA has a theoretical loss bound. Evaluated with 10-fold cross-validation.
Classifiers - LM
Language Modeling (LM) Based Classifier: a generative method that calculates the probability of generating a given word sequence.
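A sketch of the generative decision rule, assuming one language model per polarity class; unigram models with add-alpha smoothing stand in here for the paper's setup, and all names are illustrative:

```python
import math
from collections import Counter

def train_unigram_lm(docs, vocab_size, alpha=1.0):
    """Build P(w | class) from that class's training docs.
    Add-alpha smoothing is a stand-in; the slide's Good-Turing
    smoothing is sketched separately below."""
    counts = Counter(w for doc in docs for w in doc)
    total = sum(counts.values())
    return lambda w: (counts[w] + alpha) / (total + alpha * vocab_size)

def classify(review, lm_pos, lm_neg):
    """Generative rule: pick the class whose model is more likely
    to have generated the review's word sequence."""
    score_pos = sum(math.log(lm_pos(w)) for w in review)
    score_neg = sum(math.log(lm_neg(w)) for w in review)
    return "positive" if score_pos > score_neg else "negative"
```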
Classifiers - LM
Because training data is limited, n-gram language modeling often suffers from data sparseness, so smoothing is needed.
Good-Turing estimation: an n-gram observed r times is given the adjusted count r* = (r + 1) * n_{r+1} / n_r, where n_r is the number of distinct n-grams that occur exactly r times.
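A direct transcription of that adjustment as code (real implementations also smooth the count-of-counts curve n_r, which this sketch omits):

```python
from collections import Counter

def good_turing_counts(ngram_counts):
    """r* = (r + 1) * n_{r+1} / n_r, where n_r is the number of
    distinct n-grams seen exactly r times in the training data."""
    n = Counter(ngram_counts.values())  # count-of-counts: n[r]
    adjusted = {}
    for ngram, r in ngram_counts.items():
        if n[r + 1] > 0:
            adjusted[ngram] = (r + 1) * n[r + 1] / n[r]
        else:
            adjusted[ngram] = float(r)  # no n-grams seen r+1 times: keep raw count
    return adjusted
```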
Classifiers - Winnow
Winnow learns a linear classifier over the bag-of-words of a document to predict the polarity of review x:
h(x) = sum over features w of f_w * c_w(x), where c_w(x) = 1 if feature w appears in x and 0 otherwise, f_w is the weight of feature w, and x is classified positive when h(x) exceeds a threshold.
Classifiers - Winnow
Training phase:
Calculate h(x).
If the review is positive but is predicted as negative, update each f_w where c_w(x) = 1 by f_w <- 2 * f_w.
If the review is negative but is predicted as positive, update each f_w where c_w(x) = 1 by f_w <- f_w / 2.
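A sketch of this multiplicative update loop; the initial weights of 1 and the threshold default are assumptions, since the slide does not give Nigam and Hurst's exact settings:

```python
def train_winnow(reviews, vocab, epochs=1, theta=None):
    """reviews: list of (feature_set, label) pairs, label in {+1, -1}.
    Weights start at 1.0; theta defaults to |vocab| (an assumption --
    a common Winnow threshold, not stated on the slide)."""
    f = {w: 1.0 for w in vocab}
    theta = float(len(vocab)) if theta is None else theta
    for _ in range(epochs):
        for features, label in reviews:
            h = sum(f[w] for w in features if w in f)  # h(x)
            pred = +1 if h > theta else -1
            if label == +1 and pred == -1:    # positive misclassified: promote
                for w in features & f.keys():
                    f[w] *= 2.0
            elif label == -1 and pred == +1:  # negative misclassified: demote
                for w in features & f.keys():
                    f[w] /= 2.0
    return f
```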
N-grams as Linguistic Features
N-grams in this paper: all 1-grams, 2-grams, ..., N-grams combined, with N set to 6.
Calculate the chi-square score for each n-gram (term vs. class) and take the top M ranked n-grams as features.
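A sketch of that pipeline: enumerate 1- through 6-grams, score each against the class label with the standard two-class chi-square contingency formula (the paper's exact variant is not shown on the slide), and keep the top M:

```python
from collections import Counter

def extract_ngrams(tokens, max_n=6):
    """All contiguous n-grams for n = 1..max_n."""
    return [tuple(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]

def chi_square(A, B, C, D):
    """2x2 table: A = positive docs containing the term, B = positive docs
    without it, C = negative docs with it, D = negative docs without it."""
    N = A + B + C + D
    denom = (A + C) * (B + D) * (A + B) * (C + D)
    return N * (A * D - B * C) ** 2 / denom if denom else 0.0

def select_features(docs, labels, max_n=6, top_m=10000):
    """docs: token lists; labels: parallel list of +1/-1 polarities."""
    pos_df, neg_df = Counter(), Counter()
    n_pos = sum(1 for y in labels if y == +1)
    n_neg = len(labels) - n_pos
    for tokens, y in zip(docs, labels):
        (pos_df if y == +1 else neg_df).update(set(extract_ngrams(tokens, max_n)))
    candidates = set(pos_df) | set(neg_df)
    scores = {g: chi_square(pos_df[g], n_pos - pos_df[g],
                            neg_df[g], n_neg - neg_df[g])
              for g in candidates}
    return sorted(candidates, key=scores.get, reverse=True)[:top_m]
```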
Data Set
Electronic products (digital cameras, laptops, PDAs, MP3 players, ... from Froogle, http://froogle.google.com).
Rating scale with maximum R = 5 or 10: reviews rated 1 and R are used for training; reviews rated 2 and R-1 are used for testing.
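Under that reading of the split, a small illustrative helper (function and field names are assumptions):

```python
def split_by_rating(reviews, R):
    """reviews: (rating, text) pairs on a 1..R scale (R = 5 or 10).
    Extremes (1, R) -> training; next-to-extremes (2, R-1) -> testing."""
    train, test = [], []
    for rating, text in reviews:
        if rating == 1:
            train.append((text, -1))
        elif rating == R:
            train.append((text, +1))
        elif rating == 2:
            test.append((text, -1))
        elif rating == R - 1:
            test.append((text, +1))
    return train, test
```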
Results
Results Discussion
High-order n-grams improve the performance of the classifiers, especially on negative instances.
Discriminative models are more appropriate for sentiment classification than generative models (about a 4% improvement); the mixture of positive and negative expressions within a review confuses generative models.
Results Discussion
The performance of the PA classifier is not sensitive to the number of features.
Filtering out objective sentences shows no obvious advantage on this data set (possible factors: product category vs. movie reviews, filtering quality, the rating levels used for testing, ...).
Conclusion
On the large-scale data set, discriminative classifiers with high-order n-gram features perform comparatively better.
Online learning is feasible.
Future Work
A better feature-selection scheme (to handle noisy n-grams).
Classification over finer-grained rating scales (Pang and Lee, 2005).