Comparative Experiments on Sentiment Classification for Online Product Reviews
Hang Cui, Vibhu Mittal,
and Mayur Datar
AAAI 2006
Introduction
A large amount of Web content is subjective and reflects people's opinions.
Two foci of this research: large-scale, real-world datasets, and unigrams vs. higher-order n-grams.
Contributions
Conduct experiments on a corpus of over 200k online reviews with an average length of over 800 bytes.
Study the impact of higher-order n-grams (n >= 3).
Study multiple classification algorithms for processing large-scale data.
Previous Work
Pang, Lee, and Vaithyanathan (2002), "Thumbs up? Sentiment classification using machine learning techniques": Naïve Bayes, Maximum Entropy, SVM (bigrams); PSP (Pang and Lee, 2005).
This paper compares the PA algorithm, a language-model classifier, and a Winnow classifier (Nigam and Hurst, 2004).
Classifiers - PA
Passive-Aggressive (PA) Algorithm Based Classifier: the new classifier should stay in close proximity to the current one (passive update) while achieving at least a unit margin on the most recent example (aggressive update).
Formulated as a constrained optimization problem.
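A minimal sketch of a PA update for binary classification, following the passive/aggressive intuition above (the closed-form step is the standard PA rule of Crammer et al.; the variable names and toy stream are illustrative, not from the paper):

```python
import numpy as np

def pa_update(w, x, y):
    """One passive-aggressive step for a label y in {-1, +1}.

    Solves: w_new = argmin_v 0.5 * ||v - w||^2
            s.t.   hinge loss max(0, 1 - y * v.x) = 0
    Closed form: w_new = w + tau * y * x, with tau = loss / ||x||^2.
    """
    loss = max(0.0, 1.0 - y * np.dot(w, x))  # zero loss -> passive, no change
    if loss > 0.0:
        tau = loss / np.dot(x, x)            # aggressive: restore the unit margin
        w = w + tau * y * x
    return w

# Online pass over a stream of (feature vector, label) pairs.
w = np.zeros(4)
stream = [(np.array([1.0, 0.0, 1.0, 0.0]), +1),
          (np.array([0.0, 1.0, 0.0, 1.0]), -1)]
for x, y in stream:
    w = pa_update(w, x, y)
```

Because each example is touched once and then discarded, this style of update is what makes PA a natural fit for the streaming Web data mentioned on the next slide.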
Classifiers - PA
PA vs. SVM: PA follows an online learning pattern, which is attractive for Web applications, and PA has a theoretical loss bound. Evaluated with 10-fold cross-validation.
Classifiers - LM
Language Modeling (LM) Based Classifier: a generative method that calculates the probability of generating a given word sequence.
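A sketch of the generative decision rule, assuming one language model per polarity class; unigram models with add-alpha smoothing stand in here for the paper's setup, and all names are illustrative:

```python
import math
from collections import Counter

def train_unigram_lm(docs, vocab_size, alpha=1.0):
    """Build P(w | class) from that class's training docs.
    Add-alpha smoothing is a stand-in; the slide's Good-Turing
    smoothing is sketched separately below."""
    counts = Counter(w for doc in docs for w in doc)
    total = sum(counts.values())
    return lambda w: (counts[w] + alpha) / (total + alpha * vocab_size)

def classify(review, lm_pos, lm_neg):
    """Generative rule: pick the class whose model is more likely
    to have generated the review's word sequence."""
    score_pos = sum(math.log(lm_pos(w)) for w in review)
    score_neg = sum(math.log(lm_neg(w)) for w in review)
    return "positive" if score_pos > score_neg else "negative"
```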
Classifiers - LM
Because training data is limited, n-gram language modeling often suffers from data sparseness, so smoothing is needed.
Good-Turing estimation: an n-gram observed r times is given the adjusted count r* = (r + 1) * n_{r+1} / n_r, where n_r is the number of distinct n-grams that occur exactly r times.
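A direct transcription of that adjustment as code (real implementations also smooth the count-of-counts curve n_r, which this sketch omits):

```python
from collections import Counter

def good_turing_counts(ngram_counts):
    """r* = (r + 1) * n_{r+1} / n_r, where n_r is the number of
    distinct n-grams seen exactly r times in the training data."""
    n = Counter(ngram_counts.values())  # count-of-counts: n[r]
    adjusted = {}
    for ngram, r in ngram_counts.items():
        if n[r + 1] > 0:
            adjusted[ngram] = (r + 1) * n[r + 1] / n[r]
        else:
            adjusted[ngram] = float(r)  # no n-grams seen r+1 times: keep raw count
    return adjusted
```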
Classifiers - Winnow
Winnow learns a linear classifier over the bag-of-words of a document to predict the polarity of review x:
h(x) = sum over features w of f_w * c_w(x), where c_w(x) = 1 if feature w appears in x and 0 otherwise, f_w is the weight of feature w, and x is classified positive when h(x) exceeds a threshold.
Classifiers - Winnow
Training phase:
Calculate h(x).
If the review is positive but is predicted as negative, update each f_w where c_w(x) = 1 by f_w <- 2 * f_w.
If the review is negative but is predicted as positive, update each f_w where c_w(x) = 1 by f_w <- f_w / 2.
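A sketch of this multiplicative update loop; the initial weights of 1 and the threshold default are assumptions, since the slide does not give Nigam and Hurst's exact settings:

```python
def train_winnow(reviews, vocab, epochs=1, theta=None):
    """reviews: list of (feature_set, label) pairs, label in {+1, -1}.
    Weights start at 1.0; theta defaults to |vocab| (an assumption --
    a common Winnow threshold, not stated on the slide)."""
    f = {w: 1.0 for w in vocab}
    theta = float(len(vocab)) if theta is None else theta
    for _ in range(epochs):
        for features, label in reviews:
            h = sum(f[w] for w in features if w in f)  # h(x)
            pred = +1 if h > theta else -1
            if label == +1 and pred == -1:    # positive misclassified: promote
                for w in features & f.keys():
                    f[w] *= 2.0
            elif label == -1 and pred == +1:  # negative misclassified: demote
                for w in features & f.keys():
                    f[w] /= 2.0
    return f
```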
N-grams as Linguistic Features
N-grams in this paper: all 1-grams, 2-grams, ..., N-grams combined, with N set to 6.
Calculate the chi-square score for each n-gram (term vs. class) and take the top M ranked n-grams as features.
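A sketch of that pipeline: enumerate 1- through 6-grams, score each against the class label with the standard two-class chi-square contingency formula (the paper's exact variant is not shown on the slide), and keep the top M:

```python
from collections import Counter

def extract_ngrams(tokens, max_n=6):
    """All contiguous n-grams for n = 1..max_n."""
    return [tuple(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]

def chi_square(A, B, C, D):
    """2x2 table: A = positive docs containing the term, B = positive docs
    without it, C = negative docs with it, D = negative docs without it."""
    N = A + B + C + D
    denom = (A + C) * (B + D) * (A + B) * (C + D)
    return N * (A * D - B * C) ** 2 / denom if denom else 0.0

def select_features(docs, labels, max_n=6, top_m=10000):
    """docs: token lists; labels: parallel list of +1/-1 polarities."""
    pos_df, neg_df = Counter(), Counter()
    n_pos = sum(1 for y in labels if y == +1)
    n_neg = len(labels) - n_pos
    for tokens, y in zip(docs, labels):
        (pos_df if y == +1 else neg_df).update(set(extract_ngrams(tokens, max_n)))
    candidates = set(pos_df) | set(neg_df)
    scores = {g: chi_square(pos_df[g], n_pos - pos_df[g],
                            neg_df[g], n_neg - neg_df[g])
              for g in candidates}
    return sorted(candidates, key=scores.get, reverse=True)[:top_m]
```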
Data Set
Electronic products (digital cameras, laptops, PDAs, MP3 players, ... from Froogle, http://froogle.google.com).
Rating scale with maximum R = 5 or 10: reviews rated 1 and R are used for training; reviews rated 2 and R-1 are used for testing.
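Under that reading of the split, a small illustrative helper (function and field names are assumptions):

```python
def split_by_rating(reviews, R):
    """reviews: (rating, text) pairs on a 1..R scale (R = 5 or 10).
    Extremes (1, R) -> training; next-to-extremes (2, R-1) -> testing."""
    train, test = [], []
    for rating, text in reviews:
        if rating == 1:
            train.append((text, -1))
        elif rating == R:
            train.append((text, +1))
        elif rating == 2:
            test.append((text, -1))
        elif rating == R - 1:
            test.append((text, +1))
    return train, test
```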
Results
Results Discussion
High-order n-grams improve the performance of the classifiers, especially on negative instances.
Discriminative models are more appropriate for sentiment classification than generative models (about a 4% improvement); the mixture of positive and negative expressions within a review confuses generative models.
Results Discussion
The performance of the PA classifier is not sensitive to the number of features.
Filtering out objective sentences shows no obvious advantage on this data set (possible factors: product category vs. movie reviews, filtering quality, the rating levels used for testing, ...).
Conclusion
On the large-scale data set, discriminative classifiers with high-order n-gram features perform comparatively better.
Online learning is feasible.
Future Work
A better feature-selection scheme (to handle noisy n-grams).
Classification over finer-grained rating scales (Pang and Lee, 2005).