Building Large Arabic Multi-domain Resources
for Sentiment Analysis
Hady ElSahar and Samhaa R. El-Beltagy
Center for Informatics Science, Nile University
CICLing 2015 – April 19, 2015
hadyelsahar@gmail.com
Agenda
• Problem Statement
• Building Multi-Domain Datasets for Sentiment Analysis
• Building Multi-Domain Lexicons
• Experiments and Evaluation
• Mining Experiment Results
Problem Statement
Current resources for Arabic sentiment analysis suffer from several deficiencies:
• Small size
• Domain specificity
• Not publicly available
• Insufficient coverage of different Arabic dialects and non-standard terms
Problem Statement

Sentiment datasets – related work:

| Author | Dataset | Size | Multi-Domain | Publicly Available |
| Rushdi-Saleh et al. | OCA | 500 | No | Yes |
| Abdul-Mageed & Diab | AWATIF | < 10K | Yes | No |
| Aly & Atiya | LABR | 63K | No | Yes |
| Refaee et al. | Twitter Corpus | 8,868 | N/A | Yes |
Problem Statement

Sentiment lexicons – related work:

| Author | Size | MSA / Dialect | Multi-Domain | Publicly Available |
| El-Beltagy et al. | 4K | MSA + Dialect | N/A | Yes |
| Abdul-Mageed & Diab (SANA) | 225K | MSA + Dialect | Yes | No |
| Badaro et al. | 150K | MSA only | N/A | Yes |
Proposed solution
• Building large Arabic datasets and lexicons for sentiment analysis:
  • Large size
  • Multi-domain
  • Coverage of Arabic dialects
  • Well documented and tested for sentiment classification
  • Publicly available for everyone to use
Building Datasets
• Lack of Arabic reviewing content on the internet:
  • Fewer Arabic-based e-commerce and reviewing websites
  • Arabic speakers often write their reviews in English

English! Do you speak it?!
Domains of the reviewing websites scraped:
• Hotel reviews
• Restaurant reviews
• Product reviews
• Movie reviews
Building Datasets
Scraping Arabic reviewing content on the internet
Building Datasets
• Normalize the different rating systems into positive, negative, and neutral classes using heuristics.
• Automatic labeling of reviews.
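As a sketch, such a heuristic might look like this (the thresholds and the function name are assumptions; the slides only state that the different rating scales were normalized heuristically):

```python
def label_from_rating(rating, max_rating=5):
    """Map a numeric review rating onto a sentiment class.

    The exact thresholds are an assumption; the slides only say the
    different rating systems were normalized via heuristics.
    """
    normalized = rating / max_rating          # scale to [0, 1]
    if normalized >= 0.8:                     # e.g. 4-5 stars out of 5
        return "positive"
    if normalized <= 0.4:                     # e.g. 1-2 stars out of 5
        return "negative"
    return "neutral"                          # e.g. 3 stars out of 5
```

Scaling to [0, 1] first lets the same thresholds cover 5-star, 10-point, and percentage rating systems.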
Building Datasets
• Removing redundant and spam reviews
• Removing contradicting reviews (similar text, different polarity)
• Removing duplicate reviews
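The cleaning steps above can be sketched as follows (a minimal illustration; the `(text, polarity)` layout and the function name are assumptions):

```python
def clean_reviews(reviews):
    """Drop exact duplicates and contradicting reviews.

    `reviews` is a list of (text, polarity) pairs.  A review text that
    appears with more than one polarity is considered contradicting and
    is dropped entirely; exact duplicates are kept once.
    """
    polarities = {}
    for text, polarity in reviews:
        polarities.setdefault(text, set()).add(polarity)

    seen, cleaned = set(), []
    for text, polarity in reviews:
        if len(polarities[text]) > 1:   # same text, different polarity
            continue
        if text in seen:                # exact duplicate
            continue
        seen.add(text)
        cleaned.append((text, polarity))
    return cleaned
```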
Datasets Statistics
| | Hotels | Restaurants | Movies | Products | All |
| #Reviews | 15,579 | 11,310 | 1,524 | 14,279 | 42,692 |
| #Unique Reviews | 15,562 | 10,940 | 1,522 | 5,092 | 33,116 |
| #Users | 13,407 | 1,639 | 416 | 7,465 | 24,653 |
| #Items | 8,100 | 4,654 | 933 | 5,906 | 19,593 |
Sizes of Extracted Datasets
Building multi-domain lexicons
• Manually crafting sentiment lexicons is a tedious task
• The proposed approach utilizes the feature selection and ranking behavior of Support Vector Machines (SVMs)
• An SVM with an L1 regularization penalty yields sparse coefficient vectors
Coefficient vector of a trained support vector machine (selected entries):

| Term | Gloss | Coefficient |
| ال يستحق | doesn't deserve | -0.532 |
| سئ | bad | -0.52 |
| فشل | failure | -0.4 |
| … | … | … |
| حصل | happen | 0 |
| مشهد | scene | 0 |
| … | … | … |
| افضل | better | 0.270 |
| رائع | wonderful | 0.272 |
| ممتع | enjoyable | 0.357 |
Pipeline: Datasets → Training an L1-norm SVM → Selecting top features → Manual verification → Multi-domain lexicons
Building multi-domain lexicons
• Train an SVM classifier on each of the generated datasets using a unigram + bigram model
• Omit features corresponding to zero coefficients
• Label features with positive coefficient values as positive lexicon entries
• Label features with negative coefficient values as negative lexicon entries
• Manually filter and verify the resulting lexicon (much easier than crafting one from scratch!)
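The coefficient-to-lexicon steps above can be sketched as follows (a pure-Python illustration; the function name is an assumption, and training the L1-regularized SVM itself, e.g. with a library such as scikit-learn, is assumed to have already happened):

```python
def coefficients_to_lexicon(feature_names, coefficients):
    """Turn the coefficient vector of an L1-regularized linear SVM
    into positive/negative lexicon candidates.

    Features with zero coefficients are omitted entirely; the sign
    of each remaining coefficient decides its polarity label.
    """
    positive, negative = [], []
    for term, coef in zip(feature_names, coefficients):
        if coef > 0:
            positive.append((term, coef))
        elif coef < 0:
            negative.append((term, coef))
        # coef == 0: dropped by the L1 sparsity, not a lexicon entry
    return positive, negative
```

The manual filtering pass then only has to review the (much smaller) candidate lists.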
Building multi-domain lexicons from the datasets

| | Hotels | Restaurants | Movies | Products | LABR / Books | All |
| # non-zero coef. features | 556 | 1,413 | 526 | 661 | 3,552 | 6,708 |
| # manually filtered | 218 | 734 | 87 | 369 | 874 | 1,913 |

Size of the built multi-domain lexicons before and after manual filtration
Building multi-domain lexicons from the datasets

Selected examples from the generated lexicons:

| Hotels | Restaurants | Movies |
| لن أعود (not coming back) | بارد (cold) | يستحق المشاهدة (worth watching) |
| ضعيفة المياه (low water pressure) | يشبع (enough portions) | برافو (bravo) |
Experiments and Benchmarking Datasets

Benchmarking the datasets for the task of sentiment analysis:
• Verify the viability of using the datasets for sentiment analysis
• Test the effectiveness of the generated lexicons
• Make the results of all experiments publicly available for further analysis
• Provide an easy benchmarking framework for future sentiment classifiers
Experiments and Benchmarking Datasets

Dataset setups:
• 2-class sentiment classification (positive or negative)
• 3-class sentiment classification (positive, negative, or mixed/neutral)
• Balanced / unbalanced setups
• 20%-80% splits (testing generated lexicons on unseen data)
• Cross-validation
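A minimal sketch of one such setup, a balanced 2-class corpus with a 20%-80% test/train split (the function name and the `(text, label)` layout are assumptions):

```python
import random

def balanced_two_class_split(reviews, test_size=0.2, seed=0):
    """Build a balanced 2-class setup with a test/train split.

    `reviews` is a list of (text, label) pairs.  Neutral reviews are
    dropped (2-class setup), classes are downsampled to equal size,
    and the result is split 20%-80% into test and train.
    """
    pos = [r for r in reviews if r[1] == "positive"]
    neg = [r for r in reviews if r[1] == "negative"]  # neutral dropped
    n = min(len(pos), len(neg))                       # balance the classes
    rng = random.Random(seed)
    rng.shuffle(pos)
    rng.shuffle(neg)
    data = pos[:n] + neg[:n]
    rng.shuffle(data)
    cut = int(len(data) * test_size)
    return data[cut:], data[:cut]                     # train, test
```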
Experiments and Benchmarking Datasets

Feature building methods:
• Standard feature building methods: Count, TF-IDF, Delta-TFIDF
• Features built from the generated lexicons: term existence, term count, weighted count; using either a domain-specific lexicon or the domain-general lexicon
• Merging lexicon-based features with other features

Classifiers: Linear SVM, Logistic Regression, BNB, KNN, and SGD
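The three lexicon-based feature variants named above can be sketched as follows (a minimal illustration assuming single-token lexicon entries; the function name and the dict-based lexicon layout are assumptions):

```python
def lexicon_features(tokens, lexicon, mode="count"):
    """Build one feature per lexicon entry.

    `lexicon` maps a term to its sentiment weight (e.g. an SVM
    coefficient); `mode` selects one of the three variants:
    "existence", "count", or "weighted".
    """
    feats = {}
    for term, weight in lexicon.items():
        n = tokens.count(term)              # occurrences in the document
        if mode == "existence":
            feats[term] = int(n > 0)        # is the term present at all?
        elif mode == "count":
            feats[term] = n                 # raw occurrence count
        else:                               # "weighted"
            feats[term] = n * weight        # count scaled by polarity weight
    return feats
```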
Experiments and Benchmarking Datasets

• 3,075 experiments, resulting from all combinations of classifiers, features, and dataset setups.
• Results are publicly available for further analysis and as benchmarks
Mining experiment results
Mining the experiment results to answer questions such as:
• What are the top-performing classifier and feature combinations?
• Can we rely only on lexicons for sentiment analysis?
• What is the effect of combining lexicon-based features with other features?
• Are shorter documents easier to classify?
• Are documents richer in subjective words easier to classify?
Can we rely only on lexicon-based features for sentiment classification?
Can features generated from lexicons provide adequate accuracy relative to other feature-building methods?
Mining experiment results

| Setup | Features | Number of features | Average accuracy |
| 2-class | Lex-domain | ~500 | 0.768 |
| 2-class | Lex-all | 1,913 | 0.782 |
| 2-class | Count | ~50K | 0.783 |
| 3-class | Lex-domain | ~500 | 0.549 |
| 3-class | Lex-all | 1,913 | 0.554 |
| 3-class | Count | ~50K | 0.570 |
What is the effect of merging lexicon-based features with other features?
Mining experiment results

| Setup | Features | Aggregated lexicon | Average accuracy | Enhancement |
| 2-class | Count | None | 0.783 | – |
| 2-class | Count | Lex-domain | 0.790 | +1% |
| 2-class | Count | Lex-all | 0.796 | +1.6% |
| 2-class | TF-IDF | None | 0.700 | – |
| 2-class | TF-IDF | Lex-domain | 0.791 | +9.1% |
| 2-class | TF-IDF | Lex-all | 0.800 | +10% |
| 2-class | Delta-TFIDF | None | 0.692 | – |
| 2-class | Delta-TFIDF | Lex-domain | 0.789 | +9.7% |
| 2-class | Delta-TFIDF | Lex-all | 0.798 | +10.6% |
Are shorter documents easier to classify? Or longer ones? How about longer documents rich with subjective terms?
Mining experiment results
Storyline: Patch Adams was desperate and attempted suicide many times, until he was sent to a mental hospital… Then he unintentionally started helping others by socializing with them, until they became better.
Mining experiment results
• Document length: number of tokens per document (log scale)
• Subjectivity score: sum of the polarities of the words that appear in the document (using the generated lexicons)
• Error rate: number of misclassified documents in each group (document length × subjectivity score)
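The two grouping axes above can be computed as follows (a minimal sketch; the function name and the dict-based lexicon layout are assumptions):

```python
import math

def document_stats(tokens, lexicon):
    """Compute the two grouping axes: log document length and
    subjectivity score.

    `lexicon` maps a word to its polarity weight; out-of-lexicon
    words contribute 0 to the subjectivity score.
    """
    length = math.log(len(tokens)) if tokens else 0.0        # length (log scale)
    subjectivity = sum(lexicon.get(t, 0.0) for t in tokens)  # sum of polarities
    return length, subjectivity
```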
Mining experiment results
The error rate across document-length and subjectivity-score groups (darker is worse)
Conclusion
• Built large multi-domain datasets for sentiment analysis (33K reviews)
• Proposed an approach for semi-automatically learning multi-domain lexicons (~2K entries)
• Everything is publicly available:
  • Datasets (raw + processed)
  • Lexicons
  • Web scrapers (to rerun for more recent reviews)
  • Experiment code and results