Sentiment Analysis Over Social Networks: An Overview

Khaled Ahmed
Faculty of Computers and Information
Cairo University
Cairo, Egypt
[email protected]

Neamat El Tazi
Faculty of Computers and Information
Cairo University
Cairo, Egypt
[email protected]

Ahmad Hany Hossny
Centre for Intelligent Systems Research
Deakin University
Geelong, Australia
[email protected]

Abstract—The rapid increase in data on social media creates a need for mining such data to gain valuable insights. The data can be unstructured and large in volume. Sentiment analysis addresses this need by detecting opinions or emotions in social media text. Sentiment analysis can be performed in various domains such as social, medical and industrial applications. This paper presents a survey of sentiment analysis that addresses the different concepts in this area, its problems and their solutions, and the available APIs and tools, and presents a list of open challenges in this area.

Index Terms—Social Media, Sentiment Analysis, Feature Selection, Recommendation, Spam Detection, Sentiment Lexicons and Emotion Detection

I. INTRODUCTION

Most of the data that exists in social networks is unstructured [1]. Such unstructured data makes up approximately 80% of the data worldwide, which makes it difficult to analyze and to gain valuable insights from. Sentiment analysis and opinion mining are two important techniques that help in detecting emotions and opinions in social media data. This can help in solving many problems and provides many indicators for elections, public opinion, advertising, health care and public satisfaction.

Discovering hidden patterns by applying analysis and data mining techniques to data can reveal many indicators [2], [3], [4], [5], [29]. Sentiment analysis helps in bioinformatics, for example in cancer detection, as well as in predicting future stock market trends by analyzing and mining social media posts. Applying mining techniques and sentiment analysis to unstructured data is considered a major challenge in the sentiment analysis research area.

Sentiment analysis can be applied at four levels: the sentence, document, aspect and user levels. It can be performed using machine learning (clustering or classification), lexicon, NLP, ontology or hybrid techniques. Many methods exist to enhance sentiment analysis results, such as feature selection, data integration, data cleaning and crowdsourcing.

Feature selection is used to choose suitable features from the text that enhance sentiment analysis results. Multiple feature selection techniques exist [6], which are applied to choose the best features for better sentiment results. Crowdsourcing [7], which uses the experience of the public crowd in labeling, is another technique for enhancing sentiment analysis results, based on crowd labeling as well as on feedback given on sentiment classes. Data cleaning [8] and data integration [9], [10] also serve as enhancement techniques towards better sentiment analysis results.

Spam or fake sentiment detection in reviews or posts is an important application of sentiment analysis [11], [8]. Sentiment analysis can also be used to define trust in a brand or a service over a social network [12] or to build recommendation systems [13], [4], which recommend a service, a place or a product to a user.

Fig. 1. The different phases of sentiment mining, including the steps to build the model and the steps to use it later in medical and social applications.

The rest of the paper is organized as follows: Section II describes sentiment analysis levels. Section III presents the state-of-the-art sentiment analysis techniques. Section IV presents sentiment analysis enhancement techniques. Section V presents recent applications and models. Section VI presents a list of available sentiment analysis tools, APIs and lexicons. Section VII presents a list of accuracy measures used for evaluating sentiment analysis techniques. Section VIII presents open challenges in this research area, and Section IX concludes the paper.

978-1-4799-7492-4/15/$31.00 ©2015 IEEE

II. SENTIMENT ANALYSIS LEVELS

Applying sentiment analysis over big data [14] leads to many insights and business benefits. Sentiment analysis, opinion mining or emotion detection is the process of extracting sentiment from text. It is commonly applied to online unstructured text such as microblog data and social media data streams [15].

Sentiment analysis can be applied at four different levels [16]. Level 1 is the sentence level, which detects positive, negative or neutral sentiment for each sentence. Level 2 is the document level, which detects the sentiment of the whole document as one unit or entity: positive, negative or neutral. Level 3 is the aspect level, which is used when attributes are available inside an entity, post or input text. Each attribute can hold a sentiment of its own. For example, a customer review of a mobile phone covers attributes such as battery life and screen light, and each attribute can have a different sentiment.

Consider the following example at the sentence level: "happy to meet you" is considered a positive sentence. At the document level, "My phone is very interesting but need enhancement in some issues" is considered positive if the whole text is treated as one entity.

The aspect level can lead to better analysis and results if taken into consideration. Consider the following example at the aspect level: "My phone is really nice but I have a bad battery. It contains slow applications but I am happy with its screen." The aspect here is the phone, while the attributes are the battery, applications and screen. Sentiment detection can lead to the following results: (battery, negative), (application, negative), (screen, positive).

Some sentiment analysis techniques apply grouping at the aspect level, where all attributes that have the same sentiment result are grouped together. Grouping the previous example leads to the following result: (battery, application, negative) and (screen, positive).
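The grouping step can be sketched in a few lines of Python. This is a minimal illustration, assuming the (attribute, sentiment) pairs have already been produced by some aspect-level detector; the function name `group_by_sentiment` is hypothetical and not taken from any cited system.

```python
from collections import defaultdict


def group_by_sentiment(pairs):
    """Group attributes that share the same sentiment label."""
    groups = defaultdict(list)
    for attribute, sentiment in pairs:
        groups[sentiment].append(attribute)
    return dict(groups)


# Aspect-level results for the phone review discussed in the text.
pairs = [
    ("battery", "negative"),
    ("application", "negative"),
    ("screen", "positive"),
]
grouped = group_by_sentiment(pairs)
```

Here `grouped` maps each sentiment to the list of attributes that received it, which mirrors the grouped result given for the phone example.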

The fourth level is the user level, which handles the social relationships between different users using graph theory [16]. Consider the following example: A is a user who has a friend B connected to him. User B is always mentioned in user A's posts and always gains likes and shares from user A. User A might have the same opinion or sentiment as user B. This can be the result of the influence of user B on user A, that is, of how much user B can affect user A's opinion. The user level takes such influence into consideration.

III. SENTIMENT ANALYSIS METHODS

A. Sentiment analysis lexicons

Sentiment analysis is the process of identifying positive, negative or neutral feeling in text [6]. There are three solutions for determining the sentiment of a text. The first is to label text manually, which takes a lot of effort. The second is to use an NLP, lexicon or machine learning solution. The third is a hybrid solution, which uses human experts or crowdsourcing to give feedback on sentiment analysis results or to label training data sets.

There are two types of lexicons [17]. The first type is the corpus lexicon, which is divided into two subtypes: semantic oriented and statistical oriented. A corpus lexicon such as SenticNet [12] can achieve more accurate sentiment results because it is context oriented rather than word-similarity oriented. An example of semantic oriented lexicons can be found in [18], where the authors dealt with the meaning of words based on a concept net lexicon. In [19], on the other hand, the authors presented a statistical method for defining sentiments.

The second type of lexicon is dictionary based. In [20], two dictionaries were presented. The first is a word dictionary, which is enriched with human emotion. The second is a topic modeling or topic oriented dictionary, which is helpful in aspect sentiment analysis. Some researchers [21] tackle dictionary based lexicons by integrating existing seeds or dictionaries to build more valuable multi-domain dictionaries.
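A dictionary based approach can be illustrated with a short Python sketch. It assumes a toy word dictionary (`TOY_LEXICON`, invented here for illustration and not taken from any of the cited lexicons) and a simple rule that sums word polarities; real lexicons are far larger and usually handle context and intensity as well.

```python
# Hypothetical toy lexicon: word -> polarity score (illustrative values only).
TOY_LEXICON = {"happy": 1.0, "nice": 1.0, "bad": -1.0, "slow": -1.0}


def lexicon_polarity(text, lexicon=TOY_LEXICON):
    """Sum the polarity of known words and map the total to a label."""
    tokens = text.lower().split()
    # Strip trailing punctuation so that "battery." still matches "battery".
    score = sum(lexicon.get(tok.strip(".,!?"), 0.0) for tok in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

With this toy lexicon, `lexicon_polarity("happy to meet you")` is labeled positive because only "happy" carries a score, while a sentence with no known words falls back to neutral.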

B. Sentiment analysis and NLP

Using natural language processing (NLP), one can achieve accurate sentiment results by resolving the context of words and the challenges of implicit or indirect word meaning. The Stanford Sentiment Treebank [22] provides a solution for these challenges at the sentence level. The authors in [23] showed that using NLP techniques such as lemmatization, n-grams (unigrams, bigrams or trigrams), negation handling, valence shifters and stemming as a preprocessing phase enhances sentiment results. Hashtags can also help in indicating the polarity or objective of tweets or posts [15]. NLP can also be used to identify an author from the writeprint, or stylometry, of the tweets [24].
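Two of the preprocessing steps named above, n-grams and negation handling, can be sketched as follows. This is a simplified illustration: the negation list `NEGATIONS` and the `NOT_` prefix convention are assumptions made for this example, and only the single token after a negation word is marked.

```python
import re

NEGATIONS = {"not", "no", "never"}  # assumed, non-exhaustive list


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens (with apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def mark_negation(tokens):
    """Drop each negation word and prefix the token after it with 'NOT_'."""
    out, negate = [], False
    for tok in tokens:
        if tok in NEGATIONS:
            negate = True
            continue
        out.append("NOT_" + tok if negate else tok)
        negate = False
    return out


def ngrams(tokens, n):
    """Return all contiguous n-grams (n=1 unigrams, n=2 bigrams, ...)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
```

Marking negation before building n-grams lets a later classifier treat "happy" and "NOT_happy" as different features.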

C. Sentiment analysis and machine learning

Machine learning solutions are supervised (using labeled training data), unsupervised (without labeled training data) or semi-supervised (using a mix of labeled and unlabeled data).

Supervised learning [6] has different classifiers to handle the classification process based on the training data. Classifiers include, but are not limited to, decision tree classifiers, linear classifiers (support vector machine, neural network), rule based classifiers and probabilistic classifiers (Bayesian network, maximum entropy, naive Bayes). A decision tree classifier presents a hierarchical division of the training data based on a condition.

The support vector machine (SVM) linear classifier linearly separates the training data based on the maximum margin and the lowest generalization error in the classification process.

The linear equation is:

$$Y = BX + A \tag{1}$$

where the point (X, Y) has the two-dimensional values X and Y, and A is a constant value.

Equation 2 is used to check if a point with value X can be classified in a certain class.

$$W = \sum_{j=0}^{n} \alpha_j y_j x_j \tag{2}$$
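As a worked illustration of Equation 2, the sketch below builds the weight vector W as the sum of the training points x_j, each scaled by its coefficient α_j and its label y_j (+1 or −1). The decision rule with a bias term in `classify` is an assumption added here for completeness; it does not appear in the equations above, and the coefficients are taken as given rather than learned.

```python
def weight_vector(alphas, labels, points):
    """Compute W = sum_j alpha_j * y_j * x_j for points of equal dimension."""
    dim = len(points[0])
    w = [0.0] * dim
    for alpha, y, x in zip(alphas, labels, points):
        for k in range(dim):
            w[k] += alpha * y * x[k]
    return w


def classify(w, bias, x):
    """Assumed decision rule: sign of the dot product W.x plus a bias."""
    score = sum(wk * xk for wk, xk in zip(w, x)) + bias
    return 1 if score >= 0 else -1


# Two support vectors on opposite sides of the origin.
w = weight_vector([0.25, 0.25], [1, -1], [(1.0, 1.0), (-1.0, -1.0)])
```

For these two points the contributions add up to W = (0.5, 0.5), so any point whose coordinates sum to a positive value is assigned to class +1.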

The neural network linear classifier iterates over the data until the data is classified. The result of each iteration is taken as feedback to the next iteration to return a better classification with the smallest error values.

The naive Bayes probabilistic classifier calculates the word distribution in a document and uses it to predict the suitable class or label for a feature or word. This classifier is based on the assumption of independent features, as presented in Equation 3.

$$P(\text{label} \mid \text{features}) = \frac{P(\text{label}) \cdot P(\text{features} \mid \text{label})}{P(\text{features})} \tag{3}$$
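A minimal naive Bayes sketch based on Equation 3 is shown below. It assumes documents are already tokenized, uses add-one smoothing for unseen words (an assumption not stated in the text), and drops the denominator P(features) because it is the same for every label and does not change which label scores highest.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """docs: list of (tokens, label). Returns priors, word counts, vocabulary."""
    label_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    priors = {label: n / len(docs) for label, n in label_counts.items()}
    return priors, word_counts, vocab


def predict(tokens, priors, word_counts, vocab):
    """Return the label maximizing log P(label) + sum log P(word | label)."""
    best_label, best_score = None, -math.inf
    for label, prior in priors.items():
        total = sum(word_counts[label].values())
        score = math.log(prior)
        for tok in tokens:
            # Add-one smoothing so unseen words do not zero out the product.
            score += math.log((word_counts[label][tok] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


docs = [
    (["happy", "nice"], "positive"),
    (["nice", "screen"], "positive"),
    (["bad", "battery"], "negative"),
    (["slow", "bad"], "negative"),
]
model = train_naive_bayes(docs)
```

Working in log space avoids numerical underflow when many word probabilities are multiplied together.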

On the other hand, the Bayesian network probabilistic classifier [25] is based on the assumption of dependent features. It is an acyclic graph of nodes with a set of variables and dependency edges, as presented in Equation 4, where a, b, c and d are features.

$$P(a, b, c, d) = P(a \mid b, c, d) \cdot P(b, c, d) \tag{4}$$

The maximum entropy probabilistic classifier [6] encodes the features into a vector space to calculate the weight of each feature for a labeled class in the feature set (fs), as presented in Equation 5, where d is a dot product.

$$P(\text{features} \mid \text{labels}) = \frac{d(\text{weights}, \text{encoded}(fs, \text{label}))}{\sum_{\text{all labels}} d(\text{weights}, \text{encoded}(fs, \text{label}))} \tag{5}$$

The rule based classifier [26] is based on a set of rules. Decision trees and sequential algorithms are useful for if-then rules using FOIL pruning (FP) or rule pruning.

On the other hand, unsupervised learning deals with unlabeled data, for example by performing the clustering process using LDA and the HowNet lexicon [27]. In [28], the authors presented clustering based on document similarity. In [1], a framework was presented that automated the detection of hotspots in online shared data (online forum data) using K-means and an SVM classifier. An SVM classifier was also used in [9] to determine public users' opinions towards products, and naive Bayes and SVM classifiers were used in [29] on data from several medical forums. Meanwhile, semi-supervised learning deals with labeled and unlabeled data, as presented in [6]. Ensemble learning techniques were presented in [30] to resolve the language ambiguity problem and produce more accurate polarity prediction with a combination of classifiers.

IV. SENTIMENT ANALYSIS ENHANCEMENT METHODS

A. Sentiment analysis data cleaning

Data cleaning [31], [8] is an important preprocessing phase that enhances sentiment analysis results. Data cleaning operations include tokenizing, stemming and filtering. Data cleaning can be applied in two phases [31]: data transformation and data filtering.

Data transformation [31], [8] operations include, but are not limited to, removing useless spaces, handling abbreviations and negations, stemming and removing stop words. Data filtering [31] is related to selecting the features that are suitable for sentiment analysis.
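The transformation operations listed above can be sketched as a small pipeline. Everything named here is a simplifying assumption: the stop word list, the abbreviation mapping and the suffix-stripping `naive_stem` are toy stand-ins for the resources and stemmers a real system would use.

```python
import re

STOP_WORDS = {"the", "a", "is", "to", "and"}   # assumed toy list
ABBREVIATIONS = {"u": "you", "gr8": "great"}   # assumed toy mapping


def naive_stem(token):
    """Strip one common suffix; a crude stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def clean(text):
    """Normalize spaces, expand abbreviations, drop stop words, then stem."""
    text = re.sub(r"\s+", " ", text.strip().lower())    # remove useless spaces
    tokens = re.findall(r"[a-z0-9']+", text)            # tokenize
    tokens = [ABBREVIATIONS.get(t, t) for t in tokens]  # handle abbreviations
    tokens = [t for t in tokens if t not in STOP_WORDS] # remove stop words
    return [naive_stem(t) for t in tokens]              # stem
```

The order of the steps matters: abbreviations are expanded before stop word removal so that an expanded form can itself be filtered if needed.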

B. Dimension Reduction

Dimension reduction [28] is the process of reducing high-dimensional data using one of two methods: feature selection or feature extraction. Feature extraction is a transformative method that applies a transformation to the data to project it into a new feature space with fewer dimensions.

Feature selection [28] is the process of selecting features from the original data set based on specific selection criteria, such that the resulting subset has the smallest classification error without loss of content.

Feature selection algorithms include Chi-square, latent semantic indexing and point-wise mutual information (PMI). The Chi-square algorithm is used to define features. Suppose n is the number of data source files or documents, w is a word, and F(w) is the global fraction of documents that contain w. P_j is the global fraction of documents that carry class label j, while p_j(w) is the conditional probability of class label j for documents that contain the word w. χ² represents the goodness of fit between a set of values and the expected value, and is used to select the most suitable feature from a set of candidate features to represent the classes. The Chi-square statistic is presented in Equation 6.

$$\chi_j^2 = \frac{n \cdot F(w)^2 \cdot (p_j(w) - P_j)^2}{F(w) \cdot (1 - F(w)) \cdot P_j \cdot (1 - P_j)} \tag{6}$$
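As a worked example of the Chi-square statistic, the sketch below evaluates n·F(w)²·(p_j(w) − P_j)² divided by F(w)·(1 − F(w))·P_j·(1 − P_j) for given inputs. The function and parameter names are hypothetical, and the sample values are invented purely to show the arithmetic.

```python
def chi_square(n, f_w, p_j_w, p_j):
    """Chi-square score of word w for class j.

    n      -- number of documents
    f_w    -- fraction of documents containing w, with 0 < f_w < 1
    p_j_w  -- conditional probability of class j given that w is present
    p_j    -- global fraction of documents in class j, with 0 < p_j < 1
    """
    numerator = n * f_w ** 2 * (p_j_w - p_j) ** 2
    denominator = f_w * (1 - f_w) * p_j * (1 - p_j)
    return numerator / denominator


# Invented sample: 100 documents, w appears in 20% of them,
# 80% of documents containing w belong to class j, which covers 50% overall.
score = chi_square(100, 0.2, 0.8, 0.5)
```

For these values the numerator is 100 · 0.04 · 0.09 = 0.36 and the denominator is 0.2 · 0.8 · 0.5 · 0.5 = 0.04, so the score is about 9. A word whose class distribution matches the global one (p_j(w) = P_j) scores zero and is therefore uninformative.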

Point-wise mutual information (PMI) helps in defining the mutual information between classes and features. As presented in Equation 7, M_j(w) is the mutual information between class j and word w, and p_j(w) is the probability of the word w in class label j.

$$M_j(w) = \log\left(\frac{p_j(w)}{P_j}\right) \tag{7}$$
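The PMI score can be computed directly as the logarithm of the ratio between the class probability given the word and the global class probability. The sketch below uses the natural logarithm, which is an assumption since the text does not fix the base, and the function name is hypothetical.

```python
import math


def pmi(p_j_w, p_j):
    """Point-wise mutual information between class j and word w.

    p_j_w -- probability of class j for the word w (must be > 0)
    p_j   -- global probability of class j (must be > 0)
    """
    return math.log(p_j_w / p_j)
```

A positive value means the word is more strongly associated with class j than chance would suggest, zero means no association, and a negative value means the word is under-represented in that class.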

Applying feature frequency (FF), feature presence (FP) and term frequency inverse document frequency (TF-IDF) as in [31] leads to better feature selection. Feature frequency selects the words or features that are most frequent in a class or a document, as presented in Equation 8, where w is a word, j is the class, i indexes the features or words, and W_o is the word or feature occurrence.

$$FF = \max\left(\iint_{i,j}^{i,j} W_o\right) \tag{8}$$

Feature presence, on the other hand, is concerned with the presence or absence of a feature inside a document. Term frequency inverse document frequency (TF-IDF) merges the two concepts of term frequency and inverse document frequency. It presents a composite weight for each feature or word within a document, as shown in Equation 9, where N is the number of documents, DF is the document frequency, which is the number of documents that contain the feature, and FF is the feature frequency.

$$\text{TF-IDF} = FF \cdot \log\left(\frac{N}{DF}\right) \tag{9}$$
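The TF-IDF weight, FF · log(N/DF), can be computed with a few lines of Python. This sketch assumes documents are lists of tokens, uses the raw count of the word in the document as FF and the natural logarithm; the function name and sample corpus are invented for illustration.

```python
import math


def tf_idf(docs, word, doc_index):
    """Weight of `word` in docs[doc_index] as FF * log(N / DF)."""
    n_docs = len(docs)                               # N
    ff = docs[doc_index].count(word)                 # feature frequency
    df = sum(1 for doc in docs if word in doc)       # document frequency
    if df == 0:
        return 0.0                                   # word never occurs
    return ff * math.log(n_docs / df)


docs = [
    ["good", "phone", "good"],
    ["bad", "phone"],
    ["good", "screen"],
]
```

A word that appears in every document gets a weight of zero because log(N/DF) is log(1), which is why very common words contribute little as features.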

C. Sentiment Analysis and Data Integration

Data from different sentiment lexicons is integrated for sentiment analysis classification. This integration is performed by combining and filtering the individual dictionaries and deleting the duplicated data. Available dictionaries include AFINN, General Inquirer, Micro-WNOp, Opinion Lexicon, SenticNet, SentiSense, SentiWordNet, SO-CAL, Subjectivity Lexicon and WordNet-Affect [9].
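The combine, filter and de-duplicate steps can be sketched as follows. The merging rules used here are assumptions chosen for illustration, not the method of [9]: duplicated words are collapsed by averaging their scores, and words on whose polarity the dictionaries disagree are filtered out. The two sample lexicons are invented.

```python
from collections import defaultdict


def integrate_lexicons(lexicons):
    """Merge word -> score dictionaries into one lexicon."""
    collected = defaultdict(list)
    for lexicon in lexicons:
        for word, score in lexicon.items():
            collected[word.lower()].append(score)    # combine
    merged = {}
    for word, scores in collected.items():
        if min(scores) < 0 < max(scores):
            continue                                 # filter: polarity conflict
        merged[word] = sum(scores) / len(scores)     # de-duplicate by averaging
    return merged


# Invented toy dictionaries with one duplicate and one conflict.
lex_a = {"good": 1.0, "bad": -1.0, "sick": -1.0}
lex_b = {"Good": 0.5, "sick": 1.0, "slow": -0.5}
merged = integrate_lexicons([lex_a, lex_b])
```

In this toy case "good" appears in both dictionaries and is kept once with the averaged score, while "sick" is dropped because the two sources assign it opposite polarities.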

Data-driven integration of lexicons builds a high quality sentiment lexicon, which enhances sentiment detection results. Other approaches [10] integrate user reviews with the users' profiles by presenting the integration in a multi-dimensional model for opinion mining. A similar approach, EMOTube [10], integrated YouTube movie reviews using mashups for a better classification of reviews. Moreover, a semantic integration [32] was performed on data sources from different social and medical domains as a way to enhance the prediction of diseases.

D. Sentiment Analysis and Crowdsourcing

Crowdsourcing is the practice of resolving a problem or task with the help of the crowd [5]. Crowdsourcing can help in providing more accurate sentiment analysis results. The crowd can help by assigning labels to a training data set or by giving feedback on sentiment classification results, which can enhance the prediction and classification models.

Crowdsourcing has been used extensively to enhance sentiment analysis results [5], [7]. It was used in [5] to predict and measure depression over social media through Twitter tweets using an SVM classifier. In [7], the authors showed that the use of crowdsourcing resulted in more accurate sentiment detection for topic and sentiment classification in social media data.
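One simple way to turn crowd labels into training labels is majority voting, sketched below. This is an illustrative assumption rather than the procedure used in [5] or [7]; the `min_agreement` threshold and the choice to return `None` when the crowd does not agree strongly enough are hypothetical design choices.

```python
from collections import Counter


def majority_label(votes, min_agreement=0.5):
    """Return the most common crowd label, or None without enough agreement."""
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    # Require strictly more than `min_agreement` of the votes.
    return label if count / len(votes) > min_agreement else None
```

Items that come back as `None` could be sent to more annotators or left out of the training set, so that only labels with clear crowd agreement feed the classifier.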

E. Sentiment Analysis and Ontologies

Ontology based sentiment analysis systems produce more efficient classifiers and present more detailed analysis of the results. Ontologies were used to build a visual ontology system called SentiBank [33], [14] over the images attached to posts. This helped in detecting emotions in the images. The same model was also trained to predict emotions in new images. Another approach [34] presented a fuzzy ontology model for opinion mining based on HowNet. This approach was built over microblogging data where the emotions were divided into hate, surprise, anxiety, sorrow, anger, expectation, joy and love. In [33], the authors built a sentiment analysis ontology framework using OpenDover to define aspects that are related to tweet topics. Ontologies [35] were shown to enhance the results as they provide semantics rather than only syntactic matching.

F. Sentiment Analysis and Spam Detection

Fake and spam sentiments on social media lead to inaccurate sentiment detection results. In [36], [11], the authors extracted geographic user characteristics and content-based tweet features to discover spam sentiment within text. Characteristics that define fake sentiment include, but are not limited to, the speed of publishing a tweet/post, the tweet/post location, fake writing identity, number of mentions, number of hashtags, emotions, URLs in the tweet/post and number of tweets/posts per day. Building user profiles [36] over social networks or defining online identity can help in detecting fake or spam sentiment content and fake or spam users.

G. Sentiment Analysis and User Profiling

Online user identity and user profiling help in making sentiment analysis results more accurate, as polarity is measured based on the user profile in addition to the post polarity. In [37], the authors showed that there is a strong relationship between online user identity and the users' contribution in blogs. They divided the online users into classes based on their online features (kindness, social skills, creativity). Others presented profiling models [37], [38] to predict either the political interests of users or their data publishing interests [38].

H. Sentiment Analysis and Text Summarization

Sentiment summarization is the process of summarizing sentiment according to a specific domain or topic, also called target based summarization. A hybrid system for target or aspect oriented summarization was presented in [38], where the authors defined features of the aspect (product or service) and applied sentiment analysis to classify the available sentiments. Another summarization attempt was carried out in [17], where Arabic tweets were summarized to generate specific topics so that readers do not need to read all the tweets.

V. SENTIMENT ANALYSIS APPLICATIONS

A. Social Applications

Sentiment analysis has been used extensively in social applications. Existing applications include monitoring violence by detecting violence polarity in tweets [30], predicting election results and public attention [39], determining satisfaction with places and recommending those places accordingly [13], and monitoring and tracking students' opinions in education [2]. Another application of sentiment analysis is to enhance machine translation quality by detecting the implicit emotion of the text that may change the meaning, such as sarcasm [40].


B. Medical and Health applications

Applying sentiment analysis to medical data can help in determining and predicting suicide rates and depression rates [5], monitoring and tracking healthy and unhealthy areas according to tweets, and ranking doctors according to patients' satisfaction (from posts) and experience levels [41].

C. Industrial applications

In industry, sentiment analysis has been used in brand monitoring [42], stock market prediction [43], predicting box office results according to users' tweets [3] and measuring user satisfaction levels [44].

VI. TOOLS, LEXICONS AND APIS

A. Tools and APIs

Available sentiment analysis tools include, but are not limited to, Weka [23], R [4], NLTK [13], QDA Miner [4], iFeel [45] and OntoGen [33].

B. Lexicons

Commonly used lexicons [9] are AFINN, General Inquirer, Micro-WNOp, Opinion Lexicon, SenticNet, SentiSense, SentiWordNet, SO-CAL, WordNet-Affect, NRC Emotion, NRC Hashtag, Sentiment140, SentiStrength, Liu and OpinionFinder.

VII. SENTIMENT ANALYSIS EVALUATION

Sentiment analysis results are evaluated according to several measures. As presented in Table I, a correct classification of positive data is named A, an incorrect classification of negative data is named B, an incorrect classification of positive data is named C, and a correct classification of negative data is named D. Using this notation, one can calculate the measures used to evaluate the results, namely Recall, Precision, F-measure and Accuracy, as presented in Equations 10 to 13.

| | Positive data | Negative data |
|---|---|---|
| Predicted positive | A | B |
| Predicted negative | C | D |

TABLE I
ACCURACY MEASURES FACTORS

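$$\text{Recall} = \frac{A}{A + C} \tag{10}$$

$$\text{Precision} = \frac{A}{A + B} \tag{11}$$

$$\text{F-Measure} = \frac{2 \cdot \text{Recall} \cdot \text{Precision}}{\text{Recall} + \text{Precision}} \tag{12}$$

$$\text{Accuracy} = \frac{A + D}{A + B + C + D} \tag{13}$$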
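The four measures can be computed from the counts A, B, C and D of Table I as in the sketch below. It uses the standard definitions under that layout, with recall as A/(A+C) (the share of positive data that was found) and precision as A/(A+B) (the share of positive predictions that were right). The function name and the sample counts are invented, and the sketch assumes no denominator is zero.

```python
def evaluation_measures(a, b, c, d):
    """Recall, precision, F-measure and accuracy from the Table I counts.

    a -- positive data predicted positive
    b -- negative data predicted positive
    c -- positive data predicted negative
    d -- negative data predicted negative
    """
    recall = a / (a + c)
    precision = a / (a + b)
    f_measure = 2 * recall * precision / (recall + precision)
    accuracy = (a + d) / (a + b + c + d)
    return {
        "recall": recall,
        "precision": precision,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


# Invented sample counts.
measures = evaluation_measures(a=40, b=10, c=20, d=30)
```

For these counts recall is 40/60, precision is 40/50, the F-measure is 8/11 and accuracy is 70/100, which shows how a classifier can be fairly precise while still missing a third of the positive data.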

Another measure is the correlation coefficient, which is used to measure the similarity of the predicted value to the original one, as indicated in Figure 2.

Fig. 2. Correlation coefficient degree

Moreover, the relative error and relative error percentage measures are used to measure the error or error rate value. These errors can exist in any classification, clustering or prediction process. Equations 14 and 15 calculate both measures, with the actual correct value denoted A and the predicted value denoted P.

$$\text{Relative error} = \frac{|P - A|}{A} \tag{14}$$

$$\text{Relative error percentage} = \frac{|P - A|}{A} \times 100 \tag{15}$$
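Both error measures follow directly from the actual value A and the predicted value P. The sketch below assumes A is positive and expresses the percentage as the relative error multiplied by 100; the function names are hypothetical.

```python
def relative_error(predicted, actual):
    """|P - A| / A, assuming the actual value A is positive."""
    if actual <= 0:
        raise ValueError("actual value must be positive")
    return abs(predicted - actual) / actual


def relative_error_percentage(predicted, actual):
    """Relative error expressed as a percentage."""
    return relative_error(predicted, actual) * 100
```

For example, a prediction of 90 against an actual value of 100 gives a relative error of 0.1, or 10 percent.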

VIII. OPEN PROBLEMS AND RESEARCH GAPS

Sentiment analysis is still a hot topic with several open challenges and research gaps. These challenges include building multilingual classifiers, building a common user profile by integrating the same user's data from different social media applications, and enhancing the Stanford Treebank so that it can be applied at the aspect level or document level instead of only the sentence level. Further challenges are handling implicit word meaning and indirect text, building domain independent lexicons or classifiers, and building real-time sentiment analysis systems that can dynamically capture new data and enhance results according to feedback. Moreover, one can also investigate multi-labeling and clustering, using unsupervised dynamic clustering and multi-label feature selection to enhance sentiment analysis results.

IX. CONCLUSION AND FUTURE WORKS

Sentiment analysis, opinion mining or emotion detection is the process of identifying feeling or emotion in text. Sentiment analysis is a very important process as it provides many valuable indicators in different domains such as the medical, social and industrial domains. This survey presented sentiment analysis levels, techniques, enhancement methods, applications, a list of APIs, lexicons and tools, and existing research gaps. Future work will compare the state-of-the-art techniques presented in this paper using the same data set across all techniques in order to identify the best ones.


REFERENCES

[1] Y. Ko and J. Seo, “Automatic text categorization by unsupervised learning,” in Proceedings of the 18th conference on Computational linguistics-Volume 1. Association for Computational Linguistics, 2000, pp. 453–459.

[2] A. Ortigosa, J. M. Martín, and R. M. Carro, “Sentiment analysis in facebook and its application to e-learning,” Computers in Human Behavior, vol. 31, pp. 527–541, 2014.

[3] J. Du, H. Xu, and X. Huang, “Box office prediction based on microblog,” Expert Systems with Applications, vol. 41, no. 4, pp. 1680–1689, 2014.

[4] M. M. Mostafa, “More than words: Social networks text mining for consumer brand sentiments,” Expert Systems with Applications, vol. 40, no. 10, pp. 4241–4251, 2013.

[5] M. De Choudhury, M. Gamon, S. Counts, and E. Horvitz, “Predicting depression via social media.” in ICWSM, 2013.

[6] W. Medhat, A. Hassan, and H. Korashy, “Sentiment analysis algorithms and applications: A survey,” Ain Shams Engineering Journal, vol. 5, no. 4, pp. 1093–1113, 2014.

[7] R. Machedon, W. Rand, and Y. Joshi, “Automatic crowdsourcing-based classification of marketing messaging on twitter,” in Social Computing (SocialCom), 2013 International Conference on. IEEE, 2013, pp. 975–978.

[8] A. K. Uysal and S. Gunal, “The impact of preprocessing on text classification,” Information Processing & Management, vol. 50, no. 1, pp. 104–112, 2014.

[9] H. Cho, S. Kim, J. Lee, and J.-S. Lee, “Data-driven integration of multiple sentiment dictionaries for lexicon-based sentiment classification of product reviews,” Knowledge-Based Systems, vol. 71, pp. 61–71, 2014.

[10] E. Polymerou, D. Chatzakou, and A. Vakali, “Emotube: A sentiment analysis integrated environment for social web content,” in Proceedings of the 4th International Conference on Web Intelligence, Mining and Semantics (WIMS14). ACM, 2014, p. 20.

[11] D. Guo and C. Chen, “Detecting non-personal and spam users on geo-tagged twitter network,” Transactions in GIS, vol. 18, no. 3, pp. 370–384, 2014.

[12] E. Cambria, A. Livingstone, and A. Hussain, “The hourglass of emotions,” in Cognitive behavioural systems. Springer, 2012, pp. 144–157.

[13] D. Yang, D. Zhang, Z. Yu, and Z. Wang, “A sentiment-enhanced personalized location recommendation system,” in Proceedings of the 24th ACM Conference on Hypertext and Social Media. ACM, 2013, pp. 119–128.

[14] D. Borth, R. Ji, T. Chen, T. Breuel, and S.-F. Chang, “Large-scale visual sentiment ontology and detectors using adjective noun pairs,” in Proceedings of the 21st ACM international conference on Multimedia. ACM, 2013, pp. 223–232.

[15] K. Rajan, “Materials informatics,” Materials Today, vol. 15, no. 11, pp. 470 –, 2012. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1369702112702043

[16] C. Tan, L. Lee, J. Tang, L. Jiang, M. Zhou, and P. Li, “User-level sentiment analysis incorporating social networks,” in Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2011, pp. 1397–1405.

[17] N. El-Fishawy, A. Hamouda, G. M. Attiya, and M. Atef, “Arabic summarization in twitter social network,” Ain Shams Engineering Journal, vol. 5, no. 2, pp. 411–420, 2014.

[18] H. Saif, M. Fernandez, Y. He, and H. Alani, “Senticircles for contextual and conceptual semantic sentiment analysis of twitter,” in The Semantic Web: Trends and Challenges. Springer, 2014, pp. 83–98.

[19] A. Hogenboom, F. Boon, and F. Frasincar, “A statistical approach to star rating classification of sentiment,” in Management Intelligent Systems. Springer, 2012, pp. 251–260.

[20] M. Taboada, J. Brooke, M. Tofiloski, K. Voll, and M. Stede, “Lexicon-based methods for sentiment analysis,” Computational linguistics, vol. 37, no. 2, pp. 267–307, 2011.

[21] A. C.-R. Tsai, C.-E. Wu, R. T.-H. Tsai, and J. Y.-j. Hsu, “Building aconcept-level sentiment dictionary based on commonsense knowledge,”IEEE Intelligent Systems, no. 2, pp. 22–30, 2013.

[22] R. Socher, A. Perelygin, J. Y. Wu, J. Chuang, C. D. Manning, A. Y.Ng, and C. Potts, “Recursive deep models for semantic compositionalityover a sentiment treebank,” in Proceedings of the conference on empiricalmethods in natural language processing (EMNLP), vol. 1631. Citeseer,2013, p. 1642.

[23] A. F. Anta, L. N. Chiroque, P. Morere, and A. Santos, “Sentimentanalysis and topic detection of spanish tweets: A comparative study of ofnlp techniques,” Procesamiento del lenguaje natural, vol. 50, pp. 45–52,2013.

[24] S. Keretna, A. Hossny, and D. Creighton, “Recognising user identity in twitter social networks via text mining,” in Systems, Man, and Cybernetics (SMC), 2013 IEEE International Conference on. IEEE, 2013, pp. 3079–3082.

[25] C. C. Aggarwal and C. Zhai, Mining text data. Springer Science & Business Media, 2012.

[26] H. Y. Abu Mansour, “Rule pruning and prediction methods for associative classification approach in data mining,” Ph.D. dissertation, University of Huddersfield, 2012.

[27] F. Xianghua, L. Guo, G. Yanyan, and W. Zhiqiang, “Multi-aspect sentiment analysis for chinese online social reviews based on topic modeling and hownet lexicon,” Knowledge-Based Systems, vol. 37, pp. 186–195, 2013.

[28] G. Chandrashekar and F. Sahin, “A survey on feature selection methods,” Computers & Electrical Engineering, vol. 40, no. 1, pp. 16–28, 2014.

[29] T. Ali, D. Schramm, M. Sokolova, and D. Inkpen, “Can i hear you? sentiment analysis on medical forums,” in Proceedings of the Sixth International Joint Conference on Natural Language Processing. Asian Federation of Natural Language Processing, Nagoya, Japan, 2013, pp. 667–673.

[30] J. Ko, H. Kwon, H. Kim, K. Lee, and M. Choi, “Model for twitter dynamics: Public attention and time series of tweeting,” Physica A: Statistical Mechanics and its Applications, vol. 404, pp. 142–149, 2014.

[31] E. Haddi, X. Liu, and Y. Shi, “The role of text pre-processing in sentiment analysis,” Procedia Computer Science, vol. 17, pp. 26–32, 2013.

[32] X. Ji, “Social data integration and analytics for health intelligence,” Management, 2013.

[33] E. Kontopoulos, C. Berberidis, T. Dergiades, and N. Bassiliades, “Ontology-based sentiment analysis of twitter posts,” Expert systems with applications, vol. 40, no. 10, pp. 4065–4074, 2013.

[34] W. Shi, H. Wang, and S. He, “Sentiment analysis of chinese microblogging based on sentiment ontology: a case study of 7.23 wenzhou train collision,” Connection Science, vol. 25, no. 4, pp. 161–178, 2013.

[35] L.-z. Liu, H. Liu, H.-s. Wang, W. Song, and X.-l. Zhao, “Generating domain-specific affective ontology from chinese reviews for sentiment analysis,” Journal of Shanghai Jiaotong University (Science), vol. 20, pp. 32–37, 2015.

[36] X. Hu, J. Tang, and H. Liu, “Online social spammer detection,” in Twenty-Eighth AAAI Conference on Artificial Intelligence, 2014.

[37] H.-W. Kim, J. R. Zheng, and S. Gupta, “Examining knowledge contribution from the perspective of an online identity in blogging communities,” Computers in Human Behavior, vol. 27, no. 5, pp. 1760–1770, 2011.

[38] S.-A. Bahrainian and A. Dengel, “Sentiment analysis and summarization of twitter data,” in Computational Science and Engineering (CSE), 2013 IEEE 16th International Conference on. IEEE, 2013, pp. 227–234.

[39] H. Hodson, “Twitter hashtags predict rising tension in egypt,” New Scientist, vol. 219, no. 2931, p. 22, 2013.

[40] A. Hossny, K. Shaalan, and A. Fahmy, “Machine translation model using inductive logic programming,” in Natural Language Processing and Knowledge Engineering, 2009. NLP-KE 2009. International Conference on. IEEE, 2009, pp. 1–8.

[41] A. Lopez, A. Detz, N. Ratanawongsa, and U. Sarkar, “What patients say about their doctors online: a qualitative content analysis,” Journal of general internal medicine, vol. 27, no. 6, pp. 685–692, 2012.

[42] K. Ikeda, G. Hattori, C. Ono, H. Asoh, and T. Higashino, “Twitter user profiling based on text and community mining for market analysis,” Knowledge-Based Systems, vol. 51, pp. 35–47, 2013.

[43] X. Zhang, H. Fuehres, and P. A. Gloor, “Predicting stock market indicators through twitter i hope it is not as bad as i fear,” Procedia-Social and Behavioral Sciences, vol. 26, pp. 55–62, 2011.

[44] S. O. Orimaye, S. M. Alhashmi, and E.-G. Siew, “Buy it-dont buy it: sentiment classification on amazon reviews using sentence polarity shift,” in PRICAI 2012: Trends in Artificial Intelligence. Springer, 2012, pp. 386–399.

[45] M. Araujo, P. Goncalves, M. Cha, and F. Benevenuto, “ifeel: A system that compares and combines sentiment analysis methods,” in Proceedings of the companion publication of the 23rd international conference on World wide web companion. International World Wide Web Conferences Steering Committee, 2014, pp. 75–78.