Sentiment Analysis in Social...

22
Sentiment Analysis in Social Streams Hassan Saif, F. Javier Ortega, Miriam Fern´ andez, Iv´ an Cantador Abstract In this chapter we review and discuss the state of the art on sentiment analysis in social streams –such as web forums, micro-blogging systems, and so- cial networks–, aiming to clarify how user opinions, affective states, and intended emotional effects are extracted from user generated content, how they are modeled, and how they could be finally exploited. We explain why sentiment analysis tasks are more difficult for social streams than for other textual sources, and entail going beyond classic text-based opinion mining techniques. We show, for example, that social streams may use vocabularies and expressions that exist outside the main- stream of standard, formal languages, and may reflect complex dynamics in the opinions and sentiments expressed by individuals and communities. 1 Introduction Sentiment Analysis is the field of study that analyzes the people’s attitudes towards entities –individuals, organizations, products, services, events, and topics–, and their attributes [36]; The attitudes may correspond to personal opinions and evaluations, affective states (sentiments and moods), or intended emotional effects. It represents a large problem space, covering different tasks, such as subjectivity identification, sentiment extraction and analysis, and opinion mining, to name a few. Hassan Saif Knowledge Media Institute, Milton Keynes, United Kingdom, e-mail: [email protected] F. Javier Ortega Universidad de Sevilla, Seville, Spain, e-mail: [email protected] Miriam Fern´ andez Knowledge Media Institute, Milton Keynes, United Kingdom, e-mail: [email protected] Iv´ an Cantador Universidad Aut´ onoma de Madrid, Madrid, Spain, e-mail: [email protected] 1

Transcript of Sentiment Analysis in Social...

Page 1: Sentiment Analysis in Social Streamsarantxa.ii.uam.es/~cantador/doc/2015/epps-socialstreams15.pdf · reviews and ratings in blogs and recommender systems, ... emotions, and tastes

Sentiment Analysis in Social Streams

Hassan Saif, F. Javier Ortega, Miriam Fernandez, Ivan Cantador

Abstract In this chapter we review and discuss the state of the art on sentimentanalysis in social streams –such as web forums, micro-blogging systems, and so-cial networks–, aiming to clarify how user opinions, affective states, and intendedemotional effects are extracted from user generated content, how they are modeled,and how they could be finally exploited. We explain why sentiment analysis tasksare more difficult for social streams than for other textual sources, and entail goingbeyond classic text-based opinion mining techniques. We show, for example, thatsocial streams may use vocabularies and expressions that exist outside the main-stream of standard, formal languages, and may reflect complex dynamics in theopinions and sentiments expressed by individuals and communities.

1 Introduction

Sentiment Analysis is the field of study that analyzes the people’s attitudes towardsentities –individuals, organizations, products, services, events, and topics–, and theirattributes [36]; The attitudes may correspond to personal opinions and evaluations,affective states (sentiments and moods), or intended emotional effects. It representsa large problem space, covering different tasks, such as subjectivity identification,sentiment extraction and analysis, and opinion mining, to name a few.

Hassan SaifKnowledge Media Institute, Milton Keynes, United Kingdom, e-mail: [email protected]

F. Javier OrtegaUniversidad de Sevilla, Seville, Spain, e-mail: [email protected]

Miriam FernandezKnowledge Media Institute, Milton Keynes, United Kingdom, e-mail: [email protected]

Ivan CantadorUniversidad Autonoma de Madrid, Madrid, Spain, e-mail: [email protected]

1

Page 2: Sentiment Analysis in Social Streamsarantxa.ii.uam.es/~cantador/doc/2015/epps-socialstreams15.pdf · reviews and ratings in blogs and recommender systems, ... emotions, and tastes

2 Hassan Saif, F. Javier Ortega, Miriam Fernandez, Ivan Cantador

Although some of the above tasks have been addressed on multi-modal datasources –e.g., sentiment extraction in audio and video–, from its origins, sentimentanalysis has mainly focused on textual data sources [48]. Hence, it commonly refersto the use of natural language processing, text analysis, and computational linguis-tics to extract and exploit subjective information from text materials. In Chapters ??and ?? the reader can find overviews of the state of the art in affective informationrepresentation and acquisition for various modalities.

With the advent of the Social Web , the amount of text material is huge and growsexponentially every day. The Web is a source of up-to-date, never-ending streamsof user generated content; people communicate online with contacts in social net-works, create or upload multimedia objects in online sharing sites, post comments,reviews and ratings in blogs and recommender systems, contribute to wiki-stylerepositories, and annotate resources in social tagging platforms.

The Web thus provides unstructured information about user opinions, moods andemotions, and tastes and interests, which may be of great utility to others, includ-ing consumers, companies, and governments. Hence, for instance, someone whowants to buy a camera may look in web forums for online opinions and reviewsabout different brands and models, while camera manufacturers implicitly/explicitlyget feedback from customers to improve their products, and adapt their marketingstrategies. qVery interestingly, this information can go beyond reflecting the users’subjective evaluations and sentiments about entities and their changes over time, bytriggering chains of reactions and new events. For instance, identifying the over-all concern, expressed in social media, on certain political decision may impact themodification or rejection of such decision.

The interest and potential exploitation of sentiment analysis in social streams –understood as social media in which user generated content emerges and changesrapidly and constantly–, are evident, and have been shown in numerous domains andapplications, like politics and e-government [6][45][77], education and e-learning[76], business and e-commerce [85], and entertainment [23][72][80]. The reader isreferred to several chapters of this book for detailed surveys of particular applica-tions of affective information by personalized services, specifically by recommendersystems (Chapters ??, ?? and ??), conversational systems (Chapter ??), multimediaretrieval systems (Chapters ?? and ??), and e-learning systems (Chapter ??).

The high availability of user generated content in social streams, nonetheless,comes with some challenges. The large volume of data makes difficult to get the rel-evant information in an efficient and effective way. Proposed techniques have to besimple enough to scale up, but have to deal with complex data. Some of these chal-lenges are related to general natural language processing (NLP) approaches, such asopinion-feature association [31], opinion negation [32], irony and sarcasm [12][18],and opinion spam [33]. Others, in contrast, are related to issues characteristic ofonline user generated content, such as multiple languages, high level of ambiguityand polysemy, misspellings, and slang and swear words [70]. In this context, it isalso important to mention the need of determining the users’ reputation and trust.For certain topics, the majority opinion (i.e., the wisdom of the crowd) may be thebest solution [49], while for others, only the experts’ opinions should be the source

Page 3: Sentiment Analysis in Social Streamsarantxa.ii.uam.es/~cantador/doc/2015/epps-socialstreams15.pdf · reviews and ratings in blogs and recommender systems, ... emotions, and tastes

Sentiment Analysis in Social Streams 3

of information to consider [87]. Another relevant issue is the existence of particu-lar pieces and forms of information existing in social streams: explicit citations tousers, groups and organizations (e.g., @robinwilliams in Twitter), explicit forms forreferring to concepts (e.g., Twitter hashtags #comedian and #funny), emoticons andslang terms noting emotions and moods (e.g., :D and lol), mechanisms to expressinterests and tastes (e.g., Facebook likes), and URLs to resources that complementposted information. There, the use of contextual metadata also plays a key role;extracting and mining time and geo-location metadata may be very valuable forsentiment analysis on dynamic and global social stream data.

In this chapter, we review and discuss the state of the art on sentiment analysis insocial streams, describing how opinion and affective information is extracted, pro-cessed, modeled, and exploited, in comparison to classic text-based opinion miningtechniques.

The chapter is structured as follows. In Section 2 we overview the research lit-erature in Sentiment Analysis, focusing on the main addressed tasks and appliedtechniques. In Section 3 we provide a description of social media, characterizingthe user generated content and research challenges that arise from them. Next, inSection 4 we discuss Sentiment Analysis to social streams, and describe existingapplications in such context. Finally, in Section 5 we discuss current and open re-search trends on Sentiment Analysis in social streams.

2 Sentiment Analysis

In the last fifteen years, Sentiment Analysis and Opinion Mining have been fed by anumber of research problems and opportunities of increasing importance and inter-est [48]. In this section we review the main tasks addressed in the literature related tosentiment analysis, together with the different assumptions and approaches adopted.We then discuss some interesting proposals, resources and techniques intended todeal with those tasks.

2.1 Sentiment Analysis Tasks

The different sentiment analysis tasks can be categorized based on the granularity oftheir linguistic units they consider. In this sense, there are tasks where the documentis assumed to be the main linguistic unit as a whole, while there are others wheresentences or even words are considered as linguistic units. We can summarize theselevels as follows:

• Document-level: At this level, it is assumed that each document expresses a par-ticular sentiment, or at least it poses a predominant one. Many works have facedsentiment analysis tasks at the document level; see for example the survey pre-sented in [78].

Page 4: Sentiment Analysis in Social Streamsarantxa.ii.uam.es/~cantador/doc/2015/epps-socialstreams15.pdf · reviews and ratings in blogs and recommender systems, ... emotions, and tastes

4 Hassan Saif, F. Javier Ortega, Miriam Fernandez, Ivan Cantador

• Sentence-level: Some tasks could benefit from the determination of the senti-ment in a text at a sentence level, as done in information extraction and questionanswering systems, where it is necessary to provide the user with particular in-formation for a given topic.

• Aspect-level: In general, a sentence can contain more than one opinion about dif-ferent aspects of an entity or topic. In an aspect-level approach, the context ofthe words are taken into account to determine the subjectivity of each expressionin a sentence, and the specific aspect being opinionated [82][84]. This level canbe useful, for example, in recommender systems [13], and in automatic process-ing of product reviews [16][44], where knowing individual opinions about eachfeature of a given product is crucial for the performance of the system.

• Word-level (also called as entity level): In this category we can find those tasksconsisting of identifying the sentiment expressed by a given word regardless itcontext. Word-level analysis is useful in order to build resources like sentimentlexicons with the possible sentiment orientations of a word [29][64].

Another possible classification of sentiment analysis tasks can be made fromthe point of view of the dependency on the target domain. While some tasks aredefined independently of the domain of application –like subjectivity detection–,some research works have shown the influence of domain-dependency on sentimentanalysis problems –e.g., polarity detection [16][53][52][83].

In general, the following are the main goals of sentiment analysis:

• Subjectivity detection. Identifying subjective and objective statements.• Polarity opinion detection. Identifying positive and negative opinions within sub-

jective texts.• Emotion detection. Identifying human emotions and moods.

Subjectivity detection can provide valuable knowledge to diverse NLP-based ap-plications. In principle, any system intended to extract pieces of information froma large collection of texts could take advantage of subjectivity detection approachesas a tool for identifying and considering/discarding non-factual information [57].Such is the case of question answering [86] and information extraction systems.

Polarity detection aims to identify whether a text expresses a positive or a neg-ative sentiment from the writer. Since it is very common to address this task onlyon subjective texts, usually a subjectivity detection stage is needed. Hence, in theliterature we can find a number of works that tackle both problems –subjectivityand polarity detection– as a single one. Existing approaches commonly distinguishbetween three types of texts: positive, negative, and neutral or objective texts. Someworks have shown this approach is much more challenging than the binary classi-fication of subjective texts [57]. Applications of polarity classification are the iden-tification of the writer’s political ideology –since it can be considered as a binaryclassification problem [21]–, and the analysis of product reviews –determining userpositive or negative opinions about a given item (a product, a movie, a hotel, etc.)or even personal sentiments about specific features of such item.

In emotion detection, the main object of study is the user’s emotional attitudewith respect to a text. In this context, we may aim to determine the writer’s mood

Page 5: Sentiment Analysis in Social Streamsarantxa.ii.uam.es/~cantador/doc/2015/epps-socialstreams15.pdf · reviews and ratings in blogs and recommender systems, ... emotions, and tastes

Sentiment Analysis in Social Streams 5

towards a text [82] or to identify the emotions “provoked” by the text to the reader[68].

2.2 Sentiment Analysis Approaches

In this section we discuss some interesting approaches intended to deal with thesentiment analysis tasks and goals previously described. For the sake of clarity, weclassify them into two groups, according to the nature of the applied techniques:

• Lexicon-based approaches are those techniques that rely on a resource contain-ing information about the affective terms that may occur in the texts, and usu-ally additional information about such terms (e.g., polarity, intensity, etc.). Theseresources can be manually or automatically generated, domain independent orfocused on a particular domain. Most of these approaches take advantage of theinformation available in a lexicon to compute subjective and affective estimationsover the texts.

• Machine Learning approaches are those techniques that apply a machine learn-ing method to address sentiment analysis tasks. In this case, a majority of tech-niques have been based on Support Vector Machines, which are usually fed withlexical and syntactic features, or even with lexicon-based features, to providesubjective and affective classifications.

It is worth to note that the creation, integration and use of lexicons are crucialin Sentiment Analysis, not only for lexicon-based techniques, but also for machine-learning techniques, which can be enhanced with the information available in suchresources. In this context, General Inquirer [69] can be considered as one of themost relevant and widely used resources. It is a manually built lexicon formed bylemmas with associated syntactic, semantic and pragmatic information. It contains4,206 lemmas manually tagged as positive or negative.

The MPQA (Multi-Perspective Question Answering) is a lexicon of news doc-uments from the world press based on General Inquirer, including a set of wordsobtained from a dictionary and a thesaurus, and a set of automatically compiledsubjective terms [57]. The MPQA lexicon is composed by 8,222 words with a setof syntactic and semantic features (type strength, length, part of speech, stem, andprior polarity).

Following the same schema, the Bing Liu’s English Lexicon (BLEL) [30] con-sists of an automatically generated list of words that have been classified into pos-itive and negative. This classification is manually updated periodically. In total,BLEL contains 4,783 negative words and 2,006 positive words, including mis-spelled terms, morphological variants, and slang words, among others.

Maybe one of the most well-known and widely used lexical resources is Word-Net [42], a thesaurus for English based on the definition of the so-called synsets,which are groups of words with the same meaning and a brief definition (gloss). To

Page 6: Sentiment Analysis in Social Streamsarantxa.ii.uam.es/~cantador/doc/2015/epps-socialstreams15.pdf · reviews and ratings in blogs and recommender systems, ... emotions, and tastes

6 Hassan Saif, F. Javier Ortega, Miriam Fernandez, Ivan Cantador

relate synsets, WordNet provides a number of semantic relations, such as synonymy,hypernonymy, and meronymy.

A very large number of works have used WordNet in a wide number of tasks anddomains, and some of them have aimed to enrich or expand WordNet in differentways. In this context, it is worth to mention the Global WordNet Association1, anon-commercial organization devoted to provide a platform to ease the creation andconnection of WordNet versions in different languages. Regarding the enrichmentof WordNet, we can highlight WordNet Domains [5], a semi-supervised generatedresource that augments WordNet with domain labels for all its synsets. Related toit, we find WordNet Affect [67], which assigns to each WordNet synset a set of af-fective labels encoding emotions, moods, attitudes, behaviors, etc. in order to builda resource suitable for emotion detection, in addition to subjectivity and polarityclassification. Another affective extension of WordNet is SentiWordNet (SWN) [4],which attaches to each WordNet synset three sentiment scores in the range [0,1]summing up to 1, representing positivity, negativity and objectivity degrees of eachsynset. The polarities of words are assigned by means of a propagation of the po-larity of some manually picked synsets through the relations in WordNet. SWNincludes 117,000 synsets with sentiment scores.

The main advantage of WordNet-based resources and techniques over MPQA,BLEL or General Inquirer is the lack of semantic ambiguity between synsets, whichunequivocally represent the term meaning. Word sense disambiguation constitutesa crucial problem in NLP, and most of the works using the above lexicons addresssuch problem by computing the polarity at the level of words or lemmas by meansof the polarity values from all the respective synsets [1][71]. In addition to this,the graph structure of WordNet-based resources allows for the application of graph-based techniques in order to better exploit the semantic information encoded withinthe relations.

Among the existing lexicon-based approaches, the technique presented in [78]has been a main reference work for many others. This technique is applied overmanually selected sets of strongly positive words (such us excellent and good) andstrongly negative words (such as poor and bad), which are considered as seed terms.The technique computes the Pointwise Mutual Information (PMI) between inputwords and the seeds in order to determine the polarity of the former. Since the po-larity of a word depends on the relation between the word and the seed sets, thetechnique is usually called semantic orientation by association. A similar idea isproposed in [34], but replacing the PMI computation by building a graph with theadjectives in WordNet for computing the polarity of a word; specifically, by select-ing the shortest graph path from the synset of the word to the synsets of the positiveand negative seeds.

With respect to machine learning-based approaches, a considerable number ofworks has been done, applying well-known machine learning techniques, such asSVM and LSA, to deal with sentiment analysis tasks. These works usually includethe exploitation of lexical, syntactic and semantic features suitable for the classifica-

1 http://globalwordnet.org/

Page 7: Sentiment Analysis in Social Streamsarantxa.ii.uam.es/~cantador/doc/2015/epps-socialstreams15.pdf · reviews and ratings in blogs and recommender systems, ... emotions, and tastes

Sentiment Analysis in Social Streams 7

tion problems that must be tackled in sentiment analysis for the subjectivity and po-larity detection. Among these features, one may highlight n-grams, Part-Of-Speech(POS) tags, PMI and features extracted from lexicons [19][84]. In this context, it hasto be noted that the joint use of lexicon- an machine learning-based approaches canbe performed in the opposite direction, i.e., by using machine-learning techniques inorder to improve lexicon-based approaches. For instance, in [51] LSA-based tech-niques are used to expand a given lexicon for different languages.

The work presented in [29] is another representative example of a machinelearning-based sentiment analysis approach. It aims to predict the orientation ofsubjective adjectives by analyzing a large unlabeled document set, and looking forpairs of adjectives linked with conjunctions. It then builds a graph where the nodescorrespond to terms connected by equal-orientation or opposite-orientation edges,according to the conjunctions that link the terms, and finally apply a clustering al-gorithm that partitions the graph into clusters of positive and negative terms.

A combination of ideas from Turney [78] and Hatzivassiloglou [29] is presentedin [15], where a set of seed words is used to introduce a bias in a random-walkalgorithm that computes a ranking of the terms in a graph of words linked accordingto the conjunctions that join them in the texts. In the generated rankings, positiveand negative terms are respectively located into the highest and lowest positions.The word graph is also used as a mechanism to process the negations in the textby developing a PageRank-based algorithm that builds graphs with positive andnegative weighted edges.

3 Sentiment Analysis on User Generated Content

Online social media platforms support social interactions by allowing users to createand maintain connections, share information, collaborate, discuss, and interact in avariety of ways. The proliferation and usage of these platforms have experienced anexplosive growth in the last decade, expanding to all areas of society, such as enter-tainment, culture, science, business, politics, and public services. As a result, a largeamount of user generated content is continuously being created, offering individualsand organizations a fast way to monitor people’s opinions and sentiments towardsany form of entity, such as products, services and brands.

The nature and purpose of these platforms is manifold, and thus they differ in avariety of aspects, such as the way in which users establish connections, the mainactivities they conduct, and the type of content they share. These characteristicspose novel challenges and opportunities to sentiment analysis researchers. In thesubsequent sections, we characterize the user generated content available in populartypes of existing social media platforms, and present the major challenges to processsuch content in the context of sentiment analysis and opinion mining.

Page 8: Sentiment Analysis in Social Streamsarantxa.ii.uam.es/~cantador/doc/2015/epps-socialstreams15.pdf · reviews and ratings in blogs and recommender systems, ... emotions, and tastes

8 Hassan Saif, F. Javier Ortega, Miriam Fernandez, Ivan Cantador

3.1 Characterizing User Generated Content

In the literature, social media platforms have been categorized in different ways[35]2. Here we propose a categorization based on three dimensions: the type of userconnections, the type of user activities, and the type of contents generated/sharedwithin the platforms. We summarize such a categorization in Table 1.

• Connections: Users connections –e.g., friendship and following relations– in so-cial media are based on three main models: explicit connections, which can bereciprocal –u follows v, and v follows u– and non-reciprocal –u follows v, butnot necessary v follows u–, and implicit connections, where relations are ex-tracted via interactions in the social platform –e.g., if user u posts a message anduser v replies to that message, an implicit relation between v and u may be as-sumed. An example of a social platform that uses explicit reciprocal connectionsis Facebook3 via its friendship relations. Twitter4, differently, uses explicit non-reciprocal connections via its follower-followee relation; if a user u follows auser v on Twitter, it does not necessarily imply that v follows u. Implicit connec-tions, on the other hand, are more common in forums and blogs, where users postquestions, evaluations or opinions, and other users react to the posted content.

• Activities: Users may perform different activities and have different goals whenparticipating in a social media platform. In this chapter we mainly focus on fiveactivities: nurturing social connections, discussing about particular issues andtopics, asking for information, sharing content and, collaborating with others forcertain tasks. Note that the majority of social media may allow performing vari-ous of these activities.

• Types of contents: The third dimension to categorize social platforms is the typeof content that users share between them. Here we distinguish between six maintypes: text, micro-text, tags, URLs, videos and images. Text and micro-text con-tents differ on their number of characters. Micro-text is characteristic of micro-blogging platforms, such as Twitter, which allows a maximum of 140 charactersin their text messages. Note that, as with activities, many of the existing platformsallow for multiple combinations of these content types, although their focus tendsto be on few of them.

According to these three dimensions, social platforms can be described as fol-lows:

• Forums: Forums and discussion boards are mainly focused on allowing users tohold conversations and to discuss about particular issues and topics. A user gen-erally posts an comment, opinion or question, and other users reply, starting aconversation. All the posts related to a conversation are grouped into a structure

2 http://decidedlysocial.com/13-types-of-social-media-platforms-and-counting/,http://outthinkgroup.com/tips/the-6-types-of-social-media3 http://www.facebook.com4 http://twitter.com

Page 9: Sentiment Analysis in Social Streamsarantxa.ii.uam.es/~cantador/doc/2015/epps-socialstreams15.pdf · reviews and ratings in blogs and recommender systems, ... emotions, and tastes

Sentiment Analysis in Social Streams 9

Table 1 Main characteristics of particular social media.

Social media

Forums Q&Asystems

Wikis Blogs Micro-blogs

Socialnetworks

Socialtaggingsystems

Userconnections

ExplicitReciprocal x x x

Non-reciprocal x

Implicit x x x x

Actions

Nurturingsocial connections

x

Discussingissues and topics

x x x x

Askingfor information

x x

Sharingcontent

x x x x x x

Collaboratingin tasks

x

Contents

Text x x x x x

Micro-text x

Tags x x

URLs x x x x x x

Videos x x x x

Images x x x x

called thread. The predominant type of content in these platforms is the text gen-erated with the evolution of the users’ discussions. User connections in forumsusually are implicit. In general, users are not “friends” with each other explicitly,but connections between them can be extracted from the question-reply chainsof their discussions. An example of this type of social platform is Boards.ie5, apopular Irish public forum board system, which is not restricted to certain topic,and where users discuss about any domain or topic, e.g., politics, sports, movies,TV programs, and music.

• Q&A systems: Question Answering (QA) platforms can be understood as a par-ticular type of forums, where the main goal of their users is to ask for information,and therefore discussions are generated around the answers to formulated ques-tions. A popular example of QA system is Stack Overflow6, where users ask avariety of questions about computer programming. A particular characteristic ofStack Overflow and other QA platforms, is that users can gain reputation pointsbased on the quality of their contributions.

• Wikis: The key goal of wikis is to enable collaboration between users in orderto create content (ideas, documents, reports, etc.). Users are therefore allowed toadd, modify and delete content in collaboration with others. Connections in thistype of platforms are generally implicit, and are derived from common editing of

5 http://www.boards.ie6 http://stackoverflow.com

Page 10: Sentiment Analysis in Social Streamsarantxa.ii.uam.es/~cantador/doc/2015/epps-socialstreams15.pdf · reviews and ratings in blogs and recommender systems, ... emotions, and tastes

10 Hassan Saif, F. Javier Ortega, Miriam Fernandez, Ivan Cantador

a particular resource: a wiki page. The main type of content generated in wikis istext, but other content types, such as images and URLs, are also quite common.One of the most popular examples of this type of platforms is Wikipedia7, a wikiwith more than 73,000 editors around the world, who have contributed to thecreation of a very large open online encyclopedia.

• Blogs: Blogs represent a more “personal” type of platform with respect to forumsand wikis. When using these platforms, the main goal is to share information, al-though this often generates discussions. A user does not participate in a blog, butowns it, and uses it to share explanations, opinions or reviews about a variety ofissues. Other users can comment about particular blog posts, sometimes gener-ating large discussions. Differently to forums, these discussions are not groupedinto threads, but are located under a particular blog post. Multimedia content(photos, videos) are also frequent within this type of platforms. Popular exam-ples of blogging platforms are Blogger8 and WordPress9.

• Microblogs: Microblogs can be considered as a particular type of blog, where theposted content typically is much smaller. Microblogs are also focused on sharinginformation, but in this case, information is exchanged in small elements, such asshort sentences, individual images, videos, and URLs. As opposed to blogs, mi-croblogs generally allow for explicit user connections, both reciprocal and non-reciprocal. One of the most popular micro-blogging platforms is Twitter, whichallows a maximum message length of 140 characters. This limitation forces usersto use abbreviations and ill-formed words, which represent important challengeswhen analyzing sentiments and opinions.

• Social networks: The main goal of social networks is to maintain and nurturesocial connections. With this purpose they enable the creation of explicit, recip-rocal relations between users. Most of these platforms also support other types ofactivities, such as sharing content and enabling discussions. In this sense, usersshare text, URLs and multimedia content within a platform. Popular examples ofsocial networks are LinkedIn10, which is focused on professional connections,and Facebook, which tends to be more focused on personal relations.

• Social tagging systems: In these platforms, users create or upload content (e.g.,images, audios, videos), annotate it with freely chosen words (called tags), andshare it with others. The whole set of tags constitutes an unstructured collab-orative categorization scheme, which is commonly known as folksonomy. Thisimplicit categorization is then used to search for and discover resources of inter-est. In principle, social tagging systems are not conceived for connecting users.Nonetheless, the shared tags and annotated items are usually used to find implicitrelations between users based on common interests and tastes. Moreover, tags donot always describe the annotated items, but reflect personal opinions and emo-

7 http://www.wikipedia.org8 http://www.blogger.com9 http://www.wordpress.com10 http://www.linkedin.com

Page 11: Sentiment Analysis in Social Streamsarantxa.ii.uam.es/~cantador/doc/2015/epps-socialstreams15.pdf · reviews and ratings in blogs and recommender systems, ... emotions, and tastes

Sentiment Analysis in Social Streams 11

tions concerning such items [10]. Popular sites with social tagging services areFlickr11, YouTube12 and Delicious13.

Note that our purpose is not to provide an exhaustive categorization of social me-dia, but an overview of the main types of platforms used in the literature to extractand capture opinion and affective information. Other categorizations and platformsexist, such as social bookmarking systems and multimedia sharing sites. In the fol-lowing subsection, we explain the challenges and opportunities that social mediacontent poses to the extraction and analysis of the above information.

3.2 Challenges of Sentiment Analysis in Social Media

Content generated by users via social media in general, and micro-blogging plat-forms in particular, poses multiple challenges to sentiment analysis [38][60]. In thissection we aim to overview and summarize some of these challenges.

• Colloquial language: Social platforms, except those targeting professional cir-cles, are commonly used for informal communication. Colloquial written lan-guage generally contains spelling, syntactical and grammatical mistakes [73]. Inaddition, users tend to express their emotions and opinions using slang terms,emoticons, exclamation marks, irony and sarcasm [38]. Processing ill-formedtext, understanding the semantics of slang language, emphasizing the detectedemotion/opinion level according to exclamation marks, and detecting that theemotion expressed by a user is the opposite than the emotion reflected within thetext due to sarcasm, represent difficult challenges for current NLP and sentimentanalysis tools.

• Short texts: Small pieces of text are typical in micro-blogging platforms, such asTwitter, where a maximum of 140 characters per message is allowed. To con-dense their messages, users make use of abbreviations (e.g., lol for laugh outloud), ill-formed words (e.g., 2morrow for tomorrow), and sentences lacking syn-tactical structure (e.g., TB Pilot Measuring up (Time):<1 week from data shar-ing). The lack of syntactical structure, as well as the appearance of abbreviationsand contemporaneous terms not recorded in dictionaries, represent importantchallenges when attempting to understand the affective information expressedwithin the texts [60].

• Platform-specific elements: Some social platforms have their own symbols andtextual conventions to express opinions (e.g., Facebook “likes”, Google+ “+1”,and StackOverflow points to reward high quality answers), topics (e.g., Twitterhashtags), and references to other users (e.g., Twitter @ symbol). To exploit theseconventions, sentiment analysis methods and tools have to be adapted [75].

11 http://www.flickr.com12 http://www.youtube.com13 http://delicious.com

Page 12: Sentiment Analysis in Social Streamsarantxa.ii.uam.es/~cantador/doc/2015/epps-socialstreams15.pdf · reviews and ratings in blogs and recommender systems, ... emotions, and tastes

12 Hassan Saif, F. Javier Ortega, Miriam Fernandez, Ivan Cantador

• Real-time Big Data: User generated content coming from popular social media,such as Facebook and Twitter, is characterized by the Big Data challenges, in-cluding: volume –data size–, velocity –the speed of change–, variety –differenttypes of data–, and veracity –the trustworthiness of the data. Hence, sentimentanalysis techniques applied to social media platforms have to deal with: process-ing massive amounts of data in short periods of time, dealing with the constantemergence of new words and topics, managing data in different formats (text,image, video), and assessing the veracity of data sources [7][8][65]. From theseaspects, we highlight the velocity aspect, which implies not only to capture andprocess the user generated content in real time, but also to perform a response(e.g., recommendation, news provision, trending topic detection) as fast as pos-sible, since it is a common demanding functionality from social media users.

4 Sentiment Analysis in Social Streams

Once presented the main Sentiment Analysis tasks and techniques (Section 2), anddescribed the characteristics of user generated content with regards to the expressionof personal opinions and sentiments (Section 3), in the subsequent sections we focuson particular problems and applications of Sentiment Analysis in social streams.

4.1 Sentiment Analysis Problems Addressed in Social Streams

Sentiment Analysis is an essential processing task for personalized services that aimto exploit textual content –such as microblog messages and social tags– generatedin social streams, since they usually reflect the users’ subjectivity, in terms of opin-ions and sentiments for certain issues and topics. For such purpose, in addition tothe fundamental sentiment analysis problems –such as entity and opinion recogni-tion, and sentiment polarity estimation–, there are aspects that have to be taken intoaccount. User-generated content in social streams presents a number of interestingphenomena, namely opinion spam, user reputation, irony, sarcasm, and emotion dy-namics. If we intend to address these issues, we have to go beyond classic text-basedopinion mining techniques.

Opinion spam [33] is aimed to disturb the normal behavior in social media ser-vices, especially those integrated in recommendation and e-commerce systems, byintroducing a bias towards a specific opinion tendency that promotes or demotes anentity (e.g., a product, a service, a brand), or makes users express reviews and opin-ions in a certain direction. The identification of opinion spam represents a crucialproblem for opinion mining and sentiment analysis approaches, which should beable to detect deceptive opinions that try to simulate real user reviews that increaseor harm an entity’s reputation [28][47]. In certain media, such as social networksand microblogging platforms, the users’ responses (e.g. by unfollowing contacts,

Page 13: Sentiment Analysis in Social Streamsarantxa.ii.uam.es/~cantador/doc/2015/epps-socialstreams15.pdf · reviews and ratings in blogs and recommender systems, ... emotions, and tastes

Sentiment Analysis in Social Streams 13

and posting complaint comments) may represent a valuable source of informationto detect spam content.

The writers’ reputation is another important aspect of sentiment analysis of user-generated content. From the point of view of a review site, the higher the reputationof a review author, the more reliable the review can be to other customers, andsometimes vice versa: A review that is seen as reliable by the users can providehigh reputation to its author. In this sense, determining the reputation of the au-thors of a content can be helpful for opinion spam detection. Moreover, some siteshave adopted reputation systems as a tool for avoiding or at least discouraging theproduction of opinion spam. The reputation of users also plays an important rolein social networks, since automatically determining the most influential users in agiven network can be really valuable [20].

Given the subjective nature of user-generated content, another relevant phe-nomenon is the existence of irony or sarcasm in the texts. This can constitute aserious problem for many tasks in Sentiment Analysis, like detection of subjectivityand the classification of the polarity of a given opinion, since the explicit text con-tent reflects the opposite of the sentiment really expressed by the writers. Most ofpublished works has focused on the identification of one-liners (jokes or humorouscontents in short texts), but there are some researches aimed to extract humorouspatterns from longer texts. In a different way, there are also approaches that useresults of sentiment analysis in order to detect humor in texts, for example the (neg-ative) polarity of a text has been taken as a feature to retrieve patterns of humorouscontents [41], and syntactic and semantic features have been used as indicativesof humor [56], e.g., semantic ambiguity, the appearance of emoticons, idioms andslang language, and the abundance/absence of punctuation marks, to name a few. Inthe case of social media, certain user responses, such as expressions of amusementand laughing emoticons, may be used as a source for identifying contents with ironyand sarcasm, which may be difficult to detect if no additional information apart fromthe contents themselves exist.

4.2 Applications of Sentiment Analysis on Social Streams

Sentiment analysis over social platforms offers a fast and effective way to monitorthe public’s opinions and feelings towards products, brands, governments, events,etc. Such insights can be used to support decision making in a variety of scenar-ios. In this section we present three scenarios where the extraction and exploitationof affective information from social streams have become key for certain decisionmaking tasks.

• Sentiment Analysis in politics and e-government: Designing and implementinga policy at any level of government is a complex process. One of the key dif-ficulties is finding and summarizing public opinions and sentiments. Citizensdo not actively participate in e-government portals [43], and policy specialists

Page 14: Sentiment Analysis in Social Streamsarantxa.ii.uam.es/~cantador/doc/2015/epps-socialstreams15.pdf · reviews and ratings in blogs and recommender systems, ... emotions, and tastes

14 Hassan Saif, F. Javier Ortega, Miriam Fernandez, Ivan Cantador

lack of appropriate tools to take into account the citizens’ views on policy is-sues expressed in real time through social network discussions. Governments arecurrently investing in research and development14 to learn about the citizens, bysummarizing public opinion via popular social platforms, and to engage themmore effectively. One of the key challenges that arises from this scenario is thelack of awareness of the characteristics of those users that discuss politic issuesin social media, and whether those users really represent the public opinion [24].Sentiment analysis tools are therefore challenged in this scenario to complementaffective information with details about the citizens and organizations behind thegathered opinions. Another common task in which social streams are used as asource of affective and opinion information about politics is the prediction ofthe outcome and evolution of events, such as elections [45][77], and crises andrevolutions [6], as in the case of the Westgate Mall Terror Attack in Kenya [66].

• Sentiment Analysis in education and e-learning: Schools and universities striveto collect feedback from students to improve their courses and tutorship pro-grams. Such feedback is often collected at the end of a course via survey forms.However, such methods are too controlled, slow, and passive. With the rise ofsocial streams, many students are finding online social streams as perfect venuesin which sharing their experiences, and seeking for peer help and support. Toaddress this issue, educational institutions –such as the Open University15– areworking towards the development of platforms that allow capturing and moni-toring the students’ sentiment and opinion in open social media groups [76]. Theaim is to speed up the reaction to the concerns and challenges raised by students.In this scenario, one of the key challenges that arises is the need for adapting thesentiment and opinion extraction processes to the particularities of the domain.For example, discussions around a World War lecture will generally have a neg-ative connotation. Sentiment analysis tools need to isolate the opinions targetingthe logistics of a course, with respect to the opinions targeting themes inherentto the course.

• Sentiment Analysis in business and e-commerce: Public as well as corporate so-cial platforms generate major economic value to business, and can form pivotalparts of corporate expertise management, corporate marketing, product support,customer relationship management, product innovation, and targeted advertising.Public social platforms are generally used to monitor public opinion and reputa-tion about brands and products [85]16. Corporate social platforms, on the otherhand, are more focused on providing product support and knowledge interchangewithin a company. One of the new challenges associated with managing theseon-line communities is the ability to predict change in their “health”. Providingowners and managers of the social platforms with early warnings (by monitoringthe members’ contributions, opinions, levels of satisfaction, etc.) may facilitatetheir decisions to safeguard the communities, e.g. by maintaining engagement,

14 http://www.wegov-project.eu, http://www.sense4us.eu15 http://data.open.ac.uk16 http://www.brandwatch.com/, http://www.lithium.com/

Page 15: Sentiment Analysis in Social Streamsarantxa.ii.uam.es/~cantador/doc/2015/epps-socialstreams15.pdf · reviews and ratings in blogs and recommender systems, ... emotions, and tastes

Sentiment Analysis in Social Streams 15

reducing community churn, and providing adequate support. The identificationof sentiments is key as an initial indicator, but it does not necessarily representthe overall health of the community. The challenge of sentiment analysis toolsin this scenario is to complement sentiment extraction with techniques for riskdetection in the context of business domains, helping owners and hosts to ensurea sustained stability of their communities.

• Sentiment Analysis in entertainment: In an online entertainment scenario, it iswell accepted that (i) the user’s current mood may affect the type of resource(e.g., a song, a tv series episode, a video game) she prefers to consume at a par-ticular time –partial or completely regardless her personal tastes– and, in the op-posite direction, that (ii) emotions evoked by consumed resources may affect theuser’s current mood. These facts are the basis for the investigation and develop-ment of sentiment-aware engaging services in social media. How user moods anditem-provoked emotions can be determined [25], how they can be related eachother [88], and how they and their relations can be exploited for user entertain-ment applications are indeed emerging research topics, such as those addressed inrecommender systems [72], and multimedia retrieval and entertainment [23][80].

All these application scenarios come with an additional common challenge: scal-ability. Social platforms can easily exceed a million users with hundreds of thou-sands online each day. Content generation may be of Gigabytes per day, and ordersof magnitude more data is derived from observing interaction of the users withina system. Existing data analysis approaches, and in particular sentiment analysistools, currently struggle with these scalability challenges.

5 Discussion

In the previous sections, we have reviewed and discussed the state of the art onsentiment analysis in social streams. We have explained the different types of usergenerated content existing in social media platforms, as well as some of the mostcommon challenges that this type of content poses when analyzing affective andopinion information. We have described the different problems and tasks addressedin the sentiment analysis research area, as well as the variety of techniques thathave been developed to approach them. We have shown examples of applicationsthat use sentiment analysis on social streams to support decision making processin a variety of domains. In this section, we provide an overview of directions thatsentiment analysis area is currently following, and what are the main factors drivingthe research into these directions.

• Sentiments are dynamic: Social streams, such as Twitter, may exhibit very strongtemporal dynamics with opinions about the same entity or event changing rapidlyover time. Since sentiment analysis approaches generally work by aggregating in-formation, a key challenge faced by current sentiment analysis approaches is todetect when new opinions are emerging, so that the new information is not aggre-

Page 16: Sentiment Analysis in Social Streamsarantxa.ii.uam.es/~cantador/doc/2015/epps-socialstreams15.pdf · reviews and ratings in blogs and recommender systems, ... emotions, and tastes

16 Hassan Saif, F. Javier Ortega, Miriam Fernandez, Ivan Cantador

gated to an existing opinion for the given entity. For example, the opinion aboutthe Nexus4 smartphone is generally determined based on a set of posts express-ing sentiment about this particular device. Opinions about it may change overtime, e.g., as new technical problems or bugs are discovered. Sentiment analy-sis approaches should therefore be able to identify opinion changes for entitiesand/or events as long as new issues regarding them emerge. An option adopted byseveral approaches is to define a time-window (minute, hour, day) in which sen-timent is aggregated for the particular entity that is being monitored. However,discussions in social media may emerge and spread really fast, or cool down forlong time periods. Therefore, assessing the right granularity level is key to notloose relevant information when discussions spike, and not waste resources whendiscussions about target entities or events are not present [7][39].

• Sentiments are entity-focused: As discussed in Section 2, sentiment is generallycomputed at document and/or sentence level. Multiple sentiments, nonetheless,can be expressed within the same document or the same sentence towards dif-ferent targets. For example, the post “I love Nexus4 but I don’t like Nexus5 atall!” expresses two different sentiments towards two different targets, the Nexus4and Nexus5 devices. Additionally, when monitoring the sentiment or particu-lar brands, events or individuals in social media, sentiment analysis approachesshould consider if the sentiment of the posts referencing the brand, event or indi-vidual do indeed express sentiment towards those entities. For instance, a signif-icant number of negative posts do exist in social streams mentioning the WWF(the World Wildlife Fund) organization, which do not criticize it, but the neg-ative impact of climate change, the danger of extinction suffered by a numberof species, and other sustainability issues. Furthermore, approaches in the litera-ture of sentiment analysis have emerged in the last few years that aim to identifysentiment targets within a given text, focusing on entity-level and aspect-levelsentiment analysis detection [37][40][54][85], i.e., they first identify the entitiesand events appearing in the text, and then check the sentiment expressed towardsthem.

• Sentiments are semantics-dependent: Most of existing approaches to sentimentanalysis in social streams have shown effective when sentiment is explicitly andunambiguously reflected in text fragments through affective (opinionated) words,such as “great” as in “I got my new Android phone, what a great device!” or “sad”as in “so sad, now four Sierra Leonean doctors lost to Ebola.” However, merelyrelying on affective words is often insufficient, and in many cases does not lead tosatisfactory sentiment detection results [8][27][61]. Examples of such cases arisewhen the sentiment of words differs according to (i) the context in which thosewords occur (e.g., “great” conveys a negative connotation in the context “pain”and positive in the context “smile”), or (ii) the conceptual meaning associatedwith the words (e.g., “Ebola” is likely to be negative when its associated con-cept is “Virus” and likely to be neutral when its associated concept is “River”).Therefore, ignoring the semantics of words when calculating their sentiment, ineither case, may lead to inaccuracies. Recent research in sentiment analysis istherefore focusing on investigating the identification and use of contextual and

Page 17: Sentiment Analysis in Social Streamsarantxa.ii.uam.es/~cantador/doc/2015/epps-socialstreams15.pdf · reviews and ratings in blogs and recommender systems, ... emotions, and tastes

Sentiment Analysis in Social Streams 17

semantic information to enhance the accuracy of traditional machine learning[59] and lexicon-based approaches [61].

• Sentiments are domain-dependent: Sentiment is expressed in social streamswithin multiple domains. For example, the domain of death is generally morenegative than the domain of birth, although both use common terminology, suchas hospital, family, etc. Sentiment analysis approaches need to establish the sen-timent of the targeted domain to be able to establish the positivity/negativity ofthe posts. It has been observed that current sentiment classifiers trained with datafrom one specific domain do indeed fail when applied to a different domain [3].Similarly, while lexicon-based approaches have a higher tolerance to domainchanges, these approaches do suffer when the vocabulary of the domain underanalysis is not well covered by the available sentiment lexicon. Given the greatvariety of topics and domains that constantly emerge in social streams, domainconstraints currently affect the applicability of sentiment analysis approaches.Research is currently being conducted to adapt to new domains, by automaticallyassigning sentiment to terms not previously covered by the lexicons, and by pro-viding dynamic re-training of existing classifiers [9][50][61][65]. There are alsorecent approaches aimed to generate domain-dependent lexicons, such as thatpresented in [25]. In that work, automatic lexicons17 with emotional categoriesfor the movie, music and book domains –e.g., gloomy movies, nostalgic musiccompositions, and suspenseful books– are automatically generated and modeledby exploiting information available in social tagging systems and online thesauri.The terms of these lexicons are also linked to a core lexicon which is composedof weighted terms associated to 16 general emotions –e.g., happiness, calmness,and tension– of the well-known Russell’s circumplex model of affect [58].

• Sentiments are language- and culture-dependent: An important problem whenanalyzing sentiment in social media streams is that posts are written in differentlanguages. Even individual posts may include terminology from a variety of lan-guages within them. Language identification tools are therefore needed to detectthe language in which posts are written [11]. An even more complex problem isthat sentiment is culturally dependent. The way in which we express positivity ornegativity, humor, irony or sarcasm varies depending on our cultural background[22]. Sentiment analysis tools therefore need to account for language and culturevariances to provide accurate sentiment identification. Few research works havebeen recently conducted in this vein, focusing mainly on demographic languagevariations (e.g., age, gender) of users to improve sentiment analysis performance[79, 74].

• Sentiments are personality-dependent: The relationships between emotional statesand personality have been a topic of study in psychology in the last twenty years(see e.g. seminal works as [55]). The reader can find more details on this inChapter ??. In particular, several studies have revealed associations between ex-traversion and neuroticism (sometimes referred as emotional stability) personal-ity traits with individual differences in affective level and environmental response

17 http://ir.ii.uam.es/emotions/

Page 18: Sentiment Analysis in Social Streamsarantxa.ii.uam.es/~cantador/doc/2015/epps-socialstreams15.pdf · reviews and ratings in blogs and recommender systems, ... emotions, and tastes

18 Hassan Saif, F. Javier Ortega, Miriam Fernandez, Ivan Cantador

[14][55]. This, together with the facts that (i) it has been shown that there existcorrelations between user personality traits and user preferences in several do-mains [26], and that (ii) approaches have been proposed to infer user personalityfrom data about user activity and behavior in social streams [2] (see Chapter ??),raise new research opportunities and applications –such as customer character-ization, market segmentation, and personalized recommendation– for sentimentanalysis in the context of the Social Web.

References

1. Agrawal, S., Siddiqui, T. J.: Using Syntactic and Contextual Information for Sentiment Po-larity Analysis. In: Proceedings of the 2nd International Conference on Interaction SciencesInformation Technology, Culture and Human (ICIS’09), pp. 620–623 (2009)

2. Amichai-Hamburger, Y., Vinitzkyb, G.: Social Network Use and Personality. Computers inHuman Behavior 26(6), pp. 1289–1295.

3. Aue, A., Gamon, M.: Customizing Sentiment Classifiers to New Domains: A Case Study. In:Proceedings of the 3rd International Conference on Recent Advances in Natural LanguageProcessing (RANLP’05) (2005)

4. Baccianella, S., Esuli, A., Sebastiani, F.: entiWordNet 3.0: An Enhanced Lexical Resource forSentiment Analysis and Opinion Mining. In: Proceedings of the 7th International Conferenceon Language Resources and Evaluation (LREC’10) (2010)

5. Bentivogli, L., Forner, P., Magnini, B., Pianta, E.: Revising WordNet Domains Hierarchy: Se-mantics, Coverage, and Balancing. In: Proceedings of the COLLING’04 Workshop on Multi-lingual Linguistic Resources (MLR’04), pp. 101–108 (2004)

6. Bhuiyan, S. I.: Social Media and its Effectiveness in the Political Reform Movement in Egypt.Middle East Media Educator 1(1), pp. 14–20 (2011)

7. Bifet, A., Frank, E.: Sentiment Knowledge Discovery in Twitter Streaming Data. In: Proceed-ings of the 13th International Conference on Discovery Science (DS’10), pp. 1–15 (2010)

8. Cambria, E., Schuller, B., Xia, Y., Havasi, C.: New Avenues in Opinion Mining and SentimentAnalysis. IEEE Intelligent Systems 28(2), pp. 15–21 (2013)

9. Cambria, E., Song, Y., Wang, H., Howard, N.: Semantic Multi-Dimensional Scaling for Open-domain Sentiment Analysis. IEEE Intelligent Systems 29(2), pp. 44–51 (2013)

10. Cantador, I., Konstas, I., Jose, J. M. Categorising Social Tags to Improve Folksonomy-basedRecommendations. textitJournal of Web Semantics 9(1), pp. 1–15 (2010)

11. Carter, S., Weerkamp, W., Tsagkias, M. Microblog Language Identification: Overcomingthe Limitations of Short, Unedited and Idiomatic Text. Language Resources and Evaluation47(1), pp. 195–215 (2013)

12. Carvalho, P., Sarmento, L., Silva, M. J., de Oliveira, E.: Clues for Detecting Irony in User-generated Contents: Oh...!! It’s ”so easy” ;-) In: Proceedings of the 1st International Work-shop on Topic-sentiment Analysis for Mass Opinion (TSA’09), pp. 53–56 (2009)

13. Chow, A., Foo, M.-H. N., Manai, G.: HybridRank : A Hybrid Content-Based Approach ToMobile Game Recommendations. In: Proceedings of the 1st Workshop on New Trends inContent-based Recommender Systems (CBRecSys’14), pp. 1–4 (2014)

14. Corr, P. J.: The Reinforcement Sensitivity Theory. In: Corr, P. J. (Ed.), The ReinforcementSenitivity Theory of Personality. Cambridge University Press (2008)

15. Cruz, F. L., Vallejo, C. G., Enrıquez, F., Troyano, J. A.: PolarityRank: Finding an Equilibriumbetween Followers and Contraries in a Network. Information Processing and Management48(2), pp. 271–282 (2012)

16. Cruz, F. L., Troyano, J. A., Enrıquez, F., Ortega, F. J., Vallejo, C. G.: Long Autonomy orLong Delay? The Importance of Domain in Opinion Mining. Expert Systems and Applications40(8), pp. 3174–3184 (2013)

Page 19: Sentiment Analysis in Social Streamsarantxa.ii.uam.es/~cantador/doc/2015/epps-socialstreams15.pdf · reviews and ratings in blogs and recommender systems, ... emotions, and tastes

Sentiment Analysis in Social Streams 19

17. Cruz, F. L., Troyano, J. A., Pontes, B., Ortega, F. J.: Building Layered, Multilingual SentimentLexicons at Synset and Lemma Levels. Expert Systems and Applications 41(13), pp. 5984–5994 (2014)

18. Davidov, D., Tsur, O., Rappoport, A.: Semi-supervised Recognition of Sarcastic Sentencesin Twitter and Amazon. In: Proceedings of the 14th Conference on Computational NaturalLanguage Learning (CoNLL’10), pp. 107-116 (2010)

19. Dehkharghani, R., Yanikoglu, B., Tapucu, D., Saygin, Y.: Adaptation and Use of SubjectivityLexicons for Domain Dependent Sentiment Classification. In: Proceedings of the 12th IEEEInternational Conference on Data Mining Workshops, pp. 669–673 (2012)

20. Barbagallo, D., Bruni, L., Francalanci, C., Giacomazzi, P.: An Empirical Study on the Rela-tionship between Twitter Sentiment and Influence in the Tourism Domain. Information andCommunication Technologies in Tourism, pp. 506–516 (2012)

21. Durant, K. T., Smith, M. D. Mining Sentiment Classification from Political Web Logs. In:Proceedings of the WebKDD’06 Workshop on Web Mining and Web Usage Analysis (2006)

22. Elahi, M. F., Monachesi, P.: An Examination of Cross-Cultural Similarities and Differencesfrom Social Media Data with respect to Language Use. In: Proceedings of the 8th Interna-tional Conference on Language Resources and Evaluation (LREC’12), pp. 4080–4086 (2012)

23. Feng, Y., Zhuang, Y., Pan, Y.: Music Information Retrieval by Detecting Mood via Computa-tional Media Aesthetics. In: Proceedings of the 2003 IEEE/WIC International Conference onWeb Intelligence (WI’03), pp. 235–241 (2003)

24. Fernandez, M., Allen, B., Wandhoefer, T., Cano, E., Alani, H.: Using Social Media To InformPolicy Making: To Whom are we Listening? In: Proceedings of the 1st European Conferenceon Social Media (ECSM’14), pp. 174–182 (2014)

25. Fernandez-Tobıas, I., Cantador, I., Plaza, L.: An Emotion Dimensional Model Based on So-cial Tags: Crossing Folksonomies and Enhancing Recommendations. In: Proceedings of the14th International Conference on E-Commerce and Web Technologies (EC-Web’13), pp. 88–100 (2013)

26. Fernandez-Tobıas, I., Cantador, I.: Personality-Aware Collaborative Filtering: An EmpiricalStudy in Multiple Domains with Facebook Data. In: Proceedings of the 15th InternationalConference on E-Commerce and Web Technologies (EC-Web’14), pp. 125–137 (2013)

27. Gangemi, A., Presutti, V., Reforgiato Recupero, D.: Frame-based Detection of Opinion Hold-ers and Topics: A Model and a Tool. IEEE Computational Intelligence Magazine 9(1), pp.20–30 (2014)

28. Hancock, J. T., Cardie, C.: Finding Deceptive Opinion Spam by Any Stretch of the Imag-ination. In: Proceedings of the 49th Annual Meeting of the Association for ComputationalLinguistics (HLT’11), pp. 309–319 (2011)

29. Hatzivassiloglou, V., McKeown, K. R.: Predicting the Semantic Orientation of Adjectives.In: Proceedings of the 35th Annual Meeting on Association for Computational Linguistics(ACL’98), pp. 174–181 (1998)

30. Hu, M., Liu, B.: Mining and Summarizing Customer Reviews. In: Proceedings of the2004 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining(KDD’04), pp. 168–177 (2004)

31. Hu, M., Liu, B.: Opinion Feature Extraction Using Class Sequential Rules. In: Proceeings ofthe AAAI’06 Spring Symposium: Computational Approaches (2006)

32. Jia, L., Yu, C., Meng, W.: The Effect of Negation on Sentiment Analysis and Retrieval Ef-fectiveness. In: Proceedings of the 18th ACM Conference on Information and KnowledgeManagement (CIKM’10), pp. 1827–1830 (2010)

33. Jindal, N., Liu, B.: Opinion Spam and Analysis. In: Proceedings of the 2008 InternationalConference on Web Search and Data Mining (WSDM’08), pp. 219–230 (2008)

34. Kamps, J., Marx, M., Mokken, R. J., Rijke, M.: Using WordNet to Measure Semantic Ori-entations of Adjectives. In: Proceedings of the 4th International Conference on LanguageResources and Evaluation (LREC’04), pp. 1115–1118 (2004)

35. Kaplan, A. M., Haenlein, M.: Users of the World, Unite! The Challenges and Opportunitiesof Social Media. Business Horizons 53(1), pp. 59–68 (2010)

Page 20: Sentiment Analysis in Social Streamsarantxa.ii.uam.es/~cantador/doc/2015/epps-socialstreams15.pdf · reviews and ratings in blogs and recommender systems, ... emotions, and tastes

20 Hassan Saif, F. Javier Ortega, Miriam Fernandez, Ivan Cantador

36. Liu, B.: Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data (2nd edition).Springer (2011)

37. Long, J., Yu, M., Zhou, M., Liu, X., Zhao, T.: Target-dependent Twitter Sentiment Classi-fication. In: Proceedings of the 49th Annual Meeting of the Association for ComputationalLinguistics: Human Language Technologies (HLT’11), pp. 151–160 (2011)

38. Maynard, D., Bontcheva, K., Rout, D.: Challenges in Developing Opinion Mining Tools forSocial Media. In: Proceedings of NLP can u tag# usergeneratedcontent?! Workshop (2012)

39. Maynard, D., Gossen, G., Funk, A., Fisichella, M.: Should I Care about Your Opinion? De-tection of Opinion Interestingness and Dynamics in Social Media. Future Internet 6(3), pp.457–481 (2014)

40. Meng, X., Wei, F., Liu, X., Zhou, M., Li, S., Wang, H.: Entity-centric Topic-oriented OpinionSummarization in Twitter. In: Proceedings of the 18th ACM SIGKDD International Confer-ence on Knowledge Discovery and Data Mining (KDD’12), pp. 379–387 (2012)

41. Mihalcea, R., Strapparava, C.: Making Computers Laugh. In: Proceedings of the 2005 Con-ference on Human Language Technology and Empirical Methods in Natural Language Pro-cessing (HLT’05), pp. 531–538 (2005)

42. Miller, G.A.: WordNet: a Lexical Database for English. Communications of the ACM 38(11),pp. 39–41 (1995)

43. Miller, L., Williamson, A.: Digital Dialogs - Third Phase Report. Handsard Society (2008)44. Morinaga, S., Yamanishi, K., Tateishi, K., Fukushima, T.: Mining Product Reputations on

the Web. In: Proceedings of the 8th ACM SIGKDD International Conference on KnowledgeDiscovery and Data Mining (KDD’02), pp. 341–349 (2002)

45. O’Connor, B., Balasubramanyan, R., Routledge, B. R., Smith, N. A.: From Tweets to Polls:Linking Text Sentiment to Public Opinion Time Series. In: Proceedings of the 4th Interna-tional AAAI Conference on Weblogs and Social Media (ICWSM’10), pp. 122–129. (2010)

46. Ortega, F. J.: Detection of Dishonest Behaviors in On-line Networks using Graph-based Rank-ing Techniques. AI Communications 26(3), pp. 327–329 (2013)

47. Ott, M., Cardie, C., Hancock, J. T.: Negative Deceptive Opinion Spam. In: Proceedings of2013 Conference of the North American Chapter of the Association for Computational Lin-guistics (NAACL-HLT’13), pp. 497–501 (2013)

48. Pang, B., Lee, L.: Opinion Mining and Sentiment Analysis. Foundations and Trends in Infor-mation Retrieval 2(1-2), pp. 1–135 (2008)

49. Pak, A., Paroubek, P. Twitter as a Corpus for Sentiment Analysis and Opinion Mining. In:Proceedings of the 5th International Conference on Language Resources and Evaluation(LREC’10), pp. 1320–1326 (2010)

50. Peddinti, V. M. K., Chintalapoodi, P.: Domain Adaptation in Sentiment Analysis of Twitter.Analyzing Microtext, vol. WS-11-05 of AAAI’11 Workshops (2011)

51. Perez-Rosas, V., Banea, C., Mihalcea, R.: Learning Sentiment Lexicons in Spanish. In:Proceedings of the 8th International Conference on Language Resources and Evaluation(LREC’12), pp. 3077–3081 (2012)

52. Popescu, A.-M., Etzioni, O.: Extracting Product Features and Opinions from Reviews. In:Proceedings of the Conference on Human Language Technology and Empirical Methods inNatural Language Processing (HLT/EMNLP’05), pp. 339–346 (2005)

53. Qiu, G., Liu, B., Bu, J., Chen, C.: Opinion Word Expansion and Target Extraction throughDouble Propagation. Computational Linguistics 37(1), pp. 9–27 (2011)

54. Recupero, D. R., Presutti, V., Consoli, S., Gangemi, A., Nuzzolese, A. G. Sentilo: Frame-based Sentiment Analysis. Cognitive Computation, September 2014, pp. 1–15 (2014)

55. Revelle, W.: Personality Processes. Annual Review of Psychology 46, pp. 295–328 (1995)56. Reyes, A., Rosso, P., Buscaldi, D.: From Humor Recognition to Irony Detection: The Figura-

tive Language of Social Media. Data & Knowledge Engineering 74, pp. 1–12 (2012)57. Riloff, E., Wiebe, J., Wilson, T. Learning Subjective Nouns Using Extraction Pattern Boot-

strapping. In: Proceedings of the 7th Conference on Computational Natural Language Learn-ing (CoNLL’03), vol. 4., pp. 25–32 (2003)

58. Russell, J. A.: A Circumplex Model of Affect. Journal of Personality and Social Psychology39(6), pp. 1161–1178 (1980)

Page 21: Sentiment Analysis in Social Streamsarantxa.ii.uam.es/~cantador/doc/2015/epps-socialstreams15.pdf · reviews and ratings in blogs and recommender systems, ... emotions, and tastes

Sentiment Analysis in Social Streams 21

59. Saif, H., He, Y., Alani, H. Semantic Sentiment Analysis of Twitter. In: Proceedings of the11th International Semantic Web Conference (ISWC’12), pp. 508–524 (2012)

60. Saif, H., He, Y., Alani, H.: Alleviating Data Sparsity for Twitter Sentiment Analysis. In:Proceedings of the WWW’12 Workshop on Making Sense of Microposts (2012)

61. Saif, H., Fernandez, M., He, Y., Alani, H.: SentiCircles for Contextual and Conceptual Se-mantic Sentiment Analysis of Twitter. In: Proceedings of the 11th Extended Semantic WebConference (ESWC’14), pp. 83–98 (2014)

62. Saif, H., Fernandez, M., He, Y., Alani, H.: On Stopwords, Filtering and Data Sparsity for Sen-timent Analysis of Twitter. In: Proceedings of the 9th International Conference on LanguageResources and Evaluation (LREC’14), pp. 810–817 (2014)

63. Saif, H., He, Y., Fernandez, M., Alani, H.: Semantic Patterns for Sentiment Analysis of Twit-ter. In: Proceedings of the 13th International Semantic Web Conference (ISWC’14) - part 2,pp. 324–340 (2014)

64. Sebastiani, F., Esuli, A.: Determining Term Subjectivity and Term Orientation for OpinionMining. In: Proceedings of the 11th Conference of the European Chapter of the Associationfor Computational Linguistics (EACL’06) (2006)

65. Silva, I. S., Gomide, J., Veloso, A., Meira Jr, W., Ferreira, R.: Effective Sentiment StreamAnalysis with Self-augmenting Training and Demand-driven Projection. In: Proceedings ofthe 34th international ACM SIGIR Conference on Research and Development in InformationRetrieval (SIGIR’11), pp. 475–484 (2011)

66. Simon, T., Goldberg, A., Aharonson-Daniel, L., Leykin, D., Adini, B.: Twitter in the CrossFire – The Use of Social Media in the Westgate Mall Terror Attack in Kenya. PloS one 9(8):e104136 (2014)

67. Strapparava, C., Valitutti, A.: Wordnet-affect: An Affective Extension of WordNet. In:Proceedings of the 4th International Conference on Language Resources and Evaluation(LREC’04), pp. 1083–1086 (2004)

68. Strapparava, C., Mihalcea, R.: Learning to Identify Emotions in Text. In: Proceedings of the2008 ACM Symposium on Applied Computing (SAC’08), pp. 1556–1560 (2008)

69. Stone, P. J., Dunphy, D. C., Smith, M. S.: The General Inquirer: A Computer Approach toContent Analysis. MIT Press (1966)

70. Szomszor, M., Cantador, I., Alani, H.: Correlating User Pprofiles from Multiple Folk-sonomies. In: Proceedings of the 19th ACM Conference on Hypertext and Hypermedia (Hy-pertext’08), pp. 33–42 (2008)

71. Taboada, M., Brooke, J., Tofiloski, M., Voll, K., Stede, M.: Lexicon-Based Methods for Sen-timent Analysis. Computational Linguistics 37(2), pp. 267–307 (2011)

72. Tkalcic, M., Burnik, U., Odic, A., Kosir, A., Tasic, J.: Emotion-Aware Recommender Systems- A Framework and a Case Study. In: Markovski, S., Gusev, M. (Eds.), ICT Innovations 2012.Advances in Intelligent Systems and Computing 207, pp. 141–150. Springer (2013)

73. Thelwall, M., Buckley, K., Paltoglou, G., Cai, D., Kappas, A.: Sentiment Strength Detectionin Short Informal Text. Journal of the American Society for Information Science and Tech-nology 61(12), pp. 2544–2558 (2010)

74. Thelwall, M., Wilkinson, D., & Uppal, S. Data mining emotion in social network commu-nication: Gender differences in MySpace. Journal of the American Society for InformationScience and Technology 61(1), pp. 190–199 (2010)

75. Thelwall, M., Buckley, K., & Paltoglou, G. Sentiment strength detection for the social web.Journal of the American Society for Information Science and Technology 63(1), pp. 163–173(2012)

76. Thomas, K., Fernandez, M., Brown, S., Alani, H.: OUSocial2 - A Platform for GatheringStudents’ Feedback from Social Media. Demo at the 13th International Semantic Web Con-ference (ISWC’14) (2014)

77. Tumasjan, A., Sprenger, T. O., Sandner, P. G., Welpe, I. M.: Predicting Elections with Twitter:What 140 Characters Reveal about Political Sentiment. In: Proceedings of the 4th Interna-tional AAAI Conference on Weblogs and Social Media (ICWSM’10), pp. 178–185 (2010)

Page 22: Sentiment Analysis in Social Streamsarantxa.ii.uam.es/~cantador/doc/2015/epps-socialstreams15.pdf · reviews and ratings in blogs and recommender systems, ... emotions, and tastes

22 Hassan Saif, F. Javier Ortega, Miriam Fernandez, Ivan Cantador

78. Turney, P.: Thumbs Up or Thumbs Down? Semantic Orientation Applied to UnsupervisedClassification of Reviews. In Proceedings of the 40th Annual Meeting of the Association forComputational Linguistics (ACL’02), pp. 417–424 (2002)

79. Volkova, S., Wilson, T., & Yarowsky, D. Exploring Demographic Language Variations toImprove Multilingual Sentiment Analysis in Social Media. In EMNLP, pp. 1815–1827 (2013)

80. Vorderer, P., Klimmt, C., Ritterfeld, U.: At the Heart of Media Entertainment. CommunicationTheory 14(4), pp. 388–08 (2004)

81. Wiebe, J., Bruce, R., O’Hara, T.: Development and Use of a Gold Standard Data Set forSubjectivity Classifications. In Proceedings of the 37th Annual Meeting of the Associationfor Computational Linguistics (ACL’99), pp. 246–253 (1999)

82. Wiebe, J., Wilson, T., Cardie, C.: Annotating Expressions of Opinions and Emotions in Lan-guage. Language Resources and Evaluation 39(2-3), pp. 165–210 (2006)

83. Wilson, T., Wiebe, J., Hoffmann, P.: Recognizing Contextual Polarity in Phrase-Level Senti-ment Analysis. In: Proceedings of the 2005 Conference on Human Language Technology andEmpirical Methods in Natural Language Processing (HLT’05), pp. 347–354 (2005)

84. Wilson, T., Wiebe, J., Hoffmann, P.: Recognizing Contextual Polarity: An Exploration ofFeatures for Phrase-Level Sentiment Analysis. Computational Linguistics 35(3), pp. 399–433(2009)

85. Yerva, S. R., Mikls, Z. Aberer, K.: Entity-based Classification of Twitter Messages. Interna-tional Journal of Computer Science and Applications 9, pp. 88–115 (2012)

86. Yu, H., Hatzivassiloglou, V.: Towards Answering Opinion Questions: Separating Facts fromOpinions and Identifying the Polarity of Opinion Sentences. In: Proceedings of the 2003Conference on Empirical Methods in Natural Language Processing (EMNLP’03), pp. 129–136 (2003)

87. Zhang, J., Tang, J., Li, J.: Expert Finding in a Social Network. In: Proceedings of the 12thInternational Conference on Database Systems for Advanced Applications (DASFAA’07), pp.1066–1069 (2007)

88. Zillmann, D.: Mood Management: Using Entertainment to Full Advantage. In: Donohew, L.,Sypher, H. E., Higgins, E. T. (Eds.), Communication, Social Cognition, and Affect, pp. 147–171. Lawrence Erlbaum Associates (1988)