
Customer Review Summarization Approach using Twitter and SentiWordNet

Jihene Jmal

LARODEC, Higher Institute of Management, Le Bardo, Tunisia

[email protected]

Rim Faiz LARODEC, IHEC de Carthage

Carthage, Tunisia

[email protected]

ABSTRACT
Since e-commerce is becoming more and more popular, the number of customer reviews rises rapidly. Opinions on the Web affect our choices and decisions. Thus, it becomes necessary to automatically process a mixture of reviews and present the required information to the customer in an appropriate form. In this context, we present a new approach to feature-based opinion summarization which aims to turn customer reviews into scores that measure customer satisfaction with a given product and its features. These scores are between 0 and 1 and can be used for decision making, thereby helping users in their choices. We investigate opinions extracted from nouns, adjectives, verbs and adverbs, contrary to previous research, which essentially uses adjectives. Experimental results show that our method performs comparably to classic feature-based summarization methods.

Categories and Subject Descriptors

H.3.3 Information Search and Retrieval, I.2.7 Natural Language Processing, I.5.1 Models, I.7 Document and Text Processing

General Terms
Algorithms, Measurement.

Keywords
Opinion mining, Sentiment Classification, Opinion Strength, Feature-based Opinion Summarization, Feature Buzz Summary.

1. INTRODUCTION
In the Web 2.0 (the social or participatory Web), the user is a leader who shares documents, information and beliefs. He interacts and collaborates with others and expresses his opinion. He has at his disposal several services such as social networks (Twitter, Facebook, etc.), blogs, forums, wikis, and video-, photo- and music-sharing sites. The frequent use of these services produces User Generated Content (UGC), which today represents a considerable amount of data.


This content usually consists of text data that are considered more subjective than professional articles; they are carriers of opinions and feelings. One major field that takes advantage of this potential is Opinion Mining (also called Sentiment Analysis or Subjectivity Analysis). It is a sub-field of Text Mining whose purpose is to highlight the sources of opinions and sentiments in text. An opinion can be defined as a point of view or a personal judgment towards an issue of discussion, which usually takes the form of an expression of feelings from one person to an entity or an aspect of the entity [20]. In addition, e-commerce is becoming more and more popular: merchants and product manufacturers let customers review the products or services they sell (e.g. amazon.com, epinions.com), and these reviews affect our choices and decisions. Indeed, according to a study performed by CREDOC (Research Center for the Study and Observation of Living Conditions) in 2009, 57% of French Internet users got interested in the opinions of others on the Web, and 66% of them have confidence in these comments [17].

In the Opinion Mining community, there are several research fields such as subjectivity classification [31], sentiment classification [25] [38] [3], and opinion summarization [12] [23] [9]. However, these approaches do not focus on identifying the strength of the opinion. Therefore, we propose a new approach to automatic opinion summarization which aims to turn customer reviews into scores that measure the intensity of customer satisfaction with a given product and its features. These scores are between 0 and 1 and can be used for decision making, thereby helping users in their choices. We extract product features from the reviews and assign to each feature a score calculated from its frequency of occurrences in the corpus (the collected reviews), weighted by its popularity on the Web 2.0, especially on Twitter¹, the most popular microblogging platform. We then identify opinion sentences and assign to each verb and adjective a score from SentiWordNet, developed by [4].

The main contribution of this research is that we do not consider a product as simply recommended or not recommended; rather, we let customers make their choice according to degrees between 0 and 1 concerning the whole product and the features they are interested in. Besides, when computing these scores, we investigate nouns, adjectives, verbs and adverbs, contrary to previous research, which essentially uses adjectives.

This is an excerpt of our output for the product iPod:

1 www.twitter.com


Product: iPod

Customer Satisfaction = 60%

Player: Popularity = 70%

Customer Satisfaction = 83%

Screen: Popularity = 54%

Customer Satisfaction = 62%

….

The rest of the document is organized as follows: Section 2 introduces the related work on opinion summarization. In Section 3, we present our approach to feature-based opinion summarization. In Section 4, we evaluate our method in order to demonstrate its effectiveness. In Section 5, we conclude with a few remarks and some perspectives.

2. RELATED WORK
Hu and Liu [12] present a system of feature-based customer review summarization which uses association rule mining to extract frequent features. To identify opinion words (adjectives only), the authors use WordNet [22] in conjunction with a set of manually prepared seed words. The system only extracts explicit features; opinion words are thereafter used to extract infrequent features. The same authors also implemented Opinion Observer [18], a system offering a visual comparison of customer reviews of competing products along different feature dimensions. They identify product features from Pros, which express positive opinions, and Cons, which express negative ones.

Like [12], our task is far from traditional text summarization. We propose a structured summary organized by product name and its related features. We do not rewrite a subsequence of the original text; instead, we try to give a general judgment according to the customer reviews. In [12], the authors present only the number of positive and negative reviews for each feature, whereas our system gives more details: we provide a score revealing the degree of customer satisfaction for the given product and for each feature as well. Our system is not only corpus-based; indeed, we use the Web 2.0 at each step, for the feature decider and also to measure the feature popularity and the customer satisfaction.

Several researchers have studied the problem of opinion word detection. There are corpus-based approaches [11] [36] [14] [30] and dictionary-based approaches [12] [16] [15] [8] [33] [2] [7] [5]. In [12], Hu and Liu use only adjectives to detect opinions: they assign 1 to each positive adjective and 0 to each negative one according to a list of manually prepared seed words. However, verbs and adverbs, like adjectives, play an important role in sentiment analysis; they are also used to express opinion and emotion in text. For instance, the verb "appreciate" in "I appreciate this product" conveys a positive sentiment even though the sentence contains neither an adjective nor an adverb. Liu et al. count the number of occurrences of each feature in Pros and Cons to predict customer satisfaction [18].

Zhang and Liu proved that nouns and noun phrases may also imply opinions [39]. They count the number of positive and negative sentences including the product feature using the opinion lexicon compiled by [6]. Their approach achieves an average precision of about 0.44 for the extraction of features that imply opinions. In our case, we also consider that nouns may express opinions. Moreover, our system is not only corpus-based like [39]: we also use Twitter to measure the popularity of, and the customer satisfaction towards, each product feature. Furthermore, identifying just the polarity of an opinion may not be enough; the strength of the opinion is also required. Indeed, subjectivity is expressed in different ways: "good battery" is different from "great battery" and "excellent battery". [37] and [26] focus on the detection of opinion strength. Wilson and Wiebe use boosting, rule learning and support vector regression [37]. [25] and [35] classify documents as "thumbs up" or "thumbs down" according to the opinion they convey, while [26] consider generalizing to finer-grained scales, attempting to infer the author's implied numerical rating using machine learning techniques. In our approach, we measure the strength of the opinion using Twitter and SentiWordNet [4]. We aim to turn text into a score that measures the polarity strength and thus the customer satisfaction degree. It summarizes, in numerical scores that help the user in decision making, the whole opinion extracted from an amount of customer reviews far too large to analyze manually.

3. PROPOSED APPROACH
We propose in this article two types of summaries: the Feature-based Summary and the Feature Buzz Summary. The Feature Buzz Summary shows the relative frequency of feature mentions; it can tell a company what their customers really care about [20]. We also merge two areas of research: feature-based summarization [12] [6] [39] and opinion strength identification [38]. Figure 1 presents the proposed approach.

Figure 1: Proposed Approach

We begin by collecting the customer reviews from the Web and proceed to the document preprocessing. Our system performs all the following steps automatically.

Recall that an opinion is an expression of feelings from one person to an entity or an aspect of the entity. An entity is a product, person, event, organization or topic. It is represented as a hierarchy of components and sub-components, where each node represents a component and is associated with a set of attributes of that component [20]. Therefore, the entity itself can also be seen as a feature. An opinionated comment on the object itself is called a general opinion on the object ("I like my iPod").


An opinionated comment on any specific feature is called a specific opinion on a feature of the entity ("The battery of the iPod is really good").

3.1 Document Preprocessing
Liu showed that product reviews come in three formats [20]:

Format 1 – Pros and cons: the reviewers are asked to describe Pros and Cons separately.

Format 2 – Pros, cons and detailed reviews: The reviewers express Pros and Cons separately and also write detailed reviews.

Format 3 – Free format: The reviewers write the reviews in free form, with no separation of Pros and Cons.

In this paper, we use reviews of format 3. All the examples that follow focus on the product "iPod". Table 1 presents some examples of customer reviews.

Table 1: Some examples of customer reviews

## There isn't much features on the iPod at all, except games.
## The Click Wheel is a great design, something no one else came up with (however, the iRiver has a touchpad).
## SOUND QUALITY: The iPod's sound quality is pretty good.

Our input is a database of reviews collected from the Web, which represents our corpus. All the following operations are done automatically, once per product and without any human intervention. Given a product name, our system selects the corresponding reviews from the database and splits them into sentences. Then it converts them to lowercase and removes the non-literal characters at the beginning and the end of each word (e.g. "##iPod##" becomes "ipod"). We also normalize negation to use it later in the classification phase (e.g. "don't" or "dont" become "do not"). In fact, [12] reveal that nouns and noun phrases in a sentence are likely to be the features that customers comment on, while adjectives convey opinion and judgment. We therefore perform POS (Part-of-Speech) tagging of the whole document to identify the grammatical class of each word using TreeTagger². For ease of use, we group the different tags into four categories: noun ("NN", "NNS"), verb ("VV", "VVD", "VVG", "VVN", "VVP", "VVZ", "VBZ"), adjective ("JJ", "JJR", "JJS") and adverb ("RB")³. We extract the nouns from the reviews and move to the feature decider phase, which is discussed in the following section.
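To make this step concrete, here is a minimal Java sketch of the normalization and tag-grouping operations described above; class and method names are ours, and TreeTagger itself is invoked separately (only its tag output is assumed here).

import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; names are ours, not the system's.
public class ReviewPreprocessor {

    // Lowercases a sentence, expands negation, and strips non-literal
    // characters from the beginning and end of each word
    // (e.g. "##iPod##" becomes "ipod").
    public static List<String> normalize(String sentence) {
        String s = sentence.toLowerCase()
                           .replace("don't", "do not")
                           .replace("dont", "do not");
        List<String> words = new ArrayList<>();
        for (String w : s.split("\\s+")) {
            String cleaned = w.replaceAll("^[^a-z0-9]+|[^a-z0-9]+$", "");
            if (!cleaned.isEmpty()) words.add(cleaned);
        }
        return words;
    }

    // Groups TreeTagger tags into the four categories used by the approach.
    public static String category(String tag) {
        if (tag.startsWith("NN")) return "noun";                       // NN, NNS
        if (tag.startsWith("VV") || tag.equals("VBZ")) return "verb"; // VV* and VBZ
        if (tag.startsWith("JJ")) return "adjective";                  // JJ, JJR, JJS
        if (tag.equals("RB")) return "adverb";
        return "other";
    }
}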

3.2 Feature Selection
We collect all nouns from the reviews and construct our stop word list.

2 http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/
3 For more details about the different tags, see http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/Penn-Treebank-Tagset.pdf

Feature phrase construction and compactness pruning: After collecting nouns from the reviews, we construct noun phrases, which are composed of two successive nouns. Here is an example:

"The Click Wheel is a great design"

1  click    Noun
2  wheel    Noun
3  great    Adjective
4  design   Noun

"Click wheel" is considered a noun phrase since it consists of two successive nouns. We extract all noun phrases in the same way, but we only keep those appearing together at least 3 times in the reviews.

Frequent features: We remove within-sentence redundancy: if a noun appears more than once in the same sentence, we count it only once. We then compute the frequency of occurrences of each extracted noun over the reviews and only keep those whose frequency is greater than 0.01. Here is an excerpt of the result file.

Table 2: Excerpt of the frequent feature result file

Feature        Number of occurrences   Frequency
Click wheel     9                      0.07853403
Battery        30                      0.2617801
Battery life    8                      0.06980803

Column 1 presents the feature, column 2 gives its number of occurrences, and column 3 is its frequency of occurrences in the reviews.
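The frequent-feature step can be sketched as follows. The denominator used to turn counts into frequencies is not spelled out in the text, so the sketch assumes the total number of review sentences; names are illustrative.

import java.util.*;

// Sketch of the frequent-feature step: per-sentence de-duplication,
// then a relative-frequency cut-off of 0.01.
public class FrequentFeatures {

    // Each inner list holds the nouns of one review sentence.
    public static Map<String, Double> select(List<List<String>> sentences) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> nouns : sentences) {
            // a noun repeated inside one sentence is counted only once
            for (String noun : new HashSet<>(nouns)) {
                counts.merge(noun, 1, Integer::sum);
            }
        }
        Map<String, Double> frequent = new HashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            // frequency relative to corpus size (assumed: sentence count)
            double freq = (double) e.getValue() / sentences.size();
            if (freq > 0.01) frequent.put(e.getKey(), freq);
        }
        return frequent;
    }
}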

Twitter Feature Popularity: User Generated Content grows every day, and many research works have shown that this data can be used for Opinion Mining and Sentiment Analysis [12] [6]. Twitter is the most popular microblogging service: people publish and read short messages of up to 140 characters called tweets⁴. The length constraint of tweets encourages the use of emoticons, shortened lingo and slang. Hence, Twitter is becoming an attractive domain for Natural Language Processing (NLP) applications [27] [28] [29] [10]. In this paper, we show how social networks, especially Twitter, can be used to detect the popularity of a given product and thereafter to perform the redundancy and compactness pruning. We begin by crawling Twitter, searching only for popular tweets talking about a given product. We use twitter4j⁵, a Java library for the Twitter API, to collect almost 5000 tweets for each product, posted during the last few days. We then calculate the number of tweets including each feature extracted from the reviews. Our purpose is to identify the popular features that people are interested in for a given product, and especially the number of persons who tweet about them. Table 3 shows some results. The Twitter feature popularity is given by the following formula:

4 This is an example of a tweet: "i neeed an ipod! i have a mill at my house but of course none of them work".
5 http://twitter4j.org/en/index.html


$popularity_f = \frac{nbreTweet_{f,p}}{nbreTweet_p}$    (1)

where $nbreTweet_{f,p}$ is the number of popular tweets mentioning both the feature and the product, and $nbreTweet_p$ is the number of popular tweets talking about the product.

Table 3: Occurrences in the reviews vs. occurrences on Twitter

Feature         Number of occurrences   Number of occurrences
                in the reviews          on Twitter
Song            20                      480
Battery         31                       60
Reputation       3                        0
Storage space    2                        0
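A minimal sketch of formula (1), assuming the popular tweets about the product have already been collected with twitter4j and are given as plain strings:

import java.util.List;

// popularity_f = nbreTweet_{f,p} / nbreTweet_p
public class TwitterPopularity {

    public static double popularity(String feature, List<String> productTweets) {
        if (productTweets.isEmpty()) return 0.0;
        long mentioning = productTweets.stream()
                .filter(t -> t.toLowerCase().contains(feature.toLowerCase()))
                .count();                                  // nbreTweet_{f,p}
        return (double) mentioning / productTweets.size(); // divided by nbreTweet_p
    }
}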

After computing the number of occurrences of each extracted feature, we only keep those whose number of occurrences is greater than zero, i.e. those mentioned by at least one tweet.

Twitter Compactness pruning: Various similarity measures have been proposed: dictionary-based methods [24], corpus-based methods [21] and Web-based methods [1] [34]. For the compactness pruning (noun phrase pruning), we apply a similarity measure to decide whether the noun phrases we collected have a meaning or not. We chose a Web-based approach. In [34], Turney defined a measure called point-wise mutual information (PMI-IR) that uses the page counts returned by a Web search engine to recognize synonyms. It is a statistical approach to measuring the similarity of a pair of words using the Web as a corpus. We adapt this approach to detect the compactness of a noun phrase, using the number of tweets matching a given query instead of the search engine page counts. Given two words w1 and w2, we define PMI by the following formula:

$PMI(w_1, w_2) = \log_2 \frac{P(w_1, w_2)}{P(w_1) \times P(w_2)}$    (2)

where the probabilities are estimated from tweet counts: $tweets(w_1, w_2)$ is the number of tweets including w1 and w2 as a compact phrase (e.g., click wheel), and $tweets(w_1)$ (respectively $tweets(w_2)$) is the number of tweets containing only w1 (respectively w2). To the best of our knowledge, this is the first research using Twitter to measure the strength of semantic association between words, i.e. to decide the compactness of a feature phrase. We prune noun phrases whose PMI is less than 0. After the feature decider, we move to the opinion sentence extraction.
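A sketch of the compactness check under the PMI formulation above, with probabilities estimated from tweet counts; the counts are assumed to be precomputed, and a phrase that never occurs is pruned directly.

// Sketch of the Twitter-based compactness pruning (formula (2)).
public class CompactnessPruning {

    // both  : tweets containing w1 and w2 as a compact phrase
    // c1, c2: tweets containing w1 (respectively w2) alone
    // total : total number of tweets considered
    public static double pmi(long both, long c1, long c2, long total) {
        if (both == 0 || c1 == 0 || c2 == 0) {
            return Double.NEGATIVE_INFINITY;  // unseen phrase: always pruned
        }
        double pBoth = (double) both / total;
        double p1 = (double) c1 / total;
        double p2 = (double) c2 / total;
        return Math.log(pBoth / (p1 * p2)) / Math.log(2);  // log base 2
    }

    public static boolean keep(long both, long c1, long c2, long total) {
        return pmi(both, c1, c2, total) >= 0;  // prune noun phrases with PMI < 0
    }
}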

3.3 Opinion Sentence Extraction
One of the goals of our system is to detect the shifts of opinion in the reviews, find out their polarity and measure the strength of the expressed opinion. People use opinion words located around the feature to divulge their point of view. Thus, using the feature list already detected, we extract all sentences in the reviews containing at least one feature. For example, "iPod is brilliant, but service was awful" contains two features, "iPod" and "service"; the opinion word is "brilliant" for the first feature and "awful" for the second. All the extracted sentences must contain at least one adjective or verb, which are considered the major components of the opinion. Here is an example of a rejected sentence: "The iPod's battery, 1.5 years". After extracting all the opinion sentences, we move to the score computing to measure the polarity strength.
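The sentence filter itself is straightforward. Here is a sketch, assuming TreeTagger output has already been grouped into the four word categories; the TaggedWord holder is ours, and multi-word features are matched naively.

import java.util.List;
import java.util.Set;

// Keep a sentence only if it mentions at least one extracted feature
// and contains at least one adjective or verb.
public class OpinionSentenceFilter {

    public record TaggedWord(String word, String category) {}

    public static boolean isOpinionSentence(List<TaggedWord> sentence,
                                            Set<String> features) {
        boolean hasFeature = sentence.stream()
                .anyMatch(w -> features.contains(w.word()));
        boolean hasOpinionWord = sentence.stream()
                .anyMatch(w -> w.category().equals("adjective")
                            || w.category().equals("verb"));
        return hasFeature && hasOpinionWord;
    }
}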

3.4 Score Computing
In this section, we explain how to measure the strength of the opinion for each feature and then for the whole product. We begin with the feature score.

Feature Score: It measures the importance that people give to a certain product and its features. We consider the score of a feature to be its frequency in the reviews weighted by its Twitter popularity. We assign to each feature a score using the following formula⁶:

$score_f = \alpha \times freq_f + (1 - \alpha) \times popularity_f, \quad \alpha \in [0,1]$    (3)

where $freq_f$ is the frequency of occurrences of the feature in the reviews for a given product, and $popularity_f$ is the feature popularity on Twitter. This score weights the importance of the feature for the product; it also measures the popularity of the feature and the interest that people have in it. Taking the feature "battery" as an example, its score is 0.3442 (0.6 × 0.543 + 0.4 × 0.046).

Twitter Sentiment Analysis: A tweet has a maximum length of 140 characters. This length constraint, among others, encourages the use of emoticons to express opinion and, for the most part, to sum up the polarity of the whole sentence. Several studies have used emoticons to predict the semantic orientation of text and to reduce ambiguity [32] [10]. In our work, we construct a list of emoticons and then divide the Twitter corpus into two sets, positive and negative tweets, according to the emoticons they contain. Table 4 gives some examples of emoticons with their polarity.

Table 4: Examples of emoticons and their polarity

Polarity             Emoticons
Positive             :-) :) :o) :] :3 :c) :> =]
Extremely positive   :-D :D 8D xD XD =D =3 <=3
Negative             :-( :( :c :< :[ :{
Extremely negative   D: D8 D; D= DX v.v

Then, for each feature, we count the number of positive and negative tweets. Our basic assumption is that a feature should have a high score if it appears frequently in positive sentences. So, if the number of positive tweets is greater, we increase the feature score; otherwise, we reduce it, as shown by the following formula:

$score_f = \frac{1}{2}\left(score_f + \frac{nbreTweetPos}{nbreTweetPos + nbreTweetNeg}\right)$    (4)

where $score_f$ is the feature score, nbreTweetPos is the number of positive tweets, and nbreTweetNeg is the number of negative tweets for a given feature. Take again the feature "battery", whose score is 0.3442. It appears more often in negative tweets (53 negative tweets and 5 positive ones), so its score has to be reduced: the battery score becomes 0.215.

6 For the experiments α = 0.6
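Putting formulas (3) and (4) together, a minimal sketch that reproduces the "battery" example from the text (class and method names are illustrative):

// Feature score (formula (3)) and its Twitter-sentiment adjustment (formula (4)).
public class FeatureScore {

    static final double ALPHA = 0.6;  // value used in the experiments

    // formula (3): review frequency weighted by Twitter popularity
    public static double score(double reviewFrequency, double twitterPopularity) {
        return ALPHA * reviewFrequency + (1 - ALPHA) * twitterPopularity;
    }

    // formula (4): average the score with the positive-tweet ratio
    public static double adjust(double score, int posTweets, int negTweets) {
        double posRatio = (double) posTweets / (posTweets + negTweets);
        return (score + posRatio) / 2;
    }

    public static void main(String[] args) {
        double battery = score(0.543, 0.046);      // 0.3442
        double adjusted = adjust(battery, 5, 53);  // ~0.215, as in the text
        System.out.println(battery + " -> " + adjusted);
    }
}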


Adjective and verb scores: We use SentiWordNet 3.0, a publicly available lexical resource in which each WordNet synset s (a set of terms sharing the same meaning) is associated with three numerical scores, ObjScore(s), PosScore(s) and NegScore(s), describing how objective, positive and negative the words in the synset are, where:

$PosScore(s) + NegScore(s) + ObjScore(s) = 1$    (5)

For example, "great" has six synsets, each with a positive, a negative and an objective score. Remember that our scores are between 0 and 1: a negative score belongs to the interval [0, 0.5], and a positive score belongs to the interval [0.5, 1]. For the opinion strength identification, we adopt the following assumption: the closer the score is to 0, the more negative the opinion is, and vice versa. We do not treat objective verbs or adjectives (those whose objective score is higher than the sum of their positive and negative scores). Given a word w, the corresponding score is calculated using the following formula:

$score_w = \frac{\sum_{i=1}^{n} iwscore_i}{n}$    (6)

where $iwscore_i$ is the SentiWordNet score of the i-th synset of w, given by the algorithm below, and n is the number of synsets of w.

Algorithm Word_Score_Computing
Input: PosScore, NegScore   // SentiWordNet scores of one synset
Begin Word_Score_Computing
    ObjScore = 1 - (NegScore + PosScore)
    If (ObjScore < PosScore + NegScore) then   // not objective
        If (PosScore > NegScore) then          // positive synset
            iwscore = PosScore
        ElseIf (NegScore < 0.5) then           // mildly negative synset, already in [0, 0.5]
            iwscore = NegScore
        Else                                   // strongly negative synset, mapped into [0, 0.5]
            iwscore = 1 - NegScore
        EndIf
    EndIf
End Word_Score_Computing
Output: iwscore

Let's take an example: "The iPod has one of the worst batteries around". The opinionated expression is "worst batteries"; the opinion word is "worst", whose lemma is "bad". We compute its score using SentiWordNet: "bad" has 14 synsets, and the overall score is 0.285.
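Here is a sketch of the word-score computation, assuming the synset scores have been read from the SentiWordNet 3.0 data file; the per-synset branches follow the algorithm above, and the SynsetScores holder is ours.

import java.util.List;

// Word score (formula (6)): average of the per-synset scores.
public class WordScore {

    public record SynsetScores(double pos, double neg) {}

    // Per-synset score following Word_Score_Computing above.
    static double synsetScore(SynsetScores s) {
        double obj = 1 - (s.pos() + s.neg());
        if (obj >= s.pos() + s.neg()) return Double.NaN; // objective: not treated
        if (s.pos() > s.neg()) return s.pos();           // positive synset
        if (s.neg() < 0.5) return s.neg();               // mildly negative, in [0, 0.5]
        return 1 - s.neg();                              // strongly negative, mapped into [0, 0.5]
    }

    public static double wordScore(List<SynsetScores> synsets) {
        return synsets.stream()
                .mapToDouble(WordScore::synsetScore)
                .filter(v -> !Double.isNaN(v))           // skip objective synsets
                .average()
                .orElse(0.5);                            // neutral fallback
    }
}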

Decreasing or increasing the score: Sentences may contain modifiers: intensifiers like "absolutely", "absurdly", "acutely" and "alarmingly", or diminishers like "moderately", "momentarily" and "improbably", which can be used in a positive or a negative context, as in "absolutely great" or "absolutely bad". We construct our own lists of intensifiers (192 terms) and diminishers (40 terms). As our scores are between 0 and 1, we chose the square root and the square to increase or decrease them. If a sentence contains a modifier preceding the verb or the adjective, we calculate their scores using this algorithm:

Algorithm Word_Score_Computing_Modifier
Input: wscore, IntensifierG, DiminisherG
Begin Word_Score_Computing_Modifier
    If (wscore >= 0.5) then                    // the word is positive
        If (Modifier ∈ IntensifierG) then
            wscore = sqrt(wscore)              // increase the score
        ElseIf (Modifier ∈ DiminisherG) then
            wscore = wscore²                   // decrease the score
        EndIf
    Else                                       // the word is negative
        If (Modifier ∈ IntensifierG) then
            wscore = wscore²                   // reduce the score
        ElseIf (Modifier ∈ DiminisherG) then
            wscore = sqrt(wscore)              // increase the score
        EndIf
    EndIf
End Word_Score_Computing_Modifier
Output: wscore

If there is an intensifier preceding a positive verb/adjective (score >= 0.5), we increase its score, whereas we decrease it if there is a diminisher. In the case of a negative verb/adjective (score < 0.5), if it is preceded by an intensifier we reduce its score; otherwise, we increase it. Let's take an example: "The battery is extremely bad". The score of "bad" is 0.285. As "extremely" is an intensifier and "bad" is a negative word (score < 0.5), the score of "extremely bad" becomes 0.081 (0.285 × 0.285). Now that we have the feature scores and the word scores, we can move to the sentence score computation.

Sentence Score: For each feature, we compute the scores of all the opinion words. The sentence score depends first on the scores of the verbs and adjectives it contains; these scores are weighted by the opinion intensity of the adverbs, as explained in the previous section. We assume that the sentence score also depends on the feature score. Recall that the feature score measures the feature's popularity using the corpus (the collected customer reviews) and Twitter; it also conveys the Twitter sentiment that people have towards the feature, as shown above.


If a sentence contains n features, its score is given by the following formula⁷:

$score_s = \beta \times \frac{\sum_{i=1}^{n} score_{f_i}}{n} + (1 - \beta) \times \frac{\sum_{j=1}^{m} score_{w_j}}{m}, \quad \beta \in [0,1]$    (7)

where $score_{f_i}$ is the score of the i-th feature in the sentence and $score_{w_j}$ is the score of the j-th opinion word, m being the number of opinion words. Let's take the previous example, "The battery is extremely bad". The battery score is 0.215, so the sentence score is 0.3 × 0.215 + 0.7 × 0.081 = 0.121. Now let's take another example with a positive score: "The iPod's sound is pretty good". Here, the feature is "sound"; its score is 0.354. It appears more often in positive tweets, so its score becomes 0.525. The opinion expression is "pretty good". The adjective "good" has 21 synsets, and its score is 0.595; since "pretty" is an intensifier, the score becomes 0.771. The sentence score is therefore 0.697 (0.3 × 0.525 + 0.7 × 0.771).

Review Score: The score of the whole review, which summarizes the final opinion, is computed from the sentence scores. It is given by the following formula:

$score_r = \frac{\sum_{i=1}^{n} score_{s_i}}{n}$    (8)

where $score_{s_i}$ is the score of the i-th sentence in the review and n is the number of sentences in the review.
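A sketch of the two remaining scores, reproducing the "The battery is extremely bad" example from the text; the averaging over several features and opinion words follows formulas (7) and (8) above, and the class name is ours.

import java.util.List;

// Sentence score (formula (7), beta = 0.3 in the experiments) and
// review score (formula (8): average of the sentence scores).
public class ReviewScore {

    static final double BETA = 0.3;

    public static double sentenceScore(List<Double> featureScores,
                                       List<Double> wordScores) {
        double avgFeature = featureScores.stream()
                .mapToDouble(Double::doubleValue).average().orElse(0.5);
        double avgWord = wordScores.stream()
                .mapToDouble(Double::doubleValue).average().orElse(0.5);
        return BETA * avgFeature + (1 - BETA) * avgWord;
    }

    public static double reviewScore(List<Double> sentenceScores) {
        return sentenceScores.stream()
                .mapToDouble(Double::doubleValue).average().orElse(0.5);
    }

    public static void main(String[] args) {
        // "The battery is extremely bad": feature score 0.215, word score 0.081
        System.out.println(sentenceScore(List.of(0.215), List.of(0.081))); // ~0.121
    }
}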

4. EXPERIMENT RESULTS
The proposed approach has been implemented in Java, and all the steps are performed by our system without any human intervention. We evaluate our system using real-life customer reviews of four electronics products collected from Amazon.com and C|net.com and manually annotated by [12]⁸: two digital cameras, one cellular phone and one iPod. Our first purpose is to extract effective product features closest to those of the manual annotation. Table 5 summarizes the precision and recall of the feature decider phase, computed as Precision = NRFI / NF and Recall = NRFI / NRF, where NRFI is the number of relevant features identified, NRF is the number of relevant features, and NF is the number of features identified. Column 1 presents the list of products used to evaluate our system. FASTR⁹, a well-known and publicly available term extraction and indexing system implemented by Christian Jacquemin, was evaluated by [12] using the same customer reviews we use in this research; its precision and recall are shown in column 2. Column 3 gives the precision and recall of Hu and Liu's (2004) system, and column 4 shows the precision and recall of our system. The precision and recall of this research are very close to those of [12]: the average F-score of Hu and Liu's method is 0.657, while it is 0.655 for this research. Using Twitter for compactness and redundancy pruning, precision improves but recall declines; this decrease is due to the removal of features that are not popular, i.e. that do not interest the majority of people on Twitter. The recall and precision of FASTR are strongly lower than those of the two proposed methods. The reason is that this system produces a large number of words that are not all product features: the average number of extracted terms is 377, whereas it is about 80 in our case, and the extracted terms are only noun phrases of 2 or more words, while our methods can extract both noun and noun-phrase features. Table 5 shows that our methods are more effective than FASTR and provide results comparable to Hu and Liu's method [12]. Our second purpose is to summarize the users' opinions toward a certain product. We have therefore also extracted the opinionated sentences and computed their scores. When comparing these scores to the manually annotated ones, we found a correlation of 82%. We intend to carry out further experiments in order to determine the validity of our scores.

Table 5: Precision and recall of the proposed method vs. Hu and Liu's method and FASTR

Product            FASTR              Hu and Liu         Feature selection    Feature selection
                                      feature selection  (this research)      (using Twitter)
                   Prec.    Recall    Prec.    Recall    Prec.    Recall      Prec.    Recall
iPod               --       --        --       --        0.702    0.697       0.774    0.518
Digital camera 1   0.189    0.031     0.634    0.658     0.617    0.679       0.763    0.570
Digital camera 2   0.187    0.044     0.679    0.594     0.690    0.580       0.737    0.528
Cellular phone     0.149    0.027     0.676    0.716     0.556    0.731       0.745    0.513
Average            0.175    0.034     0.663    0.656     0.641    0.671       0.754    0.532

7 For the experiments, β = 0.3.
8 This is an example of a manually annotated sentence: "battery [-2]##This is really stupid to me. 18 months for a battery isn't good". "Battery" is the feature that the customer comments on, and "-2" is the sentence score.
9 http://www.limsi.fr/Individu/jacquemi/FASTR/

5. CONCLUSION AND FUTURE WORK
Opinions on the Web affect our choices and decisions. Thus, it is becoming crucial to summarize the whole opinion extracted from amounts of customer reviews that users are unable to analyze into numerical scores that help them in decision making. This paper introduced a new approach for the automatic summarization of customer reviews. We have shown that social networks such as Twitter can be used to highlight the most relevant features the user is interested in, and also to detect the popularity of those features and the users' opinions. Experimental results show that our method performs comparably to classic feature-based summarization methods. In future work, we plan to further improve our method (increase its recall) and to evaluate it on other entities, not only products. We also intend to turn to normal text summarization (producing a short text from a long text) on a given topic, since it is obvious that such a summary should include the different aspects of the topic.



6. REFERENCES
[1] Akermi, I., and Faiz, R. 2012. Semantic Similarity Measure based on Web Content. Proceedings of the International Conference on Web Intelligence, Mining and Semantics (WIMS'12). ACM.
[2] Andreevskaia, A., and Bergler, S. 2006. Mining WordNet for fuzzy sentiment: Sentiment tag extraction from WordNet glosses. Proceedings of EACL.
[3] Blitzer, J., Dredze, M., and Pereira, F. 2007. Biographies, Bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. Proceedings of ACL 2007.
[4] Baccianella, S., Esuli, A., and Sebastiani, F. 2010. SentiWordNet 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining. Proceedings of LREC 2010.
[5] Bouchleghem, R., Elkhlifi, A., and Faiz, R. 2010. Automatic extraction and classification approach of opinions in texts. ISDA 2010, IEEE Press, 918-922.
[6] Ding, X., Liu, B., and Yu, P. S. 2008. A Holistic Lexicon-Based Approach to Opinion Mining. Proceedings of WSDM.
[7] Dragut, E. C., Yu, C., Sistla, P., and Meng, W. 2010. Construction of a sentimental word dictionary. Proceedings of CIKM.
[8] Esuli, A., and Sebastiani, F. 2005. Determining the Semantic Orientation of Terms through Gloss Classification. Proceedings of CIKM.
[9] Gamon, M., Aue, A., Corston-Oliver, S., and Ringger, E. 2005. Pulse: Mining Customer Opinions from Free Text. Proceedings of the 6th International Symposium on Advances in Intelligent Data Analysis, 121-132.
[10] Go, A., Huang, L., and Bhayani, R. 2009. Twitter Sentiment Analysis. Final Projects from CS224N, Spring 2008/2009, The Stanford Natural Language Processing Group.
[11] Hatzivassiloglou, V., and McKeown, K. 1997. Predicting the Semantic Orientation of Adjectives. Proceedings of ACL 1997.
[12] Hu, M., and Liu, B. 2004. Mining and summarizing customer reviews. KDD 2004, 168-177.
[13] Harris, Z. S. 1968. Mathematical structures of language. Interscience Tracts in Pure and Applied Mathematics, no. 21. New York: Interscience Publishers.
[14] Kanayama, H., and Nasukawa, T. 2006. Fully Automatic Lexicon Expansion for Domain-Oriented Sentiment Analysis. Proceedings of EMNLP 2006.
[15] Kamps, J., Marx, M., Mokken, R. J., and de Rijke, M. 2004. Using WordNet to measure semantic orientation of adjectives. Proceedings of LREC 2004.
[16] Kim, S. M., and Hovy, E. 2004. Determining the Sentiment of Opinions. Proceedings of COLING 2004.
[17] Lehuédé, F. 2009. L'internet participatif redonne confiance aux consommateurs.
[18] Liu, B., Hu, M., and Cheng, J. 2005. Opinion observer: Analyzing and comparing opinions on the web. Proceedings of WWW.
[19] Liu, B. 2007. Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data. Springer, New York.
[20] Liu, B. 2010. Sentiment Analysis and Subjectivity. Invited chapter for the Handbook of Natural Language Processing, Second Edition. March 2010.
[21] Mihalcea, R., Corley, C., and Strapparava, C. 2006. Corpus-based and knowledge-based measures of text semantic similarity. Proceedings of the 21st National Conference on Artificial Intelligence, Volume 1, 775-780. AAAI Press.
[22] Miller, G. 1995. WordNet: A lexical database for English. Communications of the ACM, 38(11), 39-41.
[23] Popescu, A. M., and Etzioni, O. 2005. Extracting Product Features and Opinions from Reviews. Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, Vancouver, British Columbia, 339-346.
[24] Pedersen, T., Patwardhan, S., and Michelizzi, J. 2004. WordNet::Similarity: measuring the relatedness of concepts. Association for Computational Linguistics.
[25] Pang, B., Lee, L., and Vaithyanathan, S. 2002. Thumbs up? Sentiment Classification Using Machine Learning Techniques. Proceedings of the Conference on Empirical Methods in Natural Language Processing, 79-86.



[26] Pang, B., and Lee, L. 2005. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. Proceedings of ACL, 115-124.
[27] Pak, A., and Paroubek, P. 2011. Twitter for Sentiment Analysis: When Language Resources are Not Available. DEXA Workshops 2011, 111-115.
[28] Pak, A., and Paroubek, P. 2010a. Twitter as a Corpus for Sentiment Analysis and Opinion Mining. LREC 2010.
[29] Pak, A., and Paroubek, P. 2010b. Twitter based system: Using Twitter for Disambiguating Sentiment Ambiguous Adjectives. SemEval 2010, Proceedings of the International Workshop on Semantic Evaluations.
[30] Qiu, G., Liu, B., Bu, J., and Chen, C. 2009. Expanding Domain Sentiment Lexicon through Double Propagation. Proceedings of IJCAI 2009.
[31] Riloff, E., Wiebe, J., and Wilson, T. 2003. Learning Subjective Nouns Using Extraction Pattern Bootstrapping. Proceedings of the 7th Conference on Natural Language Learning, 25-32.
[32] Read, J. 2005. Using Emoticons to Reduce Dependency in Machine Learning Techniques for Sentiment Classification. ACL, The Association for Computational Linguistics.
[33] Takamura, H., Inui, T., and Okumura, M. 2007. Extracting Semantic Orientations of Phrases from Dictionary. Proceedings of HLT-NAACL.
[34] Turney, P. 2001. Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL. Machine Learning: ECML 2001, 491-502.
[35] Turney, P. 2002. Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews. Proceedings of ACL 2002.
[36] Wiebe, J. 2000. Learning Subjective Adjectives from Corpora. Proceedings of AAAI 2000.
[37] Wilson, T., Wiebe, J., and Hwa, R. 2004. Just how mad are you? Finding strong and weak opinion clauses. Proceedings of AAAI.
[38] Wilson, T., Wiebe, J., and Hoffmann, P. 2005. Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis. Proceedings of HLT-EMNLP.
[39] Zhang, L., and Liu, B. 2011. Identifying Noun Product Features that Imply Opinions. ACL (Short Papers) 2011, 575-580.