Estimating the Helpfulness and Economic Impact of Product Reviews: Mining Text and Reviewer Characteristics

Authors: Anindya Ghose, Panagiotis G. Ipeirotis, Member, IEEE
Course: Topics in Data Mining
Presenter: Nobal Niraula
December 8, 2010 @ UOM

This talk is based on a very good journal paper: "Estimating the Helpfulness and Economic Impact of Product Reviews: Mining Text and Reviewer Characteristics"


Page 2: Outline

Introduction
Gathering variables (attributes)
Explanatory study using econometric regression
◦ Hypotheses for sales
◦ Hypotheses for perceived usefulness
Prediction
◦ Helpfulness
◦ Impact on sales
Conclusion

Page 3: Introduction (1)

Product-related word-of-mouth conversations in online markets
Reviewers contribute time and energy
Volume of reviews can be high
Benefits
◦ Customers: usefulness / helpfulness
  Average star rating: bimodal
  Peer review: biased
  Helpfulness = helpful votes / total votes
  “Spotlight Review” on Amazon.com
◦ Manufacturers: influence on sales
  Helpful reviews are not necessarily the ones that lead to increases in sales!
  Reviews that affect sales the most should be presented first to manufacturers

Page 4: Introduction (2)

The paper is unique in looking at how the subjectivity level, readability, and spelling errors in the text of reviews affect product sales and the perceived helpfulness of these reviews.

Page 5: Predicting Helpfulness and Importance

Two-Level Study
◦ Explanatory econometric analysis
  Identify aspects of a review and of a reviewer
◦ Prediction model using “Random Forests”
  How peer consumers are going to rate a review
  How sales will be affected by the posted review

Page 6: Product Reviews

Page 7: Sample Review

Page 8: Reviewer’s Profile

Page 9: Product Rank

Page 10: Variables Collection

Products
◦ Audio and video players (144 products)
◦ Digital cameras (109 products)
◦ DVDs (158 products)
Product and Sales Data: Retail Price, Sales Rank, Average Rating, Number of Reviews, Elapsed Date
Reviewer History: Number of Past Reviews, Reviewer History Macro, Reviewer History Micro, Past Helpful Votes, Past Total Votes
Reviewer Characteristics: Reviewer Rank, Top-10 Reviewer, Top-50 Reviewer, Top-100 Reviewer, Top-500 Reviewer, Real Name, Nick Name, Hobbies, Birthday, Location, Web Page, Interests, Snippet, Any Disclosure
Individual Review: Moderate Review, Helpful Votes, Total Votes, Helpfulness
Review Readability: Length (Chars), Length (Words), Length (Sentences), Spelling Errors, ARI, Gunning Index, Coleman–Liau Index, Flesch Reading Ease, Flesch–Kincaid Grade Level, SMOG
Review Subjectivity: AvgProb, DevProb

Page 11: Text of a Review Matters!

Readability Analysis
◦ Automated Readability Index
◦ Coleman–Liau Index
◦ Flesch–Kincaid Grade Level
◦ Gunning fog index
◦ SMOG
Subjectivity Analysis
◦ Stylistic choices: “subjective” vs. “objective”
◦ Each document gets a “subjectivity score”
  AvgProb(r): high value → many subjective sentences
  DevProb(r): high value → mixed (subjective + objective) sentences
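
The readability and subjectivity features can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It computes the Automated Readability Index from its standard published formula using a deliberately naive tokenizer, and it assumes that AvgProb and DevProb are the mean and the (population) standard deviation of per-sentence subjectivity probabilities. Those probabilities would come from a separately trained subjectivity classifier that is not shown here; the function names are hypothetical.

```python
import re
from statistics import mean, pstdev


def automated_readability_index(text: str) -> float:
    """ARI = 4.71 * (chars / words) + 0.5 * (words / sentences) - 21.43."""
    # Naive tokenization (an assumption): whitespace-separated words,
    # sentences split on '.', '!' or '?'.
    words = text.split()
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")
    # Count only letters and digits as characters.
    chars = sum(ch.isalnum() for ch in text)
    return 4.71 * chars / len(words) + 0.5 * len(words) / len(sentences) - 21.43


def subjectivity_scores(sentence_probs):
    """Return (AvgProb, DevProb) for one review.

    sentence_probs: probability that each sentence is subjective, as
    produced by some external classifier (assumed, not shown).
    """
    probs = list(sentence_probs)
    if not probs:
        raise ValueError("need at least one sentence probability")
    # Assumption: AvgProb is the mean and DevProb the population
    # standard deviation of the per-sentence probabilities.
    return mean(probs), pstdev(probs)
```

For example, `subjectivity_scores([0.9, 0.9, 0.1])` yields a moderate AvgProb and a large DevProb, which matches the slide's reading of a review that mixes subjective and objective sentences.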

Page 12: Hypothesis for Sales

Hypothesis 1a:
◦ All else equal, a change in the subjectivity level and mixture of objective and subjective statements in reviews will be associated with a change in sales.
Hypothesis 1b:
◦ All else equal, a change in the readability score of reviews will be associated with a change in sales.
Hypothesis 1c:
◦ All else equal, a decrease in the proportion of spelling errors in reviews will be positively related to sales.

Page 13: Effect on Product Sales

ln(D) = a + b * ln(S)
◦ D is the unobserved product demand
◦ S is its observed sales rank
◦ Pareto distribution
◦ High sales rank → low demand
Key Observation:
◦ “Sales rank” on Amazon.com can be taken as a PROXY for demand!
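
The rank-to-demand relation ln(D) = a + b * ln(S) can be made concrete with a short sketch. The values of a and b used below are made up for illustration; the only property assumed is b < 0, which is what "high sales rank, low demand" implies. The function name is hypothetical.

```python
import math


def demand_from_sales_rank(sales_rank: float, a: float, b: float) -> float:
    """Invert ln(D) = a + b * ln(S) to get the implied demand D."""
    if sales_rank <= 0:
        raise ValueError("sales rank must be positive")
    return math.exp(a + b * math.log(sales_rank))


# Illustrative parameters only (not estimates from the paper).
A, B = 2.0, -1.0
for rank in (1, 10, 100):
    print(rank, round(demand_from_sales_rank(rank, A, B), 4))
```

Because the relation is log-linear, a change in ln(S) maps to a proportional change in ln(D), which is why the regressions can use sales rank directly as the dependent variable.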

Page 14: Descriptive Statistics for Econometric Analysis

Page 15: Model to Test Hypothesis 1

μk is a product fixed effect that accounts for unobserved heterogeneity across products, and εkt is the error term
Control Variables: Retail Price, Avg. Numeric Rating, Elapsed Date, Number of Reviews
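
The role of the product fixed effect μk can be illustrated with a within (demeaning) estimator. The sketch below is not the paper's regression: it uses a single hypothetical regressor x and outcome y, and the function name is an assumption. Subtracting each product's own mean from x and y removes μk, so the remaining slope reflects only variation within a product over time.

```python
from collections import defaultdict


def within_slope(product_ids, x, y):
    """Slope of y on x after removing per-product means (fixed effects)."""
    # Accumulate sum of x, sum of y, and count for every product.
    totals = defaultdict(lambda: [0.0, 0.0, 0])
    for k, xi, yi in zip(product_ids, x, y):
        t = totals[k]
        t[0] += xi
        t[1] += yi
        t[2] += 1

    num = den = 0.0
    for k, xi, yi in zip(product_ids, x, y):
        sum_x, sum_y, n = totals[k]
        x_dev = xi - sum_x / n  # deviation from the product's mean x
        y_dev = yi - sum_y / n  # deviation from the product's mean y
        num += x_dev * y_dev
        den += x_dev * x_dev
    if den == 0:
        raise ValueError("x has no within-product variation")
    return num / den


# Two products with very different baselines (1 and 10), same slope of 2.
ids = ["A", "A", "A", "B", "B"]
xs = [1.0, 2.0, 3.0, 0.0, 4.0]
ys = [3.0, 5.0, 7.0, 10.0, 18.0]
print(within_slope(ids, xs, ys))
```

A pooled regression on the same data would be distorted by the different baselines; demeaning per product avoids that without estimating μk explicitly.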

Page 16: Empirical Results for Product Sales

Note:
1. A negative (-ve) coefficient means a decrease in sales rank, i.e., an increase in sales
2. Variables that increase sales: AvgProb, Readability, Spelling Errors
3. Variables that decrease sales: Retail Price, DevProb
Also: reviews with a rating <= 2 are associated with increased sales

Page 17: Conclusion (1)

Hypothesis 1a:
◦ Highly subjective sentences increase sales
◦ A mixture of subjective and objective sentences is negatively associated with product sales, compared to highly subjective or highly objective sentences
Hypothesis 1b:
◦ Higher readability scores are associated with higher sales
Hypothesis 1c:
◦ An increase in the proportion of spelling mistakes decreases product sales for some “experience products” like DVDs; however, the proportion of spelling errors does not have a significant impact on sales for “search products”
Reviews that rate products negatively can be associated with increased product sales when the review text is informative and detailed!

Page 18: Hypothesis for Helpfulness

Hypothesis 2a:
◦ All else equal, a change in the subjectivity level and mixture of objective and subjective statements in a review will be associated with a change in the perceived helpfulness of that review.
Hypothesis 2b:
◦ All else equal, a change in the readability of a review will be associated with a change in the perceived helpfulness of that review.
Hypothesis 2c:
◦ All else equal, a decrease in the proportion of spelling errors in a review will be positively related to the perceived helpfulness of that review.
Hypothesis 2d:
◦ All else equal, an increase in the average helpfulness of a reviewer’s historical reviews will be positively related to the perceived helpfulness of a review posted by that reviewer.

Page 19: Effect on Helpfulness

μk is a product fixed effect that controls for differences in the average helpfulness of reviews across products, and εkt is the error term

Page 20: Empirical Results for Helpfulness

Note:
A negative (-ve) coefficient means lower helpfulness
Negative relations: AvgProb, Spelling Error, Moderate
Positive relations: DevProb **, Disclosure, Readability, Reviewer History Macro, Number of Reviews

Page 21: Conclusion (2)

Hypothesis 2a:
◦ In general, a mixture of subjective and objective elements is rated more informative (helpful) by users
◦ For feature-based goods, users prefer reviews with more objective information and fewer subjective sentences
◦ For experience goods, e.g. DVDs, users expect few objective sentences but more subjective sentences
Hypothesis 2b – 2d:
◦ An increase in the readability of reviews has a positive and statistically significant impact on review helpfulness
◦ An increase in the proportion of spelling errors has a negative and statistically significant impact on review helpfulness for audio-video products and DVDs
◦ Past historical information about reviewers has a statistically significant effect on the perceived helpfulness of reviews

Page 22: Predictive Modeling

Main goal
◦ Is the review informative or not?
◦ Does the review impact sales or not?
Question: given the helpfulness value of a review, decide whether it is useful or not
◦ Helpfulness = (helpful votes / total votes)
◦ Continuous-to-binary conversion
◦ Threshold found is 60%
Classification
◦ Regression model can be used
◦ Binary classification
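
The continuous-to-binary conversion can be sketched as follows. The 60% threshold comes from the slide; whether a review exactly at the threshold counts as helpful is not stated, so the `>=` comparison below is an assumption, as are the function names.

```python
def helpfulness(helpful_votes: int, total_votes: int) -> float:
    """Helpfulness = helpful votes / total votes."""
    if total_votes <= 0:
        raise ValueError("a review needs at least one vote")
    if not 0 <= helpful_votes <= total_votes:
        raise ValueError("helpful votes must lie between 0 and total votes")
    return helpful_votes / total_votes


def is_helpful(helpful_votes: int, total_votes: int, threshold: float = 0.6) -> bool:
    """Binary label for the classifier: True means 'helpful'."""
    # Assumption: reviews exactly at the threshold are labeled helpful.
    return helpfulness(helpful_votes, total_votes) >= threshold


print(is_helpful(9, 10))  # 0.90 is above the threshold
print(is_helpful(1, 4))   # 0.25 is below the threshold
```

Reviews without any votes have no defined helpfulness, so the sketch rejects them rather than guessing a label.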

Page 23: Predicting Helpfulness (1)

Classifiers
◦ SVM vs. Random Forest
SVM consistently performed worse, contrary to what has been reported elsewhere
Training time for SVM was significantly higher than that of Random Forest

Page 24: Predicting Helpfulness (2)

Page 25: Predicting Impact on Sales

Examining whether the difference SalesRank_{t(r)+T} − SalesRank_{t(r)}, where t(r) is the time the review is posted, is positive or negative.
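
The prediction target for sales impact can be sketched as the sign of the sales-rank difference over a horizon T after the review is posted. The data layout (a mapping from day number to sales rank), the function names, and the sample numbers are all hypothetical. The sketch treats a negative difference as an improvement, since a lower sales rank corresponds to higher sales.

```python
from typing import Mapping


def sales_rank_change(rank_by_day: Mapping[int, int], review_day: int, horizon: int) -> int:
    """SalesRank at t(r)+T minus SalesRank at t(r)."""
    return rank_by_day[review_day + horizon] - rank_by_day[review_day]


def sales_improved(rank_by_day: Mapping[int, int], review_day: int, horizon: int) -> bool:
    """True when the rank dropped, i.e. sales went up, T days after the review."""
    return sales_rank_change(rank_by_day, review_day, horizon) < 0


# Made-up ranks: the product moves from rank 500 to 420, then back to 610.
ranks = {0: 500, 7: 420, 14: 610}
print(sales_rank_change(ranks, review_day=0, horizon=7))
print(sales_improved(ranks, review_day=7, horizon=7))
```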

Page 26: Conclusion (3)

Random Forest based prediction
◦ For experience goods such as DVDs, the classifier has lower performance
◦ Observed a high correlation of “classification error” with “distribution of review ratings”
◦ Products that have received widely fluctuating ratings also have reviews with widely fluctuating helpfulness votes
◦ Highly detailed and readable reviews can have low helpfulness votes
◦ The “reviewer-related”, “review subjectivity”, and “review readability” feature sets are interchangeable!

Page 27: Overall Conclusion

Subjectivity level, readability, and spelling errors in the text of reviews affect product sales and the perceived helpfulness of reviews

Page 28: References

Anindya Ghose, Panagiotis G. Ipeirotis, "Estimating the Helpfulness and Economic Impact of Product Reviews: Mining Text and Reviewer Characteristics," IEEE Transactions on Knowledge and Data Engineering, vol. 99, no. PrePrints, 2010

Page 29: Thank You!