Spot deceptive TripAdvisor Reviews

Page 1: Spot deceptive TripAdvisor Reviews

Spot Deceptive TripAdvisor Hotel Reviews

By: Yousef Fadila

Project Notebook:

https://github.com/yousef-fadila/cs548-project-5/blob/master/notebook.ipynb

CS548: Text Mining Project

Page 2: Spot deceptive TripAdvisor Reviews

Motivation - Fake reviews in the news

TripAdvisor warns of hotels posting fake reviews

http://abcnews.go.com/Technology/story?id=8094231

Twitter campaign takes aim at fake restaurant reviews on TripAdvisor

https://www.theguardian.com/travel/2015/oct/24/twitter-campaign-targets-fake-tripadvisor-restaurant-reviews

Page 3: Spot deceptive TripAdvisor Reviews

Datasets

Deceptive Opinion Spam Corpus

Consists of:

400 deceptive positive reviews

400 deceptive negative reviews

⇒ From Amazon Mechanical Turk workers

400 truthful positive reviews

400 truthful negative reviews

⇒ From trusted TripAdvisor users

TripAdvisor Hotel Reviews

Consists of:

878,561 reviews from 4,333 hotels crawled from TripAdvisor

⇒ Includes metadata (hotel name, rating, stars, location, etc.)

Page 4: Spot deceptive TripAdvisor Reviews

Outline

Guiding Questions:

1. Which is more prevalent among the 200,000 sampled reviews: positive deceptive or negative deceptive reviews?

2. Which hotel star rating most commonly has deceptive reviews? Which are the top ten hotels with deceptive positive reviews?

3. Is there enough support to claim that deceptive positive reviews are used to cover up previous negative reviews?

Extra:

4. Would a 2-step approach based on domain knowledge (like the one presented in the anomaly detection showcase) improve the accuracy of the text classification model?

5. Demo: Try it yourself.

6. Are computers better than humans at detecting deceptive reviews?

Page 5: Spot deceptive TripAdvisor Reviews

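Text Classification Model

1. (1,3) n-grams: unigrams, bigrams and trigrams

2. min_df=3: ignore n-grams that appear in fewer than 3 reviews

3. max_df=0.96: ignore n-grams that appear in more than 96% of reviews

4. LinearSVC classifier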
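A minimal sketch of a model with these settings. The parameter names (min_df, max_df, LinearSVC) match scikit-learn, so that library is assumed; the slide does not say whether the notebook used TF-IDF weighting (TfidfVectorizer, assumed here) or raw counts (CountVectorizer), and build_text_classifier is a hypothetical helper name.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC


def build_text_classifier():
    """Bag-of-words text classifier using the settings listed on the slide."""
    return Pipeline([
        # Word uni-, bi- and trigrams; drop n-grams found in fewer than 3 reviews
        # or in more than 96% of reviews. TF-IDF weighting is an assumption.
        ("bow", TfidfVectorizer(ngram_range=(1, 3), min_df=3, max_df=0.96)),
        # Linear support vector machine trained on the sparse n-gram features.
        ("svm", LinearSVC()),
    ])
```

Usage: `model = build_text_classifier().fit(train_texts, train_labels)` followed by `model.predict(new_texts)`. Note that min_df=3 needs a reasonably large corpus; on a handful of reviews almost every n-gram is pruned.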

Page 6: Spot deceptive TripAdvisor Reviews

Positive deceptive vs. negative deceptive ratio

1. Which is more prevalent among the 200,000 sampled reviews: positive deceptive or negative deceptive reviews?

Answer:

Positive deceptive reviews are more prevalent.
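One way to tally the answer, assuming every sampled review has already been labeled with a polarity and a deception label. The (polarity, deception) pair format and the deceptive_by_polarity name are hypothetical.

```python
from collections import Counter


def deceptive_by_polarity(predictions):
    """Tally deceptive reviews per polarity.

    predictions: iterable of (polarity, deception) label pairs, one per sampled
    review, e.g. ("positive", "deceptive"). This input format is hypothetical.
    Returns (positive deceptive count, negative deceptive count, positive share).
    """
    counts = Counter(polarity for polarity, deception in predictions if deception == "deceptive")
    total = counts["positive"] + counts["negative"]
    share = counts["positive"] / total if total else 0.0
    return counts["positive"], counts["negative"], share
```

A positive share above 0.5 means positive deceptive reviews outnumber negative deceptive ones.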

Page 7: Spot deceptive TripAdvisor Reviews

Hotel Star Rating vs. Deceptive Review Rate

2. Which hotel star rating most commonly has deceptive reviews? Which are the top hotels by ratio of deceptive positive reviews?

Top “deceptive” hotels:

********Inn Houston

******** York Hotel

********ose Hotel

********a Inn Houston Wirt Road

********lmonico
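A standard-library sketch of both rankings, assuming each classified review is a hypothetical (hotel, stars, polarity, deception) tuple. "Deceptive positive ratio" is assumed to mean the share of a hotel's reviews classified as positive deceptive, and the min_reviews cutoff is an assumption (not from the slide) that keeps hotels with only a few reviews from topping the list.

```python
from collections import defaultdict


def deceptive_rate_by_stars(reviews):
    """Share of reviews classified as deceptive, per hotel star rating."""
    totals, deceptive = defaultdict(int), defaultdict(int)
    for hotel, stars, polarity, deception in reviews:
        totals[stars] += 1
        if deception == "deceptive":
            deceptive[stars] += 1
    return {stars: deceptive[stars] / totals[stars] for stars in totals}


def top_deceptive_positive_hotels(reviews, n=10, min_reviews=20):
    """Hotels ranked by the share of their reviews classified as positive deceptive."""
    totals, dec_pos = defaultdict(int), defaultdict(int)
    for hotel, stars, polarity, deception in reviews:
        totals[hotel] += 1
        if polarity == "positive" and deception == "deceptive":
            dec_pos[hotel] += 1
    # Only rank hotels with enough reviews for the ratio to be meaningful.
    ratios = {h: dec_pos[h] / totals[h] for h in totals if totals[h] >= min_reviews}
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)[:n]
```

The star rating with the highest deceptive rate is then `max(rates, key=rates.get)` for `rates = deceptive_rate_by_stars(reviews)`.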

Page 8: Spot deceptive TripAdvisor Reviews

Frequent Sequences Leading to Positive Deceptive Reviews

1. Pick 20 hotels with deceptive reviews.

2. Export all reviews of the selected hotels to an ARFF file.

3. Set the sequence ID to the hotel ID.

4. Run the GSP algorithm in Weka.
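Steps 2 and 3 above can be sketched as follows: each review becomes one ARFF instance whose first attribute is the hotel ID, which then serves as GSP's sequence ID. The to_arff helper, the review_class values (polarity_deception), and ordering reviews by date within each hotel are assumptions for illustration; both attributes are declared nominal because Weka's GSP expects nominal data.

```python
def to_arff(reviews, relation="hotel_reviews"):
    """Render reviews as an ARFF document for sequential pattern mining.

    reviews: list of (hotel_id, review_date, review_class) tuples, where
    review_date is sortable (e.g. ISO "YYYY-MM-DD") and review_class is a
    nominal label such as "positive_deceptive" (hypothetical encoding).
    """
    # Group each hotel's reviews together, in chronological order.
    ordered = sorted(reviews, key=lambda r: (r[0], r[1]))
    hotel_ids = sorted({str(r[0]) for r in ordered})
    classes = sorted({r[2] for r in ordered})
    lines = [
        f"@relation {relation}",
        "",
        # First attribute: the sequence ID (one sequence per hotel).
        f"@attribute hotel_id {{{','.join(hotel_ids)}}}",
        f"@attribute review_class {{{','.join(classes)}}}",
        "",
        "@data",
    ]
    lines += [f"{hotel_id},{review_class}" for hotel_id, _, review_class in ordered]
    return "\n".join(lines) + "\n"
```

The resulting file can be opened in the Weka Explorer and mined with GSP from the Associate tab, with the hotel ID selected as the data sequence ID.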

Page 9: Spot deceptive TripAdvisor Reviews

2-Step Approach

4. Would a 2-step approach based on domain knowledge (like the one presented in the anomaly detection showcase) improve the accuracy of the text classification model?

What features could be used to distinguish deceptive from truthful reviews?

False positives vs. false negatives.

Supervised vs. unsupervised.

Page 10: Spot deceptive TripAdvisor Reviews

Content-Based Features

Some online reviews are too good to be true; Cornell computers spot 'opinion spam' (http://bit.ly/2g6ou9X)

"The researchers then applied computer analysis based on subtle features of text. Truthful hotel reviews, for example, are more likely to use concrete words relating to the hotel, like "bathroom," "check-in" or "price." Deceivers write more about things that set the scene, like "vacation," "business trip" or "my husband." Truth-tellers and deceivers also differ in the use of keywords referring to human behavior and personal life, and sometimes in features like the amount of punctuation or frequency of "large words." In parallel with previous analysis of imaginative vs. informative writing, deceivers use more verbs and truth-tellers use more nouns."

Features to extract from the review text:

1) amount of punctuation

2) total nouns minus total verbs

3) length of the review

4) ratio of adjectives and adverbs
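A sketch of the four features, assuming the review has already been tokenized and POS-tagged with Penn Treebank tags (for example by nltk.pos_tag, which returns (token, tag) pairs). The slide does not say how punctuation, length or the ratio were measured; here punctuation is the number of punctuation-only tokens, length is the number of remaining word tokens, and the ratio is (adjectives + adverbs) / words. The content_features name is hypothetical.

```python
import string


def content_features(tagged_tokens):
    """Content-based features from (token, Penn Treebank tag) pairs."""
    # Tokens made only of punctuation characters (e.g. "!", "...") count as punctuation.
    words = [tok for tok, _ in tagged_tokens if not all(ch in string.punctuation for ch in tok)]
    punctuation = len(tagged_tokens) - len(words)
    nouns = sum(1 for _, tag in tagged_tokens if tag.startswith("NN"))  # NN, NNS, NNP, NNPS
    verbs = sum(1 for _, tag in tagged_tokens if tag.startswith("VB"))  # VB, VBD, VBG, ...
    adj_adv = sum(1 for _, tag in tagged_tokens if tag.startswith(("JJ", "RB")))
    length = len(words)
    return {
        "punctuation": punctuation,
        "nouns_minus_verbs": nouns - verbs,
        "length": length,
        "adj_adv_ratio": adj_adv / length if length else 0.0,
    }
```

A positive nouns_minus_verbs value leans towards the noun-heavy style the quoted study associates with truthful reviews.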

Page 11: Spot deceptive TripAdvisor Reviews

Unsupervised Anomaly Detection (AD) Followed by a Supervised Classifier

No Improvement!

Page 12: Spot deceptive TripAdvisor Reviews

2nd Try: A Single-Step Supervised Model

Merge the “bag of words” features and the extracted content-based features into a single supervised classifier.

No Improvement!
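Merging the two feature sets can be sketched with scipy.sparse.hstack, appending the content-based feature columns to the sparse bag-of-words matrix before training one classifier. The merged_features helper is hypothetical and assumes the content features were computed beforehand, one row per review; TF-IDF weighting is again an assumption.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer


def merged_features(texts, content_features, vectorizer=None):
    """Return (vectorizer, X) where X = [bag-of-words columns | content-feature columns]."""
    if vectorizer is None:
        # Training data: learn the n-gram vocabulary with the model's settings.
        vectorizer = TfidfVectorizer(ngram_range=(1, 3), min_df=3, max_df=0.96)
        bow = vectorizer.fit_transform(texts)
    else:
        # New data: reuse the vocabulary learned on the training data.
        bow = vectorizer.transform(texts)
    extra = csr_matrix(np.asarray(content_features, dtype=float))
    return vectorizer, hstack([bow, extra], format="csr")
```

The merged matrix can be passed straight to LinearSVC. Content features such as review length live on a much larger scale than TF-IDF weights, so scaling them first is usually advisable.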

Page 13: Spot deceptive TripAdvisor Reviews

3rd Try: Change Topology

Two supervised text classification models:

Positive-negative: based only on “bag of words”.

Deceptive-truthful: uses both “bag of words” and content-based features.
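A scikit-learn sketch of this topology, assuming the two models are trained independently and their outputs are combined into one of four labels. To keep the block self-contained, simple_content_features is a hypothetical stand-in that uses only punctuation count and word count (no POS tagging); all class and function names here are illustrative.

```python
import re

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler
from sklearn.svm import LinearSVC


def simple_content_features(texts):
    """Hypothetical stand-in for the content features: punctuation count and word count."""
    return np.array(
        [[len(re.findall(r"[^\w\s]", t)), len(t.split())] for t in texts],
        dtype=float,
    )


def bag_of_words():
    return TfidfVectorizer(ngram_range=(1, 3), min_df=3, max_df=0.96)


class TwoModelReviewClassifier:
    """Polarity from bag of words only; deception from bag of words + content features."""

    def __init__(self):
        self.polarity = Pipeline([("bow", bag_of_words()), ("svm", LinearSVC())])
        self.deception = Pipeline([
            ("features", FeatureUnion([
                ("bow", bag_of_words()),
                # Scale the dense content features so they do not swamp the TF-IDF weights.
                ("content", Pipeline([
                    ("extract", FunctionTransformer(simple_content_features)),
                    ("scale", StandardScaler()),
                ])),
            ])),
            ("svm", LinearSVC()),
        ])

    def fit(self, texts, polarity_labels, deception_labels):
        self.polarity.fit(texts, polarity_labels)
        self.deception.fit(texts, deception_labels)
        return self

    def predict(self, texts):
        # Combine both decisions into labels such as "positive deceptive".
        pol = self.polarity.predict(texts)
        dec = self.deception.predict(texts)
        return [f"{p} {d}" for p, d in zip(pol, dec)]
```

Splitting the task this way lets each model use only the features that suit its question, which is the design choice the slide describes.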

Page 14: Spot deceptive TripAdvisor Reviews

3rd Try: Change Topology - Result

Overall improvement of 7%!

Page 15: Spot deceptive TripAdvisor Reviews

Demo: Try it yourself

www.yousef.fadila.net/cs548

REST API: POST request to www.yousef.fadila.net/cs548/review_checker

Payload: {'review_text': text}

Sample response: {"result": "Likely Fake"}
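A minimal client sketch using only the Python standard library. The slide does not say whether the payload is sent as JSON or as form data, or whether the endpoint uses http or https; JSON over http is assumed here, and the demo server may no longer be online. build_request and check_review are hypothetical names.

```python
import json
import urllib.request

# Endpoint from the slide; the http scheme is an assumption.
ENDPOINT = "http://www.yousef.fadila.net/cs548/review_checker"


def build_request(review_text):
    """Build the POST request; sending the payload as JSON is an assumption."""
    body = json.dumps({"review_text": review_text}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def check_review(review_text, timeout=10):
    """Send a review to the demo service and return its verdict, e.g. "Likely Fake"."""
    with urllib.request.urlopen(build_request(review_text), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["result"]
```

Keeping request construction separate from sending makes the payload easy to inspect without a network call.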

Page 16: Spot deceptive TripAdvisor Reviews

Computers vs. Humans

Are computers better than humans at detecting deceptive reviews?

Survey of WPI students

74 WPI students responded

Students were given 5 positive reviews and asked to decide whether each one was truthful or deceptive.

The list intentionally includes reviews that were misclassified by the model from the 1st experiment.

Page 22: Spot deceptive TripAdvisor Reviews

Computers vs. Humans - Result

This is neither a scientific nor a statistical study!

This is only a game! In fact, it is an unfair game, because the reviews come from the same dataset the model was trained on!

The purpose of the game is to show whether humans' truth bias (assuming that what they are reading is true until they find evidence to the contrary) could affect their ability to spot deceptive reviews.

Review    Computers    Humans
1         1            1
2         1            0
3         0            0
4         1            1
5         1            1
Total     4            3

(1 = classified correctly, 0 = classified incorrectly)
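The totals row is simply the per-review sum. Reading 1 as a correct answer and 0 as a wrong one, the scores and accuracies follow directly from the table; score is a hypothetical helper.

```python
# Per-review results from the table: 1 = answered correctly, 0 = answered incorrectly.
computers = [1, 1, 0, 1, 1]
humans = [1, 0, 0, 1, 1]


def score(results):
    """Return (number correct, accuracy) for a list of 0/1 results."""
    correct = sum(results)
    return correct, correct / len(results)


for name, results in (("Computers", computers), ("Humans", humans)):
    correct, accuracy = score(results)
    print(f"{name}: {correct}/{len(results)} correct ({accuracy:.0%})")
```

This prints 4/5 (80%) for the computer and 3/5 (60%) for the humans, on only five reviews.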

Page 23: Spot deceptive TripAdvisor Reviews

Any Questions?