Wide-Ranging Review Manipulation...

Department of Computer Science and Engineering, Texas A&M University

CIKM 2019

Parisa Kaghazgaran, Majid Alfifi, James Caverlee

Wide-Ranging Review Manipulation Attacks: Model, Empirical Study and Countermeasure

User Reviews are Everywhere

• Online Retailers

• Business Review Forums

• Media Platforms

… and so are targets of manipulation.


Amazon Headphones, 2019


Crowd-based Manipulation Campaigns

Crowdsourcing websites

✓ Read the product description before writing down a review.

✓ Go to https://goo.gl/7QfW0h

✓ Leave a relevant 5-star review with at least 40 words.

✓ Provide the proof that you left the review yourself.

Target Review Platforms


Realistic Reviews: Written by Humans

Typical detection approaches:

• Reviews arrive synchronized in time, e.g., Mukherjee et al. 2013, Akoglu et al. 2015, Kaghazgaran et al. 2019

• Dense community over co-review graph, e.g., Shin et al. 2017, Kaghazgaran et al. 2018

Manipulators can launch scalable, difficult-to-detect attacks by removing these manipulation traces.
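The two crowd-campaign signals named on this slide (reviews synchronized in time, and a dense community in the co-review graph) can be illustrated with a minimal sketch. It is not the method of any of the cited papers: the toy data, the function names `max_burst`, `co_review_graph`, and `density`, and the thresholds are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: (reviewer, product, day the review was posted).
REVIEWS = [
    ("u1", "p1", 10), ("u2", "p1", 10), ("u3", "p1", 11),
    ("u1", "p2", 40), ("u2", "p2", 41), ("u3", "p2", 41),
    ("u4", "p1", 90), ("u5", "p3", 15),
]

def max_burst(reviews, product, window_days=3):
    """Largest number of reviews for `product` within any span shorter than `window_days` days."""
    days = sorted(day for _, prod, day in reviews if prod == product)
    best, start = 0, 0
    for end in range(len(days)):
        # Move the left edge until the window spans fewer than window_days days.
        while days[end] - days[start] >= window_days:
            start += 1
        best = max(best, end - start + 1)
    return best

def co_review_graph(reviews):
    """Map each reviewer pair to the number of products both of them reviewed."""
    reviewers_of = defaultdict(set)
    for user, product, _ in reviews:
        reviewers_of[product].add(user)
    edges = defaultdict(int)
    for users in reviewers_of.values():
        for a, b in combinations(sorted(users), 2):
            edges[(a, b)] += 1
    return dict(edges)

def density(edges, group, min_shared=2):
    """Fraction of reviewer pairs in `group` sharing at least `min_shared` products."""
    pairs = list(combinations(sorted(group), 2))
    strong = sum(1 for pair in pairs if edges.get(pair, 0) >= min_shared)
    return strong / len(pairs) if pairs else 0.0

edges = co_review_graph(REVIEWS)
print(max_burst(REVIEWS, "p1"), density(edges, {"u1", "u2", "u3"}))
```

In this toy data, reviewers u1, u2, and u3 post on the same products within days of each other, so both signals fire for them. A generator that spreads reviews over time and across unrelated accounts would leave neither trace, which is the concern the slide raises.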


Crowd vs. Machine


[Diagram: Crowdsourcing websites, Crowdworkers, Review Platforms]

Attack Strengths

• Scalability: do not rely on paying workers.

• Increased deception: obfuscate signals left by crowd campaigns.


How Does a Machine Produce Human-Readable Reviews?

Seminal work: Yao, Yuanshun, et al. "Automated crowdturfing attacks and defenses in online review systems." CCS, ACM, 2017.

Leverages neural language models for the specific domain of restaurant reviews on Yelp.

• Domain-dependent reviews, i.e., the training process must be replicated for every domain and needs large training data for each one.

• Character-level language model, i.e., it has to capture longer dependencies and learn spelling in addition to semantics, and it is more prone to grammatical errors.

Our Proposed DIOR Framework

1. We propose DIOR for Domain Independent Online Review Generation.

2. Empirical study to evaluate the quality of machine-generated reviews.

3. Embedding-based classifier to detect such fake reviews.


Goal: validate the wide-ranging attacks on review platforms and propose to detect them.


Neural Language Models (A quick refresher)

• Recurrent Neural Networks have shown success in generating meaningful text.

• They learn from a sequence of words to predict the next word

• $x_t$: each word in the review as input

• $H_t$: learns the information from the sequence up to time-step $t$

• $o_t$: predicts the next word in the review, $P(x_{t+1} \mid x_1, \dots, x_t)$

[Figure: RNN unrolled over time-steps 0 to 3, with inputs x0 to x3, hidden states H0 to H3, and outputs o0 to o3, on the example "I ate at this restaurant".]

In the generation step, the predicted word at time-step t is fed back to the model as input, along with the hidden state, to predict the next word.
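The generation loop described on this slide can be sketched with a tiny recurrent network in plain Python. This is a hedged illustration only: the weights are random and untrained, so the output is gibberish, and the vocabulary, the hidden size, and the names `step` and `generate` are assumptions rather than anything taken from the talk.

```python
import math
import random

VOCAB = ["<start>", "i", "ate", "at", "this", "restaurant", "."]
HIDDEN = 8  # hypothetical hidden-state size

rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

# Untrained, randomly initialized weights (illustrative only).
W_xh = rand_matrix(HIDDEN, len(VOCAB))  # input word -> hidden
W_hh = rand_matrix(HIDDEN, HIDDEN)      # previous hidden -> hidden
W_ho = rand_matrix(len(VOCAB), HIDDEN)  # hidden -> output scores

def step(word_id, h_prev):
    """One time-step: consume x_t and H_{t-1}; return H_t and P(x_{t+1} | x_1..x_t)."""
    # A one-hot input times W_xh is just the column for word_id.
    h = [
        math.tanh(W_xh[i][word_id] + sum(W_hh[i][j] * h_prev[j] for j in range(HIDDEN)))
        for i in range(HIDDEN)
    ]
    scores = [sum(W_ho[k][i] * h[i] for i in range(HIDDEN)) for k in range(len(VOCAB))]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # numerically stable softmax
    total = sum(exps)
    return h, [e / total for e in exps]

def generate(n_words, seed=1):
    """Sample n_words, feeding each predicted word back in as the next input."""
    sampler = random.Random(seed)
    h = [0.0] * HIDDEN
    word_id = VOCAB.index("<start>")
    words = []
    for _ in range(n_words):
        h, probs = step(word_id, h)
        word_id = sampler.choices(range(len(VOCAB)), weights=probs)[0]
        words.append(VOCAB[word_id])
    return words

print(" ".join(generate(6)))
```

The key point is the last two lines of the loop in `generate`: the sampled word becomes the next input, and the hidden state `h` carries the history forward.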


RQ1: Can we generate reviews across different domains?

I ’ve eaten here about 8 times . I ’ve been introduced to this place . Its always busy and their food is consistently great . I LOVE their food , hence the name . It is so clean , the staff is so friendly , and the food is great . I especially like the chicken pad thai , volcano roll , and the yellow curry .

the case works great ! it has a soft rubber insert that goes over the hard shell . The hard plastic shell has a soft inner shell and the hard case is hard plastic . It is very sticky and has not fallen out or dropped or fallen apart .

this app is a great tool for discovering new things : being able to search for films and putting reviews on particular items as well as having a way to download stories from the app .


Transfer Learning to the Rescue!

[Figure: language model architecture. Input word w_t → Encoder (word to id) → embedding of input word (400) → LSTM1 (hidden state ht1, 1150) → LSTM2 (hidden state ht2, 1150) → LSTM3 (hidden state ht3, 400, embedding of output word) → Decoder (id to word) → next word w_t+1.]

[Figure: transfer learning. Universal Model: Yelp, with universal model parameters θYelp. Transferred Model: Amazon and App Store, with transferred model parameters θAmazon and θAppStore.]
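The parameter flow in the transfer-learning diagram (train a universal model, then start the target-domain model from its parameters) can be shown with a deliberately tiny stand-in. The sketch below swaps the LSTM language model for a unigram softmax model so that it runs in plain Python. The corpora, vocabulary, learning rate, and step counts are hypothetical, and it makes no claim about how much transfer helps.

```python
import math
from collections import Counter

VOCAB = ["food", "great", "case", "phone", "app", "love"]

def empirical(tokens):
    """Empirical word distribution of a corpus over VOCAB."""
    counts = Counter(tokens)
    return [counts[w] / len(tokens) for w in VOCAB]

def softmax(theta):
    peak = max(theta)
    exps = [math.exp(t - peak) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]

def loss(theta, target):
    """Cross-entropy of the model softmax(theta) against the target distribution."""
    probs = softmax(theta)
    return -sum(q * math.log(p) for q, p in zip(target, probs) if q > 0)

def train(theta, target, steps, lr=0.5):
    """Gradient descent on the cross-entropy; returns a new parameter list."""
    theta = list(theta)  # copy, so the caller's parameters stay untouched
    for _ in range(steps):
        probs = softmax(theta)
        # The gradient of the cross-entropy with respect to the logits is (p - q).
        theta = [t - lr * (p - q) for t, p, q in zip(theta, probs, target)]
    return theta

# Hypothetical toy corpora standing in for the source and target domains.
yelp = empirical(["food", "great", "food", "love", "great", "food"])
amazon = empirical(["case", "great", "phone", "case", "love", "great"])

theta_yelp = train([0.0] * len(VOCAB), yelp, steps=200)  # universal model
theta_amazon = train(theta_yelp, amazon, steps=20)       # transferred model

print(round(loss(theta_yelp, amazon), 3), round(loss(theta_amazon, amazon), 3))
```

Here `theta_amazon` is initialized from `theta_yelp` rather than from scratch, mirroring the θYelp to θAmazon arrow in the diagram; the same call with an App Store corpus would give the θAppStore analogue.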


Example of Synthetic Reviews

I ’ve eaten here about 8 times . I ’ve been introduced to this place . Its always busy and their food is consistently great . I LOVE their food , hence the name . It is so clean , the staff is so friendly , and the food is great . I especially like the chicken pad thai , volcano roll , and the yellow curry .

this is a nice case . It ’s a little difficult to remove , but that ’s to be expected . The case is slightly thicker than a regular screen protector , but that is to be expected . It ’s a great phone case and I highly recommend it .

this app is great for learning the basics of math ! I love that it has a different function that can help you learn the words that you understand . I wish all apps were this simple .

[Figure: percentage of reviews labeled "Real" (y-axis) versus sampling temperature (x-axis, 0.2 to 1.0), with one panel each for Yelp, Amazon, and App Store.]


RQ2: Can Model-generated Reviews Pass a Human Test?

Takeaway 1: Reviews generated at temperature 0.8 can fool human readers and go undetected.

Takeaway 2: Human readers are more sensitive to repetition errors than they are to small grammar mistakes.
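For readers unfamiliar with the temperature setting, here is a minimal sketch of how it is commonly applied when sampling the next word: each probability is raised to the power 1/T and the result is renormalized. The four-word distribution and the helper names are hypothetical; the talk does not show its sampling code.

```python
import math
import random

def apply_temperature(probs, temperature):
    """Rescale a next-word distribution: p_i^(1/T), renormalized."""
    logits = [math.log(p) / temperature for p in probs]
    peak = max(logits)
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample(words, probs, temperature, rng):
    """Draw one word from the temperature-adjusted distribution."""
    return rng.choices(words, weights=apply_temperature(probs, temperature))[0]

# Hypothetical next-word distribution after the prefix "the food is".
WORDS = ["great", "good", "okay", "volcano"]
PROBS = [0.6, 0.25, 0.1, 0.05]

rng = random.Random(0)
for t in (0.2, 0.8, 1.0):
    adjusted = [round(p, 3) for p in apply_temperature(PROBS, t)]
    print(t, adjusted, sample(WORDS, PROBS, t, rng))
```

Low temperatures concentrate almost all probability on the most likely word, which tends to produce repetitive text, while a temperature of 1.0 leaves the model's distribution unchanged and lets rarer words through.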

AMT Guidelines

• 95% approval rate

• Dwell time >= 7 minutes

• Located in US

• Ask a trivial question


RQ3: Can Spam Detector Catch Model-generated Reviews?

Takeaway: The text-based spam detector cannot reliably distinguish synthetic reviews from real reviews.

                Yelp   Amazon   App Store
Accuracy (%)    64     61       62
Precision (%)   65     64       62
Recall (%)      65     61       62
F1 score (%)    65     60       62
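As a reminder of how the four reported metrics are defined, the sketch below computes them for a binary real-versus-fake classifier. The labels and predictions are hypothetical and are not the data behind the table; the talk also does not say whether its figures are averaged over both classes, so this covers only the single-class definitions.

```python
def binary_metrics(y_true, y_pred, positive="fake"):
    """Accuracy, precision, recall, and F1 with `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical ground truth and detector output for eight reviews.
Y_TRUE = ["fake", "fake", "fake", "fake", "real", "real", "real", "real"]
Y_PRED = ["fake", "fake", "fake", "real", "fake", "real", "real", "real"]

metrics = binary_metrics(Y_TRUE, Y_PRED)
print(metrics)
```

On a balanced two-class test set, a detector that guesses at random scores about 50% accuracy, which is the natural baseline for reading the table.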

Textual Features

• Similarity

• Structural

• Syntactic

• Semantic


RQ4: How Does DIOR Compare with Crowd Manipulators?

Takeaway: Users find reviews generated by DIOR as reliable as fake reviews written by manipulation campaigns.

DIOR: 31%

Neither: 37%

Crowd: 32%


RQ5: How Does DIOR Compare with Individual Models?

Takeaway: Using transfer learning not only facilitates the domain shift but also improves the performance significantly.

[Figure: preference (%) for the transferred model, the individual model, or both, with one panel each for Amazon and App Store.]


RQ6: How Much Training Data Does the Transferred Model Need?

Takeaway: The transferred models need relatively few training samples, compared with the universal model, to reach stable performance.

[Figure: validation loss (y-axis, 3 to 3.6) versus training size (x-axis, 25k to 200k) for the App Store and Amazon transferred models.]


RQ7: How Can We Detect Model-generated Reviews?

Takeaway: Model-generated reviews are detectable in the embedding space with high accuracy.

• Embedding-based classifier
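The slide does not spell out the classifier, so the following is only a sketch of the general idea that generated reviews occupy their own region of the embedding space. It uses a nearest-centroid rule with cosine similarity, which is a swapped-in technique and not the authors' discriminator; the 3-dimensional embeddings and all names are hypothetical.

```python
import math

def mean_vector(vectors):
    """Component-wise mean of equally sized vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def fit_centroids(embeddings, labels):
    """Average the review embeddings of each class into one centroid per label."""
    by_label = {}
    for vec, label in zip(embeddings, labels):
        by_label.setdefault(label, []).append(vec)
    return {label: mean_vector(vecs) for label, vecs in by_label.items()}

def predict(centroids, vec):
    """Assign the label whose centroid is most similar to the review embedding."""
    return max(centroids, key=lambda label: cosine(centroids[label], vec))

# Hypothetical 3-dimensional review embeddings (real ones would be much larger).
TRAIN = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.1, 0.9, 0.2], [0.0, 0.8, 0.3]]
LABELS = ["generated", "generated", "real", "real"]

centroids = fit_centroids(TRAIN, LABELS)
print(predict(centroids, [0.7, 0.3, 0.0]), predict(centroids, [0.1, 0.7, 0.2]))
```

In practice each review embedding would come from a trained model, for example by averaging word vectors, and a stronger classifier could replace the centroid rule.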

Conclusion and Next Steps


Explored how transfer learning could enable wide-ranging review manipulation attacks.

Proposed DIOR framework demonstrates:

(1) Model-generated reviews can be perceived as real by human examiners, pass traditional text-based spam detectors, and rival crowd-based review manipulators.

(2) Fake reviews tend to cluster together in the embedding space, which provides the intuition for our proposed discriminator.

Next steps: study the performance of other neural network architectures to develop a more powerful discriminator.


http://people.tamu.edu/~kaghazgaran/
