Understanding, Characterizing, and Detecting Facebook Like Farms

Understanding, Characterizing, and Detecting Facebook Like Farms Cambridge, 22 March 2016 Emiliano De Cristofaro https://emilianodc.com


Page 1

Understanding, Characterizing, and Detecting Facebook Like Farms

Cambridge, 22 March 2016

Emiliano De Cristofaro
https://emilianodc.com

Page 2

Facebook and Ads

1 billion users, $3 billion ad revenue

Brands create “page” to engage customers

FC Barcelona (90M+ likes), Shakira (100M+)

40M+ small businesses with active pages, 2M of them use ads to promote

Value of a “like” highly debatable...

From $214.81 (Blackbaud) to $8 (ChompOn)

http://valueofalike.com/

Page 3

How can I get “likes”?

Facebook page ads

Cost-per-click or cost-per-impression

Variable price and amount

Like farms, including:
boostlikes.com
socialformula.com
authenticlikes.com
mammothsocials.com

Reports that “farmers” like a lot of other pages too...

Page 4

Calls for a measurement study...

Created 13 Facebook honeypot pages

“Virtual Electricity”

Description: “this is not a real page, please do not like it”

Two promotion methods
1. Like farms
2. Facebook ads

Anonymized data collection, ethical approval, …

E. De Cristofaro, A. Friedman, G. Jourjon, M.A. Kaafar, M. Zubair Shafiq. Paying for Likes? Understanding Facebook Like Fraud Using Honeypots. ACM IMC 2014.

Page 5

#   Provider        Location   Budget  Duration  #Likes
1   Facebook        USA        $6/day  15 days   32
2   Facebook        France     $6/day  15 days   44
3   Facebook        India      $6/day  15 days   518
4   Facebook        Egypt      $6/day  15 days   691
5   Facebook        Worldwide  $6/day  15 days   484
6   BoostLikes      Worldwide  $70     15 days   -
7   BoostLikes      USA        $190    15 days   624
8   SocialFormula   Worldwide  $14     3 days    984
9   SocialFormula   USA        $70     3 days    738
10  AuthenticLikes  Worldwide  $50     3-5 days  755
11  AuthenticLikes  USA        $60     3-5 days  1038
12  MammothSocials  Worldwide  $20     -         -
13  MammothSocials  USA        $95     5 days    317
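To compare campaigns in the table, one rough measure is the average price paid per like. The sketch below is only illustrative: it assumes the Facebook ad campaigns spent their full $6/day budget over all 15 days ($90 in total), which the slide does not state, and the function name `cost_per_like` is made up for this example.

```python
# Selected rows from the campaign table: (provider, location, total spend in USD, likes).
# Assumption: Facebook ad campaigns spent the full $6/day for 15 days, i.e. $90.
campaigns = [
    ("Facebook", "USA", 6 * 15, 32),
    ("Facebook", "India", 6 * 15, 518),
    ("BoostLikes", "USA", 190, 624),
    ("SocialFormula", "Worldwide", 14, 984),
    ("AuthenticLikes", "USA", 60, 1038),
    ("MammothSocials", "USA", 95, 317),
]


def cost_per_like(spend_usd: float, likes: int) -> float:
    """Return the average price paid per like for one campaign."""
    if likes <= 0:
        raise ValueError("campaign produced no likes")
    return spend_usd / likes


for provider, location, spend, likes in campaigns:
    price = cost_per_like(spend, likes)
    print(f"{provider:15s} {location:10s} ${price:.3f} per like")
```

Campaigns that delivered no likes (rows 6 and 12) are left out, since a per-like price is undefined for them.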

Page 6

Temporal Analysis

SocialFormula campaign acquires likes in a short time window

Bot-operated (“lock-step” behavior)

BoostLikes campaign acquires likes gradually

Manual process or deliberately slow to avoid suspicion
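One way to quantify the difference between bursty and gradual campaigns is the share of all likes that falls into the single densest time window. This is a minimal sketch, not the analysis used in the study; the function name, the one-hour window, and the timestamps are all hypothetical.

```python
from bisect import bisect_right


def max_window_fraction(timestamps, window_seconds):
    """Fraction of all likes that fall into the densest window of the given length."""
    if not timestamps:
        return 0.0
    ts = sorted(timestamps)
    best = 0
    for i, start in enumerate(ts):
        # Index of the first timestamp that lies beyond [start, start + window_seconds].
        j = bisect_right(ts, start + window_seconds)
        best = max(best, j - i)
    return best / len(ts)


# Hypothetical like timestamps in seconds since campaign start (not real data).
bursty = [10, 20, 25, 40, 55, 60, 500000]
gradual = [0, 90000, 180000, 270000, 360000, 450000, 540000]

print(max_window_fraction(bursty, 3600))   # most likes arrive within one hour
print(max_window_fraction(gradual, 3600))  # likes are spread over several days
```

A value close to 1 suggests a burst of the kind described for SocialFormula, while a low value matches the gradual pattern described for BoostLikes.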

Page 7

Location Analysis

SocialFormula likes from Turkey

AuthenticLikes spread out across many countries

Page 8

Social Graph Analysis

AuthenticLikes and MammothSocials have some common users

BoostLikes likers are well-connected

Page 9

Like Analysis

Like farm profiles like a lot of pages (median 1-2K)

Exception: BoostLikes worldwide campaign

Facebook campaign likers also like a lot of pages (median 800-1200)

Page 10

Like Analysis

Likers tend to like similar pages

Many likers like popular pages
Football stars
Mobile phones
Tech companies
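A simple way to express "likers tend to like similar pages" is the Jaccard similarity between the sets of pages two profiles have liked. The slide does not say which similarity measure was used, so this is only an illustrative choice, and the profiles below are invented.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection divided by size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical sets of liked pages for three profiles (not real data).
liker_a = {"football_star", "phone_brand", "tech_company", "honeypot"}
liker_b = {"football_star", "phone_brand", "tech_company", "music_band"}
organic = {"local_bakery", "hiking_club", "tech_company"}

print(jaccard(liker_a, liker_b))  # high overlap between the two likers
print(jaccard(liker_a, organic))  # little overlap with the organic profile
```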

Page 11

Two Main Modi Operandi

Some farms seem to be operated by bots and do not try to hide

Bursts of activity, few friends

Some are stealthier

Mimic real behavior, well-connected network structure

Page 12

Facebook Detection

Revisited liker accounts after 1 month

Accounts may have been terminated by the user or by Facebook

A small fraction of liker accounts terminated

#   Provider  Location   #Likes  #Closed
1   Facebook  USA        32      0
2   Facebook  France     44      0
3   Facebook  India      518     2
4   Facebook  Egypt      691     6
5   Facebook  Worldwide  484     3
6   BL        Worldwide  -       -
7   BL        USA        624     1
8   SF        Worldwide  984     11
9   SF        USA        738     9
10  AL        Worldwide  755     8
11  AL        USA        1038    36
12  MS        Worldwide  -       -
13  MS        USA        317     9
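The "small fraction" can be checked directly from the #Likes and #Closed columns. The sketch below aggregates the table rows by promotion method; the grouping into "ads" and "farms" and the function name are choices made for this example.

```python
# (likes, closed) pairs copied from the table; campaigns with no likes are omitted.
facebook_ads = [(32, 0), (44, 0), (518, 2), (691, 6), (484, 3)]
like_farms = [(624, 1), (984, 11), (738, 9), (755, 8), (1038, 36), (317, 9)]


def closure_rate(rows):
    """Share of liker accounts found closed, aggregated over several campaigns."""
    total_likes = sum(likes for likes, _ in rows)
    total_closed = sum(closed for _, closed in rows)
    return total_closed / total_likes


print(f"Facebook ads: {closure_rate(facebook_ads):.2%}")
print(f"Like farms:   {closure_rate(like_farms):.2%}")
```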

Page 13

Detecting Fake Likes?

Temporal burst of likes

[WWW’13, KDD’14] – CopyCatch algorithm

Cluster based on similar actions

[CCS’14] – SynchroTrap algorithm

Like distributions (spatial)

[USENIX’14] – PCA anomaly detection

Facebook essentially applies graph co-clustering fraud detection (SynchroTrap, CopyCatch)
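The idea behind clustering on similar actions can be illustrated with a much-simplified heuristic: bucket each like by page and time slot, link users whose bucketed action sets overlap strongly, and report the connected groups. This is not the CopyCatch or SynchroTrap algorithm, only a sketch in the same spirit; the bucket size, the threshold, and the sample data are hypothetical.

```python
from itertools import combinations


def lockstep_groups(likes, bucket_seconds=3600, min_similarity=0.5):
    """Group users whose (page, time bucket) action sets have high Jaccard similarity.

    likes: iterable of (user, page, timestamp_in_seconds) tuples.
    Returns a list of user sets, one per group with at least two members.
    """
    actions = {}
    for user, page, ts in likes:
        actions.setdefault(user, set()).add((page, ts // bucket_seconds))

    # Union-find structure to merge users linked by a similarity edge.
    parent = {user: user for user in actions}

    def find(user):
        while parent[user] != user:
            parent[user] = parent[parent[user]]
            user = parent[user]
        return user

    for u, v in combinations(actions, 2):
        a, b = actions[u], actions[v]
        if len(a & b) / len(a | b) >= min_similarity:
            parent[find(u)] = find(v)

    groups = {}
    for user in actions:
        groups.setdefault(find(user), set()).add(user)
    return [members for members in groups.values() if len(members) > 1]


# Hypothetical like events (not real data): u1 and u2 act in lock-step, u3 does not.
sample = [
    ("u1", "p1", 100), ("u1", "p2", 200),
    ("u2", "p1", 150), ("u2", "p2", 260),
    ("u3", "p1", 90000), ("u3", "p9", 5),
]
print(lockstep_groups(sample))
```

The pairwise comparison is quadratic in the number of users, which is acceptable for a toy example but not for a production-scale system.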

Page 14

Efficacy of Co-Clustering?

Pretty sure it works well on non-stealthy, but what about the stealthy farms?

Let’s measure this stuff up (again!)

Well, first we need to re-crawl

M. Ikram, L. Onwuzurike, S. Farooqi, E. De Cristofaro, A. Friedman, G. Jourjon, M.A. Kaafar, M. Zubair Shafiq. Combating Fraud in Online Social Networks: Characterizing and Detecting Facebook Like Farms. Available from http://arxiv.org/abs/1506.00506

Page 15

New Dataset

Campaign  #Users  #Pages Liked  #Unique  #Posts
BL-USA    583     79,025        37,283   44,566
SF-ALL    870     879,369       108,020  46,394
SF-USA    653     340,964       75,404   38,999
AL-ALL    707     162,686       46,230   61,575
AL-USA    827     441,187       141,214  30,715
MS-USA    259     412,258       141,262  12,280
Baseline  1,408   79,247        57,384   34,903

Page 16

Co-Clustering Results

Campaign  TP   FP   TN   FN  Precision  Recall  F1
AL-USA    681  9    569  4   98%        99%     99%
AL-ALL    448  53   527  1   89%        99%     94%
BL-USA    523  588  18   0   47%        100%    64%
SF-USA    428  67   512  1   86%        100%    94%
SF-ALL    431  48   530  2   90%        99%     95%
MS-USA    201  22   549  2   90%        99%     93%
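The precision, recall, and F1 columns follow from the TP, FP, and FN counts using the standard definitions. A short sketch that recomputes them for the BL-USA row (the function name is chosen for this example):

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Standard precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# BL-USA row from the table: TP=523, FP=588, FN=0.
p, r, f1 = precision_recall_f1(523, 588, 0)
print(f"precision={p:.0%} recall={r:.0%} F1={f1:.0%}")
```

For BL-USA the recall is perfect, but more than half of the flagged accounts are false positives, which is what pulls precision and F1 down.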

Page 17
Page 18

But... BL-USA

Page 19

What if we use lexical features?

Posts and comments from timelines

Term frequency-inverse document frequency (TF-IDF)

Using TF-IDF features, train SVM

Not the best!

Campaign  Total Users  Training Set  Testing Set  TP   FP  TN   FN   Precision  Recall  Accuracy  F1
AL-USA    827          661           204          103  9   229  101  92%        50%     75%       65%
AL-ALL    707          566           141          101  1   237  40   99%        72%     89%       83%
BL-USA    583          468           115          78   1   237  37   99%        68%     89%       80%
SF-USA    652          522           130          83   0   238  47   100%       89%     73%       84%
SF-ALL    870          697           173          128  3   235  45   88%        98%     74%       84%
MS-USA    259          210           49           32   5   233  17   86%        65%     92%       74%
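To make the TF-IDF step concrete, here is a minimal standard-library sketch that turns a few texts into TF-IDF weights. It assumes whitespace tokenization and the plain `tf * log(N / df)` weighting; the slides do not specify which variant or toolkit was used, and the sample texts are invented. Training the SVM on the resulting vectors is omitted.

```python
import math
from collections import Counter


def tfidf(documents):
    """Return one {term: weight} dict per document (term frequency times log IDF)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical timeline texts, one string per user (not real data).
timelines = ["like my page", "like this page now", "football news today"]
for vector in tfidf(timelines):
    print(vector)
```

Terms that occur on every timeline get weight zero, while terms specific to a few timelines get the highest weights, which is what makes the features useful for a classifier.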

Page 20

How can we characterize timeline features?

Idea: let’s look at users’ interaction with posts and their lexical features

First, we look at types of posts on timelines

Page 21

#comments per post

Page 22

#likes per post

Page 23

#shared posts

Page 24

#words per post

Page 25

Lexical Analysis

Campaign  Avg #Chars  Avg #Words  Avg #Sentences  Avg Sentence Length  Avg Word Length  Richness  ARI   Flesch Score
Baseline  4,477       780         67              6.9                  17.6             0.7       20.2  55.1
AL-ALL    2,835       464         32              6.2                  13.9             0.59      14.8  43.6
AL-USA    2,475       394         33              6.2                  12.7             0.49      14.1  54
BL-USA    7,356       1,330       63              5.7                  22.8             0.58      16.9  51.5
MS-USA    6,227       1,047       66              6.1                  17.8             0.53      16.2  50.1
SF-ALL    1,438       227         19              6.3                  11.7             0.58      14.1  45.2
SF-USA    1,637       259         22              6.3                  12               0.55      14.4  45.6
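Several of these columns can be computed from raw text with a few lines of code. The sketch below uses the standard Automated Readability Index formula, ARI = 4.71 (characters/words) + 0.5 (words/sentences) - 21.43, and assumes that "richness" means the ratio of distinct words to total words, which the slide does not define. The tokenization rules are simplistic assumptions, and the Flesch score is left out because it requires syllable counting.

```python
import re


def lexical_stats(text: str) -> dict:
    """Compute simple lexical statistics for one piece of text."""
    # Assumption: sentences end at '.', '!' or '?'; words are runs of letters, digits, apostrophes.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z0-9']+", text)
    if not words:
        raise ValueError("text contains no words")
    chars = sum(len(word) for word in words)
    words_per_sentence = len(words) / len(sentences)
    chars_per_word = chars / len(words)
    return {
        "chars": chars,
        "words": len(words),
        "sentences": len(sentences),
        "avg_sentence_length": words_per_sentence,
        "avg_word_length": chars_per_word,
        # Assumed definition: distinct words divided by total words.
        "richness": len({word.lower() for word in words}) / len(words),
        "ari": 4.71 * chars_per_word + 0.5 * words_per_sentence - 21.43,
    }


# Hypothetical timeline text (not real data).
stats = lexical_stats("I like this page. I like that page too!")
print(stats)
```

Very short or repetitive texts produce low or even negative ARI values, so scores are mainly meaningful when computed over longer timelines.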

Page 26

Page 27

Conclusion

Like farms are widespread

Moderate fraud impact in isolation, but unclear how much fraudsters mess with advertising platform

Some are easy to spot, some less

Future/ongoing work:
1. Reputation Manipulation
2. Measuring Page Engagement
3. Understanding Farm Ecosystem