Twitter Content-based Spam Filtering - CISIS 2013

Igor Santos Igor Miñambres-Marcos Carlos Laorden Patxi Galán-García Aitor Santamaría-Ibirika Pablo G. Bringas

Presentation at CISIS 2013 International conference of the paper: Twitter Content-based Spam Filtering

Transcript of Twitter Content-based Spam Filtering - CISIS 2013

Page 1: Twitter Content-based Spam Filtering - CISIS 2013

Page 27: Twitter Content-based Spam Filtering - CISIS 2013

Detecting spammer accounts

Content-based analysis

Page 30: Twitter Content-based Spam Filtering - CISIS 2013

[Figure: two message classes, spam (TweetSpike) and ham (Legitimate)]

Page 32: Twitter Content-based Spam Filtering - CISIS 2013

[Figure: diagram with nodes labeled t1 to t3 and m1 to m11]

Page 35: Twitter Content-based Spam Filtering - CISIS 2013

[Figure labels: legitimate, spam]

Page 36: Twitter Content-based Spam Filtering - CISIS 2013

[Figure labels: legitimate, spam, testing, probability]

Page 37: Twitter Content-based Spam Filtering - CISIS 2013

Dynamic Markov Compression (DMC)

Prediction by Partial Matching (PPM)
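
The slide names the two compression models, but the transcript does not show how they are used to classify a tweet. The following is a minimal sketch of the general idea behind compression-based filtering, under stated assumptions: one model is trained per class, and a message is assigned to the class whose model would encode it in fewer bits. The `CharMarkovModel` class is a hypothetical, simplified order-2 character model with add-one smoothing. It stands in for DMC and PPM and is neither of those algorithms, nor the authors' WEKA library. The training messages are invented for illustration.

```python
import math
from collections import defaultdict


class CharMarkovModel:
    """Order-k character model with add-one smoothing.

    Hypothetical stand-in for a compression model such as DMC or PPM.
    """

    def __init__(self, order=2, alphabet_size=256):
        self.order = order
        self.alphabet_size = alphabet_size
        self.counts = defaultdict(lambda: defaultdict(int))
        self.totals = defaultdict(int)

    def _pairs(self, text):
        # Yield (context, next_char) pairs, padding the start with spaces.
        padded = " " * self.order + text
        for i in range(self.order, len(padded)):
            yield padded[i - self.order:i], padded[i]

    def train(self, text):
        for context, char in self._pairs(text):
            self.counts[context][char] += 1
            self.totals[context] += 1

    def bits(self, text):
        # Estimated code length of `text` in bits. The model is not
        # updated here, which loosely mirrors a "without adaptation" setup.
        total = 0.0
        for context, char in self._pairs(text):
            seen = self.counts.get(context, {}).get(char, 0)
            n = self.totals.get(context, 0)
            total -= math.log2((seen + 1) / (n + self.alphabet_size))
        return total


def classify(message, models):
    # Choose the class whose model gives the shortest code length.
    return min(models, key=lambda label: models[label].bits(message))


# Invented toy training data, for illustration only.
training = {
    "spam": ["win free followers now click here",
             "free followers click this link now"],
    "ham": ["having coffee with friends this morning",
            "great talk at the conference today"],
}

models = {}
for label, messages in training.items():
    model = CharMarkovModel()
    for text in messages:
        model.train(text)
    models[label] = model

print(classify("free followers now", models))
print(classify("coffee with friends", models))
```

With this toy data the first message is assigned to the spam class and the second to the ham class, because each shares most of its character trigrams with one training set only.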

Page 39: Twitter Content-based Spam Filtering - CISIS 2013

Classifier                               Acc. (%)  Sp    Sr    F-Measure  AUC
Random Forest N=50                       96.42     0.98  0.94  0.96       0.99
DMC without Adaptation                   95.99     0.96  0.95  0.96       0.99
Random Forest N=10                       95.96     0.97  0.94  0.95       0.99
PPM without Adaptation                   94.80     0.97  0.91  0.94       0.99
Naive Bayes Multinomial Word Frequency   94.94     0.95  0.93  0.94       0.98
Bayes K2                                 94.12     0.99  0.88  0.93       0.98
DMC with Adaptation                      93.11     0.94  0.90  0.92       0.98
C4.5                                     95.79     0.98  0.92  0.95       0.97
KNN K=3                                  93.71     0.97  0.89  0.93       0.97
SVM PVK                                  95.81     0.97  0.93  0.95       0.96
PPM with Adaptation                      76.50     0.78  0.69  0.72       0.86
Naive Bayes                              72.72     0.64  0.89  0.75       0.76
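
The table reports accuracy, Sp, Sr, F-Measure and AUC without defining them. Assuming Sp and Sr denote spam precision and spam recall (an assumption that agrees with the F-Measure column, since 2 × 0.98 × 0.94 / (0.98 + 0.94) ≈ 0.96 for Random Forest N=50), the sketch below shows how such figures could be derived from confusion-matrix counts. The function names and the counts are hypothetical and are not taken from the paper. AUC is left out because it needs ranked scores, not counts.

```python
def f_measure(precision, recall):
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


def spam_metrics(tp, fp, tn, fn):
    """Compute table-style metrics from confusion-matrix counts.

    tp: spam classified as spam        fp: legitimate classified as spam
    tn: legitimate classified as such  fn: spam classified as legitimate
    """
    precision = tp / (tp + fp)  # assumed meaning of "Sp"
    recall = tp / (tp + fn)     # assumed meaning of "Sr"
    return {
        "accuracy_percent": 100.0 * (tp + tn) / (tp + fp + tn + fn),
        "Sp": precision,
        "Sr": recall,
        "F-Measure": f_measure(precision, recall),
    }


# Invented counts, for illustration only.
metrics = spam_metrics(tp=90, fp=10, tn=890, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")

# Consistency check against the first table row (Sp=0.98, Sr=0.94).
print(round(f_measure(0.98, 0.94), 2))
```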

Page 41: Twitter Content-based Spam Filtering - CISIS 2013

A new public dataset of Twitter spam to serve as evaluation data

Adaptation of content-based spam filtering to Twitter

A new compression-based text filtering library for the ML tool WEKA

Page 42: Twitter Content-based Spam Filtering - CISIS 2013

Enhance this approach using social network features

Improve semantic capabilities by studying linguistic relationships
