Page 1: Web Spambot Detection  Based on  Web Navigation Behaviour

Web Spambot Detection Based on Web Navigation Behaviour

Pedram Hayati, Vidyasagar Potdar, Kevin Chai, Alex Talevski

Anti-Spam Research Lab (ASRL)

Digital Ecosystem and Business Intelligence Institute

Curtin University, Perth, Western Australia


www.AntiSpamResearchLab.com

Introduction

• Junk, unrelated, unwelcome, anonymous content is spam.

• Spam now spreads not only through email but also through Web 2.0 applications.

• This new trend of spamming is called Spam 2.0.


Examples of Spam 2.0

• Hosting Spam content in Web applications on legitimate websites¹.

¹ P. Hayati, V. Potdar, A. Talevski, N. Firoozeh, S. Sarenche, E. A. Yeganeh. Spam 2.0 Definition, New Spamming Boom. DEST 2010, Dubai, UAE, April 2010.


Web SpamBot

• A tool used by spammers to distribute Spam 2.0.

• Uses the idea of Web robots.

• Mimics human user behaviour.

• Wastes useful resources.

To counter Spam 2.0, we can concentrate on detecting Web spambots, since they are the source of the Spam 2.0 problem.


Spam 2.0


Countermeasures

• Mostly focused on email spam detection.

• Content-based and meta-content-based.

• Some are applicable to the Web environment, such as link-based detection.

• CAPTCHA
– Possible to bypass using ML.
– Machines can be better than humans at deciphering them.
– Inconveniences human users.


Problem

• Not suitable for Web 2.0 platforms
– Spam is hosted on legitimate websites.
– Parasitic nature.
– We cannot blacklist a whole website because of spam posts.


Our Solution

• Study Web spambot behaviour in order to stop spam 2.0.

• Fundamental assumption:

– Spambot behaviour is intrinsically different from that of humans.

• Use Web usage data.
– Contains information about user navigation through the website.
– Can be gathered implicitly.

• Convert Web usage data into a format that is
– Extendible
– Discriminative
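One common first step when working with web usage data is to group each visitor's page requests into sessions. The slides do not say how the authors prepared their data, so the following is only a minimal sketch under assumptions: the record layout (visitor id, timestamp, page) and the 30-minute idle timeout are both hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def sessionize(records, gap=timedelta(minutes=30)):
    """Group requests into per-visitor sessions, split at idle gaps.

    records: iterable of (visitor_id, timestamp, page) tuples (assumed layout).
    gap: idle time that starts a new session (assumed value, not from the slides).
    Returns a list of (visitor_id, [page, ...]) tuples.
    """
    by_visitor = defaultdict(list)
    for visitor, ts, page in records:
        by_visitor[visitor].append((ts, page))

    sessions = []
    for visitor, hits in by_visitor.items():
        hits.sort()  # order each visitor's requests by time
        current = [hits[0][1]]
        for (prev_ts, _), (ts, page) in zip(hits, hits[1:]):
            if ts - prev_ts > gap:
                # Idle for too long: close this session and start a new one.
                sessions.append((visitor, current))
                current = []
            current.append(page)
        sessions.append((visitor, current))
    return sessions
```

Each resulting session is an ordered list of requested pages, which is a convenient input for extracting higher-level features.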


Our Solution

• Propose a new feature set called Action.
– A set of user-requested webpages to achieve a certain goal.

• Example
– In an online forum, a user navigates to a specific board, then goes to the New Thread page to start a new topic.
– This user navigation can be formulated as a "submitting new content" action.
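The forum example above can be sketched in code. This is an illustration only, not the authors' extraction algorithm: the page paths and action names are hypothetical, and the sketch assumes an action is matched as a contiguous sequence of page requests within one session.

```python
from collections import Counter

# Hypothetical action definitions: action name -> sequence of requested pages.
ACTIONS = {
    "submit_new_content": ("/board", "/new_thread"),
    "register": ("/register", "/register/confirm"),
}

def extract_actions(pages, actions=ACTIONS):
    """Scan one session left to right and return the actions it contains."""
    found = []
    i = 0
    while i < len(pages):
        for name, seq in actions.items():
            if tuple(pages[i:i + len(seq)]) == seq:
                found.append(name)
                i += len(seq)  # consume the pages that formed this action
                break
        else:
            i += 1  # no action starts at this page; move on
    return found

def action_frequencies(pages, actions=ACTIONS):
    """Count how often each action occurs, in the order the actions are defined."""
    counts = Counter(extract_actions(pages, actions))
    return [counts[name] for name in actions]
```

A session that visits a board and opens the New Thread page twice would yield two "submit_new_content" actions, and the frequency vector can then serve as input to a classifier.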


Framework


Action Extraction


Algorithm


Dataset

• 60-day study of Web spambot behaviour on a live discussion board (HoneySpam 2.0 project).

• 1-month study of human user behaviour.


Action Frequency of Humans and Spambots


Performance Measurement

• Matthews Correlation Coefficient (MCC)
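For reference, MCC is computed from the four cells of a binary confusion matrix and ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction). The sketch below uses the standard definition; the function name and the choice to return 0 when the denominator is zero are conventions assumed here, not details from the slides.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention when a row or column total is zero
    return (tp * tn - fp * fn) / denom
```

Unlike plain accuracy, MCC uses all four counts, so it remains informative when one class (for example, spambot sessions) greatly outnumbers the other.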


Results


Conclusion

• We proposed an innovative idea: focusing on spambot identification to manage spam rather than analysing spam content.

• We proposed a novel framework to detect spambots inside Web 2.0 applications, which leads us to Spam 2.0 detection.

• We proposed a new feature set, i.e. action navigations, to detect spambots.

• We validated our framework against an online forum and achieved 96.24% accuracy using the MCC method.


Thank you!

Web Spambot Detection Based on Web Navigation Behaviour

• Pedram Hayati
• Vidyasagar Potdar
• Kevin Chai
• Alex Talevski

• Anti-Spam Research Lab (ASRL)
• Digital Ecosystem and Business Intelligence Institute
• Curtin University, Perth, Western Australia

• www.antispamresearchlab.com