Web Spambot Detection Based on Web Navigation Behaviour
Pedram Hayati, Vidyasagar Potdar, Kevin Chai, Alex Talevski
Anti-Spam Research Lab (ASRL)
Digital Ecosystem and Business Intelligence Institute
Curtin University, Perth, Western Australia
www.AntiSpamResearchLab.com
Introduction
• Junk, Unrelated, Unwelcome, Anonymous content ==> spam.
• Spam now spreads not only through email but also through Web 2.0 applications.
• This new trend of spamming is called Spam 2.0.
Examples of Spam 2.0
• Hosting spam content in Web applications on legitimate websites¹.
¹ P. Hayati, V. Potdar, A. Talevski, N. Firoozeh, S. Sarenche, E. A. Yeganeh. Spam 2.0 Definition, New Spamming Boom. DEST 2010, Dubai, UAE, April 2010.
Web SpamBot
• A tool used by spammers to distribute Spam 2.0.
• Builds on the idea of Web robots.
• Mimics human user behaviour.
• Wastes useful resources.
To counter Spam 2.0, we can concentrate on detecting Web spambots, the source of the Spam 2.0 problem.
Spam 2.0
Countermeasures
• Mostly focused on email spam detection.
• Content-based and meta-content-based.
• Some are applicable to the Web environment, such as link-based detection.
• CAPTCHA
– Possible to bypass using machine learning.
– Machines are better at deciphering it.
– Inconveniences human users.
Problem
• Not suitable for the Web 2.0 platform:
– Spam is hosted on legitimate websites.
– Spam is parasitic in nature.
– We cannot blacklist a whole website because of its spam posts.
Our Solution
• Study Web spambot behaviour in order to stop Spam 2.0.
• Fundamental assumption:
– Spambot behaviour is intrinsically different from that of humans.
• Use Web usage data.
– Contains information about user navigation through the website.
– Can be gathered implicitly.
• Convert Web usage data into a format that is:
– Extendible
– Discriminative
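As an illustration of how raw Web usage data might be prepared, the following is a minimal sketch, not the authors' implementation. It assumes each log record is a hypothetical `(user, timestamp, url)` tuple and that a gap of more than `timeout` seconds (an assumed threshold) starts a new session.

```python
from collections import defaultdict


def build_sessions(records, timeout=1800):
    """Group (user, timestamp, url) records into per-user sessions.

    A new session starts when the gap between two consecutive requests
    from the same user exceeds `timeout` seconds (assumed threshold).
    """
    by_user = defaultdict(list)
    for user, ts, url in records:
        by_user[user].append((ts, url))

    sessions = []
    for user, hits in by_user.items():
        hits.sort()  # order each user's requests by timestamp
        current = [hits[0][1]]
        for (prev_ts, _), (ts, url) in zip(hits, hits[1:]):
            if ts - prev_ts > timeout:
                # Gap too long: close the current session, start a new one.
                sessions.append((user, current))
                current = []
            current.append(url)
        sessions.append((user, current))
    return sessions
```

Each resulting session is an ordered list of requested pages, which is the kind of navigation sequence that can later be mapped to higher-level features.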
Our Solution
• Propose a new feature set called Action.
– An Action is a set of webpages a user requests to achieve a certain goal.
• Example
– In an online forum, a user navigates to a specific board, then goes to the New Thread page to start a new topic.
– This navigation can be formulated as a "submit new content" action.
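The forum example above can be sketched as a simple pattern match over a session's page sequence. This is an illustrative sketch only: the `ACTIONS` table, its page paths, and the function name `extract_actions` are hypothetical, and the matching rule (exact contiguous sequence) is an assumption rather than the method used in the talk.

```python
# Hypothetical action definitions: action name -> ordered pages that realise it.
ACTIONS = {
    "submit_new_content": ("/board", "/new_thread"),
    "register": ("/register", "/register/confirm"),
}


def extract_actions(pages, actions=ACTIONS):
    """Return the actions found in a session, in order of occurrence."""
    found = []
    i = 0
    while i < len(pages):
        for name, pattern in actions.items():
            # Compare the next len(pattern) pages against the action pattern.
            if tuple(pages[i:i + len(pattern)]) == pattern:
                found.append(name)
                i += len(pattern)  # skip the pages consumed by this action
                break
        else:
            i += 1  # this page does not start any known action
    return found
```

For instance, a session that visits a board and then the New Thread page would yield a single `submit_new_content` action, regardless of unrelated pages before or after it.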
Framework
Action Extraction
Algorithm
Dataset
• 60-day study of Web spambot behaviour on a live discussion board (HoneySpam 2.0 Project).
• 1-month study of human user behaviour.
Action Frequency of Humans and Spambots
Performance Measurement
• Matthews Correlation Coefficient (MCC)
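For reference, MCC for a binary classifier is computed from the confusion-matrix counts as (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)) and ranges from −1 to +1. The sketch below shows the standard definition; the function name `mcc` and the convention of returning 0 when the denominator is zero are assumptions, not taken from the talk.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Assumed convention: undefined cases are reported as 0.
        return 0.0
    return (tp * tn - fp * fn) / denom
```

A value of +1 indicates perfect prediction, 0 is no better than random, and −1 indicates total disagreement between prediction and observation.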
Results
Conclusion
• We proposed an innovative idea: managing spam by identifying spambots rather than analysing spam content.
• We proposed a novel framework to detect spambots inside Web 2.0 applications, which leads to Spam 2.0 detection.
• We proposed a new feature set, action navigation, to detect spambots.
• We validated our framework against an online forum and achieved 96.24% accuracy using the MCC method.
Thank you!
Web Spambot Detection Based on Web Navigation Behaviour
• Pedram Hayati – [email protected]
• Vidyasagar Potdar – [email protected]
• Kevin Chai – [email protected]
• Alex Talevski – [email protected]
• Anti-Spam Research Lab (ASRL)
• Digital Ecosystem and Business Intelligence Institute
• Curtin University, Perth, Western Australia
• www.antispamresearchlab.com