Web Spambot Detection Based on Web Navigation Behaviour
Pedram Hayati, Vidyasagar Potdar, Kevin Chai, Alex Talevski
Anti-Spam Research Lab (ASRL)
Digital Ecosystem and Business Intelligence Institute
Curtin University, Perth, Western Australia
www.AntiSpamResearchLab.com
Introduction
• Junk, unrelated, unwelcome, anonymous content ==> spam.
• Spam now spreads not only through email but also through Web 2.0 applications.
• This new trend in spamming is called Spam 2.0.
Examples of Spam 2.0
• Hosting Spam content in Web applications on legitimate websites¹.
¹ P. Hayati, V. Potdar, A. Talevski, N. Firoozeh, S. Sarenche, E. A. Yeganeh. Spam 2.0 Definition, New Spamming Boom. DEST 2010, Dubai, UAE, April 2010.
Web Spambot
• A tool used by spammers to distribute Spam 2.0.
• Builds on the idea of Web robots.
• Mimics human user behaviour.
• Wastes useful resources.
To counter Spam 2.0, we can concentrate on detecting Web spambots, the source of the Spam 2.0 problem.
Spam 2.0
Countermeasures
• Mostly focused on email spam detection.
• Content-based and meta-content-based.
• Some are applicable to the Web environment, such as link-based detection.
• CAPTCHA
– Possible to bypass using machine learning.
– Machines can be better than humans at deciphering them.
– Inconveniences human users.
Problem
• Not suitable for Web 2.0 platforms
– Spam is hosted on legitimate websites.
– Parasitic nature.
– We cannot blacklist a whole website because of its spam posts.
Our Solution
• Study Web spambot behaviour in order to stop Spam 2.0.
• Fundamental assumption:
– Spambot behaviour is intrinsically different from that of humans.
• Use Web usage data.
– Contains information about user navigation through a website.
– Can be gathered implicitly.
• Convert Web usage data into a format that is
– Extensible
– Discriminative
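As a rough illustration of how implicitly gathered Web usage data might be turned into per-user navigation sequences, the sketch below groups log records into sessions. The record layout (user, timestamp in seconds, URL), the 30-minute inactivity timeout and the name `sessionize` are assumptions made for this example; the slides do not specify them.

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # seconds of inactivity that end a session (assumed value)

def sessionize(records, timeout=SESSION_TIMEOUT):
    """Group (user, timestamp, url) records into per-user sessions.

    A new session starts whenever the gap between two consecutive
    requests from the same user exceeds `timeout` seconds.
    """
    by_user = defaultdict(list)
    for user, ts, url in records:
        by_user[user].append((ts, url))

    sessions = {}
    for user, hits in by_user.items():
        hits.sort()  # order each user's requests by timestamp
        user_sessions, current, last_ts = [], [], None
        for ts, url in hits:
            if last_ts is not None and ts - last_ts > timeout:
                user_sessions.append(current)
                current = []
            current.append(url)
            last_ts = ts
        user_sessions.append(current)
        sessions[user] = user_sessions
    return sessions
```

Each resulting session is an ordered list of requested URLs, which is the kind of input a later feature-extraction step could work from.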
Our Solution
• Propose a new feature set called Action.
– A set of webpages requested by a user to achieve a certain goal.
• Example
– In an online forum, a user navigates to a specific board, then goes to the New Thread page to start a new topic.
– This navigation can be formulated as a "submitting new content" action.
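The forum example above can be sketched in code. This is only a minimal illustration, not the authors' extraction algorithm: the action names, the page-type labels and the rule that an action's pages must appear as a contiguous, ordered run are all assumptions made for the example.

```python
# Hypothetical action definitions: each action lists the page types a user
# requests, in order, to reach a goal. Names and page types are made up.
ACTIONS = {
    "submit_new_content": ["view_board", "new_thread"],
    "register_account": ["register_form", "register_submit"],
}

def extract_actions(session, actions=ACTIONS):
    """Return the actions found in a session, in order of occurrence.

    `session` is a list of page types in request order. An action matches
    when its pages appear as a contiguous run in the session.
    """
    found = []
    i = 0
    while i < len(session):
        for name, pages in actions.items():
            if session[i:i + len(pages)] == pages:
                found.append(name)
                i += len(pages)  # skip past the matched pages
                break
        else:
            i += 1  # no action starts here; move to the next request
    return found
```

For instance, a session that visits a board and then the New Thread page would yield a single `submit_new_content` action under these assumed definitions.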
Framework
Action Extraction
Algorithm
Dataset
• 60-day study of Web spambot behaviour on a live discussion board (HoneySpam 2.0 Project).
• One-month study of human user behaviour.
Action Frequency of Humans and Spambots
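One plausible way to compute per-session action frequencies for such a comparison is sketched below. The function name `action_frequencies`, the fixed action vocabulary and the use of relative (rather than absolute) frequencies are assumptions for illustration; the slide itself does not say how the frequencies were computed.

```python
from collections import Counter

def action_frequencies(actions, vocabulary):
    """Return the relative frequency of each action in `vocabulary`.

    `actions` is the list of actions observed in one session; the result
    is a list aligned with `vocabulary`, usable as a feature vector.
    """
    counts = Counter(actions)
    total = sum(counts[a] for a in vocabulary)
    if total == 0:
        return [0.0] * len(vocabulary)  # no known actions in this session
    return [counts[a] / total for a in vocabulary]
```

Vectors of this kind, one per session, could then be fed to a classifier that separates human sessions from spambot sessions.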
Performance Measurement
• Matthews Correlation Coefficient (MCC)
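For reference, MCC is computed from the four cells of a binary confusion matrix: (TP·TN − FP·FN) divided by the square root of (TP+FP)(TP+FN)(TN+FP)(TN+FN). It ranges from −1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction). The helper below is a minimal sketch of that standard formula; the function name and the convention of returning 0.0 when the denominator is zero are choices made for this example.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal total is zero.
    return numerator / denominator if denominator else 0.0
```

Here a "positive" would be a session classified as a spambot and a "negative" a session classified as human, or the reverse; MCC is symmetric under that swap.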
Results
Conclusion
• We proposed an innovative idea: managing spam by identifying spambots rather than analysing spam content.
• We proposed a novel framework to detect spambots inside Web 2.0 applications, which leads to Spam 2.0 detection.
• We proposed a new feature set, action navigations, to detect spambots.
• We validated our framework against an online forum and achieved 96.24% accuracy using the MCC method.
Thank YOU!
Web Spambot Detection Based on Web Navigation Behaviour
• Pedram Hayati
• Vidyasagar Potdar
• Kevin Chai
• Alex Talevski
• Anti-Spam Research Lab (ASRL)
• Digital Ecosystem and Business Intelligence Institute
• Curtin University, Perth, Western Australia
• www.antispamresearchlab.com