Enhancing Twitter spam discovery using cross account pattern matching.

ENHANCING TWITTER SPAM DETECTION USING CROSS ACCOUNT PATTERN MATCHING. By Ambarish Pande

Contents

▸ Introduction
▸ Motivation
▸ Proposed Algorithm
▸ Implementation Details
▸ Advantages and Drawbacks
▸ Conclusion and Future Work

Introduction

▸ Emerging Social Networks
▹ Popularity of Facebook and Twitter
▹ 1,550 million active Facebook users
▹ 320 million active Twitter users
▹ Global reach
▹ Multi-platform

▸ Social Network’s Revenue Model
▹ Advertising
▹ 85% of Twitter’s revenue comes from advertising

Motivation

▸ The Problem
▹ Social networks like Twitter provide a legal way of publicizing content.
▹ Some companies resort to illegal methods such as spam accounts.
▹ Huge revenue loss to Twitter: millions of dollars per year. That’s a lot of money!

▸ Existing Solution
▹ Twitter’s spam detection algorithm focuses on criteria such as:
▹ harmful links
▹ aggressive following behavior
▹ posting to trending topics
▹ posting duplicated tweets
▹ low profile activity

▸ Drawbacks
▹ Spammers have evolved.
▹ The existing algorithm alone can no longer detect their spam.

Proposed Algorithm

▸ Emphasis on interaction between accounts and not on individual accounts.

▸ Finding patterns shared with existing spam tweets.

▸ Detecting spam accounts based on tweets and spam tweets based on accounts.

FLOW CHART TO DETECT SPAM

Stage 1: Identify Tweets with Malicious Links

Stage 2: Mining Spam Patterns

Stage 3: Spam Likelihood Estimation

Proposed Algorithm

Stage 1: Identify Tweets with Malicious Links.

1. Collect tweets and user info.
2. Follow the links in each tweet.
3. Check whether the link is flagged by Twitter or by another URL shortening service (goo.gl or bit.ly).
4. If it is flagged, mark the tweet as spam; otherwise leave it unmarked.

Leverage Twitter’s Database of Malicious links.

Stage 2: Mining Spam Patterns.

1. Strip off all URLs, @user mentions and #hashtags.

2. Strip off all non-alphabetic characters, such as digits 0-9 and symbols like *, !, @, #.

3. Create a hash for each stripped tweet.

4. Compare the hash with the hashes of other tweets.

Find Pattern
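
The stripping steps of Stage 2 can be sketched in a few lines. This is a minimal Python illustration, not the author's code (the slides mention a Java implementation); the exact regular expressions, the function name `normalize_tweet`, and the lowercasing step are all assumptions made for the example.

```python
import re

# Patterns for the pieces that Stage 2 strips; these exact regexes are assumptions.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
NON_LETTER_RE = re.compile(r"[^A-Za-z\s]")


def normalize_tweet(text: str) -> str:
    """Reduce a tweet to its bare wording so near-identical tweets compare equal."""
    text = URL_RE.sub(" ", text)         # step 1: URLs
    text = MENTION_RE.sub(" ", text)     # step 1: @user mentions
    text = HASHTAG_RE.sub(" ", text)     # step 1: #hashtags
    text = NON_LETTER_RE.sub(" ", text)  # step 2: digits and symbols
    # Collapse whitespace and lowercase (lowercasing is an extra assumption).
    return " ".join(text.lower().split())


a = normalize_tweet("@alice Win a FREE phone!!! http://t.co/abc123 #giveaway")
b = normalize_tweet("@bob win a free phone 2 http://bit.ly/xyz #contest")
print(a)       # win a free phone
print(a == b)  # True
```

The two sample tweets differ in mention, link, hashtag and punctuation, yet reduce to the same text, which is what lets a hash comparison find them.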

Stage 3: Spam Likelihood Estimation.

1. Iterate through users and assign each a spam score based on the user’s tweets.

2. Iterate through tweets and assign each a spam score based on the users who posted it.

Calculate Spam Score
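
The two passes of Stage 3 can be illustrated with a small mutual-scoring loop. The scoring formulas themselves are not part of this transcript, so the simple averaging rule below is an assumption chosen only to show the idea; `estimate_spam_scores`, `posts`, and `seed_spam` are hypothetical names.

```python
from collections import defaultdict


def estimate_spam_scores(posts, seed_spam, rounds=10):
    """Alternate between scoring users from their tweets and tweets from their users.

    posts: iterable of (user, pattern) pairs, where pattern identifies a tweet's
           stripped text (for example its hash).
    seed_spam: patterns already labelled as spam by the earlier stages.
    The averaging rule is an assumption, not the formula from the slides.
    """
    patterns_by_user = defaultdict(set)
    users_by_pattern = defaultdict(set)
    for user, pattern in posts:
        patterns_by_user[user].add(pattern)
        users_by_pattern[pattern].add(user)

    pattern_score = {p: 1.0 if p in seed_spam else 0.0 for p in users_by_pattern}
    user_score = {u: 0.0 for u in patterns_by_user}

    for _ in range(rounds):
        # Pass 1: a user's score is the mean score of the patterns they posted.
        for user, patterns in patterns_by_user.items():
            user_score[user] = sum(pattern_score[p] for p in patterns) / len(patterns)
        # Pass 2: an unlabelled pattern's score is the mean score of its posters.
        for pattern, users in users_by_pattern.items():
            if pattern not in seed_spam:
                pattern_score[pattern] = sum(user_score[u] for u in users) / len(users)
    return user_score, pattern_score


posts = [
    ("bot1", "free phone"), ("bot1", "cheap pills"),
    ("bot2", "free phone"), ("bot2", "cheap pills"),
    ("carol", "lovely weather"),
]
users, patterns = estimate_spam_scores(posts, seed_spam={"free phone"})
print(round(users["bot1"], 3), users["carol"], round(patterns["cheap pills"], 3))
```

In this toy data "cheap pills" was never labelled, but both accounts that post it also post a known spam pattern, so its score rises with theirs while the unrelated account stays at zero.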

(The spam score formulas shown on the original slides are not reproduced in this transcript.)

Implementation Details

▸ Data Collection
▹ Twitter Java API: Twitter4j
▹ Registering the app with Twitter.

▸ Data Storage
▹ MySQL database.

▹ 379,867 tweets
▹ 3,129 users

▸ The Twitter API limits the number of requests.

▸ 180 requests per 15 minutes
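
A limit of 180 requests per 15 minutes works out to one request every 5 seconds on average (900 s / 180). One way a collector could stay under it is a sliding-window throttle; the sketch below is an assumption about how that might be done, not the presenter's code, and `WindowRateLimiter` is a hypothetical helper.

```python
import time
from collections import deque


class WindowRateLimiter:
    """Allow at most max_requests calls in any sliding window of window_seconds."""

    def __init__(self, max_requests=180, window_seconds=15 * 60,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock   # injectable so the limiter can be tested without waiting
        self.sleep = sleep
        self.sent = deque()  # timestamps of requests inside the current window

    def acquire(self):
        """Block until one more request is allowed, then record it."""
        while True:
            now = self.clock()
            # Forget requests that have left the window.
            while self.sent and now - self.sent[0] >= self.window_seconds:
                self.sent.popleft()
            if len(self.sent) < self.max_requests:
                self.sent.append(now)
                return
            # Window is full: wait until the oldest request expires.
            self.sleep(self.window_seconds - (now - self.sent[0]))


limiter = WindowRateLimiter()
limiter.acquire()  # call once before each API request
```

Calling `acquire()` before every request lets bursts through until the window fills, then pauses only as long as needed.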

▸ Stage 1 Implementation
▹ JSoup - Java library for fetching and parsing HTML
▹ Warning text shown by each service for a flagged link:

● t.co - Warning: this link may be unsafe

● goo.gl - The site ahead contains malware

● bit.ly - STOP - there might be a problem with the requested link
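
Once the page behind a shortened link has been fetched, the check reduces to looking for the service's warning text. The sketch below shows that matching step only, in Python rather than the Java/JSoup setup the slides describe; fetching is left out, and `WARNING_PHRASES` and `is_flagged` are hypothetical names.

```python
# Warning phrases quoted on the slide, keyed by shortener host (lowercased).
WARNING_PHRASES = {
    "t.co": "this link may be unsafe",
    "goo.gl": "the site ahead contains malware",
    "bit.ly": "there might be a problem with the requested link",
}


def is_flagged(host: str, page_text: str) -> bool:
    """Return True if the page served by `host` contains that service's warning phrase."""
    phrase = WARNING_PHRASES.get(host.lower())
    return phrase is not None and phrase in page_text.lower()


print(is_flagged("bit.ly", "STOP - there might be a problem with the requested link"))  # True
print(is_flagged("t.co", "<html><body>Welcome to my blog</body></html>"))               # False
```

Matching on fixed phrases is brittle if a service rewords its warning page, so a real pipeline would likely need to keep this table up to date.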

▸ Stage 1 Stats
▹ After implementing the first stage of the algorithm

▸ Stage 2 Implementation
▹ Regular expressions strip off #hashtags, @user mentions, URLs, special characters and numbers.

▹ The MD5 algorithm generates a hash for each stripped tweet.

▹ Tweets with the same hash value were marked as spam.
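
The hashing and grouping step can be sketched with Python's standard `hashlib`. This is an illustration under assumptions, not the original Java code: the regexes, the lowercasing, and the names `pattern_hash` and `find_duplicate_patterns` are hypothetical.

```python
import hashlib
import re
from collections import defaultdict


def pattern_hash(text: str) -> str:
    """MD5 hex digest of a tweet after stripping URLs, mentions, hashtags, symbols and digits."""
    stripped = re.sub(r"https?://\S+|[@#]\w+", " ", text)
    stripped = re.sub(r"[^A-Za-z\s]", " ", stripped)
    canonical = " ".join(stripped.lower().split())
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


def find_duplicate_patterns(tweets):
    """Map each hash shared by two or more tweets to the ids of those tweets."""
    groups = defaultdict(list)
    for tweet_id, text in tweets:
        groups[pattern_hash(text)].append(tweet_id)
    return {h: ids for h, ids in groups.items() if len(ids) > 1}


tweets = [
    (1, "@alice Win a FREE phone!!! http://t.co/abc123 #giveaway"),
    (2, "@bob win a free phone 2 http://bit.ly/xyz #contest"),
    (3, "Lovely weather in Pune today"),
]
duplicates = find_duplicate_patterns(tweets)
print(list(duplicates.values()))  # [[1, 2]]
```

Tweets that strip down to an empty string (for example, a bare link) would all share one hash in this sketch, so a real pipeline would probably skip them before grouping.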

▸ Stage 2 Stats
▹ 13,015 duplicate hashes were found.
▹ They covered 70,728 tweets, about 18.6% of the 379,867 collected.

▸ Stage 3 Stats
▹ Spam tweets that the first two stages had not labelled were found.

▹ Users who tweet more spam were assigned a high spam score.

▹ Tweets posted by such accounts were also assigned a higher spam score.

Drawbacks

▸ Not good at detecting human-controlled spam accounts.

Advantages

▸ Detects bot-controlled spam accounts.
▸ Easily detects spam campaigns.
▸ Spam tweets with different user mentions and links are also detected.
▸ Excessive retweets to unrelated topics are also treated as spam.

Conclusion and Future Work

▸ Cross-account pattern matching is highly effective.

▸ Older methods no longer work well.
▸ Future Work
▹ Clustering tweets to understand which topics spammers use the most.
▹ Providing a real-time spam discovery solution using machine learning.

References

[1] Publication: http://dl.ifip.org/db/conf/im/im2015m/137446.pdf

THANKS!

Any questions?