Early Detection of Outgoing Spammers in Large-Scale Service Provider Networks
![Page 1: Early Detection of Outgoing Spammers in Large-Scale Service Provider Networks](https://reader035.fdocuments.in/reader035/viewer/2022081604/5681567a550346895dc42e08/html5/thumbnails/1.jpg)
Early Detection of Outgoing Spammers in Large-Scale Service Provider Networks
Yehonatan Cohen, Daniel Gordon, Danny Hendler
Ben-Gurion University
Yehonatan Cohen, Daniel Gordon and Danny Hendler, DIMVA 2013
![Page 2: Early Detection of Outgoing Spammers in Large-Scale Service Provider Networks](https://reader035.fdocuments.in/reader035/viewer/2022081604/5681567a550346895dc42e08/html5/thumbnails/2.jpg)
Talk outline
• Preliminaries
• ErDOS: An Early Detection Scheme for Outgoing Spam
• Evaluation
• Conclusions and Future Work
![Page 3: Early Detection of Outgoing Spammers in Large-Scale Service Provider Networks](https://reader035.fdocuments.in/reader035/viewer/2022081604/5681567a550346895dc42e08/html5/thumbnails/3.jpg)
Preliminaries: Spam
Unsolicited mail, typically sent in large quantities
Hazards
• Malware distribution
• Phishing
• Resource consumption
• Poor user experience
Detection may be attempted when
• Mail is sent (outgoing spam detection)
• Mail is received (incoming spam detection)
![Page 4: Early Detection of Outgoing Spammers in Large-Scale Service Provider Networks](https://reader035.fdocuments.in/reader035/viewer/2022081604/5681567a550346895dc42e08/html5/thumbnails/4.jpg)
Outgoing spam detection
Spam can be blocked before leaving the Email Service Provider (ESP)
Advantages
• Reduces load on ESP infrastructure
• Prevents damage to ESP reputation
• Detection may be based on hosted accounts' activity
![Page 5: Early Detection of Outgoing Spammers in Large-Scale Service Provider Networks](https://reader035.fdocuments.in/reader035/viewer/2022081604/5681567a550346895dc42e08/html5/thumbnails/5.jpg)
Outgoing spam filtering techniques
Contents-based filtering: learn and identify textual patterns typical of spam messages
• May be tricked by manipulating spam content
  o Image-based spam
  o Random string insertion (hash busters)
• Non-negligible false negative rate
![Page 6: Early Detection of Outgoing Spammers in Large-Scale Service Provider Networks](https://reader035.fdocuments.in/reader035/viewer/2022081604/5681567a550346895dc42e08/html5/thumbnails/6.jpg)
Outgoing spam filtering techniques (cont'd)
Inter-account communication patterns analysis:
• Models accounts' behaviour
• Based on inter-account social interactions
• Typically utilizes machine-learning techniques
• May leverage ESP account identification
![Page 7: Early Detection of Outgoing Spammers in Large-Scale Service Provider Networks](https://reader035.fdocuments.in/reader035/viewer/2022081604/5681567a550346895dc42e08/html5/thumbnails/7.jpg)
Our goals
Devise an effective detector of outgoing spammers for large ESPs (the ErDOS detector)
Emphasis on early detection
• Detects spammers before the contents-based filter
Short training periods
• Highly adaptive to changing spamming patterns
![Page 8: Early Detection of Outgoing Spammers in Large-Scale Service Provider Networks](https://reader035.fdocuments.in/reader035/viewer/2022081604/5681567a550346895dc42e08/html5/thumbnails/8.jpg)
Most relevant related work
Lam & Yeung, CEAS 2007
• Introduces “social-network”-based outgoing spam detection
• Uses the k-NN classifier
• Relatively small dataset (ENRON)
• Labeling based on simulated spammer accounts
Tseng & Chen, CSE 2009
• Uses the same set of features
• Uses an SVM classifier
• Larger, non-ESP dataset (University email server)
• Incremental model update
• Labeling based on pure accounts
• Account identification based on “from” header field
![Page 9: Early Detection of Outgoing Spammers in Large-Scale Service Provider Networks](https://reader035.fdocuments.in/reader035/viewer/2022081604/5681567a550346895dc42e08/html5/thumbnails/9.jpg)
Comparison with data-sets of previous work

| | Our data set (in/out) | Our data set (outgoing) | NTU | Enron |
|---|---|---|---|---|
| #mails | 9.86E7 | 2.13E8 | 2.86E6 | 5.17E5 |
| #accounts | 5.63E7 | 5.81E7 | 6.37E5 | 3.67E4 |
| #edges | 7.40E7 | 12.90E7 | - | 3.68E5 |
| time period | 4 days | 26 days | 10 days | 3.5 years |
| contents | spam & ham | spam & ham | spam & ham | ham |

Our data set
• Collected by a very large ESP
• Consists of incoming and outgoing log files
  o 4 days of bi-directional data + 22 days of outgoing traffic only
• Both incoming and outgoing messages are labeled as spam/ham by a content-based detector
![Page 11: Early Detection of Outgoing Spammers in Large-Scale Service Provider Networks](https://reader035.fdocuments.in/reader035/viewer/2022081604/5681567a550346895dc42e08/html5/thumbnails/11.jpg)
Talk outline
• Preliminaries
• ErDOS: An Early Detection Scheme for Outgoing Spam
  o Computation Flow
  o Features
• Evaluation
• Conclusions and Future Work
![Page 12: Early Detection of Outgoing Spammers in Large-Scale Service Provider Networks](https://reader035.fdocuments.in/reader035/viewer/2022081604/5681567a550346895dc42e08/html5/thumbnails/12.jpg)
The ErDOS detector: computation flow
1. Pre-processing: compute account feature values based on a single day of email logs (output: feature values)
2. Determine accounts' classification (output: classified data set)
3. Undersampling: extract all spammers and an equal number of legitimate accounts as the training set (output: training set)
4. Build rotation forest model (output: classification model)
5. Assign account scores, using the classification model, to the remainder of accounts not in the training set (output: scored accounts)
6. Construct suspect accounts list of configurable size
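The computation flow can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the slides name a rotation forest, which the Python standard library does not provide, so `train_model` is a hypothetical stand-in that scores accounts by their distance to class centroids. All function names, parameters and the feature layout are assumptions made for this example.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Features = Sequence[float]


def undersample(labels: Dict[str, bool], seed: int = 0) -> List[str]:
    """Return all spammers plus an equal number of randomly chosen legitimate accounts."""
    spammers = sorted(a for a, is_spam in labels.items() if is_spam)
    legit = sorted(a for a, is_spam in labels.items() if not is_spam)
    rng = random.Random(seed)
    return spammers + rng.sample(legit, min(len(spammers), len(legit)))


def train_model(train: List[Tuple[Features, bool]]) -> Callable[[Features], float]:
    """Hypothetical stand-in classifier (NOT a rotation forest).

    The returned function gives a higher score to accounts that lie closer
    to the spammer centroid than to the legitimate centroid.
    """
    def centroid(rows: List[Features]) -> List[float]:
        return [sum(col) / len(rows) for col in zip(*rows)]

    def dist(a: Features, b: Features) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    spam_c = centroid([f for f, is_spam in train if is_spam])
    ham_c = centroid([f for f, is_spam in train if not is_spam])
    return lambda f: dist(f, ham_c) - dist(f, spam_c)


def suspects_list(features: Dict[str, Features], labels: Dict[str, bool],
                  size: int, seed: int = 0) -> List[str]:
    """Train on the undersampled set, score the remaining accounts, keep the top `size`."""
    train_ids = set(undersample(labels, seed))
    model = train_model([(features[a], labels[a]) for a in train_ids])
    rest = [a for a in features if a not in train_ids]
    return sorted(rest, key=lambda a: model(features[a]), reverse=True)[:size]
```

Because scoring is restricted to accounts outside the training set, every account on the suspects list is one that the content-based labels did not already use as a training example.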
![Page 14: Early Detection of Outgoing Spammers in Large-Scale Service Provider Networks](https://reader035.fdocuments.in/reader035/viewer/2022081604/5681567a550346895dc42e08/html5/thumbnails/14.jpg)
ErDOS features: IOR
Legitimate users
• Maintain social interactions
• Often belong to mailing lists
Spammers
• Sent messages are seldom replied to
An account's IOR = #incoming mails / #outgoing mails
Low IOR is characteristic of spammers
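A minimal sketch of the IOR computation, assuming the day's log is available as one `(sender, recipient)` pair per message. The record layout, the function name `compute_ior` and the choice to skip accounts with no outgoing mail (for which the ratio is undefined) are assumptions of this example, not details given in the slides.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def compute_ior(log: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Compute IOR = #incoming / #outgoing for every account that sent mail.

    `log` yields one (sender, recipient) pair per message.
    """
    incoming: Counter = Counter()
    outgoing: Counter = Counter()
    for sender, recipient in log:
        outgoing[sender] += 1
        incoming[recipient] += 1
    # Accounts with no outgoing mail are skipped: their IOR is undefined.
    return {acct: incoming[acct] / n_out for acct, n_out in outgoing.items()}


log = [
    ("spammer", "a"), ("spammer", "b"), ("spammer", "c"), ("spammer", "d"),
    ("a", "b"), ("b", "a"), ("a", "spammer"),
]
ior = compute_ior(log)
# The account that sends four messages and receives one gets a low IOR.
```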
![Page 15: Early Detection of Outgoing Spammers in Large-Scale Service Provider Networks](https://reader035.fdocuments.in/reader035/viewer/2022081604/5681567a550346895dc42e08/html5/thumbnails/15.jpg)
ErDOS features: IOR (cont'd)
![Page 16: Early Detection of Outgoing Spammers in Large-Scale Service Provider Networks](https://reader035.fdocuments.in/reader035/viewer/2022081604/5681567a550346895dc42e08/html5/thumbnails/16.jpg)
ErDOS features: IOR versus CR
Communication Reciprocity (CR)
• Fraction of recipients who responded to an account's emails
• Defined by Gomes et al.
• IOR is superior for short training periods
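For comparison, CR can be sketched from the same kind of log. This example assumes that "responded" can be approximated by "sent at least one message back to the account within the log window", ignoring message order and threading; that approximation, the `(sender, recipient)` record layout and the function name are assumptions of this sketch, not the definition by Gomes et al.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def communication_reciprocity(log: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Fraction of each sender's distinct recipients that also sent mail back to it."""
    recipients: Dict[str, Set[str]] = defaultdict(set)
    for sender, recipient in log:
        recipients[sender].add(recipient)

    cr: Dict[str, float] = {}
    for acct, rcpts in recipients.items():
        # .get() avoids creating new keys while iterating over the dict.
        replied = sum(1 for r in rcpts if acct in recipients.get(r, ()))
        cr[acct] = replied / len(rcpts)
    return cr


log = [
    ("a", "b"), ("b", "a"), ("a", "c"),
    ("s", "a"), ("s", "b"), ("s", "c"), ("s", "d"),
]
cr = communication_reciprocity(log)
```

With only one day of logs, many legitimate recipients will not yet have answered, which is one plausible reading of why the slides prefer IOR for short training periods.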
![Page 17: Early Detection of Outgoing Spammers in Large-Scale Service Provider Networks](https://reader035.fdocuments.in/reader035/viewer/2022081604/5681567a550346895dc42e08/html5/thumbnails/17.jpg)
ErDOS features: IEBC
IEBC (Internal/External Behaviour Consistency)
• An account can send/receive emails to/from
  o Internal addresses (accounts hosted by ESP)
  o External addresses
• Legitimate accounts show correlation between internal and external IOR, spammers less so
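The slides do not give the formula that turns the two ratios into a single IEBC value, so the sketch below stops at computing an internal and an external IOR per hosted account. Identifying hosted accounts by a single domain suffix, the `(sender, recipient)` record layout and the name `split_ior` are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

Ratio = Optional[float]


def split_ior(log: Iterable[Tuple[str, str]],
              hosted_domain: str) -> Dict[str, Tuple[Ratio, Ratio]]:
    """Return {hosted account: (internal IOR, external IOR)}.

    A component is None when the account sent no mail of that kind.
    """
    def is_internal(addr: str) -> bool:
        return addr.endswith("@" + hosted_domain)

    counts: Counter = Counter()  # keys: (account, "in" | "out", "int" | "ext")
    for sender, recipient in log:
        if is_internal(sender):
            counts[sender, "out", "int" if is_internal(recipient) else "ext"] += 1
        if is_internal(recipient):
            counts[recipient, "in", "int" if is_internal(sender) else "ext"] += 1

    accounts = {key[0] for key in counts}

    def ratio(acct: str, kind: str) -> Ratio:
        n_out = counts[acct, "out", kind]
        return counts[acct, "in", kind] / n_out if n_out else None

    return {a: (ratio(a, "int"), ratio(a, "ext")) for a in accounts}


log = [
    ("u@esp.example", "v@esp.example"), ("v@esp.example", "u@esp.example"),
    ("u@esp.example", "x@other.example"), ("x@other.example", "u@esp.example"),
    ("s@esp.example", "x@other.example"), ("s@esp.example", "y@other.example"),
    ("v@esp.example", "s@esp.example"),
]
iors = split_ior(log, "esp.example")
```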
![Page 18: Early Detection of Outgoing Spammers in Large-Scale Service Provider Networks](https://reader035.fdocuments.in/reader035/viewer/2022081604/5681567a550346895dc42e08/html5/thumbnails/18.jpg)
ErDOS features: #outgoing messages
Number of outgoing messages
• Spamming accounts send more emails than legitimate accounts
• Insufficient for detecting low-volume spammers
![Page 19: Early Detection of Outgoing Spammers in Large-Scale Service Provider Networks](https://reader035.fdocuments.in/reader035/viewer/2022081604/5681567a550346895dc42e08/html5/thumbnails/19.jpg)
ErDOS: Sender Accounts' Characteristics
A large fraction of spammers' incoming mail is spam!
• Legitimate accounts seldom send emails to spamming accounts
• Dictionary attacks may cause spammers to spam each other
Analyse senders' characteristics
![Page 21: Early Detection of Outgoing Spammers in Large-Scale Service Provider Networks](https://reader035.fdocuments.in/reader035/viewer/2022081604/5681567a550346895dc42e08/html5/thumbnails/21.jpg)
Accuracy for Single-Day training
Evaluate accuracy attained for single-day logs
• Email accounts are classified based on the tags of the contents-based detector
• True Positive (TP) and False Positive (FP) values are averaged over the available 4 days of bidirectional data

| | ErDOS | LY-knn | MailNET |
|---|---|---|---|
| TP | 71 | 76.3 | 22.6 |
| FP | 8.9 | 47.8 | 44.2 |
![Page 22: Early Detection of Outgoing Spammers in Large-Scale Service Provider Networks](https://reader035.fdocuments.in/reader035/viewer/2022081604/5681567a550346895dc42e08/html5/thumbnails/22.jpg)
Early detection evaluation
Spamming accounts detected before the contents-based detector
• Suspected by detector, send messages tagged as spam only on later days
• Evaluation uses all 26 days of data
Early detection quality criteria:
• e-Precision: fraction of early detected accounts out of the suspects list
• Enrichment Factor (EF): ratio between the detector's e-Precision and that of a random accounts list
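The two criteria reduce to simple ratios, sketched below. The function names are assumptions of this example; the numbers in the usage lines are the ones the talk reports for ErDOS (9 early detections in a 100-account suspects list, against an average of 0.53 in a random list of 100 accounts).

```python
def e_precision(early_detections: float, list_size: int) -> float:
    """Fraction of accounts on the list that were detected early."""
    return early_detections / list_size


def enrichment_factor(detector_e_precision: float, random_e_precision: float) -> float:
    """How many times better the detector's e-Precision is than a random list's."""
    return detector_e_precision / random_e_precision


erdos = e_precision(9, 100)          # 0.09
baseline = e_precision(0.53, 100)    # 0.0053
ef = enrichment_factor(erdos, baseline)  # about 16.98; the talk reports 16.9
```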
![Page 23: Early Detection of Outgoing Spammers in Large-Scale Service Provider Networks](https://reader035.fdocuments.in/reader035/viewer/2022081604/5681567a550346895dc42e08/html5/thumbnails/23.jpg)
Early detection
Early detection results, averaged over 4 days:

| | ErDOS’s suspects | Entire population |
|---|---|---|
| #accounts | 100 | 100 |
| Early detections | 9 | 0.53 |
| e-Precision | 0.09 | 0.0053 |

Prior art’s early detection results compared to ErDOS:

| | ErDOS | LY-knn | MailNET |
|---|---|---|---|
| e-Precision | 0.09 | 0.012 | 0.025 |
| EF | 16.9 | 2.3 | 4.7 |
![Page 24: Early Detection of Outgoing Spammers in Large-Scale Service Provider Networks](https://reader035.fdocuments.in/reader035/viewer/2022081604/5681567a550346895dc42e08/html5/thumbnails/24.jpg)
Early detection (cont’d)
e-Precision for varying suspects list lengths:
![Page 26: Early Detection of Outgoing Spammers in Large-Scale Service Provider Networks](https://reader035.fdocuments.in/reader035/viewer/2022081604/5681567a550346895dc42e08/html5/thumbnails/26.jpg)
Conclusions and Future Work
Conclusions
• Outgoing spam detection for ESPs has its own unique characteristics
• Contents-based filtering is not enough
• Early detection of spamming accounts can be achieved by combining a contents-based filter with a network-level detector
Future Work
• Enhancing ErDOS’s early detection performance with additional features
• An expert detector for low-volume spammers, based on ErDOS’s computation flow and features