Understanding the Network-Level Behavior of Spammers
By Anirudh Ramachandran and Nick Feamster
Defense Team:
Mike Delahunty
Bryan Lutz
Kimberly Peng
Kevin Kazmierski
John Thykattil
Agenda
Introduction
Background and Related Work
Data Collection
Network-level Characteristics of Spammers
Spam from Botnets
Spam from Transient BGP Announcements
Lessons for Better Spam Mitigation
Conclusion
Introduction
Spam
Multiple emails sent to many recipients
Unsolicited commercial messages
Study based on network-level behavior of spammers
IP address ranges
Spamming modes (route hijacking, bots, etc.)
Temporal persistence of spamming hosts
Characteristics of spamming botnets
Much attention has been paid to studying the content of spam
Introduction Cont.
Study posits that network-level properties need to be investigated in order to find creative ways to mitigate spam
Paper analyzes network properties of spam observed at a large spam “sinkhole”, combined with:
BGP route advertisements
Traces of command and control messages of a Bobax botnet
Legitimate emails
Surprising Conclusions
Most spam comes from a few small regions of IP address space (but so does legitimate email)
Most spam comes from Microsoft Windows hosts (bots)
A small set of spammers use short-lived route announcements to remain untraceable
Background
Methods and Mitigation
Spamming Methods
Direct Spamming – via spam-friendly ISPs or dial-up IPs
Open Relays and Proxies – mail servers that allow unauthenticated users to relay email
Botnets – hijacked machines acting under the control of a centralized ‘botmaster’
BGP Spectrum Agility – short-lived route announcements for the IP addresses from which spammers send spam; hampers traceability
Mitigation Techniques
Filtering: content-based and IP blacklists
Related Work
Related Work – Previous Studies
Packet traces to determine bandwidth bottlenecks from spam sources
Project Honeypot
Sink for email traffic that hands out trap email addresses to determine harvesting behavior and identity of spammers
Monitoring time from harvesting to receipt of first spam message
Countries where harvesting infrastructure is located
Persistence of spam harvesters
Related Work Cont.
Mitigation
SpamAssassin Project – reverse engineering via mail content analysis
DNS blacklist – 80% of IPs sending spam were in the blacklist
Unusual Route Announcements
Bogus well-known addresses
Suggestions of short-lived route announcements
Data Collection
Reserve a “sinkhole”
Registered domain with no legitimate email addresses
Establish a DNS Mail Exchange (MX) record for it
All email received by the server is spam
Run metrics on incoming email
IP address of the relay; also run a traceroute
TCP fingerprint to get the source OS
Results of DNS blacklist lookups from 8 different blacklist servers
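The DNS blacklist lookups listed above typically follow the standard DNSBL convention (RFC 5782): reverse the octets of the IPv4 address and look the result up as a name under the blacklist's zone; an A-record answer means the address is listed. A minimal Python sketch under that assumption; the zone `dnsbl.example.org` and the helper names are hypothetical and do not correspond to the eight blacklists used in the study.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNSBL lookup name for an IPv4 address.

    The octets are reversed and the blacklist zone is appended, so
    192.0.2.10 checked against "dnsbl.example.org" becomes
    "10.2.0.192.dnsbl.example.org".
    """
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """Hypothetical helper: True if the blacklist zone resolves the lookup name.

    A successful A-record answer means "listed"; any resolution failure
    (e.g. NXDOMAIN) is treated as "not listed". This issues a live DNS query.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except (socket.gaierror, socket.herror):
        return False
```

Calling `is_listed` once per blacklist zone and counting the hits gives the number of blacklists, out of however many are queried, that list a given relay.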
Data Collection Cont.
Spam received per day at sinkhole (Aug. 2004 – Dec. 2005)
Data Collection Cont.
“Hijack” the DNS server for the domain running a botnet's command and control
Redirect the bots' command-and-control connections to a known machine instead
Monitor BGP updates at the network where the spam is received
Collect logs from a large email provider (40 million mailboxes)
Allows analysis of network characteristics for spam and non-spam
Data Analysis
Study focuses on network-level characteristics
Distribution of spam across IP address space is similar to that of legitimate email (although not identical)
Spam is not distributed uniformly across the IP address range
12% of all received spam comes from two Autonomous Systems (ASes)
37% comes from the top 20 ASes
Offers insight into spam prevention
Classifying spam by country: China, Korea, and the US dominate
Defense suggestion
Correlate originating country with IP range to estimate the probability that a message is spam
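The AS concentration figures above are top-k shares: rank ASes by how much spam they originated and sum the top k. A minimal Python sketch, assuming each spam message has already been mapped to the origin AS of the route covering its source IP (for example, from a BGP routing table); the function name and the documentation-range ASNs used with it are hypothetical.

```python
from collections import Counter


def top_k_as_share(spam_origin_asns: list[int], k: int) -> float:
    """Fraction of spam messages originating from the k ASes that sent the most spam.

    spam_origin_asns holds one entry per received spam message: the ASN
    that originated the route to the sending IP.
    """
    if not spam_origin_asns:
        return 0.0
    counts = Counter(spam_origin_asns)   # messages per origin AS
    top = counts.most_common(k)          # k busiest ASes (fewer if fewer exist)
    return sum(n for _, n in top) / len(spam_origin_asns)
```

On the study's data, this quantity is about 0.12 for k = 2 and about 0.37 for k = 20, per the figures above.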
Cumulative Distribution Function (CDF) of Spam and Legitimate Email
Greater probability of legitimate email
Big increase in probability of received spam
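The CDF compares how spam and legitimate email are spread across the IPv4 address space by treating each address as a 32-bit integer. A minimal Python sketch, assuming one source IP per received message (so heavy senders weigh more); the function names are hypothetical.

```python
import bisect
import ipaddress
from typing import Callable


def ip_to_int(ip: str) -> int:
    """Map a dotted-quad IPv4 address to its position in the 32-bit address space."""
    return int(ipaddress.IPv4Address(ip))


def empirical_cdf(source_ips: list[str]) -> Callable[[str], float]:
    """Return F where F(x) is the fraction of messages whose source IP is <= x.

    source_ips holds one entry per message, so a host that sent many messages
    contributes proportionally more mass to the distribution.
    """
    values = sorted(ip_to_int(ip) for ip in source_ips)
    n = len(values)

    def cdf(ip: str) -> float:
        if n == 0:
            return 0.0
        # bisect_right gives the number of sorted values <= the query address
        return bisect.bisect_right(values, ip_to_int(ip)) / n

    return cdf
```

Building one CDF from spam source IPs and one from legitimate-mail source IPs, then evaluating both over the same addresses, yields the two curves being compared; regions where the spam curve rises steeply are where spam is concentrated.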
Spam Persistence
85% of unique spammers send 10 emails or fewer
If this holds for all spammers, what is the value in filtering by a specific IP address?
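The persistence figure is a per-IP volume count: group messages by sending IP and ask what fraction of unique IPs stayed at or below a threshold. A minimal Python sketch, assuming a list with one source IP per spam message; the function name is hypothetical and the default threshold of 10 mirrors the figure above.

```python
from collections import Counter


def fraction_low_volume(source_ips: list[str], max_msgs: int = 10) -> float:
    """Fraction of unique sending IPs that sent at most max_msgs messages."""
    per_ip = Counter(source_ips)  # messages received from each unique IP
    if not per_ip:
        return 0.0
    low = sum(1 for n in per_ip.values() if n <= max_msgs)
    return low / len(per_ip)
```

A result near 0.85 means most sending IPs disappear before a per-IP filter has much to learn from them, which is the concern raised above.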
Effectiveness of Blacklists
About 80% of spam listed in at least one major blacklist
Effectiveness of Blacklists Cont.
Most spam bots are detected by at least one DNSRBL
Only 50% of spammers using transient BGP announcements are detected by at least one DNSRBL
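Both percentages above measure the same thing: the fraction of spamming IPs that appear on at least one blacklist, computed separately per group of senders. A minimal Python sketch, assuming the blacklist lookups have already been collected into a mapping from IP to the set of lists that flagged it; the names and blacklist labels are hypothetical.

```python
def fraction_listed(listings: dict[str, set[str]], min_lists: int = 1) -> float:
    """Fraction of spamming IPs that appear on at least min_lists blacklists.

    listings maps each spamming IP to the set of blacklists that listed it
    (an empty set means no blacklist listed it).
    """
    if not listings:
        return 0.0
    hits = sum(1 for lists in listings.values() if len(lists) >= min_lists)
    return hits / len(listings)
```

Calling `fraction_listed` once on the bot IPs and once on the IPs seen in transient BGP announcements gives the two numbers being contrasted.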
Spam from Botnets
Circumstantial evidence suggests that most spam originates from bots
Spamming hosts and Bobax drones have very similar distributions across IP address space
Suggests that much of the spam received may come from botnets such as Bobax
More on Bots
Most bots individually send a low volume of spam
Operating Systems Used by Spammers
Used the OS fingerprinting tool “p0f” in Mail Avenger
Able to identify the OS of 75% of hosts that sent spam
Of this identifiable 75%, 95% run Windows
Consistent with the percentage of hosts on the Internet that run Windows
Only about 4% run another OS, but they are responsible for 8% of received spam
This goes against the common perception that spam originates only from Windows botnet drones
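The OS percentages combine two denominators: the share of identified hosts running Windows, and the share of all hosts and all spam attributable to other OSes. A minimal Python sketch, assuming one (OS label, spam count) pair per host, where the label is a p0f-style OS name or `None` when fingerprinting failed; the labels and function name are hypothetical.

```python
from typing import Optional


def os_breakdown(hosts: list[tuple[Optional[str], int]]) -> dict[str, float]:
    """Summarize OS fingerprints for spamming hosts.

    Each tuple is (os_label, spam_messages) for one host; os_label is None
    when the fingerprint could not identify the OS.
    """
    identified = [(os_label, n) for os_label, n in hosts if os_label is not None]
    total_spam = sum(n for _, n in hosts)
    if not identified or total_spam == 0:
        raise ValueError("need at least one identified host and some spam")
    windows = [n for os_label, n in identified if os_label == "Windows"]
    other = [n for os_label, n in identified if os_label != "Windows"]
    return {
        # share of all hosts whose OS could be fingerprinted
        "identified_host_share": len(identified) / len(hosts),
        # share of identified hosts running Windows
        "windows_share_of_identified": len(windows) / len(identified),
        # share of all hosts running some other (identified) OS
        "other_os_host_share": len(other) / len(hosts),
        # share of all spam sent by those other-OS hosts
        "other_os_spam_share": sum(other) / total_spam,
    }
```

A small `other_os_host_share` paired with a larger `other_os_spam_share` is the pattern described above: few non-Windows hosts, each sending more spam on average.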
Spam from Transient BGP Announcements
Some spammers briefly hijack large portions of IP address space that do not belong to them, send spam, and withdraw the routes immediately afterward
Not much is known about this technique, and it is not well defended against
Very difficult to trace
Allows spammers to evade DNSRBLs
Used 10% or less of the time, as a complementary spamming tactic
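Spotting this tactic amounts to pairing each BGP announcement with its withdrawal, keeping the short-lived ones, and checking whether spam arrived from inside one of those prefixes while it was announced. A minimal Python sketch, assuming announcements and withdrawals are already matched and timestamps are in seconds; the duration threshold is a free parameter (the slides do not give one), and all names are hypothetical.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Announcement:
    """One BGP announcement already paired with its later withdrawal (hypothetical record)."""
    prefix: str           # announced prefix, e.g. "198.51.100.0/24"
    announced_at: float   # Unix time the route appeared
    withdrawn_at: float   # Unix time the route was withdrawn


def short_lived(anns: list[Announcement], max_duration_s: float) -> list[Announcement]:
    """Keep announcements that were withdrawn within max_duration_s seconds."""
    return [a for a in anns if a.withdrawn_at - a.announced_at <= max_duration_s]


def spam_from_transient_routes(
    spam: list[tuple[str, float]],
    anns: list[Announcement],
    max_duration_s: float,
) -> list[tuple[str, float]]:
    """Return (source_ip, received_at) pairs whose source IP fell inside a
    short-lived prefix while that prefix was being announced."""
    transient = [(ipaddress.ip_network(a.prefix), a) for a in short_lived(anns, max_duration_s)]
    hits = []
    for ip, received_at in spam:
        addr = ipaddress.ip_address(ip)
        if any(addr in net and a.announced_at <= received_at <= a.withdrawn_at
               for net, a in transient):
            hits.append((ip, received_at))
    return hits
```

A linear scan keeps the sketch short; matching millions of messages would call for an index over prefixes (such as a radix tree) instead.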
Lessons on Spam Mitigation
Why should we use network-level information?
Information is less malleable
More constant than the contents of spam email, which content-based filters monitor
Information is observable in the middle of the network
Closer to the source of the spam than other techniques
Can result in more effective spam filters
When combined with other techniques
Has potential to stop spam that other techniques miss
More Lessons
Improves knowledge of host identity
Bases detection techniques on aggregate behavior
Protects against route hijacking (“BGP spectrum agility”)
Other techniques do not
Uses network-level properties to detect and filter spam
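Basing detection on aggregate behavior can be as simple as counting spam per network prefix instead of per IP, so a prefix can be flagged even when each individual host sends only a few messages. A minimal Python sketch, assuming /24 aggregation and a threshold of 3 spam messages, both hypothetical choices not taken from the study; the function names are also hypothetical.

```python
import ipaddress
from collections import Counter


def prefix_of(ip: str, prefix_len: int = 24) -> str:
    """Map an IPv4 address to its enclosing prefix, e.g. 192.0.2.77 -> 192.0.2.0/24."""
    return str(ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False))


def suspicious_prefixes(spam_ips: list[str], min_spam: int = 3, prefix_len: int = 24) -> set[str]:
    """Prefixes that sent at least min_spam spam messages in aggregate."""
    counts = Counter(prefix_of(ip, prefix_len) for ip in spam_ips)
    return {p for p, n in counts.items() if n >= min_spam}


def should_flag(ip: str, bad_prefixes: set[str], prefix_len: int = 24) -> bool:
    """Flag a new message if its source IP sits in a prefix already seen sending spam."""
    return prefix_of(ip, prefix_len) in bad_prefixes
```

A score like this is meant to feed into a content-based filter rather than replace it, in line with the conclusion that network-level filters should complement content-based ones.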
Conclusion
Studying the network-level behavior of spammers
Designing better spam filters using network-level information
Network-level behavior filters vs. content-based filters
Should not replace content-based filters, but complement them
Questions?