Spam Sagar Vemuri slides courtesy: Anirudh Ramachandran Nick Feamster.

Date posted: 19-Dec-2015

Spam

Sagar Vemuri

slides courtesy:

Anirudh Ramachandran

Nick Feamster

Agenda

• Understanding Spam
  – What is Spam?
  – Statistics
  – Types of Spam
  – Spamming Methods
  – Spam Mitigation Methods

• Understanding the Network-level behavior of spammers
  – Data Collection Methods
  – Statistics
  – BGP Spectrum Agility, Botnets, Harvesting
  – Drawbacks

What is Spam?

• Unsolicited commercial message

• “Spam is e-mail that is both unsolicited by the recipient and sent in substantively identical form to many recipients”

• As of the last quarter of 2005, an estimated 80-85% of all email was spam

• Microsoft founder Bill Gates receives four million e-mails per year, most of them being spam

Some statistics

• 1978 - An e-mail spam is sent to 600 addresses.
• 1994 - First large-scale spam sent to 6000 newsgroups, reaching millions of people
• 2005 - (June) 30 billion per day
• 2006 - (June) 55 billion per day
• 2006 - (December) 85 billion per day
• 2007 - (February) 90 billion per day

Products advertised

• Porn site subscriptions
• Prescription drugs
• Printer ink cartridges
• Counterfeit software
• Mortgage offers
• Fake diplomas from non-existent or non-accredited universities

Types of Spam

• Email spam
• IM spam
  – Also called ‘Spim’
  – 1.2 billion spam IM messages in 2004
• SMS spam
  – Also called ‘m-spam’
• Image spam
  – Text of a msg stored as GIF or JPEG and displayed in the email
  – Prevents text based spam filters from detecting it

Spamming Methods

• Direct spamming
  – By purchasing upstream connectivity from “spam-friendly ISPs”
• Open relays and proxies
  – Mail servers that allow unauthenticated Internet hosts to connect and relay mail through them
• Botnets
  – Collection of machines acting under one centralized controller. Eg: Bobax
• BGP Spectrum Agility
  – IP hijacking techniques

Spam Mitigation

• Filtering
  – Based on content
  – Use features in email’s headers and body
  – Eg: SpamAssassin
• Blacklisting:
  – IP addresses of known spam sources are used to classify email
  – More than 30 widely used blacklists available today

Content-based Filtering

• Content-based properties are malleable
  – Low cost of evasion: Spammers can easily alter features of an email’s content
  – Customization: Customized emails are easy to generate
  – High cost to filter maintainers: Filters must be continually updated as content-changing techniques become more sophisticated
• Content-based filters are applied at the destination
  – Too little, too late: Wasted network bandwidth, storage, etc.
  – Many users receive (and store) the same spam content
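
A toy illustration of how cheaply content features can be changed. This is only a sketch: the keyword weights and the threshold are invented for the example and are not SpamAssassin's actual rules.

```python
import re

# Hypothetical keyword weights; a real content filter uses a far larger,
# continually updated rule set.
KEYWORD_WEIGHTS = {"viagra": 3.0, "mortgage": 2.0, "diploma": 2.0}
THRESHOLD = 3.0  # assumed cutoff for this example


def spam_score(body: str) -> float:
    """Sum the weights of known keywords that appear as whole words."""
    words = set(re.findall(r"[a-z]+", body.lower()))
    return sum(w for kw, w in KEYWORD_WEIGHTS.items() if kw in words)


def is_spam(body: str) -> bool:
    return spam_score(body) >= THRESHOLD


original = "Cheap viagra and a new mortgage today"
obfuscated = "Cheap v1agra and a new m0rtgage today"  # two characters changed

print(is_spam(original))    # True
print(is_spam(obfuscated))  # False
```

Two substituted characters are enough to slip past this filter, while the maintainer has to add a new rule for every such variant.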

DNS Blacklisting

• Aggressive filters have many false positives
• One list might not have all the information about spamming IPs
• Need to consult multiple lists
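
A minimal sketch of consulting several DNS blacklists for one relay. It relies on the usual DNSBL convention (reverse the IPv4 octets, append the list's zone, and treat any A record as "listed"). The zone names below are placeholders, not real blacklists, and the `lookup` parameter is a hypothetical hook so the example runs without network access.

```python
import socket
from typing import Callable, Iterable


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNSBL lookup name: reversed IPv4 octets + list zone."""
    octets = ip.split(".")
    if len(octets) != 4:
        raise ValueError(f"not an IPv4 address: {ip!r}")
    return ".".join(reversed(octets)) + "." + zone


def resolves(name: str) -> bool:
    """True if the name has an A record, i.e. the IP is listed."""
    try:
        socket.gethostbyname(name)
        return True
    except socket.gaierror:
        return False


def listing_count(ip: str, zones: Iterable[str],
                  lookup: Callable[[str], bool] = resolves) -> int:
    """Number of blacklists in `zones` that list `ip`."""
    return sum(lookup(dnsbl_query_name(ip, z)) for z in zones)


# Placeholder zones and a fake resolver, so no DNS traffic is generated.
zones = ["bl1.example", "bl2.example", "bl3.example"]
fake_listed = {"4.3.2.1.bl1.example", "4.3.2.1.bl3.example"}
print(listing_count("1.2.3.4", zones, lookup=fake_listed.__contains__))  # 2
```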

Network-level Spam Filtering

• Network-level properties are harder to change than content

• Network-level properties
  – IP addresses and IP address ranges (prevalence)
  – Change of addresses over time (persistence)
  – Distribution according to operating system, country and AS
  – Characteristics of botnets and short-lived route announcements

• Help develop better spam filters

Spamming Patterns

Network-level properties of spam arrival
– From where?
  • What IP address space?
  • ASes?
  • What OSes?
– What techniques?
  • Botnets
  • Short-lived route announcements
  • Shady ISPs
– Capabilities and limitations?
  • Bandwidth
  • Size of botnet army

Understanding the Network-Level Behavior of Spammers

Anirudh Ramachandran

Nick Feamster

(Georgia Tech)

Data Collection

• Primary dataset: Actual spam email messages collected at a large spam sinkhole
• Corpus of email logs from a large email provider
• Command and Control traffic from a Bobax botnet
• BGP route advertisements from an upstream border router in the same network
• Also capturing traceroutes, DNSBL results, passive TCP host fingerprinting simultaneous with spam arrival

Data Collection Setup

[Figure: data collection setup; only surviving label: “Exchange 1”]

Data collected when the spam is received

• IP address of the relay that established the SMTP connection to the sinkhole

• Traceroute to that IP address, to help us estimate the network location of the mail relay

• Passive “p0f” TCP fingerprint, to determine the OS of the mail relay

• Result of DNS blacklist (DNSBL) lookups for that mail relay at eight different DNSBLs
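
The four items above can be pictured as one record per received message. The following is only an illustrative sketch: the class name, field names, and sample values are assumptions made for this example, not the format actually used at the sinkhole.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class SpamRecord:
    """One message received at the sinkhole (field names are illustrative)."""
    received_at: datetime
    relay_ip: str                        # relay that opened the SMTP connection
    traceroute_hops: List[str] = field(default_factory=list)
    p0f_os: Optional[str] = None         # passive TCP fingerprint guess, if any
    dnsbl_results: Dict[str, bool] = field(default_factory=dict)

    def blacklists_listing(self) -> int:
        """How many of the queried DNSBLs listed the relay."""
        return sum(self.dnsbl_results.values())


# Made-up sample values using documentation address ranges.
rec = SpamRecord(
    received_at=datetime(2006, 1, 20, 12, 0),
    relay_ip="192.0.2.7",
    traceroute_hops=["198.51.100.1", "192.0.2.7"],
    p0f_os="Windows",
    dnsbl_results={"bl1.example": True, "bl2.example": False},
)
print(rec.blacklists_listing())  # 1
```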

MailAvenger

• Highly configurable SMTP server that collects many useful statistics

Spam per Day

• Both the amount of spam and the number of distinct IP addresses increase over time

IP Address Distribution

• The majority of spam is sent from a relatively small fraction of IP address space

• The distribution is the same for legitimate mail
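
One way such a distribution could be computed from a log of relay addresses is sketched below. The grouping by /8 and the sample addresses are assumptions for illustration; the slide does not say how the address space was binned.

```python
from collections import Counter
import ipaddress
from typing import Iterable, List, Tuple


def prefix_of(ip: str, length: int = 8) -> str:
    """Return the enclosing IPv4 prefix of the given length, e.g. '61.0.0.0/8'."""
    return str(ipaddress.ip_network(f"{ip}/{length}", strict=False))


def spam_by_prefix(relay_ips: Iterable[str],
                   length: int = 8) -> List[Tuple[str, float]]:
    """Fraction of messages per prefix, largest first."""
    counts = Counter(prefix_of(ip, length) for ip in relay_ips)
    total = sum(counts.values())
    return [(p, c / total) for p, c in counts.most_common()]


# Made-up relay addresses, one entry per received message.
sample = ["61.1.2.3", "61.9.9.9", "61.200.0.1", "66.5.5.5", "82.7.7.7"]
for prefix, share in spam_by_prefix(sample):
    print(prefix, f"{share:.0%}")
```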

AS distribution

• Large fraction of spam received from just a handful of ASes

• 12% of all received spam originates in just two ASes (from Korea and China)

• Top 20 ASes are responsible for sending nearly 37% of all spam

• Spam filtering efforts might be more effective if focused on identifying high-volume, persistent groups of spammers by AS number rather than on blacklisting individual IP addresses.
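
A figure such as "top 20 ASes send nearly 37%" is a top-k share. A minimal sketch, assuming each message has already been mapped to an origin AS number (that mapping step is not shown, and the AS numbers below are made-up private-use values):

```python
from collections import Counter
from typing import Iterable


def top_k_share(origin_asns: Iterable[int], k: int) -> float:
    """Fraction of messages whose origin AS is among the k most frequent ASes."""
    counts = Counter(origin_asns)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(c for _, c in counts.most_common(k)) / total


# Made-up origin AS per message.
messages = [64512] * 5 + [64513] * 3 + [64514, 64515]
print(top_k_share(messages, 2))  # 0.8
```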

Distribution across ASes

Still about 40% of spam coming from the U.S.

Distribution Across Operating Systems

About 4% of known hosts are non-Windows.

These hosts are responsible for about 8% of received spam.

Persistence

• More than half of the client IPs appear fewer than twice
• 85% of the client IP addresses sent fewer than 10 emails to the sinkhole
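
Persistence figures of this kind reduce to counting messages per client IP. A small sketch with a made-up log (one relay IP per received message); the function name is chosen for this example only:

```python
from collections import Counter
from typing import Iterable


def fraction_below(relay_ips: Iterable[str], limit: int) -> float:
    """Fraction of distinct client IPs that sent fewer than `limit` messages."""
    per_ip = Counter(relay_ips)
    if not per_ip:
        return 0.0
    return sum(1 for n in per_ip.values() if n < limit) / len(per_ip)


# Made-up log: four distinct IPs sending 12, 3, 1 and 1 messages.
log = ["192.0.2.1"] * 12 + ["192.0.2.2"] * 3 + ["192.0.2.3", "192.0.2.4"]
print(fraction_below(log, 2))   # 0.5
print(fraction_below(log, 10))  # 0.75
```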

Effectiveness of Blacklists

• Nearly 80% of all spam was received from mail relays that appear in at least one of eight blacklists

• More than 50% of spam came from relays listed in two or more blacklists

• If spammers use BGP spectrum agility, then 50% of the IP addresses do not appear in any blacklist

• About 30% appear in more than one blacklist
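
Percentages like these can be derived from per-message DNSBL results. A minimal sketch, assuming each message carries the set of blacklists that listed its relay at arrival time; the list names and data are invented:

```python
from typing import Iterable, Set


def fraction_listed(listings: Iterable[Set[str]], at_least: int = 1) -> float:
    """Fraction of messages whose relay is on `at_least` or more blacklists.

    `listings` holds, per message, the set of blacklists listing its relay.
    """
    listings = list(listings)
    if not listings:
        return 0.0
    return sum(1 for s in listings if len(s) >= at_least) / len(listings)


# Made-up results for five messages.
per_message = [{"bl1", "bl2"}, {"bl1"}, set(), {"bl2", "bl3", "bl4"}, set()]
print(fraction_listed(per_message, 1))  # 0.6
print(fraction_listed(per_message, 2))  # 0.4
```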

Spam From Botnets

Most Bot IP addresses do not return

65% of bots only send mail to a domain once over 18 months

Collaborative spam filtering seems to be helping track bot IP addresses

[Figure axes: Lifetime (seconds); Percentage of bots]

Most Bots Send Low Volumes of Spam

Most bot IP addresses send very little spam, regardless of how long they have been spamming…

[Figure axes: Lifetime (seconds); Amount of Spam]

BGP Spectrum Agility

• Log IP addresses of SMTP relays
• Correlate BGP route advertisements seen at network where spam trap is co-located.

A small club of persistent players appears to be using this technique.

Common short-lived prefixes and ASes

  Prefix        AS
  61.0.0.0/8    4678
  66.0.0.0/8    21562
  82.0.0.0/8    8717

~ 10 minutes

Somewhere between 1-10% of all spam (some clearly intentional, others might be flapping)
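
The correlation step could look roughly like the sketch below: given the announce/withdraw intervals observed for a prefix, check whether a spam message arrived while a short-lived route covered its relay. The `Epoch` type, the 10-minute cutoff (taken from the "~ 10 minutes" annotation), and the timestamps are assumptions made for this example.

```python
import ipaddress
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Epoch:
    """One announce-to-withdraw interval for a prefix (illustrative type)."""
    prefix: str
    announced: datetime
    withdrawn: datetime


SHORT_LIVED = timedelta(minutes=10)  # assumed cutoff for "short-lived"


def from_short_lived_route(relay_ip: str, arrival: datetime,
                           epochs: List[Epoch]) -> bool:
    """True if the spam arrived while a short-lived route covered the relay."""
    addr = ipaddress.ip_address(relay_ip)
    for e in epochs:
        short = e.withdrawn - e.announced <= SHORT_LIVED
        covers = addr in ipaddress.ip_network(e.prefix)
        if short and covers and e.announced <= arrival <= e.withdrawn:
            return True
    return False


# Made-up timestamps for a 9-minute announcement of 82.0.0.0/8.
t0 = datetime(2005, 12, 1, 8, 0)
epochs = [Epoch("82.0.0.0/8", t0, t0 + timedelta(minutes=9))]
print(from_short_lived_route("82.1.2.3", t0 + timedelta(minutes=4), epochs))  # True
print(from_short_lived_route("82.1.2.3", t0 + timedelta(hours=1), epochs))    # False
```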

Why Such Big Prefixes?

• Flexibility: Client IPs can be scattered throughout dark space within a large /8
  – Same sender usually returns with different IP addresses

• Visibility: Route typically won’t be filtered (nice and short)

Characteristics of IP-Agile Senders

• IP addresses are widely distributed across the /8 space

• IP addresses typically appear only once at the sinkhole

• Depending on which /8, 60-80% of these IP addresses were not reachable by traceroute when spot-checked

• Some IP addresses were in allocated, albeit unannounced space

• Some AS paths associated with the routes contained reserved AS numbers

Length of short-lived BGP epochs

The Effectiveness of Blacklisting

~80% listed on average

~95% of bots listed in one or more blacklists

Only about half of the IPs spamming from short-lived BGP are listed in any blacklist

Spam from IP-agile senders tends to be listed in fewer blacklists

[Figure axes: Number of DNSBLs listing this spammer; Fraction of all spam received]

Harvesting

• Tracking Web-based harvesting
  – Register domain, set up MX record
  – Post, link to page with randomly generated email addresses
  – Log requests
  – Wait for spam
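
The steps above can be sketched as a small bookkeeping class: every page request gets a fresh random address, and the mapping from address to requester is logged so that later spam can be traced back to the harvesting request. The class name, method names, and the `trap.example` domain are hypothetical.

```python
import secrets
from typing import Dict, Optional


class HarvestTrap:
    """Hands out a unique address per page request and remembers who saw it."""

    def __init__(self, domain: str) -> None:
        self.domain = domain
        self.seen_by: Dict[str, str] = {}  # address -> requesting client IP

    def address_for(self, client_ip: str) -> str:
        """Generate a random address and log which client it was served to."""
        addr = f"{secrets.token_hex(8)}@{self.domain}"
        self.seen_by[addr] = client_ip
        return addr

    def harvester_of(self, recipient: str) -> Optional[str]:
        """Given a spam recipient, return the client that harvested it, if known."""
        return self.seen_by.get(recipient)


trap = HarvestTrap("trap.example")
bait = trap.address_for("203.0.113.9")   # served when the page is requested
print(trap.harvester_of(bait))           # 203.0.113.9
```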

Harvesting

• Domain was registered on November 19, 2005
• SMTP server was set up on December 6, 2005
• Email harvesting occurred on January 16, 2006
• First spam came on January 20, 2006 (phishing attack)
• The harvester and the spammers were not in the same AS
• Attack was coordinated between two machines
  – One machine sent to half of the addresses listed alphabetically, the other machine to the other half

Spam Mitigation

• Spam filtering requires a better notion of host identity
  – IP address is not enough to identify a host

• IP address range based filtering is more effective than single IP address based filtering
  – Some IP address ranges send more spam than others

• Securing Internet routing is necessary for bolstering identity and traceability of email senders
  – Otherwise, the BGP spectrum agility method could be used more widely

• Network-level properties can make current spam filters more effective
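
A rough sketch of range-based filtering: flag every /24 that has sent at least a few spam messages, then block any address in a flagged range, including addresses never seen before. The /24 granularity and the threshold of 3 are arbitrary assumptions for the example, not values from the study.

```python
import ipaddress
from collections import Counter
from typing import Iterable, Set


def heavy_ranges(spam_ips: Iterable[str], length: int = 24,
                 min_messages: int = 3) -> Set[ipaddress.IPv4Network]:
    """Prefixes of the given length that sent at least `min_messages` spam."""
    counts = Counter(
        ipaddress.ip_network(f"{ip}/{length}", strict=False) for ip in spam_ips
    )
    return {net for net, n in counts.items() if n >= min_messages}


def blocked(ip: str, ranges: Set[ipaddress.IPv4Network]) -> bool:
    """True if `ip` falls inside any flagged range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in ranges)


# Made-up relay addresses, one entry per spam message.
seen = ["198.51.100.5", "198.51.100.77", "198.51.100.200", "203.0.113.1"]
ranges = heavy_ranges(seen)
print(blocked("198.51.100.9", ranges))  # True: never seen, but same /24
print(blocked("203.0.113.1", ranges))   # False: its /24 sent only one message
```

A per-IP blacklist built from the same log would miss 198.51.100.9 entirely, which is the case where range-based filtering helps against senders that rarely reuse an address.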

Conclusion

• A detailed study examining network-level properties
• Reveals botnet characteristics in sending spam
• Shows the existence of the BGP spectrum agility method
• Datasets are substantial, but not comprehensive
  – Comparison between spam and legitimate mail is questionable
  – Might comparing spam and legitimate mail of a single domain, and repeating this across several domains, be better?
  – Analysis of IP addresses and address ranges fails to draw important conclusions

• Does not analyze other types of spam, apart from email spam.

• Data analysis from a single vantage point