
Spam: Why?

Chris Kanich

Christian Kreibich

Kirill Levchenko

Brandon Enright

Vern Paxson

Geoffrey M. Voelker

Stefan Savage


What is Computer security?

• Most of computer science is about providing functionality:
  – User Interface, Software Design, Algorithms, Operating Systems/Networking, Compilers/PL, Microarchitecture, VLSI/CAD
• Computer security is not about functionality
• It is about how the embodiment of functionality behaves in the presence of an adversary
• Security mindset – think like a bad guy

My Background

• Collaborative Center for Internet Epidemiology and Defenses (CCIED)
  – UCSD/ICSI group created in response to worm threat
  – Very well funded, many strong partners
• Goals
  – Internet epidemiology: measuring/understanding attacks
  – Automated defenses: stopping outbreaks/attacks
  – Economic and legal issues: that other stuff

Many big successes…

• 50+ papers, lots of tech transfer, big systems, etc.
• Network Telescope
  – Passive monitor for > 1% of routable Internet address space
• Potemkin & GQ Honeyfarms
  – Active VM honeypot servers on > 250k IP addresses
• Earlybird
  – On-line learning of new worm signatures in < 1 ms

But… depressing truth

We didn’t stop Internet worms, let alone malware, let alone cybercrime… nor did anyone else.

At best, moved it around a bit.

By any meaningful metric the bad guys are winning…

Mistake: looking at this solely as a technical problem

Key threat transformations of the 21st century

• Efficient large-scale compromises
  – Internet communications model
  – Software homogeneity
  – User naïveté/fatigue
• Centralized control
  – Makes compromised host a commodity good
  – Platform economy
• Profit-driven applications
  – Commodity resources (IP, bandwidth, storage, CPU)
  – Unique resources (PII/credentials, CD-Keys, address book, etc.)

DDoS for sale

• Emergence of economic engine for Internet crime
  – SPAM, phishing, spyware, etc.
• Fluid third party markets for illicit digital goods/services
  – Bots ~$0.5/host, special orders, value added tiers
  – Cards, malware, exploits, DDoS, cashout, etc.

Botnet Spammer Rental Rates

• 3.6 cents per bot week
• 6 cents per bot week
• 2.5 cents per bot week

September 2004 postings to SpecialHam.com, Spamforum.biz

> 20-30k always online SOCKs4, url is de-duped and updated
> every 10 minutes. 900/weekly, Samples will be sent on
> request. Monthly payments arranged at discount prices.

> $350.00/weekly - $1,000/monthly (USD)
> Type of service: Exclusive (One slot only)
> Always Online: 5,000 - 6,000
> Updated every: 10 minutes

> $220.00/weekly - $800.00/monthly (USD)
> Type of service: Shared (4 slots)
> Always Online: 9,000 - 10,000
> Updated every: 5 minutes
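The per-bot-week figures on this slide can be roughly reproduced by dividing each weekly price by the advertised number of always-online bots. The sketch below is only a sanity check under stated assumptions: that "900/weekly" is in USD, that the three annotations correspond to the three postings in order, and that the midpoint of each advertised range is a fair denominator. None of this is stated on the slide, and the labels are hypothetical.

```python
# Each posting: (label, weekly price in USD, (min, max) bots always online).
# Labels and the USD reading of "900/weekly" are assumptions for illustration.
POSTINGS = [
    ("SOCKS4 list", 900.00, (20_000, 30_000)),
    ("Exclusive", 350.00, (5_000, 6_000)),
    ("Shared", 220.00, (9_000, 10_000)),
]


def cents_per_bot_week(weekly_usd, bots_online):
    """Weekly rental price spread over the bots online, in US cents."""
    return 100.0 * weekly_usd / bots_online


def rate_range(weekly_usd, bot_range):
    """Return (low, mid, high) cents per bot-week across the advertised range."""
    low_bots, high_bots = bot_range
    mid_bots = (low_bots + high_bots) / 2
    return (
        cents_per_bot_week(weekly_usd, high_bots),  # more bots, cheaper each
        cents_per_bot_week(weekly_usd, mid_bots),
        cents_per_bot_week(weekly_usd, low_bots),   # fewer bots, dearer each
    )


if __name__ == "__main__":
    for label, price, bots in POSTINGS:
        low, mid, high = rate_range(price, bots)
        print(f"{label}: {low:.1f} to {high:.1f} cents per bot-week (midpoint {mid:.1f})")
```

Under these assumptions the first posting's midpoint comes out at 3.6 cents, matching the slide; the other two land near, but not exactly on, the quoted 6 and 2.5 cents.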

Bot Payloads

Key structural asymmetries

• Defenders reactive, attackers proactive
  – Defenses public, attacker develops/tests in private
  – Arms race where best case for defender is to “catch up”
• New defenses expensive, new attacks cheap
  – Defenses sunk costs/business model, attacker agile and not tied to particular technology
• Low risk to attacker, high reward to attacker
  – Minimal deterrence
  – Functional anonymity on the Internet; very hard to fix
• Defenses hard to measure, attacks easy to measure
  – Few security metrics (no “evidence-based” security), attackers measure monetization which drives attack quality

Revisiting the problem

• We tend to think about this in terms of technical means for securing computer systems
• Most of 50-100B IT budget on cyber security is spent on securing the end host
  – AV, firewalls, IDS, encryption, etc…
  – Single most expensive front to secure
  – Single hardest front to secure
• But are individual end hosts valuable to bad guys?
  – Maybe $1.50? Even less in bulk… not a pain point
• What instead? Economically informed strategies
• Identify and attack economic bottlenecks in value chain
• This means understanding the return-on-investment for bad guys

Today: the spam problem

• We tend to focus on the costs of spam
  – > 100 Billion spam emails sent every day [Ironport]
  – > $1B in direct costs – anti-spam products/services [IDC]
  – Estimates of indirect costs (e.g., productivity) 10-100x more
• But spam exists only because it is profitable
• Someone is buying! (though no one has admitted it to me…)
• Our goal
  – Understand underlying economic support for spam

History of the spam business model

• Direct Mail: origins in 19th century catalog business
  – Idea: send unsolicited advertisements to potential customers
  – Rough value proposition: Delivery cost < (Conversion rate * Marginal revenue)
• Modern direct mail (> $60B in US)
  – Response rate: ~2.5% (mean per DMA)
  – CPM (cost per thousand) = $250 - $1000
• Spam is qualitatively the same…
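To make the value proposition concrete for direct mail, the inequality can be rearranged to give the smallest marginal revenue per response that covers delivery. This is a minimal sketch that treats the quoted response rate as the conversion rate, which is an assumption; the function name is hypothetical.

```python
def breakeven_revenue_per_response(cpm_usd, response_rate):
    """Marginal revenue per response at which delivery cost is exactly covered.

    Rearranges: delivery cost < conversion rate * marginal revenue,
    with delivery cost per piece = CPM / 1000.
    """
    cost_per_piece = cpm_usd / 1000.0
    return cost_per_piece / response_rate


if __name__ == "__main__":
    for cpm in (250, 1000):
        needed = breakeven_revenue_per_response(cpm, 0.025)
        print(f"CPM ${cpm}: need more than ${needed:.2f} per response")
```

With the slide's numbers, a mailer needs somewhere between roughly $10 and $40 of marginal revenue per response to break even on delivery alone.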

… but quantitatively different

• Advantages of e-mail direct marketing
  – No printing cost
  – Legitimate delivery cost low (outsourced price ~ $0.001/message [Get Response])
  – Dominated by production & lead generation cost (i.e. mailing list)
  – But this is for spam as a legal marketing vehicle… a minority
• Spam as marketing/bait for criminal enterprises (scams)
  – Mailing lists → ε (purchase/steal/harvest), < $10/M retail
  – Delivery cost → ε (botnet-based delivery), < $70/M retail

Anatomy of a modern Pharma spam campaign

Courtesy Stuart Brown, modernlifisrubbish.co.uk

Estimating spam profits

• Recall key basic inequality:
  – (Delivery Cost) < (Conversion Rate) x (Marginal Revenue)
• We have some handle on two of these (e.g., [Franklin07])
  – Delivery cost to send spam
    » Outsourced cost: retail purchase price < $70/M addrs
    » In-house cost: development/management labor
  – Marginal revenue
    » Average pharma sale of $100, affiliate commissions ≈ 50%
• Conversion rate is fundamentally different
• We don’t know; estimates vary by orders of magnitude
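Given the two known quantities, the inequality fixes the conversion rate a spammer would need in order to break even when buying delivery at retail. The sketch below plugs in the slide's figures; it treats the "< $70/M" retail price as if it were exactly $70 per million, so the result is an upper bound, and the function name is hypothetical.

```python
def breakeven_conversion_rate(cost_per_million_usd, sale_usd, commission):
    """Conversion rate at which delivery cost equals expected revenue per message."""
    cost_per_message = cost_per_million_usd / 1_000_000
    marginal_revenue = sale_usd * commission  # spammer's share of one sale
    return cost_per_message / marginal_revenue


if __name__ == "__main__":
    rate = breakeven_conversion_rate(70, 100, 0.5)
    print(f"break-even conversion rate: {rate:.2e} (about 1 in {round(1 / rate):,})")
```

With a $100 average sale and a 50% commission, retail delivery pays for itself only if roughly one message in 714,000 or better leads to a purchase.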

The measurement conundrum

• No accident that we lack good conversion measures
• It’s easy to measure spam from a receiver viewpoint
  – Which MTA sent it to me?
  – What does the content contain?
  – Where do the links go? etc…
• But the key economic issue is only known by the sender
  – Conversion rate * marginal profit = revenue per msg sent
• What to do?
  – Interview spammers? (0.00036) [Carmack03]
  – Guess? (“millions of dollars a day”) [Corman08]
  – Send lots of spam and see who clicks on links? (gold standard)

Botnet infiltration

• Key idea: distributed C&C is a vulnerability
  – Botnet authors like de-centralized communications for scalability and resilience, but…
  – … to do so, they trust their bots to be good actors
  – If you can modify the right bots you can observe and influence actions of the botnet
• Rest of today: preliminary results from a case study
  – Infiltrated Storm P2P botnet, instrumented ~500M spams
  – Delivery rates (anti-spam impacts on delivery)
  – Click through (visits to spam-advertised sites)
  – Conversions (purchases and purchase amounts)

Kanich, Kreibich, Levchenko, Enright, Paxson, Voelker and Savage, Spamalytics: an Empirical Analysis of Spam Marketing Conversion, ACM CCS 2008

How this works in detail

• Botnet Infiltration
  – Overview of the Storm peer-to-peer botnet
    » How does Storm work?
  – Mechanics of botnet spamming
    » How can Storm’s C&C be instrumented?
• Economic issues
  – Using a botnet for measurement
    » How to measure conversion via C&C interposition
  – Measuring spam delivery pipeline
    » What happens to spam from when a bot sends it…
    » …to when a user clicks “purchase” at a scam site?

Storm

• Storm is a well-known peer-to-peer botnet
• Storm has a hierarchical architecture
  – Workers perform tasks (send spam, launch DDoS attacks, etc.)
  – Proxies organize workers, connect to HTTP proxies
  – Master servers controlled directly by botmaster
• Workers and proxies are compromised hosts (bots)
  – Use a Distributed Hash Table protocol (Overnet) for rendezvous
  – Roughly 20,000 active bots at any time in April [Kanich08]
• Master servers run in “bullet-proof” hosting centers
  – Communicate with proxies and workers via command and control (C&C) protocol over TCP

Kanich, Levchenko, Enright, Voelker and Savage, The Heisenbot Uncertainty Problem: Challenges in Separating Bots from Chaff, LEET 2008.

Storm architecture

[Figure: Storm hierarchy. Labels: Dr. Evil, Master servers, Proxy bots, Worker bots]

Storm setup

• New bots decide if they are proxies or workers
  – Inbound connectivity? Yes, proxy. No, worker.
• Proxies advertise their status via encrypted variant of Overnet DHT P2P protocol
  – Master sends “Breath of Life” packet to new proxies to tell them IP address of master servers (RSA signature)
  – Allows master servers to be mobile if necessary
• Workers use Overnet to find proxies (tricky: time-based key identifies request)
• Workers send to proxy, proxy forwards to one of master servers in “safe” data center
• Bottom line: imperfect, but remarkably sophisticated

Storm spam campaigns

• Workers request “updates” to send spam [Kreibich08]
  – Dictionaries: names, domains, URLs, etc.
  – Email templates for producing polymorphic spam
    » Macros instantiate fields: %^Fdomains^% from domains dict
  – Lists of target email addresses (batches of 500-1000 at a time)
• Workers immediately act on these updates
  – Create a unique message for each email address
  – Send the message to the target
  – Report the results (success, failure) back to proxies
• Many campaign types
  – Self-propagation malware, pharmaceutical, stocks, phishing, …

Kreibich, Kanich, Levchenko, Enright, Voelker, Paxson and Savage, On the Spam Campaign Trail, LEET 2008.

Storm templates

Example Storm spam template and instantiation

Template:

Received: from %^C0%^P%^R2-6^%:qwertyuiopasdfghjklzxcvbnm^%.%^P%^R2-6^%:qwertyuiopasdfghjklzxcvbnm^%^% ([%^C6%^I^%.%^I^%.%^I^%.%^I^%^%]) by %^A^% with Microsoft SMTPSVC(%^Fsvcver^%); %^D^%
From: <%^Fnames^%@%^Fdomains^%>
To: <%^0^%>
Subject: Say hello to bluepill!
<%^Fpharma_links^%>

Macro expansion to insert target email address

Instantiation:

Received: from auz.xwzww ([132.233.197.74]) by dsl-189-188-79-63.prod-infinitum.com.mx with Microsoft SMTPSVC(5.0.2195.6713); Wed, 6 Feb 2008 16:33:44 -0800
From: <[email protected]>
To: <[email protected]>
Subject: Say hello to bluepill!
spammerdomain2.com
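The template-plus-dictionary mechanism can be illustrated with a small expander. This is a minimal sketch, not Storm's implementation: it handles only the two macro forms whose meaning the slides spell out (`%^F<dict>^%` for a dictionary entry and `%^0^%` for the target address), ignores the other macros in the Received header, and uses a simplified template. The function name, the sample dictionaries, and the example address are all hypothetical.

```python
import random
import re

# Dictionary macros of the form %^F<name>^%, as seen in the slide's template.
F_MACRO = re.compile(r"%\^F(\w+)\^%")
# Macro the slide annotates as inserting the target email address.
TARGET_MACRO = "%^0^%"


def instantiate(template, dictionaries, target, rng=random):
    """Fill each %^F<name>^% with a random dictionary entry and %^0^% with the target."""
    def pick(match):
        return rng.choice(dictionaries[match.group(1)])

    filled = F_MACRO.sub(pick, template)
    return filled.replace(TARGET_MACRO, target)


if __name__ == "__main__":
    # Simplified template: only the macro forms handled above.
    template = (
        "From: <%^Fnames^%@%^Fdomains^%>\n"
        "To: <%^0^%>\n"
        "Subject: Say hello to bluepill!\n"
        "%^Fpharma_links^%\n"
    )
    dictionaries = {
        "names": ["eduardo", "rafael"],
        "domains": ["example.org", "example.net"],  # hypothetical entries
        "pharma_links": ["spammerdomain1.com", "spammerdomain2.com"],
    }
    print(instantiate(template, dictionaries, "user@example.com"))
```

Because every dictionary macro is drawn independently, two messages from the same template rarely share the same sender and link, which is the polymorphism the campaign slide refers to.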

Storm in action

1224704030~!pharma_links~!spammerdomain1.comspammerdomain2.comspammerdomain3.com…

1224720409~!names~!eduardorafaelkatierachrisjohnny…

[email protected]@[email protected]@icir.org...

Received: from dkjs.sgdsz ([132.233.197.74]) by dsl-189-188-79-63.prod-infinitum.com.mx with Microsoft SMTPSVC(5.0.2195.6713); Wed, 6 Feb 2008 16:33:44 -0800
From: <[email protected]>
To: <[email protected]>
Subject: Say hello to bluepill!
spammerdomain3.com

Interposition on Storm

• We interpose on Storm command and control network
  – Reverse-engineered Storm protocols, communication scrambling, rendezvous mechanisms [Kanich08] [Kreibich08]
• Run unmodified Storm proxy bots in VMs
  – Key issue: Real bot workers connect to our proxies
• Insert rewriting proxies between workers & proxies
  – Transparently interpose on messages between Storm proxies and their associated Storm workers
  – Generic engine for rewriting traffic based on rules
• Interpose to control site URLs and spam delivery
  – Which sites the spam advertises (replace URLs in template links)
  – To whom spam gets sent (replace addrs in target list)

Modifying template links

Original dictionary: spammerdomain.com, spammerdomain2.com, spammerdomain3.com

Rewritten dictionary: newdomain1.com, newdomain2.com, newdomain3.com

Spam generated from the original dictionary:

Received: from dkjs.sgdsz ([132.233.197.74]) by dsl-189-188-79-63.prod-infinitum.com.mx with Microsoft SMTPSVC(5.0.2195.6713); Wed, 6 Feb 2008 16:33:44 -0800
From: <[email protected]>
To: <[email protected]>
Subject: Say hello to bluepill!
spammerdomain3.com

Spam generated from the rewritten dictionary:

Received: from dkjs.sgdsz ([132.233.197.74]) by dsl-189-188-79-63.prod-infinitum.com.mx with Microsoft SMTPSVC(5.0.2195.6713); Wed, 6 Feb 2008 16:33:44 -0800
From: <[email protected]>
To: <[email protected]>
Subject: Say hello to bluepill!
newdomain2.com
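The link rewriting can be pictured as a rule table keyed by dictionary name: updates for a matching dictionary have their entries swapped for measurement domains, and everything else passes through untouched. The sketch below is only an illustration of that idea. The update representation (a name plus a list of entries) and the helpers `rewrite_links` and `apply_rules` are hypothetical; the actual engine operated on Storm's wire protocol, which is not shown in the slides.

```python
from itertools import cycle


def rewrite_links(entries, our_domains):
    """Replace every advertised domain with one of ours, preserving the entry count."""
    ours = cycle(our_domains)
    return [next(ours) for _ in entries]


def apply_rules(name, entries, rules):
    """Apply the rewriting rule registered for this dictionary name, if any."""
    rule = rules.get(name)
    return rule(entries) if rule else list(entries)


if __name__ == "__main__":
    measurement_domains = ["newdomain1.com", "newdomain2.com", "newdomain3.com"]
    rules = {"pharma_links": lambda entries: rewrite_links(entries, measurement_domains)}

    update = ["spammerdomain.com", "spammerdomain2.com", "spammerdomain3.com"]
    print(apply_rules("pharma_links", update, rules))
    print(apply_rules("names", ["eduardo", "rafael"], rules))  # passes through unchanged
```

Keeping the number of entries unchanged is a design choice in this sketch, made so that a rewritten update looks as much like the original as possible to the worker.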

Measuring click-through

• Create two sites that mirror actual sites in spam
  – E-card (self-propagation) and pharmaceutical
  – Replace dictionaries with URLs to our sites
• E-card (self-prop) site
  – Link to benign executable that POSTs to our server
  – Log all POSTs to track downloads and executions
• Pharma site
  – Log all accesses up through clicks on “purchase”
  – Track the contents of shopping carts
• Strive for verisimilitude to remove bias (spam filtering)
  – Site content is similar, URLs have same format as originals, …

Aside: having fun

Measuring Delivery

• Create various test email accounts
  – At Web mail providers: Hotmail, Yahoo!, Gmail
  – Behind a commercial spam filtering appliance
  – As SMTP sinks: accept every message delivered
• Put email addresses in Storm target delivery lists
• Log all emails delivered to these addresses
  – Both labeled as spam (“Junk E-mail”) and in inbox

Ethical context

• Consequentialism
• First, do no harm (users no worse off than before)
  – We do not send any spam
    » Proxies are relays, worker bots send spam
  – We do not enable additional spam to be sent
    » Workers would have connected to some other proxy
  – We do not enable spam to be sent to additional users
    » Users are already on target lists, only add control addresses
• Second, reduce harm where possible
  – Our pharma sites don’t take credit card info
  – Our e-card sites don’t export malicious code

Legal context

• Warning: IANAL (we had lawyers involved though)
• CAN-SPAM
  – Subject to strong definition of “initiator”; we don’t fit it
• ECPA
  – Our proxy is directly addressed by worker bots (“party to” communication carve out)
• CFAA
  – We do not contact worker bots, they contact us (“unauthorized access”?)
  – We do not cause any information to be extracted or any fundamentally new activity to take place
  – Hard to find a good theory of damages (functionally indistinguishable: consequentialism)

But…

• In this kind of work there is little precedent
• No agency to get permission; no way to get indemnity
• Lawyers tend to say “I believe this activity has low risk of…”
• We communicate our activities to a lot of people
  – Security researchers in industry, academia
  – Affected network operators/registrars
  – Law enforcement
  – FTC

Aside: Spam is hard

• Lots of operational complexities to a study like this
• Net Ops notices huge Storm infestation
• Address space cleanliness
• Registrar issues
  – GoDaddy
  – TUCOWS
• Abuse complaints
• Spam site support e-mail
• Anti-virus signatures
• Law-enforcement

Spam conversion experiment

• Experimented with Storm March 21 – April 15, 2008
• Instrumented roughly 1.5% of Storm’s total output

                Pharmacy Campaign    E-card Campaign: Postcard    E-card Campaign: April Fool
Worker bots     31,348               17,639                       3,678
Emails          347,590,389          83,665,479                   38,651,124
Duration        19 days              7 days                       3 days

Spam pipeline

Stage          Pharma              Postcard            April Fool
Sent           347.5M              83.6M               40.1M
MTA            82.7M (24%)         21.1M (25%)         10.1M (25%)
Inbox          ---                 ---                 ---
Visits         10,522 (0.003%)     3,827 (0.005%)      2,721 (0.005%)
Conversions    28 (0.000008%)      316 (0.00037%)      225 (0.00056%)

Pharma: 12 M spam emails for one “purchase”

E-card: 1 in 10 visitors execute the binary
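The percentages in the pipeline are each stage's count divided by the number of emails sent. The sketch below recomputes them from the rounded counts shown on the slide, so the results agree with the slide only to within rounding. The stage names and helper functions are hypothetical.

```python
# Stage counts as shown on the slide (rounded for the larger figures).
PHARMA = [("sent", 347_500_000), ("mta", 82_700_000), ("visits", 10_522), ("conversions", 28)]
POSTCARD = [("sent", 83_600_000), ("mta", 21_100_000), ("visits", 3_827), ("conversions", 316)]


def funnel(stages):
    """Return (name, count, percent of the first stage) for each pipeline stage."""
    first = stages[0][1]
    return [(name, count, 100.0 * count / first) for name, count in stages]


def one_in(part, whole):
    """Express part/whole as 'one in N' and return N."""
    return whole / part


if __name__ == "__main__":
    for label, stages in (("Pharma", PHARMA), ("Postcard", POSTCARD)):
        print(label)
        for name, count, pct in funnel(stages):
            print(f"  {name:<12} {count:>12,}  {pct:.6f}%")
    print(f"Pharma: one purchase per {one_in(28, 347_500_000):,.0f} emails sent")
    print(f"Postcard: one execution per {one_in(316, 3_827):.1f} visitors")
```

On these counts the pharma campaign needs about 12.4 million emails per purchase, and about one in twelve postcard visitors runs the binary, which the slide rounds to one in ten.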

Spam filtering software

• The fraction of spam delivered into user inboxes depends on the spam filtering software used
  – Combination of site filtering (e.g., blacklists) and content filtering (e.g., SpamAssassin)
• Difficult to generalize, but we can use our test accounts for specific services

[Table: Fraction of spam sent that was delivered to inboxes]

Effects of Blacklisting (CBL Feed)

[Figure annotations: Unused; Effective; Other filtering]

Response rates by country

[Figure annotations: Two orders of magnitude; No large aberrations based on email topic]

The spammer’s bottom line

• Recall that we tracked the contents of shopping carts
• Using the prices on the actual site, we can estimate the value of the purchases
  – 28 purchases for $2,731 over 25 days, or $100/day ($140 active)
• We only interposed on a fraction of the workers
  – Connected to approx 1.5% of workers
  – Back-of-the-envelope (be very careful)
    » $7-10k/day for all, or ~$3M/year
    » With a 50% affiliate commission, $1.5M/year revenue
• For self-propagation
  – Roughly 3-9k new bots/day
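The back-of-the-envelope scaling is simple enough to write out. This is a sketch of the arithmetic only, and it inherits every caveat the slide attaches to it. It assumes revenue scales linearly with the fraction of workers observed and that the "active" figure refers to the 19 days the pharmacy campaign ran; both are assumptions, and the function name is hypothetical.

```python
def extrapolate(revenue_usd, days, fraction_observed, commission):
    """Scale observed purchase value to the whole botnet.

    Returns (observed $/day, botnet-wide $/day, botnet-wide $/year,
    spammer's share per year after the affiliate commission split).
    """
    per_day_observed = revenue_usd / days
    per_day_all = per_day_observed / fraction_observed
    per_year_all = per_day_all * 365
    return per_day_observed, per_day_all, per_year_all, per_year_all * commission


if __name__ == "__main__":
    for label, days in (("all 25 days", 25), ("19 active days (assumed)", 19)):
        obs, daily, yearly, share = extrapolate(2731, days, 0.015, 0.5)
        print(f"{label}: ${obs:,.0f}/day observed, ${daily:,.0f}/day botnet-wide, "
              f"${yearly:,.0f}/year, ${share:,.0f}/year at 50% commission")
```

The two denominators bracket the slide's $7-10k/day range, and the yearly totals fall on either side of the quoted ~$3M/year.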

Summary

• First measurement study of spam marketing conversion
• Infiltrated Storm botnet, interposed on spam campaigns
  – Rewriting proxies take advantage of Storm reverse-engineering
• Pharmaceutical spam
  – 1 in 12M conversion rate
  – $1.5M/yr net revenue
  – Profitability possibly tied to infrastructure integration
  – Sent via retail market, this campaign would not be profitable
  – Ergo: in-house delivery (Storm owners = pharma spammers)
• Self Propagation spam
  – 250k spam emails per infection
  – Social engineering effective: one in ten visitors run executable
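The claim that the campaign would not be profitable at retail delivery prices follows from the figures given earlier in the talk. The sketch below compares the delivery cost of one sale with the spammer's share of that sale; it treats the "< $70/M" retail price as exactly $70 per million addresses, which is an assumption, and the function name is hypothetical.

```python
def delivery_cost_per_sale(emails_per_sale, retail_usd_per_million):
    """Retail delivery cost of the emails needed to produce one sale."""
    return emails_per_sale / 1_000_000 * retail_usd_per_million


if __name__ == "__main__":
    cost = delivery_cost_per_sale(12_000_000, 70)
    share = 100 * 0.5  # $100 average sale, 50% affiliate commission
    print(f"delivery cost per sale: ${cost:,.0f}; spammer's share per sale: ${share:,.0f}")
```

At roughly $840 of delivery per $50 of commission, the retail price would have to fall by more than an order of magnitude before outsourced delivery broke even, which is consistent with the slide's in-house delivery inference.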

What are we doing now?

• More analysis
  – Extending infiltration to ~15 botnets; comparative analysis
  – Characteristic fingerprints of different spammers/crews
  – Characterizing supply chain relationships
    » Broadly order on-line “viagra”, rolexes, etc.
    » Cluster credit processor/merchant, mailing materials, etc.
    » Cluster on manufacturing fingerprint (e.g., NIR spectroscopy)
  – Measuring monetization by purposely losing credit cards
• Proactive defenses
  – Automated filter generation from templates
  – Automated classification of URLs
  – Automated vision-based detection of phishing pages

Security courses at UCSD

• CSE107 – Introduction to modern cryptography
• CSE127 – Computer Security
• But…
• Security plays a role in virtually all of your courses

Questions?

Collaborative Center for Internet Epidemiology and Defenses

http://ccied.org

What’s next: Value-chain characterization

• Value-chain characterization
  – Empirical map establishing links between criminal groups and enablers
    » Affiliate programs, botnets, fast flux networks, registrars, payment processors, SEO/traffic partners, fulfillment/manufacturing
    » Data mining across huge data feeds we’ve built or established relationships for
  – Social network among criminal groups
    » Semantic Web mining

New: Fulfillment measurements

• About to start purchasing wide range of spam-advertised products
  – Watches
  – Pharma
  – Traffic
• Cluster purchases based on
  – Merchant and processor
  – Packaging (postmark, forensic analysis of paper)
  – Artifacts of manufacturing process (e.g., FT-NIR on drugs)

New: Bot-based spam filter generation

• Observations
  – Modest number of bots send most spam
  – Virtually all bots use templates with simple rules to describe polymorphism
  – Templates+dictionaries ≈ regex describing spam to be generated
  – If we can extract or infer these from the botnets, we have a perfect filter for all the spam generated by the botnet
  – Very specific filters, extremely low FP risk

http://www.marshal.com/trace/spam_statistics.asp

[Figure annotations: random letters and numbers; phrases from a dictionary]

• Early results (last week)
  – 0 FP with 50 examples
  – 0 FN on Storm with 500 examples
  – Still tuning for other botnets
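The observation that templates plus dictionaries amount to a regular expression can be illustrated directly. This is a minimal sketch, not the filter generator described in the talk: it assumes the template and its dictionaries are already known, and it handles only `%^F<dict>^%` macros, turning each into an alternation over the dictionary entries while matching the remaining text literally. The function name and sample data are hypothetical.

```python
import re

# Dictionary macros of the form %^F<name>^%, as seen in the Storm template slide.
F_MACRO = re.compile(r"%\^F(\w+)\^%")


def template_to_regex(template, dictionaries):
    """Compile a template into a regex matching exactly the messages it can generate."""
    parts = []
    pos = 0
    for match in F_MACRO.finditer(template):
        # Literal text between macros must match verbatim.
        parts.append(re.escape(template[pos:match.start()]))
        # A dictionary macro matches any one entry of that dictionary.
        alternatives = "|".join(re.escape(entry) for entry in dictionaries[match.group(1)])
        parts.append(f"(?:{alternatives})")
        pos = match.end()
    parts.append(re.escape(template[pos:]))
    return re.compile("".join(parts))


if __name__ == "__main__":
    template = "Subject: Say hello to bluepill!\n%^Fpharma_links^%"
    dictionaries = {"pharma_links": ["spammerdomain1.com", "spammerdomain2.com"]}
    spam_filter = template_to_regex(template, dictionaries)

    generated = "Subject: Say hello to bluepill!\nspammerdomain2.com"
    unrelated = "Subject: Say hello to bluepill!\nexample.com"
    print(bool(spam_filter.fullmatch(generated)))  # message the template can produce
    print(bool(spam_filter.fullmatch(unrelated)))  # message it cannot produce
```

Because the pattern accepts only strings the template can actually emit, it is very specific, which is the intuition behind the low false-positive risk noted in the observations.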

Spare slides

Removing crawlers/honeyclients

• Anyone can send email to our accounts or visit our Web sites, potentially muddying the waters
  – Use various heuristics to validate the logs
• Validate spam in mailboxes was sent by us
  – Spam from other campaigns, bounce messages, etc.
  – Subject line matches our campaign, URL from our dictionary
• Validate Web accesses were by users in response
  – Sites with links in spam are immediately crawled by Google, A/V vendors, etc.
  – Special 3rd-level DNS names, special URL encoding
  – Ignore hosts that access robots.txt, don’t load javascript, don’t load flash, don’t load images, many malformed requests
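The host-level heuristics in the last bullet can be expressed as a simple predicate over per-host activity summaries. This is a sketch only: the `HostActivity` record, its field names, and the threshold for "many malformed requests" are hypothetical, since the slides do not say how the logs were summarised or what cutoff was used.

```python
from collections import namedtuple

# Hypothetical per-host summary derived from Web server logs.
HostActivity = namedtuple(
    "HostActivity",
    ["fetched_robots_txt", "loaded_javascript", "loaded_flash", "loaded_images", "malformed_requests"],
)


def looks_like_user(host, max_malformed=3):
    """Apply the slide's heuristics; max_malformed is an assumed cutoff."""
    if host.fetched_robots_txt:
        return False  # crawlers ask for robots.txt, browsers do not
    if not (host.loaded_javascript and host.loaded_flash and host.loaded_images):
        return False  # honeyclients and crawlers often skip page resources
    return host.malformed_requests <= max_malformed


if __name__ == "__main__":
    hosts = {
        "browser-like": HostActivity(False, True, True, True, 0),
        "crawler-like": HostActivity(True, False, False, False, 0),
        "scanner-like": HostActivity(False, True, True, True, 25),
    }
    for name, activity in hosts.items():
        verdict = "keep" if looks_like_user(activity) else "ignore"
        print(f"{name}: {verdict}")
```

The predicate is deliberately conservative: a host is counted as a user only if it passes every check, so some genuine visitors (for example, those with images disabled) would be dropped.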

Pharma and e-card conversions

Who is targeted?

• Top 20 domains
• Many Web mail & broadband providers, but very long tail
• Campaigns have nearly identical distributions
  – Same scammers, or target lists sold to multiple scammers
