Transcript of "Search Worms", ACM Workshop on Recurring Malcode (WORM) 2006

Page 1: Title

Search Worms, ACM Workshop on Recurring Malcode (WORM) 2006

N. Provos, J. McClain, K. Wang

Presented by Dhruv Sharma (dhruvs@usc.edu)

Page 2: A worm is malicious code that propagates over a network, with or without human assistance

• worm authors are looking for new ways to acquire vulnerable targets

• a search worm propagates automatically by copying itself to target systems

• search worms can severely harm search engines

• search worms send carefully crafted queries to search engines, evading identification mechanisms that assume random scanning


Page 3: Search worms generate search queries, analyze search results, and infect identified targets

• return as many unique targets as possible using a list of prepared queries

• search for popular domains to extract email addresses

• prune search results: remove duplicates and ignore URLs that belong to the search engine itself

• exploit identified targets by reformatting URLs to include the exploit and bootstrapping code
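The pruning step above can be sketched as follows. This is an illustrative helper, not the worm's actual code; the list of search-engine hosts is an assumption for the example.

```python
# Illustrative sketch only: deduplicate hosts and drop URLs that
# point back at the search engine itself.
from urllib.parse import urlparse

# assumed list of search-engine hosts to ignore
SEARCH_ENGINE_HOSTS = {"google.com", "www.google.com"}

def prune_results(urls):
    seen_hosts = set()
    pruned = []
    for url in urls:
        host = urlparse(url).netloc.lower()
        if host in SEARCH_ENGINE_HOSTS:
            continue  # ignore URLs that belong to the search engine itself
        if host in seen_hosts:
            continue  # keep one URL per host: unique targets only
        seen_hosts.add(host)
        pruned.append(url)
    return pruned
```

Keeping at most one URL per host is what makes the result set a list of unique targets, as the first bullet on this slide requires.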


Page 4: MyDoom.O, a type of search worm, requires human intervention to spread


• spreads via email containing an executable file as an attachment

• searches the local hard drive for email addresses

• figure below shows the number of infected hosts and the number of MyDoom.O queries that Google received per second

• peak scan rate: more than 30,000 queries per second

Page 5: Santy is the first search worm to propagate automatically, without any human intervention


• written in Perl, exploits a bug in the phpBB bulletin board system

• after injecting arbitrary code into a Web server running phpBB, uses Google to search for more targets and connects the infected machine to an IRC botnet

• graph below shows a timeline of infected IP addresses for three different Santy variants in December 2004; each variant manages to infect about four thousand different IP addresses

Page 6: Graphical description of the dependencies between different Santy variants, using a honeypot


• shows the dependency between Santy variants from August 2005 to May 2006

• each node is labelled with the filename downloaded to the infected host; two nodes are connected by an edge if their line difference, computed via diff, is minimal with respect to all other variants

• this graph shows that some variants of Santy have been continuously modified for over six months
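The edge rule above can be sketched with Python's difflib; the function names are illustrative, and the paper does not publish this code, so this is only a reconstruction of the stated rule.

```python
# Illustrative reconstruction of the graph-building rule: each variant is a
# list of source lines, and a variant is connected to whichever other
# variant has the smallest line difference (added/removed lines in a diff).
import difflib

def line_diff_size(lines_a, lines_b):
    # count lines that diff would report as added or removed
    return sum(1 for ln in difflib.ndiff(lines_a, lines_b)
               if ln.startswith(("+ ", "- ")))

def nearest_edges(variants):
    # variants: {filename: list of source lines}
    # returns {filename: name of the closest other variant}
    edges = {}
    for name, lines in variants.items():
        closest = min((line_diff_size(lines, other), other_name)
                      for other_name, other in variants.items()
                      if other_name != name)
        edges[name] = closest[1]
    return edges
```

With hypothetical variants `{"a.pl": [...], "b.pl": [...]}`, two files that differ in only a couple of lines end up adjacent, which is how long-running modification chains become visible in the graph.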

Page 7: The architecture of the worm-mitigation system is split into three phases


• anomaly identification step

• signature generation step

• index-based filtering

Page 8: Identifying abnormal traffic automatically blocks part of the worm traffic by observing IP addresses


• classifies the IP addresses responsible for abnormal traffic

• maintains a map of frequent words, used to compute the compound probability of a query

• flags an IP address as abnormal if it sends too many low-probability queries
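A minimal sketch of this check, assuming a unigram word model: the frequency map is built from queries believed benign, a query's compound probability is the product of its word probabilities, and an IP that sends too many low-probability queries is flagged. All names and thresholds here are illustrative assumptions, not the paper's values.

```python
# Sketch of compound-probability anomaly flagging (thresholds illustrative).
import math
from collections import Counter, defaultdict

word_counts = Counter()  # map of frequent words, built from good queries
total_words = 0
low_prob_queries = defaultdict(int)  # per-IP count of low-probability queries

def observe_good_query(query):
    global total_words
    words = query.lower().split()
    word_counts.update(words)
    total_words += len(words)

def query_log_prob(query, floor=1e-6):
    # compound probability = product of per-word probabilities (in log space)
    logp = 0.0
    for word in query.lower().split():
        p = word_counts[word] / total_words if total_words else 0.0
        logp += math.log(max(p, floor))
    return logp

def is_abnormal(ip, query, logp_threshold=-20.0, max_low=5):
    if query_log_prob(query) < logp_threshold:
        low_prob_queries[ip] += 1
    return low_prob_queries[ip] > max_low
```

Working in log space avoids underflow when multiplying many small per-word probabilities; the floor handles words never seen in the frequency map.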

Page 9: The signature generation step generates signatures based on Polygraph


• extracts tokens from bad queries to create signatures matching the bad traffic

• hierarchical clustering is used to merge signatures until a predefined false positive threshold is reached

• false positives are computed by matching signatures against a good query set

• the following signature was generated in an experiment running token extraction on a cluster of 85 2.4 GHz Intel Xeon machines:

GET /search\?q=.*\+-modules&num=[0-9][0-9]+&start=
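The signature is a regular expression over the HTTP request line, so applying it to incoming traffic is direct; the sample requests below are made up for illustration.

```python
# Applying the generated signature to (made-up) incoming request lines.
import re

signature = re.compile(r"GET /search\?q=.*\+-modules&num=[0-9][0-9]+&start=")

requests = [
    "GET /search?q=foo+-modules&num=20&start=0 HTTP/1.1",   # worm-like
    "GET /search?q=weather+today&num=20&start=0 HTTP/1.1",  # benign
]
matches = [bool(signature.search(r)) for r in requests]
```

Only the first request carries the `+-modules` token sequence the signature keys on, so only it is dropped as worm traffic.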

Page 10: Index-based filtering modifies the search index to handle multiple search queries mapping to similar result pages


• a search worm relies on a search engine to obtain a list of potentially vulnerable targets; if the search engine does not return any vulnerable targets in the search results, the worm fails to spread

• while crawling, tags all pages that appear to be vulnerable

• query results are not returned if they span pages from many hosts and the majority of those pages are tagged as vulnerable
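The two rules above can be sketched as follows; the vulnerability heuristic and both thresholds are illustrative assumptions, not the paper's implementation.

```python
# Hedged sketch of index-based filtering: tag at crawl time, then suppress
# result sets that look like target harvesting.
def looks_vulnerable(page_html):
    # hypothetical crawl-time tag, e.g. a known-vulnerable phpBB banner
    return "Powered by phpBB 2.0" in page_html

def filter_results(results, min_hosts=10, majority=0.5):
    # results: list of (host, tagged_vulnerable) pairs from the index
    hosts = {host for host, _ in results}
    if len(hosts) >= min_hosts:
        tagged = sum(1 for _, vuln in results if vuln)
        if tagged / len(results) > majority:
            return []  # suppress: query looks like target harvesting
    return results
```

Requiring both many distinct hosts and a tagged majority keeps ordinary queries unaffected: a benign query rarely concentrates on pages that all carry the same vulnerability tag.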

Page 11: Conclusion


• search worms spread by querying a search engine for new targets to infect, abusing the information collected by search engines

• signature generation combined with anomaly identification is not effective at preventing a worm from spreading

• the proposed solution, index-based filtering, is CPU-efficient and query-independent; it classifies web pages as vulnerable if they belong to an exploitable server or contain potential infection targets

Page 12: Pros and Cons

Pros:

• query-independent index-based filtering

• uses word-based features (tokenization); phishing URLs contain several suggestive word tokens

Cons:

• the signature-based approach is a good option only if given good seed queries

• cannot find new attacks for which we have no prior knowledge

• lacks a module that could analyze malicious pages to automatically extract the searches which, in turn, could help in finding vulnerable targets