2008-7-31 Guofei Gu BotMiner
BotMiner: Clustering Analysis of Network Traffic for Protocol- and Structure-Independent Botnet Detection
Guofei Gu1,2, Roberto Perdisci3, Junjie Zhang1, and Wenke Lee1
1Georgia Tech, 2Texas A&M University, 3Damballa, Inc.
Roadmap
• Introduction
  – Botnet problem
  – Challenges for botnet detection
  – Related work
• BotMiner
  – Motivation
  – Design
  – Evaluation
• Conclusion
What Is a Bot/Botnet?
• Bot
  – A malware instance that runs autonomously and automatically on a compromised computer (zombie) without the owner’s consent
  – Profit-driven, professionally written, widely propagated
• Botnet (Bot Army): network of bots controlled by criminals
  – Definition: “A coordinated group of malware instances that are controlled by a botmaster via some C&C channel”
  – Architecture: centralized (e.g., IRC, HTTP), distributed (e.g., P2P)
  – “25% of Internet PCs are part of a botnet!” (Vint Cerf)
[Figure: a botmaster, a C&C channel, and a bot]
Botnets are used for …
• All DDoS attacks
• Spam
• Click fraud
• Information theft
• Phishing attacks
• Distributing other malware, e.g., spyware
Challenges for Botnet Detection
• Bots are stealthy on the infected machines
  – We focus on a network-based solution
• Bot infection is usually a multi-faceted and multi-phased process
  – Only looking at one specific aspect is likely to fail
• Bots are dynamically evolving
  – Static and signature-based approaches may not be effective
• Botnets can have very flexible design of C&C channels
  – A solution very specific to a botnet instance is not desirable
Why Are Existing Techniques Not Enough?
• Traditional AV tools
  – Bots use packers, rootkits, and frequent updating to easily defeat AV tools
• Traditional IDS/IPS
  – Look at only a specific aspect
  – Do not have a big picture
• Honeypot
  – Not a good botnet detection tool
Existing Botnet Detection Work
• [Binkley, Singh 2006]: IRC-based bot detection combining IRC statistics and TCP work weight
• Rishi [Goebel, Holz 2007]: signature-based IRC bot nickname detection
• [Livadas et al. 2006, Karasaridis et al. 2007]: (BBN, AT&T) network flow level detection of IRC botnets
• BotHunter [Gu et al. Security’07]: dialog correlation to detect bots based on an infection dialog model
• BotSniffer [Gu et al. NDSS’08]: spatial-temporal correlation to detect centralized botnet C&C
• TAMD [Yen, Reiter 2008]: traffic aggregation to detect botnets that use a centralized C&C structure
Why BotMiner?
• Botnets can change their C&C content (encryption, etc.), protocols (IRC, HTTP, etc.), structures (P2P, etc.), C&C servers, infection models …
[Figure: two botnet structures, (a) bots with a C&C server and (b) bots only]
Example: Nugache, Storm, …
BotMiner: Protocol- and Structure-Independent Detection
[Figure: an enterprise-like network connected to the Internet]
Horizontal correlation
  – Bots are for long-term use
  – Botnet: communication and activities are coordinated/similar
Revisit the Definition of a Botnet
• “A coordinated group of malware instances that are controlled by a botmaster via some C&C channel”
• We need to monitor two planes
  – C-plane (C&C communication plane): “who is talking to whom”
  – A-plane (malicious activity plane): “who is doing what”
BotMiner Architecture
[Figure: BotMiner architecture. Network traffic feeds an A-plane monitor (scan, spam, binary downloading, exploit, ...) and a C-plane monitor. The A-plane monitor writes an activity log for A-plane clustering; the C-plane monitor writes a flow log for C-plane clustering. Cross-plane correlation combines both clustering results and produces reports.]
BotMiner C-plane Clustering
• What characterizes a communication flow (C-flow) between a local host and a remote service?
  – <protocol, srcIP, dstIP, dstPort>
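As an illustration of the C-flow key above, the following minimal sketch groups individual flow records by <protocol, srcIP, dstIP, dstPort>. The `Flow` record, its field names, and the sample addresses are assumptions made for this example; the slides do not specify a record format.

```python
from collections import defaultdict
from typing import NamedTuple


class Flow(NamedTuple):
    """Hypothetical flow record; the field names are assumptions."""
    protocol: str
    src_ip: str
    dst_ip: str
    dst_port: int
    start: float      # flow start time in seconds
    duration: float   # flow duration in seconds
    n_packets: int
    n_bytes: int


def group_c_flows(flows):
    """Group flow records that share <protocol, srcIP, dstIP, dstPort>."""
    c_flows = defaultdict(list)
    for f in flows:
        c_flows[(f.protocol, f.src_ip, f.dst_ip, f.dst_port)].append(f)
    return dict(c_flows)


flows = [
    Flow("tcp", "10.0.0.5", "198.51.100.7", 6667, 0.0, 2.0, 10, 800),
    Flow("tcp", "10.0.0.5", "198.51.100.7", 6667, 3600.0, 1.0, 8, 640),
    Flow("tcp", "10.0.0.9", "198.51.100.7", 6667, 10.0, 4.0, 12, 900),
]
c_flows = group_c_flows(flows)
```

Here the two flows from 10.0.0.5 fall into one C-flow and the flow from 10.0.0.9 into another, because the source IP is part of the key.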
How to Capture “Talking in What Kind of Patterns”?
• Temporal related statistical distribution information in
  – BPS (bytes per second)
  – FPH (flows per hour)
• Spatial related statistical distribution information in
  – BPP (bytes per packet)
  – PPF (packets per flow)
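The four variables can be sketched for a single C-flow as follows. This is only an illustration: the dictionary keys are hypothetical, FPH is counted only over hours in which at least one flow was seen, and every flow is assumed to have a positive packet count and duration.

```python
from collections import Counter


def c_flow_samples(flows):
    """Return samples of FPH, PPF, BPP and BPS for the flows of one C-flow.

    Each flow is a dict with 'start' (seconds), 'duration' (seconds),
    'packets' and 'bytes'; these key names are assumptions.
    """
    flows_per_hour = Counter(int(f["start"] // 3600) for f in flows)
    return {
        "FPH": list(flows_per_hour.values()),                # flows per hour
        "PPF": [f["packets"] for f in flows],                # packets per flow
        "BPP": [f["bytes"] / f["packets"] for f in flows],   # bytes per packet
        "BPS": [f["bytes"] / f["duration"] for f in flows],  # bytes per second
    }


samples = c_flow_samples([
    {"start": 0, "duration": 2.0, "packets": 10, "bytes": 800},
    {"start": 100, "duration": 1.0, "packets": 8, "bytes": 640},
    {"start": 3700, "duration": 4.0, "packets": 12, "bytes": 900},
])
```

Each variable yields a list of samples rather than a single number, which is what allows a distribution to be described per C-flow.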
Two-step Clustering of C-flows
• Why multi-step?
• How?
  – Coarse-grained clustering
    • Using reduced feature space: mean and variance of the distribution of FPH, PPF, BPP, BPS for each C-flow (2*4=8)
    • Efficient clustering algorithm: X-means
  – Fine-grained clustering
    • Using full feature space (13*4=52)
• What’s left?
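The two feature spaces can be sketched as below. The 8-dimensional vector follows the slide directly (mean and variance of four variables). The slide does not say how the 13 values per variable are obtained, so the equal-width histogram and the value ranges used here are assumptions made only to show the 13*4=52 layout. The clustering step itself (X-means) is not shown.

```python
from statistics import fmean, pvariance

VARIABLES = ("FPH", "PPF", "BPP", "BPS")
N_BINS = 13  # from the slide's 13*4=52; the binning scheme is an assumption


def reduced_vector(samples):
    """Mean and variance of each variable: 2*4 = 8 features."""
    vec = []
    for name in VARIABLES:
        vec += [fmean(samples[name]), pvariance(samples[name])]
    return vec


def histogram(values, lo, hi, n_bins=N_BINS):
    """Normalized equal-width histogram; out-of-range values are clamped."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        idx = int((v - lo) / width)
        counts[min(max(idx, 0), n_bins - 1)] += 1
    return [c / len(values) for c in counts]


def full_vector(samples, ranges):
    """Thirteen histogram bins per variable: 13*4 = 52 features."""
    vec = []
    for name in VARIABLES:
        lo, hi = ranges[name]
        vec += histogram(samples[name], lo, hi)
    return vec


samples = {
    "FPH": [2, 1],
    "PPF": [10, 8, 12],
    "BPP": [80.0, 80.0, 75.0],
    "BPS": [400.0, 640.0, 225.0],
}
# Hypothetical value ranges, chosen only for this example.
ranges = {"FPH": (0, 13), "PPF": (0, 130), "BPP": (0, 1300), "BPS": (0, 13000)}
reduced = reduced_vector(samples)
full = full_vector(samples, ranges)
```

The short vector keeps the first pass over all C-flows cheap, and the longer vector is only needed within each coarse cluster.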
A-plane Clustering
• Capture “activities in what kind of patterns”
Cross-plane Correlation
• Botnet score s(h) for every host h
• Similarity score between host hi and hj
• Hierarchical clustering
[Figure: clusters Ai and Aj]
Two hosts in the same A-clusters and in at least one common C-cluster are clustered together
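The grouping rule stated above can be sketched as follows. This is a deliberate simplification: it applies only the rule "same A-clusters and at least one common C-cluster" and links hosts transitively with a union-find structure. It does not reproduce the botnet score s(h), the similarity score, or the hierarchical clustering, whose formulas are not in this transcript. The function name and cluster labels are hypothetical.

```python
from itertools import combinations


def correlate(a_clusters, c_clusters):
    """Group hosts that share all A-clusters and at least one C-cluster.

    a_clusters and c_clusters map a cluster id to a set of hosts.
    Only hosts that appear in some A-cluster are considered.
    """
    hosts = set().union(*a_clusters.values())
    a_of = {h: frozenset(k for k, m in a_clusters.items() if h in m) for h in hosts}
    c_of = {h: frozenset(k for k, m in c_clusters.items() if h in m) for h in hosts}

    parent = {h: h for h in hosts}

    def find(h):
        # Follow parent links to the group representative (with path halving).
        while parent[h] != h:
            parent[h] = parent[parent[h]]
            h = parent[h]
        return h

    for h1, h2 in combinations(sorted(hosts), 2):
        if a_of[h1] == a_of[h2] and c_of[h1] & c_of[h2]:
            parent[find(h1)] = find(h2)

    groups = {}
    for h in hosts:
        groups.setdefault(find(h), set()).add(h)
    return sorted(sorted(g) for g in groups.values())


a_clusters = {"scan": {"h1", "h2", "h3"}, "spam": {"h4"}}
c_clusters = {"c1": {"h1", "h2"}, "c2": {"h3", "h5"}}
groups = correlate(a_clusters, c_clusters)
```

In this example h1 and h2 end up together because they share the scan A-cluster and C-cluster c1, while h3 scans as well but communicates in a different C-cluster and stays separate.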
Evaluation Traces
Evaluation Results: False Positives
Evaluation Results: Detection Rate
Summary and Future Work
• BotMiner
  – New botnet detection system based on horizontal correlation
  – Independent of botnet C&C protocol and structure
  – Real-world evaluation shows promising results
• Future work
  – More efficient clustering, more robust features
  – New faster detection system using active techniques
    • BotMiner: offline correlation, and requires a relatively long time for detection
    • BotProbe: fast detection by observing at most one round of C&C
  – New real-time solution for very high speed and very large networks
Correlation-based Botnet Detection Framework
[Figure: an enterprise-like network connected to the Internet, with a time axis. Labels: Horizontal Correlation, Vertical Correlation, Cause-Effect Correlation; BotHunter (Security’07), BotSniffer (NDSS’08), BotMiner (Security’08), BotProbe]
Limitation and Discussion
• Evading C-plane monitoring and clustering
  – Misuse whitelist
  – Manipulate communication patterns
• Evading A-plane monitoring and clustering
  – Very stealthy activity
  – Individualize bots’ communication/activity
• Evading cross-plane analysis
  – Extremely delayed task
Appendix
High-Speed Packet Sampling
• Traffic arrives at high rates
  – High volume
  – Some analysis scales with the size of the input
• Possible approaches
  – Random packet sampling
  – Targeted packet sampling
Approach
• Idea: Bias sampling of traffic towards subpopulations based on conditions of traffic
• Two modules
  – Counting: Count statistics of each traffic flow
  – Sampling: Sample packets based on (1) overall target sampling rate (2) input conditions
[Figure: a traffic stream passing through the Counting and Sampling modules. Labels: input conditions, overall sampling rate, instantaneous sampling probability, traffic subpopulations]
Challenges
• How to specify subpopulations?
  – Solution: multi-dimensional array specification
• How to maintain counts for each subpopulation?
  – Solution: rotating array of counting Bloom filters
• How to derive instantaneous sampling probabilities from overall constraints?
  – Solution: multi-dimensional counter array, and scaling based on target rates
Specifying Subpopulations
• Idea: Use concatenation of header fields (“tuples”) as a “key” for a subpopulation
  – These keys specify a group of packets that will be counted together
# base sampling rate
sampling_rate = 0.01
# number of tuples
tuples = 2
# number of conditions
conditions = 1
# tuple definitions
tuple_1 := srcip.dstip
tuple_2 := srcip.srcport.dstport
# condition : sampling budget
tuple_1 in (30, inf] AND
tuple_2 in (0, 5]: 0.5
tuple_1: Count groups of packets with the same source and destination IP address
tuple_2: Count groups of packets with the same source IP, source port, and destination port
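A minimal sketch of building such keys from packet header fields is shown below. The packet dictionary, the `tuple_key` helper, and the "|" separator are assumptions for illustration; only the field lists for tuple_1 and tuple_2 come from the configuration on the slide.

```python
def tuple_key(packet, fields):
    """Concatenate the named header fields into one key string."""
    return "|".join(str(packet[f]) for f in fields)


# Field lists taken from the tuple definitions on the slide.
TUPLES = {
    "tuple_1": ("srcip", "dstip"),
    "tuple_2": ("srcip", "srcport", "dstport"),
}

packet = {"srcip": "10.0.0.5", "dstip": "192.0.2.1", "srcport": 40000, "dstport": 22}
keys = {name: tuple_key(packet, fields) for name, fields in TUPLES.items()}
```

All packets that produce the same key for a tuple are counted together, so one packet contributes to one counter per tuple definition.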
Sampling Rates for Subpopulations
• Operator specifies
  – Overall sampling rate (sampling_rate = 0.01): sample one in 100 packets on average
  – Conditional rate within each class (sampling budget 0.5): within the 1/100 “budget”, half of sampled packets should come from groups satisfying the condition
• FlexSample computes instantaneous sampling probabilities based on this
Examining the Condition
• The condition tuple_1 in (30, inf] AND tuple_2 in (0, 5] biases sampling towards packets from (source IP, destination IP) pairs which
  – Have sent more than 30 packets
  – Have sent only a few packets (at most 5) per (source IP, source port, destination port) group, as when probing many distinct ports
• Application: Portscan
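Checking whether a packet's counts satisfy the portscan condition can be sketched as follows. The interval notation (30, inf] and (0, 5] is read as "greater than the lower bound and at most the upper bound"; the function names and the `counts` dictionary are hypothetical.

```python
import math

# Condition from the slide: tuple_1 in (30, inf] AND tuple_2 in (0, 5]
CONDITION = {"tuple_1": (30, math.inf), "tuple_2": (0, 5)}


def in_range(count, lo, hi):
    """Half-open interval test: lo < count <= hi."""
    return lo < count <= hi


def satisfies(counts, condition=CONDITION):
    """True if every tuple's count lies inside its interval."""
    return all(in_range(counts[name], lo, hi) for name, (lo, hi) in condition.items())


scan_like = satisfies({"tuple_1": 31, "tuple_2": 1})
too_few = satisfies({"tuple_1": 30, "tuple_2": 1})
too_many_per_port = satisfies({"tuple_1": 100, "tuple_2": 6})
```

A count of exactly 30 for tuple_1 does not satisfy the condition under this reading, because the lower bound is exclusive.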
Sampling Lookup Table
• Problem: Conditions may not be completely specified
• Solution: Sampling budget lookup table
  – Lookup table for allocating sampling “budget” to each class
[Table: sampling budget lookup table for the condition tuple_1 in (30, inf] AND tuple_2 in (0, 5]: 0.5, with deduced values for the remaining classes]
Next problem: Determining which condition each packet satisfies
Counting Subpopulations
• Each packet belongs to a particular range in n-dimensional space
• Counts for each condition
  – Maintain counter (counting Bloom filter) for each tuple in every subcondition
  – Rotate counters to expunge “stale” values
Details:
1. Number of counters
2. How often to rotate
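A rotating array of counting Bloom filters might look like the sketch below. The class names, the SHA-256-based hashing, the default sizes, and the choice to drop the oldest filter on each rotation are all assumptions for illustration; the slides leave the number of counters and the rotation interval open.

```python
import hashlib


class CountingBloomFilter:
    """Approximate per-key counter; estimates never fall below the true count."""

    def __init__(self, n_counters=1024, n_hashes=3):
        self.counters = [0] * n_counters
        self.n_hashes = n_hashes

    def _slots(self, key):
        # Derive n_hashes counter positions from salted SHA-256 digests.
        for i in range(self.n_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % len(self.counters)

    def add(self, key):
        # Use a set so a slot hit by two hash functions is incremented once.
        for slot in set(self._slots(key)):
            self.counters[slot] += 1

    def estimate(self, key):
        return min(self.counters[slot] for slot in self._slots(key))


class RotatingCounter:
    """Array of filters; rotate() discards the oldest to expunge stale counts."""

    def __init__(self, n_filters=4, **filter_args):
        self.filter_args = filter_args
        self.filters = [CountingBloomFilter(**filter_args) for _ in range(n_filters)]

    def add(self, key):
        self.filters[-1].add(key)  # always count into the newest filter

    def estimate(self, key):
        return sum(f.estimate(key) for f in self.filters)

    def rotate(self):
        self.filters.pop(0)
        self.filters.append(CountingBloomFilter(**self.filter_args))


counter = RotatingCounter(n_filters=2)
for _ in range(3):
    counter.add("10.0.0.5|192.0.2.1")
counter.rotate()
for _ in range(2):
    counter.add("10.0.0.5|192.0.2.1")
before = counter.estimate("10.0.0.5|192.0.2.1")
counter.rotate()  # the filter holding the first three packets is dropped
after = counter.estimate("10.0.0.5|192.0.2.1")
```

More filters give finer-grained aging at the cost of memory, which is one way to view the two open details listed on the slide.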
Deriving Instantaneous Sampling Rates
• Problem: Traffic rates are dynamic
  – Relative fractions of packets in each class may change
• Solution: Count packets in each sampling class, and adjust probabilities to rebalance according to the lookup table
  – Instantaneous rate = overall rate * (target rate) / (actual rate)
  – Keep track of actual rate using Bloom filter array and EWMA
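The rebalancing formula can be worked through in a short sketch. The formula itself is from the slide; the EWMA weight `alpha`, the clamp to 1.0, and the handling of a class with no observed traffic are assumptions added so that the result is always a valid probability.

```python
def ewma(previous, observation, alpha=0.2):
    """Exponentially weighted moving average; alpha is an assumed weight."""
    return alpha * observation + (1 - alpha) * previous


def instantaneous_rate(overall_rate, target_rate, actual_rate):
    """Instantaneous rate = overall rate * target rate / actual rate.

    actual_rate is the fraction of traffic currently in the class.
    The result is clamped to 1.0, and a class with no observed traffic
    is sampled with probability 1.0 (both are assumptions).
    """
    if actual_rate <= 0:
        return 1.0
    return min(1.0, overall_rate * target_rate / actual_rate)


# Overall rate 0.01, class budget 0.5, class currently 5% of traffic.
p_class = instantaneous_rate(0.01, 0.5, 0.05)
# A very rare class would need a probability above 1, so it is clamped.
p_rare = instantaneous_rate(0.01, 0.5, 0.001)
# Update the tracked class fraction after observing 10% in the latest window.
tracked = ewma(0.05, 0.10)
```

With these numbers the class is sampled at probability 0.1; since it makes up 5% of traffic, it contributes 0.005 of all packets to the sample, which is half of the 0.01 overall budget.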
Example Evaluation: Portscan
Setup
• Parameters as above
• Nmap scan injected into full one-hour trace from department network
Results
• FlexSample can capture 10x more of the portscan packets if all sampling budget is allocated to portscan class
• Bias can be configured
Other Applications
• Recovering unique “conversations” in sampled traffic
• Identifying DDoS Attacks
• Identifying heavy hitters, high-degree nodes, etc.
Open Challenges
• Specifying ranges and classes for specific applications
• Scaling the counter array as the number of tuples and ranges increases
• Simultaneously satisfying multiple objectives
Next Steps: BotMiner Integration
• Determine
  – The traffic rates that BotMiner can support for online analysis
  – The subpopulations that will yield the highest detection rates
• Evaluation on traffic traces that contain botnets of interest