Integrating BotMiner & SNARE into SMITE
1
Integrating BotMiner & SNARE into SMITE
Nick Feamster and Wenke Lee, Georgia Tech
Students: Shuang Hao, Junjie Zhang
2
Status Report
• Summary of BotMiner and SNARE
• Integration on GaTech campus network
• Preliminary evaluation results
• Next steps
3
SMITE Integration
4
BotMiner: Structure and Protocol Independent
• Botnets can change their C&C content (encryption, etc.), protocols (IRC, HTTP, etc.), structures (P2P, etc.), C&C servers, infection models …
[Figure: two botnet structures, panels (a) and (b); nodes are labeled "bot" and "C&C"]
5
Definition of a Botnet
• “A coordinated group of malware instances that are controlled by a botmaster via some C&C channel”
  – Hosts that have similar C&C-like traffic and similar malicious activities
• We need to monitor two planes
  – C-plane (C&C communication plane): “who is talking to whom”
  – A-plane (malicious activity plane): “who is doing what”
6
BotMiner Architecture
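[Figure: BotMiner architecture. Network traffic feeds two sensors: the A-plane monitor (scan, spam, binary downloading, exploit, ...), which writes an activity log, and the C-plane monitor, which writes a flow log. A-plane clustering and C-plane clustering feed cross-plane correlation, which produces reports.]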
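The cross-plane step in the architecture can be illustrated with a minimal Python sketch. It assumes the C-plane and A-plane clustering stages have each already produced clusters of host IPs, and it flags hosts that appear together in a cluster on both planes. The function name, the input shape, and the plain intersection rule are illustrative assumptions; the slides do not describe BotMiner's actual correlation scoring.

```python
from itertools import product


def cross_plane_correlate(c_clusters, a_clusters, min_shared=2):
    """Return hosts that co-occur in a C-plane cluster and an A-plane cluster.

    c_clusters, a_clusters: lists of sets of host IPs (hypothetical inputs).
    min_shared: minimum number of hosts two clusters must share to count.
    """
    suspects = set()
    for c_cluster, a_cluster in product(c_clusters, a_clusters):
        shared = c_cluster & a_cluster
        if len(shared) >= min_shared:
            suspects |= shared
    return suspects


# Hosts with similar C&C-like traffic ("who is talking to whom").
c_clusters = [{"10.0.0.1", "10.0.0.2", "10.0.0.3"}, {"10.0.0.9"}]
# Hosts with similar malicious activity ("who is doing what").
a_clusters = [{"10.0.0.2", "10.0.0.3", "10.0.0.7"}]

suspects = cross_plane_correlate(c_clusters, a_clusters)
```

Requiring at least two shared hosts reflects the definition of a botnet as a coordinated group: a single host that shows up on both planes is not, by itself, evidence of coordination.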
7
SNARE: Network-Level Spam Filter
• Single-Packet
  – AS of sender’s IP
  – Distance to k nearest senders
  – Status of email service ports
  – Geodesic distance
  – Time of day
• Single-Message
  – Number of recipients
  – Length of message
• Aggregate (Multiple Message/Recipient)
8
Test Environment
• Port mirrored from College of Computing network switch
  – About 300 Mbps
9
Current Status
• Real-time test on college network
• Summary of results
  – Pipeline runs in real time (200 to 300 Mbps)
  – BotMiner & SNARE run in batch mode, detecting bots/spammers based on one day of data
  – Results from 4 days of testing: September 21-24, 2009
10
Metrics
• Volume
  – N1: raw flows seen by the pipeline
  – N2: raw flows recorded in the database
  – N3-B: C-flows (BotMiner)
  – N4-S: SMTP flows (SNARE)
• Time
  – T1: Dumping raw flows
  – T2-B: Aggregating raw flows to C-flows
  – T3-B: Clustering and correlation
  – T4-S: Feature extraction (single-packet based)
  – T5-S: Building classifier (based on sampled flows)
  – T6-S: Detection
11
Detection Metrics
• BotMiner
  – TP: Detection rate (6 botnets, including HTTP-, IRC-, and P2P-based botnets)
  – FP: False positive rate
• SNARE
  – TP: Detection rate (ground truth from DNSBL)
  – FP: False positive rate
12
Reducing Flow Volume
• N2 (# of flows recorded) < N1 (# of raw flows)
• Policies for reducing volume
  – Keep only the flows whose SrcIP is in internal networks and whose DstIP is in external networks
    • For TCP flows, to eliminate scanning flows, we record in the database only flows that have at least 2 packets in the outgoing or incoming direction.
  – BotMiner detects scanning/spamming behaviors on raw flows (rather than on flows recorded in the database)
  – SNARE works on SMTP flows
• Discard the flows whose IPs appear on the whitelist (e.g., major internal HTTP/DNS servers)
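The recording policies above can be sketched as a single filter function. This is a minimal illustration, not the pipeline's code: the `Flow` fields, the example network range, and the whitelist entry are assumptions, and "at least 2 packets in the outgoing or incoming direction" is read here as "at least 2 packets in one of the two directions".

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Flow:
    # Hypothetical flow record; field names are not from the slides.
    src_ip: str
    dst_ip: str
    proto: str      # "tcp", "udp", ...
    pkts_out: int   # packets from the internal host to the external host
    pkts_in: int    # packets in the reverse direction


def is_internal(ip, internal_nets):
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in internal_nets)


def keep_flow(flow, internal_nets, whitelist):
    """Return True if the flow should be recorded in the database."""
    # Keep only internal -> external flows.
    if not is_internal(flow.src_ip, internal_nets):
        return False
    if is_internal(flow.dst_ip, internal_nets):
        return False
    # Discard flows that involve a whitelisted address.
    if flow.src_ip in whitelist or flow.dst_ip in whitelist:
        return False
    # For TCP, drop likely scan flows: fewer than 2 packets in both directions.
    if flow.proto == "tcp" and max(flow.pkts_out, flow.pkts_in) < 2:
        return False
    return True


internal_nets = [ipaddress.ip_network("10.0.0.0/8")]  # assumed campus range
whitelist = {"10.0.0.53"}                             # assumed internal DNS server

flows = [
    Flow("10.1.2.3", "198.51.100.7", "tcp", 5, 4),    # kept
    Flow("10.1.2.3", "198.51.100.8", "tcp", 1, 0),    # single SYN: dropped
    Flow("10.0.0.53", "198.51.100.9", "udp", 1, 1),   # whitelisted: dropped
    Flow("198.51.100.7", "10.1.2.3", "tcp", 3, 3),    # external source: dropped
    Flow("10.1.2.3", "10.1.2.4", "tcp", 3, 3),        # internal only: dropped
]
kept = [f for f in flows if keep_flow(f, internal_nets, whitelist)]
```

Because BotMiner's scan and spam detection runs on the raw flows, dropping single-packet TCP flows at recording time does not hide scanning activity from the A-plane monitor.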
13
Pipeline Configuration
• Device Info
  – Box
    • Intel(R) Xeon(TM) CPU 3.00GHz
    • 2G Memory
    • Debian Linux 2.6.16
  – NIC information
    Link encap:Ethernet HWaddr 00:15:c5:e6:72:96
    inet6 addr: 2610:148:1f02:8f00:215:c5ff:fee6:7296/64 Scope:Global
    inet6 addr: fe80::215:c5ff:fee6:7296/64 Scope:Link
    UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
• Pipeline Configuration
    -pcaplive device=eth1
    -addressanalysis
    -flow_analyzer dump_period=600 (10 minutes)
14
Volume: Number of Flows
Date        N1 (# of raw flows)  N2 (# flows in DB)  N3-B (# of C-flows)  N4-S (# SMTP flows)
2009-09-21  486,636,396          936,397             21,204               307,931
2009-09-22  450,589,989          936,962             29,912               287,695
2009-09-23  380,575,773          869,811             15,796               404,746
2009-09-24  454,792,651          967,945             13,070               404,426
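To make the effect of the recording policies concrete, the short Python snippet below computes the fraction of raw flows (N1) that end up in the database (N2) for each day. The numbers are copied from the table; the variable names are illustrative.

```python
# (N1 raw flows, N2 flows recorded in the database), from the table above.
volumes = {
    "2009-09-21": (486_636_396, 936_397),
    "2009-09-22": (450_589_989, 936_962),
    "2009-09-23": (380_575_773, 869_811),
    "2009-09-24": (454_792_651, 967_945),
}

# Fraction of raw flows that survive the recording policies.
retained = {day: n2 / n1 for day, (n1, n2) in volumes.items()}

for day, fraction in retained.items():
    print(f"{day}: {fraction:.3%} of raw flows recorded")
```

On all four days, roughly 0.2% of the raw flows are recorded.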
15
BotMiner Evaluation: Time
• All times in minutes
Date        T1 (dumping raw flows)  T2-B (flow aggregation)  T3-B (clustering and correlation)
2009-09-21  900                     56                       8
2009-09-22  780                     41                       10
2009-09-23  540                     43                       6
2009-09-24  606                     48                       5
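A quick way to read this table is to add up the stages and compare the total with the 1,440 minutes in a day. The snippet below does that, assuming the three stages run back to back, which the slides do not state; it also reports how much of the total is spent dumping raw flows.

```python
MINUTES_PER_DAY = 24 * 60

# (T1, T2-B, T3-B) in minutes, from the table above.
times = {
    "2009-09-21": (900, 56, 8),
    "2009-09-22": (780, 41, 10),
    "2009-09-23": (540, 43, 6),
    "2009-09-24": (606, 48, 5),
}

totals = {day: sum(stages) for day, stages in times.items()}
dump_share = {day: stages[0] / totals[day] for day, stages in times.items()}

for day in times:
    fits = totals[day] < MINUTES_PER_DAY
    print(f"{day}: total {totals[day]} min, "
          f"T1 share {dump_share[day]:.0%}, fits in one day: {fits}")
```

Under that assumption, each day's batch finishes within a day, and flow dumping (T1) accounts for more than 90% of the total on every day.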
16
BotMiner Evaluation: Detection
• False positives are counted against the number of internal hosts that appear in the recorded flows.
Date        B-HTTP-I  B-HTTP-II  B-IRC  B-spybot  B-sdbot  Waledac (P2P)  False positives
2009-09-21  4/4       2/4        4/4    3/4       3/4      3/3            11/889
2009-09-22  4/4       2/4        4/4    3/4       3/4      3/3            10/850
2009-09-23  3/4       2/4        4/4    2/4       2/4      3/3            9/799
2009-09-24  4/4       4/4        4/4    4/4       4/4      3/3            11/801
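The table can be summarized per day with a few lines of Python. The sketch assumes each "x/y" cell in a botnet column means x bots detected out of y bots in that botnet, and that the last column is false positives over internal hosts; the slides state the second reading but not the first.

```python
# Per day: six botnet cells, then the false positive cell, from the table above.
results = {
    "2009-09-21": (["4/4", "2/4", "4/4", "3/4", "3/4", "3/3"], "11/889"),
    "2009-09-22": (["4/4", "2/4", "4/4", "3/4", "3/4", "3/3"], "10/850"),
    "2009-09-23": (["3/4", "2/4", "4/4", "2/4", "2/4", "3/3"], "9/799"),
    "2009-09-24": (["4/4", "4/4", "4/4", "4/4", "4/4", "3/3"], "11/801"),
}


def parse(cell):
    """Split an 'x/y' cell into integers (x, y)."""
    numerator, denominator = cell.split("/")
    return int(numerator), int(denominator)


def summarize(bot_cells, fp_cell):
    """Return (overall detection rate, false positive rate) for one day."""
    detected = sum(parse(cell)[0] for cell in bot_cells)
    total = sum(parse(cell)[1] for cell in bot_cells)
    false_positives, hosts = parse(fp_cell)
    return detected / total, false_positives / hosts


summary = {day: summarize(*row) for day, row in results.items()}

for day, (detection_rate, fp_rate) in summary.items():
    print(f"{day}: detection {detection_rate:.1%}, false positives {fp_rate:.2%}")
```

Read this way, the false positive rate stays below 1.4% of internal hosts on all four days.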
17
SNARE Evaluation
• Single packet/header features (for initial testing):
  – AS number
  – Geodesic distance between the sender and the recipient
  – Message size (bytes sent)
  – Local hour when the email was sent
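As an illustration of how the four features listed above might be assembled for one SMTP flow, here is a minimal Python sketch. It is not SNARE's code: the function names and inputs are hypothetical, the coordinates are assumed to come from some IP geolocation lookup that is not shown, the distance is a great-circle (haversine) approximation on a spherical Earth, and "local hour" is assumed to mean the hour in the sender's time zone.

```python
import math
from datetime import datetime, timedelta, timezone

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def feature_vector(asn, sender_pos, recipient_pos, bytes_sent,
                   sent_utc, sender_utc_offset_hours):
    """Build [AS number, distance in km, message size, local hour]."""
    local_time = sent_utc + timedelta(hours=sender_utc_offset_hours)
    return [
        asn,
        great_circle_km(*sender_pos, *recipient_pos),
        bytes_sent,
        local_time.hour,
    ]


features = feature_vector(
    asn=64500,                      # AS number from the documentation range
    sender_pos=(0.0, 0.0),          # assumed geolocation of the sender
    recipient_pos=(0.0, 90.0),      # assumed geolocation of the recipient
    bytes_sent=2048,
    sent_utc=datetime(2009, 9, 21, 23, 30, tzinfo=timezone.utc),
    sender_utc_offset_hours=2,
)
```

All four values are available from packet headers plus lookup tables, which is what lets this feature set be extracted without reading message contents.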
18
Evaluation of SNARE
• SNARE trains on sampled SMTP flows (in T5-S)
• All times in seconds
Date        T4-S (feature extraction)  T5-S (model, 10000 samples)  T5-S (model, 30000 samples)  T5-S (model, 50000 samples)  T6-S (detection)
2009-09-21  34.50                      73.80                        247.61                       3857.27                      35.59
2009-09-22  32.67                      74.79                        198.53                       3967.38                      32.53
2009-09-23  44.87                      70.12                        184.16                       3689.47                      45.46
2009-09-24  45.47                      68.03                        184.12                       3731.98                      46.50
1) The detection time (T6-S) is relatively small (note: it covers all SMTP flows).
2) The time for training on 50,000 samples (in T5-S) is high, probably because training reaches the physical memory limit.
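The jump in training time is easier to see as a ratio against the 10,000-sample run. The snippet below computes those ratios from the T5-S columns of the table; variable names are illustrative.

```python
# T5-S training times in seconds for 10000, 30000, and 50000 samples.
training_seconds = {
    "2009-09-21": (73.80, 247.61, 3857.27),
    "2009-09-22": (74.79, 198.53, 3967.38),
    "2009-09-23": (70.12, 184.16, 3689.47),
    "2009-09-24": (68.03, 184.12, 3731.98),
}

# Slowdown relative to the 10000-sample run.
slowdown_30k = {day: t30 / t10 for day, (t10, t30, _) in training_seconds.items()}
slowdown_50k = {day: t50 / t10 for day, (t10, _, t50) in training_seconds.items()}

for day in training_seconds:
    print(f"{day}: 3x samples -> {slowdown_30k[day]:.1f}x time, "
          f"5x samples -> {slowdown_50k[day]:.1f}x time")
```

Tripling the sample count costs about 3x the time, while a 5x increase costs more than 50x. That abrupt change is consistent with the memory explanation in the note, although the slides do not confirm it.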
19
Next Steps
• Optimize the flow dumping process to improve efficiency.
• In the case of SNARE, evaluate with more features.