DETECTING PEER-TO-PEER BOTNETS BY TRACKING CONVERSATIONS
Pratik Narang (1), Subhajit Ray (1), Chittaranjan Hota (1) and Venkat Venkatakrishnan (2)
(1) BITS Pilani, Hyderabad campus, India; (2) University of Illinois at Chicago
Introduction
• What’s a bot?
• What’s a botnet?
• What’s a Peer-to-Peer (P2P) botnet?
Traditional Botnets
[Figure: botnet diagram, labeled “Bot-Master”]
Peer-to-Peer Botnets
[Figure: botnet diagram, labeled “Bot-Master”]
P2P: Uses and Misuses
Previous work
• Initial work used signature-based approaches
• Evaded by bots using encryption
• Recent work – analysis of network behavior
• Most of it uses a 5-tuple ‘flow-based’ approach: <Source IP, Dest. IP, Source port, Dest. port, Protocol>
• Great success in Internet traffic classification
• Doesn’t suit the needs of P2P traffic
Identifying P2P traffic
• Modern P2P apps and bots randomize ports, operate on
TCP as well as UDP
• P2P traffic has bi-directional nature
• E.g., BitTorrent: seeders and leechers
• Thus, traditional flow-based approaches may give a false
view of network communication
• Notion of a conversation more suited to P2P
• Who is talking to whom?
• Irrespective of protocol, port, etc.
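To illustrate the difference between the two views, here is a minimal Python sketch. The packet records, addresses and ports are made up for illustration, and the helper names `flow_key` and `conversation_key` are hypothetical, not taken from PeerShark.

```python
from collections import defaultdict

# Hypothetical packet records: (src_ip, dst_ip, src_port, dst_port, protocol)
packets = [
    ("10.0.0.1", "10.0.0.2", 51000, 6881, "TCP"),
    ("10.0.0.2", "10.0.0.1", 6881, 51000, "TCP"),   # reverse direction
    ("10.0.0.1", "10.0.0.2", 40123, 6882, "UDP"),   # new port, other protocol
    ("10.0.0.2", "10.0.0.1", 52344, 45000, "UDP"),  # randomized ports again
]

def flow_key(pkt):
    """Traditional 5-tuple: direction, ports and protocol all matter."""
    return pkt

def conversation_key(pkt):
    """Unordered IP pair: ignores direction, ports and protocol."""
    src_ip, dst_ip = pkt[0], pkt[1]
    return tuple(sorted((src_ip, dst_ip)))

flows = defaultdict(list)
conversations = defaultdict(list)
for pkt in packets:
    flows[flow_key(pkt)].append(pkt)
    conversations[conversation_key(pkt)].append(pkt)

print(len(flows), "flows,", len(conversations), "conversation(s)")
```

In this toy trace the same two peers produce four separate 5-tuple flows but a single conversation, which is the fragmentation the conversation view is meant to avoid.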
P2P apps vs. P2P bots
Applications:
• A human user: ‘bursty’
traffic
• High volume of data
transfers seen
• Small inter-arrival time of
packets seen in apps
Botnets:
• Automated/scripted
commands
• Low in volume,
high in duration
• Large inter-arrival time of
packets seen in stealthy
bots
PeerShark: Overview
[Figure: PeerShark pipeline. Packet Filtering Module, then Conversation Creation Module (initial FLOWGAP), then Conversation Aggregation Module (FLOWGAP), then Classification Module. Legend: packets useful for our system; packets discarded by our system (corrupted or missing headers); conversations classified as benign; conversations classified as malicious.]
Approach
• Parse network traces, discard corrupted packets
• Create ‘conversations’, identified by the tuple <IP1,IP2> and
an initial FLOWGAP parameter
• Aggregate conversations again – this time with a higher
FLOWGAP parameter
• To be decided by Network Admin based on understanding of the
network
• Useful for detecting slow and stealthy bots
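The two-stage grouping described above can be sketched as follows. This is a minimal illustration, not the PeerShark implementation: the record layout `(timestamp, ip_a, ip_b, size)`, the function names, and the FLOWGAP values (60 s and 3600 s) are all assumptions made for the example. It also assumes the second FLOWGAP is at least as large as the initial one, so that aggregation is equivalent to regrouping the same packets with the larger gap.

```python
from collections import defaultdict

def build_conversations(packets, flowgap):
    """Group (timestamp, ip_a, ip_b, size) records into conversations.

    Packets between the same unordered IP pair stay in one conversation
    as long as consecutive packets are at most `flowgap` seconds apart.
    """
    by_pair = defaultdict(list)
    for ts, ip_a, ip_b, size in packets:
        by_pair[tuple(sorted((ip_a, ip_b)))].append((ts, size))

    conversations = []
    for pair, pkts in by_pair.items():
        pkts.sort()  # order by timestamp
        current = [pkts[0]]
        for pkt in pkts[1:]:
            if pkt[0] - current[-1][0] > flowgap:
                conversations.append((pair, current))  # gap too long: close it
                current = [pkt]
            else:
                current.append(pkt)
        conversations.append((pair, current))
    return conversations

def aggregate(conversations, flowgap):
    """Re-merge conversations of the same IP pair using a larger gap."""
    packets = [(ts, pair[0], pair[1], size)
               for pair, pkts in conversations for ts, size in pkts]
    return build_conversations(packets, flowgap)

# Hypothetical trace: two short bursts between the same peers, 390 s apart.
trace = [
    (0.0, "10.0.0.1", "10.0.0.2", 120),
    (10.0, "10.0.0.2", "10.0.0.1", 80),
    (400.0, "10.0.0.1", "10.0.0.2", 120),
    (410.0, "10.0.0.2", "10.0.0.1", 80),
]
initial = build_conversations(trace, flowgap=60)   # assumed initial FLOWGAP
merged = aggregate(initial, flowgap=3600)          # assumed higher FLOWGAP
print(len(initial), "->", len(merged))
```

With the small initial gap the trace splits into two conversations; the larger gap merges them into one long-lived conversation, which is how a slow, stealthy bot would become visible.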
Approach
• For each tuple, extract 4 features:
– The duration of the conversation
– The number of packets exchanged in the conversation
– The volume of the conversation (no. of bytes)
– The median value of the inter-arrival time of packets in the conversation
• Hunt for long-lived, stealthy conversations
• Categorize P2P apps & bots with the features
above, using supervised machine learning
approaches
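As a rough sketch of the four features listed above, assuming a conversation is available as a time-sorted list of `(timestamp_seconds, size_bytes)` pairs (a hypothetical layout, as are the function and key names):

```python
from statistics import median

def extract_features(pkts):
    """pkts: list of (timestamp_seconds, size_bytes), sorted by timestamp."""
    timestamps = [ts for ts, _ in pkts]
    # Inter-arrival times between consecutive packets
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return {
        "duration": timestamps[-1] - timestamps[0],
        "num_packets": len(pkts),
        "volume_bytes": sum(size for _, size in pkts),
        # A single-packet conversation has no gaps; 0.0 is an arbitrary choice
        "median_iat": median(gaps) if gaps else 0.0,
    }

# Made-up conversation with four packets
example = [(0.0, 100), (2.0, 150), (10.0, 100), (11.0, 50)]
features = extract_features(example)
print(features)
```

The resulting feature vectors, one per conversation, would then be labeled and passed to a supervised classifier.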
Dataset
P2P app name      Used for?                        Type of data / Size of data
eMule             P2P file sharing application     pcap file / 19 GB
uTorrent          P2P file sharing application     pcap file / 33 GB

P2P botnet name   What it does?                    Type of data / Size of data
Storm             Email spam                       pcap file / 4.8 GB
Waledac           Email spam, password stealing    pcap file / 1.1 GB
Results
           BayesNet                     J48                          Adaboost with REP tree
           TP        FP       ROC       TP        FP       ROC       TP       FP        ROC
eMule      0.929     0.012    0.996     0.964     0.012    0.987     0.93     0.021     0.993
Storm      0.988     0.009    0.999     0.986     0.003    0.996     0.979    0.004     0.999
Waledac    0.989     0.01     0.999     0.988     0.005    0.995     0.97     0.009     0.998
uTorrent   0.947     0.019    0.996     0.965     0.012    0.989     0.943    0.025     0.994
Avg.       0.96325   0.0125   0.9975    0.97575   0.008    0.99175   0.9555   0.01475   0.996
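The "Avg." row appears to be the unweighted mean of the four per-class values. A short check on the TP columns, with the numbers copied from the table (the variable names are only illustrative):

```python
from statistics import mean

# True-positive rates per class: eMule, Storm, Waledac, uTorrent
tp_rates = {
    "BayesNet": [0.929, 0.988, 0.989, 0.947],
    "J48": [0.964, 0.986, 0.988, 0.965],
    "Adaboost with REP tree": [0.93, 0.979, 0.97, 0.943],
}

# Unweighted mean over the four classes, rounded as in the table
avg_tp = {name: round(mean(values), 5) for name, values in tp_rates.items()}
for name, value in avg_tp.items():
    print(name, value)
```

Because the mean is unweighted, each class counts equally regardless of how many conversations it contributed.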
[Figure: bar chart of overall accuracy (%) for BayesNet, J48 and Adaboost with REP tree; y-axis from 90% to 100%.]
Code publicly available for review & feedback:
https://github.com/pratiknarang/peershark
Back-up
Limitations & Possible evasions of
PeerShark
• Only built for 2 apps and 2 bots. Any new app/bot will also
get (mis)classified into one of these classes.
• If more than one P2P application (benign or malicious) is
running between two peers, PeerShark will not be able to
correctly classify it.
• Smarter bots which engage in occasional file-sharing with
bot-peers (and thus mimic benign behavior) can evade
PeerShark.