
DETECTING PEER-TO-PEER BOTNETS BY TRACKING CONVERSATIONS

Pratik Narang (1), Subhajit Ray (1), Chittaranjan Hota (1) and Venkat Venkatakrishnan (2)

(1) BITS Pilani, Hyderabad campus, India
(2) University of Illinois at Chicago

Introduction

• What’s a bot?

• What’s a botnet?

• What’s a Peer-to-Peer-based botnet?

Traditional Botnets

Bot-Master

Peer-to-Peer Botnets

Bot-Master

P2P: Uses and Misuses


Previous work

• Initial work with signature-based approaches

• Evaded by bots using encryption

• Recent work – analysis of network behavior

• Most of it uses a 5-tuple ‘flow-based’ approach: <Source IP, Dest. IP, Source port, Dest. port, Protocol>

• Great success in Internet traffic classification

• Doesn’t suit the needs of P2P traffic

Identifying P2P traffic

• Modern P2P apps and bots randomize ports and operate on TCP as well as UDP

• P2P traffic is bi-directional in nature

  • E.g., BitTorrent seeders and leechers

• Thus, traditional flow-based approaches may give a false view of network communication

• The notion of a conversation is better suited to P2P

  • Who is talking to whom?

  • Irrespective of protocol, port, etc.
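The contrast between 5-tuple flows and conversations can be illustrated with a minimal sketch. The packet records, addresses and port numbers below are made up for illustration; only the idea of keying on the IP pair alone comes from the slides.

```python
# Hypothetical packet records: (src_ip, dst_ip, src_port, dst_port, protocol).
packets = [
    ("10.0.0.1", "10.0.0.2", 51000, 6881, "TCP"),
    ("10.0.0.2", "10.0.0.1", 6881, 51000, "TCP"),
    ("10.0.0.1", "10.0.0.2", 40123, 33445, "UDP"),
    ("10.0.0.2", "10.0.0.1", 33445, 40123, "UDP"),
]

def flow_key(pkt):
    """Traditional 5-tuple key: direction, ports and protocol all matter."""
    return pkt

def conversation_key(pkt):
    """Conversation key: the unordered pair of IPs, nothing else."""
    src_ip, dst_ip = pkt[0], pkt[1]
    return tuple(sorted((src_ip, dst_ip)))

flows = {flow_key(p) for p in packets}
conversations = {conversation_key(p) for p in packets}

print(len(flows), "flows vs", len(conversations), "conversation")
```

With these four packets the 5-tuple view reports four separate flows, while the conversation view sees a single exchange between the two hosts.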

P2P apps vs. P2P bots

Applications:

• Driven by a human user: ‘bursty’ traffic

• High volume of data transfers

• Small inter-arrival time of packets

Botnets:

• Automated/scripted commands

• Low in volume, high in duration

• Large inter-arrival time of packets in stealthy bots

PeerShark: Overview

Modules, in processing order:

• Packet Filtering Module: keeps packets useful for our system; discards packets with corrupted or missing headers

• Conversation Creation Module: uses the initial FLOWGAP

• Conversation Aggregation Module: uses the (higher) FLOWGAP

• Classification Module: conversations are classified as benign or malicious

Approach

• Parse network traces and discard corrupted packets

• Create ‘conversations’, identified by the tuple <IP1,IP2> and an initial FLOWGAP parameter

• Aggregate conversations again, this time with a higher FLOWGAP parameter

  • To be decided by the network admin based on understanding of the network

  • Useful for detecting slow and stealthy bots
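The two-pass grouping above can be sketched as follows. This is a minimal illustration, not the PeerShark code: it assumes FLOWGAP acts as an inactivity timeout in seconds, and the function names, packet layout `(timestamp, ip_a, ip_b, size)` and the values 10 and 600 are all hypothetical.

```python
from collections import defaultdict

def build_conversations(packets, flowgap):
    """First pass: split each IP pair's packets wherever the silence
    between consecutive packets exceeds flowgap (assumed: seconds)."""
    by_pair = defaultdict(list)
    for ts, ip_a, ip_b, size in sorted(packets, key=lambda p: p[0]):
        by_pair[tuple(sorted((ip_a, ip_b)))].append((ts, size))
    conversations = []
    for pair, pkts in by_pair.items():
        current = [pkts[0]]
        for prev, cur in zip(pkts, pkts[1:]):
            if cur[0] - prev[0] > flowgap:
                conversations.append((pair, current))
                current = []
            current.append(cur)
        conversations.append((pair, current))
    return conversations

def aggregate(conversations, flowgap):
    """Second pass: merge conversations of the same IP pair whose
    separation is at most the (higher) flowgap."""
    merged = []
    last_by_pair = {}
    for pair, pkts in sorted(conversations, key=lambda c: c[1][0][0]):
        last = last_by_pair.get(pair)
        if last is not None and pkts[0][0] - last[-1][0] <= flowgap:
            last.extend(pkts)
        else:
            last_by_pair[pair] = list(pkts)
            merged.append((pair, last_by_pair[pair]))
    return merged

# Hypothetical trace: three bursts between the same two hosts.
packets = [(t, "10.0.0.1", "10.0.0.2", 120) for t in (0, 1, 2, 100, 101, 5000)]
initial = build_conversations(packets, flowgap=10)
merged = aggregate(initial, flowgap=600)
print(len(initial), "conversations initially,", len(merged), "after aggregation")
```

Raising the gap in the second pass stitches the first two bursts into one longer conversation, which is the kind of slow, long-lived exchange the slides associate with stealthy bots.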

Approach

• For each tuple, extract 4 features:

  – The duration of the conversation

  – The number of packets exchanged in the conversation

  – The volume of the conversation (number of bytes)

  – The median inter-arrival time of packets in the conversation

• Hunt for long-lived, stealthy conversations

• Categorize P2P apps and bots with the features above, using supervised machine learning approaches
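The four features listed above could be computed per conversation roughly as follows. This is a sketch under assumptions: a conversation is represented as a chronological list of `(timestamp, size_in_bytes)` pairs, and the function and field names are hypothetical.

```python
from statistics import median

def extract_features(packets):
    """Compute the four per-conversation features from a chronological
    list of (timestamp, size_in_bytes) pairs."""
    timestamps = [ts for ts, _ in packets]
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    return {
        "duration": timestamps[-1] - timestamps[0],
        "num_packets": len(packets),
        "volume_bytes": sum(size for _, size in packets),
        # A single-packet conversation has no gaps; 0.0 is an arbitrary choice.
        "median_iat": median(gaps) if gaps else 0.0,
    }

# Hypothetical conversation with four packets.
features = extract_features([(0.0, 100), (2.0, 200), (3.0, 50), (10.0, 150)])
print(features)
```

Each resulting feature vector, together with a label for the app or bot that produced the traffic, would then serve as one training example for a supervised classifier.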

Dataset

P2P app name    | Used for?                     | Type of data / Size of data
eMule           | P2P file sharing application  | pcap file / 19 GB
uTorrent        | P2P file sharing application  | pcap file / 33 GB

P2P botnet name | What it does?                 | Type of data / Size of data
Storm           | Email spam                    | pcap file / 4.8 GB
Waledac         | Email spam, password stealing | pcap file / 1.1 GB

Results

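(TP = true positive rate, FP = false positive rate, ROC = area under the ROC curve)

         | BayesNet                  | J48                       | Adaboost with REP tree
Class    | TP      | FP     | ROC    | TP      | FP    | ROC     | TP     | FP      | ROC
eMule    | 0.929   | 0.012  | 0.996  | 0.964   | 0.012 | 0.987   | 0.93   | 0.021   | 0.993
Storm    | 0.988   | 0.009  | 0.999  | 0.986   | 0.003 | 0.996   | 0.979  | 0.004   | 0.999
Waledac  | 0.989   | 0.01   | 0.999  | 0.988   | 0.005 | 0.995   | 0.97   | 0.009   | 0.998
uTorrent | 0.947   | 0.019  | 0.996  | 0.965   | 0.012 | 0.989   | 0.943  | 0.025   | 0.994
Avg.     | 0.96325 | 0.0125 | 0.9975 | 0.97575 | 0.008 | 0.99175 | 0.9555 | 0.01475 | 0.996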
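The Avg. row is consistent with an unweighted mean over the four classes. A quick check of the TP column, assuming that is how the averages were computed:

```python
from statistics import mean

# Per-class TP values in table order: eMule, Storm, Waledac, uTorrent.
tp = {
    "BayesNet": [0.929, 0.988, 0.989, 0.947],
    "J48": [0.964, 0.986, 0.988, 0.965],
    "Adaboost with REP tree": [0.93, 0.979, 0.97, 0.943],
}

# Unweighted mean per classifier, rounded to the table's precision.
avg_tp = {name: round(mean(values), 5) for name, values in tp.items()}
print(avg_tp)
```

Because the mean is unweighted, each class counts equally regardless of how many conversations it contributed.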

[Figure: bar chart of overall accuracy (%) for BayesNet, J48, and Adaboost with REP tree; vertical axis from 90% to 100%]

Code publicly available for review & feedback:

https://github.com/pratiknarang/peershark





Back-up

Limitations & Possible Evasions of PeerShark

• Only built for 2 apps and 2 bots. Any new app/bot will also get (mis)classified into one of these classes.

• If more than one P2P application (benign or malicious) is running between two peers, PeerShark will not be able to correctly classify it.

• Smarter bots which engage in occasional file-sharing with bot-peers (and thus mimic benign behavior) can evade PeerShark.