Motivation: How to stop network abuse given a set of policies?


ISIS Lab.


Nabs: A System for Detecting Resource Abuses via Characterization of Flow Content Type
Kulesh Shanmugasundaram, Mehdi Kharrazi, and Nasir Memon

Motivation:
• How to stop network abuse given a set of policies?
  • No encrypted traffic on network, besides SSH and HTTPS
  • No outbound video or audio streams
• Types of abusers:
  • Malicious outsider looking for free resources to host illegal activities
  • Malicious insider running a P2P hub
  • Ill-informed user running an application proxy

Current solutions:
• Block ports with firewall
  • Circumvented by tunneling everything through an open port
• Use IDS
  • No signature available for content type
• Check packet header
  • Packet containing header needs to be captured
  • Not all data types have headers, e.g. text, encrypted data
  • Header could be changed
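The limits of header checking listed above can be illustrated with a minimal sketch. The signatures are the well-known leading bytes of each file format; the table `MAGIC` and the function `guess_type_from_header` are hypothetical names chosen for this illustration, not part of Nabs.

```python
# Hypothetical magic-number table (illustrative only, not from Nabs).
MAGIC = {
    b"\xff\xd8\xff": "JPEG",
    b"PK\x03\x04": "ZIP",
    b"BM": "BMP",
    b"RIFF": "WAV (RIFF container)",
}

def guess_type_from_header(payload: bytes):
    """Return a content type if the payload starts with a known signature."""
    for signature, name in MAGIC.items():
        if payload.startswith(signature):
            return name
    # No header in this packet: plain text, ciphertext, or mid-file data.
    return None

first_packet = b"PK\x03\x04" + bytes(60)   # packet that carries a ZIP header
later_packet = bytes(range(64, 128))        # mid-stream payload, no header

print(guess_type_from_header(first_packet))  # ZIP
print(guess_type_from_header(later_packet))  # None
```

Only the packet that happens to carry the header is recognized, and a sender who rewrites the first bytes defeats the check entirely.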

Classification:
• A multi-class SVM is used
• Two main ideas with support vector machines:
  1- Map the data into a higher-dimensional space (kernels)
  2- Maximize the separating margins
• For each data segment size:
  • 40% of data is used for training
  • The resulting classifier is tested on the remaining, unseen data
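The evaluation protocol above (train on 40%, test on the rest) can be sketched as follows. This is not the classifier used in Nabs: it assumes a linear SVM without a kernel, trained by Pegasos-style subgradient descent, and combines binary SVMs one-vs-rest, which the slides do not specify. The 2-D features, class centers, and hyperparameters are synthetic and hypothetical.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style subgradient descent on the hinge loss; y holds -1 or +1."""
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)              # last entry acts as the bias
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)            # decaying step size
            x = X[i] + [1.0]
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1 - eta * lam) * wj for wj in w]      # regularization shrink
            if margin < 1:                               # inside the margin
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, x)]
    return w

def train_one_vs_rest(X, labels):
    """One binary SVM per class (assumed multi-class scheme)."""
    return {c: train_linear_svm(X, [1 if l == c else -1 for l in labels])
            for c in sorted(set(labels))}

def predict(models, x):
    x1 = x + [1.0]
    return max(models, key=lambda c: sum(wj * xj for wj, xj in zip(models[c], x1)))

# Synthetic, well-separated 2-D "features" for three hypothetical classes.
rng = random.Random(42)
centers = {"text": (4.0, 0.0), "compressed": (-2.0, 3.5), "encrypted": (-2.0, -3.5)}
data = [([cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5)], name)
        for name, (cx, cy) in centers.items() for _ in range(100)]
rng.shuffle(data)

split = int(0.4 * len(data))                 # 40% for training
train, test = data[:split], data[split:]     # remaining 60% stays unseen
models = train_one_vs_rest([x for x, _ in train], [l for _, l in train])
accuracy = sum(predict(models, x) == l for x, l in test) / len(test)
print(f"test accuracy on unseen data: {accuracy:.2f}")
```

Real payload features are unlikely to be this cleanly separable, which is where a kernel would come in.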

System Design

Nabs:
• Use payload statistical properties to classify packets as belonging to one of a set of possible content types.

Identifying statistics:
• Time domain statistics
  • Mean, variance, auto-correlation, entropy
• Frequency domain statistics
  • Power, mean, variance, and skewness of different frequency bands
• Higher order statistics, used to characterize non-linearity
  • Mean and power of bicoherence magnitude
  • Power of bicoherence phase
  • Skewness and kurtosis
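A few of the statistics listed above can be sketched on a raw payload as follows. The slides do not give the exact definitions, so the lag-1 autocorrelation, the split of the spectrum into four equal-width bands, and the function names `time_domain_stats` and `band_powers` are assumptions made for illustration. Only band power is computed here; the higher order (bicoherence) statistics are omitted.

```python
import cmath
import math
from collections import Counter

def time_domain_stats(data: bytes):
    """Mean, variance, lag-1 autocorrelation and byte entropy of a payload."""
    n = len(data)
    mean = sum(data) / n
    var = sum((b - mean) ** 2 for b in data) / n
    if var == 0:
        autocorr = 0.0                      # constant payload, nothing to correlate
    else:
        autocorr = sum((data[i] - mean) * (data[i + 1] - mean)
                       for i in range(n - 1)) / (n * var)
    counts = Counter(data)
    entropy = -sum(c / n * math.log2(c / n) for c in counts.values())
    return {"mean": mean, "variance": var, "autocorr": autocorr, "entropy": entropy}

def band_powers(data: bytes, bands: int = 4):
    """Power per frequency band, using a naive O(n^2) DFT on the centered bytes."""
    n = len(data)
    mean = sum(data) / n
    centered = [b - mean for b in data]
    power = []
    for k in range(1, n // 2 + 1):          # positive frequencies only
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(centered))
        power.append(abs(coeff) ** 2 / n)
    width = len(power) // bands             # assumed equal-width bands
    return [sum(power[b * width:(b + 1) * width]) for b in range(bands)]

sample = bytes(range(256))                  # every byte value exactly once
print(time_domain_stats(sample)["entropy"])  # 8.0
```

A payload in which every byte value is equally likely reaches the maximum of 8 bits per byte, which is what well-encrypted data should approach. A production system would use an FFT rather than the quadratic DFT shown here.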

Feature Selection:
• Some of the 25 features will have little information gain
• Fewer features mean a faster and less complex system
• Used SFFS to identify the more important features
  • Entropy, power in the first freq. band, mean, variance, mean and variance in the fourth freq. band
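SFFS (sequential forward floating selection) alternates a greedy forward step with conditional backward steps. A minimal sketch of the idea follows. The function `sffs`, the `score` callback, and the toy weights are hypothetical; in practice the score would be something like classifier accuracy on held-out data, and the slides do not say which criterion was used.

```python
def sffs(features, score, target_size):
    """Sequential forward floating selection over a list of feature names.

    `score` maps a list of features to a number (higher is better).
    `target_size` must not exceed len(features).
    """
    selected = []
    best = {}                                # subset size -> (score, subset)
    while len(selected) < target_size:
        # Forward step: add the single feature that helps most.
        candidates = [f for f in features if f not in selected]
        add = max(candidates, key=lambda f: score(selected + [f]))
        selected = selected + [add]
        k = len(selected)
        s = score(selected)
        if k not in best or s > best[k][0]:
            best[k] = (s, selected)
        # Floating step: drop features while the reduced subset beats
        # the best subset of that smaller size seen so far.
        while len(selected) > 2:
            drop = max(selected,
                       key=lambda f: score([g for g in selected if g != f]))
            reduced = [g for g in selected if g != drop]
            s = score(reduced)
            if s > best[len(reduced)][0]:
                best[len(reduced)] = (s, reduced)
                selected = reduced
            else:
                break
    return best[target_size][1]

# Toy additive score with made-up weights, for illustration only.
WEIGHTS = {"entropy": 5.0, "mean": 3.0, "variance": 2.0, "noise": 0.1}

def toy_score(subset):
    return sum(WEIGHTS[f] for f in subset)

print(sffs(list(WEIGHTS), toy_score, 2))     # ['entropy', 'mean']
```

With an additive score the floating step never fires and the result equals plain greedy selection; it only matters when features interact, for example when two features are redundant with each other.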

Deployment:
• Monitored Poly network for two weeks
  • 600 flows processed per sec. on average
  • Flow characterization takes about 945 µs
• Detected abuses:
  • Unauthorized source of encrypted traffic
    • 9 hosts found to be sources of encrypted traffic
    • WASTE, a P2P application that encrypts connections, was being used
  • Unauthorized source of multimedia content
    • 16 hosts with heavy multimedia traffic detected
    • Further investigation revealed them to be proxy servers
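The two throughput figures reported for the deployment (600 flows per second on average, about 945 µs per flow characterization) can be related with a short calculation. It assumes flows are characterized one at a time by a single worker, which the slides do not state.

```python
FLOWS_PER_SECOND = 600        # average rate reported for the deployment
SECONDS_PER_FLOW = 945e-6     # about 945 microseconds per characterization

# Share of each second spent characterizing flows (single sequential worker assumed).
busy_fraction = FLOWS_PER_SECOND * SECONDS_PER_FLOW
# Upper bound on the flow rate that one such worker could sustain.
max_flows_per_second = 1 / SECONDS_PER_FLOW

print(f"busy fraction: {busy_fraction:.3f}")          # 0.567
print(f"max flows/s:   {max_flows_per_second:.0f}")   # 1058
```

Under that assumption the reported average load uses a little over half of the available characterization time.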

Dataset:
• Raw: TXT, BMP, WAV
• Compressed: ZIP, JPEG, MP3, MPEG
• Encrypted: AES encrypted files
• 1000 files collected from each category using a P2P network
• 16384 bytes of data sampled from each file, at a random location
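Drawing a fixed-size sample from a random location in a file, as described for the dataset, might look like the sketch below. The function name `sample_segment` and the choice to reject files shorter than the segment are assumptions; the slides do not say how short files were handled.

```python
import os
import random

SEGMENT_SIZE = 16384  # bytes per sample, as stated for the dataset

def sample_segment(path, size=SEGMENT_SIZE, rng=random):
    """Read `size` bytes starting at a uniformly random offset in the file."""
    file_size = os.path.getsize(path)
    if file_size < size:
        # Assumed policy: skip files that cannot supply a full segment.
        raise ValueError(f"{path} is shorter than {size} bytes")
    offset = rng.randrange(file_size - size + 1)
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(size)
```

Sampling at a random offset means most segments contain no file header, so a classifier trained on them has to rely on payload statistics alone.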

Confusion matrix

Accuracy before and after feature selection

Avg. Entropy; Avg. Skewness; Avg. Power in frequency band

Scatter plot of statistics from 4 data categories