Post on 11-Jan-2016
Traffic classification and applications to traffic monitoring
Marco MelliaElectronic and Telecommunication DepartmentPolitecnico di Torino
Email:mellia@tlc.polito.it
1
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Traffic Classification & Measurement Why?
Identify normal and anomalous behavior Characterize the network and its users Quality of service Filtering …
How? By means of passive measurement
2
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 20123
Scenario
Traffic classifier Deep packet inspection Statistical methods
Persistent and scalable monitoring platform Round Robin Database (RRD) Histograms
Internal Clients
EdgeRouter
External Servers
htt
p:/
/tst
at.
tlc.
polit
o.it
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Tstat at a Glance
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Worm and Viruses?
Did someone open a Christmas card? Happy new year to Windows!!
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Anomalies (Good!)Spammer Disappear McColo SpamNet shut off on Tuesday, November 11th, 2008
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
New Applications – P2PTVFiorentina 4 - Udinese 2
Inter 1 - Juventus 0
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Megaupload blocked 19/01/12
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
How to monitor traffic?
All previous examples rely on the availability of a CLASSIFIER A tool that can discriminate classes of traffic
Classification: the problem of assigning a class to an observation The set of classes is pre-defined The output may be correct or not
9
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Some terminology
Question: Is this a cat, a rabbit, or a dog?
10
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Some terminology
Question: Is this a cat, a rabbit, or a dog?
11
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
How to compute performance? Confusion matrix
On rows we have the actual class On columns we have the predicted class
Allows to see if some confusion arises
12
Predicted class
Cat Dog Rabbit
Actualclass
Cat 5 3 0
Dog 2 3 1
Rabbit 0 2 11
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
How to compute performance? Confusion matrix
True positive It was classified as a cat, and it was a cat
13
Predicted class
Cat Dog Rabbit
Actualclass
Cat 5 3 0
Dog 2 3 1
Rabbit 0 2 11
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
How to compute performance? Confusion matrix
False negative It was classified NOT as a cat, but it was a cat
14
Predicted class
Cat Dog Rabbit
Actualclass
Cat 5 3 0
Dog 2 3 1
Rabbit 0 2 11
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
How to compute performance? Confusion matrix
True negative It was classified NOT as a cat, and it was NOT a cat
15
Predicted class
Cat Dog Rabbit
Actualclass
Cat 5 3 0
Dog 2 3 1
Rabbit 0 2 11
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
How to compute performance? Confusion matrix
False positive It was classified as a cat, but it was NOT a cat
16
Predicted class
Cat Dog Rabbit
Actualclass
Cat 5 3 0
Dog 2 3 1
Rabbit 0 2 11
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Other metrics
Accuracy: is the ratio of the sum of all True Positives to the sum of all tests, for all classes.
It is biased toward the most predominant class in a data set. Consider for example a test to identify patients
that suffer from a disease that affects 10 patient over 100 tests. The classifier that always returns ``sane'' will have accuracy of 90%.
17
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Predicted class
Cat Dog Rabbit
Actualclass
Cat 5 3 0
Dog 2 3 1
Rabbit 0 2 11
Other metrics
Recall of a class: is the ratio of the True Positives and the sum of True Positives and False Negatives. Recall(cat)=5/(5+3+0)
It is a measure of the ability of a classifier to select instances of the given class from a data set
18
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Other metrics
Precision of a class: is the ratio of True Positives and the sum of True Positives and False Positive Precision(cat) = 5/(5+2+0)
It is a metric that measure how precise is the classifier in labeling only samples of a given class
19
Predicted class
Cat Dog Rabbit
Actualclass
Cat 5 3 0
Dog 2 3 1
Rabbit 0 2 11
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Traffic classification
InternetService Provider
Look at the packets…
Tell me what protocol and/or application
generated them
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
InternetService Provider
Port:
Port: 4662/4672
Port:
Port:
??
? ?
Payload: “bittorrent”
Payload: E4/E5
Payload:
Payload:
?
RTP protocol
Skype Bittorrent
Gtalk eMule
Typical approach: Deep Packet Inspection (DPI)
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
The problem of traffic classification Deep Packet Inspection
Based on looking for some pre-defined payload patterns, deep in the packet
Simple at L2-L4 “if ethertype == 0x0800, then there is an IP packet” Usually done with a set of if-then-else or even switch-
case Ambiguous at L7
TCP port 80 does not mean automatically “protocol HTTP”
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
DPI: Rule-set complexity Practical rule-sets:
Snort, as of November 2007 8536 rules, 5549 Perl Compatible Regular Expressions
OpenDPI as of February 2012 118 protocols
Tstat as of February 2012 Approx 200 classes/services
23
Deep packet inspection
Regular expression matching at line rate
Finite Automata based techniques
=
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Some notes...
Protocol identification… … or application verification?
Skype can use the standard HTTP protocol to exchange data
Is that traffic “Skype” or “HTTP”? Today everything is going over HTTP
Is it Facebook? Twitter? YouTube video? Or HTTP?
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
The question
Which granularity are you interested into ??
25
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Several approaches to traffic classification
Content-based
Packet-based
(e.g., Spatscheck)
Port-based
(stateless)
Statistical methods
Host social behaviour
(e.g., Faloutsos)
Traffic statistics
(e.g., Salgarelli, Baiocchi, Moore, Mellia)
Auto-learning methods (e.g. Bayes)
Preclassified bins
Message-based
Protocol behaviour
(e.g., BinPac, SML)
Traffic classification
Pre-computed or auto-learning signatures
Payload-based
(stateful)
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Some references
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Some references
Some critical overview/tutorial T.T.T.Nguyen, G.Armitage. A survey of techniques for internet traffic
classification using machine learning, Communications Surveys & Tutorials, IEEE, V.10, N.4, pp.56 - 76
H.Kim, KC Claffy, M.Fomenkov, D.Barman, M.Faloutsos, K.Lee. Internet traffic classification demystified: myths, caveats, and the best practices. In Proceedings of the 2008 ACM CoNEXT Conference (CoNEXT '08). ACM, New York, NY, USA, 2008.
For this class: D. Bonfiglio, M. Mellia, M. Meo, D. Rossi, P. Tofanelli Revealing skype traffic:
when randomness plays with you ACM SIGCOMM Kyoto, JP, ISBN: 978-1-59593-71, 27 August 2007.
A. Finamore, M. Mellia, M. Meo, D. Rossi KISS: Stochastic Packet Inspection 1st TMA Workshop, Aachen, 11 May 2009.
A. Finamore, M. Mellia, M. Meo, D. Rossi, KISS: Stochastic Packet Inspection Classifier for UDP Traffic, IEEE/ACM Transactions on Networking "5", Vol. 18, pp. 1505-1015, ISSN: 1063-6692, October 2010.
G. La Mantia, D. Rossi, A. Finamore, M. Mellia, M. Meo, Stochastic Packet Inspection for TCP Traffic, IEEE ICC, Cape Town, South Africa, 23 May 2010.
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
InternetService Provider
Port:
Port: 4662/4672
Port:
Port:
??
? ?
Payload: “bittorrent”
Payload: E4/E5
Payload:
Payload:
?
RTP protocol
Skype Bittorrent
Gtalk eMule
Typical approach: Deep Packet Inspection (DPI)
It fails more and more:P2P
EncryptionProprietary solution
Many different flavours
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
The Failure of DPI
11.05.2008 12:29 eMule 0.49a released
1.08.2008 20:25 eMule 0.49b released
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Possible Solution: Behavioral Classifier
Phase 1
Feature
Phase 3
Verify
1. Statistical characterization of traffic2. Look for the behaviour of unknown traffic and
assign the class that better fits it3. Check for possible classification mistakes
Phase 2
DecisionTraffic(Known)
(Training) (Operation)
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Behavioural classifiers
Which statistics? Packet size
Average, std, max, min Len of first X pkts
IPG Average, std, max, min IPG of first X+1 pkts
Total size, duration, #data packets From client, from server, from both RTT, #concurrent connection, rtx,
dups, … TCP options, flags, signaling, …
Feature selection?
Which decision process? Ad Hoc Bayesian Neural Networks Decision trees SVM …
Which training set? Supervised techniques
32
The case of Skype
Consider a simple example
33
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Our Goal
Identify Skype traffic Motivations
Operators need to know what is running in their network New business models, provisioning, TE, etc.
Understand user behaviour Traffic characterization, security … It’s fun
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Skype Overview
Skype offers voice, video, chat and data transfer services over IP
Closed design, proprietary solutions P2P technology Proprietary protocols Encrypted communications
Easy to use, difficult to reveal It is the perfect example of DPI failure
No serverNo well-known port…
No standardNo RFC
State-of-the-ArtEncryption/ObfuscationMechanisms
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Our Goal
Identify Skype traffic Voice stream first: both E2E and SkypeOut/In
streams Possible video/chat/file transfers/signaling
Constraints Passive observation of traffic Protocol ignorance
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Three Classifiers
Naïve Bayes Classifier
Payload Based Classifier
Chi Square Classifier
Skype?
TrafficFlow
Skype?
Skype?
AND
Skype?
Phase 1 – try to understand it
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Skype as VoIP Application
Skype selects the voice codec from a list Low bit rate: 10-32 kbps Regular Inter-Packet-Gap (30 ms frames)
Redundancy may be added to mitigate packet loss
Framing may be modified from the original codec one
Multiplexes different source into the same message (voice, video, chat,…)
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Skype Source Model
SkypeMessage
TCP/UDP
IP
Skype Header Formats(What we guess about it)
Can we design a DPI classifier?
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Possible Skype Messages
Signaling and data messages Use TCP, with ciphered payload
Login, lookup, signaling… Data flow
Use UDP whenever possible: payload is encrypted… but some header MUST be exposed
Question: Some header MUST be exposed… Why?!??
Impossibleto exploit.Everything is ciphered
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Possible Skype Messages
Signaling and data messages Use TCP, with ciphered payload
Login, lookup, signaling… Data flow
Use UDP whenever possible: payload is encrypted… but some header MUST be exposed
Source
AES
Receiver
AES
Impossibleto exploit.Everything is ciphered
Unreliable
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Skype Source Model
SkypeMessage
TCP/UDP
IP
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
SoM Format for E2E Messages
Start of Message (SoM) of End2End messages carried by UDP has:
ID: 16 bits long random identifier FUNC: 5 bits long function (multiplexing?),
obfuscated in a Byte
0 8 16 24---------------------| ID | FUNC |---------------------
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012 46
Function Values
0x01 = ??Query message
0x02 = ??Query
0x0d = Data
0x07 = NAK
VoiceVideoChatFile
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
PBC SoM can be used to identify
Skype flows carried by UDP 5bits long signature
Question: Which is the chance that you have a
false positive?
Classic signature based classifier
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
PBC SoM can be used to identify
Skype flows carried by UDP 5bits long signature
Question: Which is the chance that you have a
false positive? We look for 1 string out of 32 possible strings
1/32 of false detection possible Can we improve it by checking multiple packets
Yet we can have a very high false positive rate
Classic signature based classifier
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
PBC
Any other smart way of improving accuracy? Hint: this is UDP
49
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
PBC SoM can be used to identify
Skype flows carried by UDP 5bits long signature
Classic signature based classifier
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
PBC SoM can be used to identify
Skype flows carried by UDP 5bits long signature
IMPROVE: Identify Skype socket address at clients The UDP port is FIXED and not random (as in TCP) Then, look for Skype flows with the same UDP port
It works with UDP only at edge node only
Complex Cannot discriminate VOICE/VIDEO/CHAT/DATA
Classic signature based classifier
Skype Encrypts Traffic
Can we leverage this?
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Skype Source Model
SkypeMessage
TCP/UDP
IP
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Randomness Classifier
Skype encrypts traffic payload looks like random
Some headers are constant (FUNC) Apply randomness test to the payload bits
Chi-Square test:statistic test for random sequences
i
i
E
Ex 22χ
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012 55
CSC
Split the payload into groups
Apply the test on the values assumed at each group Each message is an
observation Some groups will contain
Random bits Mixed bits Deterministic bits
0 8 16 24---------------------| ID | FUNC |---------------------
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
CSC
1
10
100
1000
10000
100000
1e+006
100 1000 10000 100000 1e+006n [pkt]
Deterministic groupRandom group
Mixed group
2χ
Set a threshold
Skype is a VoIP Application
Which are the features that make it different from a bulk download?
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Skype Source Model
SkypeMessage
TCP/UDP
IP
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Which features?
Question: Which features would you select to differentiate a VoIP stream from a data download?
59
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Sample Trace
0 20 40 60 80
100
[Kbp
s]
Average ThroughputBandwidth limit
0 20 40 60 80
100
[ms]
Framing
0 50
100 150 200 250 300
0 30 60 90 120 150 180 210 240 270 300
[Byt
es]
Time [s]
Skype Message Size
RegularIPG
Small/regularpackets
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Naive Baysean Classifier
Simple classifier: based on the a-priori prob, evaluate the a-posteriori prob
How similar is this flow to a Skype voice flow?
What makes “VoIP” traffic different from other traffic? Packet size, i.e., small packets (packet NBC) Inter-Packet-Gap, i.e., small IPG (IPG NBC)
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Design of the behavioral classifier We consider windows of N packets In each window, we check the IPG and the payloadsize
distribution We compare against the expected distribution and
compute a belief One belief for each “mode”
After K windows, take a decision Consider the most likely mode in each windows Average over all K windows to get the average max belief Compare it against a threshold
62
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Skype Naive Bayes Classifier
W(k) W(k+1) W(k+2) W(k+3) W(k+4)
X
Y
max
max
AVG
AVG
min
Bs(k,j)
E[Bs(,j)]
Bt(k) E[Bt]
B
Packet NBCPacket NBCPacket NBC
IPG NBCIPG NBCIPG NBC``
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
NBC over Time
-25
-20
-15
-10
-5
0
0 30 60 90 120 150 180 210 240
Bel
ief
Time
E[PKT] = 90BE[PKT] = 210BE[PKT] = 252B
0
50
100
150
200
[By
tes]
Set a threshold
Results
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Three Classifiers
Naïve Bayes Classifier
Payload Based Classifier
Chi Square Classifier
Skype?
TrafficFlow
Skype?
Skype?
AND
Skype?
UDPbenchmarkdataset
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Scenario
Testbed traces: 100% accuracy Campus LAN@polito
Simple scenario, no P2P, no VoIP Italian ISP
Stiff scenario: lot of P2P, tons of VoIP Results consider
True positive (OK): Skype, and identified False positive (FP): Not Skype, but identified False negative (FN): Skype, but discarded
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
N OK FP FP% FN FN%
PBC 65 (50) --- --- --- ---
NBC 27437 50 27387 73.73% 15 23.08%
CSC 191 57 134 0.36% 8 12.31%
NBC+CSC 51 49 2 0.01% 16 24.62%
TOT >100 37212
Performance Evaluation: UDP
NBC + CSC
Chi Square Classifier
Naïve Based Classifier
Payload Based Classifier
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
CSC Threshold Impact
1
10
100
1000
10000
100000
1e+006
100 1000 10000 100000 1e+006n [pkt]
Deterministic groupRandom group
Mixed group
0
1
2
3
4
5
6
0 100 200 300
FP [%
]
E2E
0 10 20 30 40 50 60 70 80 90
100
0 100 200 300
FN [%
]
CSC Threshold
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
NBC Threshold Impact
-25
-20
-15
-10
-5
0
0 30 60 90 120 150 180 210 240
Belie
f
Time
E[PKT] = 90BE[PKT] = 210BE[PKT] = 252B
0
50
100
150
200
[Byt
es]
0
1
2
3
4
5
-30 -25 -20 -15 -10 -5 0
FP
[%
]
E2E
0
20
40
60
80
100
-30 -25 -20 -15 -10 -5 0
FN
[%
]
Minimum Belief
E2E
Kiss: chi square stocastich classifier orStocastic Packet InspectionGeneralize it
71
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Phase 1
Feature
Phase 3
Verify
Phase 2
DecisionTraffic(Known)
Our Approach
Statistical characterization of bits in a flow
Do NOT look at the SEMANTIC and TIMING… but rather look at the protocol FORMAT
Testc2
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
0 4 8 16 19 24 32
Source Port Destination Port
Sequence Number
Acknowledgment Number
Checksum
Options
Window
Urgent Pointer
HLEN Resv Control flag
Padding
Question: Which protocol is this?
CONSTANT CONSTANT
CONSTANTCONSTANT
CONSTANT
COUNTER
COUNTER
RANDOM
RANDOM
RANDOM
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012 74
CSC
Split the payload into groups
Apply the test on the values assumed at each group Each message is an
observation Some groups will contain
Random bits Mixed bits Deterministic bits
0 8 16 24---------------------| ID | FUNC |---------------------
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Chunking and c2
UDP headerFirst N payload
bytes
C chunks Each of
b bitsc2
1c2
C[ ], … ,
Vector of Statistics
The provides an implicit measure of entropy or randomness
c2
Observeddistribution
Expecteddistribution(uniform)
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Consider a chunk of 2 bits:
0 1 2 3 0 1 2 3 0 1 2 3
RandomValues
DeterministicValue
Counter
Oi
and different beaviour
Ei
Question: Why comparing against a UNIFORM pdf??
77
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
4 bit long chunks: evolution
random
x x x x
c2
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
random
Deterministic )12(2 bN
0 0 0 1
4 bit long chunks: evolutionc2
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
random
deterministic
mixed
x 0 0 0
x 0 x 0
0 x x x
4 bit long chunks: evolutionc2
Question: is it dependent on WHICH bits are fixedAnd WHICH are random???
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
KISS
1
10
100
1000
10000
100000
1e+006
100 1000 10000 100000 1e+006n [pkt]
Deterministic groupRandom group
Mixed group
2χ
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Question: What is this?!?!
2 byte long counter
MSG L2 L1 LSG
MostSignificantGroup
LessSignificantGroup
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Protocol format as seen from thec2
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Statistical characterization of bits in a flow
Decision process Test
Minimum distance / maximum likelihood
c2
Phase 1
Feature
Phase 3
Verify
Phase 2
DecisionTraffic(Known)
Our Approach
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
C-dimension space
c21
c2C[ ], … ,
Iperspace
ClassificationRegions
EuclideanDistance
Support VectorMachine
c2i
c2j
?Class
Class
My Point
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Example considering the c2
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
c2i
c 2j Centroid
Center of mass
Euclidean Distance Classifier
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
c2i
c 2j
True NegativeAre “Far”
True PositivesAre “Nearby”
CentroidCenter of mass
Euclidean Distance Classifier
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
c2i
c2j
False Positives
CentroidCenter of mass
Iper-sphere
Euclidean Distance Classifier
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
c2i
c 2j Centroid
Center of mass
Iper-sphere False negatives
Radius
Euclidean Distance Classifier
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
c2i
c 2j Centroid
Center of mass
Iper-sphere max { True Pos. } min { False Neg. }
Confidence
The distance is a measure of the condifence of the decision
Euclidean Distance Classifier
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Radius
Tru
e P
ositi
ve
– F
alse
pos
itive
How to define the sphere radius?
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Space ofsamples(dim. C)
Kernel function
Space of feature
(dim. ∞)
Kernel functions Move point so that borders
are simple
Support Vector Machine
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Support vectors
Support vectors
Kernel functions Move point so that borders
are simple
Borders are planes Simple surface! Nice math Support Vectors
LibSVM
Support Vector Machine
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Decision Distance from the border Confidence is aprobability
p ( class )
Kernel functions
Borders are planes Simple surface! Nice math Support Vectors
LibSVM
Support Vector Machine
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Performance evaluationHow accurate is all this?
Our ApproachPhase 1
Feature
Phase 3
Verify
Phase 2
DecisionTraffic(Known)
Statistical characterization of bits in a flow
Decision process Test
Minimum distance / maximum likelihood
c2
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Per flow and per endpoint
What are we going to classify? It can be applied to both single flows And to endpoints
Question: Do we assume to monitor ALL packets? Do we assume to monitor since the first
packet?
97
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Per flow and per endpoint
What are we going to classify? It can be applied to both single flows And to endpoints
Question: Do we assume to monitor ALL packets? Do we assume to monitor since the FIRST
packet? NO!
It is robust to sampling It can start from any point
98
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Per flow and per endpoint
99
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Real traffic tracesInternet
Fastweb
Known + Other Training Known Traffic False Negatives Unknown traffic False Positives
Trace
RTPeMuleDNS
Oracle(DPI +Manual )
other
Other UnknownTraffic
1 day long trace
20 GByte diUDP traffic
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Definition of false positive/negative
TrafficOracle (DPI)
eMuleRTP
DNS
Other
Classifing “known”
true positives
false negatives
KISS
true negatives
false positives
Classifing “other”KISS
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Case A Case BRtp 0.08 0.23Edk 13.03 7.97Dns 6.57 19.19
Case A Case B0.00 0.050.98 0.540.12 2.14
Case A Case Bother 13.6 17.01
Euclidean Distance SVM
Case A Case B0.00 0.18
Results
Known traffic(False Neg.)
[%]
Other(False Pos.)
[%]
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Tuning trainset size
%
True positives
False positives
Samples per class
Small training setFor “known”: 70-80 MbyteFor “other”: 300 Mbyte
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
c2
packets
%
True positives
False positives
Tuning Num. of Packets for
(confidence 5%)
Protocols with volumesat least 70-80 pkts per flow
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Real traffic trace
RTP errors are oracle mistakes(do not identify RTP v1)
DNS errors are due to impure training set
(for the oracle all port 53 is DNS traffic)
EDK errors are (maybe) Xbox Live(proper training for “other”)
FN are always below 3%!!!
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
P2P-TV applications
P2P-TV applications are becoming popularThey heavly rely on UDP at the transport protocolThey are based on proprietary protocolsThey are evolving over time very quicklyHow to identify them?... After 6 hours, KISS give you results
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Putting all together
Now with 9 classes 3 different networks
107
Another example of behavioral classifier
Abacus
108
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Abacus: Rationale Applications are like people in a party
room Some prefer brief exchanges with many other
people
Some likes long talks with few other people
“Attitudes” are different across P2P applications...
Some prefer to download small pieces of data from many peers
Some prefer to download all data from almost the same peers
... enough to classify them Observe a host for a given time
Count the number of peers contacted and the number of packets exchanged which represent the attitude
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Abacus signature definition Consider a host X which in a fixed time-
window ΔT = 5s is contacted by N=5 peers Y
i
for each peer Yi count the number of
packets sent to X in ΔT
Consider a set of bins of exponential width
Divide the peers in bins according to the number of exchanged packets
Normalize the bins, i.e. divide for the total number of peers N
The final signature is an empirical probability distribution function
In the example
N=5, bins = (1, 0, 2, 2) Abacus signature (0.2, 0, 0.4, 0.4)
X
Y1
Y2
1 2 3-4 5-8 ...
Y3 Y
5Y
4
9-16
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Signature comparisonP
PL
ive
TV
An
tsJo
ost
So
pC
ast
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Performance evaluationHow accurate is all this?
Our ApproachPhase 1
Feature
Phase 3
Verify
Phase 2
DecisionTraffic(Known)
Statistical characterization of bits in a flowABACUS signatures
Decision process
Supervised machine learning based on SVM
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Hyper-space is partitioned every point is given a label
even “unknown” apps
Need a way to recognize them Define a center for each class
Define a threshold R
Calculate the distance d between the point and the center of the assigned class
If d > R mark the new point as unknown
Bhattacharyya distance BD Distance between p.d.f.
Rejection criterion
B=qp,BD 1 n
=x
xqxp=B1
Training points
R
R
Center of the class
New points
Labeled as“unknown”
Labeled as“green”
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Experimental results
PPLive TVAnts SopCast Joost UnkPPLive 81.66 0.58 9.55 2.32 5.9TVAnts 0.41 98.84 0.15 0.57 0
SopCast 3.76 0.11 89.62 0.32 6.2Joost 2.84 0.55 0.28 89.5 6.9Rea
l val
ue
Abacus signature confusion matrixClassification outcome
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Experimental resultsFor “unknown traffic” the selection of the rejection threshold R is fundamental!
For R~0
low TPR
low FPR
For R~1
high TPR
high FPR
For R=0.5
high TPR
low FPR
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
The Failure of DPI
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Question you should ask yourself Which feature to use?
Are those “portable”? How often should the classifier be retrained? Can it take a decision after some packets?
Open issues Which is a good training set? How to get a valid benchmarking set?
Never trust other classifiers! How to make it work at 100Gb/s? How much traffic can it actually classify (coverage)?
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
How Much Traffic can be Classified?
118
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Interesting research topics
And for TCP? And for HTTP?
How to get Facebook or Twitter Over HTTPS? When served by the same CDN?
And for the application? Is it a video seen on Facebook?
Can I get rid of the training set? Use unsupervised classifiers?
119
120
Ops, wrong key
121
And for TCP?
122
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Chunking and c2
First N payload bytes
C chunks Each of
b bitsc2
1c2
C[ ], … ,
Vector of Statistics
The provides an implicit measure of entropy or randomness
c2
Observeddistribution
Expecteddistribution(uniform)
UDPTCP
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Results
124
3° Cost-TMA PhD school – Krakow – Feb. 15 20123° Cost-TMA PhD school – Krakow – Feb. 15 2012
Results
125