Internet Traffic Classification KISS
![Page 1: Internet Traffic Classification KISS](https://reader035.fdocuments.in/reader035/viewer/2022062500/568158f4550346895dc63049/html5/thumbnails/1.jpg)
Internet Traffic Classification KISS
Dario Bonfiglio, Alessandro Finamore, Marco Mellia, Michela Meo, Dario Rossi
![Page 2: Internet Traffic Classification KISS](https://reader035.fdocuments.in/reader035/viewer/2022062500/568158f4550346895dc63049/html5/thumbnails/2.jpg)
Traffic Classification & Measurement
Why?
• Identify normal and anomalous behavior
• Characterize the network and its users
• Quality of service
• Filtering
• …
How?
• By means of passive measurement
• Using Tstat
![Page 3: Internet Traffic Classification KISS](https://reader035.fdocuments.in/reader035/viewer/2022062500/568158f4550346895dc63049/html5/thumbnails/3.jpg)
Tstat
Traffic classifier
• Deep packet inspection
• Statistical methods
Persistent and scalable monitoring platform
• Round Robin Database (RRD)
• Histograms
[Figure: Internal Clients, Edge Router, External Servers]
http://tstat.tlc.polito.it
![Page 4: Internet Traffic Classification KISS](https://reader035.fdocuments.in/reader035/viewer/2022062500/568158f4550346895dc63049/html5/thumbnails/4.jpg)
Tstat at a Glance
![Page 5: Internet Traffic Classification KISS](https://reader035.fdocuments.in/reader035/viewer/2022062500/568158f4550346895dc63049/html5/thumbnails/5.jpg)
Worms and Viruses?
Did someone open a Christmas card? Happy new year to Windows!!
![Page 6: Internet Traffic Classification KISS](https://reader035.fdocuments.in/reader035/viewer/2022062500/568158f4550346895dc63049/html5/thumbnails/6.jpg)
Anomalies (Good!)
Spammers disappear: McColo SpamNet shut off on Tuesday, November 11th, 2008
![Page 7: Internet Traffic Classification KISS](https://reader035.fdocuments.in/reader035/viewer/2022062500/568158f4550346895dc63049/html5/thumbnails/7.jpg)
New Applications – P2P-TV
Fiorentina 4 - Udinese 2
Inter 1 - Juventus 0
![Page 8: Internet Traffic Classification KISS](https://reader035.fdocuments.in/reader035/viewer/2022062500/568158f4550346895dc63049/html5/thumbnails/8.jpg)
Traffic classification
Look at the packets…
Tell me what protocol and/or application generated them
![Page 9: Internet Traffic Classification KISS](https://reader035.fdocuments.in/reader035/viewer/2022062500/568158f4550346895dc63049/html5/thumbnails/9.jpg)
Typical approach: Deep Packet Inspection (DPI)
Applications: Skype, Bittorrent, Gtalk, eMule
Port: 4662/4672
Payload: “bittorrent”
Payload: E4/E5
Payload: RTP protocol
It fails more and more:
• P2P
• Encryption
• Proprietary solution
• Many different flavours
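A minimal sketch of the port/payload matching that DPI relies on, and of how it misses obfuscated traffic. The rules below are illustrative assumptions: the slide does not say which application each signature belongs to, so attaching ports 4662/4672 and a first payload byte of E4/E5 to eMule, and the string "bittorrent" to Bittorrent, is one reading of it. `dpi_classify` is a hypothetical helper, not part of Tstat.

```python
def dpi_classify(dst_port, payload):
    """Return an application label from port/payload signatures, or None."""
    if b"bittorrent" in payload.lower():
        return "Bittorrent"
    if dst_port in (4662, 4672) or payload[:1] in (b"\xe4", b"\xe5"):
        return "eMule"
    return None                      # no signature matched


print(dpi_classify(6881, b"\x13BitTorrent protocol"))   # plain-text signature
print(dpi_classify(4672, b"\xe4\x21"))                  # matching port and first byte
print(dpi_classify(50123, b"\x9f\x02\xc7\x51"))         # random port, opaque payload
```

The third call stands for encrypted or proprietary traffic: no rule fires, so the flow stays unclassified.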
![Page 10: Internet Traffic Classification KISS](https://reader035.fdocuments.in/reader035/viewer/2022062500/568158f4550346895dc63049/html5/thumbnails/10.jpg)
The Failure of DPI
11.05.2008 12:29 eMule 0.49a released
1.08.2008 20:25 eMule 0.49b released
![Page 11: Internet Traffic Classification KISS](https://reader035.fdocuments.in/reader035/viewer/2022062500/568158f4550346895dc63049/html5/thumbnails/11.jpg)
Possible Solution: Behavioral Classifier
Traffic (Known)
Phase 1: Feature
Phase 2: Decision
Phase 3: Verify
(Training) (Operation)
1. Statistical characterization of traffic (given source)
2. Look at the behaviour of unknown traffic and assign the class that best fits it
3. Check for possible classification mistakes
![Page 12: Internet Traffic Classification KISS](https://reader035.fdocuments.in/reader035/viewer/2022062500/568158f4550346895dc63049/html5/thumbnails/12.jpg)
Our Approach
Traffic (Known)
Phase 1: Feature
Phase 2: Decision
Phase 3: Verify
Statistical characterization of bits in a flow
• χ² Test
Do NOT look at the SEMANTICS and TIMING… but rather look at the protocol FORMAT
![Page 13: Internet Traffic Classification KISS](https://reader035.fdocuments.in/reader035/viewer/2022062500/568158f4550346895dc63049/html5/thumbnails/13.jpg)
Chunking and χ²
First N payload bytes
C chunks, each of b bits
Vector of statistics: $[\chi^2_1, \ldots, \chi^2_C]$
$\chi^2 = \sum_i (O_i - E_i)^2 / E_i$, with $O_i$ the observed distribution and $E_i$ the expected (uniform) distribution
The χ² provides an implicit measure of entropy or randomness
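A minimal sketch of the chunking step, assuming chunks are taken in order from the first N payload bytes and each chunk position is tested separately across the packets of a flow. The statistic is the standard Pearson χ² against a uniform expectation. The function names and the toy flow are illustrative, not taken from the KISS implementation.

```python
from collections import Counter


def chunks_of(payload: bytes, n_bytes: int, b: int) -> list[int]:
    """Split the first n_bytes of a payload into consecutive b-bit chunk values."""
    bits = "".join(f"{byte:08b}" for byte in payload[:n_bytes])
    return [int(bits[i:i + b], 2) for i in range(0, len(bits) - b + 1, b)]


def chi_square(values: list[int], b: int) -> float:
    """Pearson chi-square of observed chunk values against a uniform distribution."""
    n = len(values)
    expected = n / 2 ** b            # E_i: the same for each of the 2^b values
    observed = Counter(values)       # O_i: values never seen count as 0
    return sum((observed.get(v, 0) - expected) ** 2 / expected
               for v in range(2 ** b))


def chi_square_vector(packets: list[bytes], n_bytes: int, b: int) -> list[float]:
    """One chi-square statistic per chunk position, computed across packets."""
    per_packet = [chunks_of(p, n_bytes, b) for p in packets if len(p) >= n_bytes]
    columns = zip(*per_packet)       # column g holds chunk g of every packet
    return [chi_square(list(col), b) for col in columns]


# Toy flow: first byte constant, second byte cycles through all 256 values.
flow = [bytes([0xAB, i % 256]) for i in range(256)]
signature = chi_square_vector(flow, n_bytes=2, b=4)
print(signature)
```

In this toy flow the two chunks of the constant byte get a large χ², while the two chunks of the cycling byte get 0 because every value appears equally often.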
![Page 14: Internet Traffic Classification KISS](https://reader035.fdocuments.in/reader035/viewer/2022062500/568158f4550346895dc63049/html5/thumbnails/14.jpg)
Consider a chunk of 2 bits:
[Figure: counts over the values 0 1 2 3 for three cases: random values, deterministic value, counter]
O_i and different behaviour
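The three behaviours can be reproduced with a short sketch that counts how often each 2-bit value is observed (the O_i). The number of samples, the seed and the choice of 3 as the deterministic value are arbitrary assumptions for illustration.

```python
import random
from collections import Counter

B = 2                      # chunk width in bits -> 4 possible values
N = 1000                   # number of observed chunks (illustrative)


def observed_counts(values):
    """Return [O_0, O_1, O_2, O_3] for a sequence of 2-bit chunk values."""
    counts = Counter(values)
    return [counts.get(v, 0) for v in range(2 ** B)]


rng = random.Random(0)     # fixed seed so the sketch is repeatable
random_counts = observed_counts(rng.randrange(2 ** B) for _ in range(N))
deterministic_counts = observed_counts(3 for _ in range(N))
counter_counts = observed_counts(i % 2 ** B for i in range(N))

print("random       ", random_counts)
print("deterministic", deterministic_counts)
print("counter      ", counter_counts)
```

Random values spread roughly evenly, a deterministic value puts every observation in one bin, and the low bits of a counter fill all bins exactly evenly.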
![Page 15: Internet Traffic Classification KISS](https://reader035.fdocuments.in/reader035/viewer/2022062500/568158f4550346895dc63049/html5/thumbnails/15.jpg)
4 bit long chunks: χ² evolution
random: x x x x
![Page 16: Internet Traffic Classification KISS](https://reader035.fdocuments.in/reader035/viewer/2022062500/568158f4550346895dc63049/html5/thumbnails/16.jpg)
4 bit long chunks: χ² evolution
random
deterministic: 0 0 0 1, $\chi^2 = N(2^b - 1)$
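The value for the deterministic case, N(2^b − 1), follows from the standard Pearson statistic. The derivation assumes N observed chunks, a uniform expectation E_i = N/2^b, and all N observations falling on a single value:

```latex
\begin{align*}
\chi^2 &= \sum_{i=0}^{2^b-1} \frac{(O_i - E_i)^2}{E_i}, \qquad E_i = \frac{N}{2^b} \\
\chi^2_{\mathrm{det}} &= \frac{\left(N - N/2^b\right)^2}{N/2^b}
   + \left(2^b - 1\right)\frac{\left(0 - N/2^b\right)^2}{N/2^b} \\
 &= \frac{N\left(2^b - 1\right)^2}{2^b} + \frac{N\left(2^b - 1\right)}{2^b}
  = N\left(2^b - 1\right)
\end{align*}
```

The statistic of a deterministic chunk therefore grows linearly with the number of samples, while for a truly random chunk it stays bounded on average.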
![Page 17: Internet Traffic Classification KISS](https://reader035.fdocuments.in/reader035/viewer/2022062500/568158f4550346895dc63049/html5/thumbnails/17.jpg)
4 bit long chunks: χ² evolution
random
deterministic
mixed: x 0 0 0, x 0 x 0, 0 x x x
![Page 18: Internet Traffic Classification KISS](https://reader035.fdocuments.in/reader035/viewer/2022062500/568158f4550346895dc63049/html5/thumbnails/18.jpg)
Chi Square Classifier
• Split the payload into groups
• Apply the test on the groups at the flow end: each message is a sample
• Some groups will contain
  • Random bits
  • Mixed bits
  • Deterministic bits
0 8 16 24
---------------------
| ID | FUNC |
---------------------
![Page 19: Internet Traffic Classification KISS](https://reader035.fdocuments.in/reader035/viewer/2022062500/568158f4550346895dc63049/html5/thumbnails/19.jpg)
CSC
[Figure: χ² versus n [pkt] on logarithmic axes, for the deterministic group, the random group and the mixed group]
![Page 20: Internet Traffic Classification KISS](https://reader035.fdocuments.in/reader035/viewer/2022062500/568158f4550346895dc63049/html5/thumbnails/20.jpg)
And the counter example?
2 byte long counter, split into groups: MSG, L2, L1, LSG
• MSG: Most Significant Group
• LSG: Least Significant Group
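A sketch of the counter case. It assumes the 2-byte counter is split into four 4-bit groups, most significant first, and that the counter increases by one per packet. The flow length is illustrative.

```python
from collections import Counter


def chi_square(values, b):
    """Pearson chi-square of b-bit values against a uniform distribution."""
    n = len(values)
    expected = n / 2 ** b
    observed = Counter(values)
    return sum((observed.get(v, 0) - expected) ** 2 / expected
               for v in range(2 ** b))


def nibbles(counter_value):
    """Split a 16-bit counter into four 4-bit groups, most significant first."""
    return [(counter_value >> shift) & 0xF for shift in (12, 8, 4, 0)]


N_PACKETS = 1024                                   # illustrative flow length
groups = list(zip(*(nibbles(i) for i in range(N_PACKETS))))
for name, values in zip(("MSG", "L2", "L1", "LSG"), groups):
    print(f"{name}: chi2 = {chi_square(values, 4):.1f}")
```

With these assumptions the most significant group never changes and looks deterministic, the least significant groups cycle through every value and look uniform, and L2 falls in between.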
![Page 21: Internet Traffic Classification KISS](https://reader035.fdocuments.in/reader035/viewer/2022062500/568158f4550346895dc63049/html5/thumbnails/21.jpg)
Protocol format as seen from the χ²
![Page 22: Internet Traffic Classification KISS](https://reader035.fdocuments.in/reader035/viewer/2022062500/568158f4550346895dc63049/html5/thumbnails/22.jpg)
Our Approach
Traffic (Known)
Phase 1: Feature
Phase 2: Decision
Phase 3: Verify
Statistical characterization of bits in a flow
• χ² Test
Decision process
• Minimum distance / maximum likelihood
![Page 23: Internet Traffic Classification KISS](https://reader035.fdocuments.in/reader035/viewer/2022062500/568158f4550346895dc63049/html5/thumbnails/23.jpg)
C-dimensional space: $[\chi^2_1, \ldots, \chi^2_C]$
Hyperspace
Classification Regions
• Euclidean Distance
• Support Vector Machine
[Figure: two classes and "My Point" on axes χ²_i and χ²_j]
![Page 24: Internet Traffic Classification KISS](https://reader035.fdocuments.in/reader035/viewer/2022062500/568158f4550346895dc63049/html5/thumbnails/24.jpg)
Example considering the χ²
![Page 25: Internet Traffic Classification KISS](https://reader035.fdocuments.in/reader035/viewer/2022062500/568158f4550346895dc63049/html5/thumbnails/25.jpg)
Euclidean Distance Classifier
Centroid: center of mass
[Figure: axes χ²_i and χ²_j]
![Page 26: Internet Traffic Classification KISS](https://reader035.fdocuments.in/reader035/viewer/2022062500/568158f4550346895dc63049/html5/thumbnails/26.jpg)
Euclidean Distance Classifier
Centroid: center of mass
True negatives are “far”
True positives are “nearby”
[Figure: axes χ²_i and χ²_j]
![Page 27: Internet Traffic Classification KISS](https://reader035.fdocuments.in/reader035/viewer/2022062500/568158f4550346895dc63049/html5/thumbnails/27.jpg)
Euclidean Distance Classifier
Centroid: center of mass
Hypersphere
False positives
[Figure: axes χ²_i and χ²_j]
![Page 28: Internet Traffic Classification KISS](https://reader035.fdocuments.in/reader035/viewer/2022062500/568158f4550346895dc63049/html5/thumbnails/28.jpg)
Euclidean Distance Classifier
Centroid: center of mass
Hypersphere
Radius
False negatives
[Figure: axes χ²_i and χ²_j]
![Page 29: Internet Traffic Classification KISS](https://reader035.fdocuments.in/reader035/viewer/2022062500/568158f4550346895dc63049/html5/thumbnails/29.jpg)
Euclidean Distance Classifier
Centroid: center of mass
Hypersphere: min { False Pos. }, min { False Neg. }
Confidence
• The distance is a measure of the confidence of the decision
[Figure: axes χ²_i and χ²_j]
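A minimal sketch of a centroid-based classifier with a hypersphere. It assumes one centroid per class, a single shared radius, and the label "other" for points outside every sphere. The training points and class names are made up.

```python
import math


def centroid(points):
    """Center of mass of a list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[k] for p in points) / len(points) for k in range(dim)]


def classify(point, centroids, radius):
    """Return (label, distance) for the nearest centroid, or ("other", distance)
    when the point falls outside that centroid's hypersphere."""
    label, dist = min(((name, math.dist(point, c)) for name, c in centroids.items()),
                      key=lambda pair: pair[1])
    return (label if dist <= radius else "other", dist)


# Hypothetical 2-D training signatures (chi2_i, chi2_j) for two made-up classes.
training = {
    "class_a": [[10.0, 200.0], [12.0, 210.0], [8.0, 190.0]],
    "class_b": [[300.0, 20.0], [310.0, 25.0], [290.0, 15.0]],
}
centroids = {name: centroid(pts) for name, pts in training.items()}

print(classify([11.0, 205.0], centroids, radius=50.0))   # near class_a
print(classify([150.0, 110.0], centroids, radius=50.0))  # far from both centroids
```

The returned distance doubles as a confidence measure: the smaller it is, the more the point resembles the class.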
![Page 30: Internet Traffic Classification KISS](https://reader035.fdocuments.in/reader035/viewer/2022062500/568158f4550346895dc63049/html5/thumbnails/30.jpg)
How to define the sphere radius?
[Figure: True Positive - False Positive versus Radius]
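One possible way to pick the radius, sketched under the assumption that the goal is to maximize the difference between the true positive and false positive percentages over a set of candidate radii. The selection rule and all distances below are illustrative, not taken from the slides.

```python
def best_radius(known_distances, other_distances, candidates):
    """Pick the candidate radius that maximizes (TP% - FP%).

    known_distances: distances from the centroid of samples that belong to the class.
    other_distances: distances from the centroid of samples that do not.
    """
    def score(radius):
        tp = sum(d <= radius for d in known_distances) / len(known_distances)
        fp = sum(d <= radius for d in other_distances) / len(other_distances)
        return 100.0 * (tp - fp)

    return max(candidates, key=score), {r: score(r) for r in candidates}


# Made-up distances, purely for illustration.
known = [2.0, 3.0, 4.0, 5.0, 9.0]
other = [6.0, 8.0, 12.0, 15.0, 20.0]
radius, scores = best_radius(known, other, candidates=[1.0, 5.0, 10.0, 25.0])
print(radius, scores)
```

A radius that is too small rejects genuine samples (false negatives), while one that is too large accepts foreign samples (false positives). The score peaks in between.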
![Page 31: Internet Traffic Classification KISS](https://reader035.fdocuments.in/reader035/viewer/2022062500/568158f4550346895dc63049/html5/thumbnails/31.jpg)
Support Vector Machine
Kernel functions
• Move points so that borders are simple
[Figure: a kernel function maps the space of samples (dim. C) to the space of features (dim. ∞)]
![Page 32: Internet Traffic Classification KISS](https://reader035.fdocuments.in/reader035/viewer/2022062500/568158f4550346895dc63049/html5/thumbnails/32.jpg)
Support Vector Machine
Kernel functions
• Move points so that borders are simple
Borders are planes
• Simple surface!
• Nice math
• Support Vectors
• LibSVM
[Figure: support vectors]
![Page 33: Internet Traffic Classification KISS](https://reader035.fdocuments.in/reader035/viewer/2022062500/568158f4550346895dc63049/html5/thumbnails/33.jpg)
Support Vector Machine
Kernel functions
Borders are planes
• Simple surface!
• Nice math
• Support Vectors
• LibSVM
Decision
• Distance from the border
• Confidence is a probability: p ( class )
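A toy illustration of "moving points so that borders are simple". It uses an explicit, finite feature map as a stand-in for a kernel, with a plane chosen by hand. This is not LibSVM and involves no training; the data, map and weights are assumptions made for the example.

```python
def feature_map(point):
    """Explicit map phi(x1, x2) = (x1^2, x2^2); a stand-in for an implicit kernel."""
    x1, x2 = point
    return (x1 * x1, x2 * x2)


def linear_decision(features, weights, bias):
    """Signed value of the plane w.u + bias; its sign gives the class."""
    return sum(w * u for w, u in zip(weights, features)) + bias


# Toy data: class +1 lies inside the unit circle, class -1 outside it.
samples = [((0.1, 0.2), +1), ((-0.5, 0.3), +1), ((0.0, -0.7), +1),
           ((1.5, 0.0), -1), ((-1.0, 1.2), -1), ((0.9, -1.1), -1)]

# In feature space the circle x1^2 + x2^2 = 1 becomes the plane u + v - 1 = 0.
weights, bias = (-1.0, -1.0), 1.0

for point, label in samples:
    value = linear_decision(feature_map(point), weights, bias)
    predicted = +1 if value > 0 else -1
    print(point, label, predicted, round(value, 2))
```

The circular border in the original space is a plane in the mapped space. The magnitude of the signed value reflects how far a point lies from the border, which is what a confidence estimate builds on.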
![Page 34: Internet Traffic Classification KISS](https://reader035.fdocuments.in/reader035/viewer/2022062500/568158f4550346895dc63049/html5/thumbnails/34.jpg)
Performance evaluation
How accurate is all this?
Our Approach
Traffic (Known)
Phase 1: Feature
Phase 2: Decision
Phase 3: Verify
Statistical characterization of bits in a flow
• χ² Test
Decision process
• Minimum distance / maximum likelihood
![Page 35: Internet Traffic Classification KISS](https://reader035.fdocuments.in/reader035/viewer/2022062500/568158f4550346895dc63049/html5/thumbnails/35.jpg)
Per flow and per endpoint
What are we going to classify?
• It can be applied to single flows
• And to endpoints
It is robust to sampling
• It does not require monitoring all packets, nor the first packets
![Page 36: Internet Traffic Classification KISS](https://reader035.fdocuments.in/reader035/viewer/2022062500/568158f4550346895dc63049/html5/thumbnails/36.jpg)
Real traffic traces
• 1 day long trace
• 20 GByte of UDP traffic
[Figure: Internet / Fastweb trace labelled by an oracle (DPI + manual) as RTP, eMule, DNS or other; "other" is the unknown traffic]
• Known + Other: Training
• Known traffic: False Negatives
• Unknown traffic: False Positives
![Page 37: Internet Traffic Classification KISS](https://reader035.fdocuments.in/reader035/viewer/2022062500/568158f4550346895dc63049/html5/thumbnails/37.jpg)
Definition of false positive/negative
Traffic labelled by the oracle (DPI): eMule, RTP, DNS, Other
KISS classifying “known”
• true positives
• false negatives
KISS classifying “other”
• true negatives
• false positives
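These definitions can be turned into percentages with a short sketch. It assumes per-flow labels from the oracle and from the classifier, and it counts a known flow assigned to a different known class as a false negative. The label lists are made up.

```python
def error_rates(oracle_labels, kiss_labels, known_classes):
    """Return (false_negative_pct, false_positive_pct).

    False negative: the oracle says a known class, the classifier says something else.
    False positive: the oracle says "other", the classifier says a known class.
    """
    known = [(o, k) for o, k in zip(oracle_labels, kiss_labels) if o in known_classes]
    other = [(o, k) for o, k in zip(oracle_labels, kiss_labels) if o not in known_classes]
    fn = sum(o != k for o, k in known)
    fp = sum(k in known_classes for _, k in other)
    return 100.0 * fn / len(known), 100.0 * fp / len(other)


# Made-up labels for eight flows, for illustration only.
oracle = ["rtp", "rtp", "edk", "edk", "dns", "other", "other", "other"]
kiss = ["rtp", "other", "edk", "edk", "dns", "other", "edk", "other"]
fn_pct, fp_pct = error_rates(oracle, kiss, known_classes={"rtp", "edk", "dns"})
print(f"FN = {fn_pct:.1f}%  FP = {fp_pct:.1f}%")
```

False negatives are measured over traffic the oracle marks as known, and false positives over traffic it marks as other.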
![Page 38: Internet Traffic Classification KISS](https://reader035.fdocuments.in/reader035/viewer/2022062500/568158f4550346895dc63049/html5/thumbnails/38.jpg)
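Results

| Metric | Class | Euclidean Distance, Case A | Euclidean Distance, Case B | SVM, Case A | SVM, Case B |
|---|---|---|---|---|---|
| Known traffic (False Neg.) [%] | Rtp | 0.08 | 0.23 | 0.00 | 0.05 |
| Known traffic (False Neg.) [%] | Edk | 13.03 | 7.97 | 0.98 | 0.54 |
| Known traffic (False Neg.) [%] | Dns | 6.57 | 19.19 | 0.12 | 2.14 |
| Other (False Pos.) [%] | other | 13.6 | 17.01 | 0.00 | 0.18 |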
![Page 39: Internet Traffic Classification KISS](https://reader035.fdocuments.in/reader035/viewer/2022062500/568158f4550346895dc63049/html5/thumbnails/39.jpg)
Real traffic trace
• RTP errors are oracle mistakes (do not identify RTP v1)
• DNS errors are due to impure training set (for the oracle all port 53 is DNS traffic)
• EDK errors are (maybe) Xbox Live (proper training for “other”)
FN are always below 3%!!!
![Page 40: Internet Traffic Classification KISS](https://reader035.fdocuments.in/reader035/viewer/2022062500/568158f4550346895dc63049/html5/thumbnails/40.jpg)
Tuning trainset size
[Figure: true positives and false positives [%] versus samples per class (confidence 5%)]
Small training set
• For “known”: 70-80 Mbyte
• For “other”: 300 Mbyte
![Page 41: Internet Traffic Classification KISS](https://reader035.fdocuments.in/reader035/viewer/2022062500/568158f4550346895dc63049/html5/thumbnails/41.jpg)
Tuning num of packets for χ²
[Figure: true positives and false positives [%] versus packets (confidence 5%)]
Protocols with volumes of at least 70-80 pkts per flow
![Page 42: Internet Traffic Classification KISS](https://reader035.fdocuments.in/reader035/viewer/2022062500/568158f4550346895dc63049/html5/thumbnails/42.jpg)
P2P-TV applications
• P2P-TV applications are becoming popular
• They rely heavily on UDP as the transport protocol
• They are based on proprietary protocols
• They are evolving very quickly over time
• How to identify them?
• ... After 6 hours, KISS gives you results
![Page 43: Internet Traffic Classification KISS](https://reader035.fdocuments.in/reader035/viewer/2022062500/568158f4550346895dc63049/html5/thumbnails/43.jpg)
The Failure of DPI
![Page 44: Internet Traffic Classification KISS](https://reader035.fdocuments.in/reader035/viewer/2022062500/568158f4550346895dc63049/html5/thumbnails/44.jpg)
And for TCP?
![Page 45: Internet Traffic Classification KISS](https://reader035.fdocuments.in/reader035/viewer/2022062500/568158f4550346895dc63049/html5/thumbnails/45.jpg)
Chunking and χ²
First N payload bytes
C chunks, each of b bits
Vector of statistics: $[\chi^2_1, \ldots, \chi^2_C]$
$\chi^2 = \sum_i (O_i - E_i)^2 / E_i$, with $O_i$ the observed distribution and $E_i$ the expected (uniform) distribution
The χ² provides an implicit measure of entropy or randomness
![Page 46: Internet Traffic Classification KISS](https://reader035.fdocuments.in/reader035/viewer/2022062500/568158f4550346895dc63049/html5/thumbnails/46.jpg)
Results
![Page 47: Internet Traffic Classification KISS](https://reader035.fdocuments.in/reader035/viewer/2022062500/568158f4550346895dc63049/html5/thumbnails/47.jpg)
Results
![Page 48: Internet Traffic Classification KISS](https://reader035.fdocuments.in/reader035/viewer/2022062500/568158f4550346895dc63049/html5/thumbnails/48.jpg)
Pros and Cons
KISS is good because…
• Blind approach
• Completely automated
• Works with many protocols
• Works even with small training
• Statistics can start at any point
• Robust w.r.t. packet drops
• Bypasses some DPI problems
but…
• Learn (other) properly
• Needs volumes of traffic
• May require memory (for now)
• Only UDP (for now)
• Only offline (for now)
![Page 49: Internet Traffic Classification KISS](https://reader035.fdocuments.in/reader035/viewer/2022062500/568158f4550346895dc63049/html5/thumbnails/49.jpg)
Papers
• D. Bonfiglio, M. Mellia, M. Meo, D. Rossi, P. Tofanelli, “Revealing Skype Traffic: When Randomness Plays with You”, ACM SIGCOMM, Kyoto, JP, August 2007
• D. Rossi, M. Mellia, M. Meo, “A Detailed Measurement of Skype Network Traffic”, 7th International Workshop on Peer-to-Peer Systems (IPTPS '08), Tampa Bay, Florida, February 2008
• D. Bonfiglio, M. Mellia, M. Meo, N. Ritacca, D. Rossi, “Tracking Down Skype Traffic”, IEEE Infocom, Phoenix, AZ, 15-17 April 2008
• D. Bonfiglio, M. Mellia, M. Meo, D. Rossi, “Detailed Analysis of Skype Traffic”, IEEE Transactions on Multimedia, Vol. 11, No. 1, pp. 117-127, ISSN: 1520-9210, January 2009
• A. Finamore, M. Mellia, M. Meo, D. Rossi, “KISS: Stochastic Packet Inspection”, 1st Traffic Monitoring and Analysis (TMA) Workshop, Aachen, 11 May 2009
![Page 50: Internet Traffic Classification KISS](https://reader035.fdocuments.in/reader035/viewer/2022062500/568158f4550346895dc63049/html5/thumbnails/50.jpg)
And for TCP