Data Mining for Network Intrusion Detection: Experience with KDDCup’99 Data Set
Vipin Kumar, AHPCRC, University of Minnesota
Group members: L. Ertoz, M. Joshi, A. Lazarevic, H. Ramnani, P. Tan, J. Srivastava
Introduction
• Key challenge - maintain a high detection rate while keeping the false alarm rate low
• Misuse detection
  - Two-phase learning - PNrule
  - Classification Based on Associations (CBA) approach
• Anomaly detection
  - Unsupervised (e.g. clustering) and supervised methods to detect novel attacks
DARPA 1998 - KDDCup’99 Data Set
• Modification of the DARPA 1998 data set prepared and managed by MIT Lincoln Lab
• The DARPA 1998 data includes a wide variety of intrusions simulated in a military network environment
• 9 weeks of raw TCP dump data simulating a typical U.S. Air Force LAN
  - 7 weeks for training (5 million connection records)
  - 2 weeks for testing (2 million connection records)
KDDCup’99 Data Set
• Connections are labeled as normal or attacks
• Attacks fall into 4 main categories (38 attack types)
  - DOS - denial of service
  - Probe - e.g. port scanning
  - U2R - unauthorized access to root privileges
  - R2L - unauthorized remote login to a machine
• U2R and R2L are extremely small classes
• 3 groups of features
  - Basic, content-based, and time-based features
KDDCup’99 Data Set
• Training set - ~5 million connections
• 10% training set - 494,021 connections
• Test set - 311,029 connections
• Test data has attack types that are not present in the training data => the problem is more realistic
  - Training set contains 22 attack types
  - Test data contains 17 additional new attack types that belong to one of the four main categories
Performance of Winning Strategy
(rows: actual class, columns: predicted class)

               DOS     U2R   R2L   probe  normal  Recall (%)
DOS            223226  0     0     1328   5299    97.1
U2R            0       30    10    20     168     13.2
R2L            0       8     1360  294    14527   8.4
probe          184     0     0     3471   511     83.3
normal         78      4     6     243    60262   99.5
Precision (%)  99.9    71.4  98.8  64.8   74.6
• Cost-sensitive bagged boosting (B. Pfahringer)
Simple RIPPER classification
               DOS     U2R   R2L   probe  normal  Recall (%)
DOS            223281  0     0     281    6291    97.2
U2R            1       9     6     0      212     4
R2L            4       4     1581  0      14600   9.8
probe          187     0     0     3085   894     74.1
normal         57      14    12    259    60251   97.37
Precision (%)  99.9    33.3  98.9  85.1   73.3
• RIPPER trained on 10% of data (494,021 connections)
• Test on entire test set (311,029 connections)
Simple RIPPER on modified data

               DOS    U2R    R2L    probe  normal  Recall (%)
DOS            12917  0      2      10     98      99.16
U2R            1      78     5      8      44      57.35
R2L            0      12     1506   0      464     75.98
probe          33     0      0      2207   205     93.26
normal         31     2      44     45     17278   97.37
Precision (%)  99.5   84.78  96.72  97.22  95.52
• Remove duplicates and merge the new training and test data sets
• Sample 69,980 examples from the merged data set
  - Sample from the neptune and normal subclasses; other subclasses remain intact
• Divide in equal proportions into training and test sets
• Apply the RIPPER algorithm to the new data set
Building Predictive Models in NID
• Models should handle skewed class distributions
• Accuracy is not a sufficient metric for evaluation
• Focus on both recall and precision
  - Recall (R) = TP/(TP + FN)
  - Precision (P) = TP/(TP + FP)
• F-measure = 2*R*P/(R + P)
Confusion matrix (rare class = C, large class = NC):

                    Predicted class
                    NC    C
Actual class   NC   TN    FP
               C    FN    TP
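These metrics follow directly from the confusion-matrix cells; a minimal Python sketch (the TP/FP/FN counts below are illustrative, not taken from the tables in this deck):

```python
# Recall, precision, and F-measure from confusion-matrix counts
# for the rare class C (TP, FP, FN as in the confusion matrix above).
def recall(tp, fn):
    return tp / (tp + fn)

def precision(tp, fp):
    return tp / (tp + fp)

def f_measure(tp, fp, fn):
    r, p = recall(tp, fn), precision(tp, fp)
    return 2 * r * p / (r + p)

# Illustrative counts: 30 attacks caught, 10 false alarms, 170 attacks
# missed. Overall accuracy can still look fine; recall exposes the problem.
print(recall(30, 170))        # 0.15
print(precision(30, 10))      # 0.75
print(f_measure(30, 10, 170)) # 0.25
```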
Predictive Models for Rare Classes
• Over-sampling the small class [Ling, Li, KDD 1998]
• Down-sizing the large class [Kubat, ICML 1997]
• Internally bias the discrimination process to compensate for class imbalance [Fawcett, DMKDD 1997]
• PNrule and related work [Joshi, Agarwal, Kumar, SIAM, SIGMOD 2001]
• RIPPER with stratification
• SMOTE algorithm [Chawla, JAIR 2002]
• RareBoost [Joshi, Agarwal, Kumar, ICDM 2001]
PNrule Learning
• P-phase:
  - cover most of the positive examples with high support
  - seek good recall
• N-phase:
  - remove false positives from the examples covered in the P-phase
  - N-rules give high accuracy and significant support
• Existing techniques can possibly learn erroneous small signatures for the absence of C
• PNrule can learn strong signatures for the presence of NC in the N-phase
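The two-phase decision logic can be sketched as follows; the two rules are hypothetical stand-ins, since PNrule actually induces its P- and N-rules from data:

```python
# PNrule-style prediction: P-rules cast a wide net over the rare class C
# (high recall), then N-rules carve false positives back out of that set.
def pnrule_predict(record, p_rules, n_rules):
    caught = any(rule(record) for rule in p_rules)    # P-phase: high support/recall
    rejected = any(rule(record) for rule in n_rules)  # N-phase: remove FPs
    return caught and not rejected

# Hypothetical hand-written rules over two connection features
p_rules = [lambda r: r["count"] > 100]          # bursty connections look like DOS
n_rules = [lambda r: r["service"] == "backup"]  # ...unless it is the backup service

print(pnrule_predict({"count": 500, "service": "http"}, p_rules, n_rules))    # True
print(pnrule_predict({"count": 500, "service": "backup"}, p_rules, n_rules))  # False
print(pnrule_predict({"count": 3, "service": "http"}, p_rules, n_rules))      # False
```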
RIPPER vs. PNrule Classification
Model   Attack  Recall (%)  Precision (%)  F-value
RIPPER  U2R     17.1        6.7            9.6
RIPPER  R2L     13.9        84.9           23.9
RIPPER  Probe   77.8        64.7           70.7
PNrule  U2R     18.4        56.8           27.8
PNrule  R2L     14.1        72.8           23.7
PNrule  Probe   83.8        69.2           75.9
• 5% sample from normal, smurf (DOS), neptune (DOS) from 10% of training data (494,021 connections)
• Test on entire test set (311,029 connections)
Classification Based on Associations (CBA)
• What are association patterns?
  - Frequent itemset: captures the set of “items” that frequently co-occur in a transaction database.
  - Association rule: predicts the occurrence of a set of items in a transaction given the presence of other items.
Example: {Diaper, Milk} -> Beer

  Support:    s = σ(Diaper, Milk, Beer) / (total number of transactions) = 2/5 = 0.4
  Confidence: c = σ(Diaper, Milk, Beer) / σ(Diaper, Milk) = 0.66

In general, for an association rule X -> y:

  Support:    s(X -> y) = σ(X ∪ y) / |T| = P(X, y)
  Confidence: c(X -> y) = σ(X ∪ y) / σ(X) = P(y | X)
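The same computation on a toy transaction database; the five transactions below are illustrative, chosen to reproduce the s = 2/5 = 0.4 example:

```python
# Support and confidence for the rule {Diaper, Milk} -> {Beer}.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Cola"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Cola"},
]

def sigma(itemset):
    # number of transactions containing every item in the itemset
    return sum(itemset <= t for t in transactions)

X, y = {"Diaper", "Milk"}, {"Beer"}
support = sigma(X | y) / len(transactions)  # P(X, y)
confidence = sigma(X | y) / sigma(X)        # P(y | X)
print(support)     # 0.4
print(confidence)  # 0.666...
```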
• Previous work:
  - Use association patterns to improve the overall performance of traditional classifiers
    • Integrating Classification and Association Rule Mining [Liu, Li, KDD 1998]
    • CMAR: Accurate Classification Based on Multiple Class-Association Rules [Han, ICDM 2001]
  - Associations in network intrusion detection
    • Use classification based on associations for anomaly detection and misuse detection [Lee, Stolfo, Mok 1999]
    • Look for abnormal associations [Barbara, Wu, Jajodia, 2001]
Classification Based on Associations (CBA)
Methodology
Overall data set
  -> Stratification into classes: DOS, U2R, R2L, probe, normal
  -> Frequent itemset generation per class, e.g.
       F1: {A, B, C} => dos      F1: {A, C, D} => u2r      F1: {C, K, L} => r2l
       F2: {B, D}    => dos      F2: {E, F, H} => u2r      F2: {F, G, H} => r2l
       ...                       ...                       ...
       F1: {B, F}    => probe    F1: {A, B} => normal
       F2: {B, C, H} => probe    F2: {E, G} => normal
  -> Feature selection
  -> Feed to classifier
Methodology
• Current approaches use confidence-like measures to select the best rules to be added as features into the classifiers.
  - This may work well only if each class is well represented in the data set.
• For rare-class problems, some of the high-recall itemsets could be potentially useful, as long as their precision is not too low.
• Our approach:
  - Apply a frequent itemset generation algorithm to each class.
  - Select itemsets to be added as features based on precision, recall, and F-measure.
  - Apply a classification algorithm, i.e. RIPPER, to the new data set.
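The selection step above can be sketched as ranking each class’s candidate itemsets by the F-measure of the implied rule “itemset fires => class”; the candidate itemsets and their coverage counts below are hypothetical:

```python
# Rank candidate itemsets for one class by F-measure and keep the top k.
def select_itemsets(candidates, class_size, k=2):
    # candidates: {itemset_name: (tp, fp)} -- connections covered inside /
    # outside the target class; fn is implied by class_size - tp.
    scored = []
    for name, (tp, fp) in candidates.items():
        p = tp / (tp + fp) if tp + fp else 0.0   # precision of the rule
        r = tp / class_size                      # recall over the class
        f = 2 * p * r / (p + r) if p + r else 0.0
        scored.append((f, name))
    return [name for _, name in sorted(scored, reverse=True)[:k]]

# Hypothetical u2r itemsets: a precise low-recall rule, a high-recall rule
# with tolerable precision, and a hopelessly imprecise one.
u2r_candidates = {
    "{A,C,D}": (20, 1),
    "{E,F,H}": (180, 60),
    "{C,K}":   (5, 500),
}
print(select_itemsets(u2r_candidates, class_size=200, k=2))
```

Ranking by confidence alone would favor the first rule and ignore the second even though it covers most of the rare class; F-measure keeps both.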
Experimental Results (on modified data)

Original RIPPER:
        dos    u2r  r2l   probe  normal
dos     12917  0    2     10     98
u2r     1      78   5     8      44
r2l     0      12   1506  0      464
probe   33     0    0     2207   205
normal  31     2    44    45     17278

RIPPER with high-precision rules:
        dos    u2r  r2l   probe  normal
dos     12890  0    1     11     125
u2r     1      100  6     8      21
r2l     0      9    1504  1      468
probe   32     0    0     2364   49
normal  31     2    108   54     17205

RIPPER with high-recall rules:
        dos    u2r  r2l   probe  normal
dos     12982  0    1     5      39
u2r     0      93   13    0      30
r2l     0      7    1564  2      409
probe   33     0    0     2320   92
normal  28     5    42    55     17270

RIPPER with high F-measure rules:
        dos    u2r  r2l   probe  normal
dos     12999  0    0     3      25
u2r     0      113  4     0      19
r2l     0      5    1671  2      304
probe   38     0    1     2252   154
normal  30     2    62    35     17271
Experimental Results (on modified data)

For rare classes, rules ordered according to F-measure produce the best results.

Original RIPPER:
        Precision  Recall   F-measure
dos     99.50%     99.16%   99.33%
u2r     84.78%     57.35%   68.42%
r2l     96.72%     75.98%   85.11%
probe   97.22%     90.27%   93.62%
normal  95.52%     99.30%   97.37%

RIPPER with high-precision rules:
        Precision  Recall   F-measure
dos     99.51%     98.95%   99.23%
u2r     90.09%     73.53%   80.97%
r2l     92.90%     75.88%   83.53%
probe   96.96%     96.69%   96.83%
normal  96.29%     98.88%   97.57%

RIPPER with high-recall rules:
        Precision  Recall   F-measure
dos     99.53%     99.65%   99.59%
u2r     88.57%     68.38%   77.18%
r2l     96.54%     78.91%   86.84%
probe   97.40%     94.89%   96.13%
normal  96.80%     99.25%   98.01%

RIPPER with high F-measure rules:
        Precision  Recall   F-measure
dos     99.48%     99.79%   99.63%
u2r     94.17%     83.09%   88.28%
r2l     96.14%     84.31%   89.84%
probe   98.25%     92.11%   95.08%
normal  97.18%     99.26%   98.21%
CBA Summary
• Association rules can improve the overall performance of classifiers
• The measure used to select rules for feature addition can affect classifier performance
  - The proposed F-measure based rule selection approach leads to better overall performance
Anomaly Detection – Related Work
• Detect novel intrusions using pseudo-Bayesian estimators to estimate prior and posterior probabilities of new attacks [Barbara, Wu, SIAM 2001]
• Generate artificial anomalies (intrusions) and then use RIPPER to learn intrusions [Fan et al, ICDM 2001]
• Detect intrusions by computing changes in estimated probability distributions [Eskin, ICML 2000]
• Clustering based approaches [Portnoy et al, 2001]
SNN Clustering on KDDCup’99 Data
• SNN clustering is suited for finding clusters of varying sizes, shapes, and densities in the presence of noise
• Data set
  - 10,000 examples were sampled from neptune, smurf, and normal, from both training and test data
  - Other sub-classes remain intact
  - Total number of instances: 97,000
  - Applied shared nearest neighbor based clustering and k-means clustering
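The shared nearest neighbor (SNN) similarity that this clustering builds on can be sketched as follows; this is a brute-force toy illustration on synthetic 2-D points, not the authors’ implementation:

```python
# SNN similarity: two points are similar in proportion to how many of
# their k nearest neighbors they share.
import numpy as np

def knn_lists(X, k):
    # brute-force k nearest neighbors (excluding the point itself)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k]

def snn_similarity(X, k):
    nn = knn_lists(X, k)
    n = len(X)
    S = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            S[i, j] = S[j, i] = len(set(nn[i]) & set(nn[j]))
    return S

# Two well-separated groups of four points each
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
S = snn_similarity(X, k=3)
print(S[0, 1], S[0, 4])  # 2 0 -- same-group pairs share neighbors, cross-group pairs share none
```

Because the similarity is defined by neighbor overlap rather than raw distance, it stays meaningful when cluster densities differ, which is the property the slide appeals to.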
Clustering Results
• SNN clusters of pure new attack types are found

Cluster name         Size  Same category  Wrong category
apache2 (dos)        211   0              0
apache2 (dos)        183   4              0
mscan (probe)        142   0              0
mscan (probe)        118   0              0
xterm + ps (u2r)     117   57             24 (r2l), 36 (normal)
snmpgetattack (r2l)  69    0              34 (normal)
snmpgetattack (r2l)  131   0              0
mscan (probe)        104   0              0
processtable (dos)   146   0              0
processtable (dos)   87    1              1 (dos), 3 (r2l)
Clustering Results

All k-means clusters:
        total  correct  incorrect  impurity
normal  18183  17458    725        3.99%
u2r     267    15       252        94.38%
dos     17408  17035    373        2.14%
r2l     3894   3000     894        22.96%
probe   4672   4293     379        8.11%

SNN clustering performance:
        total  correct  incorrect  missing  impurity
normal  18183  12708    327        5148     2.51%
u2r     267    101      67         99       39.88%
dos     17408  13537    53         3818     0.39%
r2l     3894   2654     257        983      8.83%
probe   4672   3431     217        1024     5.95%

Tightest k-means clusters:
        total  correct  incorrect  missing  impurity
normal  18183  9472     520        8191     5.20%
u2r     267    0        113        154      100.00%
dos     17408  16221    186        1001     1.13%
r2l     3894   2569     471        854      15.49%
probe   4672   3610     302        760      7.72%
Nearest Neighbor (NN) based Outlier Detection
• For each point in the training set, calculate the distance to the closest point
• Build a histogram of these distances
• Choose a threshold such that a small percentage (e.g. 2%) of the training set is classified as outliers
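The steps above can be sketched as follows; the 2-D Gaussian training sample is a stand-in for normal connection records:

```python
# NN-based outlier detection: score = distance to the closest training
# point; threshold chosen so ~2% of the training data scores as "outlier".
import numpy as np

def nn_distances(X):
    # pairwise distances; a point's own zero distance is masked out
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    return d.min(axis=1)

def fit_threshold(X_train, outlier_frac=0.02):
    # the 98th percentile of the NN-distance histogram
    return np.quantile(nn_distances(X_train), 1.0 - outlier_frac)

def is_anomaly(x, X_train, threshold):
    return np.linalg.norm(X_train - x, axis=1).min() > threshold

rng = np.random.default_rng(0)
X_train = rng.normal(size=(300, 2))  # stand-in for normal connections
t = fit_threshold(X_train)
print(is_anomaly(np.array([8.0, 8.0]), X_train, t))  # far from all data -> True
print(is_anomaly(X_train[0] + 0.01, X_train, t))     # near training data -> False
```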
Anomaly Detection using NN Scheme
(rows: actual, columns: predicted)

         normal  anomaly
normal   12046   343
anomaly  2378    12373

detection rate: 83.88%
false alarm rate: 2.77%
(rows: actual, columns: predicted)

               Normal  Correct attack group  Incorrect attack group  Anomaly  Total
Normal         12040   0                     176                     173      12389
Known attacks  1119    7581                  225                     1814     10739
Anomaly        781     347                   139                     2755     4022

Detection rate for novel attacks = 68.50%
False positive rate for normal connections = 2.82%
Novel Attack Detection Using NN Scheme
1-NN on known attack types:
        Anomaly  dos   u2r  r2l  probe  normal
dos     325      6756  0    0    0      22
u2r     26       0     1    0    4      8
r2l     1075     1     84   10   135    1023
probe   1269     388   1    0    0      814

1-NN on anomalies:
        Anomaly  dos   u2r  r2l  probe  normal  Total  Detection rate
dos     1625     0     0    0    0      223     1848   87.93%
u2r     66       0     0    1    98     11      176    37.50%
r2l     585      40    1    4    1      0       631    92.71%
probe   1413     1024  35   0    0      346     2818   50.14%
Conclusions
• Predictive models specifically designed for rare classes can help improve the detection of small attack types
• The SNN clustering based approach shows promise in identifying novel attack types
• Simple nearest neighbor based approaches appear capable of detecting anomalies
KDDCup’99 Data Set
• KDDCup’99 contains derived high-level features
• 3 groups of features
  - basic features of individual TCP connections (duration, protocol type, service, src & dest bytes, ...)
  - content features within a connection suggested by domain knowledge (e.g. # of failed login attempts)
  - time-based traffic features of the connection records
    • “same host” features examine only the connections that have the same destination host as the current connection
    • “same service” features examine only the connections that have the same service as the current connection
1-NN on Anomalies:
                     total  outlier  dos  u2r  r2l  probe  normal
apache2 (dos)        794    731      0    0    0    0      63
mailbomb (dos)       308    149      0    0    0    0      159
processtable (dos)   744    744      0    0    0    0      0
udpstorm (dos)       2      1        0    0    0    0      1
httptunnel (u2r)     145    44       0    0    0    97     4
ps (u2r)             16     9        0    0    1    1      5
worm (u2r)           2      0        0    0    0    0      2
xterm (u2r)          13     13       0    0    0    0      0
named (r2l)          17     10       0    1    1    0      5
sendmail (r2l)       15     11       0    0    0    0      4
snmpgetattack (r2l)  179    7        0    1    0    0      171
snmpguess (r2l)      359    2        1    0    0    0      356
sqlattack (r2l)      2      2        0    0    0    0      0
xlock (r2l)          9      5        0    1    0    0      3
xsnoop (r2l)         4      3        0    1    0    0      0
mscan (probe)        1049   1011     35   0    0    3      0
saint (probe)        364    13       0    0    0    343    8
1-NN on Known Attacks:
                        total  outlier  dos   u2r  r2l  probe  normal
back (dos)              386    12       362   0    0    0      12
land (dos)              9      4        5     0    0    0      0
neptune (dos)           5715   255      5460  0    0    0      0
pod (dos)               45     13       29    0    0    0      3
smurf (dos)             936    36       893   0    0    0      7
teardrop (dos)          12     5        7     0    0    0      0
buffer_overflow (u2r)   22     17       0     1    0    4      0
loadmodule (u2r)        2      2        0     0    0    0      0
perl (u2r)              2      2        0     0    0    0      0
rootkit (u2r)           13     5        0     0    0    0      8
ftp_write (r2l)         3      2        0     0    0    0      1
guess_passwd (r2l)      1302   441      0     66   0    0      795
imap (r2l)              1      0        0     0    0    1      0
multihop (r2l)          18     7        0     0    0    2      9
phf (r2l)               2      1        0     0    1    0      0
warezmaster (r2l)       1002   624      1     18   9    132    218
ipsweep (probe)         155    0        1     0    0    150    4
nmap (probe)            80     13       0     0    0    67     0
portsweep (probe)       174    57       0     0    0    117    0
satan (probe)           860    318      0     0    0    480    62