Data Mining for Network Intrusion Detection: Experience with KDDCup’99 Data Set
Vipin Kumar, AHPCRC, University of Minnesota
Group members: L. Ertoz, M. Joshi, A. Lazarevic, H. Ramnani, P. Tan, J. Srivastava
Introduction
• Key challenge - maintain a high detection rate while keeping the false alarm rate low
• Misuse detection
  - Two-phase learning - PNrule
  - Classification Based on Associations (CBA) approach
• Anomaly detection
  - Unsupervised (e.g. clustering) and supervised methods to detect novel attacks
DARPA 1998 - KDDCup’99 Data Set
• Modification of the DARPA 1998 data set prepared and managed by MIT Lincoln Lab
• The DARPA 1998 data includes a wide variety of intrusions simulated in a military network environment
• 9 weeks of raw TCP dump data simulating a typical U.S. Air Force LAN
  - 7 weeks for training (5 million connection records)
  - 2 weeks for testing (2 million connection records)
KDDCup’99 Data Set
• Connections are labeled as normal or attacks
• Attacks fall into 4 main categories (38 attack types)
  - DOS - denial of service
  - Probe - e.g. port scanning
  - U2R - unauthorized access to root privileges
  - R2L - unauthorized remote login to a machine
• U2R and R2L are extremely small classes
• 3 groups of features
  - Basic, content-based, and time-based features
KDDCup’99 Data Set
• Training set - ~5 million connections
• 10% training set - 494,021 connections
• Test set - 311,029 connections
• Test data has attack types that are not present in the training data => the problem is more realistic
  - Training set contains 22 attack types
  - Test data contains 17 additional new attack types that belong to one of the four main categories
Performance of Winning Strategy
(rows: actual class, columns: predicted class)

               DOS     U2R   R2L   probe  normal  Recall (%)
DOS            223226  0     0     1328   5299    97.1
U2R            0       30    10    20     168     13.2
R2L            0       8     1360  294    14527   8.4
probe          184     0     0     3471   511     83.3
normal         78      4     6     243    60262   99.5
Precision (%)  99.9    71.4  98.8  64.8   74.6
• Cost-sensitive bagged boosting (B. Pfahringer)
Simple RIPPER classification
               DOS     U2R   R2L   probe  normal  Recall (%)
DOS            223281  0     0     281    6291    97.2
U2R            1       9     6     0      212     4
R2L            4       4     1581  0      14600   9.8
probe          187     0     0     3085   894     74.1
normal         57      14    12    259    60251   97.37
Precision (%)  99.9    33.3  98.9  85.1   73.3
• RIPPER trained on 10% of data (494,021 connections)
• Test on entire test set (311,029 connections)
Simple RIPPER on modified data

               DOS    U2R    R2L    probe  normal  Recall (%)
DOS            12917  0      2      10     98      99.16
U2R            1      78     5      8      44      57.35
R2L            0      12     1506   0      464     75.98
probe          33     0      0      2207   205     93.26
normal         31     2      44     45     17278   97.37
Precision (%)  99.5   84.78  96.72  97.22  95.52
• Remove duplicates and merge the new training and test data sets
• Sample 69,980 examples from the merged data set
  - Sample from the neptune and normal subclasses; other subclasses remain intact
• Divide in equal proportions into training and test sets
• Apply the RIPPER algorithm to the new data set
Building Predictive Models in NID
• Models should handle skewed class distributions
• Accuracy is not a sufficient metric for evaluation
• Focus on both recall and precision
  - Recall (R) = TP/(TP + FN)
  - Precision (P) = TP/(TP + FP)
• F-measure = 2*R*P/(R + P)
Confusion matrix (rare class = C, large class = NC):

                    Predicted class
                    NC    C
Actual class   NC   TN    FP
               C    FN    TP
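These metrics follow directly from the confusion-matrix cells; a minimal Python sketch (the TP/FP/FN counts below are illustrative, not taken from the tables in this deck):

```python
# Recall, precision, and F-measure from confusion-matrix counts
# for the rare class C (TP, FP, FN as in the confusion matrix above).
def recall(tp, fn):
    return tp / (tp + fn)

def precision(tp, fp):
    return tp / (tp + fp)

def f_measure(tp, fp, fn):
    r, p = recall(tp, fn), precision(tp, fp)
    return 2 * r * p / (r + p)

# Illustrative counts: 30 attacks caught, 10 false alarms, 170 attacks
# missed. Overall accuracy can still look fine; recall exposes the problem.
print(recall(30, 170))        # 0.15
print(precision(30, 10))      # 0.75
print(f_measure(30, 10, 170)) # 0.25
```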
Predictive Models for Rare Classes
• Over-sampling the small class [Ling, Li, KDD 1998]
• Down-sizing the large class [Kubat, ICML 1997]
• Internally bias the discrimination process to compensate for class imbalance [Fawcett, DMKDD 1997]
• PNrule and related work [Joshi, Agarwal, Kumar, SIAM, SIGMOD 2001]
• RIPPER with stratification
• SMOTE algorithm [Chawla, JAIR 2002]
• RareBoost [Joshi, Agarwal, Kumar, ICDM 2001]
PNrule Learning
• P-phase:
  - cover most of the positive examples with high support
  - seek good recall
• N-phase:
  - remove false positives from the examples covered in the P-phase
  - N-rules give high accuracy and significant support
• Existing techniques can possibly learn erroneous small signatures for the absence of C
• PNrule can learn strong signatures for the presence of NC in the N-phase
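The two-phase decision logic can be sketched as follows; the two rules are hypothetical stand-ins, since PNrule actually induces its P- and N-rules from data:

```python
# PNrule-style prediction: P-rules cast a wide net over the rare class C
# (high recall), then N-rules carve false positives back out of that set.
def pnrule_predict(record, p_rules, n_rules):
    caught = any(rule(record) for rule in p_rules)    # P-phase: high support/recall
    rejected = any(rule(record) for rule in n_rules)  # N-phase: remove FPs
    return caught and not rejected

# Hypothetical hand-written rules over two connection features
p_rules = [lambda r: r["count"] > 100]          # bursty connections look like DOS
n_rules = [lambda r: r["service"] == "backup"]  # ...unless it is the backup service

print(pnrule_predict({"count": 500, "service": "http"}, p_rules, n_rules))    # True
print(pnrule_predict({"count": 500, "service": "backup"}, p_rules, n_rules))  # False
print(pnrule_predict({"count": 3, "service": "http"}, p_rules, n_rules))      # False
```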
RIPPER vs. PNrule Classification
Model   Attack  Recall (%)  Precision (%)  F-value
RIPPER  U2R     17.1        6.7            9.6
RIPPER  R2L     13.9        84.9           23.9
RIPPER  Probe   77.8        64.7           70.7
PNrule  U2R     18.4        56.8           27.8
PNrule  R2L     14.1        72.8           23.7
PNrule  Probe   83.8        69.2           75.9
• 5% sample from normal, smurf (DOS), neptune (DOS) from 10% of training data (494,021 connections)
• Test on entire test set (311,029 connections)
Classification Based on Associations (CBA)
• What are association patterns?
  - Frequent itemset: captures the set of “items” that frequently co-occur in a transaction database.
  - Association rule: predicts the occurrence of a set of items in a transaction given the presence of other items.
Example: {Diaper, Milk} -> Beer

  Support:    s = σ(Diaper, Milk, Beer) / (total number of transactions) = 2/5 = 0.4
  Confidence: c = σ(Diaper, Milk, Beer) / σ(Diaper, Milk) = 0.66

In general, for an association rule X -> y:

  Support:    s(X -> y) = σ(X ∪ y) / |T| = P(X, y)
  Confidence: c(X -> y) = σ(X ∪ y) / σ(X) = P(y | X)
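The same computation on a toy transaction database; the five transactions below are illustrative, chosen to reproduce the s = 2/5 = 0.4 example:

```python
# Support and confidence for the rule {Diaper, Milk} -> {Beer}.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Cola"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Cola"},
]

def sigma(itemset):
    # number of transactions containing every item in the itemset
    return sum(itemset <= t for t in transactions)

X, y = {"Diaper", "Milk"}, {"Beer"}
support = sigma(X | y) / len(transactions)  # P(X, y)
confidence = sigma(X | y) / sigma(X)        # P(y | X)
print(support)     # 0.4
print(confidence)  # 0.666...
```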
• Previous work:
  - Use association patterns to improve the overall performance of traditional classifiers
    • Integrating Classification and Association Rule Mining [Liu, Li, KDD 1998]
    • CMAR: Accurate Classification Based on Multiple Class-Association Rules [Han, ICDM 2001]
  - Associations in network intrusion detection
    • Use classification based on associations for anomaly detection and misuse detection [Lee, Stolfo, Mok 1999]
    • Look for abnormal associations [Barbara, Wu, Jajodia, 2001]
Classification Based on Associations (CBA)
Methodology
Overall data set
  -> Stratification into classes: DOS, U2R, R2L, probe, normal
  -> Frequent itemset generation per class, e.g.
       F1: {A, B, C} => dos      F1: {A, C, D} => u2r      F1: {C, K, L} => r2l
       F2: {B, D}    => dos      F2: {E, F, H} => u2r      F2: {F, G, H} => r2l
       ...                       ...                       ...
       F1: {B, F}    => probe    F1: {A, B} => normal
       F2: {B, C, H} => probe    F2: {E, G} => normal
  -> Feature selection
  -> Feed to classifier
Methodology
• Current approaches use confidence-like measures to select the best rules to be added as features into the classifiers.
  - This may work well only if each class is well represented in the data set.
• For rare-class problems, some of the high-recall itemsets could be potentially useful, as long as their precision is not too low.
• Our approach:
  - Apply a frequent itemset generation algorithm to each class.
  - Select itemsets to be added as features based on precision, recall, and F-measure.
  - Apply a classification algorithm, i.e. RIPPER, to the new data set.
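The selection step above can be sketched as ranking each class’s candidate itemsets by the F-measure of the implied rule “itemset fires => class”; the candidate itemsets and their coverage counts below are hypothetical:

```python
# Rank candidate itemsets for one class by F-measure and keep the top k.
def select_itemsets(candidates, class_size, k=2):
    # candidates: {itemset_name: (tp, fp)} -- connections covered inside /
    # outside the target class; fn is implied by class_size - tp.
    scored = []
    for name, (tp, fp) in candidates.items():
        p = tp / (tp + fp) if tp + fp else 0.0   # precision of the rule
        r = tp / class_size                      # recall over the class
        f = 2 * p * r / (p + r) if p + r else 0.0
        scored.append((f, name))
    return [name for _, name in sorted(scored, reverse=True)[:k]]

# Hypothetical u2r itemsets: a precise low-recall rule, a high-recall rule
# with tolerable precision, and a hopelessly imprecise one.
u2r_candidates = {
    "{A,C,D}": (20, 1),
    "{E,F,H}": (180, 60),
    "{C,K}":   (5, 500),
}
print(select_itemsets(u2r_candidates, class_size=200, k=2))
```

Ranking by confidence alone would favor the first rule and ignore the second even though it covers most of the rare class; F-measure keeps both.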
Experimental Results (on modified data)

Original RIPPER:
        dos    u2r  r2l   probe  normal
dos     12917  0    2     10     98
u2r     1      78   5     8      44
r2l     0      12   1506  0      464
probe   33     0    0     2207   205
normal  31     2    44    45     17278

RIPPER with high-precision rules:
        dos    u2r  r2l   probe  normal
dos     12890  0    1     11     125
u2r     1      100  6     8      21
r2l     0      9    1504  1      468
probe   32     0    0     2364   49
normal  31     2    108   54     17205

RIPPER with high-recall rules:
        dos    u2r  r2l   probe  normal
dos     12982  0    1     5      39
u2r     0      93   13    0      30
r2l     0      7    1564  2      409
probe   33     0    0     2320   92
normal  28     5    42    55     17270

RIPPER with high F-measure rules:
        dos    u2r  r2l   probe  normal
dos     12999  0    0     3      25
u2r     0      113  4     0      19
r2l     0      5    1671  2      304
probe   38     0    1     2252   154
normal  30     2    62    35     17271
Experimental Results (on modified data)

For rare classes, rules ordered according to F-measure produce the best results.

Original RIPPER:
        Precision  Recall   F-measure
dos     99.50%     99.16%   99.33%
u2r     84.78%     57.35%   68.42%
r2l     96.72%     75.98%   85.11%
probe   97.22%     90.27%   93.62%
normal  95.52%     99.30%   97.37%

RIPPER with high-precision rules:
        Precision  Recall   F-measure
dos     99.51%     98.95%   99.23%
u2r     90.09%     73.53%   80.97%
r2l     92.90%     75.88%   83.53%
probe   96.96%     96.69%   96.83%
normal  96.29%     98.88%   97.57%

RIPPER with high-recall rules:
        Precision  Recall   F-measure
dos     99.53%     99.65%   99.59%
u2r     88.57%     68.38%   77.18%
r2l     96.54%     78.91%   86.84%
probe   97.40%     94.89%   96.13%
normal  96.80%     99.25%   98.01%

RIPPER with high F-measure rules:
        Precision  Recall   F-measure
dos     99.48%     99.79%   99.63%
u2r     94.17%     83.09%   88.28%
r2l     96.14%     84.31%   89.84%
probe   98.25%     92.11%   95.08%
normal  97.18%     99.26%   98.21%
CBA Summary
• Association rules can improve the overall performance of classifiers
• The measure used to select rules for feature addition can affect classifier performance
  - The proposed F-measure based rule selection approach leads to better overall performance
Anomaly Detection – Related Work
• Detect novel intrusions using pseudo-Bayesian estimators to estimate prior and posterior probabilities of new attacks [Barbara, Wu, SIAM 2001]
• Generate artificial anomalies (intrusions) and then use RIPPER to learn intrusions [Fan et al, ICDM 2001]
• Detect intrusions by computing changes in estimated probability distributions [Eskin, ICML 2000]
• Clustering based approaches [Portnoy et al, 2001]
SNN Clustering on KDDCup’99 Data
• SNN clustering is suited for finding clusters of varying sizes, shapes, and densities in the presence of noise
• Data set
  - 10,000 examples were sampled from neptune, smurf, and normal, from both training and test data
  - Other sub-classes remain intact
  - Total number of instances: 97,000
  - Applied shared nearest neighbor based clustering and k-means clustering
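The shared nearest neighbor (SNN) similarity that this clustering builds on can be sketched as follows; this is a brute-force toy illustration on synthetic 2-D points, not the authors’ implementation:

```python
# SNN similarity: two points are similar in proportion to how many of
# their k nearest neighbors they share.
import numpy as np

def knn_lists(X, k):
    # brute-force k nearest neighbors (excluding the point itself)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k]

def snn_similarity(X, k):
    nn = knn_lists(X, k)
    n = len(X)
    S = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            S[i, j] = S[j, i] = len(set(nn[i]) & set(nn[j]))
    return S

# Two well-separated groups of four points each
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
S = snn_similarity(X, k=3)
print(S[0, 1], S[0, 4])  # 2 0 -- same-group pairs share neighbors, cross-group pairs share none
```

Because the similarity is defined by neighbor overlap rather than raw distance, it stays meaningful when cluster densities differ, which is the property the slide appeals to.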
Clustering Results
• SNN clusters of pure new attack types are found

Cluster name         Size  Same category  Wrong category
apache2 (dos)        211   0              0
apache2 (dos)        183   4              0
mscan (probe)        142   0              0
mscan (probe)        118   0              0
xterm + ps (u2r)     117   57             24 (r2l), 36 (normal)
snmpgetattack (r2l)  69    0              34 (normal)
snmpgetattack (r2l)  131   0              0
mscan (probe)        104   0              0
processtable (dos)   146   0              0
processtable (dos)   87    1              1 (dos), 3 (r2l)
Clustering Results

All k-means clusters:
        total  correct  incorrect  impurity
normal  18183  17458    725        3.99%
u2r     267    15       252        94.38%
dos     17408  17035    373        2.14%
r2l     3894   3000     894        22.96%
probe   4672   4293     379        8.11%

SNN clustering performance:
        total  correct  incorrect  missing  impurity
normal  18183  12708    327        5148     2.51%
u2r     267    101      67         99       39.88%
dos     17408  13537    53         3818     0.39%
r2l     3894   2654     257        983      8.83%
probe   4672   3431     217        1024     5.95%

Tightest k-means clusters:
        total  correct  incorrect  missing  impurity
normal  18183  9472     520        8191     5.20%
u2r     267    0        113        154      100.00%
dos     17408  16221    186        1001     1.13%
r2l     3894   2569     471        854      15.49%
probe   4672   3610     302        760      7.72%
Nearest Neighbor (NN) based Outlier Detection
• For each point in the training set, calculate the distance to the closest point
• Build a histogram of these distances
• Choose a threshold such that a small percentage (e.g. 2%) of the training set is classified as outliers
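The steps above can be sketched as follows; the 2-D Gaussian training sample is a stand-in for normal connection records:

```python
# NN-based outlier detection: score = distance to the closest training
# point; threshold chosen so ~2% of the training data scores as "outlier".
import numpy as np

def nn_distances(X):
    # pairwise distances; a point's own zero distance is masked out
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    return d.min(axis=1)

def fit_threshold(X_train, outlier_frac=0.02):
    # the 98th percentile of the NN-distance histogram
    return np.quantile(nn_distances(X_train), 1.0 - outlier_frac)

def is_anomaly(x, X_train, threshold):
    return np.linalg.norm(X_train - x, axis=1).min() > threshold

rng = np.random.default_rng(0)
X_train = rng.normal(size=(300, 2))  # stand-in for normal connections
t = fit_threshold(X_train)
print(is_anomaly(np.array([8.0, 8.0]), X_train, t))  # far from all data -> True
print(is_anomaly(X_train[0] + 0.01, X_train, t))     # near training data -> False
```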
Anomaly Detection using NN Scheme
(rows: actual, columns: predicted)

         normal  anomaly
normal   12046   343
anomaly  2378    12373

detection rate: 83.88%
false alarm rate: 2.77%
(rows: actual, columns: predicted)

               Normal  Correct attack group  Incorrect attack group  Anomaly  Total
Normal         12040   0                     176                     173      12389
Known attacks  1119    7581                  225                     1814     10739
Anomaly        781     347                   139                     2755     4022

Detection rate for novel attacks = 68.50%
False positive rate for normal connections = 2.82%
Novel Attack Detection Using NN Scheme
1-NN on known attack types:
        Anomaly  dos   u2r  r2l  probe  normal
dos     325      6756  0    0    0      22
u2r     26       0     1    0    4      8
r2l     1075     1     84   10   135    1023
probe   1269     388   1    0    0      814

1-NN on anomalies:
        Anomaly  dos   u2r  r2l  probe  normal  Total  Detection rate
dos     1625     0     0    0    0      223     1848   87.93%
u2r     66       0     0    1    98     11      176    37.50%
r2l     585      40    1    4    1      0       631    92.71%
probe   1413     1024  35   0    0      346     2818   50.14%
Conclusions
• Predictive models specifically designed for rare classes can help improve the detection of small attack types
• The SNN clustering based approach shows promise in identifying novel attack types
• Simple nearest neighbor based approaches appear capable of detecting anomalies
KDDCup’99 Data Set
• KDDCup’99 contains derived high-level features
• 3 groups of features
  - basic features of individual TCP connections (duration, protocol type, service, src & dest bytes, ...)
  - content features within a connection suggested by domain knowledge (e.g. # of failed login attempts)
  - time-based traffic features of the connection records
    • “same host” features examine only the connections that have the same destination host as the current connection
    • “same service” features examine only the connections that have the same service as the current connection
1-NN on Anomalies:
                     total  outlier  dos  u2r  r2l  probe  normal
apache2 (dos)        794    731      0    0    0    0      63
mailbomb (dos)       308    149      0    0    0    0      159
processtable (dos)   744    744      0    0    0    0      0
udpstorm (dos)       2      1        0    0    0    0      1
httptunnel (u2r)     145    44       0    0    0    97     4
ps (u2r)             16     9        0    0    1    1      5
worm (u2r)           2      0        0    0    0    0      2
xterm (u2r)          13     13       0    0    0    0      0
named (r2l)          17     10       0    1    1    0      5
sendmail (r2l)       15     11       0    0    0    0      4
snmpgetattack (r2l)  179    7        0    1    0    0      171
snmpguess (r2l)      359    2        1    0    0    0      356
sqlattack (r2l)      2      2        0    0    0    0      0
xlock (r2l)          9      5        0    1    0    0      3
xsnoop (r2l)         4      3        0    1    0    0      0
mscan (probe)        1049   1011     35   0    0    3      0
saint (probe)        364    13       0    0    0    343    8
1-NN on Known Attacks:
                        total  outlier  dos   u2r  r2l  probe  normal
back (dos)              386    12       362   0    0    0      12
land (dos)              9      4        5     0    0    0      0
neptune (dos)           5715   255      5460  0    0    0      0
pod (dos)               45     13       29    0    0    0      3
smurf (dos)             936    36       893   0    0    0      7
teardrop (dos)          12     5        7     0    0    0      0
buffer_overflow (u2r)   22     17       0     1    0    4      0
loadmodule (u2r)        2      2        0     0    0    0      0
perl (u2r)              2      2        0     0    0    0      0
rootkit (u2r)           13     5        0     0    0    0      8
ftp_write (r2l)         3      2        0     0    0    0      1
guess_passwd (r2l)      1302   441      0     66   0    0      795
imap (r2l)              1      0        0     0    0    1      0
multihop (r2l)          18     7        0     0    0    2      9
phf (r2l)               2      1        0     0    1    0      0
warezmaster (r2l)       1002   624      1     18   9    132    218
ipsweep (probe)         155    0        1     0    0    150    4
nmap (probe)            80     13       0     0    0    67     0
portsweep (probe)       174    57       0     0    0    117    0
satan (probe)           860    318      0     0    0    480    62