(IJCSIS) International Journal of Computer Science and Information Security,
Vol. 8, No. 8, November 2010
Efficient Probabilistic Classification Methods for NIDS
S. M. Aqil Burney, Department of Computer Science, University of Karachi, Karachi, Pakistan
M. Sadiq Ali Khan, Department of Computer Science, University of Karachi, Karachi, Pakistan
Jawed Naseem, Principal Scientific Officer, PARC
Abstract: As technology improves, attackers try to gain access to network system resources by many means, and open loopholes in the network allow them to penetrate it more easily. Various approaches have been tried for classifying attacks. In this paper we compare two methods, Naïve Bayes and the Junction Tree Algorithm, on a reduced set of features, improving performance compared with the full data set. PCA is used for feature reduction, which helped in proposing a new method for efficient classification. We propose a Bayesian network-based model with a reduced set of features for intrusion detection. Our proposed method produces a lower false positive rate, which increases detection efficiency by reducing the workload and thereby increases the overall performance of an IDS. We also investigate whether conditional independence really affects the detection of attacks/threats.
Keywords-Network Intrusion Detection System (NIDS); Bayesian Networks; Junction Tree Algorithm
I. INTRODUCTION
Network security, whether in a commercial organization or in a critically important research network, is a major concern; with the increasing use of the web, even personal information is under threat. An efficient network intrusion detection system is the primary solution to such threats [4].
An IDS is a network monitoring system used to control, prevent, and secure networks against cyber terrorists; more generally, intrusion detection is the process of examining the events occurring in a network or computer system and detecting signs of incidents that threaten computer security policies. The IDS monitors the network system for any rule violation. When such a violation occurs, an efficient IDS generates a notification in the form of an alarm that alerts the administrator to take measures against the corresponding vulnerability. Common intrusion attacks are classified based on various features/parameters. The KDD-99 data set is commonly used for investigating the nature of attacks; it lists 41 features. The information value of these features and the interdependence among them are of interest for investigation. How far can the features be reduced without reducing the efficiency of the classification algorithm, and does interdependency really contribute to detection efficiency? In this paper we try to answer such questions. PCA is an effective data dimension reduction technique. Naïve Bayes classifiers and Bayesian networks both use a probabilistic approach to determine the probability of an attack. Naïve Bayes classifiers assume conditional independence among attributes, while a Bayesian network models the conditional dependencies among them. Comparing the two methods therefore shows whether conditional independence or interdependency really contributes to the probability of attack. In the next section we discuss related work that has already been proposed; in Section 3 we discuss the two classification methods; in Section 4 the methodology is described; and finally, in Section 5 the results and discussion are presented.
II. BACKGROUND
Most network-based systems become targets for hackers, so building an efficient IDS is now a main task [4]. Intrusion detection systems need a component that generates alerts on the basis of a rule set, and to detect malicious activity correctly it is necessary to manage these alerts correctly [1]. Researchers apply data mining approaches for attack detection in their intrusion detection systems [2]. Probabilistic approaches for reducing the false alarm rate have also been proposed; see, for example, [3]. An enormous amount of network traffic data accumulates each day, and a number of data mining approaches are used to collect domain knowledge for intrusion detection, including clustering, association rules, and classification [12]. Data mining techniques support data analysis, and data mining has now become one of the important components of intrusion detection systems. The main purpose of using data mining techniques in an attack detection system is to differentiate normal packets from abnormal ones. Applying data mining to intrusion detection requires a data set and a classification model. The classification model may be a Bayesian network, a neural network, a rule-based or decision-tree-based model, or another soft computing technique such as Support Vector Machines (SVM) [10,11]. Intrusion detection systems have now become a necessity for organizational security, and their credibility may depend on the data mining techniques they use.
2.1 Clustering
The process of labeling data and arranging it into groups is called clustering. Grouping can improve the performance of the classifiers applied afterwards. A genuine cluster contains data corresponding to a single category [5]. The data belonging to a cluster are modeled with respect to their existing features. Clustering can also be defined as an unsupervised machine learning mechanism for pattern matching in unlabeled data with numerous aspects.
168 http://sites.google.com/site/ijcsis/ ISSN 1947-5500
2.2 Classification
In classification, the data set is divided into different classes; it is much less exploratory than clustering. Classification is used to assign data to the classes normal/not normal and to subclassify abnormal data into different types. In this research, Naïve Bayes is used as the classification algorithm for intrusion detection. Because of the huge amount of traffic data that must be collected, classification is less widely used [6].
III. CLASSIFICATION METHODS
3.1 Naïve Bayes Classifier
The Naïve Bayes classifier is an effective technique for classifying data and is particularly useful for high-dimensional data. Naïve Bayes is a simplified application of Bayes' theorem that presupposes independence among the data attributes [7]. Even though Naïve Bayes assumes independence, its performance is efficient and on par with techniques that model dependence among attributes. A Naïve Bayes classifier can handle continuous or categorical data. Let $X = \{x_1, x_2, \ldots, x_n\}$ be a set of given variables with possible outcomes (class labels) $O_1, O_2, \ldots, O_m$. The posterior probability of the dependent variable is obtained by Bayes' rule:
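$$P(O_j \mid x_1, x_2, \ldots, x_n) = \frac{P(x_1, x_2, \ldots, x_n \mid O_j)\, P(O_j)}{P(x_1, x_2, \ldots, x_n)} \propto P(O_j) \prod_{i=1}^{n} P(x_i \mid O_j)$$
where the last step uses the conditional independence assumption. A new case $X$ is assigned the class label $O_j$ with the highest posterior probability:
$$\hat{O} = \arg\max_{j} \; P(O_j) \prod_{i=1}^{n} P(x_i \mid O_j)$$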
The efficiency of the Naïve Bayes classifier lies in the fact that it converts a multidimensional density estimation problem into a set of one-dimensional density estimations. The evidence term $P(x_1, x_2, \ldots, x_n)$ is the same for every class, so it does not affect which class has the highest posterior probability, and the classification task is generally efficient. This study confirms this when the Naïve Bayes classifier is compared with the Junction Tree Algorithm. Several distributions, including the normal, gamma, or Poisson density functions, can be employed for modeling the Naïve Bayes classifier.
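A minimal sketch of the decision rule above, assuming continuous features with normal (Gaussian) class-conditional densities, one of the options mentioned in the text. The functions `fit_gaussian_nb` and `predict` and the toy data are hypothetical; this is not the BN classifier software used in the study. Log-probabilities are summed instead of multiplying densities, and the evidence term is omitted because it does not change the arg max.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, eps=1e-9):
    """Estimate priors P(O_j) and a normal density (mean, variance) per feature and class."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)
    model = {}
    for label, rows in rows_by_class.items():
        prior = len(rows) / len(X)
        stats = []
        for column in zip(*rows):
            mean = sum(column) / len(column)
            # eps keeps the variance positive when a feature is constant within a class
            var = sum((v - mean) ** 2 for v in column) / len(column) + eps
            stats.append((mean, var))
        model[label] = (prior, stats)
    return model


def log_normal_pdf(x, mean, var):
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def predict(model, x):
    """Return the class O_j maximizing log P(O_j) + sum_i log P(x_i | O_j)."""
    best_label, best_score = None, -math.inf
    for label, (prior, stats) in model.items():
        score = math.log(prior) + sum(
            log_normal_pdf(xi, mean, var) for xi, (mean, var) in zip(x, stats)
        )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy, hypothetical data: two scaled features per record.
X = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.25], [0.90, 0.80], [0.85, 0.95], [0.80, 0.90]]
y = ["normal", "normal", "normal", "attack", "attack", "attack"]
model = fit_gaussian_nb(X, y)
print(predict(model, [0.12, 0.18]))  # normal
print(predict(model, [0.88, 0.85]))  # attack
```

Swapping `log_normal_pdf` for a gamma or Poisson log-density would give the other variants named in the text without changing `predict`.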
3.2 Junction Tree Algorithm
The Junction Tree Algorithm is a graphical method of belief updating, or probabilistic reasoning. For probabilistic reasoning we use Bayesian Networks and Decision Graphs (BNDG), details of which can be found in [9]. The basic concept in a junction tree is the clustering of predicted attributes [8]. In belief updating, instead of working with the joint probability distribution of all targeted variables, clusters of attributes (cliques) are formed and the potentials of these clusters are used to compute the probabilities. A junction tree is therefore a graphical representation of potential cluster nodes, or cliques, together with a suitable algorithm for updating these potentials. The junction tree algorithm involves several steps: moralizing the graph, triangulation, junction tree formation, assigning probabilities to cliques, message passing, and reading the cliques' marginal potentials from the junction tree.
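Using the junction tree algorithm requires that the directed graph be converted into an undirected graph. This process is called moralization; it involves adding edges between the parents of each node and dropping the direction of the edges. Let $\vec{G} = (N_G, \vec{E}_G)$ be a directed graph that is to be converted into the undirected graph $G = (N_G, E_G)$. In fact, two new edge sets are required to obtain $E_G$, namely the set of directed edges with their direction dropped,
$$E_U = \{\{u, v\} : (u, v) \in \vec{E}_G\},$$
and the set of edges joining ("marrying") every pair of parents of a common child,
$$E_M = \{\{u, w\} : u, w \in \mathrm{pa}(v),\ u \neq w,\ v \in N_G\},$$
where $\mathrm{pa}(v)$ denotes the parents of $v$. Moralization yields $E_G = E_U \cup E_M$, and the new undirected moralized graph is given as
$$G = (N_G,\ E_U \cup E_M)$$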
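A minimal sketch of moralization, assuming the DAG is stored as a hypothetical `{node: [parents]}` dictionary. It builds the two edge sets described above, the directed edges with their direction dropped (`e_u` here) and the edges between co-parents (`e_m`), and returns their union. The example graph is made up for illustration and is not the network of Figure 2.

```python
from itertools import combinations


def moralize(parents):
    """Moralize a DAG given as {node: [parent, ...]}.

    Returns the undirected edge set e_u | e_m, each edge a frozenset {u, v}.
    """
    e_u = set()  # every directed edge with its direction dropped
    e_m = set()  # "marriage" edges between every pair of co-parents
    for child, pas in parents.items():
        for p in pas:
            e_u.add(frozenset((p, child)))
        for a, b in combinations(pas, 2):
            e_m.add(frozenset((a, b)))
    return e_u | e_m


# Hypothetical DAG: A -> C <- B, C -> D.
dag = {"A": [], "B": [], "C": ["A", "B"], "D": ["C"]}
edges = moralize(dag)
print(sorted(tuple(sorted(e)) for e in edges))
# [('A', 'B'), ('A', 'C'), ('B', 'C'), ('C', 'D')]
```

The only edge that did not exist in the DAG is A-B, added because A and B share the child C.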
After moralization, the junction tree is formed; it is essentially a hypergraph of cliques. If the cliques of the undirected graph G are given by C(G), then the junction tree is a tree over C(G) with the property that the intersection of any two nodes is contained in every node on the unique path joining them (the running intersection property).
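The defining property of a junction tree, that the intersection of any two cliques is contained in every clique on the path between them, can be checked directly. This is a minimal sketch with hypothetical helper names (`tree_path`, `has_junction_tree_property`); the cliques are made up for illustration, and the tree is given as index pairs into the clique list.

```python
def tree_path(adjacency, src, dst):
    """Return the unique path between two nodes of a tree (depth-first search)."""
    stack = [(src, [src])]
    seen = {src}
    while stack:
        node, path = stack.pop()
        if node == dst:
            return path
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append((nxt, path + [nxt]))
    raise ValueError("nodes are not connected")


def has_junction_tree_property(cliques, tree_edges):
    """True if, for every pair of cliques, their intersection is contained in
    every clique on the unique path joining them."""
    adjacency = {i: set() for i in range(len(cliques))}
    for a, b in tree_edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    for i in range(len(cliques)):
        for j in range(i + 1, len(cliques)):
            common = cliques[i] & cliques[j]
            if not all(common <= cliques[k] for k in tree_path(adjacency, i, j)):
                return False
    return True


# Hypothetical cliques of a triangulated graph.
cliques = [{"A", "B", "C"}, {"B", "C", "D"}, {"C", "D", "E"}]
print(has_junction_tree_property(cliques, [(0, 1), (1, 2)]))  # True
print(has_junction_tree_property(cliques, [(0, 2), (2, 1)]))  # False
```

In the second tree the path from {A, B, C} to {B, C, D} passes through {C, D, E}, which does not contain B, so the property fails.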
Consider a cluster representation with two neighboring clusters U and V that share a set of variables S in common.
The aim of the JTA is to modify the potentials in such a way that the distribution P(V) is obtained from the modified potential Ψ(V). In that case, the probability of S can be given as
$$P(S) = \sum_{V \setminus S} \Psi(V)$$
Similarly,
$$P(S) = \sum_{U \setminus S} \Psi(U)$$
Let Ψ(S) represent the modified separator potential, so that Ψ(S) = P(S). If the potential Ψ(V) is now changed as a result of new evidence, the potentials Ψ(S) and Ψ(U) can both be updated so as to restore the equivalence
$$\sum_{U \setminus S} \Psi(U) = \Psi(S) = P(S) = \sum_{V \setminus S} \Psi(V)$$
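The marginalization above can be checked numerically. This is a minimal NumPy sketch, assuming three hypothetical binary variables A, S, B with U = {A, S} and V = {S, B}; the joint table is made up for illustration and is not derived from the KDD'99 data. When both cluster potentials are marginals of the same joint distribution, summing either one over its non-separator variable gives the same P(S).

```python
import numpy as np

# Hypothetical joint P(A, S, B) over three binary variables, indexed p[a, s, b].
p = np.array([[[0.10, 0.05], [0.05, 0.10]],
              [[0.20, 0.10], [0.15, 0.25]]])
assert np.isclose(p.sum(), 1.0)

# Cluster potentials for U = {A, S} and V = {S, B}, set to their marginals.
psi_u = p.sum(axis=2)  # Ψ(U) = P(A, S), indexed [a, s]
psi_v = p.sum(axis=0)  # Ψ(V) = P(S, B), indexed [s, b]

# P(S) from either cluster by summing out the variables not in the separator.
p_s_from_u = psi_u.sum(axis=0)  # sum over U \ S = {A}
p_s_from_v = psi_v.sum(axis=1)  # sum over V \ S = {B}
print(p_s_from_u, p_s_from_v)   # both approximately [0.45, 0.55]
```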
Belief updating in a junction tree is carried out through message passing. Let V and W be two adjacent nodes with separator S; the task is for W to absorb from V through S by updating the potentials Ψ(W) and Ψ(S) so that the condition
$$\sum_{W \setminus S} \Psi^*(W) = \Psi^*(S) = \sum_{V \setminus S} \Psi^*(V)$$
holds. In absorption, Ψ(S) and Ψ(W) are replaced as follows:
$$\Psi^*(S) = \sum_{V \setminus S} \Psi(V), \qquad \Psi^*(W) = \Psi(W)\,\frac{\Psi^*(S)}{\Psi(S)}$$
In this way, the beliefs of the whole network are updated through message passing.
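The absorption step described above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming W = {A, S} and V = {S, B} with binary variables and potentials stored as arrays in a fixed axis order; the function name `absorb` and all numbers are hypothetical. The second call shows how new evidence entered in Ψ(V) is propagated to Ψ(S) and Ψ(W).

```python
import numpy as np


def absorb(psi_w, psi_s, psi_v):
    """Let W absorb from V through separator S.

    In this sketch psi_w is indexed [a, s] (W = {A, S}), psi_s is indexed [s],
    and psi_v is indexed [s, b] (V = {S, B}). Returns (Ψ*(W), Ψ*(S)).
    """
    new_s = psi_v.sum(axis=1)  # Ψ*(S) = Σ_{V\S} Ψ(V)
    # Ψ*(S) / Ψ(S), using the usual convention 0 / 0 = 0
    ratio = np.divide(new_s, psi_s, out=np.zeros_like(new_s), where=psi_s != 0)
    new_w = psi_w * ratio[np.newaxis, :]  # Ψ*(W) = Ψ(W) Ψ*(S) / Ψ(S)
    return new_w, new_s


# Hypothetical initial potentials: Ψ(W) = P(A | S), Ψ(S) = 1, Ψ(V) = P(S, B).
psi_w = np.array([[0.3, 0.6], [0.7, 0.4]])
psi_s = np.ones(2)
psi_v = np.array([[0.2, 0.3], [0.1, 0.4]])

w1, s1 = absorb(psi_w, psi_s, psi_v)
print(w1.sum(axis=0), s1)  # both [0.5, 0.5]: W and V now agree on S

# New evidence B = 1 zeroes the B = 0 entries of Ψ(V); absorb again.
psi_v_evidence = psi_v * np.array([0.0, 1.0])
w2, s2 = absorb(w1, s1, psi_v_evidence)
print(w2.sum(axis=0), s2)  # both [0.3, 0.4], i.e. unnormalized P(S, B = 1)
```

Normalizing the updated potentials (dividing by their total, here 0.7) gives the posterior beliefs given the evidence.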
IV. METHODOLOGY
The KDD’99 intrusion detection data set was used. The PCA technique was applied and 14 features were selected on the basis of the analysis. The selection of data sets for training and testing plays a vital role in prediction accuracy. In intrusion detection, some attacks are far more frequent than others. To ensure that all attack types were included in learning, stratified random samples were drawn in proportion to each attack type; this produces better results than simple random sampling. For Naïve Bayes classification, two data sets (stratified samples of equal size, 10,000) were used for learning and testing with the BN classifier software. For the junction tree algorithm, structure learning was carried out on a random sample of 5,000 records drawn from the KDD data set using Netica. Then five data sets, each of size 1,000, were selected through simple random sampling; data set 1 was used for learning and drawing the junction tree, and data sets 2 to 5 were used for testing the belief updating learned by the junction tree.
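Proportional stratified sampling, as described above, can be sketched as follows. This is a minimal illustration with a hypothetical function `stratified_sample` and a made-up population; the rule of keeping at least one record per class is an assumption added here so that rare attack types (such as imap or multihop in Table 1) are not dropped, since the text only says samples were drawn in proportion to each attack type.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(records, label_of, n, seed=0):
    """Draw roughly n records, allocating each class a share proportional to its
    frequency, with at least one record per class so rare attacks are kept."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[label_of(record)].append(record)
    sample = []
    for label, rows in strata.items():
        k = max(1, round(n * len(rows) / len(records)))
        sample.extend(rng.sample(rows, min(k, len(rows))))
    rng.shuffle(sample)
    return sample


# Hypothetical population of 1,000 labelled records.
population = ([("normal", i) for i in range(600)] + [("neptune", i) for i in range(380)]
              + [("nmap", i) for i in range(12)] + [("imap", i) for i in range(8)])
sample = stratified_sample(population, label_of=lambda r: r[0], n=100)
print(Counter(label for label, _ in sample))
# expected counts: normal 60, neptune 38, nmap 1, imap 1
```

Because of rounding and the one-per-class minimum, the sample size can differ slightly from `n` for other class mixes.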
V. RESULTS & DISCUSSION
The 41 features of the KDD’99 data set were reduced to 14 features. The PCA identified 12 major components with eigenvalues greater than …; around 80% of the variability of the data is explained by these features, while 98% of the variability can be explained by 24 components.
The difference in explained variability between selecting 24 and 14 features is only 18%, but the computational cost increases greatly if 24 parameters are selected, so 14 features were selected to optimize processing speed. The scree plot in Figure 1 shows that the first 24 components represent 98.866% of the variability in the data and the first 14 components explain 80%, which is quite sufficient; the work was therefore carried out on these components only, neglecting the remaining components, which appear less informative. In addition, structure learning also supports the selection of 14 features. The Bayesian network model shown in Figure 2 represents the interdependence among the various attributes. It shows that mainly two factors, count and src_byte, are affected by the other features, and these two in turn ultimately affect the attack type. The KDD’99 classification list contains 18 classes, of which normal and neptune are the most frequent.
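The component-selection step can be sketched as choosing the smallest number of components whose cumulative explained variance reaches a threshold (80% and 98% are the levels discussed in the text). This is a minimal NumPy sketch, assuming PCA on standardized features (the correlation matrix), which the paper does not specify; the function `components_for_variance` and the synthetic data are hypothetical and do not reproduce the KDD’99 results.

```python
import numpy as np


def components_for_variance(X, threshold):
    """Smallest number of principal components whose eigenvalues explain at
    least `threshold` of the total variance, plus the cumulative ratios."""
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0                      # leave constant features unscaled
    Z = (X - X.mean(axis=0)) / std           # PCA on standardized features
    eigvals = np.linalg.eigvalsh(np.cov(Z, rowvar=False))[::-1]  # descending
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(eigvals)), cumulative


# Hypothetical data: 10 features driven by 3 latent factors plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3))
X = latent @ rng.normal(size=(3, 10)) + 0.05 * rng.normal(size=(500, 10))

k80, cumulative = components_for_variance(X, 0.80)
k98, _ = components_for_variance(X, 0.98)
print(k80, k98, np.round(cumulative[:4], 3))
```

Plotting the eigenvalues in descending order gives a scree plot of the kind shown in Figure 1.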
Figure 1: Scree Plot of attributes.
[Figure: neighboring clusters U and V with potentials Ψ(U) and Ψ(V), joined through separator S with potential Ψ(S)]
Figure 2: Bayesian network model for the intrusion detection system
BN classification also supports the importance of these two classes, normal (0.527) and neptune (0.399), as shown in Table 2. The probabilities of the classes buffer_overflow, imap, and multihop are less than 0.001, and those of ftp_write, guess_passwd, and loadmodule are close to 0. This suggests that these classes can be merged.
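The merge suggested above can be sketched by pooling every class whose probability falls below a threshold into a single class. The threshold of 0.001 follows the text; the function `merge_rare_classes` and the pooled class name `"other"` are hypothetical. The input is the junction tree column of Table 2.

```python
def merge_rare_classes(probs, threshold=0.001, other="other"):
    """Pool all classes with probability below `threshold` into one class."""
    merged = {}
    for label, p in probs.items():
        key = label if p >= threshold else other
        merged[key] = merged.get(key, 0.0) + p
    return merged


# Junction tree column of Table 2 (average probability of attack).
jt = {"back": 0.0102, "buffer_overflow": 0.0008, "imap": 0.0006,
      "ipsweep": 0.0368, "multihop": 0.0002, "neptune": 0.3992,
      "nmap": 0.0176, "normal": 0.527}
merged = merge_rare_classes(jt)
print(merged)  # buffer_overflow, imap and multihop pooled into "other" (about 0.0016)
```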
Figure 3: Prediction accuracy using BN classifier (number of records; normal: actual 4271, predicted 4287; attack: actual 3729, predicted 3713)
Figure 4 shows the predictions for the major attack categories: 99.86% of DoS attacks are detected, while about 74% of probe attacks are detected.
Figure 4: Prediction accuracy of major attacks (number of records; DoS: actual 2872, predicted 2868; Probe: actual 726, predicted 538; R2L: actual 7, predicted 2; U2R: actual 3, predicted 1)
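The detection percentages quoted for Figure 4 can be reproduced as predicted / actual counts per category. This is a minimal sketch with a hypothetical function `detection_rates`; treating the predicted count as the number of detected records is an assumption, since Figure 4 reports category totals rather than a confusion matrix.

```python
def detection_rates(counts):
    """Percentage of each category's actual records that were predicted,
    computed as 100 * predicted / actual."""
    return {cat: 100.0 * pred / actual for cat, (actual, pred) in counts.items()}


# (actual, predicted) per major attack category, read from Figure 4.
figure4 = {"DoS": (2872, 2868), "Probe": (726, 538), "R2L": (7, 2), "U2R": (3, 1)}
rates = detection_rates(figure4)
for category, rate in rates.items():
    print(f"{category}: {rate:.2f}%")
# DoS: 99.86%, Probe: 74.10%, R2L: 28.57%, U2R: 33.33%
```

The rare categories R2L and U2R have very few records, so their percentages are unstable.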
The BN classifier learned the more frequent attacks more effectively. In identifying normal traffic it showed an error rate of only 0.8%, and for the most frequent attack, neptune, the error rate is 6.8% (Table 1).
TABLE 1. ACCURACY OF CLASSIFICATION (BAYESIAN CLASSIFIER)
Class Actual Predicted Diff Error %
back 62 62 0 0
buffer_overflow 2 0 2 100
guess_passwd 3 0 3 100
imap 2 0 2 100
ipsweep 225 284 -59 -26.2
multihop 1 0 1 100
neptune 2630 2587 43 1.6
nmap 96 35 61 63.5
normal 4271 4287 -16 -0.37
phf 1 0 1 100
pod 12 0 12 100
portsweep 186 219 -33 -17.7
rootkit 1 0 1 100
satan 219 273 -54 -24.6
smurf 168 180 -12 -7.1
teardrop 60 39 21 35
warezclient 57 34 23 40.35
warezmaster 4 0 4 100
Total 8000 8000
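The Diff and Error % columns of Table 1 follow Diff = Actual - Predicted and Error % = 100 × Diff / Actual. A short check over the rows of the table; note that this measures the difference in class counts, so misclassifications that cancel out between classes are not visible in it. The function name `error_percent` is hypothetical.

```python
# (class, actual, predicted) rows of Table 1.
TABLE1 = [("back", 62, 62), ("buffer_overflow", 2, 0), ("guess_passwd", 3, 0),
          ("imap", 2, 0), ("ipsweep", 225, 284), ("multihop", 1, 0),
          ("neptune", 2630, 2587), ("nmap", 96, 35), ("normal", 4271, 4287),
          ("phf", 1, 0), ("pod", 12, 0), ("portsweep", 186, 219), ("rootkit", 1, 0),
          ("satan", 219, 273), ("smurf", 168, 180), ("teardrop", 60, 39),
          ("warezclient", 57, 34), ("warezmaster", 4, 0)]


def error_percent(actual, predicted):
    """Signed error as in Table 1: 100 * (actual - predicted) / actual."""
    return 100.0 * (actual - predicted) / actual


for name, actual, predicted in TABLE1:
    print(f"{name:16s} {actual - predicted:5d} {error_percent(actual, predicted):8.2f}")

print("totals:", sum(r[1] for r in TABLE1), sum(r[2] for r in TABLE1))  # 8000 8000
```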
TABLE 2. PROBABILITY OF ATTACK (AVERAGE)
Class Junction Tree Naïve Bayes Classifier Diff
back 0.0102 0.0086 0.0016
buffer_overflow 0.0008 0.001 -0.0002
imap 0.0006 0.0005 0.0001
ipsweep 0.0368 0.0368 0
multihop 0.0002 0 0.0002
neptune 0.3992 0.3936 0.0056
nmap 0.0176 0.0147 0.0029
normal 0.527 0.5432 -0.0162
Total 1 1
Using the junction tree algorithm, the accuracy of identification is up to 98%. The junction tree also identified neptune as the most frequent attack. The probabilities of the various attacks identified are given in Table 2; the probability estimates of the two methods are almost equal. A statistical comparison showed no significant difference between the two methods. The frequencies of the remaining attacks are very small, and their probabilities are almost zero.
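The paper does not state which statistical test was used for this comparison. As an illustration only, a paired t-test over the eight classes listed in Table 2 can be computed with the Python standard library; the choice of test, and treating the classes as independent pairs, are assumptions of this sketch, and `paired_t` is a hypothetical helper.

```python
import math
import statistics

# Average attack probabilities from Table 2: (junction tree, naive Bayes).
TABLE2 = {"back": (0.0102, 0.0086), "buffer_overflow": (0.0008, 0.001),
          "imap": (0.0006, 0.0005), "ipsweep": (0.0368, 0.0368),
          "multihop": (0.0002, 0.0), "neptune": (0.3992, 0.3936),
          "nmap": (0.0176, 0.0147), "normal": (0.527, 0.5432)}


def paired_t(pairs):
    """Paired t statistic and degrees of freedom for the differences a - b."""
    diffs = [a - b for a, b in pairs]
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    return mean / (sd / math.sqrt(len(diffs))), len(diffs) - 1


t, df = paired_t(TABLE2.values())
T_CRIT = 2.365  # two-sided 5% critical value of Student's t for df = 7 only
print(f"t = {t:.3f}, df = {df}, significant at 5%: {abs(t) > T_CRIT}")
# t = -0.324, df = 7, significant at 5%: False
```

Under these assumptions the result is consistent with the statement that the two methods do not differ significantly.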
Figure: Probability of attack by attack type, average junction tree (JT) vs. Naïve Bayes (attack types: back, buffer_overflow, ftp_write, guess_passwd, imap, ipsweep, land, loadmodule, multihop, neptune, nmap, normal)
VI. CONCLUSION & FUTURE RECOMMENDATIONS
Although Naïve Bayes classifiers assume conditional independence while the junction tree algorithm assumes interdependence among parameters, the Naïve Bayes and junction tree classifiers are almost equally effective. To achieve better performance, it is recommended that only the more frequent attacks be considered. We also found that appropriate sampling techniques should be used when selecting the learning and testing data sets in order to obtain better predictions.
REFERENCES
[1] Moon Sun Shin, Eun Hee Kim, and Keun Ho Ryu, "False alarm classification model for network-based IDS," LNCS 3177, pp. 259–265, Springer-Verlag Berlin Heidelberg, 2004.
[2] M. J. Lee, M. S. Shin, and H. S. Moon, "Design and implementation of alert analyzer with data mining engine," Proc. IDEAL '03, Hong Kong, 2003.
[3] A. Valdes and K. Skinner, "Probabilistic alert correlation," 4th International Symposium on Recent Advances in Intrusion Detection (RAID), pp. 54–68, 2003.
[4] S. M. Aqil Burney and M. Sadiq Ali Khan, "Network usage security policies for academic institutions," International Journal of Computer Applications, October issue, Foundation of Computer Science, 2010.
[5] Anoop Singhal and Sushil Jajodia, "Data warehousing and data mining techniques for intrusion detection systems," Distributed and Parallel Databases, vol. 20, no. 2, pp. 149–166, 2006, DOI: 10.1007/s10619-006-9496-5.
[6] Tasleem Mustafa, Ahmed Mateen, Ahsan Raza Sattar, Nauman ul Haq, and M. Yahya Saeed, "Forensic data security for intrusions," European Journal of Scientific Research, ISSN 1450-216X, vol. 39, no. 2, pp. 296–308, 2010.
[7] Karl Friston, Carlton Chu, Janaina Mourao, Oliver Hulme, Geraint Rees, Will Penny, and John Ashburner, "Bayesian decoding of brain images," NeuroImage, vol. 39, no. 1, pp. 181–205, Elsevier, January 2008.
[8] Jaydip Sen, "An agent-based intrusion detection system for local area networks," IJCNIS, vol. 2, no. 2, August 2010.
[9] F. V. Jensen and T. D. Nielsen, Bayesian Networks and Decision Graphs, Springer, Berlin Heidelberg New York, 2007.
[10] C. Cortes and V. Vapnik, "Support vector networks," Machine Learning, vol. 20, pp. 273–297, 1995.
[11] Jungtaek Seo, "An attack classification mechanism based on multiple support vector machines," ICCSA 2007, LNCS 4706, Part II, pp. 94–103, Springer-Verlag Berlin Heidelberg, 2007.
[12] Hebah H. O. Nasereddin, "Stream data mining," International Journal of Web Applications, vol. 1, no. 4, December 2009.
AUTHORS PROFILE
Dr. S. M. Aqil Burney is a Meritorious Professor and approved Supervisor in Computer Science and Statistics by the Education Commission, Govt. of Pakistan. He is also the D… and Chairman of the Computer Science Department, University of Karachi. Additionally, he is a Director of the Communication Network, University of Karachi. He is a member of various higher academic boards of different universities of Pakistan. His research interests include A… Computing, Neural Networks, Fuzzy Logic, Data Mining, Statistics, Simulation and Stochastic Modeling of M… Communication Systems and Networks, Network Security, and MIS in health services. Dr. Burney is also a referee for various journals and conference proceedings, nationally and internationally. He is a member of IEEE (USA), ACM (USA), …
M. Sadiq Ali Khan received his BS and MS degrees in Computer Engineering from SSUET in 1998 and 2003, respectively. Since 2003 he has been serving in the Computer Science Department, University of Karachi, as an Assistant Professor. He has about 12 years of teaching experience, and his research areas include Data Communication & Networks, Network Security, Cryptography issues, and Security in Wireless Networks. He is a member of CSI, PEC, and NSP.
Jawed Naseem is a Principal Scientific Officer in the Pakistan Agricultural Research Council. He holds an M.Sc. (Statistics) and an MCS from the University of Karachi and is currently pursuing … (Computer Science) at the University of Karachi. His research interests are data modeling, Information Management, Security, and Decision Support Systems, particularly in agricultural research. He has been a team member in the development of several regional (SAARC) level agricultural databases.