Transcript of powerpoint feb
Ensemble of K-Nearest Neighbour Classifiers for Intrusion Detection
Presented By
Imran Ahmed Malik
M.Tech CSE Networking Final Year
Sys ID 2014016942
Under the Guidance
of
Mrs. Amrita
Asst. Professor
SHARDA UNIVERSITY, GREATER NOIDA
Contents
• Objective
• Problem Statement
• Proposed system
• Introduction to the implemented algorithm
• Results and Graphs
• Conclusion
Objective
• Can a GP-based numeric classifier achieve better performance than individual K-NN classifiers?
• Can a GP-based combination technique produce a higher-performing optimized composite classifier (OCC) than its K-NN component classifiers?
Problem Statement
OPTIMIZATION AND COMBINATION OF KNN CLASSIFIERS USING GENETIC PROGRAMMING FOR INTRUSION DETECTION SYSTEM
Proposed Model
[Flowchart: the proposed model. The KDD CUP 1999 data set is fed to the K-NN classifiers, which a genetic algorithm then optimizes:
1. Import KDD dataset
2. Select initial K-nearest neighbours
3. Optimization possible? If no, end
4. Set GA parameters
5. Generate initial random population
6. Evaluate fitness of each classifier
7. Parent selection for next generation
8. Crossover
9. Is the optimization criterion met? If yes, end; if no, loop back]
Figure 3 shows the operations of a general genetic algorithm, according to which the GA is implemented in our system.
GP-Based Learning Algorithm: Training Pseudo Code
Notation: S_tst and S_t denote the test and training data; C(x) is the class of instance x; OCC is a composite classifier; C_k is the kth component classifier; C_k(x) is the prediction of C_k.
Train-CompositeClassifier(S_t, OCC)
Step 1: All input data examples x ∈ S_t are given to the K component classifiers.
Step 2: Collect [C_1(x), C_2(x), …, C_k(x)] for all x ∈ S_t to form a set of predictions.
Step 3: Start the GP combining method, using the predictions as unary functions in the GP tree. A threshold T is used as a variable to compute the ROC curve.
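Steps 1 and 2 can be sketched in Python (the actual implementation is in MATLAB; the toy data, feature values, and choice of k per component classifier below are illustrative assumptions only):

```python
# Collect component K-NN predictions for every training example,
# forming the derived prediction set handed to the GP combiner.

def knn_predict(train, labels, x, k):
    """Predict the class of x by majority vote among its k nearest neighbours."""
    order = sorted(range(len(train)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train[i], x)))
    votes = [labels[i] for i in order[:k]]
    return max(set(votes), key=votes.count)

# S_t: toy training data (2 features) with labels 0 = normal, 1 = attack
S_t = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9)]
y   = [0, 0, 1, 1]

# K component classifiers, differing here only in their k value
ks = [1, 3]

# Step 2: collect [C_1(x), ..., C_K(x)] for all x in S_t
predictions = [[knn_predict(S_t, y, x, k) for k in ks] for x in S_t]
print(predictions)
```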
GP-Based Learning Algorithm (contd.)
Pseudo Code for Classification
1. Apply the composite classifier (OCC, x) to data examples x taken from S_tst.
2. Stack the predictions X = [C_1(x), C_2(x), …, C_k(x)] to form new derived data.
3. Compute OCC(x).
Working of Genetic Programming
1. The algorithm begins by creating a random initial population.
2. The algorithm then creates a sequence of new populations. At each step, it uses the individuals in the current generation to create the next population by performing the following steps:
I. Scores each member of the current population by computing its fitness value.
II. Scales the raw fitness scores to convert them into a more usable range of values.
III. Selects members, called parents, based on their fitness.
IV. Some of the individuals in the current population that have the highest fitness are chosen as elite. These elite individuals are passed directly to the next population.
V. Produces children from the parents, either by making random changes to a single parent (mutation) or by combining the vector entries of a pair of parents (crossover).
VI. Replaces the current population with the children to form the next generation.
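The loop above can be sketched in Python as a toy real-valued genetic algorithm rather than full GP tree evolution; the fitness function, mutation rate, elite count, and population size are all illustrative assumptions:

```python
import random

random.seed(0)

def fitness(ind):
    # Toy fitness: highest when every gene approaches 1.0
    return -sum((g - 1.0) ** 2 for g in ind)

def evolve(pop, elite_n=2, mut_rate=0.2, gens=50):
    for _ in range(gens):
        scored = sorted(pop, key=fitness, reverse=True)        # I-II: score and rank
        elite = scored[:elite_n]                               # IV: carry elites over
        children = []
        while len(children) < len(pop) - elite_n:
            p1, p2 = random.sample(scored[:len(pop) // 2], 2)  # III: select parents
            cut = random.randrange(1, len(p1))
            child = p1[:cut] + p2[cut:]                        # V: crossover
            child = [g + random.gauss(0, 0.1) if random.random() < mut_rate else g
                     for g in child]                           # V: mutation
            children.append(child)
        pop = elite + children                                 # VI: next generation
    return max(pop, key=fitness)

pop = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(20)]
best = evolve(pop)
```

Because the elite individuals are passed through unchanged, the best fitness never decreases between generations.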
Dataset And Operations on Dataset
• KDD CUP 1999 dataset
• Remove redundancy
• Conversion of values
• Normalization
• PCA
• Final corrected data
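Two of these preprocessing steps, conversion of symbolic values and min-max normalization, can be sketched on toy KDD-style records; the field names and values here are illustrative, not the real 41 KDD features:

```python
records = [
    {"protocol": "tcp",  "src_bytes": 181,  "count": 8},
    {"protocol": "udp",  "src_bytes": 105,  "count": 1},
    {"protocol": "icmp", "src_bytes": 1032, "count": 511},
]

# Conversion of values: map symbolic protocol names to integer codes
proto_map = {p: i for i, p in enumerate(sorted({r["protocol"] for r in records}))}
for r in records:
    r["protocol"] = proto_map[r["protocol"]]

# Normalization: min-max scale each numeric field into [0, 1]
for field in ("src_bytes", "count"):
    lo = min(r[field] for r in records)
    hi = max(r[field] for r in records)
    for r in records:
        r[field] = (r[field] - lo) / (hi - lo)
```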
Tools Used
• Genetic Programming Tool Kit
• Windows operating system
• 4 GB RAM
• Intel Core i5 processor
• MATLAB
RESULTS, GRAPHS AND ANALYSIS
Fitness Function
• Records: the number of records must be maximum
• Num folds: the number of folds must be minimum
• K_value: k should be close to the optimum
• Time: training time must be minimum
• Model: the highest-scoring model is preferred
• Accuracy: the most accurate model is preferred
f = records + num_folds + K_value + time + model + accuracy;
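One way to read f is as a signed sum of its terms; a hedged Python sketch, where the sign conventions and the assumed optimum k_opt are illustrative (terms to be minimized enter negatively, consistent with the bullet points above):

```python
def fitness(records, num_folds, k_value, time, model, accuracy, k_opt=5):
    return (records                 # more training records is better
            - num_folds             # fewer folds is better
            - abs(k_value - k_opt)  # k close to an assumed optimum k_opt
            - time                  # less training time is better
            + model                 # higher model score is better
            + accuracy)             # higher accuracy is better

f = fitness(records=1.0, num_folds=2, k_value=5, time=0.1,
            model=0.9, accuracy=0.95)
```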
Current Best Individual
[Figure: the current best individual, with one bar per fitness term: records, num-folds, model, time, K-value, accuracy]
GP Stopping Criteria
GP Selection Function
Confusion Matrix For Normal Class
Confusion Matrix For DoS Class
Confusion Matrix For R2L Class
Confusion Matrix For U2R Class
Confusion Matrix For Probe Class
Confusion matrix
• Scatter plot of src_bytes versus count per class, using KNN
• Scatter plot of src_bytes versus dst_host_same_src_port_rate per class, using KNN
• ROC Curve
• ROC curve for the GP-based classifier, showing an area under the curve of 0.99976
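The area under the ROC curve can also be computed without plotting, as the probability that a randomly chosen positive instance outscores a randomly chosen negative one; a sketch with made-up classifier scores (not the results above):

```python
def roc_auc(scores, labels):
    """Rank-based AUC: fraction of positive/negative pairs ranked correctly,
    counting ties as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

scores = [0.95, 0.9, 0.8, 0.4, 0.3, 0.1]
labels = [1,    1,   0,   1,   0,   0]
auc = roc_auc(scores, labels)
```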
• Classification Results using Ensemble of Classifiers
Conclusion
• The ensemble increases classification performance
• It reduces error rates
• The GP-based ensemble provides better results than any individual classifier
Thank You