Page 1

Ensemble of K-Nearest Neighbour Classifiers for Intrusion Detection

Presented By

Imran Ahmed Malik

M.Tech CSE Networking Final Year

Sys ID 2014016942

Under the Guidance of

Mrs. Amrita

Asst. Professor

SHARDA UNIVERSITY, GREATER NOIDA

Page 2

Contents

• Objective

• Problem Statement

• Proposed System

• Introduction to the implemented algorithm

• Results and Graphs

• Conclusion

Page 3

Objective

• Can a GP-based numeric classifier show optimized performance compared with the individual K-NN classifiers?

• Can a GP-based combination technique produce a higher-performance composite classifier (OCC) than the K-NN component classifiers?

Page 4

Problem Statement

OPTIMIZATION AND COMBINATION OF KNN CLASSIFIERS USING GENETIC PROGRAMMING FOR INTRUSION DETECTION SYSTEM

Page 5

Proposed Model

[Flowchart of the proposed model] Import the KDD CUP 1999 dataset → K-NN classifiers: select the initial K-nearest neighbours → optimization possible? → set the GA parameters → generate an initial random population → evaluate the fitness of each classifier → select parents for the next generation → crossover → optimization criterion met? If yes, end; if no, repeat.

Figure 3 shows the operations of a general genetic algorithm, which our system follows in its GA implementation.

Page 6

GP-Based Learning Algorithm: Training Pseudo Code

Notation: S_tst and S_t represent the test and training data; C(x) is the class of instance x; OCC is a composite classifier; C_k is the k-th component classifier; C_k(x) is the prediction of C_k.

Train-Composite-Classifier(S_t, OCC)
Step 1: All input data examples x ∈ S_t are given to the K component classifiers.
Step 2: Collect [C_1(x), C_2(x), ..., C_K(x)] for all x ∈ S_t to form a set of predictions.
Step 3: Start the GP combining method, using the predictions as unary functions in the GP tree. A threshold T is used as a variable to compute the ROC curve.
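A minimal Python sketch of Steps 1 and 2, assuming scikit-learn K-NN classifiers as the component classifiers C_1..C_K; the function names, the choice of k values, and the use of scikit-learn are illustrative assumptions, not the MATLAB implementation used in this work. Step 3, the GP combining method, is left to the GP toolkit.

```python
# Illustrative sketch only: build the derived data that the GP combiner consumes
# (Steps 1-2 of the training pseudo code), using K-NN component classifiers.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def train_component_classifiers(X_train, y_train, k_values=(1, 3, 5, 7)):
    """Fit one K-NN component classifier per k (these k values are assumptions)."""
    return [KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
            for k in k_values]

def stack_predictions(classifiers, X):
    """Step 2: collect [C_1(x), ..., C_K(x)] for every x in the training data."""
    return np.column_stack([clf.predict(X) for clf in classifiers])

# Step 3 would hand stack_predictions(classifiers, X_train) to the GP combining
# method, which evolves the composite classifier OCC over these predictions.
```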

Page 7

GP-Based Learning Algorithm (continued)

Pseudo Code for Classification
1. Apply the composite classifier (OCC, x) to data examples x taken from S_tst.
2. X = [C_1(x), C_2(x), ..., C_K(x)]: stack the predictions to form the new derived data.
3. Compute OCC(x).
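A matching sketch of the classification step, reusing the component classifiers from the training sketch above. The evolved GP tree itself is not reproduced; when no OCC is supplied, a simple majority vote stands in so the flow can be followed end to end, and that stand-in is an assumption, not the method on this slide.

```python
# Illustrative sketch only: apply the composite classifier to test examples.
import numpy as np

def classify_with_occ(classifiers, X_test, occ=None):
    # Stack component predictions to form the derived data X = [C_1(x), ..., C_K(x)].
    stacked = np.column_stack([clf.predict(X_test) for clf in classifiers])
    if occ is not None:
        return occ(stacked)  # compute OCC(x) with the evolved GP combiner
    # Stand-in combiner (assumption): per-example majority vote over
    # integer-encoded class labels.
    return np.array([np.bincount(row.astype(int)).argmax() for row in stacked])
```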

Page 8

Working of Genetic Programming
1. The algorithm begins by creating a random initial population.
2. The algorithm then creates a sequence of new populations. At each step, the algorithm uses the individuals in the current generation to create the next population. To create the new population, the algorithm performs the following steps (see the sketch after this list):
I. Scores each member of the current population by computing its fitness value.
II. Scales the raw fitness scores to convert them into a more usable range of values.
III. Selects members, called parents, based on their fitness.
IV. Some of the individuals in the current population that have lower fitness values are chosen as elite. These elite individuals are passed to the next population.
V. Produces children from the parents. Children are produced either by making random changes to a single parent (mutation) or by combining the vector entries of a pair of parents (crossover).
VI. Replaces the current population with the children to form the next generation.
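A generic generational loop matching steps I to VI, given as a sketch: tournament selection, rank ordering in place of fitness scaling, and the 0.8 crossover rate are illustrative assumptions, not settings taken from the toolkit used here. Lower fitness values are treated as better, consistent with step IV.

```python
# Illustrative generational loop (steps I-VI). fitness, crossover and mutate are
# user-supplied functions; lower fitness values are treated as better.
import random

def evolve(population, fitness, crossover, mutate,
           generations=50, elite_count=2, crossover_rate=0.8):
    for _ in range(generations):
        ranked = sorted(population, key=fitness)          # I-II. score and rank each member
        elite = ranked[:elite_count]                      # IV.   elite pass through unchanged
        def pick_parent():                                # III.  tournament selection of parents
            a, b = random.sample(ranked, 2)
            return a if fitness(a) <= fitness(b) else b
        children = []
        while len(children) < len(population) - elite_count:   # V. produce children
            if random.random() < crossover_rate:
                children.append(crossover(pick_parent(), pick_parent()))
            else:
                children.append(mutate(pick_parent()))
        population = elite + children                     # VI. next generation
    return min(population, key=fitness)
```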

Page 9

Dataset And Operations on Dataset

• KDD CUP 1999 dataset

• Remove redundancy

• Conversion of values

• Normalization

• PCA

• Final corrected data
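A sketch of these preprocessing steps, assuming the 41-feature KDD CUP 1999 CSV layout with the class label in the last column; the use of pandas and scikit-learn and the number of PCA components retained are assumptions, since the slides do not fix those details.

```python
# Illustrative preprocessing pipeline for the KDD CUP 1999 dataset.
import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
from sklearn.decomposition import PCA

def preprocess_kdd(csv_path, n_components=10):
    df = pd.read_csv(csv_path, header=None)
    df = df.drop_duplicates()                               # remove redundancy
    for col in df.columns[:-1]:                             # conversion of values:
        if df[col].dtype == object:                         # encode symbolic features
            df[col] = LabelEncoder().fit_transform(df[col]) # (protocol_type, service, flag)
    labels = df.iloc[:, -1].values
    scaled = MinMaxScaler().fit_transform(df.iloc[:, :-1])  # normalization to [0, 1]
    reduced = PCA(n_components=n_components).fit_transform(scaled)  # PCA
    return reduced, labels                                  # final corrected data
```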

Page 10

Tools Used

• Genetic Programming Tool Kit

• Windows operating system

• 4 GB RAM

• Intel Core i5 processor

• MATLAB

Page 11

RESULTS, GRAPHS AND ANALYSIS

Page 12

Fitness Function

• Records: the number of records must be maximum

• Num folds: the number of folds must be minimum

• K_value: k should be close to the optimal value

• Time: time must be minimum (it contributes a negative term)

• Model: the highest-rated model is preferred

• Accuracy: the most accurate model is preferred

f = records + num_folds + K_value + Time + model + accuracy;
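The slide gives only the sum above, not the signs or weights of the individual terms. The sketch below makes the listed preferences explicit under the assumption that the score is to be maximized (an equivalent GA setup would minimize its negation); the optimal k and the implicit unit weights are also assumptions.

```python
# Illustrative fitness function encoding the preferences listed on this slide;
# signs, the optimal k, and the unit weights are assumptions.
def fitness(records, num_folds, k_value, time_taken, model_score, accuracy, k_optimal=5):
    return (records                     # records: more training records preferred
            - num_folds                 # num folds: fewer folds preferred
            - abs(k_value - k_optimal)  # K_value: k should stay close to the optimal k
            - time_taken                # time: shorter run time preferred (negative term)
            + model_score               # model: the highest-rated model preferred
            + accuracy)                 # accuracy: the most accurate model preferred
```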

Page 13

Current Best Individual

[Plot of the current best individual across the six fitness variables: records, num-folds, model, time, K-value, accuracy]

Page 14

GP Stopping Criteria

Page 15

GP Selection Function

Page 16

Confusion Matrix For Normal Class

Page 17

Confusion Matrix For DoS Class

Page 18

Confusion Matrix For R2L Class

Page 19

Confusion Matrix For U2R Class

Page 20

Confusion Matrix For Probe Class

Page 21

Confusion matrix

Page 22

• Scatter plot of src_bytes versus count for each class using K-NN

Page 23

• Scatter plot of src_bytes versus dst_host_same_src_port_rate for each class using K-NN

Page 24

• ROC Curve

• ROC curve for the GP-based classifier showing an area under the curve of 0.99976

Page 25

• Classification Results using Ensemble of Classifiers

Page 26

Conclusion

• The ensemble increases performance

• It reduces error rates

• The GP-based ensemble provides better results than the individual classifiers


Page 28

Thank You