Research Article
Multistage System-Based Machine Learning Techniques for Intrusion Detection in WiFi Network

Vu Viet Thang 1 and F. F. Pashchenko 2,3

1 Moscow Institute of Physics and Technology (State University), Moscow, Russia
2 The Department of Information and Communication Technologies, MIPT (State University), Moscow, Russia
3 Trapeznikov Institute of Control Sciences, Russian Academy of Sciences, Moscow, Russia

Correspondence should be addressed to Vu Viet Thang; [email protected]

Received 6 December 2018; Accepted 9 April 2019; Published 28 April 2019

Guest Editor: Arash H. Lashkari

Copyright © 2019 Vu Viet Thang and F. F. Pashchenko. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The aim of machine learning is to develop algorithms that can learn from data and solve specific problems in some context as humans do. This paper presents some machine learning models applied to intrusion detection systems in WiFi networks. Firstly, we present an incremental semisupervised clustering algorithm based on a graph. Incremental (one-pass) clustering is very useful when we work with data streams or dynamic data. In fact, for traditional clustering algorithms such as K-means, Fuzzy C-Means, DBSCAN, etc., many incremental versions have been developed. However, to the best of our knowledge, there is no incremental semisupervised clustering algorithm in the literature. Secondly, by combining the K-means algorithm with a measure of local density score, we propose a fast outlier detection algorithm named FLDS. The complexity of FLDS is O(n^1.5), while the results obtained are comparable with those of the LOF algorithm. Thirdly, we introduce a multistage system based on machine learning techniques for mining intrusion detection data in 802.11 WiFi networks. Finally, experiments conducted on data sets extracted from 802.11 networks and on UCI data sets show the effectiveness of our newly proposed methods.

1. Introduction

Machine learning is a central problem in artificial intelligence. The purpose of machine learning is the development of algorithms and techniques that allow computers to learn. There are several principal kinds of machine learning, such as supervised learning, unsupervised learning, and semisupervised learning. The applications of machine learning techniques are very varied, for example, fault detection in bank and transaction data, intrusion detection systems in networking, bioinformatics, natural language processing, image analysis, etc. [1]. Additionally, machine learning is very useful in cases in which human expertise does not exist (robots on Mars, in the sea, etc.), the solution changes in time (networking, surveillance), or the solution needs to be adapted to particular cases. This paper focuses on developing machine learning techniques for intrusion detection systems in WiFi networks.

An intrusion detection system (IDS) addresses one of the most pressing tasks in network connectivity. Each year, there are many network attacks in the world; consequently, the cost of dealing with these problems is very large and was reported to be about 500 billion USD in 2017. This problem is a challenge not only for governments and organizations but also for individuals in daily life. To protect a computer network system, in general, methods such as firewalls, data encryption, or user authentication can be used. The firewall is one technique to protect the system, but nowadays, external mechanisms have emerged and quickly become popular. One important data mining approach to the intrusion detection problem proposed in the literature is to use machine learning techniques [2–8]. The IDS directly monitors the network transactions, where each transaction is either normal or malicious. The aim of the IDS is to detect and alert network administrators when it detects a transaction that

Hindawi Journal of Computer Networks and Communications, Volume 2019, Article ID 4708201, 13 pages. https://doi.org/10.1155/2019/4708201

is an attack. In some cases, the IDS can even immediately block the connection.

Generally, the data mining task in an IDS must detect two kinds of attacks: known attacks and outlier (anomaly) attacks. For known attacks, we can use a (semi)supervised learning method, such as a neural network, support vector machine, random forest, decision tree, or naïve Bayes, to mention a few, to construct a classifier from training data (connections labeled normal or attack) [4–7, 9]. The trained classifier is used for detecting new connections; this supervised learning model is illustrated in Figure 1. For the outlier attacks, whose labels we do not know, the trained classifier cannot detect them. In this case, we have to use another kind of machine learning, called unsupervised outlier detection, such as LOF [10], ODIN [11], and so on. The outlier detection process can be run offline for some periods of time defined by users or experts. The general schema for outlier detection is presented in Figure 2; this is the unsupervised learning model. The aim of this schema is to detect outliers in a period of time. For example, in an IDS, the users can set a period of time from u to v for capturing the data; the data are then transformed by the preprocessing step, and finally an outlier detection method can be used to detect attacks in the observed data.
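The two-model pipeline described above can be sketched in a few lines. The sketch below is illustrative only: it uses a toy nearest-centroid stand-in for the supervised classifier and a simple distance threshold for the unsupervised outlier check; the data and the threshold are hypothetical, not from the paper.

```python
# Toy sketch of the two-stage IDS idea: a classifier for known attacks,
# plus an outlier check for connections unlike anything in training.

def centroid(points):
    """Mean of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

# Stage 1: "supervised" model trained on labeled traffic (known attacks).
train = {"normal": [[0.1, 0.2], [0.2, 0.1]], "flooding": [[5.0, 5.1], [4.9, 5.2]]}
centroids = {label: centroid(pts) for label, pts in train.items()}

def classify(x):
    return min(centroids, key=lambda lbl: dist(x, centroids[lbl]))

# Stage 2: unsupervised check -- far from every centroid => possible unknown attack.
OUTLIER_RADIUS = 2.0  # hypothetical threshold

def detect(x):
    if min(dist(x, c) for c in centroids.values()) > OUTLIER_RADIUS:
        return "outlier"
    return classify(x)

print(detect([0.15, 0.15]))  # near normal traffic
print(detect([5.0, 5.0]))    # near a known attack
print(detect([10.0, -3.0]))  # matches nothing seen in training
```

A real deployment would replace the nearest-centroid rule with one of the classifiers listed above and the radius test with a method such as LOF.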

The contributions of our paper are as follows:

(i) We propose an incremental semisupervised graph-based clustering algorithm. To the best of our knowledge, this is the first incremental semisupervised clustering algorithm. The preliminary work is presented in [12].

(ii) We introduce a fast outlier detection method based on a local density score and the K-means clustering algorithm. The preliminary work is introduced in [13].

(iii) We propose a multistage system based on machine learning techniques which can boost the accuracy of the intrusion detection process for the 802.11 WiFi data set.

(iv) The experiments carefully conducted on data sets extracted from the Aegean WiFi Intrusion Dataset (AWID) show the effectiveness of our proposed algorithms [14]. The AWID is a publicly available collection of data sets which contain real traces of both normal and intrusive 802.11 traffic. To date, AWID is one of the standard data sets used to evaluate the capacity of IDS systems.

This paper is organized as follows: Section 2 presents the related work. Section 3 introduces the new incremental semisupervised clustering method and a new fast outlier detection algorithm. Section 4 presents experiments for the proposed algorithms and proposes a hybrid framework applied to the AWID data set. Finally, Section 5 concludes the paper and presents some directions for further research.

2. Incremental Clustering and Outlier Detection

2.1. Incremental Clustering. Clustering is the task of partitioning a data set into k clusters in which the points in the same cluster are similar and the points in different clusters are dissimilar. The context of incremental clustering is as follows: given some current clusters, incremental clustering is a one-pass kind of clustering which aims to identify the cluster labels of incremental data points. Incremental clustering is very useful for data streams or dynamic data (data warehouses). In general, incremental clustering combines two processes: insertion and deletion. Given a set of clusters, the insertion step aims to identify the label of a new data point based on the current clusters. In some cases, new clusters will be created, or the new data points will be integrated into the current clusters. With the deletion process, if we want to remove one or more data points, we need to re-form the clusters, because some clusters may be affected by these operations. For each kind of clustering, there are some incremental clustering algorithms proposed in the literature, such as Incremental K-means [15], IncrementalDBSCAN [16], or incremental graph clustering [17]. The key idea of these algorithms is to identify, for each kind of algorithm, the situations arising in the insertion step and the deletion step. Incremental clustering addresses the problem of identifying the label of a new data object, or of updating the clusters when we remove points from the current clusters. This problem is very meaningful when we tackle big data, in which the data set is too big to fit into the available memory.

In [16], incremental density-based clustering (IncrementalDBSCAN) is introduced. Based on the notion of density-based clustering, IncrementalDBSCAN can efficiently add and delete points in the current clusters. The process of adding a new point covers several cases: for example, the new point can be noise, the new point can be added to a cluster, or the new point can merge some clusters. For the deletion process, the removed point can be a noise point, and its removal can split a cluster or leave the current clusters unaffected. Some cases of the insertion process and deletion process of IncrementalDBSCAN are shown in Figure 3.

In [15], a single-pass incremental clustering algorithm for large data sets based on K-means is introduced (named GenIC). GenIC updates each center with each new data point and merges clusters only at the end of a generation (i.e., a window of data). Being a generalized incremental algorithm, GenIC can move a center in the list of centers using a weighted sum of the existing center and the new point presented. The idea of GenIC is to divide the stream of data into chunks or windows, as is common with streaming algorithms. We view each chunk of n data points as a generation and think of the "fitness" of a center as being measured by the number of points assigned to it. In general, the fittest centers survive to the next generation, but occasionally new centers are selected and old centers are killed off. GenIC was compared with K-means and shown to be effective in running time and less affected by the choice of initial centers than K-means. In [18], a version of incremental K-means clustering is also proposed. In the


algorithm, clusters are built incrementally by adding one cluster center at a time. In [19], a novel two-phase static single-pass algorithm, as well as a dynamic two-phase single-pass algorithm based on Fuzzy C-means, has been presented and shows high utility. The idea behind the multistage methods reported in that paper is that an estimate of the partition matrix and the locations of the cluster centers can be obtained by clustering a sample of the data. A small sample is expected to produce a fast yet less reliable estimate of the cluster centers. This leads to a

Figure 1: A general model for misuse detection in IDS (Internet traffic is captured with NetFlow tools/tcpdump, preprocessed, and used to train a (semi)supervised learning model that separates normal traffic from attacks).

Figure 2: A general model for outlier detection in IDS (Internet traffic is captured with NetFlow tools/tcpdump, preprocessed, and passed to an outlier detection model that separates normal traffic from attacks).

Figure 3: Insertion cases (a: noise, creation, absorption, merge) and deletion cases (b: removal, reduction, split, nonsplit) of IncrementalDBSCAN [16].


multistage approach which involves several stages of sampling (with replacement) of the data and estimating the membership matrix for the next stage. The experiments conducted show the effectiveness of the proposed method. In [17], Chandrasekhar et al. propose an incremental local density clustering scheme for finding dense subgraphs in streaming data, i.e., when data arrive incrementally (ILDC). The incremental clustering scheme captures redundancy in the streaming data source by finding dense subgraphs, which correspond to salient objects and scenes. The ILDC process performs greedy operations like cluster expansion, cluster addition, and cluster merging, based on a similarity between clusters that the authors define. ILDC shows its effectiveness when used in image-retrieval applications. In [20], an incremental semisupervised ensemble clustering algorithm, named ISSCE, has been successfully presented. ISSCE uses constraints to update incremental members. The authors develop an incremental ensemble member selection process based on a global objective function and a local objective function to remove the redundant ensemble members. The experimental results show the improvement of ISSCE over traditional semisupervised clustering ensemble approaches and conventional cluster ensemble methods on six real-world data sets from the UCI machine learning repository and 12 real-world data sets of cancer gene expression profiles. In the context of classification, we need to find the label of a new data object by using a classifier trained on training data; the problem of identifying the label of a new object in incremental clustering can be seen as similar to this classification context.

2.2. Outlier Detection Problem. Outlier (anomaly) detection is one of the important problems of machine learning and data mining. As mentioned in [21], outlier detection is the problem of finding patterns in data that do not conform to expected behavior. Applications of outlier detection can be found in many domains, such as intrusion detection, credit fraud detection, video surveillance, weather prediction, discovery of criminal activities in electronic commerce, etc. [9, 21]. There are several kinds of outliers, including point outliers, contextual outliers, and collective outliers. In this paper, we focus on point outlier detection, which can be applied in a variety of applications. For a data set consisting of points, a point will be called an outlier if it is different from a large number of the rest of the points. To detect outliers, there are several principal families of methods in the literature, such as classification methods, nearest neighbor methods, clustering methods, statistical methods, distance-based methods, etc.

For classification-based outlier detection, we have two categories: multiclass and one-class anomaly detection methods. In multiclass classification techniques, we assume that the training data contain labeled points of all normal classes. The learner, using a supervised learning model, trains a model on the labeled data. The classifier can distinguish between each normal class and the rest of the classes. A test point is called an outlier if it does not belong to any normal class. In one-class outlier detection methods, we assume that there is only one normal class. The classifier learns a model that can detect the boundary of the normal class. If a test point does not fall within this boundary, it is called an outlier. Although many such techniques have been developed, the main disadvantage of these methods is their reliance on the availability of accurate labels for the normal classes, which is not easy to obtain in real applications.

Nearest neighbor-based outlier detection methods use the following assumption: normal points belong to dense regions, while outliers belong to sparse regions. The most famous method of this kind is the LOF algorithm. The idea of LOF is based on a local density evaluation score for points. Each point is assigned a score which is the ratio of the average local density of the k-nearest neighbors of the point to the local density of the point itself. Many variants of LOF can be cited here, such as COF [22], ODIN [11], LOCI [23], etc. The main drawback of these methods is the O(n^2) complexity required.
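The LOF score described above can be sketched directly from its definition. The sketch below is a minimal pure-Python illustration on toy 1-D data; the data and the choice k = 2 are ours, and a production implementation would use spatial indexing rather than this O(n^2) formulation.

```python
# Minimal LOF sketch: reachability distance, local reachability density (lrd),
# and the LOF ratio of neighbor densities to the point's own density.

def knn(data, i, k):
    """Indices of the k nearest neighbors of point i (excluding i)."""
    order = sorted((j for j in range(len(data)) if j != i),
                   key=lambda j: abs(data[j] - data[i]))
    return order[:k]

def k_distance(data, i, k):
    return abs(data[knn(data, i, k)[-1]] - data[i])

def reach_dist(data, i, j, k):
    return max(k_distance(data, j, k), abs(data[i] - data[j]))

def lrd(data, i, k):
    nbrs = knn(data, i, k)
    return k / sum(reach_dist(data, i, j, k) for j in nbrs)

def lof(data, i, k):
    nbrs = knn(data, i, k)
    return sum(lrd(data, j, k) for j in nbrs) / (k * lrd(data, i, k))

data = [1.0, 1.1, 1.2, 1.3, 10.0]  # the last point is isolated
scores = [round(lof(data, i, 2), 2) for i in range(len(data))]
print(scores)  # the isolated point gets a LOF score well above 1
```

Points deep inside a uniform region score close to 1, while the isolated point's score is much larger, which is the property the threshold-based detectors in this section rely on.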

For clustering-based outlier detection techniques, the idea is to use clustering methods to group data into clusters; the points that do not belong to any cluster are called outliers. Some clustering methods, such as DBSCAN [24] and SNN [25], can detect outliers in this way. In fact, the purpose of clustering is finding clusters, so the outliers are just a by-product of the clustering process and hence are not carefully optimized. A further drawback that can be noted here is that these clustering techniques also require O(n^2) complexity.

Statistical outlier detection methods are based on the following assumption: normal data points occur in high-probability regions of a stochastic model, while anomalies occur in its low-probability regions. Several methods of this kind have been developed. In general, statistical methods fit a statistical model (a Gaussian distribution, a mixture of parametric statistical distributions, etc.) to the given data and then apply a statistical inference test to determine whether an unseen instance belongs to this model or not. The key limitation of these methods is the assumption about the distribution of the data points; this assumption is often not true, especially when the dimension of the data is high [21].

In distance-based outlier detection methods, a point is considered an outlier if it does not have at least a fraction pct of the points in the data set within a distance smaller than the threshold value dmin from it [26].
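This DB(pct, dmin) criterion is simple enough to state directly in code. The sketch below is illustrative: the toy data and the values of pct and dmin are ours, chosen only to make the criterion visible.

```python
# Sketch of the DB(pct, dmin) criterion: a point is an outlier if fewer than
# a fraction `pct` of the other points lie within distance dmin of it.

def is_db_outlier(data, i, pct, dmin):
    others = [x for j, x in enumerate(data) if j != i]
    close = sum(1 for x in others if abs(x - data[i]) <= dmin)
    return close / len(others) < pct

data = [1.0, 1.2, 1.1, 0.9, 1.3, 8.0]
flags = [is_db_outlier(data, i, pct=0.5, dmin=0.5) for i in range(len(data))]
print(flags)  # only the point 8.0 is flagged
```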

3. Proposed Method

3.1. Semisupervised Graph-Based Clustering. In recent years, semisupervised clustering has been an important research topic, as illustrated by a number of studies [27]. The purpose of semisupervised clustering is to integrate side information to improve clustering performance. Generally, there are two kinds of side information: constraints and seeds. Given a data set X, constraints involve must-links and cannot-links, in which a must-link constraint


(ML) between two observations x ∈ X and y ∈ X means that x and y should be in the same cluster, and a cannot-link constraint (CL) means that x and y should not be in the same cluster. With seeds, a small set of labeled data (called seeds) S ⊆ X is provided to the semisupervised clustering algorithm. In fact, this side information is often available or can be collected from users [28–31]. We can cite here the work on semisupervised clustering for K-means [32], hierarchical clustering [33], graph-based clustering [34, 35], spectral clustering [36, 37], density-based clustering [38], etc. While many semisupervised clustering algorithms have been introduced, to the best of our knowledge there is no incremental semisupervised clustering algorithm in the literature.

Our new incremental clustering algorithm, introduced in the next section, is based on the semisupervised graph-based clustering using seeds (SSGC) algorithm. We choose SSGC because it has several advantages: SSGC uses only one parameter, and SSGC can detect clusters in regions of the data with varied density [35]. SSGC consists of the two steps described below (see Algorithm 1).

Step 1. Given a k-nearest neighbor graph representing a data set X, this step uses a loop in which, at each iteration, all edges with weight less than a threshold θ are removed. The value of θ is initialized to 0 at the first iteration and incremented by 1 after each iteration. This loop stops when each connected component contains at most one kind of seed. The main clusters are identified by propagating labels within each connected component that contains seeds.

Step 2. The remaining points (graph nodes) that do not belong to any main cluster are divided into two kinds: points that have edges relating them to one or more clusters, and isolated points. In the first case, each point is assigned to the cluster with the largest related weight. The isolated points can either be removed as outliers or be labeled.

We note that in SSGC, the weight ω(x_i, x_j) of the edge between two points x_i and x_j in the k-nearest neighbor graph (their similarity) is equal to the number of nearest neighbors that the two points share, as in the following equation:

ω(x_i, x_j) = |NN(x_i) ∩ NN(x_j)|,   (1)

where NN(·) is the set of k-nearest neighbors of the specified point.
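Equation (1) can be computed with plain set intersections. The sketch below is a toy illustration on 1-D data of our own choosing; nearby points in the same cluster share many neighbors, while points in different clusters share none.

```python
# The shared-neighbor edge weight of equation (1), in pure Python.

def nn_sets(data, k):
    """k-nearest-neighbor index set for every point."""
    sets = []
    for i in range(len(data)):
        order = sorted((j for j in range(len(data)) if j != i),
                       key=lambda j: abs(data[j] - data[i]))
        sets.append(set(order[:k]))
    return sets

def omega(nn, i, j):
    """omega(x_i, x_j) = |NN(x_i) ∩ NN(x_j)|."""
    return len(nn[i] & nn[j])

data = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 5.0, 5.1, 5.2, 5.3]
nn = nn_sets(data, k=3)
print(omega(nn, 0, 1), omega(nn, 0, 6))  # close points share more neighbors
```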

SSGC is efficient compared with semisupervised density-based clustering in detecting clusters in batch data; however, it is not adapted to data stream or data warehousing environments in which many updates (insertions/deletions) occur.

3.2. Incremental Graph-Based Clustering Using Seeds. In this section, we propose IncrementalSSGC, based on the SSGC algorithm. In IncrementalSSGC, the seeds are used to build a k-nearest neighbor graph, construct the connected components, and identify the value of θ, as in the SSGC algorithm. Like other incremental clustering algorithms, two procedures must be developed: insertion and deletion.

Algorithm 2 shows the insertion step of IncrementalSSGC for a new data point x_new. At first, the list of edges between x_new and the current clusters is created, and all edges with weight smaller than θ are removed. If the list is empty, this indicates that x_new is an outlier in the current situation, and hence x_new is added to a temporary list Lo. In the case of existing edges between x_new and some connected components, we need to remove edges until x_new connects only to components with one kind of label. Finally, the label of x_new is identified by the label of its connected components. In Step 10, x_new and its related edges are added to L; some edges between x_t and x_l are also recalculated if x_new appears in the nearest neighbor lists of x_t or x_l. In Step 12, after some insertion steps, we can examine the points in Lo.

Algorithm 3 presents the detailed steps of the deletion process. When we want to remove a point x_del from the current clusters, we simply remove x_del and all edges related to x_del in the graph. Step 2 of the algorithm shows the updating process: in this step, we need to update all edges affected by x_del. This means that an edge between x_i and x_j must be updated if x_del appears in the shared list of their nearest neighbors. Finally, Step 3 simply removes all edges with weight less than θ.

3.2.1. Complexity Analysis. Now we analyse the complexity of IncrementalSSGC. Given a data set with n objects, we recall that the complexity of SSGC is O(k × n^2), in which k is the number of nearest neighbors. Assuming that the current clusters contain n objects, we analyse the complexity of the insertion and deletion processes of IncrementalSSGC at step (n + 1) as follows.

For the insertion process, which aims to identify the cluster label of a new data point x_new: in Step 1, creating the list of edges between x_new and the current clusters has complexity O(n × k). In Steps 2, 6, and 7, the complexity is just O(k). In Step 10, some edges between x_t and x_l are also recalculated if x_new appears in the nearest neighbor lists of x_t or x_l; in fact, the number of such edges is small. So, for the insertion of a new point, the complexity is O(n × k).

For the deletion process, the complexity of Step 1 is O(k). In Steps 2 and 3, the number of edges updated equals the number of edges that have x_del as a shared neighbor, and this number depends on the data set. Let q be its average value over v deletion processes; in fact, q is determined by performing experiments. So the complexity of a deletion process is O(q × n × k).

In summary, from the analysis of the insertion and deletion processes above, we can see that IncrementalSSGC is very useful for data sets that need frequent updates. In the next section, we also report the running time of both SSGC and IncrementalSSGC on some data sets extracted from the intrusion detection problem.


3.3. A Fast Outlier Detection Method. Given a k-nearest neighbor graph (k-NNG), the local density score (LDS) of a vertex u ∈ k-NNG is defined as follows [39]:

LDS(u) = (Σ_{q ∈ NN(u)} ω(u, q)) / k,   (2)

in which ω is calculated as in equation (1) and k is the number of nearest neighbors used. The LDS is used as an indicator of the density of the region of a vertex u. The LDS value lies in the interval [0, k − 1]; the larger the LDS of u, the denser the region that u belongs to, and vice versa. So we can apply this calculation of the LDS to identify outliers. To detect outliers by this method, we have to use a parameter as a threshold: a point whose LDS value is smaller than the threshold can be seen as an outlier, and vice versa. Similar to LOF, this method requires O(n^2) complexity.
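Equation (2) can be sketched directly from equations (1) and (2). The toy data below (two tight clusters and one point between them) and the values of k and the threshold are our own illustrative choices, not from the paper.

```python
# The local density score of equation (2): average shared-neighbor weight
# between a point and its k nearest neighbors.

def lds_scores(data, k):
    # k-nearest-neighbor index sets
    nn = []
    for i in range(len(data)):
        order = sorted((j for j in range(len(data)) if j != i),
                       key=lambda j: abs(data[j] - data[i]))
        nn.append(set(order[:k]))
    # LDS(u) = (1/k) * sum over q in NN(u) of omega(u, q)
    return [sum(len(nn[i] & nn[q]) for q in nn[i]) / k
            for i in range(len(data))]

data = [0.0, 0.1, 0.2, 0.3, 10.0, 10.1, 10.2, 10.3, 5.15]  # 5.15 is isolated
scores = lds_scores(data, k=4)
print([round(s, 2) for s in scores])  # the isolated point scores lowest
outliers = [i for i, s in enumerate(scores) if s < 2.0]  # threshold is user-chosen
print(outliers)
```

The isolated point's neighbors come from the two separate clusters and share almost no neighbors with it, so its LDS falls well below that of cluster members.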

Algorithm 1: The algorithm SSGC [35].
Input: X, number of neighbors k, a set of seeds S.
Output: A set of detected clusters/outliers.
Process:
(1) Construct the k-NN graph of X
(2) θ ← 0
(3) repeat
(4)   Construct the connected components using the threshold θ
(5)   θ ← θ + 1
(6) until the cut condition is satisfied
(7) Propagate the labels to form the principal clusters
(8) Construct the final clusters
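The steps of Algorithm 1 can be sketched compactly. The following is an illustrative pure-Python sketch under our own assumptions: toy 1-D data, shared-neighbor weights as in equation (1), union-find for the connected components, and the cut condition "each component contains at most one kind of seed"; it omits SSGC's Step 2 handling of unlabeled remainder points.

```python
# A compact sketch of SSGC: raise theta until no component mixes seed labels,
# then propagate seed labels inside each component.

def knn_graph(data, k):
    nn = []
    for i in range(len(data)):
        order = sorted((j for j in range(len(data)) if j != i),
                       key=lambda j: abs(data[j] - data[i]))
        nn.append(set(order[:k]))
    # shared-neighbor weight omega from equation (1), one entry per edge
    return {(min(i, j), max(i, j)): len(nn[i] & nn[j])
            for i in range(len(data)) for j in nn[i]}

def components(n, edges, theta):
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for (i, j), w in edges.items():
        if w >= theta:                 # keep only edges with weight >= theta
            parent[find(i)] = find(j)
    return [find(i) for i in range(n)]

def ssgc(data, k, seeds):
    edges = knn_graph(data, k)
    theta = 0
    while True:
        comp = components(len(data), edges, theta)
        # cut condition: every component holds at most one kind of seed label
        if all(len({seeds[i] for i in range(len(data))
                    if comp[i] == c and i in seeds}) <= 1 for c in set(comp)):
            break
        theta += 1
    label = {}
    for c in set(comp):                # propagate seed labels per component
        members = [i for i in range(len(data)) if comp[i] == c]
        kinds = {seeds[i] for i in members if i in seeds}
        if len(kinds) == 1:
            lbl = next(iter(kinds))
            for i in members:
                label[i] = lbl
    return label

data = [0.0, 0.1, 0.2, 0.3, 5.0, 5.1, 5.2, 5.3]
print(ssgc(data, k=3, seeds={0: "normal", 4: "attack"}))
```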

Algorithm 2: IncrementalSSGC, insertion process.
Input: a new data object x_new; a set of current clusters C; the list L containing the edges of each point of the current clusters; θ (threshold); the number of nearest neighbors k.
Output: the label for x_new.
Process:
(1) Create the k-nearest-neighbor list of edges (LE) between x_new and all current clusters
(2) Delete all (u, v) ∈ LE with weight(u, v) < θ
(3) if LE is empty then
(4)   x_new is added to a temporary list Lo
(5) else
(6)   if x_new relates to two or more components with different labels then
(7)     Delete edges in LE in ascending order of weight until x_new connects to components with at most one kind of label
(8)   end if
(9)   Get the label for x_new and its connected points (if any) by propagating
(10)  Update the list L: add the edges relating to x_new to L; some edges between x_t and x_l are also recalculated if x_new appears in the nearest neighbor lists of x_t or x_l
(11) end if
(12) Examine the points in Lo

Algorithm 3: IncrementalSSGC, deletion process.
Input: an object x_del (in some component) to be deleted; a set of current clusters C; the list L of edges of each point of the current clusters; θ (threshold).
Output: the updated C; the updated L.
Process:
(1) Delete x_del and all edges related to x_del in L
(2) Update all weights (k, l) ∈ C with x_del ∈ NN(k) ∩ NN(l)
(3) Delete all edges (k, l) ∈ L updated at Step 2 with weight(k, l) < θ


To reduce the running time of this method, we propose a fast outlier detection method based on the local density score, called FLDS. The basic idea of FLDS is to use a divide-and-conquer strategy. Given a data set X in which to find outliers, the input data set is first split into k clusters using the K-means algorithm. Next, a k-nearest neighbor graph is built for each cluster and used to identify outliers within each local cluster. The outliers found in all clusters are then re-evaluated on the whole data set. The idea of a divide-and-conquer strategy using K-means in the preprocessing step has been successfully applied in solving problems such as fast spectral clustering [40], the fast minimum spanning tree problem [41], and efficient and effective shape-based clustering [42]. The FLDS algorithm is described in Algorithm 4.

The FLDS algorithm is thus an outlier detection method based on K-means and a graph-based local density score. The complexity of FLDS is O(n × k) + O(k^2) + O(t × n), in which the value of k may be up to n^0.5 [41, 42] and t ≪ n is approximately equal to k, so the complexity of FLDS is O(n^1.5).
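The divide-and-conquer strategy of FLDS can be sketched end to end. The sketch below is our own toy illustration, not the paper's Algorithm 4: it uses a naive 1-D K-means, the LDS of equation (2) inside each partition, and a simple global re-check of the local candidates; the data, the number of partitions, k, and the threshold are all illustrative choices.

```python
# Sketch of FLDS: partition with K-means, score each partition locally with
# the LDS of equation (2), then re-check local candidates on the whole set.

def lds(points, k):
    nn = []
    for i in range(len(points)):
        order = sorted((j for j in range(len(points)) if j != i),
                       key=lambda j: abs(points[j] - points[i]))
        nn.append(set(order[:k]))
    return [sum(len(nn[i] & nn[q]) for q in nn[i]) / k
            for i in range(len(points))]

def kmeans_1d(points, nclust, iters=20):
    # crude init: spread initial centers across the sorted data
    centers = sorted(points)[:: max(1, len(points) // nclust)][:nclust]
    groups = []
    for _ in range(iters):
        groups = [[] for _ in centers]
        for x in points:
            groups[min(range(len(centers)),
                       key=lambda c: abs(x - centers[c]))].append(x)
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    return groups

def flds(points, nclust, k, threshold):
    candidates = []
    for g in kmeans_1d(points, nclust):
        if len(g) <= k:                  # partition too small to score locally
            candidates.extend(g)
            continue
        candidates.extend(x for x, s in zip(g, lds(g, k)) if s < threshold)
    # second pass: re-check the local candidates against the whole data set
    full = dict(zip(points, lds(points, k)))
    return [x for x in candidates if full[x] < threshold]

data = [0.0, 0.1, 0.2, 3.0, 3.1, 3.2,
        10.0, 10.1, 10.2, 13.0, 13.1, 13.2, 11.6]
print(flds(data, nclust=2, k=3, threshold=1.0))  # flags only the point 11.6
```

The speedup comes from building k-NN graphs only inside the small partitions; only the few surviving candidates are ever scored against the full data set.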

4. Experiment Results

This section aims to evaluate the effectiveness of our proposed algorithms. We show the results of IncrementalSSGC, the results of FLDS, and the results of using our methods in a hybrid framework for the intrusion detection problem. IncrementalSSGC is compared with IncrementalDBSCAN, while FLDS is compared with LOF.

The data sets used in the experiments are mostly extracted from the Aegean WiFi Intrusion Dataset (AWID) [14]. AWID is a publicly available collection of data sets, in an easily distributed format, which contain real traces of both normal and intrusive 802.11 traffic. In AWID, many kinds of attacks have been introduced, but they fall into four main categories, including flooding, injection, and impersonation. AWID has 156 attributes; we use the 35 attributes extracted by an artificial neural network as presented in [8]. We also use some supplementary data sets from UCI [43], as well as data sets with different sizes, shapes, and densities that contain noise points and special artifacts [44], in this experiment.

4.1. Experiment Setup

4.1.1. Data Sets for Incremental Clustering Algorithms. To show the effectiveness of IncrementalSSGC, two aspects will be examined: the running time and the accuracy. Five UCI data sets and three data sets extracted from AWID will be used for testing IncrementalSSGC and IncrementalDBSCAN. The details of these data sets are presented in Table 1.

To evaluate clustering results, the Rand Index is used. Given a data set X with n points to cluster, let P1 be an array containing the true labels and P2 an array containing the results of a clustering algorithm. The Rand Index (RI) is calculated as follows:

RI = (a + b) / (n(n − 1)/2),   (3)

in which a and b are the numbers of pairs that are in the same and in different clusters, respectively, in both partitions P1 and P2. The bigger the Rand Index, the better the result.
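Equation (3) can be checked by direct pair counting; the sketch below is an illustrative implementation of the Rand Index, not code from the paper.

```python
from itertools import combinations

def rand_index(p1, p2):
    """Rand Index between two partitions of the same n points:
    RI = (a + b) / (n(n - 1)/2), where a counts pairs placed in the same
    cluster by both partitions and b counts pairs placed in different
    clusters by both partitions."""
    a = b = 0
    for i, j in combinations(range(len(p1)), 2):
        same1, same2 = p1[i] == p1[j], p2[i] == p2[j]
        if same1 and same2:
            a += 1                    # pair agrees: same cluster in both
        elif not same1 and not same2:
            b += 1                    # pair agrees: different clusters in both
    n = len(p1)
    return (a + b) / (n * (n - 1) / 2)
```

Identical partitions score 1, and relabeling the clusters of one partition does not change the index, which is why it is suitable for comparing a clustering against ground-truth labels.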

4.1.2. Data Sets for FLDS and LOF. We used 5 data sets extracted from AWID and four 2D data sets, DS1 (10,000 points), DS2 (8,000 points), DS3 (8,000 points), and DS4 (8,000 points) [44], for FLDS and LOF. These 2D data sets have clusters of different size, shape, and orientation, as well as random noise points and special artifacts. The details of the AWID data sets are presented in Table 2.

To compare LOF and FLDS on the AWID data sets, we use the ROC measure, which has two factors: the False Positive (False Alarm) Rate (FPR) and the False Negative (Miss Detection) Rate (FNR). These factors are defined in the following equations:

FPR = FP / (FP + TN),   (4)

FNR = FN / (TP + FN),   (5)

in which True Positive (TP) is the number of attacks correctly classified as attacks, True Negative (TN) is the number of normal records correctly detected as normal, False Positive (FP) is the number of normal records falsely classified as attacks (false alarms), and False Negative (FN) is the number of attacks falsely detected as normal.

To combine the FPR and FNR values, we calculate the Half Total Error Rate (HTER), similar to the evaluation method used in [11], defined as follows:

HTER = (FPR + FNR) / 2.   (6)
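Equations (4)-(6) combine into a single score; a minimal helper (ours, for illustration) makes the computation explicit.

```python
def hter(tp, tn, fp, fn):
    """Half Total Error Rate from detection counts, per equations (4)-(6)."""
    fpr = fp / (fp + tn)      # false alarm rate, equation (4)
    fnr = fn / (tp + fn)      # miss detection rate, equation (5)
    return (fpr + fnr) / 2    # HTER, equation (6)
```

For example, 20 false alarms against 80 true negatives (FPR = 0.2) and 10 missed attacks against 90 detected ones (FNR = 0.1) give an HTER of 0.15.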

4.2. Clustering Results. We note that there is no incremental semisupervised clustering algorithm in the literature, so we compare the performance obtained by our algorithm with that of IncrementalDBSCAN, which can be seen as the state of the art among the incremental clustering algorithms proposed; it can detect clusters of different size and shape in data with noise. Because SSGC and IncrementalSSGC produce the same results, we show only the results for IncrementalSSGC and IncrementalDBSCAN. The results are shown in Figure 4.

We can see from the figure that IncrementalSSGC obtains better results than IncrementalDBSCAN. This can be explained by the fact that IncrementalDBSCAN cannot detect clusters with different densities; as mentioned in that paper, "we assumed that the parameter values Eps and MinPts of DBSCAN do not change significantly when inserting and deleting objects."

Journal of Computer Networks and Communications 7

This assumption means that IncrementalDBSCAN cannot work well with data sets having different densities. In contrast to IncrementalDBSCAN, the IncrementalSSGC algorithm does not depend on the density of the data, because the similarity measure used is based on shared nearest neighbors.
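The density independence comes from comparing neighbor lists rather than raw distances; a minimal sketch of a shared-nearest-neighbor similarity (our illustration, not the exact measure used in SSGC) is:

```python
def snn_similarity(knn_a, knn_b):
    """Shared-nearest-neighbor similarity: the number of neighbors two
    points have in common. Because only neighbor ranks matter, the value
    is insensitive to the absolute density around each point."""
    return len(set(knn_a) & set(knn_b))
```

Two points in a sparse cluster and two points in a dense cluster can both share most of their neighbor lists, so the same similarity threshold works across clusters of very different densities.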

4.2.1. Running Time Comparison. Figure 5 presents the running time of IncrementalSSGC and IncrementalDBSCAN for three AWID data sets. We can see that the running time of both algorithms is similar. This can be explained by the fact that both algorithms use k-nearest neighbors to find the cluster at each incremental step. We also present the running time of the SSGC algorithm for reference purposes. From this experiment, we can see the advantage of the incremental clustering algorithms.

4.3. The Results of FLDS and LOF. Table 3 presents the results obtained by FLDS and LOF for 5 AWID data sets. We can see that the results of FLDS are comparable with those of the LOF algorithm. The parameters used for both methods are shown in Table 4. For some 2D data sets, Figure 6 presents the results obtained by FLDS and LOF. Intuitively, the outliers detected by both methods are mostly similar. We can explain this by the fact that both strategies evaluate whether a point is an outlier based on a local density score.

Figures 7 and 8 illustrate the running time comparison between FLDS and LOF. For the four data sets mentioned above, it can be seen from the figures that the calculation time of FLDS is about 12 times shorter than the running time of LOF. This is a significant improvement over LOF, explained by the fact that the complexity of FLDS is just O(n^1.5), compared with O(n^2) for LOF.

Input: a data set X with n points, the number of nearest neighbors k, the number of clusters nc, the threshold θ
Output: the outliers of X
Process:
(1) Use K-means to split X into nc clusters.
(2) Use the LDS algorithm on each separate cluster to obtain the local outliers (using the threshold θ).
(3) For the local outliers obtained in Step 2, recalculate the LDS value across the whole data set.

ALGORITHM 4: The FLDS algorithm.
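The three steps of Algorithm 4 can be sketched as follows. This is an illustrative reimplementation under simplifying assumptions (Euclidean distance, plain Lloyd's K-means, and a local density score taken as the inverse mean distance to the k nearest neighbors); the exact LDS formula of the paper may differ.

```python
import numpy as np

def kmeans_labels(X, nc, iters=20, seed=0):
    """Step 1: plain Lloyd's K-means, used only to partition X into nc clusters."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), nc, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        for j in range(nc):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def lds(X, k):
    """Local density score: inverse of the mean distance to the k nearest neighbors."""
    d = np.sqrt(((X[:, None] - X[None]) ** 2).sum(-1))
    knn = np.sort(d, axis=1)[:, 1:k + 1]      # column 0 is the zero self-distance
    return 1.0 / (knn.mean(axis=1) + 1e-12)

def flds(X, k, nc, theta):
    """Steps 1-3 of Algorithm 4: split, score locally, then rescore globally."""
    labels = kmeans_labels(X, nc)
    candidates = []
    for j in range(nc):
        idx = np.where(labels == j)[0]
        if len(idx) <= k:                     # tiny cluster: every point is a suspect
            candidates.extend(idx)
            continue
        scores = lds(X[idx], k)               # step 2: local outliers in this cluster
        candidates.extend(idx[scores < theta * np.median(scores)])
    global_scores = lds(X, k)                 # step 3: recalculate on the whole set
    cut = theta * np.median(global_scores)
    return sorted(int(i) for i in candidates if global_scores[i] < cut)
```

For brevity this sketch rescores every point in step 3; FLDS proper recomputes the score only for the candidate points found in step 2, which is what keeps the overall complexity near O(n^1.5).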

Table 1: Main characteristics of the data sets used for clustering evaluation.

ID | Data   | Objects | Attributes | Clusters
1  | Iris   | 150     | 4          | 3
2  | E. coli | 336    | 8          | 8
3  | Wine   | 178     | 13         | 3
4  | Breast | 569     | 30         | 2
5  | Yeast  | 1484    | 8          | 10
6  | AWID1  | 5000    | 35         | 2
7  | AWID2  | 8000    | 35         | 2
8  | AWID3  | 12000   | 35         | 2

For the AWID data sets, the objects are normal + impersonation records.

Table 2: Main characteristics of the data sets for FLDS and LOF.

ID | Data    | Objects | Categories
1  | O-AWID1 | 3030    | Impers., flooding, injection
2  | O-AWID2 | 5030    | Impers., normal, flooding
3  | O-AWID3 | 7040    | Flooding, normal, impers.
4  | O-AWID4 | 10040   | Normal, impers., injection, flooding
5  | O-AWID5 | 15050   | Normal, flooding, injection, and impers.

[Figure 4: Clustering results (Rand Index) obtained by IncrementalDBSCAN and IncrementalSSGC for the 8 data sets of Table 1, respectively.]


4.4. A Framework for Intrusion Detection in 802.11 Networks. In this section, we propose a multistage system based on machine learning techniques, applied to the AWID data set. The detail of our system is presented in Figure 9. Three components are used for the intrusion detection task: a supervised learning model (J48, Bayes, random forest, support vector machine, neural network, etc.) trained on the labeled data set, which can be seen as the misuse detection component; an outlier detection method (LOF, FLDS, etc.), optionally used to detect new attacks over some periods of time; and, because for the AWID data sets, as presented above, it is very difficult to detect impersonation attacks, an incremental clustering algorithm (IncrementalDBSCAN, IncrementalSSGC, etc.) for further finding this kind of attack.
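The three stages can be summarized in control-flow form. The sketch below is our schematic of the framework, with stub callables standing in for the trained classifier, the incremental clustering model, and the offline outlier buffer; the function and parameter names are illustrative.

```python
def multistage_ids(record, classifier, inc_cluster, outlier_buffer):
    """Route one preprocessed record through the three stages of the framework."""
    label = classifier(record)            # stage 1: misuse detection (e.g., J48)
    if label != "normal":
        return label                      # known attack: block immediately
    if inc_cluster(record) == "impersonation":
        return "impersonation"            # stage 2: incremental clustering (ISSGC)
    outlier_buffer.append(record)         # stage 3: kept for an offline FLDS/LOF scan
    return "normal"
```

Only records that pass the first two stages reach the outlier buffer, which is why the expensive outlier detection step can run offline over a chosen time window.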

In this experiment, we use J48 for the misuse detection process and IncrementalSSGC for detecting impersonation attacks. In the outlier detection step, we propose to use FLDS or LOF, and their results have been presented in the subsection above. Because the outlier detection step can be realized offline over some periods of time, we show only the results obtained by combining J48 and IncrementalSSGC.

[Figure 5: Running time (in minutes) of SSGC, IncrementalDBSCAN, and IncrementalSSGC on the AWID1, AWID2, and AWID3 data sets.]

Table 4: The parameters used for the data sets.

Methods          | O-AWID1     | O-AWID2     | O-AWID3     | O-AWID4     | O-AWID5
FLDS (k, nc, θ)  | (25, 30, 6) | (25, 30, 6) | (25, 30, 6) | (25, 45, 6) | (25, 45, 6)
LOF (MinPts, η)  | (27, 1.2)   | (27, 1.2)   | (25, 1.2)   | (25, 1.2)   | (27, 1.2)

k: the number of nearest neighbors; nc: the number of clusters used; θ: the threshold.

Table 3: The HTER measure of LOF and FLDS (the smaller, the better) for some extracted AWID data sets.

Methods | O-AWID1 | O-AWID2 | O-AWID3 | O-AWID4 | O-AWID5
FLDS    | 0.13    | 0.12    | 0.10    | 0.11    | 0.06
LOF     | 0.23    | 0.11    | 0.11    | 0.09    | 0.09


The confusion matrix of these results is illustrated in Table 5. The total accuracy obtained is 98.9%, compared with 96.26% in the paper [14]. We can explain the results obtained by IncrementalSSGC by the fact that the algorithm uses a distance based on shared nearest neighbors, which overcomes the limits of traditional distance measures such as the Euclidean or Minkowski distance; the shared-nearest-neighbors measure does not depend on the density of the data. The proposed system is generally called a hybrid method, which is one of the best strategies in developing intrusion detection systems [7, 9], since no single classifier can exactly detect all kinds of classes.

We also note that, for real applications, whenever an attack appears, the system needs to produce a warning immediately. The multistage system based on machine learning techniques provides users with a solution for constructing a real IDS/IPS system, which is one of the most important problems in network security.

5. Conclusion

This paper introduces an incremental semisupervised graph-based clustering algorithm and a fast outlier detection method.

[Figure 6: Results of LOF (a) and FLDS (b) on some 2D data sets; the outliers are marked with red plus signs.]

[Figure 7: Running time comparison (in minutes) between FLDS and LOF for the four 2D data sets.]

[Figure 8: Running time comparison (in minutes) between FLDS and LOF for the five AWID data sets.]


Both methods can be used in a hybrid framework for the intrusion detection problem on WiFi data sets (AWID). Our proposed multistage system based on machine learning techniques provides a guideline for constructing real IDS/IPS systems, which is one of the most important problems in network security. Experiments conducted on the data sets extracted from AWID and UCI show the effectiveness of our proposed methods. In the near future, we will continue to develop other kinds of machine learning methods for the intrusion detection problem and to test other experimental setups.

Data Availability

The data used to support the findings of this study can be downloaded from the AWID repository (http://icsdweb.aegean.gr/awid/download.html).

[Figure 9: A new framework for intrusion detection in 802.11 networks. Traffic captured between the Internet and the access point is preprocessed and passed to a classifier (J48) trained on labeled data, which blocks known attacks and skips normal traffic; an incremental clustering model (ISSGC) then blocks impersonation attacks; finally, an anomaly detection model (FLDS) flags outliers, which a security expert marks as new attacks and adds to the database of marked attacks used for training and for setting the parameters of the incremental clustering.]

Table 5: Confusion matrix for the AWID data set using J48 and IncrementalSSGC in the proposed framework.

Classification | Normal | Flooding | Injection | Impersonation
Normal         | 530588 | 116      | 6         | 75
Flooding       | 2553   | 5544     | 0         | 0
Injection      | 2      | 0        | 16680     | 0
Impersonation  | 3297   | 148      | 0         | 16364
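The reported 98.9% accuracy can be reproduced from Table 5, assuming the rows and columns are ordered so that the diagonal holds the correctly classified counts:

```python
import numpy as np

# Counts from Table 5; rows are true classes, columns predicted classes,
# both in the order normal, flooding, injection, impersonation (assumed).
cm = np.array([
    [530588, 116,   6,     75],
    [2553,   5544,  0,     0],
    [2,      0,     16680, 0],
    [3297,   148,   0,     16364],
])
accuracy = np.trace(cm) / cm.sum()   # correctly classified / all records
```

The ratio evaluates to roughly 0.989, matching the accuracy quoted in the text.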


Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

References

[1] K. P. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, Cambridge, MA, USA, 2012.

[2] B. Ahmad, W. Jian, and Z. Anwar Ali, "Role of machine learning and data mining in internet security: standing state with future directions," Journal of Computer Networks and Communications, vol. 2018, Article ID 6383145, 10 pages, 2018.

[3] T. Bakhshi and B. Ghita, "On internet traffic classification: a two-phased machine learning approach," Journal of Computer Networks and Communications, vol. 2016, Article ID 2048302, 21 pages, 2016.

[4] B. Luo and J. Xia, "A novel intrusion detection system based on feature generation with visualization strategy," Expert Systems with Applications, vol. 41, no. 9, pp. 4139–4147, 2014.

[5] W.-C. Lin, S.-W. Ke, and C.-F. Tsai, "CANN: an intrusion detection system based on combining cluster centers and nearest neighbors," Knowledge-Based Systems, vol. 78, pp. 13–21, 2015.

[6] C.-F. Tsai and C.-Y. Lin, "A triangle area based nearest neighbors approach to intrusion detection," Pattern Recognition, vol. 43, no. 1, pp. 222–229, 2010.

[7] F. Kuang, S. Zhang, Z. Jin, and W. Xu, "A novel SVM by combining kernel principal component analysis and improved chaotic particle swarm optimization for intrusion detection," Soft Computing, vol. 19, no. 5, pp. 1187–1199, 2015.

[8] M. E. Aminanto and K. Kim, "Detecting impersonation attack in WiFi networks using deep learning approach," in Information Security Applications, D. Choi and S. Guilley, Eds., vol. 10144, Springer, Berlin, Germany, 2016.

[9] A. A. Aburomman and M. B. I. Reaz, "A novel SVM-kNN-PSO ensemble method for intrusion detection system," Applied Soft Computing, vol. 38, pp. 360–372, 2016.

[10] M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander, "LOF: identifying density-based local outliers," in Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, pp. 93–104, Dallas, TX, USA, May 2000.

[11] V. Hautamaki, I. Karkkainen, and P. Franti, "Outlier detection using k-nearest neighbour graph," in Proceedings of the 17th International Conference on Pattern Recognition, pp. 430–433, Cambridge, MA, USA, August 2004.

[12] V. Thang and F. F. Pashchenko, "A new incremental semisupervised graph based clustering," in Proceedings of the IEEE International Conference on Engineering and Telecommunication, Moscow, Russia, March 2018.

[13] V. Thang, D. V. Pantiukhin, and A. N. Nazarov, "FLDS: fast outlier detection based on local density score," in Proceedings of the IEEE International Conference on Engineering and Telecommunication, pp. 137–141, Moscow, Russia, November 2016.

[14] C. Kolias, G. Kambourakis, A. Stavrou, and S. Gritzalis, "Intrusion detection in 802.11 networks: empirical evaluation of threats and a public dataset," IEEE Communications Surveys & Tutorials, vol. 18, no. 1, pp. 184–208, 2016.

[15] Ch. Gupta and R. Grossman, "GenIc: a single pass generalized incremental algorithm for clustering," in Proceedings of the Fourth SIAM International Conference on Data Mining, pp. 147–153, Lake Buena Vista, FL, USA, April 2004.

[16] M. Ester, H.-P. Kriegel, J. Sander, M. Wimmer, and X. Xu, "Incremental clustering for mining in a data warehousing environment," in Proceedings of the International Conference on Very Large Data Bases, pp. 323–333, New York, NY, USA, August 1998.

[17] V. Chandrasekhar, C. Tan, M. Wu, L. Li, X. Li, and J.-H. Lim, "Incremental graph clustering for efficient retrieval from streaming egocentric video data," in Proceedings of the International Conference on Pattern Recognition, pp. 2631–2636, Stockholm, Sweden, August 2014.

[18] A. M. Bagirov, J. Ugon, and D. Webb, "Fast modified global K-means algorithm for incremental cluster construction," Pattern Recognition, vol. 44, no. 4, pp. 866–876, 2011.

[19] A. Bryant, D. E. Tamir, N. D. Rishe, and K. Abraham, "Dynamic incremental fuzzy C-means clustering," in Proceedings of the Sixth International Conference on Pervasive Patterns and Applications, Venice, Italy, May 2014.

[20] Z. Yu, P. Luo, J. You, et al., "Incremental semi-supervised clustering ensemble for high dimensional data clustering," IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 3, pp. 701–714, 2016.

[21] V. Chandola, A. Banerjee, and V. Kumar, "Anomaly detection: a survey," ACM Computing Surveys, vol. 41, no. 3, pp. 1–58, 2009.

[22] J. Tang, Z. Chen, A. Fu, and D. Cheung, "Enhancing effectiveness of outlier detections for low density patterns," in Proceedings of the Sixth Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Taipei, Taiwan, May 2002.

[23] S. Papadimitriou, H. Kitagawa, P. B. Gibbons, and C. Faloutsos, "LOCI: fast outlier detection using the local correlation integral," in Proceedings of the 19th International Conference on Data Engineering, pp. 315–326, Bangalore, India, March 2003.

[24] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, "A density-based algorithm for discovering clusters in large spatial databases with noise," in Proceedings of the Conference on Knowledge Discovery and Data Mining (KDD), pp. 226–231, Portland, OR, USA, August 1996.

[25] L. Ertoz, M. Steinbach, and V. Kumar, "Finding clusters of different sizes, shapes, and densities in noisy, high dimensional data," in Proceedings of the SIAM International Conference on Data Mining, pp. 47–58, San Francisco, CA, USA, May 2003.

[26] E. M. Knorr, R. T. Ng, and V. Tucakov, "Distance-based outliers: algorithms and applications," The International Journal on Very Large Data Bases, vol. 8, no. 3-4, pp. 237–253, 2000.

[27] S. Basu, I. Davidson, and K. L. Wagstaff, Constrained Clustering: Advances in Algorithms, Theory, and Applications, Chapman and Hall/CRC Data Mining and Knowledge Discovery Series, CRC Press, Boca Raton, FL, USA, 1st edition, 2008.

[28] A. A. Abin, "Clustering with side information: further efforts to improve efficiency," Pattern Recognition Letters, vol. 84, pp. 252–258, 2016.

[29] Y. Shi, C. Otto, and A. K. Jain, "Face clustering: representation and pairwise constraints," IEEE Transactions on Information Forensics and Security, vol. 13, no. 7, pp. 1626–1640, 2018.

[30] A. A. Abin and B. Hamid, "Active constrained fuzzy clustering: a multiple kernels learning approach," Pattern Recognition, vol. 48, no. 3, pp. 953–967, 2015.

[31] S. Xiong, J. Azimi, and X. Z. Fern, "Active learning of constraints for semi-supervised clustering," IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 1, pp. 43–54, 2014.

[32] K. Wagstaff, C. Cardie, S. Rogers, and S. Schrodl, "Constrained K-means clustering with background knowledge," in Proceedings of the International Conference on Machine Learning (ICML), pp. 577–584, Williamstown, MA, USA, June 2001.

[33] I. Davidson and S. S. Ravi, "Using instance-level constraints in agglomerative hierarchical clustering: theoretical and empirical results," Data Mining and Knowledge Discovery, vol. 18, no. 2, pp. 257–282, 2009.

[34] B. Kulis, S. Basu, I. Dhillon, and R. Mooney, "Semi-supervised graph clustering: a kernel approach," Machine Learning, vol. 74, no. 1, pp. 1–22, 2009.

[35] V.-V. Vu, "An efficient semi-supervised graph based clustering," Intelligent Data Analysis, vol. 22, no. 2, pp. 297–307, 2018.

[36] X. Wang, B. Qian, and I. Davidson, "On constrained spectral clustering and its applications," Data Mining and Knowledge Discovery, vol. 28, no. 1, pp. 1–30, 2014.

[37] D. Mavroeidis, "Accelerating spectral clustering with partial supervision," Data Mining and Knowledge Discovery, vol. 21, no. 2, pp. 241–258, 2010.

[38] L. Lelis and J. Sander, "Semi-supervised density-based clustering," in Proceedings of the IEEE International Conference on Data Mining, pp. 842–847, Miami, FL, USA, December 2009.

[39] D.-D. Le and S. Satoh, "Unsupervised face annotation by mining the web," in Proceedings of the 8th IEEE International Conference on Data Mining, pp. 383–392, Pisa, Italy, December 2008.

[40] D. Yan, L. Huang, and M. I. Jordan, "Fast approximate spectral clustering," in Proceedings of the Conference on Knowledge Discovery and Data Mining (KDD), pp. 907–916, New York, NY, USA, June 2009.

[41] C. Zhong, M. Malinen, D. Miao, and P. Franti, "A fast minimum spanning tree algorithm based on K-means," Information Sciences, vol. 295, pp. 1–17, 2015.

[42] V. Chaoji, M. Al Hasan, S. Salem, and M. J. Zaki, "SPARCL: an effective and efficient algorithm for mining arbitrary shape-based clusters," Knowledge and Information Systems, vol. 21, no. 2, pp. 201–229, 2009.

[43] A. Asuncion and D. J. Newman, UCI Machine Learning Repository, American Statistical Association, Boston, MA, USA, 2015, http://archive.ics.uci.edu/ml/index.php.

[44] G. Karypis, E.-H. Han, and V. Kumar, "Chameleon: hierarchical clustering using dynamic modeling," Computer, vol. 32, no. 8, pp. 68–75, 1999.




is an attack. In some cases, the IDS can even immediately block the connection.

Generally, the data mining task in IDS must detect two kinds of attacks: known attacks and outlier (anomaly) attacks. For the known attacks, we can use a (semi-)supervised learning method such as a neural network, support vector machine, random forest, decision tree, or naïve Bayes, to mention a few, to construct a classifier from the training data (labeled normal/attack connections) [4–7, 9]. The trained classifier is used for detecting new connections; this supervised learning model is illustrated in Figure 1. With the outlier attacks, for which we do not know the labels, the trained classifier cannot detect them. In this case, we have to use another kind of machine learning, called unsupervised outlier detection, such as LOF [10], ODIN [11], and so on. The outlier detection process can be realized offline for some periods of time defined by users/experts. The general schema for outlier detection is presented in Figure 2; this is the unsupervised learning model. The aim of this schema is to detect outliers in a period of time. For example, in IDS systems, the users can set a period of time from u to v for capturing the data; the data are then transformed by the preprocessing step, and finally we can use an outlier detection method to detect attacks from the observed data.

The contributions of our paper are as follows:

(i) We propose an incremental semisupervised graph-based clustering. To the best of our knowledge, this is the first incremental semisupervised clustering algorithm. The preliminary work is presented in [12].

(ii) We introduce a fast outlier detection method based on a local density score and the K-means clustering algorithm. The preliminary work is introduced in [13].

(iii) We propose a multistage system based on machine learning techniques, which can boost the accuracy of the intrusion detection process for the 802.11 WiFi data set.

(iv) The experiments carefully conducted on data sets extracted from the Aegean WiFi Intrusion Dataset (AWID) [14] show the effectiveness of our proposed algorithms. AWID is a publicly available collection of data sets which contain real traces of both normal and intrusive 802.11 traffic. To date, AWID is one of the standard data sets for evaluating the capacity of IDS systems.

This paper is organized as follows: Section 2 presents the related work; Section 3 introduces the new incremental semisupervised clustering method and a new fast outlier detection algorithm; Section 4 presents experiments for the proposed algorithms and proposes a hybrid framework applied to the AWID data set; finally, Section 5 concludes the paper and presents some directions for further research.

2. Incremental Clustering and Outlier Detection

2.1. Incremental Clustering. Clustering is the task of partitioning a data set into k clusters in which the points in the same cluster are similar and the points in different clusters are dissimilar. The context of incremental clustering is as follows: given some current clusters, incremental clustering is a one-pass kind of clustering which aims to identify the cluster labels of incoming data points. Incremental clustering is very useful for data streams or dynamic data (data warehouses). In general, incremental clustering combines two processes: insertion and deletion. Given a set of clusters, the insertion step aims to identify the label of a new data point based on the current clusters; in some cases, new clusters will be created, or the new data points will be integrated into the current clusters. With the deletion process, if we want to remove one or several data points, we need to re-form the clusters, because some clusters may be affected by these operations. For each kind of clustering, some incremental algorithms have been proposed in the literature, such as Incremental K-means [15], IncrementalDBSCAN [16], and incremental graph clustering [17]. The key idea of these algorithms is to identify, for each kind of algorithm, the situations that arise in the insertion step and the deletion step. Incremental clustering addresses the problem of identifying the label of a new data object, or of updating the clusters when we remove points from the current clusters. This problem is very meaningful when we tackle big data, in which the data set is too big to fit into the available memory.

In [16], the incremental density-based clustering algorithm (IncrementalDBSCAN) is introduced. Based on the notion of density-based clustering, IncrementalDBSCAN can efficiently add and delete points in the current clusters. The process of adding a new point covers several cases: for example, the new point can be noise, it can be added to an existing cluster, or it can merge some clusters. For the deletion process, the removed point can be a noise point, and its removal can split a cluster or leave the current clusters unaffected. Some cases of the insertion and deletion processes of IncrementalDBSCAN are shown in Figure 3.

In [15], a single-pass incremental clustering method for large data sets based on K-means is introduced, named GenIC. GenIC updates each center with each new data point and merges clusters only at the end of a generation (i.e., a window of data). As a generalized incremental algorithm, GenIC can move a center in the list of centers using a weighted sum of the existing center and the new point presented. The idea of GenIC is to divide the stream of data into chunks or windows, as is common with streaming algorithms. We view each chunk of n data points as a generation and think of the "fitness" of a center as being measured by the number of points assigned to it. In general, the fittest centers survive to the next generation, but occasionally new centers are selected and old centers are killed off. GenIC was compared with K-means and shown to be effective in running time and less affected by the choice of initial centers than K-means. In [18], a version of incremental K-means clustering is also proposed. In the


algorithm, clusters are built incrementally by adding one cluster center at a time. In [19], a novel two-phase static single-pass algorithm, as well as a dynamic two-phase single-pass algorithm based on Fuzzy C-means, has been presented and shows high utility. The idea behind the multistage methods reported in that paper is that an estimate of the partition matrix and the location of the cluster centers can be obtained by clustering a sample of the data. A small sample is expected to produce a fast yet less reliable estimation of the cluster centers. This leads to a multistage approach.

[Figure 1: A general model for misuse detection in IDS: traffic captured from the Internet (by NetFlow tools or tcpdump) is preprocessed and used by a (semi-)supervised learning method to train a model that separates normal traffic from attacks.]

[Figure 2: A general model for outlier detection in IDS: captured traffic is preprocessed and passed to an outlier detection model that separates normal traffic from attacks.]

[Figure 3: Insertion cases (noise, creation, absorption, merge) and deletion cases (removal, reduction, split, nonsplit) of IncrementalDBSCAN [16].]


The multistage approach involves several stages of sampling (with replacement) of the data and estimating the membership matrix for the next stage. The experiments conducted show the effectiveness of the proposed method. In [17], Chandrasekhar et al. propose an incremental local density clustering scheme, ILDC, for finding dense subgraphs in streaming data, i.e., when data arrive incrementally. The incremental clustering scheme captures redundancy in the streaming data source by finding dense subgraphs, which correspond to salient objects and scenes. The ILDC process performs greedy operations such as cluster expansion, cluster addition, and cluster merging, based on the similarity defined between clusters. ILDC shows its effectiveness when used in image-retrieval applications. In [20], an incremental semisupervised ensemble clustering algorithm, named ISSCE, has been successfully presented. ISSCE uses constraints to update incremental members. The authors develop an incremental ensemble member selection process based on a global objective function and a local objective function to remove redundant ensemble members. The experimental results show the improvement of ISSCE over traditional semisupervised clustering ensemble approaches and conventional cluster ensemble methods on six real-world data sets from the UCI machine learning repository and 12 real-world data sets of cancer gene expression profiles. In the context of classification, we need to find the label of a new data object by using a classifier trained on the training data; the problem of identifying the label of a new object in incremental clustering can be seen as similar to the classification context.
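The single-pass flavor of the GenIC-style incremental K-means algorithms described above rests on an online mean update: each new point moves its nearest center without revisiting earlier data. A minimal sketch of that update (our illustration, not code from any of the cited papers) is:

```python
def update_center(center, count, x):
    """Fold a new point x into the running mean of its assigned cluster.
    After the update, `center` equals the mean of all `count` points seen."""
    count += 1
    center = [c + (xi - c) / count for c, xi in zip(center, x)]
    return center, count
```

Because the update needs only the current center and a point count, the stream never has to be stored, which is what makes one-pass clustering of large or unbounded data feasible.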

2.2. Outlier Detection Problem. Outlier (anomaly) detection is one of the important problems of machine learning and data mining. As mentioned in [21], outlier detection is the problem of finding patterns in data that do not conform to expected behavior. Applications of outlier detection can be found in many domains, such as intrusion detection, credit fraud detection, video surveillance, weather prediction, and discovery of criminal activities in electronic commerce [9, 21]. There are several kinds of outliers, including point outliers, contextual outliers, and collective outliers. In this paper, we focus on point outlier detection, which can be applied in a variety of applications. For a data set consisting of points, a point is called an outlier if it is different from a large number of the rest of the points. To detect outliers, there are several principal families of methods in the literature, such as classification methods, nearest neighbor methods, clustering methods, statistical methods, and distance-based methods.

For classification-based outlier detection, we have two categories: multiclass and one-class anomaly detection methods. In multiclass classification techniques, we assume that the training data contain labeled points of all normal classes. The learner, using a supervised learning model, trains a model on the labeled data; the classifier can distinguish between each normal class and the rest of the classes. A test point is called an outlier if it does not belong to any normal class. In one-class outlier detection methods, we assume that there is only one normal class. The classifier learns a model that can detect the boundary of the normal class; if a test point does not fall within this boundary, it is called an outlier. Although many techniques of this kind have been developed, their main disadvantage is their reliance on the availability of accurate labels for the normal classes, which is not easy to satisfy in real applications.

Nearest neighbor-based outlier detection methods use the following assumption: normal points belong to dense regions, while outliers belong to sparse regions. The most famous method of this kind is the LOF algorithm. The idea of LOF is based on a local density evaluation score for points: each point is assigned a score which is the ratio of the average local density of the k-nearest neighbors of the point to the local density of the point itself. Many variants of LOF can be cited here, such as COF [22], ODIN [11], LOCI [23], etc. The main drawback of these methods is the O(n²) complexity required.

For clustering-based outlier detection techniques, the idea is to use clustering methods to group the data into clusters; the points that do not belong to any cluster are called outliers. Some clustering methods, such as DBSCAN [24] and SNN [25], can detect outliers in this way. However, the purpose of clustering is to find clusters, so the outliers are just a by-product of the clustering process and hence are not carefully optimized. A further drawback is that these clustering techniques also require O(n²) complexity.

Statistical outlier detection methods are based on the following assumption: normal data points occur in the high-probability regions of a stochastic model, while anomalies occur in its low-probability regions. Several methods of this kind have been proposed. In general, statistical methods fit a statistical model (a Gaussian distribution, a mixture of parametric statistical distributions, etc.) to the given data and then apply a statistical inference test to determine whether an unseen instance belongs to this model or not. The key limitation of these methods is the assumption about the distribution of the data points; this assumption often does not hold, especially when the dimension of the data is high [21].

In distance-based outlier detection methods, a point is considered an outlier if fewer than a fraction pct of the points in the data set lie within a distance smaller than the threshold value dmin from it [26].
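This criterion is simple to state in code. The following sketch (an assumption-level illustration, not the exact procedure of [26]) flags a point as an outlier when fewer than a fraction pct of the remaining points lie within distance dmin of it:

```python
import numpy as np

def distance_based_outliers(X, dmin, pct):
    """Flag x as an outlier when fewer than a fraction `pct` of the
    other points lie within distance `dmin` of x (cf. [26])."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Pairwise Euclidean distances.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    outliers = []
    for i in range(n):
        # Count neighbors within dmin, excluding the point itself.
        inside = np.sum(d[i] <= dmin) - 1
        if inside / (n - 1) < pct:
            outliers.append(i)
    return outliers

points = [[0.0], [1.0], [2.0], [100.0]]
print(distance_based_outliers(points, dmin=5.0, pct=0.5))  # [3]
```

The naive pairwise-distance matrix makes the quadratic cost of this family of methods explicit.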

3. Proposed Method

3.1. Semisupervised Graph-Based Clustering. In recent years, semisupervised clustering has become an important research topic, as illustrated by a number of studies [27]. The purpose of semisupervised clustering is to integrate side information to improve the clustering performance. Generally, there are two kinds of side information: constraints and seeds. Given a data set X, constraints involve must-link and cannot-link pairs, in which the must-link constraint

4 Journal of Computer Networks and Communications

(ML) between two observations x ∈ X and y ∈ X means that x and y should be in the same cluster, and the cannot-link constraint (CL) means that x and y should not be in the same cluster. With seeds, a small set of labeled data (called seeds) S ⊆ X is provided to the semisupervised clustering algorithm. In practice, this side information is available or can be collected from users [28–31]. We can cite here the work on semisupervised clustering for K-means [32], hierarchical clustering [33], graph-based clustering [34, 35], spectral clustering [36, 37], density-based clustering [38], etc. While many semisupervised clustering algorithms have been introduced, to the best of our knowledge there are no incremental semisupervised clustering algorithms in the literature.

Our new incremental clustering, introduced in the next section, is based on semisupervised graph-based clustering using seeds (SSGC). We choose SSGC because it has several advantages: it uses only one parameter, and it can detect clusters in regions of varied density [35]. SSGC includes two steps, described below (see Algorithm 1).

Step 1. Given a k-nearest neighbor graph representing a data set X, this step uses a loop in which, at each iteration, all edges with weight less than a threshold θ are removed. The value of θ is initialized to 0 at the first iteration and incremented by 1 after each one. The loop stops when each connected component contains at most one kind of seed. The main clusters are identified by propagating labels within each connected component that contains seeds.

Step 2. The remaining points (graph nodes) that do not belong to any main cluster fall into two kinds: points that have edges relating them to one or more clusters, and isolated points. In the first case, the point is assigned to the cluster with the largest related weight. The isolated points can either be removed as outliers or be labeled.

We note that in SSGC the weight ω(xi, xj) of the edge (the similarity) between two points xi and xj in the k-nearest neighbor graph is equal to the number of k-nearest neighbors that the two points share, as given in the following equation:

ω(xi, xj) = |NN(xi) ∩ NN(xj)|, (1)

where NN(·) denotes the set of k-nearest neighbors of the specified point.
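Equation (1) is just the size of an intersection of neighbor sets. A minimal example, with hypothetical 3-NN lists:

```python
def omega(u, v, nn):
    """Edge weight of equation (1): the number of k-nearest neighbors
    shared by u and v; `nn` maps each point to its k-NN set."""
    return len(nn[u] & nn[v])

# Hypothetical 3-NN lists for two points.
nn = {"a": {"b", "c", "d"}, "b": {"a", "c", "e"}}
print(omega("a", "b", nn))  # the only shared neighbor is "c" -> 1
```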

SSGC is efficient compared with semisupervised density-based clustering in detecting clusters in batch data; however, it is not adapted to data stream or data warehousing environments, where many updates (insertions, deletions) occur.

3.2. Incremental Graph-Based Clustering Using Seeds. In this section, we propose IncrementalSSGC, based on the SSGC algorithm. In IncrementalSSGC, the seeds are used to build a k-nearest neighbor graph, construct the connected components, and identify the value of θ, as in the SSGC algorithm. Like other incremental clustering algorithms, two procedures must be developed: insertion and deletion.

Algorithm 2 shows the insertion step of IncrementalSSGC for a new data point xnew. At first, the list of edges between xnew and the current clusters is created, and all edges with weight smaller than θ are removed. If the list is empty, xnew is an outlier in the current situation, and hence xnew is added to a temporary list Lo. If edges exist between xnew and some connected components, we remove edges, in ascending order of weight, until xnew connects only to components with one kind of label. Finally, the label of xnew is identified by the label of its connected components. In Step 10, xnew and its related edges are added to L; some edges between points xt and xl are also recalculated if xnew appears in the nearest neighbor lists of xt or xl. In Step 12, after some insertion steps, we can examine the points in Lo.

Algorithm 3 presents the detailed steps of the deletion process. When we want to remove a point xdel from the current clusters, we simply remove xdel and all edges related to xdel from the graph. Step 2 of the algorithm shows the updating process: we need to update all edges affected by xdel. This means that every edge between xi and xj must be updated if xdel appears in their common list of nearest neighbors. Finally, Step 3 simply removes all updated edges that have weight less than θ.

3.2.1. Complexity Analysis. We now analyse the complexity of IncrementalSSGC. Given a data set with n objects, we recall that the complexity of SSGC is O(k × n²), in which k is the number of nearest neighbors. Assuming that the current clusters include n objects, we analyse the complexity of the insertion and deletion processes of IncrementalSSGC at step (n + 1) as follows.

For the insertion process, which aims to identify the cluster label for a new data point xnew: in Step 1, creating the list of edges between xnew and the current clusters costs O(n × k). In Steps 2, 6, and 7, the complexity is just O(k). In Step 10, some edges between xt and xl are recalculated if xnew appears in the nearest neighbor lists of xt or xl; in fact, the number of such edges is small. So, for the insertion of a new point, the complexity is O(n × k).

For the deletion process, the complexity of Step 1 is O(k). In Steps 2 and 3, the number of edges updated is the number of edges that have xdel as a common neighbor, and this number depends on the data set. Let q be its average value over deletion processes; in practice, q is determined by performing experiments. So the complexity of a deletion process is O(q × n × k).

In summary, from the analysis of the insertion and deletion processes above, we can see that IncrementalSSGC is very useful for data sets that need frequent updates. In the next section, we also present the running time of both SSGC and IncrementalSSGC for some data sets extracted from the intrusion detection problem.


3.3. A Fast Outlier Detection Method. Given a k-nearest neighbor graph (k-NNG), the local density score LDS of a vertex u ∈ k-NNG is defined as follows [39]:

LDS(u) = (1/k) Σ_{q ∈ NN(u)} ω(u, q), (2)

in which ω is calculated as in equation (1) and k is the number of nearest neighbors used. The LDS is an indicator of the density of the region around a vertex u. The LDS value lies in the interval [0, k − 1]: the larger the LDS of u, the denser the region that u belongs to, and vice versa. So we can apply the LDS calculation to identify outliers. To detect outliers with this method, we have to use a parameter as a threshold: a point whose LDS value is smaller than the threshold is considered an outlier, and vice versa. Similar to LOF, this method requires O(n²) complexity.

Input: X, number of neighbors k, a set of seeds S
Output: a set of detected clusters + outliers
Process:
(1) Construct the k-NN graph of X
(2) θ ← 0
(3) repeat
(4)   Construct the connected components using the threshold θ
(5)   θ ← θ + 1
(6) until the cut condition is satisfied
(7) Propagate the labels to form the principal clusters
(8) Construct the final clusters

Algorithm 1: The algorithm SSGC [35].
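The loop of Algorithm 1 can be sketched as follows. This is a simplified toy version under explicit assumptions: the data layout and seed choice are illustrative, the cut condition is taken as "every connected component holds at most one seed label", and leftover handling (Step 2 of SSGC) is reduced to labeling whole components:

```python
import numpy as np

def ssgc_sketch(X, k, seeds):
    """Toy SSGC (Algorithm 1): build a k-NN graph with shared-neighbor
    weights, raise theta until each connected component holds at most
    one kind of seed label, then propagate the seed labels."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    nn = [set(np.argsort(d[i])[:k]) for i in range(n)]
    # k-NN graph edges weighted by equation (1).
    edges = {(i, j): len(nn[i] & nn[j])
             for i in range(n) for j in range(i + 1, n)
             if j in nn[i] or i in nn[j]}

    def components(theta):
        # Connected components over edges with weight >= theta (union-find).
        parent = list(range(n))
        def find(a):
            while parent[a] != a:
                parent[a] = parent[parent[a]]
                a = parent[a]
            return a
        for (i, j), w in edges.items():
            if w >= theta:
                parent[find(i)] = find(j)
        comps = {}
        for i in range(n):
            comps.setdefault(find(i), []).append(i)
        return list(comps.values())

    theta = 0
    while True:
        comps = components(theta)
        if all(len({seeds[i] for i in c if i in seeds}) <= 1 for c in comps):
            break
        theta += 1
    # Propagate each component's seed label (None if it has no seed).
    labels = {}
    for c in comps:
        lab = next((seeds[i] for i in c if i in seeds), None)
        for i in c:
            labels[i] = lab
    return labels

# Two well-separated blobs of five points; one seed in each.
X = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5),
     (10, 0), (11, 0), (10, 1), (11, 1), (10.5, 0.5)]
labels = ssgc_sketch(X, k=4, seeds={0: "A", 5: "B"})
print([labels[i] for i in range(10)])  # five "A" then five "B"
```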

Input: a new data object xnew; a set of current clusters C; a list L containing the edges of each point of the current clusters; θ (threshold); the number of nearest neighbors k
Output: label for xnew
Process:
(1) Create the k-nearest neighbor list of edges (LE) between xnew and all current clusters
(2) Delete all (u, v) ∈ LE with weight(u, v) < θ
(3) if (LE = ∅) then
(4)   xnew is added to a temporary list Lo
(5) else
(6)   if xnew relates to two or more components with different labels then
(7)     Delete edges in LE in ascending order of weight until xnew connects to components with at most one kind of label
(8)   end if
(9)   Get the label for xnew and its connected points (if any) by propagating
(10)  Update the list L: add the edges relating to xnew to L; some edges between xt and xl are also recalculated if xnew appears in the nearest neighbor lists of xt or xl
(11) end if
(12) Examine the points in Lo

Algorithm 2: IncrementalSSGC: insertion process.

Input: an object xdel (in a component) to be deleted; a set of current clusters C; a list L of the edges of each point of the current clusters; θ (threshold)
Output: the updated C; the updated L
Process:
(1) Delete xdel and all edges related to xdel in L
(2) Update all weights (k, l) ∈ C with xdel ∈ NN(k) ∩ NN(l)
(3) Delete all edges (k, l) updated at Step 2 with weight(k, l) < θ

Algorithm 3: IncrementalSSGC: deletion process.
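A minimal sketch of the deletion update of Algorithm 3, assuming the graph is stored as k-NN sets plus a dict of shared-neighbor edge weights (the toy neighbor lists below are illustrative, not a real graph from the paper):

```python
def delete_point(x_del, nn, edges, theta):
    """IncrementalSSGC deletion (Algorithm 3): remove x_del, decrement
    the weight of every edge whose endpoints both had x_del as a
    neighbor, then drop the updated edges that fall below theta."""
    # Step 1: remove every edge incident to x_del.
    edges = {e: w for e, w in edges.items() if x_del not in e}
    updated = set()
    # Step 2: update edges for which x_del was a shared neighbor.
    for (u, v) in edges:
        if x_del in nn[u] and x_del in nn[v]:
            edges[(u, v)] -= 1
            updated.add((u, v))
    # Remove x_del from the neighbor structure itself.
    nn.pop(x_del, None)
    for s in nn.values():
        s.discard(x_del)
    # Step 3: drop the updated edges whose weight fell below theta.
    return {e: w for e, w in edges.items()
            if e not in updated or w >= theta}

# Toy graph: edge (a, b) has weight 2 because a and b share {c, x}.
nn = {"a": {"b", "c", "x"}, "b": {"a", "c", "x"},
      "c": {"a", "b"}, "x": {"a", "b"}}
edges = {("a", "b"): 2, ("a", "c"): 1, ("b", "c"): 1}
print(delete_point("x", nn, edges, theta=2))
```

Deleting "x" drops the weight of (a, b) from 2 to 1, which then falls below θ = 2 and is removed, while the untouched edges survive.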


To reduce the running time of this method, we propose a fast outlier detection method based on the local density score, called FLDS. The basic idea of FLDS is to use a divide-and-conquer strategy. Given a data set X in which to find outliers, the input data set is first split into k clusters using the K-means algorithm. Next, a k-nearest neighbor graph is built for each cluster and outliers are identified within each local cluster. The outliers found in all clusters are then re-evaluated on the whole data set. The idea of a divide-and-conquer strategy using K-means in the preprocessing step has been successfully applied to several problems, such as fast spectral clustering [40], fast minimum spanning tree construction [41], and efficient and effective shape-based clustering [42]. The FLDS algorithm is described in Algorithm 4.

The FLDS algorithm is thus an outlier detection method based on K-means and a graph-based local density score. The complexity of FLDS is O(n × k) + O(k²) + O(t × n), in which the value of k may be chosen up to n^0.5 [41, 42] and t ≪ n is evaluated as approximately equal to k, so the complexity of FLDS is O(n^1.5).

4. Experiment Results

This section aims to evaluate the effectiveness of our proposed algorithms. We show the results of IncrementalSSGC, the results of FLDS, and the results of using our methods in a hybrid framework for the intrusion detection problem. IncrementalSSGC is compared with IncrementalDBSCAN, while FLDS is compared with LOF.

The data sets used in the experiments are mostly extracted from the Aegean WiFi Intrusion Dataset (AWID) [14]. AWID is a publicly available collection of data sets in an easily distributed format, which contain real traces of both normal and intrusive 802.11 traffic. In AWID, many kinds of attacks have been introduced, and they fall into four main categories, including flooding, injection, and impersonation. AWID has 156 attributes; we use 35 attributes extracted by an artificial neural network, as presented in [8]. We also use some supplementary data sets from UCI [43] as well as data sets with different sizes, shapes, and densities that contain noise points and special artifacts [44].

4.1. Experiment Setup

4.1.1. Data Sets for Incremental Clustering Algorithms. To show the effectiveness of IncrementalSSGC, two aspects are examined: the running time and the accuracy. Five UCI data sets and three data sets extracted from AWID are used for testing IncrementalSSGC and IncrementalDBSCAN. The details of these data sets are presented in Table 1.

To evaluate the clustering results, the Rand Index is used. Given a data set X with n points to cluster, let P1 be an array containing the true labels and P2 an array containing the results of a clustering algorithm; the Rand Index (RI) is calculated as follows:

RI = (a + b) / (n(n − 1)/2), (3)

in which a (respectively, b) is the number of pairs of points that are in the same (respectively, different) clusters in both partitions P1 and P2. The bigger the Rand Index, the better the result.

4.1.2. Data Sets for FLDS and LOF. We used five data sets extracted from AWID and four 2D data sets, DS1 (10000 points), DS2 (8000 points), DS3 (8000 points), and DS4 (8000 points) [44], for FLDS and LOF. These 2D data sets have clusters of different size, shape, and orientation, as well as random noise points and special artifacts. The details of the AWID data sets are presented in Table 2.

To compare LOF and FLDS on the AWID data sets, we use a ROC-style measure with two factors: the False Positive (false alarm) Rate (FPR) and the False Negative (miss detection) Rate (FNR). These factors are defined in the following equations:

FPR = FP / (FP + TN), (4)

FNR = FN / (TP + FN), (5)

in which True Positive (TP) is the number of attacks correctly classified as attacks, True Negative (TN) is the number of normal records correctly detected as normal, False Positive (FP) is the number of normal records falsely classified as attacks (false alarms), and False Negative (FN) is the number of attacks falsely detected as normal.

To combine the FPR and FNR values, we calculate the Half Total Error Rate (HTER), similar to the evaluation method used in [11], defined as follows:

HTER = (FPR + FNR) / 2. (6)
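Equations (4)–(6) amount to a few lines; the counts below are made-up numbers for illustration:

```python
def hter(tp, tn, fp, fn):
    """Half Total Error Rate of equation (6) from the confusion counts
    used in equations (4) and (5)."""
    fpr = fp / (fp + tn)   # false alarm rate, equation (4)
    fnr = fn / (tp + fn)   # miss detection rate, equation (5)
    return fpr, fnr, (fpr + fnr) / 2

# Hypothetical counts: 90 attacks caught, 10 missed,
# 80 normals kept, 20 false alarms.
print(hter(tp=90, tn=80, fp=20, fn=10))  # FPR 0.2, FNR 0.1, HTER 0.15
```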

4.2. Clustering Results. We note that there is no incremental semisupervised clustering algorithm in the literature, so we compare the performance of our algorithm with that of IncrementalDBSCAN, which can be seen as the state of the art among the incremental clustering algorithms proposed: it can detect clusters of different size and shape in the presence of noise. Because SSGC and IncrementalSSGC produce the same results, we only show the results of IncrementalSSGC and IncrementalDBSCAN. The results are shown in Figure 4.

We can see from the figure that IncrementalSSGC obtains better results than IncrementalDBSCAN. This can be explained by the fact that IncrementalDBSCAN cannot detect clusters with different densities; as mentioned in that paper, "we assumed that the parameter values Eps and MinPts of DBSCAN do not change significantly when inserting and deleting objects."


This assumption means that IncrementalDBSCAN cannot work well with data sets having different densities. In contrast to IncrementalDBSCAN, IncrementalSSGC does not depend on the density of the data, because the similarity measure used is based on shared nearest neighbors.

4.2.1. Running Time Comparison. Figure 5 presents the running time of IncrementalSSGC and IncrementalDBSCAN for the three AWID data sets. We can see that the running times of the two algorithms are similar, which can be explained by the fact that both algorithms use k-nearest neighbors to find the cluster for each incremental step. We also present the running time of the SSGC algorithm for reference. From this experiment, we can see the advantage of the incremental clustering algorithms.

4.3. The Results of FLDS and LOF. Table 3 presents the results obtained by FLDS and LOF on the five AWID data sets. We can see that the results of FLDS are comparable with those of LOF. The parameters used for both methods are shown in Table 4. For the 2D data sets, Figure 6 presents the results obtained by FLDS and LOF. Intuitively, the outliers detected by both methods are mostly similar. This can be explained by the fact that both methods evaluate whether a point is an outlier based on a local density score.

Figures 7 and 8 illustrate the running time comparison between FLDS and LOF. For the four data sets mentioned above, it can be seen from the figures that FLDS runs about 12 times faster than LOF. This is a significant improvement over LOF and can be explained by the fact that the complexity of FLDS is just O(n^1.5), compared with O(n²) for LOF.

Input: a data set X with n points; the number of nearest neighbors k; the number of clusters nc; the threshold θ
Output: outliers of X
Process:
(1) Use K-means to split X into nc clusters
(2) Use the LDS algorithm on each separate cluster to obtain the local outliers (using the threshold θ)
(3) Recalculate the LDS value of the local outliers obtained in Step 2 across the whole data set

Algorithm 4: The algorithm FLDS.

Table 1: Main characteristics of the data sets for clustering evaluation.

ID | Data | Objects | Attributes | Clusters
1 | Iris | 150 | 4 | 3
2 | Wine | 178 | 13 | 3
3 | E. coli | 336 | 8 | 8
4 | Breast | 569 | 30 | 2
5 | Yeast | 1484 | 8 | 10
6 | AWID1 | 5000 | 35 | 2
7 | AWID2 | 8000 | 35 | 2
8 | AWID3 | 12000 | 35 | 2

Table 2: Main characteristics of the data sets for FLDS and LOF.

ID | Data | Objects | Categories
1 | O-AWID1 | 3030 | Impersonation, flooding, injection
2 | O-AWID2 | 5030 | Impersonation, normal, flooding
3 | O-AWID3 | 7040 | Flooding, normal, impersonation
4 | O-AWID4 | 10040 | Normal, impersonation, injection, flooding
5 | O-AWID5 | 15050 | Normal, flooding, injection, and impersonation

Figure 4: Clustering results (Rand Index) obtained by IncrementalDBSCAN and IncrementalSSGC for the 8 data sets of Table 1, respectively.


4.4. A Framework for Intrusion Detection in 802.11 Networks. In this section, we propose a multistage system based on machine learning techniques applied to the AWID data set. The detail of our system is presented in Figure 9. Three components are used for the intrusion detection task: a supervised learning model (J48, Bayes, random forest, support vector machine, neural network, etc.) trained on the labeled data set, which can be seen as the misuse detection component; an outlier detection method (LOF, FLDS, etc.), optionally used to detect new attacks in some periods of time; and, additionally, because it is very difficult to detect impersonation attacks in the AWID data sets, as presented above, an incremental clustering algorithm (IncrementalDBSCAN, IncrementalSSGC, etc.) for further finding this kind of attack.

In this experiment, we use J48 for the misuse detection process and IncrementalSSGC for detecting impersonation attacks. In the outlier detection step, we propose to use FLDS or LOF, whose results have been presented in the subsection above. Because the outlier detection step can be run offline for some periods of time, we only show the results obtained by combining J48 and IncrementalSSGC. The confusion matrix of

Figure 5: Running time comparison (in minutes) between SSGC, IncrementalDBSCAN, and IncrementalSSGC on the AWID1, AWID2, and AWID3 data sets.

Table 4: The parameters used for the data sets.

Methods | O-AWID1 | O-AWID2 | O-AWID3 | O-AWID4 | O-AWID5
FLDS (k, nc, θ) | (25, 30, 6) | (25, 30, 6) | (25, 30, 6) | (25, 45, 6) | (25, 45, 6)
LOF (MinPts, η) | (27, 12) | (27, 12) | (25, 12) | (25, 12) | (27, 12)

k: the number of nearest neighbors; nc: the number of clusters used; θ: the threshold.

Table 3: The HTER measure of LOF and FLDS (the smaller, the better) for some extracted AWID data sets.

Methods | O-AWID1 | O-AWID2 | O-AWID3 | O-AWID4 | O-AWID5
FLDS | 0.13 | 0.12 | 0.10 | 0.11 | 0.06
LOF | 0.23 | 0.11 | 0.11 | 0.09 | 0.09



these results is illustrated in Table 5. The total accuracy obtained is 98.9%, compared with 96.26% in the paper [14]. We can explain the results obtained by IncrementalSSGC by the fact that the algorithm uses a distance based on shared nearest neighbors, which overcomes the limitations of traditional distance measures such as the Euclidean or Minkowski distance; the shared nearest neighbors measure does not depend on the density of the data. The proposed system is generally called a hybrid method, which is one of the best strategies in developing intrusion detection systems [7, 9], in which there is no single classifier that can exactly detect all kinds of classes.

We also note that, for real applications, whenever an attack appears, the system needs to immediately produce a warning. The multistage system based on machine learning techniques provides a solution for users to construct a real IDS/IPS system, which is one of the most important problems in network security.

5. Conclusion

This paper introduces an incremental semisupervised graph-based clustering algorithm and a fast outlier detection


Figure 6: Results of LOF (a) and FLDS (b) on some 2D data sets; the outliers are marked with red plus signs.

Figure 7: Running time comparison (in minutes) between FLDS and LOF for the four 2D data sets.

Figure 8: Running time comparison (in minutes) between FLDS and LOF for the five AWID data sets.


method. Both methods can be used in a hybrid framework for the intrusion detection problem on WiFi data sets (AWID). Our proposed multistage system based on machine learning techniques provides a guideline for constructing a real IDS/IPS system, which is one of the most important problems in network security. Experiments conducted on the data sets extracted from AWID and UCI show the effectiveness of our proposed methods. In the near future, we will continue to develop other kinds of machine learning methods for the intrusion detection problem and test other experimental setups.

Data Availability

The data used to support the findings of this study can be downloaded from the AWID repository (http://icsdweb.aegean.gr/awid/download.html).

[Figure 9 depicts the proposed intrusion detection system for a local wireless network based on machine learning methods: traffic captured from the Internet/access point is loaded and preprocessed, then passed to a classifier (J48) trained on the labeled data; detected attacks are blocked, and normal traffic is passed to an incremental clustering model (IncrementalSSGC), whose parameters are set from the training data and which blocks impersonation attacks; an anomaly detection model (FLDS) examines the remaining traffic for outliers, which a security expert marks as new attacks and stores in a database of marked attacks used for retraining.]

Figure 9: A new framework for intrusion detection in 802.11 networks.

Table 5: Confusion matrix for the AWID data set using J48 and IncrementalSSGC in the proposed framework.

Normal | Flooding | Injection | Impersonation | Classification
530588 | 116 | 6 | 75 | Normal
2553 | 5544 | 0 | 0 | Flooding
2 | 0 | 16680 | 0 | Injection
3297 | 148 | 0 | 16364 | Impersonation
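The overall accuracy reported above follows directly from Table 5: the correctly classified counts lie on the diagonal of the matrix (rows in the order Normal, Flooding, Injection, Impersonation):

```python
# Confusion matrix of Table 5; each row is a true class
# (Normal, Flooding, Injection, Impersonation) and the diagonal
# holds the correctly classified counts.
matrix = [
    [530588, 116,  6,     75],
    [2553,   5544, 0,     0],
    [2,      0,    16680, 0],
    [3297,   148,  0,     16364],
]
correct = sum(matrix[i][i] for i in range(4))
total = sum(sum(row) for row in matrix)
print(round(100 * correct / total, 1))  # ~98.9 (%)
```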


Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

References

[1] K P Murphy Machine Learning A Probabilistic PerspectiveMIT press Cambridge MA USA 2012

[2] B Ahmad W Jian and Z Anwar Ali ldquoRole of machinelearning and data mining in internet security standing statewith future directionsrdquo Journal of Computer Networks andCommunications vol 2018 Article ID 6383145 10 pages2018

[3] T Bakhshi and B Ghita ldquoOn internet traffic classification atwo-phasedmachine learning approachrdquo Journal of ComputerNetworks and Communications vol 2016 Article ID 204830221 pages 2016

[4] B Luo and J Xia ldquoA novel intrusion detection system basedon feature generation with visualization strategyrdquo ExpertSystems with Applications vol 41 no 9 pp 4139ndash4147 2014

[5] W-C Lin S-W Ke and C-F Tsai ldquoCANN an intrusiondetection system based on combining cluster centers andnearest neighborsrdquo Knowledge-Based Systems vol 78pp 13ndash21 2015

[6] C-F Tsai and C-Y Lin ldquoA triangle area based nearestneighbors approach to intrusion detectionrdquo Pattern Recog-nition vol 43 no 1 pp 222ndash229 2010

[7] F Kuang S Zhang Z Jin and W Xu ldquoA novel SVM bycombining kernel principal component analysis and im-proved chaotic particle swarm optimization for intrusiondetectionrdquo Soft Computing vol 19 no 5 pp 1187ndash1199 2015

[8] M E Aminanto and K Kim ldquoDetecting impersonation attackin WiFi networks using deep learning approachrdquo in In-formation Security Applications D Choi and S Guilley EdsVol 10144 Springer Berlin Germany 2016

[9] A A Aburomman and M BI Reaz ldquoA novel SVM-kNN-PSO ensemble method for intrusion detection systemrdquo Ap-plied Soft Computing vol 38 pp 360ndash372 2016

[10] M M Breunig H-P Kriegel R T Ng and J Sander ldquoLOFidentifying density-based local outliersrdquo in Proceedings of the2000 ACM SIGMOD International Conference on Manage-ment of Data pp 93ndash104 Dallas TX USA May 2000

[11] V Hautamaki I Karkkainen and P Franti ldquoOutlier detectionusing k-nearest neighbour graphrdquo in Proceedings of the 17thInternational Conference on Pattern Recognition pp 430ndash433Cambridge MA USA August 2004

[12] V ang and F F Pashchenko ldquoA new incremental semi-supervised graph based clusteringrdquo in Proceedings of the IEEEInternational Conference on Engineering andTelecommunication Moscow Russia March 2018

[13] V ang D V Pantiukhin and A N Nazarov ldquoFLDS fastoutlier detection based on local density scorerdquo in Proceedingsof the IEEE International Conference on Engineering andTelecommunication pp 137ndash141 Moscow Russia November2016

[14] C Kolias G Kambourakis A Stavrou and S GritzalisldquoIntrusion detection in 80211 networks empirical evaluationof threats and a public datasetrdquo IEEE Communications Sur-veys amp Tutorials vol 18 no 1 pp 184ndash208 2016

[15] Ch Gupta and R Grossman ldquoGenIca single pass generalizedincremental algorithm for clusteringrdquo in Proceedings of theFourth SIAM International Conference on Data Miningpp 147ndash153 Lake Buena Vista FL USA April 2004

[16] M Ester H-P Kriegel J Sander M Wimmer and X XuldquoIncremental clustering for mining in a data warehousingenvironmentrdquo in Proceedings of the International Conferenceon Very Large Data Bases pp 323ndash333 New York NY USAAugust 1998

[17] V Chandrasekhar C Tan M Wu L Li X Li and J-H LimldquoIncremental graph clustering for efficient retrieval fromstreaming egocentric video datardquo in Proceedings of the In-ternational Conference on Pattern Recognition pp 2631ndash2636Stockholm Sweden August 2014

[18] A M Bagirov J Ugon and D Webb ldquoFast modified globalK-means algorithm for incremental cluster constructionrdquoPattern Recognition vol 44 no 4 pp 866ndash876 2011

[19] A Bryant D E Tamir N D Rishe and K AbrahamldquoDynamic incremental fuzzy C-means clusteringrdquo in Pro-ceedings of the Sixth International Conference on PervasivePatterns and Applications Venice Italy May 2014

[20] Z Yu P Luo J You et al ldquoIncremental semi-supervisedclustering ensemble for high dimensional data clusteringrdquoIEEE Transactions on Knowledge and Data Engineeringvol 28 no 3 pp 701ndash714 2016

[21] V Chandola A Banerjee and V Kumar ldquoAnomaly de-tection a surveyrdquo ACM Computing Surveys vol 41 no 3pp 1ndash58 2009

[22] J Tang Z Chen A Fu and D Cheung ldquoEnhancing effec-tiveness of outlier detections for low density patternsrdquo inProceedings of the Sixth Pacific-Asia Conference on KnowledgeDiscovery and Data Mining (PAKDD) Taipei Taiwan May2002

[23] S Papadimitriou H Kitagawa P B Gibbons andC Faloutsos ldquoLoci fast outlier detection using the localcorrelation integralrdquo in Proceedings of the 19th InternationalConference on Data Engineering pp 315ndash326 BangaloreIndia March 2003

[24] M Ester H-P Kriegel J Sander and X Xu ldquoA density-basedalgorithm for discovering clusters in large spatial databaseswith noiserdquo in Proceedings of the Conference on KnowledgeDiscovery and Data Mining (KDD) pp 226ndash231 PortlandOR USA August 1996

[25] L Ertoz M Steinbach and V Kumar ldquoFinding clusters ofdifferent sizes shapes and densities in noisy high di-mensional datardquo in Proceedings of the SIAM InternationalConference on Data Mining pp 47ndash58 San Francisco CAUSA May 2003

[26] E M Knorr R T Ng and V Tucakov ldquoDistance-basedoutliers algorithms and applicationsrdquo e InternationalJournal on Very Large Data Bases vol 8 no 3-4 pp 237ndash2532000

[27] S Basu I Davidson and K L Wagstaff ldquoConstrainedclustering advances in algorithms theory and applicationsrdquoin Chapman and HallCRC Data Mining and KnowledgeDiscovery Series CRC Press Boca Raton FL USA 1stedition 2008

[28] A A Abin ldquoClustering with side information further effortsto improve efficiencyrdquo Pattern Recognition Letters vol 84pp 252ndash258 2016

[29] Y Shi C Otto and A K Jain ldquoFace clustering representationand pairwise constraintsrdquo IEEE Transactions on InformationForensics and Security vol 13 no 7 pp 1626ndash1640 2018

[30] A A Abin and B Hamid ldquoActive constrained fuzzy clus-tering a multiple kernels learning approachrdquo Pattern Rec-ognition vol 48 no 3 pp 953ndash967 2015

[31] S Xiong J Azimi and X Z Fern ldquoActive learning of con-straints for semi-supervised clusteringrdquo IEEE Transactions on

12 Journal of Computer Networks and Communications

Knowledge and Data Engineering vol 26 no 1 pp 43ndash542014

[32] K. Wagstaff, C. Cardie, S. Rogers, and S. Schrodl, "Constrained K-means clustering with background knowledge," in Proceedings of the International Conference on Machine Learning (ICML), pp. 577–584, Williamstown, MA, USA, June 2001.

[33] I. Davidson and S. S. Ravi, "Using instance-level constraints in agglomerative hierarchical clustering: theoretical and empirical results," Data Mining and Knowledge Discovery, vol. 18, no. 2, pp. 257–282, 2009.

[34] B. Kulis, S. Basu, I. Dhillon, and R. Mooney, "Semi-supervised graph clustering: a kernel approach," Machine Learning, vol. 74, no. 1, pp. 1–22, 2009.

[35] V.-V. Vu, "An efficient semi-supervised graph based clustering," Intelligent Data Analysis, vol. 22, no. 2, pp. 297–307, 2018.

[36] X. Wang, B. Qian, and I. Davidson, "On constrained spectral clustering and its applications," Data Mining and Knowledge Discovery, vol. 28, no. 1, pp. 1–30, 2014.

[37] D. Mavroeidis, "Accelerating spectral clustering with partial supervision," Data Mining and Knowledge Discovery, vol. 21, no. 2, pp. 241–258, 2010.

[38] L. Lelis and J. Sander, "Semi-supervised density-based clustering," in Proceedings of the IEEE International Conference on Data Mining, pp. 842–847, Miami, FL, USA, December 2009.

[39] D.-D. Le and S. Satoh, "Unsupervised face annotation by mining the web," in Proceedings of the 8th IEEE International Conference on Data Mining, pp. 383–392, Pisa, Italy, December 2008.

[40] D. Yan, L. Huang, and M. I. Jordan, "Fast approximate spectral clustering," in Proceedings of the Conference on Knowledge Discovery and Data Mining (KDD), pp. 907–916, New York, NY, USA, June 2009.

[41] C. Zhong, M. Malinen, D. Miao, and P. Franti, "A fast minimum spanning tree algorithm based on K-means," Information Sciences, vol. 295, pp. 1–17, 2015.

[42] V. Chaoji, M. Al Hasan, S. Salem, and M. J. Zaki, "SPARCL: an effective and efficient algorithm for mining arbitrary shape-based clusters," Knowledge and Information Systems, vol. 21, no. 2, pp. 201–229, 2009.

[43] A. Asuncion and D. J. Newman, UCI Machine Learning Repository, American Statistical Association, Boston, MA, USA, 2015, http://archive.ics.uci.edu/ml/index.php.

[44] G. Karypis, E.-H. Han, and V. Kumar, "Chameleon: hierarchical clustering using dynamic modeling," Computer, vol. 32, no. 8, pp. 68–75, 1999.

Journal of Computer Networks and Communications 13


algorithm, clusters are built incrementally by adding one cluster center at a time. In [19], a novel two-phase static single-pass algorithm, as well as a dynamic two-phase single-pass algorithm based on Fuzzy C-Means, has been presented and shows high utility. The idea behind the multistage methods reported in that paper is that an estimate of the partition matrix and the locations of the cluster centers can be obtained by clustering a sample of the data. A small sample is expected to produce a fast yet less reliable estimate of the cluster centers. This leads to a

Figure 1: A general model for misuse detection in IDS (Internet traffic is captured with NetFlow tools or tcpdump, preprocessed, and used, together with training data, to train a (semi)supervised learning model that separates normal traffic from attacks).

Figure 2: A general model for outlier detection in IDS (captured and preprocessed traffic is passed to an outlier detection model that separates normal traffic from attacks).

Figure 3: Insertion cases (noise, creation, absorption, merge) and deletion cases (removal, reduction, split, nonsplit) of IncrementalDBSCAN [16].

multistage approach which involves several stages of sampling (with replacement) of the data and estimating the membership matrix for the next stage. The experiments conducted show the effectiveness of the proposed method. In [17], Chandrasekhar et al. propose an incremental local density clustering (ILDC) scheme for finding dense subgraphs in streaming data, i.e., when data arrive incrementally. The incremental clustering scheme captures redundancy in the streaming data source by finding dense subgraphs which correspond to salient objects and scenes. The ILDC process performs greedy operations such as cluster expansion, cluster addition, and cluster merging, based on a defined similarity between clusters. ILDC shows its effectiveness when used in image-retrieval applications. In [20], an incremental semisupervised ensemble clustering algorithm, named ISSCE, has been successfully presented. ISSCE uses constraints to update incremental members. The authors develop an incremental ensemble member selection process based on a global objective function and a local objective function to remove redundant ensemble members. The experimental results show the improvement of ISSCE over traditional semisupervised clustering ensemble approaches and conventional cluster ensemble methods on six real-world data sets from the UCI machine learning repository and 12 real-world data sets of cancer gene expression profiles. In the context of classification, we need to find the label for a new data object by using a classifier trained on the training data. The problem of identifying the label for a new object in incremental clustering can be seen as similar to this classification setting.

2.2. Outlier Detection Problem. Outlier (anomaly) detection is one of the important problems of machine learning and data mining. As mentioned in [21], outlier detection is the problem of finding patterns in data that do not conform to expected behavior. Applications of outlier detection can be found in many areas such as intrusion detection, credit fraud detection, video surveillance, weather prediction, discovery of criminal activities in electronic commerce, etc. [9, 21]. There are several kinds of outliers, including point outliers, contextual outliers, and collective outliers. In this paper, we focus on point outlier detection, which applies to a variety of problems. For a data set consisting of points, a point is called an outlier if it is different from a large number of the rest of the points. To detect outliers, the principal methods in the literature include classification methods, nearest neighbor methods, clustering methods, statistical methods, distance-based methods, etc.

For classification-based outlier detection, there are two categories: multiclass and one-class anomaly detection methods. Multiclass classification techniques assume that the training data contain labeled points of all normal classes. The learner trains a model on the labeled data with a supervised learning method; the resulting classifier can distinguish each normal class from the rest. A test point is called an outlier if it does not belong to any normal class. One-class outlier detection methods assume that there is only one normal class. The classifier learns a model that captures the boundary of the normal class; if a test point does not fall inside this boundary, it is called an outlier. Although many such techniques have been developed, their main disadvantage is that they depend on the availability of accurate labels for normal classes, which is not easy to obtain in real applications.

Nearest neighbor-based outlier detection methods rest on the following assumption: normal points belong to dense regions, while outliers belong to sparse regions. The most famous method of this kind is the LOF algorithm. The idea of LOF is to evaluate a local density score for each point: each point is assigned a score which is the ratio of the average local density of its k-nearest neighbors to the local density of the point itself. Many variants of LOF can be cited here, such as COF [22], ODIN [11], LOCI [23], etc. The main drawback of these methods is the O(n^2) complexity required.

Clustering-based outlier detection techniques use clustering methods to group the data into clusters; the points that do not belong to any cluster are called outliers. Some clustering methods, such as DBSCAN [24] and SNN [25], can detect outliers in this way. However, the purpose of clustering is to find clusters, so the outliers are just a by-product of the clustering process and hence are not carefully optimized. A further drawback is that such clustering techniques also require O(n^2) complexity.

Statistical outlier detection methods are based on the following assumption: normal data points occur in the high-probability regions of a stochastic model, while anomalies occur in its low-probability regions. Several methods of this kind have been developed. In general, statistical methods fit a statistical model (a Gaussian distribution, a mixture of parametric statistical distributions, etc.) to the given data and then apply a statistical inference test to determine whether an unseen instance belongs to this model or not. The key limitation of these methods is the assumption about the distribution of the data points, which does not hold in general, especially when the dimension of the data is high [21].

In distance-based outlier detection methods, a point is considered an outlier if it does not have enough points in the data set (at least a fraction pct of them) within the threshold distance dmin of it [26].

3. Proposed Method

3.1. Semisupervised Graph-Based Clustering. In recent years, semisupervised clustering has been an important research topic, as illustrated by a number of studies [27]. The purpose of semisupervised clustering is to integrate side information to improve clustering performance. Generally, there are two kinds of side information: constraints and seeds. Given a data set X, constraints involve must-link and cannot-link relations, in which a must-link constraint (ML) between two observations x ∈ X and y ∈ X means that x and y should be in the same cluster, and a cannot-link constraint (CL) means that x and y should not be in the same cluster. With seeds, a small set of labeled data (called seeds) S ⊂ X is provided to the semisupervised clustering algorithm. In practice, this side information is available or can be collected from users [28–31]. We can cite here the work on semisupervised clustering for K-means [32], hierarchical clustering [33], graph-based clustering [34, 35], spectral clustering [36, 37], density-based clustering [38], etc. While many semisupervised clustering algorithms have been introduced, to the best of our knowledge, there is no incremental semisupervised clustering algorithm in the literature.

Our new incremental clustering algorithm, introduced in the next section, is based on the semisupervised graph-based clustering using seeds (SSGC) algorithm. We choose SSGC because it has several advantages: it uses only one parameter, and it can detect clusters in regions of varied density [35]. SSGC includes two steps, described below (see Algorithm 1).

Step 1. Given a k-nearest neighbor graph representing a data set X, this step uses a loop in which, at each iteration, all edges with weight less than a threshold θ are removed. The value of θ is initialized to 0 and incremented by 1 after each iteration. The loop stops when each connected component contains at most one kind of seed. The main clusters are identified by propagating labels within each connected component that contains seeds.

Step 2. The remaining points (graph nodes) that do not belong to any main cluster are divided into two kinds: points that have edges relating them to one or more clusters, and isolated points. In the first case, each point is assigned to the cluster with the largest related weight. The isolated points can either be removed as outliers or be labeled.

We note that in SSGC, the weight ω(xi, xj) of the edge (the similarity) between two points xi and xj in the k-nearest neighbor graph is the number of points that the two points share in their neighbor lists, as in the following equation:

ω(xi, xj) = |NN(xi) ∩ NN(xj)|, (1)

where NN(·) is the set of k-nearest neighbors of the specified point.
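As a concrete illustration, equation (1) can be rendered in a few lines of Python. This is a minimal brute-force sketch under our own naming (`knn_lists`, `shared_nn_weight` are not from the paper):

```python
import numpy as np

def knn_lists(X, k):
    # Brute-force k-nearest-neighbour lists; each point excludes itself.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k]

def shared_nn_weight(nn, i, j):
    # Equation (1): omega(x_i, x_j) = |NN(x_i) ∩ NN(x_j)|.
    return len(set(nn[i]) & set(nn[j]))

# Four points on a unit square plus one distant point.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [10.0, 10.0]])
nn = knn_lists(X, k=2)
w = shared_nn_weight(nn, 0, 3)  # (0,0) and (1,1) share neighbours (0,1) and (1,0)
```

Note that the weight is symmetric by construction and bounded above by k, which is what makes it usable as a density-independent similarity.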

SSGC is efficient compared with semisupervised density-based clustering in detecting clusters for batch data; however, it is not adapted to data stream or data warehousing environments where many updates (insertions and deletions) occur.

3.2. Incremental Graph-Based Clustering Using Seeds. In this section, we propose IncrementalSSGC, based on the SSGC algorithm. In IncrementalSSGC, the seeds are used to train a k-nearest neighbor graph, to construct the connected components, and to identify the value of θ, as in the SSGC algorithm. Like other incremental clustering algorithms, two procedures must be developed: insertion and deletion.

Algorithm 2 shows the insertion step of IncrementalSSGC for a new data point xnew. At first, the list of edges between xnew and the current clusters is created, and all edges with weight smaller than θ are removed. If the list is empty, xnew is an outlier in the current situation, and hence xnew is added to a temporary list Lo. If edges exist between xnew and some connected components, we remove edges in ascending order of weight until xnew connects only to components with one kind of label. Finally, the label of xnew is identified by the label of its connected components. In Step 10, xnew and its related edges are added to L; some edges between xt and xl are also recalculated if xnew appears in the nearest neighbor lists of xt or xl. In Step 12, after some insertion steps, we can examine the points in Lo.

Algorithm 3 presents the detailed steps of the deletion process. When we want to remove a point xdel from the current clusters, we simply remove xdel and all edges related to xdel in the graph. Step 2 of the algorithm shows the updating process: all edges affected by xdel must be updated, i.e., every edge between xi and xj must be updated if xdel appears in their common list of nearest neighbors. Finally, Step 3 simply removes all updated edges that have weight less than θ.

3.2.1. Complexity Analysis. We now analyse the complexity of IncrementalSSGC. Given a data set with n objects, we recall that the complexity of SSGC is O(k × n^2), in which k is the number of nearest neighbors. Assuming that the current clusters include n objects, we analyse the complexity of the insertion and deletion processes of IncrementalSSGC at step (n + 1) as follows.

For the insertion process, which aims to identify the cluster label for a new data point xnew: in Step 1, creating the list of edges between xnew and the current clusters costs O(n × k). In Steps 2, 6, and 7, the complexity is just O(k). In Step 10, some edges between xt and xl are recalculated if xnew appears in the nearest neighbor lists of xt or xl; in fact, the number of such edges is small. So, for the insertion of a new point, the complexity is O(n × k).

For the deletion process, the complexity of Step 1 is O(k). In Steps 2 and 3, the number of edges updated is the number of edges that have xdel as a common neighbor, and this number depends on the data set. Let q be its average value over v deletion processes; in practice, q is determined by performing experiments. So, the complexity of a deletion process is O(q × n × k).

In summary, from the analysis of the insertion and deletion processes above, we can see that IncrementalSSGC is very useful for data sets that need frequent updates. In the next section, we also present the running times of both SSGC and IncrementalSSGC for some data sets extracted from the intrusion detection problem.


3.3. A Fast Outlier Detection Method. Given a k-nearest neighbors graph (k-NNG), the local density score (LDS) of a vertex u ∈ k-NNG is defined as follows [39]:

LDS(u) = (Σ_{q ∈ NN(u)} ω(u, q)) / k, (2)

in which ω is calculated as in equation (1) and k is the number of nearest neighbors used. The LDS is used as an indicator of the density of the region of a vertex u. The LDS value lies in the interval [0, k − 1]: the larger the LDS of u, the denser the region that u belongs to, and vice versa. We can therefore use the LDS calculation to identify outliers. To detect outliers by this method, we have to use a parameter as a threshold: a point whose LDS value is smaller than the threshold is regarded as an outlier, and vice versa. Similar to LOF, this method requires O(n^2) complexity.
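The LDS computation can be sketched as follows. This is our own minimal Python rendering of equation (2), operating directly on precomputed k-NN lists (the toy `nn` graph below is invented for illustration):

```python
def shared_nn_weight(nn, i, j):
    # Equation (1): number of common points in the two k-NN lists.
    return len(set(nn[i]) & set(nn[j]))

def local_density_score(nn, u):
    # Equation (2): mean shared-neighbour weight between u and its k neighbours.
    k = len(nn[u])
    return sum(shared_nn_weight(nn, u, q) for q in nn[u]) / k

# Toy k-NN lists (k = 3): points 0-3 form a coherent group whose neighbour
# lists overlap heavily; point 4's neighbours (0, 5, 6) are less coherent.
nn = [[1, 2, 3], [0, 2, 3], [0, 1, 3], [0, 1, 2],
      [0, 5, 6], [6, 4, 0], [5, 4, 0]]
dense_score = local_density_score(nn, 0)   # high: dense region
sparse_score = local_density_score(nn, 4)  # lower: sparser region
```

A point is then flagged as an outlier when its score falls below the chosen threshold θ.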

Input: X; the number of neighbors k; a set of seeds S
Output: a set of detected clusters and outliers
Process:
(1) Construct the k-NN graph of X
(2) θ = 0
(3) repeat
(4) Construct the connected components using the threshold θ
(5) θ = θ + 1
(6) until the cut condition is satisfied
(7) Propagate the labels to form the principal clusters
(8) Construct the final clusters

Algorithm 1: The algorithm SSGC [35].
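For readers who prefer running code to pseudocode, the threshold loop of Algorithm 1 can be sketched in plain Python. This is our own union-find rendering, not the authors' implementation; it assumes the weighted k-NN graph is already available as an edge dictionary, and all names are ours:

```python
def ssgc_components(n, edges, seed_labels):
    """Sketch of SSGC Steps 1-7: raise theta until no connected component
    mixes two different seed labels, then propagate the seed labels.
    edges: dict mapping (i, j) -> shared-neighbour weight (equation (1)).
    seed_labels: dict mapping a few seed nodes -> class label."""
    theta = 0
    while True:
        parent = list(range(n))

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x

        # Keep only edges with weight >= theta (edges below theta are removed).
        for (i, j), w in edges.items():
            if w >= theta:
                parent[find(i)] = find(j)

        # Cut condition: at most one kind of seed per component.
        comp_label, consistent = {}, True
        for node, lab in seed_labels.items():
            root = find(node)
            if comp_label.get(root, lab) != lab:
                consistent = False
                break
            comp_label[root] = lab
        if consistent:
            # Propagate seed labels; unlabeled components get None
            # (Step 2 of SSGC would assign or discard those points).
            return [comp_label.get(find(x)) for x in range(n)], theta
        theta += 1

# Two chains of strong edges joined by one weak bridge of weight 1.
edges = {(0, 1): 3, (1, 2): 3, (2, 3): 1, (3, 4): 3, (4, 5): 3}
labels, theta = ssgc_components(6, edges, {0: "a", 5: "b"})
```

In this toy run the loop must raise θ to 2 before the weak bridge (2, 3) is cut and the two seed labels stop sharing a component.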

Input: a new data object xnew; a set of current clusters C; the list L containing the edges for each point of the current clusters; θ (threshold); the number of nearest neighbors k
Output: the label for xnew
Process:
(1) Create the list of k-nearest-neighbor edges (LE) between xnew and all current clusters
(2) Delete all (u, v) ∈ LE with weight(u, v) < θ
(3) if (LE = ∅) then
(4) xnew is added to a temporary list Lo
(5) else
(6) if xnew relates to two or more components with different labels then
(7) Delete edges in LE in ascending order of weight until xnew connects to components with at most one kind of label
(8) end if
(9) Get the label for xnew and its connected points (if any) by propagation
(10) Update the list L: add the edges relating to xnew to L; some edges between xt and xl are also recalculated if xnew appears in the nearest neighbor lists of xt or xl
(11) end if
(12) Examine the points in Lo

Algorithm 2: IncrementalSSGC: insertion process.

Input: an object xdel (in a component) to be deleted; a set of current clusters C; the list L of edges for each point of the current clusters; θ (threshold)
Output: the updated C; the updated L
Process:
(1) Delete xdel and all edges related to xdel in L
(2) Update all weights (k, l) ∈ C with xdel ∈ NN(k) ∩ NN(l)
(3) Delete all edges updated at Step 2, (k, l) ∈ L, with weight(k, l) < θ

Algorithm 3: IncrementalSSGC: deletion process.


To reduce the running time of the method, we propose a fast outlier detection method based on the local density score, called FLDS. The basic idea of FLDS is to use a divide-and-conquer strategy. Given a data set X in which to find outliers, the input data set is first split into k clusters using the K-means algorithm. Next, a k-nearest neighbor graph is built for each cluster, and outliers are identified within each local cluster. The outliers found in all clusters are then rescored on the whole data set. The idea of a divide-and-conquer strategy using K-means in the preprocessing step has been successfully applied to the fast spectral clustering problem [40], the fast minimum spanning tree problem [41], and efficient and effective shape-based clustering [42]. The FLDS algorithm is described in Algorithm 4.

FLDS is thus an outlier detection method based on K-means and a graph-based local density score. The complexity of FLDS is O(n × k) + O(k^2) + O(t × n), in which the value of k may be up to n^0.5 [41, 42] and t ≪ n is approximately equal to k, so the overall complexity of FLDS is O(n^1.5).
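The three steps of FLDS can be illustrated with the following self-contained Python sketch. It is our own simplified rendering, not the authors' implementation: a plain Lloyd's K-means stands in for the K-means step, `lds_scores` implements equation (2) by brute force, and all function names are ours:

```python
import numpy as np

def kmeans(X, nc, iters=20, seed=0):
    # Plain Lloyd's algorithm standing in for the K-means step of FLDS.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=nc, replace=False)].copy()
    for _ in range(iters):
        labels = np.argmin(np.linalg.norm(X[:, None] - centers[None], axis=2), axis=1)
        for c in range(nc):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels

def lds_scores(X, k):
    # Local density scores (equation (2)) on the brute-force k-NN graph of X.
    d = np.linalg.norm(X[:, None] - X[None], axis=2)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]
    sets = [set(row.tolist()) for row in nn]
    return np.array([sum(len(sets[u] & sets[q]) for q in nn[u]) / k
                     for u in range(len(X))])

def flds(X, k, nc, theta):
    # Step 1: split X into nc clusters with K-means.
    # Step 2: collect local outlier candidates (LDS < theta) per cluster.
    # Step 3: rescore the candidates on the whole data set.
    labels = kmeans(X, nc)
    candidates = []
    for c in range(nc):
        idx = np.where(labels == c)[0]
        if len(idx) <= k:  # tiny cluster: keep all its points as candidates
            candidates.extend(idx.tolist())
            continue
        local = lds_scores(X[idx], k)
        candidates.extend(idx[local < theta].tolist())
    scores = lds_scores(X, k)
    return np.array([i for i in candidates if scores[i] < theta], dtype=int)
```

The speedup over a global LDS computation comes from Step 2 operating on nc small distance matrices instead of one n × n matrix; only the surviving candidates are rescored globally.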

4. Experiment Results

This section aims to evaluate the effectiveness of our proposed algorithms. We show the results of IncrementalSSGC, the results of FLDS, and the results of using our methods in a hybrid framework for the intrusion detection problem. IncrementalSSGC is compared with IncrementalDBSCAN, while FLDS is compared with LOF.

The data sets used in the experiments are mostly extracted from the Aegean WiFi Intrusion Dataset (AWID) [14]. AWID is a publicly available collection of data sets in an easily distributed format, containing real traces of both normal and intrusive 802.11 traffic. In AWID, many kinds of attacks have been introduced, falling into four main categories (normal traffic plus flooding, injection, and impersonation attacks). AWID has 156 attributes; we use 35 attributes extracted by an artificial neural network, as presented in [8]. We also use some supplementary data sets from UCI [43], as well as data sets with different sizes, shapes, and densities that contain noise points and special artifacts [44].

4.1. Experiment Setup

4.1.1. Data Sets for Incremental Clustering Algorithms. To show the effectiveness of IncrementalSSGC, two aspects are examined: running time and accuracy. Five UCI data sets and three data sets extracted from AWID are used for testing IncrementalSSGC and IncrementalDBSCAN. The details of these data sets are presented in Table 1.

To evaluate clustering results, the Rand Index is used. Given a data set X with n points for clustering, let P1 be an array containing the true labels and P2 an array containing the results of a clustering algorithm; the Rand Index (RI) is calculated as follows:

RI = (a + b) / (n(n − 1)/2), (3)

in which a (respectively, b) is the number of pairs of points that are in the same (respectively, different) clusters in both partitions P1 and P2. The bigger the Rand Index, the better the result.
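Equation (3) can be computed directly by enumerating all point pairs, as in this short sketch (our own naive O(n^2) version, fine for the data set sizes used here):

```python
from itertools import combinations

def rand_index(p1, p2):
    # a: pairs placed together in both partitions;
    # b: pairs placed apart in both partitions.
    n = len(p1)
    a = b = 0
    for i, j in combinations(range(n), 2):
        same1 = p1[i] == p1[j]
        same2 = p2[i] == p2[j]
        if same1 and same2:
            a += 1
        elif not same1 and not same2:
            b += 1
    return (a + b) / (n * (n - 1) / 2)
```

Note that the index is invariant to how the cluster labels are named: relabeling the clusters of P2 does not change the score, only the pairwise groupings matter.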

4.1.2. Data Sets for FLDS and LOF. We used five data sets extracted from AWID and four 2D data sets, DS1 (10000 points), DS2 (8000 points), DS3 (8000 points), and DS4 (8000 points) [44], for FLDS and LOF. These 2D data sets have clusters of different size, shape, and orientation, as well as random noise points and special artifacts. The details of the AWID data sets are presented in Table 2.

To compare LOF and FLDS on the AWID data sets, we use the ROC measure, which combines two factors: the False Positive (False Alarm) Rate (FPR) and the False Negative (Miss Detection) Rate (FNR), defined in the following equations:

FPR = FP / (FP + TN), (4)

FNR = FN / (TP + FN), (5)

in which True Positive (TP) is the number of attacks correctly classified as attacks, True Negative (TN) is the number of normal records correctly detected as normal, False Positive (FP) is the number of normal records falsely classified as attacks (false alarms), and False Negative (FN) is the number of attacks falsely detected as normal.

To combine the FPR and FNR values, we calculate the Half Total Error Rate (HTER), similar to the evaluation method used in [11], defined as follows:

HTER = (FPR + FNR) / 2. (6)
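Equations (4)–(6) amount to counting the four confusion-matrix cells and averaging the two error rates, as in this sketch (the 1 = attack / 0 = normal label convention is ours, chosen for the example):

```python
def detection_rates(y_true, y_pred):
    # Label convention for this sketch: 1 = attack, 0 = normal.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fpr = fp / (fp + tn)    # equation (4): false alarm rate
    fnr = fn / (tp + fn)    # equation (5): miss detection rate
    hter = (fpr + fnr) / 2  # equation (6)
    return fpr, fnr, hter

# One missed attack and one false alarm out of four of each class.
fpr, fnr, hter = detection_rates([1, 1, 1, 1, 0, 0, 0, 0],
                                 [1, 1, 1, 0, 0, 0, 0, 1])
```

Averaging FPR and FNR keeps the metric meaningful on imbalanced traffic, where plain accuracy would be dominated by the normal class.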

4.2. Clustering Results. We note that there is no incremental semisupervised clustering algorithm in the literature, so we compare the performance of our algorithm with that of IncrementalDBSCAN, which can be seen as the state of the art among the incremental clustering algorithms proposed to date: it can detect clusters of different size and shape in the presence of noise. Because SSGC and IncrementalSSGC produce the same results, we show the results only for IncrementalSSGC and IncrementalDBSCAN. The results are shown in Figure 4.

We can see from the figure that IncrementalSSGC obtains better results than IncrementalDBSCAN. This can be explained by the fact that IncrementalDBSCAN cannot detect clusters with different densities; as stated in that paper, "we assumed that the parameter values Eps and MinPts of DBSCAN do not change significantly when inserting and deleting objects." This assumption means that IncrementalDBSCAN cannot work well with data sets having different densities. In contrast to IncrementalDBSCAN, IncrementalSSGC does not depend on the density of the data, because the similarity measure used is based on shared nearest neighbors.

4.2.1. Running Time Comparison. Figure 5 presents the running times of IncrementalSSGC and IncrementalDBSCAN for three AWID data sets. The running times of both algorithms are similar, which can be explained by the fact that both algorithms use k-nearest neighbors to find clusters at each incremental step. We also present the running time of the SSGC algorithm for reference. This experiment shows the advantage of the incremental clustering algorithms.

4.3. The Results of FLDS and LOF. Table 3 presents the results obtained by FLDS and LOF for five AWID data sets. The results of FLDS are comparable with those of LOF. The parameters used for both methods are shown in Table 4. For the 2D data sets, Figure 6 presents the results obtained by FLDS and LOF. Intuitively, the outliers detected by both methods are mostly similar, which can be explained by the fact that both evaluate whether a point is an outlier based on a local density score.

Figures 7 and 8 illustrate the running time comparison between FLDS and LOF. For the data sets mentioned above, the figures show that FLDS runs about 12 times faster than LOF. This is a significant improvement over LOF, explained by the fact that the complexity of FLDS is just O(n^1.5), compared with O(n^2) for LOF.

Input: a data set X with n points; the number of nearest neighbors k; the number of clusters nc; the threshold θ
Output: outliers of X
Process:
(1) Use K-means to split X into nc clusters
(2) Run the LDS algorithm on each separate cluster to obtain local outliers (using the threshold θ)
(3) Recalculate the LDS values of the local outliers obtained in Step 2 across the whole data set

Algorithm 4: The algorithm FLDS.

Table 1: Main characteristics of the data sets for clustering evaluation.

ID | Data | Objects | Attributes | Clusters
1 | Iris | 150 | 4 | 3
2 | E. coli | 336 | 8 | 8
3 | Wine | 178 | 13 | 3
4 | Breast | 569 | 30 | 2
5 | Yeast | 1484 | 8 | 10
6 | AWID1 | 5000 | 35 | 2
7 | AWID2 | 8000 | 35 | 2
8 | AWID3 | 12000 | 35 | 2

Table 2: Main characteristics of the data sets for FLDS and LOF.

ID | Data | Objects | Categories
1 | O-AWID1 | 3030 | Impersonation, flooding, injection
2 | O-AWID2 | 5030 | Impersonation, normal, flooding
3 | O-AWID3 | 7040 | Flooding, normal, impersonation
4 | O-AWID4 | 10040 | Normal, impersonation, injection, flooding
5 | O-AWID5 | 15050 | Normal, flooding, injection, and impersonation

Figure 4: Clustering results (Rand Index) obtained by IncrementalDBSCAN and IncrementalSSGC for the 8 data sets of Table 1, respectively.


4.4. A Framework for Intrusion Detection in 802.11 Networks. In this section, we propose a multistage system based on machine learning techniques, applied to the AWID data set. The details of our system are presented in Figure 9. Three components are used for the intrusion detection task: a supervised learning model (J48, Bayes, random forest, support vector machine, neural network, etc.) trained on a labeled data set, which can be seen as the misuse detection component; an outlier detection method (LOF, FLDS, etc.), optionally used to detect new attacks in some periods of time; and, because it is very difficult to detect impersonation attacks in the AWID data sets as presented above, an incremental clustering algorithm (IncrementalDBSCAN, IncrementalSSGC, etc.) for further finding this kind of attack.

In this experiment, we use J48 for the misuse detection process and IncrementalSSGC for detecting impersonation attacks. In the outlier detection step, we propose to use FLDS or LOF, whose results have been presented in the subsection above. Because the outlier detection step can be run offline for some periods of time, we show only the results obtained by combining J48 and IncrementalSSGC. The confusion matrix of

Figure 5: Running time comparison (in minutes) between SSGC, IncrementalDBSCAN, and IncrementalSSGC on the AWID1, AWID2, and AWID3 data sets.

Table 4: The parameters used for the data sets.

Methods | O-AWID1 | O-AWID2 | O-AWID3 | O-AWID4 | O-AWID5
FLDS (k, nc, θ) | (25, 30, 6) | (25, 30, 6) | (25, 30, 6) | (25, 45, 6) | (25, 45, 6)
LOF (MinPts, η) | (27, 1.2) | (27, 1.2) | (25, 1.2) | (25, 1.2) | (27, 1.2)

k: the number of nearest neighbors; nc: the number of clusters used; θ: the threshold.

Table 3: The HTER measure of LOF and FLDS (the smaller, the better) for the extracted AWID data sets.

Methods | O-AWID1 | O-AWID2 | O-AWID3 | O-AWID4 | O-AWID5
FLDS | 0.13 | 0.12 | 0.10 | 0.11 | 0.06
LOF | 0.23 | 0.11 | 0.11 | 0.09 | 0.09


these results is illustrated in Table 5. The total accuracy obtained is 98.9%, compared with 96.26% in the paper [14]. We can explain the results obtained by IncrementalSSGC by the fact that the algorithm uses a distance based on shared nearest neighbors, which overcomes the limits of traditional distance measures such as the Euclidean or Minkowski distance; moreover, the shared nearest neighbors measure does not depend on the density of the data. The proposed system is generally called a hybrid method, which is one of the best strategies in developing intrusion detection systems [7, 9], in which there is no single classifier that can exactly detect all kinds of classes.

We also note that, for real applications, whenever an attack appears, the system needs to immediately produce a warning. The multistage system based on machine learning techniques provides a solution for users to construct a real IDS/IPS system, which is one of the most important problems in network security.

5. Conclusion

This paper introduces an incremental semisupervised graph-based clustering algorithm and a fast outlier detection

Figure 6: Results of LOF (a) and FLDS (b) on some 2D data sets; the outliers are marked with red plus signs.

Figure 7: Running time comparison (in minutes) between FLDS and LOF for four 2D data sets.

Figure 8: Running time comparison (in minutes) between FLDS and LOF for five AWID data sets.


method. Both methods can be used in a hybrid framework for the intrusion detection problem on WiFi data sets (AWID). Our proposed multistage system based on machine learning techniques provides a guideline for constructing real IDS/IPS systems, which is one of the most important problems in network security. Experiments conducted on the data sets extracted from the AWID and UCI repositories show the effectiveness of our proposed methods. In the near future, we will continue to develop other kinds of machine learning methods for the intrusion detection problem and to test them in other experimental setups.

Data Availability

The data used to support the findings of this study can be downloaded from the AWID repository (http://icsdweb.aegean.gr/awid/download.html).

[Figure 9 appears here: a block diagram of the proposed intrusion detection system for a local wireless network based on machine learning methods. Traffic captured from the access point is loaded and preprocessed, then passed through a classifier (J48, trained on the data), an incremental clustering model (ISSGC), and an anomaly detection model (FLDS). Attacks and "impersonation" attacks are blocked, normal traffic is skipped, and outliers are marked as new attacks by a security expert and stored in a database of marked attacks used for training and for setting the parameters of the incremental clustering.]

Figure 9: A new framework for intrusion detection in 802.11 networks.

Table 5: Confusion matrix for the AWID data set using J48 and IncrementalSSGC in the proposed framework.

Normal    Flooding   Impersonation   Injection   Classification
530588    116        6               75          Normal
2553      5544       0               0           Flooding
2         0          16680           0           Injection
3297      148        0               16364       Impersonation


Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

References

[1] K. P. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, Cambridge, MA, USA, 2012.

[2] B. Ahmad, W. Jian, and Z. Anwar Ali, "Role of machine learning and data mining in internet security: standing state with future directions," Journal of Computer Networks and Communications, vol. 2018, Article ID 6383145, 10 pages, 2018.

[3] T. Bakhshi and B. Ghita, "On internet traffic classification: a two-phased machine learning approach," Journal of Computer Networks and Communications, vol. 2016, Article ID 2048302, 21 pages, 2016.

[4] B. Luo and J. Xia, "A novel intrusion detection system based on feature generation with visualization strategy," Expert Systems with Applications, vol. 41, no. 9, pp. 4139-4147, 2014.

[5] W.-C. Lin, S.-W. Ke, and C.-F. Tsai, "CANN: an intrusion detection system based on combining cluster centers and nearest neighbors," Knowledge-Based Systems, vol. 78, pp. 13-21, 2015.

[6] C.-F. Tsai and C.-Y. Lin, "A triangle area based nearest neighbors approach to intrusion detection," Pattern Recognition, vol. 43, no. 1, pp. 222-229, 2010.

[7] F. Kuang, S. Zhang, Z. Jin, and W. Xu, "A novel SVM by combining kernel principal component analysis and improved chaotic particle swarm optimization for intrusion detection," Soft Computing, vol. 19, no. 5, pp. 1187-1199, 2015.

[8] M. E. Aminanto and K. Kim, "Detecting impersonation attack in WiFi networks using deep learning approach," in Information Security Applications, D. Choi and S. Guilley, Eds., vol. 10144, Springer, Berlin, Germany, 2016.

[9] A. A. Aburomman and M. B. I. Reaz, "A novel SVM-kNN-PSO ensemble method for intrusion detection system," Applied Soft Computing, vol. 38, pp. 360-372, 2016.

[10] M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander, "LOF: identifying density-based local outliers," in Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, pp. 93-104, Dallas, TX, USA, May 2000.

[11] V. Hautamaki, I. Karkkainen, and P. Franti, "Outlier detection using k-nearest neighbour graph," in Proceedings of the 17th International Conference on Pattern Recognition, pp. 430-433, Cambridge, UK, August 2004.

[12] V. Thang and F. F. Pashchenko, "A new incremental semi-supervised graph based clustering," in Proceedings of the IEEE International Conference on Engineering and Telecommunication, Moscow, Russia, March 2018.

[13] V. Thang, D. V. Pantiukhin, and A. N. Nazarov, "FLDS: fast outlier detection based on local density score," in Proceedings of the IEEE International Conference on Engineering and Telecommunication, pp. 137-141, Moscow, Russia, November 2016.

[14] C. Kolias, G. Kambourakis, A. Stavrou, and S. Gritzalis, "Intrusion detection in 802.11 networks: empirical evaluation of threats and a public dataset," IEEE Communications Surveys & Tutorials, vol. 18, no. 1, pp. 184-208, 2016.

[15] Ch. Gupta and R. Grossman, "GenIc: a single pass generalized incremental algorithm for clustering," in Proceedings of the Fourth SIAM International Conference on Data Mining, pp. 147-153, Lake Buena Vista, FL, USA, April 2004.

[16] M. Ester, H.-P. Kriegel, J. Sander, M. Wimmer, and X. Xu, "Incremental clustering for mining in a data warehousing environment," in Proceedings of the International Conference on Very Large Data Bases, pp. 323-333, New York, NY, USA, August 1998.

[17] V. Chandrasekhar, C. Tan, M. Wu, L. Li, X. Li, and J.-H. Lim, "Incremental graph clustering for efficient retrieval from streaming egocentric video data," in Proceedings of the International Conference on Pattern Recognition, pp. 2631-2636, Stockholm, Sweden, August 2014.

[18] A. M. Bagirov, J. Ugon, and D. Webb, "Fast modified global K-means algorithm for incremental cluster construction," Pattern Recognition, vol. 44, no. 4, pp. 866-876, 2011.

[19] A. Bryant, D. E. Tamir, N. D. Rishe, and K. Abraham, "Dynamic incremental fuzzy C-means clustering," in Proceedings of the Sixth International Conference on Pervasive Patterns and Applications, Venice, Italy, May 2014.

[20] Z. Yu, P. Luo, J. You et al., "Incremental semi-supervised clustering ensemble for high dimensional data clustering," IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 3, pp. 701-714, 2016.

[21] V. Chandola, A. Banerjee, and V. Kumar, "Anomaly detection: a survey," ACM Computing Surveys, vol. 41, no. 3, pp. 1-58, 2009.

[22] J. Tang, Z. Chen, A. Fu, and D. Cheung, "Enhancing effectiveness of outlier detections for low density patterns," in Proceedings of the Sixth Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Taipei, Taiwan, May 2002.

[23] S. Papadimitriou, H. Kitagawa, P. B. Gibbons, and C. Faloutsos, "LOCI: fast outlier detection using the local correlation integral," in Proceedings of the 19th International Conference on Data Engineering, pp. 315-326, Bangalore, India, March 2003.

[24] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, "A density-based algorithm for discovering clusters in large spatial databases with noise," in Proceedings of the Conference on Knowledge Discovery and Data Mining (KDD), pp. 226-231, Portland, OR, USA, August 1996.

[25] L. Ertoz, M. Steinbach, and V. Kumar, "Finding clusters of different sizes, shapes, and densities in noisy, high dimensional data," in Proceedings of the SIAM International Conference on Data Mining, pp. 47-58, San Francisco, CA, USA, May 2003.

[26] E. M. Knorr, R. T. Ng, and V. Tucakov, "Distance-based outliers: algorithms and applications," The International Journal on Very Large Data Bases, vol. 8, no. 3-4, pp. 237-253, 2000.

[27] S. Basu, I. Davidson, and K. L. Wagstaff, "Constrained clustering: advances in algorithms, theory, and applications," in Chapman and Hall/CRC Data Mining and Knowledge Discovery Series, CRC Press, Boca Raton, FL, USA, 1st edition, 2008.

[28] A. A. Abin, "Clustering with side information: further efforts to improve efficiency," Pattern Recognition Letters, vol. 84, pp. 252-258, 2016.

[29] Y. Shi, C. Otto, and A. K. Jain, "Face clustering: representation and pairwise constraints," IEEE Transactions on Information Forensics and Security, vol. 13, no. 7, pp. 1626-1640, 2018.

[30] A. A. Abin and B. Hamid, "Active constrained fuzzy clustering: a multiple kernels learning approach," Pattern Recognition, vol. 48, no. 3, pp. 953-967, 2015.

[31] S. Xiong, J. Azimi, and X. Z. Fern, "Active learning of constraints for semi-supervised clustering," IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 1, pp. 43-54, 2014.

[32] K. Wagstaff, C. Cardie, S. Rogers, and S. Schrodl, "Constrained K-means clustering with background knowledge," in Proceedings of the International Conference on Machine Learning (ICML), pp. 577-584, Williamstown, MA, USA, June 2001.

[33] I. Davidson and S. S. Ravi, "Using instance-level constraints in agglomerative hierarchical clustering: theoretical and empirical results," Data Mining and Knowledge Discovery, vol. 18, no. 2, pp. 257-282, 2009.

[34] B. Kulis, S. Basu, I. Dhillon, and R. Mooney, "Semi-supervised graph clustering: a kernel approach," Machine Learning, vol. 74, no. 1, pp. 1-22, 2009.

[35] V.-V. Vu, "An efficient semi-supervised graph based clustering," Intelligent Data Analysis, vol. 22, no. 2, pp. 297-307, 2018.

[36] X. Wang, B. Qian, and I. Davidson, "On constrained spectral clustering and its applications," Data Mining and Knowledge Discovery, vol. 28, no. 1, pp. 1-30, 2014.

[37] D. Mavroeidis, "Accelerating spectral clustering with partial supervision," Data Mining and Knowledge Discovery, vol. 21, no. 2, pp. 241-258, 2010.

[38] L. Lelis and J. Sander, "Semi-supervised density-based clustering," in Proceedings of the IEEE International Conference on Data Mining, pp. 842-847, Miami, FL, USA, December 2009.

[39] D.-D. Le and S. Satoh, "Unsupervised face annotation by mining the web," in Proceedings of the 8th IEEE International Conference on Data Mining, pp. 383-392, Pisa, Italy, December 2008.

[40] D. Yan, L. Huang, and M. I. Jordan, "Fast approximate spectral clustering," in Proceedings of the Conference on Knowledge Discovery and Data Mining (KDD), pp. 907-916, New York, NY, USA, June 2009.

[41] C. Zhong, M. Malinen, D. Miao, and P. Franti, "A fast minimum spanning tree algorithm based on K-means," Information Sciences, vol. 295, pp. 1-17, 2015.

[42] V. Chaoji, M. Al Hasan, S. Salem, and M. J. Zaki, "SPARCL: an effective and efficient algorithm for mining arbitrary shape-based clusters," Knowledge and Information Systems, vol. 21, no. 2, pp. 201-229, 2009.

[43] A. Asuncion and D. J. Newman, UCI Machine Learning Repository, American Statistical Association, Boston, MA, USA, 2015, http://archive.ics.uci.edu/ml/index.php.

[44] G. Karypis, E.-H. Han, and V. Kumar, "Chameleon: hierarchical clustering using dynamic modeling," Computer, vol. 32, no. 8, pp. 68-75, 1999.



multistage approach, which involves several stages of sampling (with replacement) of the data and estimating the membership matrix for the next stage. The experiments conducted show the effectiveness of the proposed method. In [17], Chandrasekhar et al. propose an incremental local density clustering scheme (ILDC) for finding dense subgraphs in streaming data, i.e., when data arrive incrementally. The incremental clustering scheme captures redundancy in the streaming data source by finding dense subgraphs, which correspond to salient objects and scenes. The ILDC process performs greedy operations such as cluster expansion, cluster addition, and cluster merging, based on the similarity between clusters defined. ILDC shows its effectiveness when used in image-retrieval applications. In [20], an incremental semisupervised ensemble clustering algorithm, named ISSCE, has been successfully presented. ISSCE uses constraints to update incremental members. The authors develop an incremental ensemble member selection process based on a global objective function and a local objective function to remove the redundant ensemble members. The experimental results show the improvement of ISSCE over traditional semisupervised clustering ensemble approaches and conventional cluster ensemble methods on six real-world data sets from the UCI machine learning repository and 12 real-world data sets of cancer gene expression profiles. In the context of classification, we need to find the label for a new data object by using a classifier trained on training data. The problem of identifying the label for a new object in incremental clustering can be seen as similar to the classification context.

2.2. Outlier Detection Problem. Outlier (anomaly) detection is one of the important problems of machine learning and data mining. As mentioned in [21], outlier detection is the problem of finding patterns in data that do not conform to expected behavior. Applications of outlier detection can be found in many domains such as intrusion detection, credit fraud detection, video surveillance, weather prediction, discovery of criminal activities in electronic commerce, etc. [9, 21]. There are several kinds of outliers, including point outliers, contextual outliers, and collective outliers. In this paper, we focus on point outlier detection, which can be applied in a variety of applications. For a data set consisting of points, a point will be called an outlier if it is different from a large number of the rest of the points. To detect outliers, there are some principal methods in the literature, such as classification methods, nearest neighbor methods, clustering methods, statistical methods, distance-based methods, etc.

For classification-based outlier detection, we have two categories: multiclass and one-class anomaly detection methods. In multiclass classification techniques, we assume that the training data contain labeled points of all normal classes. The learner, using a supervised learning model, trains a model on the labeled data. The classifier can distinguish between each normal class and the rest of the classes. A test point will be called an outlier if it does not

belong to any normal class. In one-class outlier detection methods, we assume that there is only one normal class. The classifier learns a model that can detect the boundary of the normal class. If a test point does not fall within the boundary, it will be called an outlier. Although many techniques have been developed, the main disadvantage of these methods is that they rely on the availability of accurate labels for normal classes, which is not easy to obtain in real applications.

The nearest neighbor-based outlier detection methods use the following assumption: normal points belong to dense regions, while outliers belong to sparse regions. The most famous method of this kind is the LOF algorithm. The idea of LOF is based on a local density evaluation score for points. Each point is assigned a score, which is the ratio of the average local density of the k-nearest neighbors of the point and the local density of the point itself. Many variants of LOF can be cited here, such as COF [22], ODIN [11], LOCI [23], etc. The main drawback of these methods is the O(n^2) complexity required.

For the clustering-based outlier detection techniques, the idea is to use clustering methods to group data into clusters; the points that do not belong to any cluster are called outliers. Some clustering methods, such as DBSCAN [24] and SNN [25], can detect outliers. In fact, the purpose of clustering is finding clusters, so the outliers are just a by-product of the clustering process and hence are not carefully optimized. A further drawback is that such clustering techniques require O(n^2) complexity.

The statistical outlier detection methods are based on the following assumption: normal data points occur in high-probability regions of a stochastic model, while anomalies occur in the low-probability regions of that model. Several methods of this kind have been developed. In general, statistical methods fit a statistical model (a Gaussian distribution, a mixture of parametric statistical distributions, etc.) to the given data and then apply a statistical inference test to determine whether an unseen instance belongs to this model or not. The key limitation of these methods is the assumption about the distribution of the data points; this assumption does not hold, especially when the dimension of the data is high [21].

In the distance-based outlier detection methods, a point is considered an outlier if it does not have enough points in the data set (a fraction pct) whose distance from this point is smaller than the threshold value dmin [26].
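The distance-based rule above can be sketched in a few lines. This is our own illustrative toy example (the function name, the data, and the parameter values are ours, not from [26]):

```python
import numpy as np

def distance_based_outliers(X, pct, dmin):
    """Flag x as an outlier if fewer than a fraction pct of the other
    points lie within distance dmin of it (Knorr-Ng style rule)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise distances
    n = len(X)
    flags = []
    for i in range(n):
        close = np.sum(d[i] <= dmin) - 1          # exclude the point itself
        flags.append(bool(close < pct * (n - 1)))
    return flags

# four clustered points and one hypothetical far-away point
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [20, 20]], dtype=float)
print(distance_based_outliers(X, pct=0.5, dmin=2.0))  # -> [False, False, False, False, True]
```

Only the isolated point fails to gather enough close neighbors, so only it is flagged.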

3. Proposed Method

3.1. Semisupervised Graph-Based Clustering. In recent years, semisupervised clustering has become an important research topic, as illustrated by the number of studies introduced [27]. The purpose of semisupervised clustering is to integrate side information to improve the clustering performance. Generally, there are two kinds of side information: constraints and seeds. Given a data set X, constraints involve must-link and cannot-link, in which the must-link constraint


(ML) between two observations x ∈ X and y ∈ X means that x and y should be in the same cluster, and the cannot-link constraint (CL) means that x and y should not be in the same cluster. With seeds, a small set of labeled data (called seeds) S ⊆ X is provided to the semisupervised clustering algorithm. In fact, this side information is available or can be collected from users [28-31]. We can cite here the work on semisupervised clustering for K-means [32], hierarchical clustering [33], graph-based clustering [34, 35], spectral clustering [36, 37], density-based clustering [38], etc. While many semisupervised clustering algorithms have been introduced, to the best of our knowledge, there are no incremental semisupervised clustering algorithms in the literature.

Our new incremental clustering, introduced in the next section, is based on the work on semisupervised graph-based clustering using seeds (SSGC). We choose the SSGC algorithm because it has several advantages: SSGC uses only one parameter, and SSGC can detect clusters in varied density regions of the data [35]. SSGC includes two steps, as in the following description (see Algorithm 1).

Step 1. Given a k-nearest neighbor graph presenting a data set X, this step uses a loop in which, at each iteration, all edges which have a weight less than the threshold θ are removed. The value of θ is initialized to 0 at the first iteration and incremented by 1 after each iteration. This loop stops when each connected component has at most one kind of seed. The main clusters are identified by propagating labels in each connected component that contains seeds.

Step 2. The remaining points (graph nodes) that do not belong to any main cluster are divided into two kinds: points that have edges relating to one or more clusters, and isolated points. In the first case, points are assigned to the cluster with the largest related weight. For the isolated points, we can either remove them as outliers or label them.

We note that in SSGC, the weight ω(x_i, x_j) of the edge (the similarity) between two points x_i and x_j in the k-nearest neighbor graph is equal to the number of points that the two points share, as in the following equation:

ω(x_i, x_j) = |NN(x_i) ∩ NN(x_j)|,  (1)

where NN(·) is the set of k-nearest neighbors of the specified point.
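Equation (1) can be sketched in a few lines of Python. This is an illustrative implementation, not the authors' code; the `knn_lists` helper (our own name) builds the k-nearest-neighbour sets by brute force:

```python
import numpy as np

def knn_lists(X, k):
    """k-nearest-neighbour index sets for every point (brute force)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)          # a point is not its own neighbour
    return [set(np.argsort(row)[:k]) for row in d]

def shared_nn_weight(nn, i, j):
    """omega(x_i, x_j) = |NN(x_i) ∩ NN(x_j)| as in equation (1)."""
    return len(nn[i] & nn[j])

# three nearby points and one distant point (hypothetical data)
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]])
nn = knn_lists(X, k=2)
print(shared_nn_weight(nn, 0, 1))  # -> 1: points 0 and 1 share one neighbour
```

The weight is large only when both points draw their neighbours from the same local region, which is why the measure is insensitive to absolute density.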

SSGC is efficient compared with semisupervised density-based clustering in detecting clusters for batch data; however, it is not adapted for data stream or data warehousing environments where many updates (insertions, deletions) occur.

3.2. Incremental Graph-Based Clustering Using Seeds. In this section, we propose IncrementalSSGC, based on the SSGC algorithm. In IncrementalSSGC, the seeds are used to train a k-nearest neighbor graph, to construct connected

components, and to identify the value of θ as in the SSGC algorithm. Like other incremental clustering algorithms, two procedures must be developed: insertion and deletion.

Algorithm 2 shows the insertion step of IncrementalSSGC for a new data point xnew. At first, the list of edges between xnew and the current clusters is created, and all edges with weight smaller than θ are removed. If the list is empty, this indicates that xnew is an outlier in the current situation, and hence xnew is added to a temporary list Lo. In the case of existing edges between xnew and some connected components, we need to remove edges until xnew connects only to components with one kind of label. Finally, the label of xnew is identified by the label of its connected components. In Step 10, xnew and its related edges are added to L; some edges between xt and xl are also recalculated if xnew appears in the nearest neighbors list of xt or xl. In Step 12, after some insertion steps, we can examine the points in Lo.

Algorithm 3 presents the detailed steps of the deletion process. When we want to remove a point xdel from the current clusters, we simply remove xdel and all edges related to xdel in the graph. Step 2 of the algorithm shows the updating process. In this step, we need to update all edges affected by xdel: all edges between xi and xj must be updated if xdel appears in the common list of their nearest neighbors. Finally, Step 3 simply removes all edges that have a weight less than θ.

3.2.1. The Complexity Analysis. Now we will analyse the complexity of IncrementalSSGC. Given a data set with n objects, we recall that the complexity of SSGC is O(k × n^2), in which k is the number of nearest neighbors. Assuming that the current clusters include n objects, we analyse the complexity of the insertion and deletion processes of IncrementalSSGC at step (n + 1) as follows.

For the insertion process, which aims to identify the cluster label for a new data point xnew: in Step 1, creating the list of edges between xnew and the current clusters has complexity O(n × k). In Steps 2, 6, and 7, the complexity is just O(k). In Step 10, some edges between xt and xl are recalculated if xnew appears in the nearest neighbors list of xt or xl; in fact, the number of such edges is small. So, for the insertion of a new point, the complexity is O(n × k).

For the deletion process, the complexity of Step 1 is O(k). In Steps 2 and 3, the number of edges updated is the number of edges that have xdel as a common neighbor, and this value depends on the data set. Let q be the average value over v deletion processes; in fact, q is determined by performing experiments. So the complexity of a deletion process is O(q × n × k).

In summary, from the analysis of the insertion and deletion processes above, we can see that IncrementalSSGC is very useful for data sets that usually need to be updated. In the next section, we also present the running times of both SSGC and IncrementalSSGC for some data sets extracted from the intrusion detection problem.


3.3. A Fast Outlier Detection Method. Given a k-nearest neighbors graph (k-NNG), the local density score LDS of a vertex u ∈ k-NNG is defined as follows [39]:

LDS(u) = (Σ_{q ∈ NN(u)} ω(u, q)) / k,  (2)

in which ω is calculated as in equation (1) and k is the number of nearest neighbors used. The LDS is used as an indicator of the density of the region of a vertex u. The LDS value is in the interval [0, k − 1]; the larger the LDS of u, the denser the region that u belongs to, and vice versa. So we can apply this way of calculating LDS to identify outliers. To detect outliers with this method, we have to use a parameter as a threshold: a point which has an LDS value smaller than the threshold can be seen as an outlier, and vice versa. Similar to LOF, this method requires O(n^2) complexity.
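A minimal sketch of equation (2), using hand-built toy neighbour lists of our own (not data from the paper):

```python
def lds(nn, u, k):
    """LDS(u) = (1/k) * sum over q in NN(u) of |NN(u) ∩ NN(q)| — equation (2)."""
    return sum(len(nn[u] & nn[q]) for q in nn[u]) / k

# hypothetical k-NN lists (k = 3) for a five-vertex toy graph:
# vertices 0-3 form a clique, vertex 4 points into it but nobody points back
nn = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2}, 4: {0, 1, 2}}
print(lds(nn, 0, 3))  # -> 2.0, equal to the maximum k - 1: vertex 0 sits in a dense region
```

In the detector, any vertex whose LDS falls below a user-chosen threshold would be reported as an outlier.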

Input: X, number of neighbors k, a set of seeds S
Output: a set of detected clusters/outliers
Process:

(1) Construct the k-NN graph of X
(2) θ = 0
(3) repeat
(4)   Construct the connected components using the threshold θ
(5)   θ = θ + 1
(6) until the cut condition is satisfied
(7) Propagate the labels to form the principal clusters
(8) Construct the final clusters

Algorithm 1: The algorithm SSGC [35].
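The threshold-raising loop of Algorithm 1 (steps 2-6) can be sketched as follows. This is our own simplified illustration with hypothetical edge weights and seed labels, not the authors' implementation; in the real algorithm the weights come from the shared-nearest-neighbour measure of equation (1):

```python
from collections import deque

def components(nodes, edges, theta):
    """Connected components of the graph, keeping only edges with weight >= theta."""
    adj = {u: [] for u in nodes}
    for (u, v), w in edges.items():
        if w >= theta:
            adj[u].append(v)
            adj[v].append(u)
    seen, comps = set(), []
    for s in nodes:
        if s in seen:
            continue
        comp, queue = [], deque([s])
        seen.add(s)
        while queue:                      # breadth-first traversal
            u = queue.popleft()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        comps.append(comp)
    return comps

def ssgc_components(nodes, edges, seeds):
    """Raise theta until every component holds at most one kind of seed label."""
    theta = 0
    while True:
        comps = components(nodes, edges, theta)
        if all(len({seeds[u] for u in comp if u in seeds}) <= 1 for comp in comps):
            return comps, theta
        theta += 1

# toy weights: two tight groups joined by one weak bridge edge (our own example)
nodes = [0, 1, 2, 3, 4, 5]
edges = {(0, 1): 3, (1, 2): 3, (2, 3): 1, (3, 4): 3, (4, 5): 3}
seeds = {0: 'A', 5: 'B'}                  # one labeled seed per class
comps, theta = ssgc_components(nodes, edges, seeds)
# the weak bridge (2, 3) is cut at theta = 2, leaving one component per seed label
```

Labels would then be propagated within each returned component, as in step 7.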

Input: a new data object xnew, a set of current clusters C, list containing edges for each point of the current clusters L, θ (threshold), and number of nearest neighbors (NN) k
Output: label for xnew
Process:

(1) Create the k-nearest neighbors list of edges (LE) between xnew and all current clusters
(2) Delete all (u, v) ∈ LE with weight(u, v) < θ
(3) if (LE is empty) then
(4)   xnew is added to a temporary list Lo
(5) else
(6)   if xnew is related to two or more components with different labels then
(7)     Delete edges in LE in ascending order of weight until xnew connects to components with at most one kind of label
(8)   end if
(9)   Get the label for xnew and its connected points (if any) by propagating
(10)  Update list L: add edges relating to xnew to L; some edges between xt and xl are also recalculated if xnew appears in the nearest neighbors list of xt or xl
(11) end if
(12) Examine points in Lo

Algorithm 2: IncrementalSSGC: insertion process.

Input: an object xdel in a component to be deleted, a set of current clusters C, list of edges for each point of the current clusters L, θ (threshold)
Output: the updated C, the updated L
Process:

(1) Delete xdel and all edges related to xdel in L
(2) Update all weights (k, l) ∈ C with xdel ∈ NN(k) ∩ NN(l)
(3) Delete all edges (k, l) updated at Step 2 with weight(k, l) < θ

Algorithm 3: IncrementalSSGC: deletion process.


To reduce the running time of the method, we propose a fast outlier detection method based on the local density score, called FLDS. The basic idea of the FLDS algorithm is to use a divide-and-conquer strategy. Given a data set X in which to find outliers, the input data set is first split into k clusters using the K-means algorithm. Next, a k-nearest neighbor graph is built for each cluster to identify outliers in each local cluster. The outliers found in all clusters are then recalculated on the whole data set. The idea of the divide-and-conquer strategy using K-means in the preprocessing step has been successfully applied in solving other problems, such as the fast spectral clustering problem [40], the fast minimum spanning tree problem [41], and efficient and effective shape-based clustering [42]. The FLDS algorithm is described in Algorithm 4.

The FLDS algorithm is an outlier detection method based on K-means and the local density score on a graph. The complexity of FLDS is O(n × k) + O(k^2) + O(t × n), in which the value of k may be up to n^0.5 [41, 42] and t ≪ n is evaluated as approximately equal to k, so the complexity of FLDS is O(n^1.5).
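The divide-and-conquer strategy of FLDS can be sketched as follows. This is a rough, self-contained illustration under our own assumptions, not the paper's implementation: we use a minimal K-means stand-in (a library implementation such as scikit-learn's `KMeans` would normally be used), brute-force k-NN graphs, and hypothetical data and thresholds:

```python
import numpy as np

def simple_kmeans(X, nc, iters=10, seed=0):
    """A tiny K-means stand-in for step 1 of the algorithm."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=nc, replace=False)]
    for _ in range(iters):
        labels = np.argmin(np.linalg.norm(X[:, None] - centers[None], axis=2), axis=1)
        for c in range(nc):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels

def lds_scores(X, k):
    """LDS of every point of X (equation (2)), via brute-force k-NN lists."""
    d = np.linalg.norm(X[:, None] - X[None], axis=2)
    np.fill_diagonal(d, np.inf)
    nn = [set(np.argsort(row)[:k]) for row in d]
    return np.array([sum(len(nn[u] & nn[q]) for q in nn[u]) / k for u in range(len(X))])

def flds(X, k, nc, theta):
    """Local LDS inside each K-means cluster, then a global re-check of the candidates."""
    labels = simple_kmeans(X, nc)
    candidates = []
    for c in range(nc):
        idx = np.where(labels == c)[0]
        if len(idx) <= k:                 # too small to build a k-NN graph; skip
            continue
        local = lds_scores(X[idx], k)
        candidates.extend(idx[i] for i in np.where(local < theta)[0])
    global_scores = lds_scores(X, k)      # step 3: recalculate LDS on the whole data set
    return [int(i) for i in candidates if global_scores[i] < theta]

# two tight clusters plus one hypothetical far-away point
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [0, 0.5],
              [20, 20], [20, 21], [21, 20], [21, 21], [20.5, 20.5], [20, 20.5],
              [10, 10]], dtype=float)
outliers = flds(X, k=3, nc=2, theta=1.0)
```

Since each local k-NN graph is built on a cluster of roughly n/nc points, the quadratic work is confined to small blocks, which is the source of the speed-up over a single global graph.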

4. Experiment Results

This section aims to evaluate the effectiveness of our proposed algorithms. We will show the results of IncrementalSSGC, the results of FLDS, and the results when using our methods in a hybrid framework for the intrusion detection problem. IncrementalSSGC will be compared with IncrementalDBSCAN, while FLDS will be compared with LOF.

The data sets used in the experiments are mostly extracted from the Aegean WiFi Intrusion Dataset (AWID) [14]. AWID is a publicly available collection of sets of data in an easily distributed format, which contain real traces of both normal and intrusive 802.11 traffic. In the AWID, many kinds of attacks have been introduced, and they fall into the main categories of flooding, injection, and impersonation. The AWID has 156 attributes; we use 35 attributes extracted by an artificial neural network as presented in [8]. We also use some supplementary data sets that come from UCI [43] and data sets with different sizes, shapes, and densities that contain noise points as well as special artifacts [44] in this experiment.

4.1. Experiment Setup

4.1.1. Data Sets for Incremental Clustering Algorithms. To show the effectiveness of IncrementalSSGC, two aspects will be examined: the running time and the accuracy. Five UCI data sets and three data sets extracted from AWID will be used for testing IncrementalSSGC and IncrementalDBSCAN. The details of these data sets are presented in Table 1.

To evaluate clustering results, the Rand Index is used. Given a data set X with n points for clustering, P1 is an array containing the true labels and P2 is an array containing the

results of a clustering algorithm; the Rand Index (RI) is calculated as follows:

RI = (a + b) / (n(n − 1)/2),  (3)

in which a (respectively, b) is the number of pairs that are in the same (respectively, different) clusters in both partitions P1 and P2, and the denominator is the total number of point pairs. The bigger the Rand Index, the better the result.
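Equation (3) can be sketched directly from its definition (an illustrative snippet with our own toy labelings):

```python
from itertools import combinations

def rand_index(p1, p2):
    """RI = (a + b) / (number of point pairs), as in equation (3)."""
    pairs = list(combinations(range(len(p1)), 2))
    a = sum(1 for i, j in pairs if p1[i] == p1[j] and p2[i] == p2[j])  # together in both
    b = sum(1 for i, j in pairs if p1[i] != p1[j] and p2[i] != p2[j])  # apart in both
    return (a + b) / len(pairs)

print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # identical partitions up to relabelling -> 1.0
```

Note that the index is invariant to how cluster labels are named, which is why it is suitable for comparing a clustering against ground truth.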

4.1.2. Data Sets for FLDS and LOF. We used five data sets extracted from AWID and four 2D data sets, including DS1 (10000 points), DS2 (8000 points), DS3 (8000 points), and DS4 (8000 points) [44], for FLDS and LOF. These 2D data sets have clusters of different size, shape, and orientation, as well as random noise points and special artifacts. The details of the AWID data sets are presented in Table 2.

To compare LOF and FLDS on the AWID data sets, we use the ROC measure, which has two factors: the False Positive (False Alarm) Rate (FPR) and the False Negative (Miss Detection) Rate (FNR). These factors are defined in the following equations:

FPR = FP / (FP + TN),  (4)

FNR = FN / (TP + FN),  (5)

in which True Positive (TP) is the number of attacks correctly classified as attacks, True Negative (TN) is the number of normal records correctly detected as normal, False Positive (FP) is the number of normal records falsely classified as attacks, namely false alarms, and False Negative (FN) is the number of attacks falsely detected as normal.

To combine the FPR and FNR values, we calculate the Half Total Error Rate (HTER), which is similar to the evaluation method used in [11], defined as follows:

HTER = (FPR + FNR) / 2.  (6)
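Equations (4)-(6) amount to a few lines of arithmetic over the confusion counts (the counts below are made-up illustrative numbers, not results from the paper):

```python
def detection_rates(tp, tn, fp, fn):
    """FPR, FNR, and HTER as in equations (4)-(6)."""
    fpr = fp / (fp + tn)          # false alarm rate
    fnr = fn / (tp + fn)          # miss detection rate
    return fpr, fnr, (fpr + fnr) / 2

fpr, fnr, hter = detection_rates(tp=90, tn=80, fp=20, fn=10)  # FPR 0.2, FNR 0.1, HTER 0.15
```

Averaging the two error rates keeps a detector from looking good merely by flagging everything (low FNR) or flagging nothing (low FPR).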

4.2. Clustering Results. We note that there is no incremental semisupervised clustering algorithm in the literature, so we compare the performance obtained by our algorithm with that of the IncrementalDBSCAN algorithm. IncrementalDBSCAN can be seen as the state of the art among the incremental clustering methods proposed; the algorithm can detect clusters of different size and shape with noise. Because both SSGC and IncrementalSSGC produce the same results, we just show the results for IncrementalSSGC and IncrementalDBSCAN. The results are shown in Figure 4.

We can see from the figure that IncrementalSSGC obtains better results than IncrementalDBSCAN. This can be explained by the fact that IncrementalDBSCAN cannot detect clusters with different densities; as mentioned in that paper, "we assumed that the parameter values Eps and MinPts of DBSCAN do not change significantly when inserting and deleting objects."


This assumption means that IncrementalDBSCAN cannot work well with data sets having different densities. In contrast to IncrementalDBSCAN, the IncrementalSSGC algorithm does not depend on the density of the data, because the similarity measure used is based on shared nearest neighbors.

4.2.1. Running Time Comparison. Figure 5 presents the running times of IncrementalSSGC and IncrementalDBSCAN for three AWID data sets. We can see that the running times of both algorithms are similar. This can be explained by the fact that both algorithms use k-nearest neighbors to find clusters in each incremental step. We also present the running time of the SSGC algorithm for reference purposes. From this experiment, we can see the advantages of the incremental clustering algorithms.

4.3. The Results of FLDS and LOF. Table 3 presents the results obtained by FLDS and LOF for 5 AWID data sets. We can see that the results of FLDS are comparable with those of the LOF algorithm. The parameters used for both methods are shown in Table 4. For some 2D data sets, Figure 6 presents the results obtained by FLDS and LOF. Intuitively, the outliers detected by both methods are mostly similar. We can explain this by the fact that both methods decide whether a point is an outlier based on a local density score.

Figures 7 and 8 illustrate the running time comparison between FLDS and LOF. With the four 2D data sets mentioned above, it can be seen from the figure that FLDS runs about 12 times faster than LOF, a significant improvement. It can be explained by the fact that the complexity of FLDS is just O(n^1.5), compared with O(n^2) for LOF.

Input: a data set X with n points, the number of nearest neighbors k, the number of clusters nc, the threshold theta
Output: outliers of X
Process:
(1) Use K-means to split X into nc clusters
(2) Use the LDS algorithm on each separate cluster to obtain the local outliers (using the threshold theta)
(3) Recalculate the LDS value of the local outliers obtained in Step 2 across the whole data set

ALGORITHM 4: The algorithm FLDS.
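Algorithm 4 can be sketched end to end. This is a simplified, self-contained illustration rather than the authors' implementation: K-means is a plain Lloyd iteration from fixed initial centers (to keep the example deterministic), LDS is the shared-neighbor score of Section 3.3, and the final rescoring of candidates over the whole data set (Step 3) is omitted:

```python
from math import dist

def kmeans(points, centers, iters=10):
    """Plain Lloyd iterations from fixed initial centers (Step 1)."""
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda c: dist(p, centers[c]))
            groups[nearest].append(p)
        centers = [tuple(sum(x) / len(g) for x in zip(*g)) if g else c
                   for g, c in zip(groups, centers)]
    return groups

def lds(cluster, k):
    """Local density score: mean shared-neighbor count over each k-NN list."""
    nn = [set(sorted((j for j in range(len(cluster)) if j != i),
                     key=lambda j: dist(cluster[i], cluster[j]))[:k])
          for i in range(len(cluster))]
    return [sum(len(nn[i] & nn[q]) for q in nn[i]) / k
            for i in range(len(cluster))]

def flds(points, centers, k, theta):
    """Steps 1-2 of Algorithm 4: split the data, flag low-LDS points locally."""
    outliers = []
    for cluster in kmeans(points, centers):
        for point, score in zip(cluster, lds(cluster, k)):
            if score < theta:
                outliers.append(point)
    return outliers

# Two tight pairs of blobs plus one point stranded between sub-blobs: its
# nearest neighbors come from both sides, so they share few neighbors with it.
blob = lambda x, y: [(x, y), (x, y + 0.1), (x + 0.1, y), (x + 0.1, y + 0.1)]
pts = blob(0, 0) + blob(8, 8) + [(4.05, 4.05)] + blob(100, 100)
print(flds(pts, centers=[(4, 4), (100, 100)], k=3, theta=1.5))  # [(4.05, 4.05)]
```

With nc on the order of sqrt(n), each local k-NN computation touches only about n/nc points, which is where the overall O(n^1.5) cost comes from.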

Table 1: Main characteristics of the data sets used for clustering evaluation.

ID  Data     Normal + impers.  Attributes  Clusters
1   Iris     150               4           3
3   Wine     178               13          3
2   E. coli  336               8           8
4   Breast   569               30          2
5   Yeast    1484              8           10
6   AWID1    5000              35          2
7   AWID2    8000              35          2
8   AWID3    12000             35          2

Table 2: Main characteristics of the data sets for FLDS and LOF.

ID  Data     Objects  Categories
1   O-AWID1  3030     Impers., flooding, injection
3   O-AWID2  5030     Impers., normal, flooding
2   O-AWID3  7040     Flooding, normal, impers.
4   O-AWID4  10040    Normal, impers., injection, flooding
5   O-AWID5  15050    Normal, flooding, injection, and impers.

Figure 4: Clustering results (Rand Index) obtained by IncrementalDBSCAN and IncrementalSSGC for the 8 data sets of Table 1, respectively.
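The Rand Index reported in Figure 4 compares two labelings pair by pair: a is the number of point pairs grouped together in both partitions, b the number separated in both, and RI = (a + b)/C(n, 2). A minimal sketch:

```python
from itertools import combinations

def rand_index(labels_true, labels_pred):
    """RI = (a + b) / C(n, 2): fraction of point pairs on which the
    two partitions agree (same-cluster vs. different-cluster)."""
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum((labels_true[i] == labels_true[j]) ==
                (labels_pred[i] == labels_pred[j])
                for i, j in pairs)
    return agree / len(pairs)

print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0 (cluster names are irrelevant)
print(rand_index([0, 0, 1, 1], [0, 1, 0, 1]))  # 0.3333333333333333
```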


4.4. A Framework for Intrusion Detection in 802.11 Networks. In this section, we propose a multistage system based on machine learning techniques applied to the AWID data set. The detail of our system is presented in Figure 9. Three components are used for the intrusion detection task: a supervised learning model (J48, Bayes, random forest, support vector machine, neural network, etc.) trained on the labeled data set, which can be seen as the misuse detection component; an outlier detection method (LOF, FLDS, etc.), optionally used to detect new attacks over some periods of time; and, because it is very difficult to detect impersonation attacks in the AWID data sets as presented above, an incremental clustering algorithm (IncrementalDBSCAN, IncrementalSSGC, etc.) for further finding this kind of attack.
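The flow of the three components can be sketched as a simple dispatch routine; the callables and labels here (`classify`, `cluster_label`, `is_outlier`) are illustrative stand-ins for the trained models, not an interface from the paper:

```python
def inspect(record, classify, cluster_label, is_outlier):
    """Route one network record through the multistage system."""
    if classify(record) == "attack":              # misuse detection (e.g., J48)
        return "block"
    if cluster_label(record) == "impersonation":  # incremental clustering (ISSGC)
        return "block"
    if is_outlier(record):                        # periodic offline check (FLDS/LOF)
        return "review"                           # a security expert marks new attacks
    return "pass"                                 # normal traffic

# Toy stand-ins for the three trained components.
print(inspect({"frame": 1},
              classify=lambda r: "normal",
              cluster_label=lambda r: "impersonation",
              is_outlier=lambda r: False))  # block
```

Records flagged for review would be labeled by the expert and fed back into the database of marked attacks, as the framework describes.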

In this experiment, we use J48 for the misuse detection process and IncrementalSSGC for detecting impersonation attacks. In the outlier detection step, we propose to use FLDS or LOF, and their results have been presented in the subsection above. Because the outlier detection step can be performed offline over some periods of time, we show only the results obtained by combining J48 and IncrementalSSGC.

Figure 5: Running time (minutes) comparison between SSGC, IncrementalDBSCAN, and IncrementalSSGC for the AWID1, AWID2, and AWID3 data sets.

Table 4: The parameters used for the data sets.

Methods          O-AWID1      O-AWID2      O-AWID3      O-AWID4      O-AWID5
FLDS (k, nc, θ)  (25, 30, 6)  (25, 30, 6)  (25, 30, 6)  (25, 45, 6)  (25, 45, 6)
LOF (MinPts, η)  (27, 1.2)    (27, 1.2)    (25, 1.2)    (25, 1.2)    (27, 1.2)

k: the number of nearest neighbors; nc: the number of clusters used; θ: the threshold.

Table 3: The HTER measure of LOF and FLDS (the smaller the better) for some extracted AWID data sets.

Methods  O-AWID1  O-AWID2  O-AWID3  O-AWID4  O-AWID5
FLDS     0.13     0.12     0.10     0.11     0.06
LOF      0.23     0.11     0.11     0.09     0.09



The confusion matrix of these results is illustrated in Table 5. The total accuracy obtained is 98.9%, compared with 96.26% in [14]. We can explain the results obtained by IncrementalSSGC by the fact that the algorithm uses a distance based on shared nearest neighbors, which overcomes the limitations of traditional distance measures such as the Euclidean or Minkowski distance, and the shared-nearest-neighbors measure does not depend on the density of the data. The proposed system is generally called a hybrid method, which is one of the best strategies in developing intrusion detection systems [7, 9], since there is no single classifier that can exactly detect all kinds of classes.
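The 98.9% figure can be checked directly from the counts in Table 5: sum the correctly classified records of each class and divide by the total number of records:

```python
# Rows of Table 5: (correctly classified count, all counts in the row).
rows = {
    "normal":        (530588, [530588, 116, 6, 75]),
    "flooding":      (5544,   [2553, 5544, 0, 0]),
    "injection":     (16680,  [2, 0, 16680, 0]),
    "impersonation": (16364,  [3297, 148, 0, 16364]),
}
correct = sum(c for c, _ in rows.values())
total = sum(sum(r) for _, r in rows.values())
accuracy = 100 * correct / total
print(round(accuracy, 1))  # 98.9
```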

We also note that, for real applications, whenever an attack appears, the system needs to produce a warning immediately. The multistage system based on machine learning techniques provides users with a solution for constructing a real IDS/IPS system, which is one of the most important problems in network security.

5. Conclusion

This paper introduces an incremental semisupervised graph-based clustering algorithm and a fast outlier detection

Figure 6: Results of LOF (a) and FLDS (b) on some 2D data sets; the outliers are marked with red plus signs.

Figure 7: Running time (minutes) comparison between FLDS and LOF for four 2D data sets.

Figure 8: Running time (minutes) comparison between FLDS and LOF for five AWID data sets.


method. Both methods can be used in a hybrid framework for the intrusion detection problem on WiFi data sets (AWID). Our proposed multistage system based on machine learning techniques provides a guideline for constructing a real IDS/IPS system, which is one of the most important problems in network security. Experiments conducted on the data sets extracted from the AWID and the UCI repository show the effectiveness of our proposed methods. In the near future, we will continue to develop other kinds of machine learning methods for the intrusion detection problem and to test other experimental setups.

Data Availability

The data used to support the findings of this study can be downloaded from the AWID repository (http://icsdweb.aegean.gr/awid/download.html).

[Figure 9 flowchart: data captured from the access point is loaded and preprocessed; a J48 classifier trained on labeled data blocks known attacks and skips normal traffic; the normal-looking traffic is passed to the incremental clustering model (IncrementalSSGC), which blocks impersonation attacks; the anomaly detection model (FLDS) flags outliers, which a security expert marks as new attacks; the database of marked attacks feeds back into the training data and into the parameter settings of the incremental clustering.]

Figure 9: A new framework for intrusion detection in 802.11 networks.

Table 5: Confusion matrix for the AWID data set using J48 and IncrementalSSGC in the proposed framework.

Classification  Normal  Flooding  Impersonation  Injection
Normal          530588  116       6              75
Flooding        2553    5544      0              0
Injection       2       0         16680         0
Impersonation   3297    148       0              16364


Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

References

[1] K. P. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, Cambridge, MA, USA, 2012.

[2] B. Ahmad, W. Jian, and Z. Anwar Ali, "Role of machine learning and data mining in internet security: standing state with future directions," Journal of Computer Networks and Communications, vol. 2018, Article ID 6383145, 10 pages, 2018.

[3] T. Bakhshi and B. Ghita, "On internet traffic classification: a two-phased machine learning approach," Journal of Computer Networks and Communications, vol. 2016, Article ID 2048302, 21 pages, 2016.

[4] B. Luo and J. Xia, "A novel intrusion detection system based on feature generation with visualization strategy," Expert Systems with Applications, vol. 41, no. 9, pp. 4139–4147, 2014.

[5] W.-C. Lin, S.-W. Ke, and C.-F. Tsai, "CANN: an intrusion detection system based on combining cluster centers and nearest neighbors," Knowledge-Based Systems, vol. 78, pp. 13–21, 2015.

[6] C.-F. Tsai and C.-Y. Lin, "A triangle area based nearest neighbors approach to intrusion detection," Pattern Recognition, vol. 43, no. 1, pp. 222–229, 2010.

[7] F. Kuang, S. Zhang, Z. Jin, and W. Xu, "A novel SVM by combining kernel principal component analysis and improved chaotic particle swarm optimization for intrusion detection," Soft Computing, vol. 19, no. 5, pp. 1187–1199, 2015.

[8] M. E. Aminanto and K. Kim, "Detecting impersonation attack in WiFi networks using deep learning approach," in Information Security Applications, D. Choi and S. Guilley, Eds., vol. 10144, Springer, Berlin, Germany, 2016.

[9] A. A. Aburomman and M. B. I. Reaz, "A novel SVM-kNN-PSO ensemble method for intrusion detection system," Applied Soft Computing, vol. 38, pp. 360–372, 2016.

[10] M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander, "LOF: identifying density-based local outliers," in Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, pp. 93–104, Dallas, TX, USA, May 2000.

[11] V. Hautamaki, I. Karkkainen, and P. Franti, "Outlier detection using k-nearest neighbour graph," in Proceedings of the 17th International Conference on Pattern Recognition, pp. 430–433, Cambridge, MA, USA, August 2004.

[12] V. V. Thang and F. F. Pashchenko, "A new incremental semi-supervised graph based clustering," in Proceedings of the IEEE International Conference on Engineering and Telecommunication, Moscow, Russia, March 2018.

[13] V. V. Thang, D. V. Pantiukhin, and A. N. Nazarov, "FLDS: fast outlier detection based on local density score," in Proceedings of the IEEE International Conference on Engineering and Telecommunication, pp. 137–141, Moscow, Russia, November 2016.

[14] C. Kolias, G. Kambourakis, A. Stavrou, and S. Gritzalis, "Intrusion detection in 802.11 networks: empirical evaluation of threats and a public dataset," IEEE Communications Surveys & Tutorials, vol. 18, no. 1, pp. 184–208, 2016.

[15] C. Gupta and R. Grossman, "GenIc: a single pass generalized incremental algorithm for clustering," in Proceedings of the Fourth SIAM International Conference on Data Mining, pp. 147–153, Lake Buena Vista, FL, USA, April 2004.

[16] M. Ester, H.-P. Kriegel, J. Sander, M. Wimmer, and X. Xu, "Incremental clustering for mining in a data warehousing environment," in Proceedings of the International Conference on Very Large Data Bases, pp. 323–333, New York, NY, USA, August 1998.

[17] V. Chandrasekhar, C. Tan, M. Wu, L. Li, X. Li, and J.-H. Lim, "Incremental graph clustering for efficient retrieval from streaming egocentric video data," in Proceedings of the International Conference on Pattern Recognition, pp. 2631–2636, Stockholm, Sweden, August 2014.

[18] A. M. Bagirov, J. Ugon, and D. Webb, "Fast modified global K-means algorithm for incremental cluster construction," Pattern Recognition, vol. 44, no. 4, pp. 866–876, 2011.

[19] A. Bryant, D. E. Tamir, N. D. Rishe, and K. Abraham, "Dynamic incremental fuzzy C-means clustering," in Proceedings of the Sixth International Conference on Pervasive Patterns and Applications, Venice, Italy, May 2014.

[20] Z. Yu, P. Luo, J. You, et al., "Incremental semi-supervised clustering ensemble for high dimensional data clustering," IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 3, pp. 701–714, 2016.

[21] V. Chandola, A. Banerjee, and V. Kumar, "Anomaly detection: a survey," ACM Computing Surveys, vol. 41, no. 3, pp. 1–58, 2009.

[22] J. Tang, Z. Chen, A. Fu, and D. Cheung, "Enhancing effectiveness of outlier detections for low density patterns," in Proceedings of the Sixth Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Taipei, Taiwan, May 2002.

[23] S. Papadimitriou, H. Kitagawa, P. B. Gibbons, and C. Faloutsos, "LOCI: fast outlier detection using the local correlation integral," in Proceedings of the 19th International Conference on Data Engineering, pp. 315–326, Bangalore, India, March 2003.

[24] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, "A density-based algorithm for discovering clusters in large spatial databases with noise," in Proceedings of the Conference on Knowledge Discovery and Data Mining (KDD), pp. 226–231, Portland, OR, USA, August 1996.

[25] L. Ertoz, M. Steinbach, and V. Kumar, "Finding clusters of different sizes, shapes, and densities in noisy, high dimensional data," in Proceedings of the SIAM International Conference on Data Mining, pp. 47–58, San Francisco, CA, USA, May 2003.

[26] E. M. Knorr, R. T. Ng, and V. Tucakov, "Distance-based outliers: algorithms and applications," The International Journal on Very Large Data Bases, vol. 8, no. 3-4, pp. 237–253, 2000.

[27] S. Basu, I. Davidson, and K. L. Wagstaff, Constrained Clustering: Advances in Algorithms, Theory, and Applications, Chapman and Hall/CRC Data Mining and Knowledge Discovery Series, CRC Press, Boca Raton, FL, USA, 1st edition, 2008.

[28] A. A. Abin, "Clustering with side information: further efforts to improve efficiency," Pattern Recognition Letters, vol. 84, pp. 252–258, 2016.

[29] Y. Shi, C. Otto, and A. K. Jain, "Face clustering: representation and pairwise constraints," IEEE Transactions on Information Forensics and Security, vol. 13, no. 7, pp. 1626–1640, 2018.

[30] A. A. Abin and B. Hamid, "Active constrained fuzzy clustering: a multiple kernels learning approach," Pattern Recognition, vol. 48, no. 3, pp. 953–967, 2015.

[31] S. Xiong, J. Azimi, and X. Z. Fern, "Active learning of constraints for semi-supervised clustering," IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 1, pp. 43–54, 2014.

[32] K. Wagstaff, C. Cardie, S. Rogers, and S. Schrodl, "Constrained K-means clustering with background knowledge," in Proceedings of the International Conference on Machine Learning (ICML), pp. 577–584, Williamstown, MA, USA, June 2001.

[33] I. Davidson and S. S. Ravi, "Using instance-level constraints in agglomerative hierarchical clustering: theoretical and empirical results," Data Mining and Knowledge Discovery, vol. 18, no. 2, pp. 257–282, 2009.

[34] B. Kulis, S. Basu, I. Dhillon, and R. Mooney, "Semi-supervised graph clustering: a kernel approach," Machine Learning, vol. 74, no. 1, pp. 1–22, 2009.

[35] V.-V. Vu, "An efficient semi-supervised graph based clustering," Intelligent Data Analysis, vol. 22, no. 2, pp. 297–307, 2018.

[36] X. Wang, B. Qian, and I. Davidson, "On constrained spectral clustering and its applications," Data Mining and Knowledge Discovery, vol. 28, no. 1, pp. 1–30, 2014.

[37] D. Mavroeidis, "Accelerating spectral clustering with partial supervision," Data Mining and Knowledge Discovery, vol. 21, no. 2, pp. 241–258, 2010.

[38] L. Lelis and J. Sander, "Semi-supervised density-based clustering," in Proceedings of the IEEE International Conference on Data Mining, pp. 842–847, Miami, FL, USA, December 2009.

[39] D.-D. Le and S. Satoh, "Unsupervised face annotation by mining the web," in Proceedings of the 8th IEEE International Conference on Data Mining, pp. 383–392, Pisa, Italy, December 2008.

[40] D. Yan, L. Huang, and M. I. Jordan, "Fast approximate spectral clustering," in Proceedings of the Conference on Knowledge Discovery and Data Mining (KDD), pp. 907–916, New York, NY, USA, June 2009.

[41] C. Zhong, M. Malinen, D. Miao, and P. Franti, "A fast minimum spanning tree algorithm based on K-means," Information Sciences, vol. 295, pp. 1–17, 2015.

[42] V. Chaoji, M. Al Hasan, S. Salem, and M. J. Zaki, "SPARCL: an effective and efficient algorithm for mining arbitrary shape-based clusters," Knowledge and Information Systems, vol. 21, no. 2, pp. 201–229, 2009.

[43] A. Asuncion and D. J. Newman, UCI Machine Learning Repository, American Statistical Association, Boston, MA, USA, 2015, http://archive.ics.uci.edu/ml/index.php.

[44] G. Karypis, E.-H. Han, and V. Kumar, "Chameleon: hierarchical clustering using dynamic modeling," Computer, vol. 32, no. 8, pp. 68–75, 1999.



Page 5: MultistageSystem-BasedMachineLearningTechniquesfor IntrusionDetectioninWiFiNetworkdownloads.hindawi.com/journals/jcnc/2019/4708201.pdf · 2019. 7. 30. · Daa aiig Iee Daa cae Peceig

(ML) between two observations x isin X and y isin X meansthat x and y should be in the same cluster and the cannot-link constraint (CL) means that x and y should not be inthe same cluster With seeds a small set of labeled data(called seeds) S isin X will be provided for semisupervisedclustering algorithms In fact this side information isavailable or can be collected from users [28ndash31] We cancite here the work of semisupervised clustering forK-means [32] hierarchical clustering [33] graph-basedclustering [34 35] spectral clustering [36 37] density-based clustering [38] etc While many semisupervisedclustering algorithms are introduced to the best of ourknowledge there are no incremental semisupervisedclustering algorithms in the literature

Our new incremental clustering introduced in the nextsection is based on the work of semisupervised graph-basedclustering using seeds (SSGC) We choose the SSGC al-gorithm because SSGC algorithm has several advantagessuch as SSGC use only one parameter and SSGC can detectclusters in varied density regions of data [35] SSGC in-cludes two steps as the following description (seeAlgorithm 1)

Step 1 Given a k-nearest neighbor graph presenting a dataset X this step uses a loop in which at each step all edgeswhich have the weight less than threshold θ will be re-moved e value of θ is initialized by 0 at the first step andincremented by 1 after each step is loop will stop wheneach connected component has at most one kind of seedse main clusters are identified by propagating label ineach connected component that contains seeds

Step 2 e remaining points (graph nodes) that do notbelong to any main clusters will be divided into two kindspoints that have edges which relate to one or more clustersand other points which are isolated points In the first casepoints will be assigned to the cluster with the largest relatedweight For the isolated points we can either remove them asoutliers or label them

We note that in SSGC the weight ω(xi xj) of the edge(the similarity) between two points xi and xj in the k-nearestneighbor graph is equal to the number of points that the twopoints share as the following equation

ω xi xj1113872 1113873 NN xi( 1113857capNN xj1113872 111387311138681113868111386811138681113868

11138681113868111386811138681113868 (1)

where NN(middot) is the set of k-nearest neighbors of the specifiedpoint

SSGC is efficient when compared with the semi-supervised density-based clustering in detecting clusters forbatch data however it is not adapted for data stream or datawarehousing environment where many updates (insertiondeletion) occur

32 Incremental Graph-Based Clustering Using Seeds In thissection we propose IncrementalSSGC based on the SSGCalgorithm In the IncrementalSSGC the seeds will be used totrain a k-nearest neighbor graph to construct connected

components and identify the value of θ as in SSGC algo-rithm Like other incremental clustering algorithms twoprocedures must be developed including insertion anddeletion

Algorithm 2 shows the insertion step of Incre-mentalSSGC for a new data point xnew At first the list ofedges between xnew and the current clusters is created andall edges with weight smaller than θ will be removed If thelist is empty it is illustrated that xnew is an outlier with thecurrent situation and hence xnew will be added in a tem-porary list Lo In the case of existing edges between xnew andsome connected components we need to remove someedges until xnew connects to components with one kind oflabel Finally the label of xnew will be identified by the labelof its connected components In Step 10 xnew and its relatededges will be added to L some edges between xt and xl willalso be recalculated if xnew appears in the nearest neighborslist of xt or xl In Step 12 after some insertion steps we canexamine the points in Lo

Algorithm 3 presents the detailed steps of the deletionprocess When we want to remove a point xdel from thecurrent clusters we simply remove xdel and all edges relatedwith xdel in the graph Step 2 of the algorithm shows theupdating process In this step we need to update all edgesaffected by xdel It means that all edges between xi and xj

must be updated if xdel appears in the commune list of thenearest neighbors Finally Step 3 is simply to remove alledges that have weight less than θ

321 e Complexity Analysis Now we will analyse thecomplexity of IncrementalSSGC Given a data set with nobject we recall that the complexity of SSGC is O(k times n2) inwhich k is the number of nearest neighbors Assuming thatwe have the current clusters including n objects we willanalyse the complexity of the insertion and deletion processof IncrementalSSGC at step (n + 1) as follows

For the insertion process which aims to identify thecluster label for a new data point xnew in Step 1 to createthe list of edges between xnew and the current clusters thecomplexity is O(n times k) In Steps 2 6 and 7 the complexityis just O(k) In Step 10 some edges between xt and xl willalso be recalculated if xnew appears in the nearest neighborslist of xt or xl in fact the number of such edges is alsosmall So for the insertion of a new point the complexity isO(n times k)

For the deletion process the complexity of Step 1 isO(k) In Steps 2 and 3 the number of edges updated is thenumber of edges that received xdel as commune points andthe value of commune points depends on the data set Let qbe the average value of v deletion processes in fact q isdetermined by perfoming experiments So the complexityof a deletion process is O(q times n times k)

In summary with the analysis of the insertion anddeletion process above we can see that it is very useful fordata set that we usually need to update In the next sectionwe also present the running time of both SSGC andIncrementalSSGC for some data sets extracted from in-trusion detection problem

Journal of Computer Networks and Communications 5

33 A Fast Outlier Detection Method Given a k-nearestneighbors graph (k-NNG) the local density score LDS of avertex u isin k-NNG is defined as follows [39]

LDS(u) 1113936qisinNN(u)ω(u q)

k (2)

in whichω is calculated as in equation (1) and k is the numberof nearest neighbors used e LDS is used as an indicator of

the density of the region of a vertex u e LDS value is in theinterval of [0 kndash 1] the larger the LDS of u the denser theregion that u belongs to and vice versa So we can apply theway of LDSrsquos calculation to identify outliers To detect outlierby this method we have to use a parameter as the thresholdthe point which has LDS value smaller than the threshold canbe seen as an outlier and vice versa Similar to LOF themethod has required O(n2) of complexity

Input X number of neighbors k a set of seeds SOutput A set of detected clustersoutliersPROCESS

(1) Constructing the k-NN graph of X(2) θ 0(3) repeat(4) Constructing the connected components using the threshold θ(5) θ θ + 1(6) until the cut condition is satisfied(7) Propagating the labels to form the principal clusters(8) Constructing the final clusters

ALGORITHM 1 e algorithm SSGC [35]

Input a new data object xnew a set of current clusters C list containing edges for each point of current clusters L θ (threshold)and number of nearest neighbors (NN) kOutput label for xnewProcess

(1) Create the k-nearest neighbors list of edges (LE) between xnew and all current clusters(2) Delete all (u v) isin LE weight(u v)lt θ(3) if (LE empty) then(4) xnew is added in a temporary list Lo(5) else(6) If xnew related to two or more components with different label then(7) Delete edges in LE with ascending order of weight until xnew connecting with components with at most one kind of label(8) end if(9) Get label for xnew and its connected points (if any) by propagating(10) Update list L adding edges relating to xnew to L some edges between xt and xl will also be recalculated if xnew appears in the

nearest neighbors list of xt or xl(11) end if(12) Examine points in Lo

ALGORITHM 2 IncrementalSSGC insertion process

Input an object xdel in a component will be deleted a set of current clusters C list of edges for each point of current clusters L θ(threshold)Output the updated C the updated LProcess

(1) Delete xdel and all edges related to xdel in L(2) Update all weights (k l) isin C xdel isin NN(k)capNN(l)

(3) Delete all updated (at Step 2) (k l) isin L weight(t l)lt θ

ALGORITHM 3 IncrementalSSGC deletion process

6 Journal of Computer Networks and Communications

To reduce the running time of the method we proposea Fast outlier detection method based on Local DensityScore called FLDS e basic idea of the algorithm FLDSis to use divide-and-conquer strategy Given a data set X tofind outliers first the input data set will be split into kclusters using K-means algorithm Next k-nearestneighbor graphs will be used for each cluster and identifyoutlier on each local cluster e outliers found in allclusters will be recalculated on the whole data set e ideaof divide-and-conquer strategies by using the K-means inthe preprocessing step has been successfully applied insolving some problems such as fast spectral clusteringproblem [40] and fast minimum spanning tree problem[41] and in the efficient and effective shape-based clus-tering paper [42] e FLDS algorithm is described inAlgorithm 4

e FLDS algorithm is an outlierrsquos detection methodbased on K-means and local density score using graph ecomplexity of FLDS is O(n times k) + O(k2) + O(t times n) inwhich the value of k may be used up to n05 [41 42] t≪ n isevaluated approximately equal to k so the complexity of theFLDS is O(n15)

4 Experiment Results

is section aims to evaluate the effectiveness of our pro-posed algorithms We will show the results of the Incre-mentalSSGC the results of FLDS and the results when usingour methods for a hybrid framework for intrusion detectionproblem e IncrementalSSGC will be compared with theIncrementalDBSCAN while the FLDS will be comparedwith the LOF

e data sets used in the experiments are mostlyextracted from the Aegean WiFi Intrusion Dataset (AWID)[14] AWID is a publicly available collection of sets of data inan easily distributed format which contain real traces ofboth the normal and intrusive 80211 traffic In the AWIDmany kinds of attacks have been introduced but they alsofall into four main categories including flooding injectionand impersonation e AWID has 156 attributes we use 35attributes extracted by an artificial neural network as pre-sented in [8] We also use some supplement data sets thatcome from UCI [43] and data sets with different size shapeand density and contain noise points as well as special ar-tifacts [44] in this experiment

41 Experiment Setup

411 Data Sets for Incremental Clustering Algorithms Toshow the effectiveness of the IncrementalSSGC two aspectswill be examined including the running time and accuracy 5UCI data sets and 3 data sets extracted from AWID willbe used for testing IncrementalSSGC and Incre-mentalDBSCAN e details of these data sets are presentedin Table 1

To evaluate clustering results the Rand Index is usedGiven a data set X with n points for clustering P1 is an arraycontaining the true labels P2 is an array containing the

results of a clustering algorithm the Rand Index (RI) iscalculated as follows

RI a + b

(n(2(nminus 2))) (3)

in which ab is the number of pairs that are in the samedifferent clusters in both partitions P1 and P2 e bigger theRand Index the better the result

412 Data Sets for FLDS and LOF We used 5 data setsextracted from AWDI and four 2D data sets includingDS1 (10000 points) DS2 (8000 points) DS3 (8000 points)and DS4 (8000 points) [44] for FLDS and LOF ese 2Ddata sets have clusters of different size shape and ori-entation as well as random noise points and special ar-tifacts e details of these AWID data sets are presentedin Table 2

To compare LOF and FLDS for AWID data sets we usethe ROC measure that has two factors including FalsePositive (False Alarm) Rate (FPR) and False Negative (MissDetection) Rate (FNR) e detail of these factors is shownin the following equations

FPR FP

FP + TN (4)

FNR FN

TP + FN (5)

in which True Positive (TP) is the number of attacks cor-rectly classified as attack True Negative (TN) is the numberof normal correctly detected as normal False Positive (FP) isthe number of normal falsely classified as attacks namelyfalse alarm and False Negative (FN) is the number of attacksfalsely detected as normal

To combine FPR and FNR values we calculate the HalfTotal Error Rate (HTER) that is similar to the evaluationmethod used in [11] defined as follows

HTER FPR + FNR

2 (6)

42 Clustering Results We note that there is no incrementalsemisupervised clustering algorithm in the literature So wecompare the performance obtained by our algorithm and theIncrementalDBSCAN algorithm IncrementalDBSCAN canbe seen as the state of the art among Incremental clusteringproposed e algorithm can detect clusters with differentsize and shape with noises Because both SSGC andIncrementalSSGC produce the same results we just show theresults for IncrementalSSGC and IncrementalDBSCANeresults are shown in Figure 4

We can see from the figure that the IncrementalSSGCobtains better results compared with the Incre-mentalDBSCAN It can be explained by the fact that theIncrementalDBSCAN cannot detect clusters with differentdensities as mentioned in the paper we assumed that theparameter values Eps and MinPts of DBSCAN do notchange significantly when inserting and deleting objects

Journal of Computer Networks and Communications 7

is assumption means that the IncrementalDBSCANcannot work well with the data set having dierent den-sities In contrary to IncrementalDBSCAN the algorithmIncrementalSSGC does not depend on the density of thedata because the similarity measure used is based on sharednearest neighbors

4.2.1. Running Time Comparison. Figure 5 presents the running times of IncrementalSSGC and IncrementalDBSCAN for three AWID data sets. We can see that the running times of the two algorithms are similar, which can be explained by the fact that both use k-nearest neighbors to find the clusters at each incremental step. We also present the running time of the SSGC algorithm for reference. From this experiment, we can see the advantage of the incremental clustering algorithms.

4.3. The Results of FLDS and LOF. Table 3 presents the results obtained by FLDS and LOF for five AWID data sets. We can see that the results of FLDS are comparable with those of LOF. The parameters used for both methods are shown in Table 4. For some 2D data sets, Figure 6 presents the results obtained by FLDS and LOF. Intuitively, the outliers detected by the two methods are mostly similar, which can be explained by the fact that both evaluate whether a point is an outlier based on a local density score.

Figures 7 and 8 illustrate the running time comparison between FLDS and LOF. For the four data sets mentioned above, it can be seen from the figures that FLDS runs about 12 times faster than LOF. This is a significant improvement over LOF, explained by the fact that the complexity of FLDS is just O(n^1.5), compared with O(n^2) for LOF.

Input: a data set X with n points, the number of nearest neighbors k, the number of clusters nc, and the threshold θ
Output: the outliers of X
Process:
(1) Use K-means to split X into nc clusters
(2) Use the LDS algorithm on each separate cluster to obtain the local outliers (using the threshold θ)
(3) Recalculate the LDS values of the local outliers obtained in Step 2 across the whole data set

ALGORITHM 4: Algorithm FLDS.
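The three steps of Algorithm 4 can be sketched in code. The sketch below is only an illustration under simplifying assumptions: it uses a plain NumPy K-means, and it replaces the shared-nearest-neighbor weight ω of equation (1) with an inverse-distance similarity, so the scores differ from the paper's exact LDS values:

```python
import numpy as np

def lds_scores(points, k):
    """Mean similarity of each point to its k nearest neighbors.
    Inverse-distance similarity stands in for the shared-NN weight omega."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    knn_dist = np.sort(d, axis=1)[:, :k]       # distances to the k nearest neighbors
    return (1.0 / (1.0 + knn_dist)).mean(axis=1)

def flds(points, k, nc, theta, iters=10, seed=0):
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), nc, replace=False)].astype(float)
    for _ in range(iters):                     # Step 1: plain K-means
        labels = np.argmin(
            np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=-1), axis=1)
        for c in range(nc):
            if np.any(labels == c):
                centers[c] = points[labels == c].mean(axis=0)
    candidates = []                            # Step 2: local outliers per cluster
    for c in range(nc):
        idx = np.where(labels == c)[0]
        if len(idx) <= k:                      # tiny cluster: keep all as candidates
            candidates.extend(idx)
            continue
        local = lds_scores(points[idx], k)
        candidates.extend(idx[local < theta])
    full = lds_scores(points, k)               # Step 3: re-score on the whole set
    return sorted(i for i in set(candidates) if full[i] < theta)
```

On a toy set of two dense clusters plus one isolated point, only the isolated point survives the Step 3 re-scoring.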

Table 1: Main characteristics of the data sets used for clustering evaluation.

ID | Data | Objects (normal + impers.) | Attributes | Clusters
1 | Iris | 150 | 4 | 3
3 | Wine | 178 | 13 | 3
2 | E. coli | 336 | 8 | 8
4 | Breast | 569 | 30 | 2
5 | Yeast | 1484 | 8 | 10
6 | AWID1 | 5000 | 35 | 2
7 | AWID2 | 8000 | 35 | 2
8 | AWID3 | 12000 | 35 | 2

Table 2: Main characteristics of the data sets used for FLDS and LOF.

ID | Data | Objects | Categories
1 | O-AWID1 | 3030 | Impers., flooding, injection
3 | O-AWID2 | 5030 | Impers., normal, flooding
2 | O-AWID3 | 7040 | Flooding, normal, impers.
4 | O-AWID4 | 10040 | Normal, impers., injection, flooding
5 | O-AWID5 | 15050 | Normal, flooding, injection, and impers.

Figure 4: Clustering results (Rand Index, in %) obtained by IncrementalDBSCAN and IncrementalSSGC for the 8 data sets of Table 1, respectively.


4.4. A Framework for Intrusion Detection in 802.11 Networks. In this section, we propose a multistage system based on machine learning techniques applied to the AWID data set. The details of our system are presented in Figure 9. Three components are used for the intrusion detection task: a supervised learning model (J48, Bayes, random forest, support vector machine, neural network, etc.) trained on the labeled data set, which can be seen as the misuse detection component; an outlier detection method (LOF, FLDS, etc.), optionally used to detect new attacks over some periods of time; and, because it is very difficult to detect impersonation attacks in the AWID data sets as presented above, an incremental clustering algorithm (IncrementalDBSCAN, IncrementalSSGC, etc.) used to further find this kind of attack.
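The routing logic of the three components can be sketched as follows; the function and predicate names are our own stand-ins for illustration (the concrete components in this paper are J48, IncrementalSSGC, and FLDS/LOF):

```python
def classify_record(record, misuse_model, inc_cluster, outlier_model=None):
    """Route one 802.11 record through the multistage IDS.

    misuse_model:  supervised classifier (e.g., J48) -> attack label or "normal"
    inc_cluster:   incremental clustering (e.g., IncrementalSSGC) for impersonation
    outlier_model: optional offline outlier detector (e.g., FLDS or LOF)
    """
    label = misuse_model(record)
    if label != "normal":
        return label                      # known attack: block immediately
    if inc_cluster(record) == "impersonation":
        return "impersonation"            # hard-to-detect class, second stage
    if outlier_model is not None and outlier_model(record):
        return "suspected-new-attack"     # flagged for a security expert
    return "normal"

# Toy stand-ins for the trained components
verdict = classify_record(
    {"rssi": -30},
    misuse_model=lambda r: "normal",
    inc_cluster=lambda r: "impersonation" if r["rssi"] > -40 else "normal",
)
print(verdict)  # -> impersonation
```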

In this experiment, we use J48 for the misuse detection process and IncrementalSSGC for detecting impersonation attacks. In the outlier detection step, we propose to use FLDS or LOF, whose results have been presented in the subsection above. Because the outlier detection step can be realized offline over some periods of time, we show only the results obtained by combining J48 and IncrementalSSGC. The confusion matrix of

Figure 5: Running time (in minutes) comparison between IncrementalSSGC and IncrementalDBSCAN (with SSGC shown for reference) on AWID1, AWID2, and AWID3.

Table 4: The parameters used for the data sets.

Methods | O-AWID1 | O-AWID2 | O-AWID3 | O-AWID4 | O-AWID5
FLDS (k, nc, θ) | (25, 30, 6) | (25, 30, 6) | (25, 30, 6) | (25, 45, 6) | (25, 45, 6)
LOF (MinPts, η) | (27, 1.2) | (27, 1.2) | (25, 1.2) | (25, 1.2) | (27, 1.2)

k: the number of nearest neighbors; nc: the number of clusters used; θ: the threshold.

Table 3: The HTER measure of LOF and FLDS (the smaller the better) for some extracted AWID data sets.

Methods | O-AWID1 | O-AWID2 | O-AWID3 | O-AWID4 | O-AWID5
FLDS | 0.13 | 0.12 | 0.10 | 0.11 | 0.06
LOF | 0.23 | 0.11 | 0.11 | 0.09 | 0.09

(a) [Scatter plots of the four 2D data sets with the outliers detected by LOF; plots omitted.]


these results is illustrated in Table 5. The total accuracy obtained is 98.9%, compared with 96.26% in the paper [14]. We can explain the results obtained by IncrementalSSGC by the fact that the algorithm uses a distance based on shared nearest neighbors, which overcomes the limits of traditional distance measures such as the Euclidean or Minkowski distance, and the shared-nearest-neighbor measure does not depend on the density of the data. The proposed system is generally called a hybrid method, which is one of the best strategies in developing Intrusion Detection Systems [7, 9], in which there is no single classifier that can exactly detect all kinds of classes.

We also note that, for real applications, whenever an attack appears, the system needs to produce a warning immediately. The multistage system based on machine learning techniques provides users with a solution for constructing a real IDS/IPS system, which is one of the most important problems in network security.

5. Conclusion

This paper introduces an incremental semisupervised graph-based clustering algorithm and a fast outlier detection

(b) [Scatter plots of the four 2D data sets with the outliers detected by FLDS; plots omitted.]

Figure 6: Results of LOF (a) and FLDS (b) on some 2D data sets; the outliers are marked with red plus signs.

Figure 7: Running time (in minutes) comparison between FLDS and LOF for four 2D data sets.

Figure 8: Running time (in minutes) comparison between FLDS and LOF for five AWID data sets.


method. Both methods can be used in a hybrid framework for the intrusion detection problem on WiFi data sets (AWID). Our proposed multistage system based on machine learning techniques provides a guideline for constructing a real IDS/IPS system, which is one of the most important problems in network security. Experiments conducted on the data sets extracted from AWID and from UCI show the effectiveness of our proposed methods. In the near future, we will continue to develop other kinds of machine learning methods for the intrusion detection problem and to test them in other experimental setups.

Data Availability

The data used to support the findings of this study can be downloaded from the AWID repository (http://icsdweb.aegean.gr/awid/download.html).

[Figure 9 depicts the proposed pipeline. Traffic captured at the access point is preprocessed and passed to the classifier (J48), trained on labeled data, which blocks known attacks and forwards normal traffic to the incremental clustering model (IncrementalSSGC), which in turn blocks impersonation attacks. The remaining normal traffic feeds the anomaly detection model (FLDS), whose outliers are marked by a security expert, stored in a database of marked attacks, and used as new training data and for setting the parameters of the incremental clustering.]

Figure 9: A new framework for intrusion detection in 802.11 networks.

Table 5: Confusion matrix for the AWID data set using J48 and IncrementalSSGC in the proposed framework.

Normal | Flooding | Impersonation | Injection | Classification
530588 | 116 | 6 | 75 | Normal
2553 | 5544 | 0 | 0 | Flooding
2 | 0 | 16680 | 0 | Injection
3297 | 148 | 0 | 16364 | Impersonation
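As a check, the reported overall accuracy of 98.9% can be reproduced from Table 5 by dividing the sum of the correctly classified counts (the large entries, one per row) by the total number of records:

```python
# Table 5, rows: Normal, Flooding, Injection, Impersonation (column order as printed)
confusion = [
    [530588, 116, 6, 75],
    [2553, 5544, 0, 0],
    [2, 0, 16680, 0],
    [3297, 148, 0, 16364],
]
correct = sum(confusion[i][i] for i in range(4))   # correctly classified records
total = sum(sum(row) for row in confusion)         # all records
print(round(100 * correct / total, 1))  # -> 98.9
```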


Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

References

[1] K. P. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, Cambridge, MA, USA, 2012.

[2] B. Ahmad, W. Jian, and Z. Anwar Ali, "Role of machine learning and data mining in internet security: standing state with future directions," Journal of Computer Networks and Communications, vol. 2018, Article ID 6383145, 10 pages, 2018.

[3] T. Bakhshi and B. Ghita, "On internet traffic classification: a two-phased machine learning approach," Journal of Computer Networks and Communications, vol. 2016, Article ID 2048302, 21 pages, 2016.

[4] B. Luo and J. Xia, "A novel intrusion detection system based on feature generation with visualization strategy," Expert Systems with Applications, vol. 41, no. 9, pp. 4139–4147, 2014.

[5] W.-C. Lin, S.-W. Ke, and C.-F. Tsai, "CANN: an intrusion detection system based on combining cluster centers and nearest neighbors," Knowledge-Based Systems, vol. 78, pp. 13–21, 2015.

[6] C.-F. Tsai and C.-Y. Lin, "A triangle area based nearest neighbors approach to intrusion detection," Pattern Recognition, vol. 43, no. 1, pp. 222–229, 2010.

[7] F. Kuang, S. Zhang, Z. Jin, and W. Xu, "A novel SVM by combining kernel principal component analysis and improved chaotic particle swarm optimization for intrusion detection," Soft Computing, vol. 19, no. 5, pp. 1187–1199, 2015.

[8] M. E. Aminanto and K. Kim, "Detecting impersonation attack in WiFi networks using deep learning approach," in Information Security Applications, D. Choi and S. Guilley, Eds., vol. 10144, Springer, Berlin, Germany, 2016.

[9] A. A. Aburomman and M. B. I. Reaz, "A novel SVM-kNN-PSO ensemble method for intrusion detection system," Applied Soft Computing, vol. 38, pp. 360–372, 2016.

[10] M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander, "LOF: identifying density-based local outliers," in Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, pp. 93–104, Dallas, TX, USA, May 2000.

[11] V. Hautamaki, I. Karkkainen, and P. Franti, "Outlier detection using k-nearest neighbour graph," in Proceedings of the 17th International Conference on Pattern Recognition, pp. 430–433, Cambridge, MA, USA, August 2004.

[12] V. Thang and F. F. Pashchenko, "A new incremental semisupervised graph based clustering," in Proceedings of the IEEE International Conference on Engineering and Telecommunication, Moscow, Russia, March 2018.

[13] V. Thang, D. V. Pantiukhin, and A. N. Nazarov, "FLDS: fast outlier detection based on local density score," in Proceedings of the IEEE International Conference on Engineering and Telecommunication, pp. 137–141, Moscow, Russia, November 2016.

[14] C. Kolias, G. Kambourakis, A. Stavrou, and S. Gritzalis, "Intrusion detection in 802.11 networks: empirical evaluation of threats and a public dataset," IEEE Communications Surveys & Tutorials, vol. 18, no. 1, pp. 184–208, 2016.

[15] Ch. Gupta and R. Grossman, "GenIc: a single pass generalized incremental algorithm for clustering," in Proceedings of the Fourth SIAM International Conference on Data Mining, pp. 147–153, Lake Buena Vista, FL, USA, April 2004.

[16] M. Ester, H.-P. Kriegel, J. Sander, M. Wimmer, and X. Xu, "Incremental clustering for mining in a data warehousing environment," in Proceedings of the International Conference on Very Large Data Bases, pp. 323–333, New York, NY, USA, August 1998.

[17] V. Chandrasekhar, C. Tan, M. Wu, L. Li, X. Li, and J.-H. Lim, "Incremental graph clustering for efficient retrieval from streaming egocentric video data," in Proceedings of the International Conference on Pattern Recognition, pp. 2631–2636, Stockholm, Sweden, August 2014.

[18] A. M. Bagirov, J. Ugon, and D. Webb, "Fast modified global K-means algorithm for incremental cluster construction," Pattern Recognition, vol. 44, no. 4, pp. 866–876, 2011.

[19] A. Bryant, D. E. Tamir, N. D. Rishe, and K. Abraham, "Dynamic incremental fuzzy C-means clustering," in Proceedings of the Sixth International Conference on Pervasive Patterns and Applications, Venice, Italy, May 2014.

[20] Z. Yu, P. Luo, J. You et al., "Incremental semi-supervised clustering ensemble for high dimensional data clustering," IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 3, pp. 701–714, 2016.

[21] V. Chandola, A. Banerjee, and V. Kumar, "Anomaly detection: a survey," ACM Computing Surveys, vol. 41, no. 3, pp. 1–58, 2009.

[22] J. Tang, Z. Chen, A. Fu, and D. Cheung, "Enhancing effectiveness of outlier detections for low density patterns," in Proceedings of the Sixth Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Taipei, Taiwan, May 2002.

[23] S. Papadimitriou, H. Kitagawa, P. B. Gibbons, and C. Faloutsos, "LOCI: fast outlier detection using the local correlation integral," in Proceedings of the 19th International Conference on Data Engineering, pp. 315–326, Bangalore, India, March 2003.

[24] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, "A density-based algorithm for discovering clusters in large spatial databases with noise," in Proceedings of the Conference on Knowledge Discovery and Data Mining (KDD), pp. 226–231, Portland, OR, USA, August 1996.

[25] L. Ertoz, M. Steinbach, and V. Kumar, "Finding clusters of different sizes, shapes, and densities in noisy, high dimensional data," in Proceedings of the SIAM International Conference on Data Mining, pp. 47–58, San Francisco, CA, USA, May 2003.

[26] E. M. Knorr, R. T. Ng, and V. Tucakov, "Distance-based outliers: algorithms and applications," The International Journal on Very Large Data Bases, vol. 8, no. 3-4, pp. 237–253, 2000.

[27] S. Basu, I. Davidson, and K. L. Wagstaff, Constrained Clustering: Advances in Algorithms, Theory, and Applications, Chapman and Hall/CRC Data Mining and Knowledge Discovery Series, CRC Press, Boca Raton, FL, USA, 1st edition, 2008.

[28] A. A. Abin, "Clustering with side information: further efforts to improve efficiency," Pattern Recognition Letters, vol. 84, pp. 252–258, 2016.

[29] Y. Shi, C. Otto, and A. K. Jain, "Face clustering: representation and pairwise constraints," IEEE Transactions on Information Forensics and Security, vol. 13, no. 7, pp. 1626–1640, 2018.

[30] A. A. Abin and B. Hamid, "Active constrained fuzzy clustering: a multiple kernels learning approach," Pattern Recognition, vol. 48, no. 3, pp. 953–967, 2015.

[31] S. Xiong, J. Azimi, and X. Z. Fern, "Active learning of constraints for semi-supervised clustering," IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 1, pp. 43–54, 2014.

[32] K. Wagstaff, C. Cardie, S. Rogers, and S. Schrodl, "Constrained K-means clustering with background knowledge," in Proceedings of the International Conference on Machine Learning (ICML), pp. 577–584, Williamstown, MA, USA, June 2001.

[33] I. Davidson and S. S. Ravi, "Using instance-level constraints in agglomerative hierarchical clustering: theoretical and empirical results," Data Mining and Knowledge Discovery, vol. 18, no. 2, pp. 257–282, 2009.

[34] B. Kulis, S. Basu, I. Dhillon, and R. Mooney, "Semi-supervised graph clustering: a kernel approach," Machine Learning, vol. 74, no. 1, pp. 1–22, 2009.

[35] V.-V. Vu, "An efficient semi-supervised graph based clustering," Intelligent Data Analysis, vol. 22, no. 2, pp. 297–307, 2018.

[36] X. Wang, B. Qian, and I. Davidson, "On constrained spectral clustering and its applications," Data Mining and Knowledge Discovery, vol. 28, no. 1, pp. 1–30, 2014.

[37] D. Mavroeidis, "Accelerating spectral clustering with partial supervision," Data Mining and Knowledge Discovery, vol. 21, no. 2, pp. 241–258, 2010.

[38] L. Lelis and J. Sander, "Semi-supervised density-based clustering," in Proceedings of the IEEE International Conference on Data Mining, pp. 842–847, Miami, FL, USA, December 2009.

[39] D.-D. Le and S. Satoh, "Unsupervised face annotation by mining the web," in Proceedings of the 8th IEEE International Conference on Data Mining, pp. 383–392, Pisa, Italy, December 2008.

[40] D. Yan, L. Huang, and M. I. Jordan, "Fast approximate spectral clustering," in Proceedings of the Conference on Knowledge Discovery and Data Mining (KDD), pp. 907–916, New York, NY, USA, June 2009.

[41] C. Zhong, M. Malinen, D. Miao, and P. Franti, "A fast minimum spanning tree algorithm based on K-means," Information Sciences, vol. 295, pp. 1–17, 2015.

[42] V. Chaoji, M. Al Hasan, S. Salem, and M. J. Zaki, "SPARCL: an effective and efficient algorithm for mining arbitrary shape-based clusters," Knowledge and Information Systems, vol. 21, no. 2, pp. 201–229, 2009.

[43] A. Asuncion and D. J. Newman, UCI Machine Learning Repository, 2015, http://archive.ics.uci.edu/ml/index.php.

[44] G. Karypis, E.-H. Han, and V. Kumar, "Chameleon: hierarchical clustering using dynamic modeling," Computer, vol. 32, no. 8, pp. 68–75, 1999.



3.3. A Fast Outlier Detection Method. Given a k-nearest-neighbor graph (k-NNG), the local density score (LDS) of a vertex u ∈ k-NNG is defined as follows [39]:

LDS(u) = (Σ_{q ∈ NN(u)} ω(u, q)) / k, (2)

in which ω is calculated as in equation (1) and k is the number of nearest neighbors used. The LDS is used as an indicator of the density of the region around a vertex u. The LDS value lies in the interval [0, k − 1]: the larger the LDS of u, the denser the region that u belongs to, and vice versa. We can therefore apply the LDS calculation to identify outliers. To detect outliers by this method, we use a parameter as a threshold: a point whose LDS value is smaller than the threshold can be seen as an outlier, and vice versa. Similar to LOF, the method requires O(n^2) complexity.
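In SSGC the weight ω(u, q) counts the shared nearest neighbors of u and q, so each term of equation (2) lies in [0, k − 1]. A minimal sketch on a hand-made 3-NN list (the toy graph below is ours, not from the paper):

```python
def lds(u, knn, k):
    """Equation (2): average shared-neighbor weight of u over its k nearest neighbors."""
    omega = lambda a, b: len(set(knn[a]) & set(knn[b]))  # shared nearest neighbors
    return sum(omega(u, q) for q in knn[u]) / k

# Two tight 4-point cliques (0-3 and 4-7); point 8 sits between them
knn = {
    0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2],
    4: [5, 6, 7], 5: [4, 6, 7], 6: [4, 5, 7], 7: [4, 5, 6],
    8: [3, 4, 0],
}
print(lds(0, knn, 3), lds(8, knn, 3))  # the clique point scores much higher
```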

Input: a data set X, the number of neighbors k, and a set of seeds S
Output: a set of detected clusters/outliers
Process:
(1) Construct the k-NN graph of X
(2) θ = 0
(3) repeat
(4)   Construct the connected components using the threshold θ
(5)   θ = θ + 1
(6) until the cut condition is satisfied
(7) Propagate the labels to form the principal clusters
(8) Construct the final clusters

ALGORITHM 1: The algorithm SSGC [35].

Input: a new data object xnew; the set of current clusters C; the list L containing the edges for each point of the current clusters; the threshold θ; and the number of nearest neighbors (NN) k
Output: the label for xnew
Process:
(1) Create the list of k-nearest-neighbor edges (LE) between xnew and all current clusters
(2) Delete all (u, v) ∈ LE with weight(u, v) < θ
(3) if LE is empty then
(4)   add xnew to a temporary list Lo
(5) else
(6)   if xnew relates to two or more components with different labels then
(7)     delete edges in LE in ascending order of weight until xnew connects to components with at most one kind of label
(8)   end if
(9)   get the label for xnew and its connected points (if any) by propagating
(10)  update the list L: add the edges relating to xnew to L; some edges between xt and xl will also be recalculated if xnew appears in the nearest-neighbor list of xt or xl
(11) end if
(12) Examine the points in Lo

ALGORITHM 2: IncrementalSSGC, insertion process.

Input: an object xdel in a component to be deleted; the set of current clusters C; the list L of edges for each point of the current clusters; the threshold θ
Output: the updated C and the updated L
Process:
(1) Delete xdel and all edges related to xdel in L
(2) Update all weights (k, l) ∈ C with xdel ∈ NN(k) ∩ NN(l)
(3) Delete all edges (k, l) updated at Step 2 with weight(k, l) < θ

ALGORITHM 3: IncrementalSSGC, deletion process.


To reduce the running time of the method, we propose a fast outlier detection method based on the local density score, called FLDS. The basic idea of FLDS is to use a divide-and-conquer strategy. Given a data set X in which to find outliers, the input data set is first split into k clusters using the K-means algorithm. Next, a k-nearest-neighbor graph is built for each cluster and used to identify the outliers of each local cluster. The outliers found in all clusters are then reevaluated on the whole data set. The idea of a divide-and-conquer strategy using K-means in the preprocessing step has been successfully applied to several problems, such as fast spectral clustering [40], fast minimum spanning tree construction [41], and efficient and effective shape-based clustering [42]. The FLDS algorithm is described in Algorithm 4.

FLDS is thus an outlier detection method based on K-means and a graph-based local density score. The complexity of FLDS is O(n × k) + O(k^2) + O(t × n), in which the value of k may be taken up to n^0.5 [41, 42] and t ≪ n is approximately equal to k, so the complexity of FLDS is O(n^1.5).

4. Experiment Results

This section aims to evaluate the effectiveness of our proposed algorithms. We show the results of IncrementalSSGC, the results of FLDS, and the results obtained when using our methods in a hybrid framework for the intrusion detection problem. IncrementalSSGC is compared with IncrementalDBSCAN, while FLDS is compared with LOF.

The data sets used in the experiments are mostly extracted from the Aegean WiFi Intrusion Dataset (AWID) [14]. AWID is a publicly available collection of data sets, in an easily distributed format, that contain real traces of both normal and intrusive 802.11 traffic. Many kinds of attacks have been introduced in AWID, and they fall into four main categories, including flooding, injection, and impersonation. AWID has 156 attributes; we use the 35 attributes extracted by an artificial neural network as presented in [8]. We also use some supplementary data sets from UCI [43], as well as data sets of different sizes, shapes, and densities that contain noise points and special artifacts [44].

4.1. Experiment Setup

4.1.1. Data Sets for Incremental Clustering Algorithms. To show the effectiveness of IncrementalSSGC, two aspects are examined: the running time and the accuracy. Five UCI data sets and three data sets extracted from AWID are used for testing IncrementalSSGC and IncrementalDBSCAN. The details of these data sets are presented in Table 1.

To evaluate the clustering results, the Rand Index is used. Given a data set X with n points for clustering, let P1 be an array containing the true labels and P2 an array containing the results of a clustering algorithm; the Rand Index (RI) is calculated as follows:

RI = (a + b) / (n(n − 1)/2), (3)

in which a (respectively, b) is the number of pairs of points that are placed in the same (respectively, different) clusters in both partitions P1 and P2. The bigger the Rand Index, the better the result.
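Equation (3) counts the point pairs on which the two partitions agree; a direct O(n^2) sketch (our own helper, not the authors' code):

```python
from itertools import combinations

def rand_index(p1, p2):
    """Equation (3): fraction of point pairs grouped consistently by both partitions."""
    n = len(p1)
    agree = sum(
        (p1[i] == p1[j]) == (p2[i] == p2[j])  # same-same or different-different
        for i, j in combinations(range(n), 2)
    )
    return agree / (n * (n - 1) / 2)

# Identical groupings under renamed labels agree on every pair
print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # -> 1.0
```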

4.1.2. Data Sets for FLDS and LOF. We used five data sets extracted from AWID and four 2D data sets, DS1 (10000 points), DS2 (8000 points), DS3 (8000 points), and DS4 (8000 points) [44], for FLDS and LOF. These 2D data sets have clusters of different sizes, shapes, and orientations, as well as random noise points and special artifacts. The details of the AWID data sets are presented in Table 2.

To compare LOF and FLDS for AWID data sets we usethe ROC measure that has two factors including FalsePositive (False Alarm) Rate (FPR) and False Negative (MissDetection) Rate (FNR) e detail of these factors is shownin the following equations

FPR FP

FP + TN (4)

FNR FN

TP + FN (5)

in which True Positive (TP) is the number of attacks cor-rectly classified as attack True Negative (TN) is the numberof normal correctly detected as normal False Positive (FP) isthe number of normal falsely classified as attacks namelyfalse alarm and False Negative (FN) is the number of attacksfalsely detected as normal

To combine FPR and FNR values we calculate the HalfTotal Error Rate (HTER) that is similar to the evaluationmethod used in [11] defined as follows

HTER FPR + FNR

2 (6)

42 Clustering Results We note that there is no incrementalsemisupervised clustering algorithm in the literature So wecompare the performance obtained by our algorithm and theIncrementalDBSCAN algorithm IncrementalDBSCAN canbe seen as the state of the art among Incremental clusteringproposed e algorithm can detect clusters with differentsize and shape with noises Because both SSGC andIncrementalSSGC produce the same results we just show theresults for IncrementalSSGC and IncrementalDBSCANeresults are shown in Figure 4

We can see from the figure that the IncrementalSSGCobtains better results compared with the Incre-mentalDBSCAN It can be explained by the fact that theIncrementalDBSCAN cannot detect clusters with differentdensities as mentioned in the paper we assumed that theparameter values Eps and MinPts of DBSCAN do notchange significantly when inserting and deleting objects

Journal of Computer Networks and Communications 7

is assumption means that the IncrementalDBSCANcannot work well with the data set having dierent den-sities In contrary to IncrementalDBSCAN the algorithmIncrementalSSGC does not depend on the density of thedata because the similarity measure used is based on sharednearest neighbors

421 Running Time Comparison Figure 5 presents therunning time for IncrementalSSGC and Incre-mentalDBSCAN for three AWID data sets We can see therunning time of both algorithms is similar It can beexplained by the fact that both algorithms use k-nearestneighbor to nd clusters for each step of incremental Wealso present the running time of the SSGC algorithm forreference purpose From this experiment we can see ad-vantages of the incremental clustering algorithms

43eResults of FLDSandLOF Table 3 presents the resultsobtained by FLDS and LOF for 5 AWID data sets We cansee that the results of FLDS are comparable with the al-gorithm LOF e parameters used for both methods areshown in Table 4 For some 2D data sets Figure 6 presentsthe results obtained by FLDS and LOF Intuitively theoutliers detected by both methods are mostly similar Wecan explain the results by the fact that the strategy forevaluating a point is outlier or not based on local densityscore

Figures 7 and 8 illustrate the running time comparisonbetween FLDS and LOF With 4 data sets mentioned aboveit can be seen from the gure that the calculation time ofFLDS is about 12 times faster than the running time of LOFis is the signicant improvement compared with LOF Itcan be explained by the fact that the complexity of FLDS isjust O(n15) compared with O(n2) of LOF

Input a data set X with n points the number of nearest neighbors k number of clusters nc thetaOutput outliers of XProcess

(1) Using K-means to split X into nc clusters(2) Using LDS algorithm on each separate cluster to obtain local outliers (using the threshold theta)(3) e local outliers obtained in Step 2 will be recalculated LDSrsquos value across the data set

ALGORITHM 4 Algorithm FLDS

Table 1 Main characteristics for clustering evaluation

ID Data Normal + impers Attributes Clusters1 Iris 150 4 33 Wine 178 13 32 E coli 336 8 84 Breast 569 30 25 Yeast 1484 8 106 AWID1 5000 35 27 AWID2 8000 35 28 AWID3 12000 35 2

Table 2 Main characteristics for FLDS and LOF

ID Data Objects Categories1 O-AWID1 3030 Impers currenooding injections3 O-AWID2 5030 Impers normal currenooding2 O-AWID3 7040 Flooding normal impers4 O-AWID4 10040 Normal impers injection currenooding5 O-AWID5 15050 Normal currenooding injection and impers

IncrementalDBSCANIncrementalSSGC

0

20

40

60

80

100

Rand

Inde

x

2 3 4 5 6 7 81

Figure 4 Clustering results obtained by IncrementalDBSCAN andIncrementalSSGC for 8 data set of Table 1 respectively

8 Journal of Computer Networks and Communications

44 A Framework for Intrusion Detection in 80211 NetworksIn this section we propose a multistage system-basedmachine learning techniques applied for the AWDI dataset e detail of our system is presented in Figure 9 reecomponents are used for intrusion detection task a su-pervised learning model (J48 Bayes random forest sup-port vector machine neural network etc) trained bylabeled data set and this model can be seen as misusedetection component an outlier detection method (LOFFLDS etc) is optionally used to detect new attacks in someperiods of time additionally for the AWID data sets aspresented above it is very disectcult to detect impersonationattacks so we use an Incremental clustering algorithm(IncrementalDBSCAN IncrementalSSGC etc) for furthernding this kind of attack

In this experiment we use J48 for the misuse detectionprocess and IncrementalSSGC for the detecting impersonation

attacks In the outliers detection step we propose to use FLDSor LOF and the results have been presented in the subsectionabove Because the outliers detection step can be realized oumlinefor some periods of time we just show the results obtained bycombining J48 and IncrementalSSGCe confusionmatrix of

AWID1 AWID2 AWID3

SSGCIncrementalDBSCANIncrementalSSGC

0

20

40

60

80

100

Tim

e (m

inut

es)

Figure 5 Running time comparison between IncrementalSSGCand IncrementalDBSCAN

Table 4 e parameters used in data sets

Methods O-AWID1

O-AWID2

O-AWID3

O-AWID4

O-AWID5

FLDS (k ncθ)

(25 306)

(25 306)

(25 306)

(25 456)

(25 456)

LOF (MinPtsη) (27 12) (27 12) (25 12) (25 12) (27 12)

k the number of nearest neighbors nc number of cluster used θ thethreshold

Table 3 e HTER measure of LOF and FLDS (the smaller thebetter) for some extracted AWID data sets

Methods O-AWID1

O-AWID2

O-AWID3

O-AWID4

O-AWID5

FLDS 013 012 010 011 006LOF 023 011 011 009 009

0

50

100

150

200

250

300

350

400

450

500

50 100 150 200 250 300 350 400 450 500 550 600 650 7000

0

50

100

150

200

250

300

350

400

450

50 100 150 200 250 300 350 400 450 500 550 600 650 7000

0

50

100

150

200

250

300

350

50 100 150 200 250 300 350 400 450 500 550 600 650 7000

0

20

40

60

80

100

120

140

160

100 200 300 400 500 600 700 800 9000

(a)

Figure 6 Continued

Journal of Computer Networks and Communications 9

The confusion matrix of these results is illustrated in Table 5. The total accuracy obtained is 98.9%, compared with 96.26% in [14]. We can explain the results obtained by IncrementalSSGC by the fact that the algorithm uses a distance based on shared nearest neighbors, which overcomes the limitations of traditional distance measures such as the Euclidean or Minkowski distance, and the shared-nearest-neighbor measure does not depend on the density of the data. The proposed system is generally called a hybrid method, which is one of the best strategies in developing intrusion detection systems [7, 9]: there is no single classifier that can exactly detect all kinds of classes.

We also note that, for real applications, whenever an attack appears, the system needs to produce a warning immediately. The multistage system based on machine learning techniques provides users with a solution for constructing a real IDS/IPS system, which is one of the most important problems in network security.
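The immediate-warning path of the multistage system can be expressed as a simple stage dispatcher. This is a sketch with hypothetical interfaces: `classifier`, `clusterer`, and `outlier_detector` stand in for the trained J48, IncrementalSSGC, and FLDS/LOF components of Figure 9, and the label strings are illustrative.

```python
def inspect(record, classifier, clusterer, outlier_detector):
    """Route one preprocessed 802.11 record through the multistage IDS."""
    label = classifier(record)                 # stage 1: misuse detection (e.g., J48)
    if label != "normal":
        return ("block", label)                # known attack: warn and block at once
    if clusterer(record) == "impersonation":   # stage 2: incremental clustering
        return ("block", "impersonation")
    if outlier_detector(record):               # stage 3: outlier detection (can run offline)
        return ("mark", "possible new attack") # handed to a security expert
    return ("pass", "normal")
```

The early returns mirror the design goal above: records flagged by the supervised stage never wait for the slower clustering or outlier stages.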

5. Conclusion

This paper introduces an incremental semisupervised graph-based clustering algorithm and a fast outlier detection


Figure 6: Results of LOF (a) and FLDS (b) on some 2D data sets; the outliers are marked with red plus signs.

Figure 7: Running time comparison between FLDS and LOF for four 2D data sets (time in minutes).

Figure 8: Running time comparison between FLDS and LOF for five AWID data sets (time in minutes).


method. Both methods can be used in a hybrid framework for the intrusion detection problem on WiFi data sets (AWID). Our proposed multistage system based on machine learning techniques provides a guideline for constructing a real IDS/IPS system, which is one of the most important problems in network security. Experiments conducted on the data sets extracted from AWID and UCI show the effectiveness of our proposed methods. In the near future, we will continue to develop other kinds of machine learning methods for the intrusion detection problem and test them in other experimental setups.

Data Availability

The data used to support the findings of this study can be downloaded from the AWID repository (http://icsdweb.aegean.gr/awid/download.html).

Figure 9: A new framework for intrusion detection in 802.11 networks. Traffic captured from the Internet/access point is loaded and preprocessed, then passed through three components: a classifier (J48) trained on labeled data, which skips normal traffic and blocks known attacks; an incremental clustering model (IncrementalSSGC), whose parameters are set from the training data and which blocks "impersonation" attacks; and an anomaly detection model (FLDS), whose outliers are marked as new attacks by a security expert and stored in a database of marked attacks.

Table 5: Confusion matrix for the AWID data set using J48 and IncrementalSSGC in the proposed framework.

Normal    Flooding    Impersonation    Injection    Classification
530588    116         6                75           Normal
2553      5544        0                0            Flooding
2         0           16680            0            Injection
3297      148         0                16364        Impersonation
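As a quick check on the reported figure, the overall accuracy can be recomputed from Table 5, assuming (as the table is labeled) that the per-class counts 530588, 5544, 16680, and 16364 are the correctly classified instances:

```python
# Correctly classified counts per class (Normal, Flooding, Injection, Impersonation)
correct = [530588, 5544, 16680, 16364]

# All entries of the confusion matrix, row by row, as given in Table 5
rows = [
    [530588, 116, 6, 75],      # Normal
    [2553, 5544, 0, 0],        # Flooding
    [2, 0, 16680, 0],          # Injection
    [3297, 148, 0, 16364],     # Impersonation
]

total = sum(sum(r) for r in rows)       # 575373 classified instances in all
accuracy = sum(correct) / total
print(round(100 * accuracy, 1))         # -> 98.9
```

This reproduces the 98.9% total accuracy quoted in the text.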


Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

References

[1] K. P. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, Cambridge, MA, USA, 2012.
[2] B. Ahmad, W. Jian, and Z. Anwar Ali, "Role of machine learning and data mining in internet security: standing state with future directions," Journal of Computer Networks and Communications, vol. 2018, Article ID 6383145, 10 pages, 2018.
[3] T. Bakhshi and B. Ghita, "On internet traffic classification: a two-phased machine learning approach," Journal of Computer Networks and Communications, vol. 2016, Article ID 2048302, 21 pages, 2016.
[4] B. Luo and J. Xia, "A novel intrusion detection system based on feature generation with visualization strategy," Expert Systems with Applications, vol. 41, no. 9, pp. 4139–4147, 2014.
[5] W.-C. Lin, S.-W. Ke, and C.-F. Tsai, "CANN: an intrusion detection system based on combining cluster centers and nearest neighbors," Knowledge-Based Systems, vol. 78, pp. 13–21, 2015.
[6] C.-F. Tsai and C.-Y. Lin, "A triangle area based nearest neighbors approach to intrusion detection," Pattern Recognition, vol. 43, no. 1, pp. 222–229, 2010.
[7] F. Kuang, S. Zhang, Z. Jin, and W. Xu, "A novel SVM by combining kernel principal component analysis and improved chaotic particle swarm optimization for intrusion detection," Soft Computing, vol. 19, no. 5, pp. 1187–1199, 2015.
[8] M. E. Aminanto and K. Kim, "Detecting impersonation attack in WiFi networks using deep learning approach," in Information Security Applications, D. Choi and S. Guilley, Eds., vol. 10144, Springer, Berlin, Germany, 2016.
[9] A. A. Aburomman and M. B. I. Reaz, "A novel SVM-kNN-PSO ensemble method for intrusion detection system," Applied Soft Computing, vol. 38, pp. 360–372, 2016.
[10] M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander, "LOF: identifying density-based local outliers," in Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, pp. 93–104, Dallas, TX, USA, May 2000.
[11] V. Hautamaki, I. Karkkainen, and P. Franti, "Outlier detection using k-nearest neighbour graph," in Proceedings of the 17th International Conference on Pattern Recognition, pp. 430–433, Cambridge, MA, USA, August 2004.
[12] V. V. Thang and F. F. Pashchenko, "A new incremental semi-supervised graph based clustering," in Proceedings of the IEEE International Conference on Engineering and Telecommunication, Moscow, Russia, March 2018.
[13] V. V. Thang, D. V. Pantiukhin, and A. N. Nazarov, "FLDS: fast outlier detection based on local density score," in Proceedings of the IEEE International Conference on Engineering and Telecommunication, pp. 137–141, Moscow, Russia, November 2016.
[14] C. Kolias, G. Kambourakis, A. Stavrou, and S. Gritzalis, "Intrusion detection in 802.11 networks: empirical evaluation of threats and a public dataset," IEEE Communications Surveys & Tutorials, vol. 18, no. 1, pp. 184–208, 2016.
[15] Ch. Gupta and R. Grossman, "GenIc: a single pass generalized incremental algorithm for clustering," in Proceedings of the Fourth SIAM International Conference on Data Mining, pp. 147–153, Lake Buena Vista, FL, USA, April 2004.
[16] M. Ester, H.-P. Kriegel, J. Sander, M. Wimmer, and X. Xu, "Incremental clustering for mining in a data warehousing environment," in Proceedings of the International Conference on Very Large Data Bases, pp. 323–333, New York, NY, USA, August 1998.
[17] V. Chandrasekhar, C. Tan, M. Wu, L. Li, X. Li, and J.-H. Lim, "Incremental graph clustering for efficient retrieval from streaming egocentric video data," in Proceedings of the International Conference on Pattern Recognition, pp. 2631–2636, Stockholm, Sweden, August 2014.
[18] A. M. Bagirov, J. Ugon, and D. Webb, "Fast modified global K-means algorithm for incremental cluster construction," Pattern Recognition, vol. 44, no. 4, pp. 866–876, 2011.
[19] A. Bryant, D. E. Tamir, N. D. Rishe, and K. Abraham, "Dynamic incremental fuzzy C-means clustering," in Proceedings of the Sixth International Conference on Pervasive Patterns and Applications, Venice, Italy, May 2014.
[20] Z. Yu, P. Luo, J. You et al., "Incremental semi-supervised clustering ensemble for high dimensional data clustering," IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 3, pp. 701–714, 2016.
[21] V. Chandola, A. Banerjee, and V. Kumar, "Anomaly detection: a survey," ACM Computing Surveys, vol. 41, no. 3, pp. 1–58, 2009.
[22] J. Tang, Z. Chen, A. Fu, and D. Cheung, "Enhancing effectiveness of outlier detections for low density patterns," in Proceedings of the Sixth Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Taipei, Taiwan, May 2002.
[23] S. Papadimitriou, H. Kitagawa, P. B. Gibbons, and C. Faloutsos, "LOCI: fast outlier detection using the local correlation integral," in Proceedings of the 19th International Conference on Data Engineering, pp. 315–326, Bangalore, India, March 2003.
[24] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, "A density-based algorithm for discovering clusters in large spatial databases with noise," in Proceedings of the Conference on Knowledge Discovery and Data Mining (KDD), pp. 226–231, Portland, OR, USA, August 1996.
[25] L. Ertoz, M. Steinbach, and V. Kumar, "Finding clusters of different sizes, shapes, and densities in noisy, high dimensional data," in Proceedings of the SIAM International Conference on Data Mining, pp. 47–58, San Francisco, CA, USA, May 2003.
[26] E. M. Knorr, R. T. Ng, and V. Tucakov, "Distance-based outliers: algorithms and applications," The International Journal on Very Large Data Bases, vol. 8, no. 3-4, pp. 237–253, 2000.
[27] S. Basu, I. Davidson, and K. L. Wagstaff, Constrained Clustering: Advances in Algorithms, Theory, and Applications, Chapman and Hall/CRC Data Mining and Knowledge Discovery Series, CRC Press, Boca Raton, FL, USA, 1st edition, 2008.
[28] A. A. Abin, "Clustering with side information: further efforts to improve efficiency," Pattern Recognition Letters, vol. 84, pp. 252–258, 2016.
[29] Y. Shi, C. Otto, and A. K. Jain, "Face clustering: representation and pairwise constraints," IEEE Transactions on Information Forensics and Security, vol. 13, no. 7, pp. 1626–1640, 2018.
[30] A. A. Abin and B. Hamid, "Active constrained fuzzy clustering: a multiple kernels learning approach," Pattern Recognition, vol. 48, no. 3, pp. 953–967, 2015.
[31] S. Xiong, J. Azimi, and X. Z. Fern, "Active learning of constraints for semi-supervised clustering," IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 1, pp. 43–54, 2014.
[32] K. Wagstaff, C. Cardie, S. Rogers, and S. Schrodl, "Constrained K-means clustering with background knowledge," in Proceedings of the International Conference on Machine Learning (ICML), pp. 577–584, Williamstown, MA, USA, June 2001.
[33] I. Davidson and S. S. Ravi, "Using instance-level constraints in agglomerative hierarchical clustering: theoretical and empirical results," Data Mining and Knowledge Discovery, vol. 18, no. 2, pp. 257–282, 2009.
[34] B. Kulis, S. Basu, I. Dhillon, and R. Mooney, "Semi-supervised graph clustering: a kernel approach," Machine Learning, vol. 74, no. 1, pp. 1–22, 2009.
[35] V.-V. Vu, "An efficient semi-supervised graph based clustering," Intelligent Data Analysis, vol. 22, no. 2, pp. 297–307, 2018.
[36] X. Wang, B. Qian, and I. Davidson, "On constrained spectral clustering and its applications," Data Mining and Knowledge Discovery, vol. 28, no. 1, pp. 1–30, 2014.
[37] D. Mavroeidis, "Accelerating spectral clustering with partial supervision," Data Mining and Knowledge Discovery, vol. 21, no. 2, pp. 241–258, 2010.
[38] L. Lelis and J. Sander, "Semi-supervised density-based clustering," in Proceedings of the IEEE International Conference on Data Mining, pp. 842–847, Miami, FL, USA, December 2009.
[39] D.-D. Le and S. Satoh, "Unsupervised face annotation by mining the web," in Proceedings of the 8th IEEE International Conference on Data Mining, pp. 383–392, Pisa, Italy, December 2008.
[40] D. Yan, L. Huang, and M. I. Jordan, "Fast approximate spectral clustering," in Proceedings of the Conference on Knowledge Discovery and Data Mining (KDD), pp. 907–916, New York, NY, USA, June 2009.
[41] C. Zhong, M. Malinen, D. Miao, and P. Franti, "A fast minimum spanning tree algorithm based on K-means," Information Sciences, vol. 295, pp. 1–17, 2015.
[42] V. Chaoji, M. Al Hasan, S. Salem, and M. J. Zaki, "SPARCL: an effective and efficient algorithm for mining arbitrary shape-based clusters," Knowledge and Information Systems, vol. 21, no. 2, pp. 201–229, 2009.
[43] A. Asuncion and D. J. Newman, UCI Machine Learning Repository, American Statistical Association, Boston, MA, USA, 2015, http://archive.ics.uci.edu/ml/index.php.
[44] G. Karypis, E.-H. Han, and V. Kumar, "Chameleon: hierarchical clustering using dynamic modeling," Computer, vol. 32, no. 8, pp. 68–75, 1999.



To reduce the running time of the method, we propose a fast outlier detection method based on a local density score, called FLDS. The basic idea of the FLDS algorithm is to use a divide-and-conquer strategy. Given a data set X in which we want to find outliers, the input data set is first split into nc clusters using the K-means algorithm. Next, a k-nearest-neighbor graph is built for each cluster and used to identify the outliers of each local cluster. The outliers found in all clusters are then re-evaluated on the whole data set. The idea of a divide-and-conquer strategy that uses K-means in the preprocessing step has been successfully applied to problems such as fast spectral clustering [40], fast minimum spanning tree construction [41], and efficient shape-based clustering [42]. The FLDS algorithm is described in Algorithm 4.

The FLDS algorithm is thus an outlier detection method based on K-means and a graph-based local density score. The complexity of FLDS is O(n × k) + O(k²) + O(t × n), in which the value of k may be chosen up to n^0.5 [41, 42] and t ≪ n is approximately equal to k, so the overall complexity of FLDS is O(n^1.5).

4. Experiment Results

This section aims to evaluate the effectiveness of our proposed algorithms. We show the results of IncrementalSSGC, the results of FLDS, and the results of using our methods in a hybrid framework for the intrusion detection problem. IncrementalSSGC is compared with IncrementalDBSCAN, while FLDS is compared with LOF.

The data sets used in the experiments are mostly extracted from the Aegean WiFi Intrusion Dataset (AWID) [14]. AWID is a publicly available collection of data sets in an easily distributed format, which contain real traces of both normal and intrusive 802.11 traffic. Many kinds of attacks have been introduced in AWID, and they fall into four main categories, including flooding, injection, and impersonation. AWID has 156 attributes; we use 35 attributes extracted by an artificial neural network, as presented in [8]. We also use some supplementary data sets that come from UCI [43], together with data sets of different size, shape, and density that contain noise points as well as special artifacts [44], in this experiment.

4.1. Experiment Setup

4.1.1. Data Sets for Incremental Clustering Algorithms. To show the effectiveness of IncrementalSSGC, two aspects are examined: the running time and the accuracy. Five UCI data sets and three data sets extracted from AWID are used for testing IncrementalSSGC and IncrementalDBSCAN. The details of these data sets are presented in Table 1.

To evaluate clustering results, the Rand Index is used. Given a data set X with n points for clustering, P1 is an array containing the true labels and P2 is an array containing the results of a clustering algorithm; the Rand Index (RI) is calculated as follows:

RI = (a + b) / (n(n − 1)/2),   (3)

in which a (respectively, b) is the number of pairs of points that are in the same (respectively, different) clusters in both partitions P1 and P2. The bigger the Rand Index, the better the result.
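Equation (3) can be transcribed directly by enumerating point pairs; a plain-Python sketch (the O(n²) pair enumeration is fine for evaluation-sized label arrays):

```python
from itertools import combinations

def rand_index(p1, p2):
    """Rand Index between two label arrays of the same length."""
    n = len(p1)
    a = b = 0
    for i, j in combinations(range(n), 2):
        same1 = p1[i] == p1[j]
        same2 = p2[i] == p2[j]
        if same1 and same2:
            a += 1          # pair placed together in both partitions
        elif not same1 and not same2:
            b += 1          # pair separated in both partitions
    return (a + b) / (n * (n - 1) / 2)

print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # identical up to relabeling -> 1.0
```

Note that the index is invariant to cluster relabeling, which is why true labels and clustering output can be compared directly.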

4.1.2. Data Sets for FLDS and LOF. We used five data sets extracted from AWID and four 2D data sets, DS1 (10,000 points), DS2 (8,000 points), DS3 (8,000 points), and DS4 (8,000 points) [44], for FLDS and LOF. These 2D data sets have clusters of different size, shape, and orientation, as well as random noise points and special artifacts. The details of the AWID data sets are presented in Table 2.

To compare LOF and FLDS on the AWID data sets, we use the ROC measure, which has two factors: the False Positive (false alarm) Rate (FPR) and the False Negative (miss detection) Rate (FNR). These factors are defined in the following equations:

FPR = FP / (FP + TN),   (4)

FNR = FN / (TP + FN),   (5)

in which True Positive (TP) is the number of attacks correctly classified as attacks, True Negative (TN) is the number of normal records correctly detected as normal, False Positive (FP) is the number of normal records falsely classified as attacks (false alarms), and False Negative (FN) is the number of attacks falsely detected as normal.

To combine the FPR and FNR values, we calculate the Half Total Error Rate (HTER), similar to the evaluation method used in [11], defined as follows:

HTER = (FPR + FNR) / 2.   (6)
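A direct transcription of equations (4)–(6) (plain Python; the counts are assumed to come from a detector's confusion matrix):

```python
def half_total_error_rate(tp, tn, fp, fn):
    """HTER from raw detection counts, per equations (4)-(6)."""
    fpr = fp / (fp + tn)      # false alarm rate, equation (4)
    fnr = fn / (tp + fn)      # miss detection rate, equation (5)
    return (fpr + fnr) / 2    # equation (6)

# e.g., 5 false alarms among 100 normal records and 10 misses among
# 100 attacks give FPR = 0.05, FNR = 0.10, and HTER = 0.075
```

Averaging the two error rates keeps HTER meaningful even when normal records vastly outnumber attacks, which is typical for intrusion data.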

4.2. Clustering Results. We note that there is no incremental semisupervised clustering algorithm in the literature, so we compare the performance of our algorithm with that of IncrementalDBSCAN, which can be seen as the state of the art among the incremental clustering algorithms proposed: it can detect clusters of different size and shape in the presence of noise. Because both SSGC and IncrementalSSGC produce the same results, we show only the results for IncrementalSSGC and IncrementalDBSCAN. The results are shown in Figure 4.

We can see from the figure that IncrementalSSGC obtains better results than IncrementalDBSCAN. This can be explained by the fact that IncrementalDBSCAN cannot detect clusters with different densities; as mentioned in that paper, "we assumed that the parameter values Eps and MinPts of DBSCAN do not change significantly when inserting and deleting objects."


This assumption means that IncrementalDBSCAN cannot work well with data sets having different densities. In contrast to IncrementalDBSCAN, the IncrementalSSGC algorithm does not depend on the density of the data, because the similarity measure it uses is based on shared nearest neighbors.
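The density-independence argument can be illustrated with a minimal shared-nearest-neighbor similarity. This is an illustrative sketch, not the exact graph construction of SSGC/IncrementalSSGC (which is given in [12, 35]): here the similarity of two points is simply the overlap of their k-nearest-neighbor lists.

```python
import numpy as np

def snn_similarity(X, k):
    """Shared-nearest-neighbor similarity: size of the intersection of the
    k-NN lists of each pair of points (the point itself is excluded)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    knn = np.argsort(d, axis=1)[:, 1:k + 1]   # k nearest neighbors per point
    n = len(X)
    sim = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            shared = len(set(knn[i]) & set(knn[j]))
            sim[i, j] = sim[j, i] = shared
    return sim
```

Two points in a sparse cluster can share just as many neighbors as two points in a dense cluster, which is why an SNN-based measure is insensitive to differing cluster densities.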

4.2.1. Running Time Comparison. Figure 5 presents the running time of IncrementalSSGC and IncrementalDBSCAN on three AWID data sets. We can see that the running times of the two algorithms are similar, which can be explained by the fact that both algorithms use k-nearest neighbors to find clusters at each incremental step. We also present the running time of the SSGC algorithm for reference. From this experiment, we can see the advantages of the incremental clustering algorithms.

4.3. The Results of FLDS and LOF. Table 3 presents the results obtained by FLDS and LOF on five AWID data sets. We can see that the results of FLDS are comparable with those of the LOF algorithm. The parameters used for both methods are shown in Table 4. For some 2D data sets, Figure 6 presents the results obtained by FLDS and LOF. Intuitively, the outliers detected by both methods are mostly similar, which can be explained by the fact that both evaluate whether a point is an outlier based on a local density score.

Figures 7 and 8 illustrate the running time comparison between FLDS and LOF. For the four data sets mentioned above, it can be seen from the figure that the calculation time of FLDS is about 12 times faster than that of LOF. This is a significant improvement over LOF, explained by the fact that the complexity of FLDS is just O(n^1.5), compared with O(n^2) for LOF.

Algorithm 4: FLDS.
Input: a data set X with n points, the number of nearest neighbors k, the number of clusters nc, the threshold θ.
Output: the outliers of X.
Process:
(1) Use K-means to split X into nc clusters.
(2) Use the LDS algorithm on each separate cluster to obtain local outliers (using the threshold θ).
(3) Recalculate the LDS value of the local outliers obtained in Step 2 across the whole data set.
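Algorithm 4 can be sketched in NumPy as follows. This is a simplified sketch under stated assumptions: the local density score is approximated by the mean k-NN distance, and the threshold θ is applied relative to the median score; the paper's exact LDS definition appears in [13].

```python
import numpy as np

def knn_score(points, queries, k):
    """Mean distance from each query to its k nearest neighbors in `points`
    (column 0 of the sorted distances is the query itself and is skipped)."""
    d = np.linalg.norm(queries[:, None, :] - points[None, :, :], axis=2)
    d.sort(axis=1)
    return d[:, 1:k + 1].mean(axis=1)

def kmeans(X, nc, iters=20, seed=0):
    """Plain Lloyd's algorithm; returns a cluster label per point."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), nc, replace=False)]
    for _ in range(iters):
        labels = np.argmin(np.linalg.norm(X[:, None] - centers[None], axis=2), axis=1)
        for j in range(nc):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def flds(X, k, nc, theta):
    labels = kmeans(X, nc)                      # Step 1: split X into nc clusters
    candidates = []
    for j in range(nc):                         # Step 2: local outliers per cluster
        idx = np.where(labels == j)[0]
        if len(idx) <= k:
            continue
        s = knn_score(X[idx], X[idx], k)
        candidates.extend(idx[s > theta * np.median(s)])
    cand = np.array(candidates, dtype=int)
    if len(cand) == 0:
        return cand
    # Step 3: re-evaluate the local outliers against the whole data set
    s_global = knn_score(X, X[cand], k)
    return cand[s_global > theta * np.median(knn_score(X, X, k))]
```

With nc on the order of √n, the per-cluster distance computations stay small, which is what yields the O(n^1.5) running time noted above.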

Table 1: Main characteristics of the data sets for clustering evaluation.

ID  Data      Objects  Attributes  Clusters
1   Iris      150      4           3
3   Wine      178      13          3
2   E. coli   336      8           8
4   Breast    569      30          2
5   Yeast     1484     8           10
6   AWID1     5000     35          2
7   AWID2     8000     35          2
8   AWID3     12000    35          2

Table 2: Main characteristics of the data sets for FLDS and LOF.

ID  Data     Objects  Categories
1   O-AWID1  3030     Impersonation, flooding, injection
3   O-AWID2  5030     Impersonation, normal, flooding
2   O-AWID3  7040     Flooding, normal, impersonation
4   O-AWID4  10040    Normal, impersonation, injection, flooding
5   O-AWID5  15050    Normal, flooding, injection, and impersonation

Figure 4: Clustering results (Rand Index) obtained by IncrementalDBSCAN and IncrementalSSGC for the eight data sets of Table 1, respectively.


4.4. A Framework for Intrusion Detection in 802.11 Networks. In this section, we propose a multistage system based on machine learning techniques applied to the AWID data set. The detail of our system is presented in Figure 9. Three components are used for the intrusion detection task: a supervised learning model (J48, Bayes, random forest, support vector machine, neural network, etc.) trained on the labeled data set, which can be seen as the misuse detection component; an outlier detection method (LOF, FLDS, etc.), optionally used to detect new attacks in some periods of time; and, additionally, because it is very difficult to detect impersonation attacks in the AWID data sets, as presented above, an incremental clustering algorithm (IncrementalDBSCAN, IncrementalSSGC, etc.) for further finding this kind of attack.

In this experiment, we use J48 for the misuse detection process and IncrementalSSGC for detecting impersonation

attacks.

[36] X Wang B Qian and I Davidson ldquoOn constrained spectralclustering and its applicationsrdquo Data Mining and KnowledgeDiscovery vol 28 no 1 pp 1ndash30 2014

[37] D Mavroeidis ldquoAccelerating spectral clustering with partialsupervisionrdquo Data Mining and Knowledge Discovery vol 21no 2 pp 241ndash258 2010

[38] L Lelis and J Sander ldquoSemi-supervised density-based clus-teringrdquo in Proceeding of IEEE International Conference onData Mining pp 842ndash847 Miami FL USA December 2009

[39] D-D Le and S Satoh ldquoUnsupervised face annotation bymining the webrdquo in Proceedings of the 8th IEEE InternationalConference on Data Mining pp 383ndash392 Pisa Italy De-cember 2008

[40] D Yan L Huang and M I Jordan ldquoFast approximatespectral clusteringrdquo in Proceedings of the Conference onKnowledge Discovery and Data Mining (KDD) pp 907ndash916New York NY USA June 2009

[41] C Zhong M Malinen D Miao and P Franti ldquoA fastminimum spanning tree algorithm based on K-meansrdquo In-formation Sciences vol 295 pp 1ndash17 2015

[42] V Chaoji M Al Hasan S Salem andM J Zaki ldquoSPARCL aneffective and efficient algorithm for mining arbitrary shape-based clustersrdquo Knowledge and Information Systems vol 21no 2 pp 201ndash229 2009

[43] A Asuncion and D J Newman UCI Machine LearningRepository American Statistical Association Boston MAUSA 2015 httparchiveicsuciedumlindexphp

[44] G Karypis E-H Han and V Kumar ldquoChameleon hierar-chical clustering using dynamic modelingrdquo Computer vol 32no 8 pp 68ndash75 1999

Journal of Computer Networks and Communications 13

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 8: MultistageSystem-BasedMachineLearningTechniquesfor IntrusionDetectioninWiFiNetworkdownloads.hindawi.com/journals/jcnc/2019/4708201.pdf · 2019. 7. 30. · Daa aiig Iee Daa cae Peceig

This assumption means that IncrementalDBSCAN cannot work well with data sets having different densities. In contrast to IncrementalDBSCAN, the algorithm IncrementalSSGC does not depend on the density of the data, because the similarity measure used is based on shared nearest neighbors.
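The shared-nearest-neighbors similarity that IncrementalSSGC relies on can be illustrated with a small sketch. This is not the authors' SSGC code; the neighborhood size k and the Euclidean metric used to build the k-NN lists are assumptions for the example:

```python
import numpy as np

def snn_similarity(X, k=3):
    """Similarity of two points = number of neighbors shared by their
    k-nearest-neighbor lists (the point itself is excluded)."""
    n = len(X)
    # pairwise Euclidean distances
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)           # a point is not its own neighbor
    knn = np.argsort(d, axis=1)[:, :k]    # indices of the k nearest neighbors
    sim = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(n):
            sim[i, j] = len(set(knn[i]) & set(knn[j]))
    return sim

# Two tight groups: SNN similarity is positive inside a group, zero across
# groups, regardless of how dense each group is.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
S = snn_similarity(X, k=2)
```

Because the similarity counts shared neighbors rather than raw distances, two clusters of very different densities are treated uniformly, which is the property the text appeals to.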

4.2.1. Running Time Comparison. Figure 5 presents the running time of IncrementalSSGC and IncrementalDBSCAN for three AWID data sets. We can see that the running time of both algorithms is similar; this can be explained by the fact that both algorithms use k-nearest neighbors to find clusters at each incremental step. We also present the running time of the SSGC algorithm for reference purposes. From this experiment, we can see the advantage of the incremental clustering algorithms.

4.3. The Results of FLDS and LOF. Table 3 presents the results obtained by FLDS and LOF for 5 AWID data sets. We can see that the results of FLDS are comparable with those of the algorithm LOF. The parameters used for both methods are shown in Table 4. For some 2D data sets, Figure 6 presents the results obtained by FLDS and LOF. Intuitively, the outliers detected by both methods are mostly similar. We can explain this by the fact that both methods decide whether a point is an outlier from a local density score.

Figures 7 and 8 illustrate the running time comparison between FLDS and LOF. With the 4 data sets mentioned above, it can be seen from the figures that FLDS runs about 12 times faster than LOF. This is a significant improvement over LOF, explained by the fact that the complexity of FLDS is just O(n^1.5), compared with O(n^2) for LOF.

Input: a data set X with n points, the number of nearest neighbors k, the number of clusters nc, and the threshold θ.
Output: the outliers of X.
Process:

(1) Use K-means to split X into nc clusters.
(2) Use the LDS algorithm on each separate cluster to obtain the local outliers (using the threshold θ).
(3) Recalculate the LDS value of the local outliers obtained in Step 2 across the whole data set.

Algorithm 4: Algorithm FLDS.
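A rough sketch of the FLDS procedure above, with a simple k-NN density ratio standing in for the LDS score of [13]; the score formula, the K-means settings, and the way the threshold is applied are illustrative assumptions, not the published implementation:

```python
import numpy as np
from sklearn.cluster import KMeans

def knn_density(X, idx, k):
    """Local density of each point in X[idx]: inverse of the mean
    distance to its k nearest neighbors inside the same partition."""
    P = X[idx]
    d = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    kd = np.sort(d, axis=1)[:, :k]
    return 1.0 / (kd.mean(axis=1) + 1e-12)

def flds_sketch(X, k=5, nc=2, theta=3.0):
    """FLDS-style two-stage outlier detection (sketch):
    (1) split X into nc clusters with K-means;
    (2) inside each cluster, flag points whose density is far below the
        cluster mean (score > theta) as candidate local outliers;
    (3) re-score the candidates against the whole data set."""
    labels = KMeans(n_clusters=nc, n_init=10, random_state=0).fit_predict(X)
    candidates = []
    for c in range(nc):
        idx = np.where(labels == c)[0]
        dens = knn_density(X, idx, min(k, len(idx) - 1))
        score = dens.mean() / dens          # high score = locally sparse point
        candidates.extend(idx[score > theta])
    # Step 3: confirm candidates with a global density score.
    gdens = knn_density(X, np.arange(len(X)), k)
    gscore = gdens.mean() / gdens
    return [i for i in candidates if gscore[i] > theta]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (50, 2)),
               rng.normal(5, 0.3, (50, 2)),
               [[2.5, 2.5]]])               # one isolated point between clusters
outliers = flds_sketch(X, k=5, nc=2, theta=3.0)
```

Restricting the expensive neighbor search of Step 2 to each K-means partition is what yields the roughly O(n^1.5) behavior the paper claims, since each partition is much smaller than the full data set.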

Table 1: Main characteristics of the data sets used for clustering evaluation.

ID  Data     Objects (normal + impers.)  Attributes  Clusters
1   Iris     150                         4           3
2   E. coli  336                         8           8
3   Wine     178                         13          3
4   Breast   569                         30          2
5   Yeast    1484                        8           10
6   AWID1    5000                        35          2
7   AWID2    8000                        35          2
8   AWID3    12000                       35          2

Table 2: Main characteristics of the data sets used for FLDS and LOF.

ID  Data     Objects  Categories
1   O-AWID1  3030     impers., flooding, injection
3   O-AWID2  5030     impers., normal, flooding
2   O-AWID3  7040     flooding, normal, impers.
4   O-AWID4  10040    normal, impers., injection, flooding
5   O-AWID5  15050    normal, flooding, injection, and impers.

Figure 4: Clustering results (Rand index, %) obtained by IncrementalDBSCAN and IncrementalSSGC for the 8 data sets of Table 1, respectively.
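Figure 4 evaluates the clusterings with the Rand index, which counts the fraction of point pairs on which two partitions agree. A minimal reference implementation (not the paper's evaluation code):

```python
from itertools import combinations

def rand_index(labels_true, labels_pred):
    """Fraction of point pairs on which the two labelings agree:
    both in the same cluster, or both in different clusters."""
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)

# Perfect agreement up to renaming of cluster ids gives 1.0.
print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # → 1.0
```

Because only pair co-membership matters, the index is insensitive to how cluster labels are numbered, which is why it is a standard choice for comparing a clustering against ground truth.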

8 Journal of Computer Networks and Communications

4.4. A Framework for Intrusion Detection in 802.11 Networks. In this section, we propose a multistage system based on machine learning techniques applied to the AWID data set. The detail of our system is presented in Figure 9. Three components are used for the intrusion detection task: a supervised learning model (J48, Bayes, random forest, support vector machine, neural network, etc.) trained on the labeled data set, which can be seen as the misuse detection component; an outlier detection method (LOF, FLDS, etc.), optionally used to detect new attacks in some periods of time; and, because for the AWID data sets it is very difficult to detect impersonation attacks, as presented above, an incremental clustering algorithm (IncrementalDBSCAN, IncrementalSSGC, etc.) for further finding this kind of attack.

In this experiment, we use J48 for the misuse detection process and IncrementalSSGC for detecting impersonation

attacks. In the outlier detection step, we propose to use FLDS or LOF, and the results have been presented in the subsection above. Because the outlier detection step can be performed offline over some periods of time, we just show the results obtained by combining J48 and IncrementalSSGC. The confusion matrix of
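The three-stage flow (misuse classifier, incremental clustering for impersonation, offline outlier detection) can be sketched as follows. scikit-learn's DecisionTreeClassifier and LocalOutlierFactor stand in for J48 and FLDS/LOF, the incremental-clustering stage is reduced to a placeholder re-check of records the classifier calls normal, and all names here are illustrative rather than the authors' implementation:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier       # stand-in for J48
from sklearn.neighbors import LocalOutlierFactor      # stand-in for FLDS/LOF

class MultistageIDS:
    def __init__(self):
        self.misuse = DecisionTreeClassifier(random_state=0)
        self.outlier = LocalOutlierFactor(n_neighbors=5, novelty=True)

    def fit(self, X_labeled, y):
        # Stage 1: misuse detection trained on labeled records.
        self.misuse.fit(X_labeled, y)
        # Stage 3 (offline): outlier model trained on normal traffic only.
        self.outlier.fit(X_labeled[y == "normal"])
        return self

    def predict(self, X):
        out = self.misuse.predict(X).astype(object)
        # Stage 2 placeholder: records the classifier calls normal are
        # re-checked; the real system would run IncrementalSSGC here.
        normal = out == "normal"
        if normal.any():
            flags = self.outlier.predict(X[normal])   # -1 = outlier
            out[normal] = np.where(flags == -1, "suspicious", "normal")
        return out

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (100, 3)), rng.normal(6, 1, (20, 3))])
y = np.array(["normal"] * 100 + ["flooding"] * 20)
ids = MultistageIDS().fit(X, y)
far_point = np.array([[12.0, 12.0, 12.0]])
```

The design point is that each stage only sees what the previous stage could not settle: known attacks are blocked by the classifier, and only the residual "normal" stream is passed to the anomaly and clustering stages.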

Figure 5: Running time (minutes) comparison between SSGC, IncrementalDBSCAN, and IncrementalSSGC on the AWID1, AWID2, and AWID3 data sets.

Table 4: The parameters used for the data sets.

Methods          O-AWID1      O-AWID2      O-AWID3      O-AWID4      O-AWID5
FLDS (k, nc, θ)  (25, 30, 6)  (25, 30, 6)  (25, 30, 6)  (25, 45, 6)  (25, 45, 6)
LOF (MinPts, η)  (27, 1.2)    (27, 1.2)    (25, 1.2)    (25, 1.2)    (27, 1.2)

k: the number of nearest neighbors; nc: the number of clusters used; θ: the threshold.

Table 3: The HTER measure of LOF and FLDS (the smaller, the better) for some extracted AWID data sets.

Methods  O-AWID1  O-AWID2  O-AWID3  O-AWID4  O-AWID5
FLDS     0.13     0.12     0.10     0.11     0.06
LOF      0.23     0.11     0.11     0.09     0.09
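HTER (half total error rate) in Table 3 averages the false-alarm rate on normal records and the miss rate on attacks, so it stays meaningful when normal traffic vastly outnumbers attacks. A minimal computation, assuming a binary normal-vs-outlier labeling:

```python
def hter(y_true, y_pred):
    """Half total error rate: mean of the false-positive rate on normal
    records and the false-negative rate on attack records.
    Labels: 0 = normal, 1 = attack/outlier."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    n_normal = sum(1 for t in y_true if t == 0)
    n_attack = sum(1 for t in y_true if t == 1)
    return 0.5 * (fp / n_normal + fn / n_attack)

# 1 false alarm among 8 normals, 1 missed attack among 2 attacks.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
print(hter(y_true, y_pred))  # → 0.3125
```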

(a)

Figure 6: Continued.


these results is illustrated in Table 5: the total accuracy obtained is 98.9%, compared with 96.26% in the paper [14]. We can explain the results obtained by IncrementalSSGC by the fact that the algorithm uses a distance based on shared nearest neighbors, which overcomes the limits of traditional distance measures such as the Euclidean or Minkowski distance; moreover, the shared-nearest-neighbors measure does not depend on the density of the data. The proposed system is generally called a hybrid method, which is one of the best strategies in developing intrusion detection systems [7, 9], since there is no single classifier that can exactly detect all kinds of classes.

We also note that, for real applications, whenever an attack appears, the system needs to immediately produce a warning. The multistage system based on machine learning techniques provides users with a solution for constructing a real IDS/IPS system, which is one of the most important problems in network security.

5. Conclusion

This paper introduces an incremental semisupervised graph-based clustering algorithm and a fast outlier detection

(b)

Figure 6: Results of LOF (a) and FLDS (b) on some 2D data sets; the outliers are marked with red plus signs.

Figure 7: Running time (minutes) comparison between FLDS and LOF for four 2D data sets.

Figure 8: Running time (minutes) comparison between FLDS and LOF for five AWID data sets.


method. Both methods can be used in a hybrid framework for the intrusion detection problem on WiFi data sets (AWID). Our proposed multistage system based on machine learning techniques provides a guideline for constructing a real IDS/IPS system, which is one of the most important problems in network security. Experiments conducted on the data sets extracted from AWID and on UCI data sets show the effectiveness of our proposed methods. In the near future, we will continue to develop other kinds of machine learning methods for the intrusion detection problem and test them in other experimental setups.

Data Availability

The data used to support the findings of this study can be downloaded from the AWID repository (http://icsdweb.aegean.gr/awid/download.html).

Figure 9: A new framework for intrusion detection in 802.11 networks. Traffic captured at the Internet access point is preprocessed and passed to the classifier (J48), trained on labeled data, which blocks known attacks and forwards normal traffic; an incremental clustering model (ISSGC), whose parameters are set from the training data, blocks impersonation attacks; and an anomaly detection model (FLDS) flags outliers, which a security expert marks as new attacks and stores in the database of marked attacks.

Table 5: Confusion matrix for the AWID data set using J48 and IncrementalSSGC in the proposed framework.

Normal   Flooding  Impersonation  Injection  Classification
530588   116       6              75         Normal
2553     5544      0              0          Flooding
2        0         16680          0          Injection
3297     148       0              16364      Impersonation
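The 98.9% total accuracy can be reproduced from Table 5 by dividing the correctly classified counts by the total number of records; we assume here that the per-class correct counts are the 530588, 5544, 16680, and 16364 entries:

```python
# Rows of Table 5 (counts per true class); the assumed correct-class
# entries are 530588 (normal), 5544 (flooding), 16680 (injection),
# and 16364 (impersonation).
rows = {
    "normal":        [530588, 116, 6, 75],
    "flooding":      [2553, 5544, 0, 0],
    "injection":     [2, 0, 16680, 0],
    "impersonation": [3297, 148, 0, 16364],
}
correct = 530588 + 5544 + 16680 + 16364
total = sum(sum(r) for r in rows.values())
accuracy = correct / total
print(f"{accuracy:.1%}")  # → 98.9%
```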


Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

References

[1] K. P. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, Cambridge, MA, USA, 2012.

[2] B. Ahmad, W. Jian, and Z. Anwar Ali, "Role of machine learning and data mining in internet security: standing state with future directions," Journal of Computer Networks and Communications, vol. 2018, Article ID 6383145, 10 pages, 2018.

[3] T. Bakhshi and B. Ghita, "On internet traffic classification: a two-phased machine learning approach," Journal of Computer Networks and Communications, vol. 2016, Article ID 2048302, 21 pages, 2016.

[4] B. Luo and J. Xia, "A novel intrusion detection system based on feature generation with visualization strategy," Expert Systems with Applications, vol. 41, no. 9, pp. 4139–4147, 2014.

[5] W.-C. Lin, S.-W. Ke, and C.-F. Tsai, "CANN: an intrusion detection system based on combining cluster centers and nearest neighbors," Knowledge-Based Systems, vol. 78, pp. 13–21, 2015.

[6] C.-F. Tsai and C.-Y. Lin, "A triangle area based nearest neighbors approach to intrusion detection," Pattern Recognition, vol. 43, no. 1, pp. 222–229, 2010.

[7] F. Kuang, S. Zhang, Z. Jin, and W. Xu, "A novel SVM by combining kernel principal component analysis and improved chaotic particle swarm optimization for intrusion detection," Soft Computing, vol. 19, no. 5, pp. 1187–1199, 2015.

[8] M. E. Aminanto and K. Kim, "Detecting impersonation attack in WiFi networks using deep learning approach," in Information Security Applications, D. Choi and S. Guilley, Eds., vol. 10144, Springer, Berlin, Germany, 2016.

[9] A. A. Aburomman and M. B. I. Reaz, "A novel SVM-kNN-PSO ensemble method for intrusion detection system," Applied Soft Computing, vol. 38, pp. 360–372, 2016.

[10] M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander, "LOF: identifying density-based local outliers," in Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, pp. 93–104, Dallas, TX, USA, May 2000.

[11] V. Hautamaki, I. Karkkainen, and P. Franti, "Outlier detection using k-nearest neighbour graph," in Proceedings of the 17th International Conference on Pattern Recognition, pp. 430–433, Cambridge, MA, USA, August 2004.

[12] V. Thang and F. F. Pashchenko, "A new incremental semi-supervised graph based clustering," in Proceedings of the IEEE International Conference on Engineering and Telecommunication, Moscow, Russia, March 2018.

[13] V. Thang, D. V. Pantiukhin, and A. N. Nazarov, "FLDS: fast outlier detection based on local density score," in Proceedings of the IEEE International Conference on Engineering and Telecommunication, pp. 137–141, Moscow, Russia, November 2016.

[14] C. Kolias, G. Kambourakis, A. Stavrou, and S. Gritzalis, "Intrusion detection in 802.11 networks: empirical evaluation of threats and a public dataset," IEEE Communications Surveys & Tutorials, vol. 18, no. 1, pp. 184–208, 2016.

[15] C. Gupta and R. Grossman, "GenIc: a single pass generalized incremental algorithm for clustering," in Proceedings of the Fourth SIAM International Conference on Data Mining, pp. 147–153, Lake Buena Vista, FL, USA, April 2004.

[16] M. Ester, H.-P. Kriegel, J. Sander, M. Wimmer, and X. Xu, "Incremental clustering for mining in a data warehousing environment," in Proceedings of the International Conference on Very Large Data Bases, pp. 323–333, New York, NY, USA, August 1998.

[17] V. Chandrasekhar, C. Tan, M. Wu, L. Li, X. Li, and J.-H. Lim, "Incremental graph clustering for efficient retrieval from streaming egocentric video data," in Proceedings of the International Conference on Pattern Recognition, pp. 2631–2636, Stockholm, Sweden, August 2014.

[18] A. M. Bagirov, J. Ugon, and D. Webb, "Fast modified global K-means algorithm for incremental cluster construction," Pattern Recognition, vol. 44, no. 4, pp. 866–876, 2011.

[19] A. Bryant, D. E. Tamir, N. D. Rishe, and K. Abraham, "Dynamic incremental fuzzy C-means clustering," in Proceedings of the Sixth International Conference on Pervasive Patterns and Applications, Venice, Italy, May 2014.

[20] Z. Yu, P. Luo, J. You et al., "Incremental semi-supervised clustering ensemble for high dimensional data clustering," IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 3, pp. 701–714, 2016.

[21] V. Chandola, A. Banerjee, and V. Kumar, "Anomaly detection: a survey," ACM Computing Surveys, vol. 41, no. 3, pp. 1–58, 2009.

[22] J. Tang, Z. Chen, A. Fu, and D. Cheung, "Enhancing effectiveness of outlier detections for low density patterns," in Proceedings of the Sixth Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Taipei, Taiwan, May 2002.

[23] S. Papadimitriou, H. Kitagawa, P. B. Gibbons, and C. Faloutsos, "LOCI: fast outlier detection using the local correlation integral," in Proceedings of the 19th International Conference on Data Engineering, pp. 315–326, Bangalore, India, March 2003.

[24] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, "A density-based algorithm for discovering clusters in large spatial databases with noise," in Proceedings of the Conference on Knowledge Discovery and Data Mining (KDD), pp. 226–231, Portland, OR, USA, August 1996.

[25] L. Ertoz, M. Steinbach, and V. Kumar, "Finding clusters of different sizes, shapes, and densities in noisy, high dimensional data," in Proceedings of the SIAM International Conference on Data Mining, pp. 47–58, San Francisco, CA, USA, May 2003.

[26] E. M. Knorr, R. T. Ng, and V. Tucakov, "Distance-based outliers: algorithms and applications," The International Journal on Very Large Data Bases, vol. 8, no. 3-4, pp. 237–253, 2000.

[27] S. Basu, I. Davidson, and K. L. Wagstaff, Constrained Clustering: Advances in Algorithms, Theory, and Applications, Chapman and Hall/CRC Data Mining and Knowledge Discovery Series, CRC Press, Boca Raton, FL, USA, 1st edition, 2008.

[28] A. A. Abin, "Clustering with side information: further efforts to improve efficiency," Pattern Recognition Letters, vol. 84, pp. 252–258, 2016.

[29] Y. Shi, C. Otto, and A. K. Jain, "Face clustering: representation and pairwise constraints," IEEE Transactions on Information Forensics and Security, vol. 13, no. 7, pp. 1626–1640, 2018.

[30] A. A. Abin and B. Hamid, "Active constrained fuzzy clustering: a multiple kernels learning approach," Pattern Recognition, vol. 48, no. 3, pp. 953–967, 2015.

[31] S. Xiong, J. Azimi, and X. Z. Fern, "Active learning of constraints for semi-supervised clustering," IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 1, pp. 43–54, 2014.

[32] K. Wagstaff, C. Cardie, S. Rogers, and S. Schrodl, "Constrained K-means clustering with background knowledge," in Proceedings of the International Conference on Machine Learning (ICML), pp. 577–584, Williamstown, MA, USA, June 2001.

[33] I. Davidson and S. S. Ravi, "Using instance-level constraints in agglomerative hierarchical clustering: theoretical and empirical results," Data Mining and Knowledge Discovery, vol. 18, no. 2, pp. 257–282, 2009.

[34] B. Kulis, S. Basu, I. Dhillon, and R. Mooney, "Semi-supervised graph clustering: a kernel approach," Machine Learning, vol. 74, no. 1, pp. 1–22, 2009.

[35] V.-V. Vu, "An efficient semi-supervised graph based clustering," Intelligent Data Analysis, vol. 22, no. 2, pp. 297–307, 2018.

[36] X. Wang, B. Qian, and I. Davidson, "On constrained spectral clustering and its applications," Data Mining and Knowledge Discovery, vol. 28, no. 1, pp. 1–30, 2014.

[37] D. Mavroeidis, "Accelerating spectral clustering with partial supervision," Data Mining and Knowledge Discovery, vol. 21, no. 2, pp. 241–258, 2010.

[38] L. Lelis and J. Sander, "Semi-supervised density-based clustering," in Proceedings of the IEEE International Conference on Data Mining, pp. 842–847, Miami, FL, USA, December 2009.

[39] D.-D. Le and S. Satoh, "Unsupervised face annotation by mining the web," in Proceedings of the 8th IEEE International Conference on Data Mining, pp. 383–392, Pisa, Italy, December 2008.

[40] D. Yan, L. Huang, and M. I. Jordan, "Fast approximate spectral clustering," in Proceedings of the Conference on Knowledge Discovery and Data Mining (KDD), pp. 907–916, New York, NY, USA, June 2009.

[41] C. Zhong, M. Malinen, D. Miao, and P. Franti, "A fast minimum spanning tree algorithm based on K-means," Information Sciences, vol. 295, pp. 1–17, 2015.

[42] V. Chaoji, M. Al Hasan, S. Salem, and M. J. Zaki, "SPARCL: an effective and efficient algorithm for mining arbitrary shape-based clusters," Knowledge and Information Systems, vol. 21, no. 2, pp. 201–229, 2009.

[43] A. Asuncion and D. J. Newman, UCI Machine Learning Repository, American Statistical Association, Boston, MA, USA, 2015, http://archive.ics.uci.edu/ml/index.php.

[44] G. Karypis, E.-H. Han, and V. Kumar, "Chameleon: hierarchical clustering using dynamic modeling," Computer, vol. 32, no. 8, pp. 68–75, 1999.


[19] A Bryant D E Tamir N D Rishe and K AbrahamldquoDynamic incremental fuzzy C-means clusteringrdquo in Pro-ceedings of the Sixth International Conference on PervasivePatterns and Applications Venice Italy May 2014

[20] Z Yu P Luo J You et al ldquoIncremental semi-supervisedclustering ensemble for high dimensional data clusteringrdquoIEEE Transactions on Knowledge and Data Engineeringvol 28 no 3 pp 701ndash714 2016

[21] V Chandola A Banerjee and V Kumar ldquoAnomaly de-tection a surveyrdquo ACM Computing Surveys vol 41 no 3pp 1ndash58 2009

[22] J Tang Z Chen A Fu and D Cheung ldquoEnhancing effec-tiveness of outlier detections for low density patternsrdquo inProceedings of the Sixth Pacific-Asia Conference on KnowledgeDiscovery and Data Mining (PAKDD) Taipei Taiwan May2002

[23] S Papadimitriou H Kitagawa P B Gibbons andC Faloutsos ldquoLoci fast outlier detection using the localcorrelation integralrdquo in Proceedings of the 19th InternationalConference on Data Engineering pp 315ndash326 BangaloreIndia March 2003

[24] M Ester H-P Kriegel J Sander and X Xu ldquoA density-basedalgorithm for discovering clusters in large spatial databaseswith noiserdquo in Proceedings of the Conference on KnowledgeDiscovery and Data Mining (KDD) pp 226ndash231 PortlandOR USA August 1996

[25] L Ertoz M Steinbach and V Kumar ldquoFinding clusters ofdifferent sizes shapes and densities in noisy high di-mensional datardquo in Proceedings of the SIAM InternationalConference on Data Mining pp 47ndash58 San Francisco CAUSA May 2003

[26] E M Knorr R T Ng and V Tucakov ldquoDistance-basedoutliers algorithms and applicationsrdquo e InternationalJournal on Very Large Data Bases vol 8 no 3-4 pp 237ndash2532000

[27] S Basu I Davidson and K L Wagstaff ldquoConstrainedclustering advances in algorithms theory and applicationsrdquoin Chapman and HallCRC Data Mining and KnowledgeDiscovery Series CRC Press Boca Raton FL USA 1stedition 2008

[28] A A Abin ldquoClustering with side information further effortsto improve efficiencyrdquo Pattern Recognition Letters vol 84pp 252ndash258 2016

[29] Y Shi C Otto and A K Jain ldquoFace clustering representationand pairwise constraintsrdquo IEEE Transactions on InformationForensics and Security vol 13 no 7 pp 1626ndash1640 2018

[30] A A Abin and B Hamid ldquoActive constrained fuzzy clus-tering a multiple kernels learning approachrdquo Pattern Rec-ognition vol 48 no 3 pp 953ndash967 2015

[31] S Xiong J Azimi and X Z Fern ldquoActive learning of con-straints for semi-supervised clusteringrdquo IEEE Transactions on

12 Journal of Computer Networks and Communications

Knowledge and Data Engineering vol 26 no 1 pp 43ndash542014

[32] K Wagstaff C Cardie S Rogers and S Schrodl ldquoCon-strained K-means clustering with background knowledgerdquo inProceedings of the International Conference on MachineLearning (ICML) pp 577ndash584 Williamstown MA USAJune 2001

[33] I Davidson and S S Ravi ldquoUsing instance-level constraints inagglomerative hierarchical clustering theoretical and em-pirical resultsrdquoDataMining and Knowledge Discovery vol 18no 2 pp 257ndash282 2009

[34] B Kulis S Basu I Dhillon and R Mooney ldquoSemi-supervisedgraph clustering a kernel approachrdquo Machine Learningvol 74 no 1 pp 1ndash22 2009

[35] V-V Vu ldquoAn efficient semi-supervised graph based clus-teringrdquo Intelligent Data Analysis vol 22 no 2 pp 297ndash3072018

[36] X Wang B Qian and I Davidson ldquoOn constrained spectralclustering and its applicationsrdquo Data Mining and KnowledgeDiscovery vol 28 no 1 pp 1ndash30 2014

[37] D Mavroeidis ldquoAccelerating spectral clustering with partialsupervisionrdquo Data Mining and Knowledge Discovery vol 21no 2 pp 241ndash258 2010

[38] L Lelis and J Sander ldquoSemi-supervised density-based clus-teringrdquo in Proceeding of IEEE International Conference onData Mining pp 842ndash847 Miami FL USA December 2009

[39] D-D Le and S Satoh ldquoUnsupervised face annotation bymining the webrdquo in Proceedings of the 8th IEEE InternationalConference on Data Mining pp 383ndash392 Pisa Italy De-cember 2008

[40] D Yan L Huang and M I Jordan ldquoFast approximatespectral clusteringrdquo in Proceedings of the Conference onKnowledge Discovery and Data Mining (KDD) pp 907ndash916New York NY USA June 2009

[41] C Zhong M Malinen D Miao and P Franti ldquoA fastminimum spanning tree algorithm based on K-meansrdquo In-formation Sciences vol 295 pp 1ndash17 2015

[42] V Chaoji M Al Hasan S Salem andM J Zaki ldquoSPARCL aneffective and efficient algorithm for mining arbitrary shape-based clustersrdquo Knowledge and Information Systems vol 21no 2 pp 201ndash229 2009

[43] A Asuncion and D J Newman UCI Machine LearningRepository American Statistical Association Boston MAUSA 2015 httparchiveicsuciedumlindexphp

[44] G Karypis E-H Han and V Kumar ldquoChameleon hierar-chical clustering using dynamic modelingrdquo Computer vol 32no 8 pp 68ndash75 1999


These results are illustrated in Table 5. The total accuracy obtained is 98.9%, compared with 96.26% in the paper [14]. We can explain the results obtained by IncrementalSSGC by the fact that the algorithm uses a distance based on shared nearest neighbors, which overcomes the limitations of traditional distance measures such as the Euclidean or Minkowski distance; moreover, the shared-nearest-neighbors measure does not depend on the density of the data. The proposed system is generally called a hybrid method, which is one of the best strategies for developing intrusion detection systems [7, 9], since no single classifier can exactly detect all kinds of classes.
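The shared-nearest-neighbors measure can be illustrated with a short sketch (our own minimal example, not the authors' implementation; the function name, the toy data, and the choice of k are illustrative):

```python
import numpy as np

def snn_similarity(X, k=3):
    """Shared-nearest-neighbor similarity: for each pair of points,
    the number of points they have in common among their k nearest
    neighbors (self excluded)."""
    # pairwise Euclidean distance matrix
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # indices of the k nearest neighbors of each point (column 0 is the point itself)
    nn = np.argsort(d, axis=1)[:, 1:k + 1]
    n = len(X)
    sim = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(n):
            sim[i, j] = len(set(nn[i]) & set(nn[j]))
    return sim

# two well-separated groups of four points each
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
S = snn_similarity(X, k=3)
# points in the same group share neighbors; points across groups share none
```

Unlike a raw Euclidean threshold, this similarity stays meaningful when clusters have very different densities, which is the property the text above relies on.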

We also note that, for real applications, whenever an attack appears, the system needs to produce a warning immediately. The multistage system based on machine learning techniques provides a solution for users to construct a real IDS/IPS system, which is one of the most important problems in network security.

[Figure 6: Results of LOF (a) and FLDS (b) on some 2D data sets; the outliers are marked with a red plus.]

[Figure 7: Running time comparison between FLDS and LOF for four 2D data sets.]

[Figure 8: Running time comparison between FLDS and LOF for five AWID data sets.]

5. Conclusion

This paper introduces an incremental semisupervised graph-based clustering algorithm and a fast outlier detection method. Both methods can be used in a hybrid framework for the intrusion detection problem on WiFi data sets (AWID). Our proposed multistage system based on machine learning techniques provides a guideline for constructing a real IDS/IPS system, which is one of the most important problems in network security. Experiments conducted on the data sets extracted from AWID and UCI show the effectiveness of our proposed methods. In the near future, we will continue to develop other kinds of machine learning methods for the intrusion detection problem and to test them in other experimental setups.
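The FLDS idea summarized in the paper (a K-means partition with roughly sqrt(n) clusters, followed by a local density score computed inside each cluster) can be sketched as follows; the function name, the scoring rule, the deterministic initialization, and the treatment of tiny clusters are our own illustrative assumptions, not the authors' exact algorithm:

```python
import numpy as np

def flds_sketch(X, n_neighbors=3, n_iter=10):
    """Illustrative outlier scoring: partition with K-means (k ~ sqrt(n)),
    then score each point by its mean distance to its nearest neighbors
    *within its own cluster* (higher score = more outlying)."""
    n = len(X)
    k = max(1, int(np.sqrt(n)))
    # --- tiny K-means (Lloyd's algorithm), deterministic init on first k points ---
    centers = X[:k].copy()
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        labels = d.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    # --- local density score within each cluster ---
    scores = np.zeros(n)
    for c in range(k):
        idx = np.where(labels == c)[0]
        m = min(n_neighbors, len(idx) - 1)
        if m <= 0:
            scores[idx] = np.inf  # a (near-)singleton cluster is itself suspicious
            continue
        dc = np.linalg.norm(X[idx, None, :] - X[None, idx, :], axis=-1)
        # mean distance to the m nearest same-cluster neighbors (column 0 is self)
        scores[idx] = np.sort(dc, axis=1)[:, 1:m + 1].mean(axis=1)
    return scores

# eight points: two tight groups plus one isolated point at (50, 50)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [50, 50]], dtype=float)
scores = flds_sketch(X)  # the isolated point gets the highest score
```

With k of order sqrt(n), each cluster holds about sqrt(n) points, so the per-cluster pairwise step costs about n work per cluster and roughly n^1.5 overall, matching the stated O(n^1.5) complexity of FLDS.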

Data Availability

The data used to support the findings of this study can be downloaded from the AWID repository (http://icsdweb.aegean.gr/awid/download.html).

[Figure 9: A new framework for intrusion detection in 802.11 networks (block diagram). Labeled components: Internet/access point, data capture, load data, data preprocessing, training by J48, classifier (J48), incremental clustering model (ISSGC), anomaly detection model (FLDS), marking of a new attack by a security expert, database of marked attacks, and setting of the parameters of the incremental clustering. Labeled flows: "skip normal traffics", "block attacks", "impersonation attacks", "outliers", and "marked attacks".]
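Read as a decision flow, Figure 9 can be sketched schematically as follows (with stub components; the function names and the exact routing are our reading of the diagram, not the authors' code):

```python
def multistage_ids(record, classifier, clusterer, outlier_detector):
    """Stage 1: a J48-style classifier separates normal traffic from
    known attacks. Stages 2-3: traffic flagged as impersonation goes
    through incremental clustering (ISSGC) and outlier detection (FLDS);
    outliers are sent to a security expert for marking."""
    label = classifier.predict(record)
    if label == "normal":
        return "skip"                    # skip normal traffics
    if label != "impersonation":
        return "block"                   # block known attacks
    clusterer.assign(record)             # update the incremental clustering model
    if outlier_detector.is_outlier(record):
        return "send_to_expert"          # candidate new attack, to be marked
    return "block"                       # known impersonation attack

# minimal stubs to exercise the flow (illustrative only)
class Stub:
    def __init__(self, label=None):
        self.label = label
    def predict(self, r):
        return self.label
    def assign(self, r):
        pass
    def is_outlier(self, r):
        return r.get("anomalous", False)
```

For example, `multistage_ids({"anomalous": True}, Stub("impersonation"), Stub(), Stub())` routes the record to the expert-marking step, mirroring the "outliers" arrow in the diagram.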

Table 5: Confusion matrix for the AWID data set using J48 and IncrementalSSGC in the proposed framework (rows: true class; columns: predicted class).

                  Normal    Flooding   Injection   Impersonation
Normal            530588    116        6           75
Flooding          2553      5544       0           0
Injection         2         0          16680       0
Impersonation     3297      148        0           16364
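As a quick arithmetic check, the 98.9% overall accuracy can be recomputed from Table 5 (taking rows as true classes and columns as predicted classes, in the order Normal, Flooding, Injection, Impersonation):

```python
# confusion matrix from Table 5: rows = true class, columns = predicted class
conf = [
    [530588, 116,   6,     75],     # Normal
    [2553,   5544,  0,     0],      # Flooding
    [2,      0,     16680, 0],      # Injection
    [3297,   148,   0,     16364],  # Impersonation
]

total = sum(sum(row) for row in conf)            # all classified records
correct = sum(conf[i][i] for i in range(4))      # diagonal = correct predictions
accuracy = correct / total
print(f"accuracy = {accuracy:.3%}")              # about 98.9%, matching the text
```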

Journal of Computer Networks and Communications 11

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

References

[1] K. P. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, Cambridge, MA, USA, 2012.

[2] B. Ahmad, W. Jian, and Z. Anwar Ali, "Role of machine learning and data mining in internet security: standing state with future directions," Journal of Computer Networks and Communications, vol. 2018, Article ID 6383145, 10 pages, 2018.

[3] T. Bakhshi and B. Ghita, "On internet traffic classification: a two-phased machine learning approach," Journal of Computer Networks and Communications, vol. 2016, Article ID 2048302, 21 pages, 2016.

[4] B. Luo and J. Xia, "A novel intrusion detection system based on feature generation with visualization strategy," Expert Systems with Applications, vol. 41, no. 9, pp. 4139–4147, 2014.

[5] W.-C. Lin, S.-W. Ke, and C.-F. Tsai, "CANN: an intrusion detection system based on combining cluster centers and nearest neighbors," Knowledge-Based Systems, vol. 78, pp. 13–21, 2015.

[6] C.-F. Tsai and C.-Y. Lin, "A triangle area based nearest neighbors approach to intrusion detection," Pattern Recognition, vol. 43, no. 1, pp. 222–229, 2010.

[7] F. Kuang, S. Zhang, Z. Jin, and W. Xu, "A novel SVM by combining kernel principal component analysis and improved chaotic particle swarm optimization for intrusion detection," Soft Computing, vol. 19, no. 5, pp. 1187–1199, 2015.

[8] M. E. Aminanto and K. Kim, "Detecting impersonation attack in WiFi networks using deep learning approach," in Information Security Applications, D. Choi and S. Guilley, Eds., vol. 10144, Springer, Berlin, Germany, 2016.

[9] A. A. Aburomman and M. B. I. Reaz, "A novel SVM-kNN-PSO ensemble method for intrusion detection system," Applied Soft Computing, vol. 38, pp. 360–372, 2016.

[10] M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander, "LOF: identifying density-based local outliers," in Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, pp. 93–104, Dallas, TX, USA, May 2000.

[11] V. Hautamaki, I. Karkkainen, and P. Franti, "Outlier detection using k-nearest neighbour graph," in Proceedings of the 17th International Conference on Pattern Recognition, pp. 430–433, Cambridge, UK, August 2004.

[12] V. Thang and F. F. Pashchenko, "A new incremental semi-supervised graph based clustering," in Proceedings of the IEEE International Conference on Engineering and Telecommunication, Moscow, Russia, March 2018.

[13] V. Thang, D. V. Pantiukhin, and A. N. Nazarov, "FLDS: fast outlier detection based on local density score," in Proceedings of the IEEE International Conference on Engineering and Telecommunication, pp. 137–141, Moscow, Russia, November 2016.

[14] C. Kolias, G. Kambourakis, A. Stavrou, and S. Gritzalis, "Intrusion detection in 802.11 networks: empirical evaluation of threats and a public dataset," IEEE Communications Surveys & Tutorials, vol. 18, no. 1, pp. 184–208, 2016.

[15] C. Gupta and R. Grossman, "GenIc: a single pass generalized incremental algorithm for clustering," in Proceedings of the Fourth SIAM International Conference on Data Mining, pp. 147–153, Lake Buena Vista, FL, USA, April 2004.

[16] M. Ester, H.-P. Kriegel, J. Sander, M. Wimmer, and X. Xu, "Incremental clustering for mining in a data warehousing environment," in Proceedings of the International Conference on Very Large Data Bases, pp. 323–333, New York, NY, USA, August 1998.

[17] V. Chandrasekhar, C. Tan, M. Wu, L. Li, X. Li, and J.-H. Lim, "Incremental graph clustering for efficient retrieval from streaming egocentric video data," in Proceedings of the International Conference on Pattern Recognition, pp. 2631–2636, Stockholm, Sweden, August 2014.

[18] A. M. Bagirov, J. Ugon, and D. Webb, "Fast modified global K-means algorithm for incremental cluster construction," Pattern Recognition, vol. 44, no. 4, pp. 866–876, 2011.

[19] A. Bryant, D. E. Tamir, N. D. Rishe, and K. Abraham, "Dynamic incremental fuzzy C-means clustering," in Proceedings of the Sixth International Conference on Pervasive Patterns and Applications, Venice, Italy, May 2014.

[20] Z. Yu, P. Luo, J. You et al., "Incremental semi-supervised clustering ensemble for high dimensional data clustering," IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 3, pp. 701–714, 2016.

[21] V. Chandola, A. Banerjee, and V. Kumar, "Anomaly detection: a survey," ACM Computing Surveys, vol. 41, no. 3, pp. 1–58, 2009.

[22] J. Tang, Z. Chen, A. Fu, and D. Cheung, "Enhancing effectiveness of outlier detections for low density patterns," in Proceedings of the Sixth Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Taipei, Taiwan, May 2002.

[23] S. Papadimitriou, H. Kitagawa, P. B. Gibbons, and C. Faloutsos, "LOCI: fast outlier detection using the local correlation integral," in Proceedings of the 19th International Conference on Data Engineering, pp. 315–326, Bangalore, India, March 2003.

[24] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, "A density-based algorithm for discovering clusters in large spatial databases with noise," in Proceedings of the Conference on Knowledge Discovery and Data Mining (KDD), pp. 226–231, Portland, OR, USA, August 1996.

[25] L. Ertoz, M. Steinbach, and V. Kumar, "Finding clusters of different sizes, shapes, and densities in noisy, high dimensional data," in Proceedings of the SIAM International Conference on Data Mining, pp. 47–58, San Francisco, CA, USA, May 2003.

[26] E. M. Knorr, R. T. Ng, and V. Tucakov, "Distance-based outliers: algorithms and applications," The International Journal on Very Large Data Bases, vol. 8, no. 3-4, pp. 237–253, 2000.

[27] S. Basu, I. Davidson, and K. L. Wagstaff, Constrained Clustering: Advances in Algorithms, Theory, and Applications, Chapman and Hall/CRC Data Mining and Knowledge Discovery Series, CRC Press, Boca Raton, FL, USA, 1st edition, 2008.

[28] A. A. Abin, "Clustering with side information: further efforts to improve efficiency," Pattern Recognition Letters, vol. 84, pp. 252–258, 2016.

[29] Y. Shi, C. Otto, and A. K. Jain, "Face clustering: representation and pairwise constraints," IEEE Transactions on Information Forensics and Security, vol. 13, no. 7, pp. 1626–1640, 2018.

[30] A. A. Abin and B. Hamid, "Active constrained fuzzy clustering: a multiple kernels learning approach," Pattern Recognition, vol. 48, no. 3, pp. 953–967, 2015.

[31] S. Xiong, J. Azimi, and X. Z. Fern, "Active learning of constraints for semi-supervised clustering," IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 1, pp. 43–54, 2014.

[32] K. Wagstaff, C. Cardie, S. Rogers, and S. Schrodl, "Constrained K-means clustering with background knowledge," in Proceedings of the International Conference on Machine Learning (ICML), pp. 577–584, Williamstown, MA, USA, June 2001.

[33] I. Davidson and S. S. Ravi, "Using instance-level constraints in agglomerative hierarchical clustering: theoretical and empirical results," Data Mining and Knowledge Discovery, vol. 18, no. 2, pp. 257–282, 2009.

[34] B. Kulis, S. Basu, I. Dhillon, and R. Mooney, "Semi-supervised graph clustering: a kernel approach," Machine Learning, vol. 74, no. 1, pp. 1–22, 2009.

[35] V.-V. Vu, "An efficient semi-supervised graph based clustering," Intelligent Data Analysis, vol. 22, no. 2, pp. 297–307, 2018.

[36] X. Wang, B. Qian, and I. Davidson, "On constrained spectral clustering and its applications," Data Mining and Knowledge Discovery, vol. 28, no. 1, pp. 1–30, 2014.

[37] D. Mavroeidis, "Accelerating spectral clustering with partial supervision," Data Mining and Knowledge Discovery, vol. 21, no. 2, pp. 241–258, 2010.

[38] L. Lelis and J. Sander, "Semi-supervised density-based clustering," in Proceedings of the IEEE International Conference on Data Mining, pp. 842–847, Miami, FL, USA, December 2009.

[39] D.-D. Le and S. Satoh, "Unsupervised face annotation by mining the web," in Proceedings of the 8th IEEE International Conference on Data Mining, pp. 383–392, Pisa, Italy, December 2008.

[40] D. Yan, L. Huang, and M. I. Jordan, "Fast approximate spectral clustering," in Proceedings of the Conference on Knowledge Discovery and Data Mining (KDD), pp. 907–916, New York, NY, USA, June 2009.

[41] C. Zhong, M. Malinen, D. Miao, and P. Franti, "A fast minimum spanning tree algorithm based on K-means," Information Sciences, vol. 295, pp. 1–17, 2015.

[42] V. Chaoji, M. Al Hasan, S. Salem, and M. J. Zaki, "SPARCL: an effective and efficient algorithm for mining arbitrary shape-based clusters," Knowledge and Information Systems, vol. 21, no. 2, pp. 201–229, 2009.

[43] A. Asuncion and D. J. Newman, UCI Machine Learning Repository, American Statistical Association, Boston, MA, USA, 2015, http://archive.ics.uci.edu/ml/index.php.

[44] G. Karypis, E.-H. Han, and V. Kumar, "Chameleon: hierarchical clustering using dynamic modeling," Computer, vol. 32, no. 8, pp. 68–75, 1999.



Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 12: MultistageSystem-BasedMachineLearningTechniquesfor IntrusionDetectioninWiFiNetworkdownloads.hindawi.com/journals/jcnc/2019/4708201.pdf · 2019. 7. 30. · Daa aiig Iee Daa cae Peceig

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

References

[1] K. P. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, Cambridge, MA, USA, 2012.

[2] B. Ahmad, W. Jian, and Z. Anwar Ali, "Role of machine learning and data mining in internet security: standing state with future directions," Journal of Computer Networks and Communications, vol. 2018, Article ID 6383145, 10 pages, 2018.

[3] T. Bakhshi and B. Ghita, "On internet traffic classification: a two-phased machine learning approach," Journal of Computer Networks and Communications, vol. 2016, Article ID 2048302, 21 pages, 2016.

[4] B. Luo and J. Xia, "A novel intrusion detection system based on feature generation with visualization strategy," Expert Systems with Applications, vol. 41, no. 9, pp. 4139–4147, 2014.

[5] W.-C. Lin, S.-W. Ke, and C.-F. Tsai, "CANN: an intrusion detection system based on combining cluster centers and nearest neighbors," Knowledge-Based Systems, vol. 78, pp. 13–21, 2015.

[6] C.-F. Tsai and C.-Y. Lin, "A triangle area based nearest neighbors approach to intrusion detection," Pattern Recognition, vol. 43, no. 1, pp. 222–229, 2010.

[7] F. Kuang, S. Zhang, Z. Jin, and W. Xu, "A novel SVM by combining kernel principal component analysis and improved chaotic particle swarm optimization for intrusion detection," Soft Computing, vol. 19, no. 5, pp. 1187–1199, 2015.

[8] M. E. Aminanto and K. Kim, "Detecting impersonation attack in WiFi networks using deep learning approach," in Information Security Applications, D. Choi and S. Guilley, Eds., vol. 10144, Springer, Berlin, Germany, 2016.

[9] A. A. Aburomman and M. B. I. Reaz, "A novel SVM-kNN-PSO ensemble method for intrusion detection system," Applied Soft Computing, vol. 38, pp. 360–372, 2016.

[10] M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander, "LOF: identifying density-based local outliers," in Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, pp. 93–104, Dallas, TX, USA, May 2000.

[11] V. Hautamaki, I. Karkkainen, and P. Franti, "Outlier detection using k-nearest neighbour graph," in Proceedings of the 17th International Conference on Pattern Recognition, pp. 430–433, Cambridge, MA, USA, August 2004.

[12] V. V. Thang and F. F. Pashchenko, "A new incremental semi-supervised graph based clustering," in Proceedings of the IEEE International Conference on Engineering and Telecommunication, Moscow, Russia, March 2018.

[13] V. V. Thang, D. V. Pantiukhin, and A. N. Nazarov, "FLDS: fast outlier detection based on local density score," in Proceedings of the IEEE International Conference on Engineering and Telecommunication, pp. 137–141, Moscow, Russia, November 2016.

[14] C. Kolias, G. Kambourakis, A. Stavrou, and S. Gritzalis, "Intrusion detection in 802.11 networks: empirical evaluation of threats and a public dataset," IEEE Communications Surveys & Tutorials, vol. 18, no. 1, pp. 184–208, 2016.

[15] Ch. Gupta and R. Grossman, "GenIc: a single pass generalized incremental algorithm for clustering," in Proceedings of the Fourth SIAM International Conference on Data Mining, pp. 147–153, Lake Buena Vista, FL, USA, April 2004.

[16] M. Ester, H.-P. Kriegel, J. Sander, M. Wimmer, and X. Xu, "Incremental clustering for mining in a data warehousing environment," in Proceedings of the International Conference on Very Large Data Bases, pp. 323–333, New York, NY, USA, August 1998.

[17] V. Chandrasekhar, C. Tan, M. Wu, L. Li, X. Li, and J.-H. Lim, "Incremental graph clustering for efficient retrieval from streaming egocentric video data," in Proceedings of the International Conference on Pattern Recognition, pp. 2631–2636, Stockholm, Sweden, August 2014.

[18] A. M. Bagirov, J. Ugon, and D. Webb, "Fast modified global K-means algorithm for incremental cluster construction," Pattern Recognition, vol. 44, no. 4, pp. 866–876, 2011.

[19] A. Bryant, D. E. Tamir, N. D. Rishe, and K. Abraham, "Dynamic incremental fuzzy C-means clustering," in Proceedings of the Sixth International Conference on Pervasive Patterns and Applications, Venice, Italy, May 2014.

[20] Z. Yu, P. Luo, J. You et al., "Incremental semi-supervised clustering ensemble for high dimensional data clustering," IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 3, pp. 701–714, 2016.

[21] V. Chandola, A. Banerjee, and V. Kumar, "Anomaly detection: a survey," ACM Computing Surveys, vol. 41, no. 3, pp. 1–58, 2009.

[22] J. Tang, Z. Chen, A. Fu, and D. Cheung, "Enhancing effectiveness of outlier detections for low density patterns," in Proceedings of the Sixth Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Taipei, Taiwan, May 2002.

[23] S. Papadimitriou, H. Kitagawa, P. B. Gibbons, and C. Faloutsos, "LOCI: fast outlier detection using the local correlation integral," in Proceedings of the 19th International Conference on Data Engineering, pp. 315–326, Bangalore, India, March 2003.

[24] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, "A density-based algorithm for discovering clusters in large spatial databases with noise," in Proceedings of the Conference on Knowledge Discovery and Data Mining (KDD), pp. 226–231, Portland, OR, USA, August 1996.

[25] L. Ertoz, M. Steinbach, and V. Kumar, "Finding clusters of different sizes, shapes, and densities in noisy, high dimensional data," in Proceedings of the SIAM International Conference on Data Mining, pp. 47–58, San Francisco, CA, USA, May 2003.

[26] E. M. Knorr, R. T. Ng, and V. Tucakov, "Distance-based outliers: algorithms and applications," The International Journal on Very Large Data Bases, vol. 8, no. 3-4, pp. 237–253, 2000.

[27] S. Basu, I. Davidson, and K. L. Wagstaff, "Constrained clustering: advances in algorithms, theory, and applications," in Chapman and Hall/CRC Data Mining and Knowledge Discovery Series, CRC Press, Boca Raton, FL, USA, 1st edition, 2008.

[28] A. A. Abin, "Clustering with side information: further efforts to improve efficiency," Pattern Recognition Letters, vol. 84, pp. 252–258, 2016.

[29] Y. Shi, C. Otto, and A. K. Jain, "Face clustering: representation and pairwise constraints," IEEE Transactions on Information Forensics and Security, vol. 13, no. 7, pp. 1626–1640, 2018.

[30] A. A. Abin and B. Hamid, "Active constrained fuzzy clustering: a multiple kernels learning approach," Pattern Recognition, vol. 48, no. 3, pp. 953–967, 2015.

[31] S. Xiong, J. Azimi, and X. Z. Fern, "Active learning of constraints for semi-supervised clustering," IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 1, pp. 43–54, 2014.

[32] K. Wagstaff, C. Cardie, S. Rogers, and S. Schrodl, "Constrained K-means clustering with background knowledge," in Proceedings of the International Conference on Machine Learning (ICML), pp. 577–584, Williamstown, MA, USA, June 2001.

[33] I. Davidson and S. S. Ravi, "Using instance-level constraints in agglomerative hierarchical clustering: theoretical and empirical results," Data Mining and Knowledge Discovery, vol. 18, no. 2, pp. 257–282, 2009.

[34] B. Kulis, S. Basu, I. Dhillon, and R. Mooney, "Semi-supervised graph clustering: a kernel approach," Machine Learning, vol. 74, no. 1, pp. 1–22, 2009.

[35] V.-V. Vu, "An efficient semi-supervised graph based clustering," Intelligent Data Analysis, vol. 22, no. 2, pp. 297–307, 2018.

[36] X. Wang, B. Qian, and I. Davidson, "On constrained spectral clustering and its applications," Data Mining and Knowledge Discovery, vol. 28, no. 1, pp. 1–30, 2014.

[37] D. Mavroeidis, "Accelerating spectral clustering with partial supervision," Data Mining and Knowledge Discovery, vol. 21, no. 2, pp. 241–258, 2010.

[38] L. Lelis and J. Sander, "Semi-supervised density-based clustering," in Proceedings of the IEEE International Conference on Data Mining, pp. 842–847, Miami, FL, USA, December 2009.

[39] D.-D. Le and S. Satoh, "Unsupervised face annotation by mining the web," in Proceedings of the 8th IEEE International Conference on Data Mining, pp. 383–392, Pisa, Italy, December 2008.

[40] D. Yan, L. Huang, and M. I. Jordan, "Fast approximate spectral clustering," in Proceedings of the Conference on Knowledge Discovery and Data Mining (KDD), pp. 907–916, New York, NY, USA, June 2009.

[41] C. Zhong, M. Malinen, D. Miao, and P. Franti, "A fast minimum spanning tree algorithm based on K-means," Information Sciences, vol. 295, pp. 1–17, 2015.

[42] V. Chaoji, M. Al Hasan, S. Salem, and M. J. Zaki, "SPARCL: an effective and efficient algorithm for mining arbitrary shape-based clusters," Knowledge and Information Systems, vol. 21, no. 2, pp. 201–229, 2009.

[43] A. Asuncion and D. J. Newman, UCI Machine Learning Repository, American Statistical Association, Boston, MA, USA, 2015, http://archive.ics.uci.edu/ml/index.php.

[44] G. Karypis, E.-H. Han, and V. Kumar, "Chameleon: hierarchical clustering using dynamic modeling," Computer, vol. 32, no. 8, pp. 68–75, 1999.
