Optimizing Computer Worm Detection Using...

Research ArticleOptimizing Computer Worm Detection Using Ensembles

Nelson Ochieng 1 Waweru Mwangi2 and Ismail Ateya1

1Strathmore University Kenya2Jomo Kenyatta University of Agriculture and Technology Kenya

Correspondence should be addressed to Nelson Ochieng nochiengstrathmoreedu

Received 13 December 2018 Revised 26 February 2019 Accepted 3 March 2019 Published 11 April 2019

Guest Editor Pelin Angin

Copyright copy 2019 NelsonOchieng et alThis is an open access article distributed under theCreative Commons Attribution Licensewhich permits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

The scope of this research is computer worm detection Computer worm has been defined as a process that can cause a possiblyevolved copy of it to execute on a remote computer It does not require human intervention to propagate neither does it attachitself to an existing computer file It spreads very rapidly Modern computer worm authors obfuscate the code to make it difficultto detect the computer worm This research proposes to use machine learning methodology for the detection of computer wormsMore specifically ensembles are usedThe research deviates from existing detection approaches by using dark space network trafficattributed to an actual worm attack to train and validate the machine learning algorithms It is also obtained that the variousensembles perform comparatively well Each of them is therefore a candidate for the final modelThe algorithms also perform justas well as similar studies reported in the literature

1 Introduction

Malware includes computer virus Trojan horse spywaread-ware computer worms among many others In surveyby [1] a malware event occurs in organizations every 3minutes and attacks many sectors with alarming losses tointellectual property compromised customer records andeven destruction of data This research has as its scopecomputer worm detection in a network Reference [2] definesa computer worm as a ldquoprocess that can cause a (possiblyevolved) copy of it to execute on a remote computationalmachinerdquo Worms self-propagate across computer networksby exploiting security or policy flaws in widely used networkservices Unlike computer viruses computer worms do notrequire user intervention to propagate nor do they piggy-back on existing files Their spread is very rapid [3 4] withthe ability to infect as many as 359000 computers in under14 hours or even faster Computer worms therefore presentunique challenges to security researchers hence motivatingthis study

Defense against computer worm attacks may be throughprevention ofwormattacks detection ofworms containmentof worm spread and removal of worm infections Prevention

is not always wholly possible because of the inherent vulnera-bilities found in all software Detection is therefore the betterapproach

A number of computer worm detection approacheshave been explored in the research environment Content-based fingerprinting captures a wormrsquos characteristics byderiving the most representative content-sequence as thewormrsquos signature Anomaly-detection leverages the fact thatworms are likely to exhibit anomalous behavior such as port-scanning and failed connection attempts which are distinctfrom normal behavior Behavioral foot-printing makes use ofthe fact that each worm exhibits a definite communicationpattern as it propagates between hosts in a network andthese patterns can be used to uniquely identify a wormIntelligent detection approaches that use machine learninghave also been proposed While each of these approaches hasits strengths a number of weaknesses have also been notedFor example content-signature schemes while an establisheddimension to detect worms fail to detect novel worms andare expensive on the system In anomaly detection profilingnormal network behavior is impossible and establishingdetection threshold is also difficult Behavioral foot-printingis prone to behavior camouflaging attacks Approaches thatleverage machine learning have generated high false positive

HindawiSecurity and Communication NetworksVolume 2019 Article ID 4656480 10 pageshttpsdoiorg10115520194656480

2 Security and Communication Networks

and false negative rates This has been partly because ofpoor characterization of worm traffic and also because ofthe lack of sound datasets for training and validation of thealgorithms

This paper presents an approach that attempts to providebetter performance The feature set used for the machinelearning algorithms are selected network packet header fieldsas reported in an earlier paper by the authors [5] The rest ofthe paper is organized as follows Section 2 reviews existingliterature on computer worm detection using machine learn-ing Section 3 discusses the methodology for the researchSection 4 discusses the results The paper concludes with asummary in Section 5

2 Related Work

A number of approaches for computer worm detectionhave been reviewed in the literature These include content-based signature schemes anomaly-detection schemes andbehavioral-signature detection schemes a summary andanalysis of which has been presented by the authors in anearlier paper [6] For this present work only approachesthat utilize machine learning are emphasized Reference [7]was one of the seminal works in using machine learningtechniques for malware detection It used static programbinary properties and achieved a detection rate of 9776Reference [8] used n-grams extracted from the executable toform training examples They apply several learning methodssuch as Nearest Neighbors Naıve Bayes Support VectorMachines Decision Trees and Boosting Boosted DecisionTrees performed the best with an Area under Curve (AUC)of 0996Win32 Portable Executables (PE) as a feature is usedby [9ndash11] Paper [12] achieves a True Positive Rate of 985and a False Positive Rate of 0025 usingWindows ApplicationProgramming Interface (API) calls as the features Otherfeature types used include Operation Code (OPcode) [13]sequence of instructions that capture program control flowinformation [14] and binary images [15] Reference [15]obtains an accuracy of 98 Paper [16] uses a restrictedBoltzmann machine a neural network to create a newset of features from existing ones These are then usedto train a one-side perceptron (OSP) algorithm The workgets very close to obtaining a zero false positive classifierPaper [17] uses Logistic Model Trees Naıve Bayes SupportVector Machines (SVM) and k Nearest Neighbors (kNN)and obtains an accuracy of 983 with the Linear ModelTree algorithm A deep neural network that uses byte entropyhistogram PE import features and PE metadata features isdeployed by [17] and achieves a detection rate of 95 anda false positive rate of 01 Reference [18] also uses deeplearning

Most of the reviewed works utilize a single parameter forthe detection The present work will utilize many features asreported by the authors in [5]

3 Methods

The main aim of this work is to investigate various machinelearning ensembles on computer worm detection using uni-directional network traffic to a dark space The methodology

adopted follows the standard procedure in machine learning(1) collecting data (2) exploring and preparing the dataand (3) training a model on the data and evaluating modelperformance

31 Dataset The datasets used for the experiments wereobtained from the University San Diego California Centerfor Applied Data Analysis (USCD CAIDA) The centeroperates a network telescope that consists of a globally rooted8 network that monitors large segments of lightly usedaddress space There is little legitimate traffic in this addressspace hence it provides a monitoring point for anomaloustraffic that represents almost 1256th of all IPv4 destinationaddresses on the Internet

Two sets of datasets were requested and obtained fromthis telescope The first is the Three Days of ConfickerDatasets [19] containing data for 3 days between November2008 and January 2009 during which Conficker worm attack[20] was active This dataset contains 68 compressed packetcapture (pcap) files each containing one hour of traces Thepcap files only contain packet headers with the payloadhaving been removed to preserve privacy The destination IPaddresses have also been masked for the same reason Theother dataset is the Two Days in November 2008 dataset [21]with traces for the 12th and 19th November 2008 containingtwo typical days of background radiation just prior to thedetection of Conficker which has been used to differentiatebetween Conficker-infected traffic and clean traffic

The datasets were processed using the CAIDA Corsarosoftware suite [22] a software suite for performing large-scale analysis of trace data The raw pcap datasets were aggre-gated into the FlowTuple format This format retains onlyselected fields from captured packets instead of the wholepacket enabling a more efficient data storage processing andanalysis The 8 fields are source IP address destination IPaddress source port destination port protocol Time to LiveTCP flags and IP packet length An additional field valueindicates the number of packets in the interval whose headerfields match this FlowTuple key

The instances in the Three Days of Conficker datasethave been further filtered to retain only instances thathave a high likelihood of being attributable to Confickerworm attack of the year 2008 Reference [20] focuses onConfickerrsquos TCP scanning behavior (searching for victimsto exploit) and indicates that it engages in three typesof observable network scanning via TCP port 445 or 139(where the vulnerable Microsoft software Windows Serverservice runs) for additional victimsThe vulnerability allowedattackers to execute arbitrary code via a crafted RPC requestthat triggers a buffer overflow These include local networkscanning where Conficker determines the broadcast domainfrom network interface settings scans hosts nearby otherinfected hosts and random scanning Other distinguishingcharacteristics include TTL within reasonable distance fromWindows default TTL of 128 incremental source port inthe Windows default range of 1024-5000 2 or 1 TCP SYNpackets per connection attempt instead of the usual 3 TCPSYN packets per connection attempt due to TCPrsquos retransmitbehavior

Security and Communication Networks 3

This dataset solves the privacy challenge by removing thepayload and also masking out the first octet of the destinationIP address It is also a more recent dataset than the KDDdataset that has been the one available for network securityresearchers However it only includes unidirectional trafficto the network telescope and therefore does not allow theresearchers to include features of computer worms that wouldbe available in bidirectional traffic and would deliver a morecomplete training for the classifiers

32 Features This section presents an analysis of the featuresto be used for detection and their contribution towardsthe detection capability of the learning algorithms Thesefeatures were obtained after performing feature selectionexperiments whose results were reported in [5] The bestfeatures for the classification task were there identified asTime to Live (TTL) Internet Protocol (IP) packet lengthvalue or number of packets in the packet capture intervalwhose header fields match the Flow Tuple key well knowndestination ports or destination ports within the range 0-1024 and IP packet source country China The above featuresare IP packet header fields TTL is used to avoid looping inthe network Every packet is sent with some TTL value setwhich tells the network how many network routers (hops)this packet can cross At each hop its value is decrementedby one and when the value reaches zero the packet isdiscarded Different operating systems have default TTLranges and since computer worms target vulnerabilities inparticular operating systems they will usually be associatedwith TTL within certain ranges For example Confickerworm packets have TTL within reasonable distance fromWindows default TTL of 128 Packet length indicates thesize of the packet Particular computer worms are associatedwith particular packet length sizes For example the packetlength for Conficker worm is around 62 bytes The valuefeature referred to the number of packetswith a unique packetheader signature sequence A number of flow tuples with aparticular key would be suspicious China originates mostof the malicious software packets Computer worms targetwell known ports where popular services run for maximumimpact Conficker worm for example targets port 445 or 139

33 Ensembles Various machine learning ensembles wereexplored and their detection capabilities investigated Ensem-ble methods try to construct a set of learners and combinethemThe ensemblemethods investigated included averagingtechnique GradientBoostingClassifier AdaBoost BaggingVoting Stacking Random Forests and ExtraTreesClassifierThe base classifiers used included SVM Multilayer percep-trons kNN NB Logistic Regression and Decision TreesPython programming language was used for the classificationexperiments andmore especially the Scikit-learn library [22]These ensemble techniques are described as follows

331 ExtraTreesClassifier This builds an ensemble ofunpruned decision trees according to the classical top-downprocedure [23]

The Extra-Trees splitting procedure for numericalattributes is given in Algorithm 1 It has two parameters Kthe number of attributes randomly selected at each node andnmin the minimum sample size for splitting a node It isused several times with the (full) original learning sample togenerate an ensemble model (we denote by M the numberof trees of this ensemble) The predictions of the trees areaggregated to yield the final prediction by majority vote inclassification problems

332 Random Forests Paper [24] defines a random forestas a classifier consisting of a collection of tree-structuredclassifiers h(x 120579k) k = 1 where the 120579k are independentidentically distributed random vectors and each tree casts aunit vote for themost popular class at input xThis is as shownin Algorithm 2

333 AdaBoost Reference [25] explains AdaBoost as takingas input a training set (x1 y1) (xm 119910m) where each xibelongs to some domain or instance space X and each label119910i is in some label set Y AdaBoost calls a given weak orbase learning algorithm repeatedly in a series of rounds t =1 T One of the main ideas of the algorithm is to maintaina distribution or set of weights over the training set Theweight of this distribution on training example i on roundt is denoted Dt (i) Initially all weights are set equally buton each round the weights of incorrectly classified examplesare increased so that the weak learner is forced to focus onthe hard examples in the training set AdaBoost is shown inAlgorithm 3

334 Bagging The name Bagging came from the abbrevi-ation of Bootstrap AGGregatING [26] The two key ingre-dients of Bagging are bootstrap and aggregation Baggingapplies boostrap sampling to obtain the data subsets for train-ing the base learners Given a training data set containingm number of training examples a sample of m trainingexamples will be generated by sampling with replacementEach of these datasets is used to train a model The outputsof the models are combined by averaging (in regression) orvoting (in classification) to create a single output Algorithm4shows Bagging

335 Gradient Boosting Gradient Boosting [27] is amachinelearning technique for regression and classification problemswhich produces a prediction model in the form of anensemble of weak prediction models typically decision treesIt builds the model in a stage-wise fashion like other boostingmethods do and it generalizes themby allowing optimizationof an arbitrary differentiable loss function Gradient Boostingis shown in Algorithm 5

336 Voting Voting is the most popular and fundamentalcombination method for nominal outputs In majority vot-ing every classifier votes for one class label and the finaloutput class label is the one that receives more than half ofthe votes if none of the class labels receives more than half of


Split a node(S)Input the local learning subset S corresponding to the node we want tosplitOutput a split [a lt 119886119888] or nothing(i) If Stop split(S) is TRUE then return nothing(ii) Otherwise select K attributes 1198861 119886119870 among all non-constant (in S)candidate attributes(iii) Draw K splits 1199041 119904119870 where 119904119894 = Pick a random split(S 119886119894) foralli =1 K (iv) Return a split 119904lowast such that Score(119904lowast S) = max119894=1119870 Score(119904119894 S)Pick a random split(Sa)Inputs a subset S and an attribute aOutput a split(i) Let 119886119878max and 119886119878min denote the maximal and minimal value of a in S(ii) Draw a random cut-point 119886119888uniformly in [119886119878min 119886119878max](iii) Return the split [a lt 119886119888]Stop split(S)Input a subset SOutput a boolean(i) If |S| lt 119899min then return TRUE(ii) If all attributes are constant in S then return TRUE(iii) If the output is constant in S then return TRUE(iv) Otherwise return FALSE

Algorithm 1 Extra Trees Algorithm

Input Learning set- S Ensemble size B Proportion ofattributes considered fOutput Ensemble E(1) E = 120593(2) for i = 1 to B do(3) Si=Boostrap Sample(S)(4) Ci=Build Random Tree Model (Si f)(5) E = E cup Ci(6) return E

Algorithm 2 Random forest

the votes a rejection option will be given and the combinedclassifier makes no prediction

337 Stacking Reference [28] explains that Stacked General-ization is a method for combining heterogeneous base mod-els that is models learned with different learning algorithmssuch as the nearest neighbormethod DTs NB among othersThe base models are not combined with a fixed scheme suchas voting but rather an additional model called metamodel islearned and used for combining base models First the metalearning dataset is generated using predictions of the basemodels and then using the meta learning set the meta modelis learned which can combine predictions of base models intoa final prediction

Input Learning set- S Ensemble size BOutput Ensemble E(1) E = 120593(2) W=Assign Equal Weights (S)(3) for i = 1 to B do(4) Ci=Construct-Models (S W)(5) Err=Apply Model (Ci S)(6) if (Err = 0) cup (Err ge 05) then(7) Terminate Model Generation(8) return E(9) for j = 1 to Number Of Examples (S) do(10) if Correctly Classified (Sj C

i) then(11) Wj =Wj Err 1minusErr(12) W = Normalize Weights W(13) E = E cup Ci(14) return E

Algorithm 3 AdaBoost

34 Ensemble Experiments Ensemble experiments startedwith a comparison of the base classifiers The base classifiersperformed as shown in Figure 1

Multilayer perceptron performed poorest and was there-fore considered for elimination

To build an ensemble of various models the experimentsstarted by benching a set of Scikit-learn classifiers on the


Input Learning set- S Ensemble size BOutput Ensemble E(1) E = 120593(2) for i = 1 to B do(3) S=Boostrap Sample(S)(4) C=Construct-Base Model (Si)(5) E = E cup Ci(6) return E

Algorithm 4 Bagging

Inputs(i) input data (x y) Ni=1(ii) number of iterations M(iii) choice of the loss-function (y f)(iv) choice of the base-learner model h (x 120579)Algorithm(1) initialize f0 with a constant(2) for t = 1 to M do(3) compute the negative gradient gt(x)(4) fit a new base-learner function h(x 120579t)(5) find the best gradient descent step-size 120588t 120588t = argmin120588 N i=1 yi ftminus1(xi) + 120588h(xi 120579t)(6) update the function estimate ftlarr997888 ftminus1 + 120588th(x 120579t)(7) end for

Algorithm 5 Gradient boosting

dataset The considered models performed as shown inTable 1

It is evident that the base classifiers performed almostequally well in terms of accuracy

A way to understand what is going on in an ensemblewhen the task is classification is to inspect the Receiver Oper-ator Curve (ROC) This curve shows the tradeoff betweenprecision and recall or the rate of true positives versus truenegatives Typically different base classifiers make differenttradeoffs An ensemble can adjust these The ROC curveobtained for the various classifiers and how they compare tothe ensemble averaging technique is shown in Figure 2

Random Forest reported the results in Table 2 and Figures3 and 4

A Cohen Kappa score of 0932 was obtained for RandomForest

Experiments with ExtraTreesClassifier gave the resultsshown in Table 3 and Figures 5 and 6

AdaBoost gave the results reported in Table 4 andFigure 7

Bagging gave the results reported in Table 5 and Figure 8Voting reported the results as shown in Table 6 and

Figure 9

4 Discussion of Results

Before the construction of classifier ensembles it was foundout that errors were significantly correlated for the different

Table 1 Benchmarking models

Model ScoreLogistic Regression 0943Decision Tree 0975Support Vector Machines 0990Naıve Bayes 0985K Nearest Neighbors 0954

Table 2 Random Forest

Accuracy Precision Recall F10896 0907 0888 0895

Table 3 ExtraTreesClassifier scores

AUC Accuracy Precision Recall F10972 0887 0889 0884 0889

Table 4 AdaBoost scores


Table 5 Bagging scores


Table 6 Voting scores


classifiers which is to be expected for models that performwell Yet most correlations were in the 50-80 span showingdecent room for improvement could be realized by ensem-bles

When ROC curves were plotted for the averaging ensem-ble technique and the base algorithms the ensemble tech-nique outperformed Logistic Regression Decision Tree andkNN This was as shown in Figure 2 where the curve for theensemble technique approached the left topmost corner themost The ensemble technique performed almost as well asthe NB and SVM classifiers Trying to improve the ensembleby removing the worst offender (Logistic Regression inFigure 2) gave a truncated ensemble ROC-AUC score of0990 a further improvement

Table 7 summarizes the performance of the ensembleclassifiers

The highest ROC-AUC score was achieved by Gradient-Boosting (0997) while the lowest was achieved by RandomForest (0970) The figures were however all high and notvery different from one another indicating that all ensembletechniques perform well Voting was removed from thecomparison as it was slow especially with more base clas-sifiers integrated Some of the ensemble classifiers howeverdid not generalize well These were ExtraTreesClassifier and


Alg

orith

m

SVC

Mean Accuracy

Cross validation scores

KNeighboors

00 02 04 06 08 10

DecisionTrees

MultipleLayerPerception

NB

LogisticRegression

Figure 1 Comparison of base classifiers

00

10

04

02

08

06

True

pos

itive

rate

False positive rate

ROC curve

100806040200

Irdtsvm

naive_bayesknnensemble

Figure 2 ROC comparison of simple average and base classifiers

Table 7 Performance comparison of the ensemble classifiers

Algorithm AUCGradient Boosting 0997Random Forest 0970Extra Trees Classifier 0972AdaBoost 0993Bagging 0995Stacking 0986Simple Averaging 0987

Random Forest The rest of the ensemble techniques investi-gated generalized well including the slower voting ensembletechnique

It was evident that ensemble techniques improvedobtained scores higher than some base learners though

the performance difference was not significant as would beexpected

5 Conclusion

The study addressed the problem of detecting computerworms in networks The main problem to be solved wasthat existing detection schemes fail to detect sophisticatedcomputer worms that use code obfuscation techniques Inaddition many existing schemes use single parameter for thedetection leading to a poorer characterization of the threatmodel hence a high rate of false positives and false negativesThe datasets used in many approaches is also outdatedThe study aimed to develop a behavioral machine learningmodel to detect computer worms The datasets used for theexperiments were obtained from the University San DiegoCaliforniaCenter forAppliedDataAnalysis (USCDCAIDA)


070

100

080

075

090

085

095

Scor

e

Training examples400300200100

Learning Curves (Random Forest)

Training scoreCross validation score

Figure 3 Learning curve for Random Forest

00

10

04

02

08

06

Reca

ll

Precision

Receiver Operating Characteristic Random Forest After Tuning

100806040200

AUC=097

Figure 4 ROC for Random Forest

00

10

04

02

08

06

Reca

ll

Precision100806040200

AUC=097

Figure 5 ROC for ExtraTreesClassifier


070

100

080

075

090

085

095

Scor

e


Learning Curves (Extra Trees) After tuning


Figure 6 Learning curves for ExtraTreesClassifier

070

100

080

075

090

085

095

Scor

e


Learning Curves (AdaBoost)


Figure 7 AdaBoost learning curve

The results were promising in terms of accuracy and gen-eralization to new datasets There were nomarked differencesbetween the classifiers especially when the datasets werestandardized

It is apparent that the particular classifier used may notbe the determinant in classification in machine learningexperiments but rather the choice of features While thisis largely consistent with other similar studies it should befurther confirmed by future research

It is true that not all computer worms can be detectedby a single method In future it is recommended that acombination of different detection approaches be combinedto be able to detect as many types of computer worms aspossible Also the span of features used for detection shouldbe expanded to include even more features for the detection

The contribution of each feature to the detection abilityshould be documented

Unsupervised learning has not been investigated in thisresearch Unlabeled traffic datasets are available to securityresearchers and practitioners The cost of labeling them ishigh This makes unsupervised learning useful for threatdetection The manual effort of labeling new network traf-fic can make use of clustering and decrease the num-ber of labeled objects needed for the usage of supervisedlearning

Data Availability

The packet capture (pcap) data used to support thefindings of this study are provided by the UCSD Center


070

100

080

075

090

085

095

Scor

e


Learning Curves (Bagging)


Figure 8 Bagging learning curve

070

100

080

075

090

085

095

Scor

e


Learning Curves (Voting)


Figure 9 Learning curve for voting

for Applied Internet Data Analysis Two datasets wereused 1 the CAIDA UCSD Network Telescope ldquoTwo Daysin November 2008rdquo Dataset and 2 the CAIDA UCSDNetwork Telescope ldquoThree Days of Confickerrdquo They maybe released upon application to IMPACT Cyber Trustwho can be contacted at the website address httpswwwimpactcybertrustorgdataset viewidDataset=382 andhttpswwwimpactcybertrustorgdataset viewidDataset=383 The Corsaro tool that was used to process the pcapfiles is available as an open source tool at the Centerfor Applied Internet Data Analysis (CAIDA) website athttpswwwcaidaorgtoolsmeasurementcorsaro

Conflicts of Interest

The authors declare that they have no conflicts of interest

References

[1] FireEye The need for speed httpswww2fireeyecomismg-incident-response-surveyhtml 2018

[2] D Ellis ldquoWorm anatomy andmodelrdquo in Proceedings of the ACMWorkshop on Rapid Malcode (WORM rsquo03) pp 42ndash50 ACMOctober 2003

[3] DMoore V Paxson S Savage C Shannon S Staniford andNWeaver ldquoInside the slammer wormrdquo IEEE Security amp Privacyvol 1 no 4 pp 33ndash39 2003

[4] D Moore C Shannon and K Claffy ldquoCode-Red a case studyon the spread and victims of an internet wormrdquo in Proceedingsof the 2nd ACM SIGCOMMWorkshop on Internet measurment(IMW rsquo02) pp 273ndash284 Marseille France November 2002

[5] O Nelson W Mwangi and I Ateya ldquoA Hybrid FilterWrapperMethod for Feature Selection for Computer Worm Detection


using Darknet Trafficrdquo International Journal of Computer Appli-cations vol 180 no 44 pp 12ndash17 2018

[6] N Ochieng W Mwangi and I Ateya ldquoA Tour of the ComputerWorm Detection Spacerdquo International Journal of ComputerApplications vol 104 no 1 pp 29ndash33 2014

[7] M G Schultz E Eskin E Zadok and S J Stolfo ldquoDatamining methods for detection of new malicious executablesrdquoin Proceedings of the IEEE Symposium on Security and Privacypp 38ndash49 May 2001

[8] J Z Kolter and M A Maloof ldquoLearning to detect and classifymalicious executables in the wildrdquo Journal of Machine LearningResearch vol 7 pp 2721ndash2744 2006

[9] Z Markel ldquoMachine learning based malware detectionrdquohttpsappsdticmildtictrfulltextu2a619747pdf 2015

[10] P Vinod V Laxmi M S Gaur and G Chauhan ldquoDetectingmalicious files using non-signature-based methodsrdquo Interna-tional Journal of Information and Computer Security vol 6 no3 pp 199ndash240 2014

[11] J Bai J Wang and G Zou ldquoAmalware detection scheme basedon mining format informationrdquo The Scientific World Journalvol 2014 Article ID 260905 11 pages 2014

[12] M Alazah S Venkatranan and P Watters ldquoZero-day malwaredetection based on supervised learning algorithms of api callsignaturesrdquo in Proceedings of the 9th Australasian Data MiningConference Data Mining and Analysis 2011

[13] A Mamoun H Shamsul A Jemal et al ldquoA hybrid wrapper-filter approach for malware detectionrdquo Journal of Networks vol9 no 11 2014

[14] S Muazzam W Morgan and L Joohan ldquoDetecting Internetworms using data mining techniquesrdquo Journal of SystematicsCybernetics and Informatics 2009

[15] L Nataraj S Karthikeyan G Jacob and B Manjunath ldquoMal-ware Images visualization and automatic classificationrdquo inProceedings of the 8th International Symposium on Visualizationfor Cyber Security ACM 2011

[16] R Benchea and D T Gavrilut Combining Restricted Boltz-mann Machine adn One Side Perceptron for Malware DetectionSpringer International Publishing 2014

[17] J Saxe and K Berlin ldquoDeep neural network based malwaredetection using two dimensional binary program featuresrdquo inProceedings of the 10th International Conference on Maliciousand Unwanted Software (MALWARE rsquo15) pp 11ndash20 USAOctober 2015

[18] W Huang and J W Stokes ldquoMtNet a multi-task neuralnetwork for dynamic malware classificationrdquo in Proceedingsof the International Conference on Detection of Intrusions andMalware and Vulnerability Assessment Springer 2016

[19] The CAIDA UCSD Network Telescope ldquoThree Days ofConfickerrdquo httpwwwcaidaorgdatapassivetelescope-3days-conficker datasetxml

[20] E Aben ldquoConfickerConflickerDownadup as seen from theUCSD Network Telescoperdquo Technical Report CAIDAFebruary 2009 httpswwwcaidaorgresearchsecurityms08-067confickerxml

[21] The CAIDA UCSD Network Telescope ldquoTwo Days inNovember 2008rdquo Dataset httpwwwcaidaorgdatapassivetelescope-2days-2008-datasetxml

[22] F Pedregosa G Varoquaux and A Gramfort ldquoScikit-learnmachine learning in Pythonrdquo Journal of Machine LearningResearch vol 12 pp 2825ndash2830 2011

[23] P Geurts D Ernst and L Wehenkel ldquoExtremely randomizedtreesrdquoMachine Learning vol 63 no 1 pp 3ndash42 2006

[24] S Dzeroski P Panov and B Zenko Ensemble Methods inMachine Learning Encyclopedia of Complexity and SystemsScience 2009

[25] Y Freund R Schapire and N Abe ldquoA short introduction toboostingrdquo Journal of Japanese Society For Artificial Intelligencevol 14 no 771ndash780 p 1612 1999

[26] L Breiman ldquoBagging predictorsrdquoMachine Learning vol 24 no2 pp 123ndash140 1996

[27] A Natekin and A Knoll ldquoGradient boosting machines atutorialrdquo Frontiers in Neurorobotics vol 7 article 21 2013

[28] D H Wolpert ldquoStacked generalizationrdquo Neural Networks vol5 no 2 pp 241ndash259 1992

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018


Active and Passive Electronic Components

VLSI Design



Shock and Vibration


Civil EngineeringAdvances in

Acoustics and VibrationAdvances in



Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of



Journal ofEngineeringVolume 2018

SensorsJournal of



RotatingMachinery


Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018


Chemical EngineeringInternational Journal of Antennas and

Propagation




Navigation and Observation


Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom


and false negative rates This has been partly because ofpoor characterization of worm traffic and also because ofthe lack of sound datasets for training and validation of thealgorithms

This paper presents an approach that attempts to providebetter performance The feature set used for the machinelearning algorithms are selected network packet header fieldsas reported in an earlier paper by the authors [5] The rest ofthe paper is organized as follows Section 2 reviews existingliterature on computer worm detection using machine learn-ing Section 3 discusses the methodology for the researchSection 4 discusses the results The paper concludes with asummary in Section 5

2 Related Work

A number of approaches for computer worm detectionhave been reviewed in the literature These include content-based signature schemes anomaly-detection schemes andbehavioral-signature detection schemes a summary andanalysis of which has been presented by the authors in anearlier paper [6] For this present work only approachesthat utilize machine learning are emphasized Reference [7]was one of the seminal works in using machine learningtechniques for malware detection It used static programbinary properties and achieved a detection rate of 9776Reference [8] used n-grams extracted from the executable toform training examples They apply several learning methodssuch as Nearest Neighbors Naıve Bayes Support VectorMachines Decision Trees and Boosting Boosted DecisionTrees performed the best with an Area under Curve (AUC)of 0996Win32 Portable Executables (PE) as a feature is usedby [9ndash11] Paper [12] achieves a True Positive Rate of 985and a False Positive Rate of 0025 usingWindows ApplicationProgramming Interface (API) calls as the features Otherfeature types used include Operation Code (OPcode) [13]sequence of instructions that capture program control flowinformation [14] and binary images [15] Reference [15]obtains an accuracy of 98 Paper [16] uses a restrictedBoltzmann machine a neural network to create a newset of features from existing ones These are then usedto train a one-side perceptron (OSP) algorithm The workgets very close to obtaining a zero false positive classifierPaper [17] uses Logistic Model Trees Naıve Bayes SupportVector Machines (SVM) and k Nearest Neighbors (kNN)and obtains an accuracy of 983 with the Linear ModelTree algorithm A deep neural network that uses byte entropyhistogram PE import features and PE metadata features isdeployed by [17] and achieves a detection rate of 95 anda false positive rate of 01 Reference [18] also uses deeplearning

Most of the reviewed works utilize a single parameter forthe detection The present work will utilize many features asreported by the authors in [5]

3 Methods

The main aim of this work is to investigate various machinelearning ensembles on computer worm detection using uni-directional network traffic to a dark space The methodology

adopted follows the standard procedure in machine learning(1) collecting data (2) exploring and preparing the dataand (3) training a model on the data and evaluating modelperformance

31 Dataset The datasets used for the experiments wereobtained from the University San Diego California Centerfor Applied Data Analysis (USCD CAIDA) The centeroperates a network telescope that consists of a globally rooted8 network that monitors large segments of lightly usedaddress space There is little legitimate traffic in this addressspace hence it provides a monitoring point for anomaloustraffic that represents almost 1256th of all IPv4 destinationaddresses on the Internet

Two sets of datasets were requested and obtained fromthis telescope The first is the Three Days of ConfickerDatasets [19] containing data for 3 days between November2008 and January 2009 during which Conficker worm attack[20] was active This dataset contains 68 compressed packetcapture (pcap) files each containing one hour of traces Thepcap files only contain packet headers with the payloadhaving been removed to preserve privacy The destination IPaddresses have also been masked for the same reason Theother dataset is the Two Days in November 2008 dataset [21]with traces for the 12th and 19th November 2008 containingtwo typical days of background radiation just prior to thedetection of Conficker which has been used to differentiatebetween Conficker-infected traffic and clean traffic

The datasets were processed using the CAIDA Corsarosoftware suite [22] a software suite for performing large-scale analysis of trace data The raw pcap datasets were aggre-gated into the FlowTuple format This format retains onlyselected fields from captured packets instead of the wholepacket enabling a more efficient data storage processing andanalysis The 8 fields are source IP address destination IPaddress source port destination port protocol Time to LiveTCP flags and IP packet length An additional field valueindicates the number of packets in the interval whose headerfields match this FlowTuple key

The instances in the Three Days of Conficker datasethave been further filtered to retain only instances thathave a high likelihood of being attributable to Confickerworm attack of the year 2008 Reference [20] focuses onConfickerrsquos TCP scanning behavior (searching for victimsto exploit) and indicates that it engages in three typesof observable network scanning via TCP port 445 or 139(where the vulnerable Microsoft software Windows Serverservice runs) for additional victimsThe vulnerability allowedattackers to execute arbitrary code via a crafted RPC requestthat triggers a buffer overflow These include local networkscanning where Conficker determines the broadcast domainfrom network interface settings scans hosts nearby otherinfected hosts and random scanning Other distinguishingcharacteristics include TTL within reasonable distance fromWindows default TTL of 128 incremental source port inthe Windows default range of 1024-5000 2 or 1 TCP SYNpackets per connection attempt instead of the usual 3 TCPSYN packets per connection attempt due to TCPrsquos retransmitbehavior



























Algorithm 4 Bagging











Figure 9




















Alg

orith

m

SVC

Mean Accuracy


KNeighboors

00 02 04 06 08 10

DecisionTrees


NB

LogisticRegression


00

10

04

02

08

06

True

pos

itive

rate

False positive rate

ROC curve

100806040200

Irdtsvm








5 Conclusion



070

100

080

075

090

085

095

Scor

e





00

10

04

02

08

06

Reca

ll

Precision


100806040200

AUC=097


00

10

04

02

08

06

Reca

ll


AUC=097



070

100

080

075

090

085

095

Scor

e





070

100

080

075

090

085

095

Scor

e










Data Availability



070

100

080

075

090

085

095

Scor

e





070

100

080

075

090

085

095

Scor

e








References

































RoboticsJournal of




VLSI Design



Shock and Vibration







Journal of



Volume 2018



Volume 2018


Journal of




SensorsJournal of



RotatingMachinery





Propagation






Hindawi


Advances in

Multimedia




























Algorithm 4 Bagging











Figure 9




















Alg

orith

m

SVC

Mean Accuracy


KNeighboors

00 02 04 06 08 10

DecisionTrees


NB

LogisticRegression


00

10

04

02

08

06

True

pos

itive

rate

False positive rate

ROC curve

100806040200

Irdtsvm








5 Conclusion



070

100

080

075

090

085

095

Scor

e





00

10

04

02

08

06

Reca

ll

Precision


100806040200

AUC=097


00

10

04

02

08

06

Reca

ll


AUC=097



070

100

080

075

090

085

095

Scor

e





070

100

080

075

090

085

095

Scor

e










Data Availability



070

100

080

075

090

085

095

Scor

e





070

100

080

075

090

085

095

Scor

e








References

































RoboticsJournal of




VLSI Design



Shock and Vibration







Journal of



Volume 2018



Volume 2018


Journal of




SensorsJournal of



RotatingMachinery





Propagation






Hindawi


Advances in

Multimedia



Alg

orith

m

SVC

Mean Accuracy


KNeighboors

00 02 04 06 08 10

DecisionTrees


NB

LogisticRegression


00

10

04

02

08

06

True

pos

itive

rate

False positive rate

ROC curve

100806040200

Irdtsvm








5 Conclusion



070

100

080

075

090

085

095

Scor

e





00

10

04

02

08

06

Reca

ll

Precision


100806040200

AUC=097


00

10

04

02

08

06

Reca

ll


AUC=097



070

100

080

075

090

085

095

Scor

e





070

100

080

075

090

085

095

Scor

e










Data Availability



070

100

080

075

090

085

095

Scor

e





070

100

080

075

090

085

095

Scor

e








References

































RoboticsJournal of




VLSI Design



Shock and Vibration







Journal of



Volume 2018



Volume 2018


Journal of




SensorsJournal of



RotatingMachinery





Propagation






Hindawi


Advances in

Multimedia



070

100

080

075

090

085

095

Scor

e





00

10

04

02

08

06

Reca

ll

Precision


100806040200

AUC=097


00

10

04

02

08

06

Reca

ll


AUC=097



070

100

080

075

090

085

095

Scor

e





070

100

080

075

090

085

095

Scor

e










Data Availability



070

100

080

075

090

085

095

Scor

e





070

100

080

075

090

085

095

Scor

e








References

































RoboticsJournal of




VLSI Design



Shock and Vibration







Journal of



Volume 2018



Volume 2018


Journal of




SensorsJournal of



RotatingMachinery





Propagation






Hindawi


Advances in

Multimedia



070

100

080

075

090

085

095

Scor

e





070

100

080

075

090

085

095

Scor

e










Data Availability



070

100

080

075

090

085

095

Scor

e





070

100

080

075

090

085

095

Scor

e








References

































RoboticsJournal of




VLSI Design



Shock and Vibration







Journal of



Volume 2018



Volume 2018


Journal of




SensorsJournal of



RotatingMachinery





Propagation






Hindawi


Advances in

Multimedia



070

100

080

075

090

085

095

Scor

e





070

100

080

075

090

085

095

Scor

e








References

































RoboticsJournal of




VLSI Design



Shock and Vibration







Journal of



Volume 2018



Volume 2018


Journal of




SensorsJournal of



RotatingMachinery





Propagation






Hindawi


Advances in

Multimedia





























RoboticsJournal of




VLSI Design



Shock and Vibration







Journal of



Volume 2018



Volume 2018


Journal of




SensorsJournal of



RotatingMachinery





Propagation






Hindawi


Advances in

Multimedia


Optimizing Computer Worm Detection Using...

Documents

Transcript of Optimizing Computer Worm Detection Using...