Research Article
Automatic Identification of Honeypot Server Using Machine Learning Techniques

Cheng Huang,1 Jiaxuan Han,1 Xing Zhang,2 and Jiayong Liu1
1College of Cybersecurity, Sichuan University, Chengdu 610065, China
2NSFOCUS, Beijing 100089, China
Correspondence should be addressed to Jiayong Liu: jylscu@gmail.com
Received 28 April 2019; Revised 14 July 2019; Accepted 23 August 2019; Published 22 September 2019
Academic Editor: Emanuele Maiorana
Copyright © 2019 Cheng Huang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Traditional security strategies are powerless against novel attacks in complex network environments, such as advanced persistent threats (APTs). Compared with traditional detection strategies, a honeypot system, especially in the Internet of Things research area, is intended to be attacked and to automatically monitor potential attacks by analyzing network packets or log files. Researchers can extract threat actors' tactics, techniques, and procedures from these data and then generate more effective defense strategies. For ordinary security researchers, it is an urgent topic how to improve the honeypot mechanism so that it cannot be recognized by attackers and silently captures their behaviors; they therefore need intelligent techniques to automatically check, remotely, whether a server runs a honeypot service or not. Building on the rapid progress of honeypot detection using machine learning technologies, this paper proposes a new automatic identification model based on the random forest algorithm with three feature groups: application-layer features, network-layer features, and other system-layer features. The experimental datasets were collected from publicly known platforms and designed to prove the effectiveness of the proposed model. The experimental results show that the presented model achieved a high area under the receiver operating characteristic curve (AUC) of 0.93, better than other machine learning algorithms.
1 Introduction
Many traditional security detection strategies, such as firewalls or intrusion detection systems (IDS), have been invented to protect a system's security, but many critical issues are still reported every day [1, 2]. The situation has become more complex with the development of Internet technologies. Honeypots have been defined in recent years as a new deception technology for cybersecurity defense [3], which can be used to strengthen a company's security detection level. In other words, honeypots are used to lure attackers into interacting with them and to collect information that is then used to analyze and study the attackers' methods [4–6].
Depending on their interaction level, honeypots are divided into three categories: low-interaction honeypots, medium-interaction honeypots, and high-interaction honeypots [3, 7–9]. Low-interaction and medium-interaction honeypots do not provide any physical environment for attackers; thus, they only run a few sets of web services. In particular, for low-interaction honeypots, services that are simulated through port listening can be detected easily and remotely. For high-interaction honeypots, by contrast, the services are fully deployed with native programs [10].
Hindawi Security and Communication Networks, Volume 2019, Article ID 2627608, 8 pages, https://doi.org/10.1155/2019/2627608

With the widespread use of deception technology, honeypot detection methods also need improvement, which means researchers need to make their honeypots more realistic to deceive potential attackers instead of being easily distinguished. Thus, many researchers have started to focus on how to automatically detect honeypots before publishing their honeypot frameworks [11–14]. The authors of [15] discussed several honeypot detection features based on fingerprinting. Holz and Raynal [16] proposed other methods based on detecting User-mode Linux or VMWare systems. With the rapid development of honeypots, the flaws of previous detection methods have been exposed gradually. The main problem is that the simulation of high-interaction honeypots is so similar to real systems that it is difficult to detect them with a single feature. Therefore, other researchers have started to extract other features and introduce the latest algorithms [17–19]. But there are still many shortcomings at present; this paper explores honeypot detection technology to improve the effectiveness of deception technologies, which helps promote the current development of honeypot technology. The contributions of this paper are summarized as follows:
(i) Based on an analysis of existing honeypot detection technology, this paper proposes three feature groups which can distinguish ordinary servers and honeypot servers remotely and accurately. These features are summarized as application-layer, network-layer, and other system-layer features.
(ii) The paper presents an automatic detection model using multiple features and machine learning algorithms to identify honeypots. To achieve higher accuracy, four machine learning algorithms are introduced to build separate models.
(iii) To highlight the effectiveness of the proposed model, the paper compares receiver operating characteristic (ROC) curves of four machine learning algorithms based on 10-fold cross validation. According to the experimental results, the model achieved a high detection rate with an AUC value of 0.93.
The structure of this paper is as follows. The first section introduces the issue. Section 2 is the related work, which presents current research progress in honeypot detection. Section 3 presents the proposed framework and describes the details of our system. Section 4 covers the experiment and evaluation results. Section 5 gives the conclusion and future work.
2 Related Works
With the development of Internet technology, honeypots play a significant role in network security. However, attackers can often easily distinguish whether a server has deployed honeypot services or not. To address this threat, researchers need to make their honeypot services more realistic and improve both the inner mechanism and the outer interface of the honeypot framework. Recently, many researchers have started to focus on how to automatically detect a honeypot server [20–24].
For example, Fu et al. [18] proposed a method based on examining the link latency of honeyd honeypots. The authors observed that, because the simulated link latency of honeypots such as honeyd is always 1 ms or 10 ms, the link latency in such virtual networks is always around a multiple of 1 ms or 10 ms. Based on this characteristic, they designed a method using Neyman–Pearson decision theory and achieved a high detection rate. Wenda and Ning [25] suggested that honeypots can be determined by analyzing network monitoring characteristics or detecting the virtual environment: because many high-interaction honeypots are deployed together with firewalls and IDS, such features can be used for quick detection. Defibaugh-Chavez et al. [17] proposed another method which focused on detecting network-level activities and services of honeypots. Then, Mukkamala et al. [19] proposed a method based on the SVM algorithm [26]. They added a different feature group, the TCP/IP fingerprint, and designed an experiment that proved their method effective. Besides these methods, Holz and Raynal [16] proposed a method based on the feedback information of honeypots. They presented a honeypot detection framework that extracts data about User-mode Linux (UML) and VMWare [27]. For UML, one can use the output of the command "dmesg" for fingerprinting, or view the file /etc/fstab. For VMWare, they suggested capturing the hardware and MAC address: according to IEEE standards, a MAC address such as "00-0E-23-xx-xx-xx" is assigned to VMWare. In addition, Send-Safe Honeypot Hunter was the first commercial anti-honeypot technology [28].
Although many detection methods have been proposed, there is still a big problem: with the rapid development of cloud and virtualization technologies, many of them may not be as effective as they used to be. We therefore studied previous researchers' methods of honeypot detection and then proposed a novel honeypot detection framework based on machine learning technologies.
3 Framework
3.1. Architecture. To detect honeypots more effectively, the paper presents an automatic identification framework, shown in Figure 1. There are three important parts in the framework: feature extraction, label methods, and the detection model. Feature extraction is primarily responsible for collecting features from the target server. In this part, all features are split into three groups: the application layer, the network layer, and the system layer. The extracted feature values are recorded as strings or integers. The second important part is the label methods. As the name suggests, the purpose of this part is to collect labeled data, containing honeypot or normal host IP addresses. Cyberspace search engines such as Shodan and FOFA, along with other public platforms, are used to collect honeypot servers, and manual checks are applied after crawling real data from these platforms via their application programming interfaces (APIs). After obtaining the feature data and label data, both are transferred to the detection module as training data. Each record in the training data has a corresponding label; in other words, each record can be described as (feature data, label data). In the detection steps, the training data is loaded first, and then the machine learning algorithms, including random forest, SVM, kNN, and Naive Bayes, are used to train and build machine learning models. The best model is selected for parameter adjustment, and the resulting honeypot detection model is used to predict the results.
3.2. Feature Extraction. Good features are the basis of training an excellent machine learning model. In this paper, features were divided into three groups: application-layer features, network-layer features, and other system-layer features.
3.2.1. Application-Layer Features. According to Veena's investigation [29], HTTP, FTP, and SMTP are the three protocols most commonly attacked. The results show that such simulated services can deceive the majority of hackers. Moreover, low-interaction honeypots do not implement a complete service, which allows them to be easily recognized by simple interaction commands [17]. For example, both the HTTP GET and HTTP OPTIONS methods are provided by a real system, but only the HTTP GET method is available in a honeypot. In addition, referring to the "service exercising" of Defibaugh-Chavez's research [17] and the complexity of honeypot services discussed in [28], the application-layer features are summarized in Table 1.
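As an illustration of how such probe results could be turned into model inputs, the sketch below encodes the Table 1 features as a binary vector. The function name and the exact 0/1 encoding are assumptions for illustration; the paper does not specify its encoding.

```python
# Hypothetical encoding of the application-layer features of Table 1.
# `supported` is the set of commands the target actually answered during
# probing; each Table 1 feature becomes 1 (answered) or 0 (not answered).
APP_FEATURES = [
    "HTTP_GET", "HTTP_OPTIONS", "HTTP_HEAD", "HTTP_TRACE",
    "FTP_USER", "FTP_PASS", "FTP_MODE", "FTP_RETR",
    "SMTP_HELO", "SMTP_MAIL", "SMTP_DATA", "SMTP_VRFY", "SMTP_ETRN",
]

def encode_app_features(supported):
    """Map the set of answered commands to a fixed-order 0/1 vector."""
    return [1 if name in supported else 0 for name in APP_FEATURES]
```

A low-interaction HTTP honeypot that only implements GET would yield a vector with a single 1 in the HTTP_GET position.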
3.2.2. Network-Layer Features. TCP/IP fingerprints can help people recognize different devices, including honeypots. Some common TCP/IP fingerprints are the TCP FIN flag, the TCP initial window, and the ACK value [30]. After analysis, we finally decided to use average_received_bytes, average_received_ttl, average_received_ack_flags, average_received_push_flags, average_received_syn_flags, average_received_fin_flags, and average_received_window_size as the network-layer features, as shown in Table 2. Each network-layer feature value is computed from the received packets by a dedicated equation. For example, average_received_ttl comes from the following equation:
average_received_ttl = (received_ttl1 + received_ttl2 + received_ttl3) / 3.    (1)
In this equation, the value of received_ttl1 is obtained from the packet returned from port 21 of the remote server, the value of received_ttl2 from the packet returned from port 25, and the value of received_ttl3 from the packet returned from port 80.
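A minimal sketch of this averaging step, assuming the reply packets from ports 21, 25, and 80 have already been parsed into field dictionaries (the packet parser itself is out of scope here):

```python
def network_layer_features(replies):
    """Average each packet field over the probe replies, as in equation (1).

    `replies` is a list of dicts, one per probed port (21, 25, 80), each
    holding fields parsed from the reply packet (hypothetical parser).
    """
    fields = ["bytes", "ttl", "ack_flags", "push_flags",
              "syn_flags", "fin_flags", "window_size"]
    n = len(replies)
    return {f"average_received_{f}": sum(r[f] for r in replies) / n
            for f in fields}
```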
Figure 1: The framework of the proposed automatic identification method. (Feature extraction gathers application-layer, network-layer, and system-layer feature data from the public server; label methods combine the Shodan API, FOFA API, and manual checks to produce label data; the detection part trains random forest, SVM, kNN, and Naive Bayes on these data, adjusts parameters, and the resulting honeypot detection model classifies targets as honeypot or non-honeypot servers.)
Table 1: Application-layer features.

No.  Feature name
1    HTTP_GET
2    HTTP_OPTIONS
3    HTTP_HEAD
4    HTTP_TRACE
5    FTP_USER
6    FTP_PASS
7    FTP_MODE
8    FTP_RETR
9    SMTP_HELO
10   SMTP_MAIL
11   SMTP_DATA
12   SMTP_VRFY
13   SMTP_ETRN
Table 2: Network-layer features.

No.  Feature name
1    average_received_bytes
2    average_received_ttl
3    average_received_ack_flags
4    average_received_push_flags
5    average_received_syn_flags
6    average_received_fin_flags
7    average_received_window_size
3.2.3. Other System-Layer Features. Table 3 shows the features extracted in the other system-layer group. They are port1, port2, port3, System_Fingerprint, and ICMP_ECHO_response_time.
(1) Port1, Port2, and Port3. Since low-interaction honeypots only simulate a specific service, the ports open on them differ from those of real systems. Therefore, we chose the port numbers as features in the other system-layer group. The meanings and values of port1, port2, and port3 are listed below:
(i) Port1 indicates whether port 21 of the target is listening. If port 21 is open, the value of port1 is 21; otherwise, the value is −1.
(ii) Port2 indicates whether port 25 of the target is listening. If port 25 is open, the value of port2 is 25; otherwise, the value is −2.
(iii) Port3 indicates whether port 80 of the target is listening. If port 80 is open, the value of port3 is 80; otherwise, the value is −3.
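The encoding above translates directly into code. This is a sketch of the stated rule (the port number when open, −1/−2/−3 when closed); the function name is our own:

```python
def port_features(open_ports):
    """Encode port1, port2, port3 per Section 3.2.3: the probed port
    number if it is listening, otherwise -1, -2, -3 respectively."""
    probes = [21, 25, 80]
    return [port if port in open_ports else -(i + 1)
            for i, port in enumerate(probes)]
```

For example, a host with only port 80 open yields [-1, -2, 80].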
(2) System_Fingerprint. This is the version of the operating system running a honeypot, such as "Microsoft Windows Server 2016" or "Ubuntu 16.04". The manual check results indicate that many honeypots share the same System_Fingerprint, so it is treated as a separate feature that distinguishes honeypots from real systems.
(3) ICMP_ECHO_response_time. Mukkamala et al. [19] note that it is common to run several virtual honeypots on a single machine, so when handling ICMP ECHO requests, honeypots are usually slower than real systems. Utilizing this characteristic, ICMP_ECHO_response_time was extracted as an other system-layer feature.
3.3. Classification Model. After extracting the three feature groups, we need to choose an effective and accurate algorithm and construct an intelligent model. Ensemble learning is a method of accomplishing a task by constructing and combining several learners [31]. First, we need to explain an important concept in the ensemble learning area, the hypothesis set: the collection of candidate functions h1, h2, ..., hn. Traditional machine learning algorithms aim to find the single most appropriate classifier hk for f (f being the function describing the relationship between input X and label y). Ensemble learning is different: all h in the hypothesis set are combined in a specific fashion (such as voting) to decide the label of input X. Specifically, the process of ensemble learning can be described in two steps: one is to generate some individual learners; the other is to combine these learners in a specific fashion. By combining individual learners, ensemble learning can achieve better generalization performance than a single learner. Random forest is the most representative algorithm in ensemble learning.
The definition of random forest is: "A random forest is a classifier consisting of a collection of tree-structured classifiers {h(X, Θk), k = 1, ...}, where the Θk are independent identically distributed random vectors and each tree casts a unit vote for the most popular class at input X" [32]. In fact, a random forest is a collection of decision trees. Each decision tree is a classifier which gives a predicted result, so there are n results from n different classifiers. According to the principle of the random forest algorithm, the final result is decided by the majority of all results [33]. Random forest can give more accurate results and can also handle thousands of input variables without variable deletion [34]; it is therefore the best choice for the proposed machine learning model.
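The voting step that combines the individual trees can be sketched in a few lines. This illustrates only the majority-vote principle, not the full random forest training procedure:

```python
from collections import Counter

def majority_vote(tree_predictions):
    """Combine per-tree class labels by majority vote, as a random
    forest does: the most popular class at input X wins."""
    return Counter(tree_predictions).most_common(1)[0][0]
```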
3.4. Model Evaluation. Model evaluation is a significant part of model training; its purpose is to assess the performance of the final model. In this paper, we introduce recall, precision, F1-score (F1), the ROC curve, and the AUC metric to evaluate the proposed model. Their definitions are as follows:
recall = TP / (TP + FN),
precision = TP / (TP + FP),
F1 = 2 × TP / (n + TP − TN),
AUC = (1/2) Σ_{i=1}^{n−1} (x_{i+1} − x_i) · (y_{i+1} + y_i),    (2)
where n is the number of samples.

True positive (TP) is the number of honeypot servers classified as honeypot servers. True negative (TN) is the number of normal servers classified as normal servers. False positive (FP) is the number of normal servers classified as honeypot servers. False negative (FN) is the number of honeypot servers classified as normal servers. The recall rate is the ratio of all samples correctly classified as honeypot servers to all honeypot servers. The precision rate is the ratio of all samples correctly classified as honeypot servers to all samples predicted to be honeypot servers. The F1-score is a comprehensive measure of recall and precision.
ROC means receiver operating characteristic. The ROC curve is constructed from the true positive rate (TPR) and the false positive rate (FPR). TPR is the ratio of all samples correctly classified as honeypot servers to all honeypot servers; FPR is the ratio of samples incorrectly predicted to be honeypot servers to all normal servers. In the coordinates of a
Table 3: Other system-layer features.

No.  Feature name
1    Port1
2    Port2
3    Port3
4    System_Fingerprint
5    ICMP_ECHO_response_time
ROC curve, the X-axis is FPR and the Y-axis is TPR. TPR and FPR are defined as follows:

TPR = TP / (TP + FN),
FPR = FP / (TN + FP).    (3)
Each (FPR, TPR) pair is a point on the ROC curve. If a ROC curve lies above the diagonal line between (0, 0) and (1, 1), the classification performance of the model is acceptable. If a ROC curve lies below this line, there may be mistakes in the dataset labels. If a ROC curve is near this line, the classification performance of the model is poor; in other words, the model predicts almost randomly. To evaluate a model comprehensively, researchers usually use the AUC index to measure the overall efficiency of a classification model: the closer the AUC value is to 1, the better the classification performance of the model.
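The metrics in equations (2) and (3) translate directly into code. The sketch below follows the paper's formulas, including the equivalent F1 form 2·TP/(n + TP − TN) and the trapezoidal AUC; the function names are our own:

```python
def recall(tp, fn):
    # Fraction of honeypot servers that were correctly identified.
    return tp / (tp + fn)

def precision(tp, fp):
    # Fraction of honeypot predictions that were actually honeypots.
    return tp / (tp + fp)

def f1_score(tp, tn, fp, fn):
    # Equation (2): with n = TP + TN + FP + FN, the denominator
    # n + TP - TN equals 2*TP + FP + FN, the usual F1 denominator.
    n = tp + tn + fp + fn
    return 2 * tp / (n + tp - tn)

def auc_trapezoid(points):
    # points: ROC points (FPR, TPR) sorted by FPR; trapezoidal rule
    # as in the AUC sum of equation (2).
    return 0.5 * sum((x2 - x1) * (y2 + y1)
                     for (x1, y1), (x2, y2) in zip(points, points[1:]))
```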
4 Experiment
To make the predicted results more authentic and accurate, the experiment focuses on real honeypot servers and normal servers probed with remote scanning technologies. The scikit-learn library is used for machine learning model training. The computation ran on a computer with an i7-7700HQ CPU, 16 GB of memory, and a GTX 1050 Ti GPU.
4.1. Dataset. Shodan and FOFA are well-known cyberspace search engines used to search for network devices. Many known honeypot servers have been marked by these two platforms, so we first obtained IP addresses that had been marked as honeypots. We also randomly selected many IP addresses from the Internet. Secondly, we manually checked all IP addresses to determine whether they are honeypots or not. Then, we scanned all IP addresses via raw socket technology to collect the feature data. In total, we collected 2413 IP records as the experimental dataset: 807 honeypot IP addresses and 1606 real-system IP addresses. The dataset is shown in Table 4.
4.2. Experiment Design. To improve the accuracy of the proposed model, this paper introduced three steps in the detailed experiment: dimension unification, parameter adjustment, and cross validation.
4.2.1. Dimension Unifying. In some cases, features with large dimensions will affect the final result when training a model. For example, consider the records in Table 5. The dimension of feature2 is larger than that of feature1, so when training a model, the large-dimension feature may dominate the predicted result. Therefore, the label of the 4th record may be predicted as 0, whereas the truth is 1. That is why dimension unifying is needed. The paper chose the popular normalization (z-score) method to achieve
dimension unifying. The normalization equations are as follows:

y_i = (x_i − x̂) / δ,
x̂ = (1/k) Σ_{i=1}^{k} x_i,
δ = sqrt( (1/(k−1)) Σ_{i=1}^{k} (x_i − x̂)² ),    (4)
where x1, x2, ..., xk is a raw data record. Each value xi is transformed into another value yi by the equation, and the new data record y1, y2, ..., yk is composed of all values y. The mean of the new data record is 0 and its variance is 1. In this way, the problem of large-dimension features affecting the training result is solved.
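A short sketch of the z-score transform in equation (4), using the sample standard deviation (the 1/(k−1) form). It is applied to one record here, as the paper's equation is written; in practice the transform is usually applied per feature column:

```python
import statistics

def standardize(record):
    """Z-score a record so the result has mean 0 and (sample)
    standard deviation 1, per equation (4)."""
    mean = statistics.fmean(record)
    sd = statistics.stdev(record)  # 1/(k-1) form, matching equation (4)
    return [(x - mean) / sd for x in record]
```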
4.2.2. Parameter Adjusting. Choosing the most suitable parameters for the model is a necessary part of model training. There are two important parameters in the random forest algorithm which determine the accuracy of the model: "n_estimators", the number of decision trees, and "max_depth", the maximum allowable depth of each tree. If "n_estimators" is too large, the result may overfit; if it is too small, the result may underfit. So a suitable value of "n_estimators" is important for the predicted accuracy of the final model. {100, 120, 200, 300, 500, 800, 1200} was prepared for "n_estimators" and {5, 8, 15, 25, 30, 100} for "max_depth". We tested each of these two collections separately to choose the most appropriate value for each parameter.
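The search over these two collections amounts to an exhaustive grid. The sketch below enumerates every pair and keeps the best-scoring one; the `score` callable stands in for training and validating one random forest per pair, which is an assumption about the procedure the paper does not spell out:

```python
from itertools import product

N_ESTIMATORS = [100, 120, 200, 300, 500, 800, 1200]
MAX_DEPTH = [5, 8, 15, 25, 30, 100]

def best_params(score):
    """Try every (n_estimators, max_depth) pair and return the one
    with the highest validation score."""
    return max(product(N_ESTIMATORS, MAX_DEPTH),
               key=lambda pair: score(*pair))
```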
4.2.3. Cross Validation. We introduced 10-fold cross validation to avoid overfitting issues. Cross validation is a common method of model evaluation [35]. The cross validation structure of this paper is shown in Figure 2 and is composed of three parts. First, the data D is divided into ten similar, mutually exclusive subsets in the data split (D = D1 ∪ D2 ∪ ... ∪ D10, Di ∩ Dj = ∅ for i ≠ j). Then, the divided data
Table 4: Experimental dataset.

Data type            Size
Honeypot records     807
Real system records  1606
Table 5: Records for example.

Feature1  Feature2  Feature3  Label
0.01      3000      1.5       1
0.051     45000     5.8       0
0.001     12785     1.5       1
0.02      80000     2.2       ?
could be transferred to training and testing. In the training and testing steps, nine subsets were used for training each time, and the remaining one was used for validation. So after training and testing there were ten classifiers, each giving a predicted result, and the final result was computed from these ten results.
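The split-train-validate loop described above can be sketched as follows, using a simple deterministic interleaved partition (the paper does not state how D is partitioned, so this partition scheme is an assumption):

```python
def ten_fold_splits(data):
    """Partition `data` into ten mutually exclusive subsets D1..D10 and
    yield (train, test) pairs: nine subsets for training, one held out."""
    k = 10
    folds = [data[i::k] for i in range(k)]
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]
```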
4.2.4. Comparing Experiment. A result without comparison is not convincing, so we designed a comparison experiment to highlight the advantages of the method in this paper. Reviewing related machine learning research, we chose SVM, kNN, and Naive Bayes as the counterparts of random forest. These four machine learning algorithms were used to train four models on the same dataset, respectively. Finally, for each model, we plotted its ROC curve in the same coordinate system. From these ROC curves, the performance of the models is clear at a glance.
4.3. Experiment Result. This section shows the results of the experiment. First, we tested the values in the two parameter collections, respectively. The effect of different values of "n_estimators" on the accuracy is shown in Figure 3; the effect of different values of "max_depth" is shown in Figure 4. The blue line represents the accuracy on the training data, and the green line represents the accuracy on the testing data.
Analyzing these two figures, when "n_estimators" was 200, the accuracies of the blue line and the green line were highest, so the value of "n_estimators" was set to 200. When "max_depth" was 30, the accuracies of the blue line and the green line were highest, so we chose 30 as the value of "max_depth".
Table 6 gives the recall, precision, F1-score, and accuracy of the final model, and Figure 5 shows the ROC curves of the four algorithms; all were obtained after 10-fold cross validation. In Figure 5, the blue line is the ROC curve of the random forest model, the green line that of the SVM model, the yellow line that of the kNN model, and the purple line that of the Naive Bayes model. From the figure, it can be concluded that the performance of Naive Bayes is the worst. There is little difference in the AUC values among random forest, SVM, and kNN, but the performance of random forest is better. In Table 6, the recall rate is 0.82, the precision rate is 0.83, the F1-score is 0.82, and the accuracy is 0.90. The ROC curve of random forest and these indexes
Figure 2: The 10-fold cross validation method for the model. (The data D is split into subsets D1–D10; in each round, nine subsets form the training data and the held-out subset forms the testing data, giving Result1–Result10, which are combined into the final result.)
Figure 3: The effect of different values of "n_estimators" on the accuracy (training accuracy vs. testing accuracy; accuracy axis from 0.84 to 1.00, n_estimators from 200 to 1200).
Figure 4: The effect of different values of "max_depth" on the accuracy (training accuracy vs. testing accuracy; accuracy axis from 0.84 to 1.00, max_depth from 20 to 100).
prove that the final model has a high detection rate and good generalization ability.
5 Conclusion
As an active cybersecurity defense strategy, the emergence of honeypots further reduces an attacker's activity space. Although honeypot technology is constantly improving, attackers are constantly searching for weaknesses in honeypots in order to identify them. There is an old Chinese saying: the players are lost in the game, while the bystanders see it clearly. For honeypot researchers, in order to promote the development of honeypot technology, it is necessary to learn and discover methods of identifying honeypots remotely.
Based on research into early honeypots and honeypot detection technology, this paper proposed a honeypot detection technique based on machine learning algorithms. The technique achieves accurate honeypot detection by extracting and collecting honeypot features from different layers. The experimental results showed that honeypots are still insufficient in how well they simulate real systems, for example in the integrity of the simulated services. At the same time, the experimental results provide a reference for the improvement of honeypots and promote the development of honeypot technology.
Data Availability
The experimental data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
The authors of this work were supported by the CCF-NSFOCUS KunPeng Research Fund (2018008), the Sichuan University Postdoc Research Foundation (19XJ0002), and the Frontier Science and Technology Innovation Projects of the National Key Research and Development Program (2019QY1405). The authors would like to thank the Baimaohui Security Research Institute for providing some experimental data.
References
[1] M. B. Salem, S. Hershkop, and S. J. Stolfo, "A survey of insider attack detection research," in Insider Attack and Cyber Security, pp. 69–90, Springer, Berlin, Germany, 2008.
[2] F. Sabahi and A. Movaghar, "Intrusion detection: a survey," in Proceedings of the 2008 Third International Conference on Systems and Networks Communications, pp. 23–26, IEEE, Sliema, Malta, October 2008.
[3] I. Mokube and M. Adams, "Honeypots: concepts, approaches, and challenges," in Proceedings of the 45th Annual Southeast Regional Conference, pp. 321–326, ACM, Winston-Salem, NC, USA, March 2007.
[4] L. Spitzner, "The honeynet project: trapping the hackers," IEEE Security & Privacy, vol. 1, no. 2, pp. 15–23, 2003.
[5] O. Thonnard and M. Dacier, "A framework for attack patterns' discovery in honeynet data," Digital Investigation, vol. 5, pp. 128–139, 2008.
[6] W. Fan, Z. Du, M. Smith-Creasey, and D. Fernandez, "HoneyDOC: an efficient honeypot architecture enabling all-round design," IEEE Journal on Selected Areas in Communications, vol. 37, no. 3, pp. 683–697, 2019.
[7] K. Sadasivam, B. Samudrala, and T. A. Yang, "Design of network security projects using honeypots," Journal of Computing Sciences in Colleges, vol. 20, pp. 282–293, 2005.
[8] M. Mansoori, O. Zakaria, and A. Gani, "Improving exposure of intrusion deception system through implementation of hybrid honeypot," The International Arab Journal of Information Technology, vol. 9, no. 5, pp. 436–444, 2012.
[9] G. Portokalidis and H. Bos, "SweetBait: zero-hour worm detection and containment using low- and high-interaction honeypots," Computer Networks, vol. 51, no. 5, pp. 1256–1274, 2007.
[10] W. Fan, Z. Du, D. Fernandez, and V. A. Villagra, "Enabling an anatomic view to investigate honeypot systems: a survey," IEEE Systems Journal, vol. 12, no. 4, pp. 3906–3919, 2017.
[11] M. L. Bringer, C. A. Chelmecki, and H. Fujinoki, "A survey: recent advances and future trends in honeypot research," International Journal of Computer Network and Information Security, vol. 4, no. 10, pp. 63–75, 2012.
[12] P. Wang, L. Wu, R. Cunningham, and C. C. Zou, "Honeypot detection in advanced botnet attacks," International Journal of Information and Computer Security, vol. 4, no. 1, pp. 30–51, 2010.
[13] K. Papazis and N. Chilamkurti, "Detecting indicators of deception in emulated monitoring systems," Service Oriented Computing and Applications, vol. 13, no. 1, pp. 17–29, 2019.
Table 6: Recall, precision, F1-score, and accuracy of the proposed model.

Recall  Precision  F1-score  Accuracy  Data size
0.82    0.83       0.82      0.90      2413
Figure 5: ROC curves of the four algorithms: random forest (AUC = 0.93), SVM (AUC = 0.90), kNN (AUC = 0.90), and Naive_Bayes (AUC = 0.60). The X-axis is the false positive rate and the Y-axis is the true positive rate.
methods are exposed gradually. The main problem is that the simulation of high-interaction honeypots is so similar to real systems that they are difficult to detect with a single feature. Therefore, other researchers started to extract additional features and introduce newer algorithms [17-19]. But many shortcomings remain, so this paper explores honeypot detection technology to improve the effectiveness of deception technologies, which helps promote the current development of honeypot technology. The contributions of this paper are summarized as follows:
(i) Based on an analysis of existing honeypot detection technology, this paper proposes three feature groups that can distinguish ordinary servers from honeypot servers remotely and accurately. These features are summarized as application-layer, network-layer, and other system-layer features.
(ii) The paper presents an automatic detection model that uses multiple features and machine learning algorithms to identify honeypots. In order to get higher accuracy, four machine learning algorithms are introduced to build separate models.
(iii) To highlight the effectiveness of the proposed model, the paper compares receiver operating characteristic (ROC) curves of the four machine learning algorithms based on 10-fold cross validation. According to the experiment results, the model achieves a high detection rate with an AUC value of 0.93.
The construction of this paper is as follows. The first section introduces the issue. Section 2 is the related work, which presents current research progress in honeypot detection. Section 3 is the proposed framework; it describes the details of our system. Section 4 is the experiment and evaluation result. Section 5 is the conclusion and future work.
2 Related Works
With the development of Internet technology, honeypots play a significant role in network security. However, attackers can often distinguish whether a server has deployed honeypot services or not. To counter this threat, researchers need to make their honeypot services more realistic and improve the inner mechanism and outer interface of the honeypot framework. Recently, many researchers have started to focus on how to automatically detect a honeypot server [20-24].
For example, Fu et al. [18] proposed a method based on examining the link latency of honeyd honeypots. The authors observed that because the simulated link latency of honeypots such as honeyd is always 1 ms or 10 ms, the link latency in some virtual networks is around a multiple of 1 ms or 10 ms. Based on this characteristic, they designed a method using Neyman-Pearson decision theory and achieved a high detection rate. Wenda and Ning [25] suggested that honeypots can be identified by analyzing network monitoring characteristics or detecting the virtual environment; because many high-interaction honeypots are deployed with firewalls and IDS, such features can be used for quick detection. Defibaugh-Chavez et al. [17] proposed another method that focuses on detecting network-level activities and services of honeypots. Then Mukkamala et al. [19] proposed a method based on the SVM algorithm [26]. They added a different feature group, the TCP/IP fingerprint, and designed an experiment that proved their method effective. Apart from these methods, Holz and Raynal [16] proposed a method based on the feedback information of honeypots. They presented a honeypot detection framework that extracts data from User-mode Linux (UML) and VMWare [27]. For UML, one can use the output of the command "dmesg" for fingerprinting or view the file /etc/fstab. For VMWare, they suggested capturing the hardware and MAC address; according to IEEE standards, a MAC address such as "00-0E-23-xx-xx-xx" is assigned to VMWare. Besides, Send-Safe Honeypot Hunter is the first commercial anti-honeypot technology [28].
Although many detection methods have been proposed, there is still a big problem: with the rapid development of cloud and virtualization technologies, many of them may not be as effective as they used to be. So we studied previous researchers' methods for honeypot detection and then proposed a novel honeypot detection framework based on machine learning technologies.
3 Framework
3.1 Architecture. In order to detect honeypots more effectively, the paper presents an automatic identification framework, which is shown in Figure 1. There are three important parts in the framework: feature extraction, label methods, and the detection model. Feature extraction is primarily responsible for collecting features from the target server. In this part, all features are split into three groups: the application layer, the network layer, and the system layer. The extracted feature values are represented as strings or integers. The second important part is the label methods. As the name suggests, the purpose of this part is to collect labeled data, that is, honeypot or normal host IP addresses. Cyberspace search engines such as Shodan, FOFA, and other public platforms are used to collect honeypot servers, and manual checks are performed after crawling real data from these platforms via their application programming interfaces (APIs). After the feature data and label data are obtained, both are transferred to the detection method as training data; each record in the training data can be described as (feature data, label data). In the detection steps, the training data are loaded first, and then the machine learning algorithms, including random forest, SVM, kNN, and Naive_Bayes, are used to build machine learning models. The best model is selected for parameter adjusting, and the resulting honeypot detection model is used to predict the results.
3.2 Feature Extraction. Good features are the basis of training an excellent machine learning model. In this paper, features are divided into three groups: application-layer features, network-layer features, and other system-layer features.
3.2.1 Application-Layer Feature. According to Veena's investigation [29], HTTP, FTP, and SMTP are the three protocols that attackers attack most, and the result shows such simulated services can cheat the majority of hackers. Moreover, low-interaction honeypots do not implement a complete service, so they can be easily recognized by simple interaction commands [17]. For example, both HTTP GET and HTTP OPTIONS are provided by a real system, but only the HTTP GET method is available in the honeypot. In addition, referring to the "service exercising" of Defibaugh-Chavez's research [17] and the complexity of honeypot services proposed in [28], the application-layer feature is summarized in Table 1.
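The GET-versus-OPTIONS check above can be turned into a feature value. The sketch below is illustrative, not code from the paper: the helper names and the sample replies are our assumptions, and real probing would first send the request over a socket. It classifies a raw HTTP reply by status code, treating 405/501 answers as "method not implemented".

```python
# Deriving one application-layer feature from a raw HTTP reply.
# parse_status_code / method_feature are hypothetical helpers.

def parse_status_code(raw_reply: str) -> int:
    """Read the status code out of a status line like 'HTTP/1.1 200 OK'."""
    status_line = raw_reply.split("\r\n", 1)[0]
    return int(status_line.split()[1])

def method_feature(raw_reply: str) -> int:
    """1 if the probed method is implemented, 0 if the server rejected it.
    A low-interaction honeypot often answers 405/501 for anything but GET."""
    return 0 if parse_status_code(raw_reply) in (405, 501) else 1

real_reply = "HTTP/1.1 200 OK\r\nAllow: GET, HEAD, OPTIONS\r\n\r\n"
honeypot_reply = "HTTP/1.1 501 Not Implemented\r\n\r\n"
print(method_feature(real_reply), method_feature(honeypot_reply))  # 1 0
```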
3.2.2 Network-Layer Feature. TCP/IP fingerprints can help people recognize different devices, including honeypots. Some common TCP/IP fingerprints are the TCP FIN flag, the TCP initial window, and the ACK value [30]. After analysis, we finally decided to use average_received_bytes, average_received_ttl, average_received_ack_flags, average_received_push_flags, average_received_syn_flags, average_received_fin_flags, and average_received_window_size as the network-layer features, as shown in Table 2. Each network-layer feature value is computed from received packets by a specific equation. For example, average_received_ttl comes from the following equation:

average_received_ttl = (received_ttl1 + received_ttl2 + received_ttl3) / 3    (1)

In this equation, the value of received_ttl1 is obtained from the packet received from port 21 of the remote server, received_ttl2 from the packet received from port 25, and received_ttl3 from the packet returned from port 80.
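Equation (1) in code form, as a trivial sketch: capturing the reply packets themselves (which the paper does with raw sockets) is omitted, and the three TTL values are assumed inputs.

```python
def average_received_ttl(ttl_port21, ttl_port25, ttl_port80):
    """Equation (1): mean TTL of the replies from ports 21, 25, and 80."""
    return (ttl_port21 + ttl_port25 + ttl_port80) / 3

# e.g. two Linux-like replies (TTL 64) and one Windows-like reply (TTL 128)
print(average_received_ttl(64, 64, 128))
```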
Figure 1: The framework of the proposed automatic identification method. (Diagram: public servers feed feature extraction (application layer, network layer, system layer) and label methods (Shodan API, FOFA API, manual check); the machine learning algorithms (random forest, SVM, kNN, Naive_Bayes) plus parameter adjusting produce the honeypot detection model, which classifies targets as honeypot or non-honeypot servers.)
Table 1: Application-layer feature.

No. | Feature name
1 | HTTP_GET
2 | HTTP_OPTIONS
3 | HTTP_HEAD
4 | HTTP_TRACE
5 | FTP_USER
6 | FTP_PASS
7 | FTP_MODE
8 | FTP_RETR
9 | SMTP_HELO
10 | SMTP_MAIL
11 | SMTP_DATA
12 | SMTP_VRFY
13 | SMTP_ETRN
Table 2: Network-layer feature.

No. | Feature name
1 | average_received_bytes
2 | average_received_ttl
3 | average_received_ack_flags
4 | average_received_push_flags
5 | average_received_syn_flags
6 | average_received_fin_flags
7 | average_received_window_size
3.2.3 Other System-Layer Feature. Table 3 shows the features extracted in the other system-layer group: port1, port2, port3, System_Fingerprint, and ICMP_ECHO_response_time.

(1) Port1, Port2, and Port3. Since low-interaction honeypots only simulate a specific service, the ports open on them differ from those of real systems. Therefore, we chose the port number as one feature in the other system-layer group. The meanings and values of port1, port2, and port3 are listed below:

(i) Port1 indicates whether port 21 of the target is listening. If port 21 is open, the value of port1 is 21; otherwise, the value is -1.
(ii) Port2 indicates whether port 25 of the target is listening. If port 25 is open, the value of port2 is 25; otherwise, the value is -2.
(iii) Port3 indicates whether port 80 of the target is listening. If port 80 is open, the value of port3 is 80; otherwise, the value is -3.
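The three port features can be encoded directly from a scan result. A minimal sketch (the function name and the input format are ours, not from the paper):

```python
def port_features(open_ports):
    """Encode port1/port2/port3 as described: the port number when it is
    listening, otherwise the sentinel values -1, -2, and -3."""
    port1 = 21 if 21 in open_ports else -1
    port2 = 25 if 25 in open_ports else -2
    port3 = 80 if 80 in open_ports else -3
    return (port1, port2, port3)

# A host with only FTP and HTTP listening:
print(port_features({21, 80}))  # (21, -2, 80)
```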
(2) System_Fingerprint. This is the version of the operating system running on the target, such as "Microsoft Windows Server 2016" or "Ubuntu 16.04". The manual check results indicate that many honeypots share the same System_Fingerprint, so it is considered a separate feature that distinguishes honeypots from real systems.

(3) ICMP_ECHO_response_time. Mukkamala et al. [19] note that it is common to run several virtual honeypots on a single machine, so when handling ICMP ECHO requests, honeypots are often slower than real systems. Based on this characteristic, ICMP_ECHO_response_time was extracted as an other system-layer feature.
3.3 Classification Model. After extracting the three feature groups, an effective and accurate algorithm must be chosen to construct an intelligent model. Ensemble learning is a method of accomplishing a task by constructing and combining several learners [31]. First, we explain an important concept in ensemble learning: the hypothesis set, which is a collection of functions h1, h2, ..., hn. Traditional machine learning algorithms aim to find the single most appropriate classifier hk for f, where f is the function describing the relationship between input X and label y. Ensemble learning is different: all h in the hypothesis set are combined in a particular fashion (such as voting) to decide the label of input X. Specifically, the process of ensemble learning can be described in two steps: generate some individual learners, then combine these learners in a particular fashion. By combining individual learners, ensemble learning can achieve better generalization performance than a single learner. Random forest is the most representative algorithm in ensemble learning.

The definition of random forest is: "A random forest is a classifier consisting of a collection of tree-structured classifiers {h(X, Θk), k = 1, ...}, where the Θk are independent identically distributed random vectors and each tree casts a unit vote for the most popular class at input X" [32]. In fact, a random forest is a collection of decision trees. Each decision tree is a classifier that gives a predicted result, so there are n results from n different classifiers. According to the principle of the random forest algorithm, the final result is decided by the majority of all results [33]. Random forest gives accurate results and can handle thousands of input variables without variable deletion [34]; this makes it the best choice for the proposed machine learning model.
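The unit-vote rule in Breiman's definition reduces to a majority vote over the per-tree predictions. A minimal illustration of that combination step (not the scikit-learn implementation used later in the experiments):

```python
from collections import Counter

def forest_predict(tree_predictions):
    """Each tree casts one unit vote; the most popular class wins."""
    return Counter(tree_predictions).most_common(1)[0][0]

votes = ["honeypot", "normal", "honeypot", "honeypot", "normal"]
print(forest_predict(votes))  # honeypot
```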
3.4 Model Evaluation. Model evaluation is a significant part of model training; its purpose is to assess the performance of the final model. In this paper, we introduce recall, precision, F1-score (F1), the ROC curve, and the AUC metric to evaluate the proposed model. Their definitions are as follows:

recall = TP / (TP + FN)
precision = TP / (TP + FP)
F1 = 2·TP / (n + TP − TN)
AUC = (1/2) · Σ_{i=1}^{n−1} (x_{i+1} − x_i) · (y_{i+1} + y_i)    (2)

where n is the number of samples.

True positive (TP) is the number of honeypot servers classified as honeypot servers. True negative (TN) is the number of normal servers classified as normal servers. False positive (FP) is the number of normal servers classified as honeypot servers. False negative (FN) is the number of honeypot servers classified as normal servers. The recall rate is the ratio of samples correctly classified as honeypot servers to all honeypot servers. The precision rate is the ratio of samples correctly classified as honeypot servers to all samples predicted to be honeypot servers. The F1-score is a comprehensive measure of recall and precision.
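The confusion-matrix metrics can be computed directly from the four counts. Note that the F1 form in equation (2), 2·TP / (n + TP − TN), equals the familiar 2·TP / (2·TP + FP + FN) because n = TP + TN + FP + FN. A small sketch with made-up counts:

```python
def evaluate(tp, tn, fp, fn):
    """Recall, precision, F1 (equation (2) form), and accuracy."""
    n = tp + tn + fp + fn  # total number of samples
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * tp / (n + tp - tn)
    accuracy = (tp + tn) / n
    return recall, precision, f1, accuracy

recall, precision, f1, accuracy = evaluate(tp=66, tn=151, fp=14, fn=14)
# The equation (2) form agrees with the familiar definition of F1:
assert f1 == 2 * 66 / (2 * 66 + 14 + 14)
print(recall, precision, f1, accuracy)
```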
ROC means receiver operating characteristic. The curve is constructed from the true positive rate (TPR) and the false positive rate (FPR). TPR is the ratio of samples correctly classified as honeypot servers to all honeypot servers; FPR is the ratio of samples incorrectly predicted to be honeypot servers to all normal servers. In the coordinates of a ROC curve, the X-axis is FPR and the Y-axis is TPR. TPR and FPR are defined as follows:

TPR = TP / (TP + FN)
FPR = FP / (TN + FP)    (3)

Each (FPR, TPR) pair is a point on the ROC curve. If a ROC curve is above the diagonal line between (0, 0) and (1, 1), the classification performance of the model is acceptable. If a ROC curve is below this line, there may be mistakes in the labels of the dataset. If a ROC curve is near this line, the classification performance of the model is poor; in other words, the model predicts almost randomly. In order to evaluate a model comprehensively, researchers usually use the AUC index to measure the overall efficiency of a classification model: the closer the AUC value is to 1, the better the classification effect of the model.

Table 3: Other system-layer feature.

No. | Feature name
1 | Port1
2 | Port2
3 | Port3
4 | System_Fingerprint
5 | ICMP_ECHO_response_time
4 Experiment

In order to make the predicted results more authentic and accurate, the experiment focuses on real honeypot servers and normal servers based on remote scanning technologies. The scikit-learn library is used for machine learning model training. The computation ran on a computer with an i7-7700HQ CPU, 16 GB of memory, and a GTX 1050ti GPU.

4.1 Dataset. Shodan and FOFA are well-known cyberspace search engines used to search for network devices. Many known honeypot servers have been marked by these two platforms, so we first obtained IP addresses that have been marked as honeypots. We also randomly selected many IP addresses from the Internet. Secondly, we manually checked all IP addresses to determine whether they are honeypots or not. Then we scanned all IP addresses via raw socket technology to collect the feature data. Finally, we collected 2413 IP records as the experiment dataset: 807 honeypot IP addresses and 1606 real system IP addresses. The dataset is shown in Table 4.
4.2 Experiment Design. In order to improve the accuracy of the proposed model, this paper introduces three steps in the detailed experiment: dimension unification, parameter adjustment, and cross validation.
4.2.1 Dimension Unifying. In some cases, large-dimension features will affect the final result when training a model. For example, consider the records in Table 5: the scale of feature2 is much larger than that of feature1, so when training a model, the large-dimension feature may dominate the predicted result. The label of the 4th record may therefore be predicted as 0, although the truth is 1. That is why dimension unifying is needed. The paper chose the popular normalization (standardization) method to achieve dimension unifying. The equation of normalization is as follows:

y_i = (x_i − x̂) / δ
x̂ = (1/k) · Σ_{i=1}^{k} x_i
δ = sqrt( (1/(k−1)) · Σ_{i=1}^{k} (x_i − x̂)² )    (4)

Here x1, x2, ..., xk is a raw data record. Each value x_i is transformed into a value y_i by the equation, and the new data record y1, y2, ..., yk is composed of all values y. After this transformation, the mean of the new data record is 0 and its variance is 1. In this way, the problem that large-dimension features affect the training result is solved.
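Equation (4) in code, applied to one feature column like feature2 from Table 5. This is a sketch using the sample standard deviation (the k − 1 denominator from the text); the column values are only illustrative.

```python
import math

def standardize(column):
    """Equation (4): z-score each value using the mean and sample std dev."""
    k = len(column)
    x_hat = sum(column) / k
    delta = math.sqrt(sum((x - x_hat) ** 2 for x in column) / (k - 1))
    return [(x - x_hat) / delta for x in column]

ys = standardize([3000.0, 45000.0, 12785.0, 80000.0])
# The transformed column has mean ~0 and sample variance ~1:
print(abs(sum(ys)) < 1e-9)  # True
```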
4.2.2 Parameters Adjusting. Choosing the most suitable parameters for the model is a necessary part of model training. There are two important parameters in the random forest algorithm that determine the accuracy of the model. One is "n_estimators", the number of decision trees; the other is "max_depth", the maximum allowable depth of each tree. If "n_estimators" is too big, the result may overfit; if it is too small, the result may underfit. So a suitable value of "n_estimators" is important for the predicted accuracy of the final model. {100, 120, 200, 300, 500, 800, 1200} was prepared for "n_estimators" and {5, 8, 15, 25, 30, 100} was prepared for "max_depth". We tested each of these two collections separately to choose the most appropriate value for each parameter.
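The search over the two collections can be done with scikit-learn's GridSearchCV (scikit-learn is the library the experiments use, though the paper does not say how the search was implemented). The data below are synthetic and the grids are cut down so the sketch runs quickly; the paper's full grids are the ones listed above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the real honeypot feature matrix
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

param_grid = {                    # reduced versions of the paper's grids
    "n_estimators": [100, 200],
    "max_depth": [5, 30],
}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```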
4.2.3 Cross Validation. We introduce the 10-fold cross validation method to avoid overfitting. Cross validation is a common method of model evaluation [35]. The cross validation structure of this paper is shown in Figure 2 and is composed of three parts. First, the data D is divided into ten similar, mutually exclusive subsets in the data split (D = D1 ∪ D2 ∪ ... ∪ D10, Di ∩ Dj = ∅). Then the divided data
Table 4: Experimental dataset.

Data type | Size
Honeypot records | 807
Real system records | 1606
Table 5: Records for example.

Feature1 | Feature2 | Feature3 | Label
0.01 | 3000 | 15 | 1
0.051 | 45000 | 58 | 0
0.001 | 12785 | 15 | 1
0.02 | 80000 | 22 | ?
could be transferred to training and testing. In the training and testing step, nine subsets are used for training each time and the remaining one is used for validation. So after training and testing there are ten classifiers, each giving a predicted result, and the final result is computed from these ten results.
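The data split step above can be sketched without any library, working on indices only (the round-robin fold assignment here is an assumption; any partition into ten mutually exclusive subsets works):

```python
def ten_fold_splits(n_samples, k=10):
    """Yield (train_indices, test_indices) for k mutually exclusive folds."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for held_out in range(k):
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, folds[held_out]

splits = list(ten_fold_splits(2413))  # the paper's dataset size
print(len(splits))  # 10
```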
4.2.4 Comparing Experiment. A result without comparison is not convincing, so we designed a comparing experiment to highlight the advantages of the method in this paper. Reviewing research on machine learning, we chose SVM, kNN, and Naive_Bayes as the counterparts of random forest. These four machine learning algorithms were used to train four models on the same dataset, respectively. Finally, we plotted their respective ROC curves in the same coordinate system; according to these ROC curves, the performance of the models is clear at a glance.
4.3 Experiment Result. This section shows the results of the experiment. First, we tested the values in the two parameter collections, respectively. The effect of different values of "n_estimators" on the accuracy is shown in Figure 3, and the effect of different values of "max_depth" is shown in Figure 4. In each figure, the blue line represents the change on the training data and the green line the change on the testing data.

Analyzing these two figures, when "n_estimators" was 200, the accuracy of both the blue and green lines was the highest, so the value of "n_estimators" was set to 200. When "max_depth" was 30, the accuracy of both lines was the highest, so we chose 30 as the value of "max_depth".
Table 6 shows the recall, precision, F1-score, and accuracy of the final model, and Figure 5 shows the ROC curves of the four algorithms; all were obtained after 10-fold cross validation. In Figure 5, the blue line is the ROC curve of the random forest model, the green line is that of the SVM model, the yellow line is that of the kNN model, and the purple line is that of the Naive_Bayes model. Analyzing the figure, the performance of Naive_Bayes is the worst. There is little difference in the AUC values between random forest, SVM, and kNN, but the performance of random forest is better. In Table 6, the recall rate is 0.82, the precision rate is 0.82, the F1-score is 0.82, and the accuracy is 0.90. The ROC curve of random forest and these indexes
Figure 2: 10-fold cross validation method for the model. (Diagram: the data D is split into subsets D1, ..., D10; each round trains on nine subsets and tests on the remaining one, and the ten results are combined into the final result.)
Figure 3: The effect of different values of "n_estimators" on the accuracy (Y-axis: accuracy, X-axis: n_estimators; blue line: training accuracy, green line: testing accuracy).
Figure 4: The effect of different values of "max_depth" on the accuracy (Y-axis: accuracy, X-axis: max_depth; blue line: training accuracy, green line: testing accuracy).
prove that the final model has a high detection rate and good generalization ability.
5 Conclusion
As an active cybersecurity defense strategy, the emergence of honeypots further reduces the attacker's activity space. Although honeypot technology is constantly improving, attackers are constantly searching for weaknesses in honeypots to identify them. As an old Chinese saying goes, the player is lost in the game while the bystander sees clearly. For honeypot researchers, in order to promote the development of honeypot technology, it is necessary to learn and discover methods of identifying honeypots remotely.
Based on research into early honeypot and honeypot detection technology, this paper proposed a honeypot detection technique based on machine learning algorithms. The technique achieves accurate honeypot detection through the extraction and collection of honeypot features from different layers. The experimental results proved that honeypots are still insufficient in the degree to which they simulate real systems, such as the integrity of the simulated services. At the same time, the experimental results provide a reference for the improvement of honeypots and promote the development of honeypot technology.
Data Availability
The experiment data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
The authors of this work were supported by the CCF-NSFOCUS KunPeng Research Fund (2018008), the Sichuan University Postdoc Research Foundation (19XJ0002), and the Frontier Science and Technology Innovation Projects of the National Key Research and Development Program (2019QY1405). The authors would like to thank Baimaohui Security Research Institute for providing some experiment data.
References
[1] M. B. Salem, S. Hershkop, and S. J. Stolfo, "A survey of insider attack detection research," in Insider Attack and Cyber Security, pp. 69–90, Springer, Berlin, Germany, 2008.
[2] F. Sabahi and A. Movaghar, "Intrusion detection: a survey," in Proceedings of the 2008 Third International Conference on Systems and Networks Communications, pp. 23–26, IEEE, Sliema, Malta, October 2008.
[3] I. Mokube and M. Adams, "Honeypots: concepts, approaches, and challenges," in Proceedings of the 45th Annual Southeast Regional Conference, pp. 321–326, ACM, Winston-Salem, NC, USA, March 2007.
[4] L. Spitzner, "The honeynet project: trapping the hackers," IEEE Security & Privacy, vol. 1, no. 2, pp. 15–23, 2003.
[5] O. Thonnard and M. Dacier, "A framework for attack patterns' discovery in honeynet data," Digital Investigation, vol. 5, pp. 128–139, 2008.
[6] W. Fan, Z. Du, M. Smith-Creasey, and D. Fernandez, "HoneyDOC: an efficient honeypot architecture enabling all-round design," IEEE Journal on Selected Areas in Communications, vol. 37, no. 3, pp. 683–697, 2019.
[7] K. Sadasivam, B. Samudrala, and T. A. Yang, "Design of network security projects using honeypots," Journal of Computing Sciences in Colleges, vol. 20, pp. 282–293, 2005.
[8] M. Mansoori, O. Zakaria, and A. Gani, "Improving exposure of intrusion deception system through implementation of hybrid honeypot," The International Arab Journal of Information Technology, vol. 9, no. 5, pp. 436–444, 2012.
[9] G. Portokalidis and H. Bos, "SweetBait: zero-hour worm detection and containment using low- and high-interaction honeypots," Computer Networks, vol. 51, no. 5, pp. 1256–1274, 2007.
[10] W. Fan, Z. Du, D. Fernandez, and V. A. Villagra, "Enabling an anatomic view to investigate honeypot systems: a survey," IEEE Systems Journal, vol. 12, no. 4, pp. 3906–3919, 2017.
[11] M. L. Bringer, C. A. Chelmecki, and H. Fujinoki, "A survey: recent advances and future trends in honeypot research," International Journal of Computer Network and Information Security, vol. 4, no. 10, pp. 63–75, 2012.
[12] P. Wang, L. Wu, R. Cunningham, and C. C. Zou, "Honeypot detection in advanced botnet attacks," International Journal of Information and Computer Security, vol. 4, no. 1, pp. 30–51, 2010.
[13] K. Papazis and N. Chilamkurti, "Detecting indicators of deception in emulated monitoring systems," Service Oriented Computing and Applications, vol. 13, no. 1, pp. 17–29, 2019.
Table 6: Recall, precision, F1-score, and accuracy of the proposed model.

Recall | Precision | F1-score | Accuracy | Data size
0.82 | 0.83 | 0.82 | 0.90 | 2413
Figure 5: ROC curves of the four algorithms (X-axis: false positive rate, Y-axis: true positive rate). Random forest (AUC = 0.93), SVM (AUC = 0.90), kNN (AUC = 0.90), Naive_Bayes (AUC = 0.60).
[14] W. Fan and D. Fernandez, "A novel SDN based stealthy TCP connection handover mechanism for hybrid honeypot systems," in Proceedings of the 2017 IEEE Conference on Network Softwarization (NetSoft), pp. 1–9, IEEE, Bologna, Italy, July 2017.
[15] H. Artail, H. Safa, M. Sraj, I. Kuwatly, and Z. Al-Masri, "A hybrid honeypot framework for improving intrusion detection systems in protecting organizational networks," Computers & Security, vol. 25, no. 4, pp. 274–288, 2006.
[16] T. Holz and F. Raynal, "Detecting honeypots and other suspicious environments," in Proceedings from the Sixth Annual IEEE SMC Information Assurance Workshop, pp. 29–36, IEEE, West Point, NY, USA, June 2005.
[17] P. Defibaugh-Chavez, R. Veeraghattam, M. Kannappa, S. Mukkamala, and A. Sung, "Network based detection of virtual environments and low interaction honeypots," in Proceedings of the 2006 IEEE SMC Information Assurance Workshop, pp. 283–289, IEEE, West Point, NY, USA, June 2006.
[18] X. Fu, W. Yu, D. Cheng, X. Tan, K. Streff, and S. Graham, "On recognizing virtual honeypots and countermeasures," in Proceedings of the 2006 2nd IEEE International Symposium on Dependable, Autonomic and Secure Computing, pp. 211–218, IEEE, Indianapolis, IN, USA, September 2006.
[19] S. Mukkamala, K. Yendrapalli, R. Basnet, M. K. Shankarapani, and A. H. Sung, "Detection of virtual environments and low interaction honeypots," in Proceedings of the 2007 IEEE SMC Information Assurance and Security Workshop, pp. 92–98, IEEE, West Point, NY, USA, June 2007.
[20] J. Papalitsas, S. Rauti, and V. Leppanen, "A comparison of record and play honeypot designs," in Proceedings of the 18th International Conference on Computer Systems and Technologies, pp. 133–140, ACM, Ruse, Bulgaria, June 2017.
[21] R. M. Campbell, K. Padayachee, and T. Masombuka, "A survey of honeypot research: trends and opportunities," in Proceedings of the 2015 10th International Conference for Internet Technology and Secured Transactions (ICITST), pp. 208–212, IEEE, London, UK, December 2015.
[22] F. Y.-S. Lin, Y.-S. Wang, and M.-Y. Huang, "Effective proactive and reactive defense strategies against malicious attacks in a virtualized honeynet," Journal of Applied Mathematics, vol. 2013, Article ID 518213, 11 pages, 2013.
[23] O. Surnin, F. Hussain, R. Hussain et al., "Probabilistic estimation of honeypot detection in Internet of things environment," in Proceedings of the 2019 International Conference on Computing, Networking and Communications (ICNC), pp. 191–196, IEEE, Honolulu, HI, USA, February 2019.
[24] Z. Wang, X. Feng, Y. Niu, C. Zhang, and J. Su, "TSMWD: a high-speed malicious web page detection system based on two-step classifiers," in Proceedings of the 2017 International Conference on Networking and Network Applications (NaNA), pp. 170–175, IEEE, Kathmandu City, Nepal, October 2017.
[25] D. Wenda and D. Ning, "A honeypot detection method based on characteristic analysis and environment detection," in 2011 International Conference in Electrics, Communication and Automatic Control Proceedings, pp. 201–206, Springer, Berlin, Germany, 2012.
[26] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, Cambridge, UK, 2000.
[27] N. Provos, "A virtual honeypot framework," in Proceedings of the USENIX Security Symposium, vol. 173, pp. 1–14, San Diego, CA, USA, August 2004.
[28] N. Krawetz, "Anti-honeypot technology," IEEE Security & Privacy Magazine, vol. 2, no. 1, pp. 76–79, 2004.
[29] K. Veena and K. Meena, "Implementing file and real time based intrusion detections in secure direct method using advanced honeypot," Cluster Computing, pp. 1–8, 2018.
[30] R. Tyagi, T. Paul, B. S. Manoj, and B. Thanudas, "Packet inspection for unauthorized OS detection in enterprises," IEEE Security & Privacy, vol. 13, no. 4, pp. 60–65, 2015.
[31] Y. Liu and X. Yao, "Ensemble learning via negative correlation," Neural Networks, vol. 12, no. 10, pp. 1399–1404, 1999.
[32] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[33] A. Liaw and M. Wiener, "Classification and regression by randomForest," R News, vol. 2, no. 3, pp. 18–22, 2002.
[34] R. Genuer, J.-M. Poggi, and C. Tuleau-Malot, "Variable selection using random forests," Pattern Recognition Letters, vol. 31, no. 14, pp. 2225–2236, 2010.
[35] T. Fushiki, "Estimation of prediction error by using K-fold cross-validation," Statistics and Computing, vol. 21, no. 2, pp. 137–146, 2011.
8 Security and Communication Networks
3.2. Feature Extraction. Good features are the basis of training an excellent machine learning model. In this paper, the features were divided into three groups: the application-layer feature, the network-layer feature, and the other system-layer feature.
3.2.1. Application-Layer Feature. According to Veena's investigation [29], HTTP, FTP, and SMTP are the three services that attackers target most often, and the results show that such simulated services can deceive the majority of attackers. Moreover, low-interaction honeypots do not implement a complete service, so they can be easily recognized through simple interaction commands [17]. For example, both the HTTP GET and HTTP OPTIONS methods are provided by a real system, but only the HTTP GET method is available in the honeypot. In addition, referring to the "service exercising" approach of Defibaugh-Chavez's research [17] and the complexity of honeypot services discussed in [28], the application-layer feature is summarized in Table 1.
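The idea above can be sketched as a small probe: a low-interaction honeypot that only emulates GET will often reject or fail on OPTIONS, HEAD, or TRACE. This is a minimal sketch, not the paper's exact implementation; the function names and the "405/501 means unsupported" heuristic are illustrative assumptions.

```python
# Sketch of an application-layer probe. probe_http_methods() issues live
# requests; feature_vector() orders the results into the Table 1 layout.
# The 405/501 heuristic and all names here are illustrative assumptions.
import http.client

HTTP_FEATURES = ["HTTP_GET", "HTTP_OPTIONS", "HTTP_HEAD", "HTTP_TRACE"]

def probe_http_methods(host, port=80, timeout=5):
    """Return {feature: 1/0} by issuing each HTTP method against the target."""
    results = {}
    for feature in HTTP_FEATURES:
        method = feature.split("_", 1)[1]  # e.g. "HTTP_OPTIONS" -> "OPTIONS"
        try:
            conn = http.client.HTTPConnection(host, port, timeout=timeout)
            conn.request(method, "/")
            status = conn.getresponse().status
            # Treat "method not allowed / not implemented" as unsupported.
            results[feature] = 0 if status in (405, 501) else 1
            conn.close()
        except OSError:
            results[feature] = 0
    return results

def feature_vector(results):
    """Order probe results into the fixed Table 1 feature layout."""
    return [results.get(name, 0) for name in HTTP_FEATURES]
```

A target that yields, say, `{"HTTP_GET": 1}` but nothing else produces the vector `[1, 0, 0, 0]`, a pattern consistent with an incomplete service emulation.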
3.2.2. Network-Layer Feature. TCP/IP fingerprints can help people recognize different devices, including honeypots. Some common TCP/IP fingerprints are the TCP FIN flag, the TCP initial window, and the ACK value [30]. After analysis, we finally decided to use average_received_bytes, average_received_ttl, average_received_ack_flags, average_received_push_flags, average_received_syn_flags, average_received_fin_flags, and average_received_window_size as the network-layer feature. The network-layer feature is extracted as shown in Table 2. Each value of the network-layer feature is computed from the received packets by a dedicated equation. For example, average_received_ttl comes from the following equation:

average_received_ttl = (received_ttl1 + received_ttl2 + received_ttl3) / 3    (1)

In this equation, the value of received_ttl1 is obtained from the packet received from port 21 of the remote server, the value of received_ttl2 from the packet received from port 25, and the value of received_ttl3 from the packet received from port 80.
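The averaging in equation (1) generalizes to every header field in Table 2. A minimal sketch, assuming the per-port header values have already been captured (a real probe would read them from raw sockets or a packet-capture library; the field names and helper names below are illustrative):

```python
# Sketch of the network-layer feature computation: each feature is the mean
# of one TCP/IP header field over the packets received from ports 21, 25,
# and 80, as in equation (1). Names and field keys are assumptions.

def average_field(samples):
    """Mean of one header field over the received packets (equation (1))."""
    return sum(samples) / len(samples)

def network_layer_features(per_port_headers):
    """per_port_headers: {port: {field: value}} for ports 21, 25, and 80."""
    fields = ["bytes", "ttl", "ack_flags", "push_flags",
              "syn_flags", "fin_flags", "window_size"]
    return {
        "average_received_" + f: average_field(
            [hdr[f] for hdr in per_port_headers.values()])
        for f in fields
    }
```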
Figure 1: The framework of the proposed automatic identification method. (The pipeline collects feature data from public servers, labels honeypot and non-honeypot servers via the Shodan API, the FOFA API, and manual checks, extracts application-layer, network-layer, and system-layer features, and trains a detection model with parameter adjusting, comparing (i) random forest, (ii) SVM, (iii) kNN, and (iv) Naive_Bayes.)
Table 1: Application-layer feature.

No.  Feature name
1    HTTP_GET
2    HTTP_OPTIONS
3    HTTP_HEAD
4    HTTP_TRACE
5    FTP_USER
6    FTP_PASS
7    FTP_MODE
8    FTP_RETR
9    SMTP_HELO
10   SMTP_MAIL
11   SMTP_DATA
12   SMTP_VRFY
13   SMTP_ETRN
Table 2: Network-layer feature.

No.  Feature name
1    average_received_bytes
2    average_received_ttl
3    average_received_ack_flags
4    average_received_push_flags
5    average_received_syn_flags
6    average_received_fin_flags
7    average_received_window_size
3.2.3. Other System-Layer Feature. Table 3 shows the features extracted for the other system-layer feature. They are port1, port2, port3, System_Fingerprint, and ICMP_ECHO_response_time.

(1) Port1, Port2, and Port3. Since low-interaction honeypots only simulate a particular service, the ports open on them differ from those of real systems. Therefore, we chose the port number as one feature in the other system-layer feature. The meanings and values of port1, port2, and port3 are listed below:

(i) Port1 indicates whether port 21 of the target is listening. If port 21 is open, the value of port1 is 21; otherwise, the value is -1.
(ii) Port2 indicates whether port 25 of the target is listening. If port 25 is open, the value of port2 is 25; otherwise, the value is -2.
(iii) Port3 indicates whether port 80 of the target is listening. If port 80 is open, the value of port3 is 80; otherwise, the value is -3.

(2) System_Fingerprint. This is the version of the operating system running a honeypot, such as "Microsoft Windows Server 2016" or "Ubuntu 16.04". The manual check result indicates that many honeypots share the same System_Fingerprint, so it is considered a separate feature that distinguishes honeypots from real systems.

(3) ICMP_ECHO_response_time. Mukkamala et al. [19] claim that it is common to run several virtual honeypots on a single machine, so when handling an ICMP ECHO request, honeypots are always slower than real systems. Utilizing this characteristic, ICMP_ECHO_response_time was extracted for the other system-layer feature.
3.3. Classification Model. After extracting the three groups of features, we need to choose an effective and accurate algorithm and construct an intelligent model. Ensemble learning is a method of accomplishing a task by constructing and combining several learners [31]. First, we need to explain an important concept in the ensemble learning area: the hypothesis space, that is, the collection of functions h1, h2, ..., hn. Traditional machine learning algorithms aim to find the single most appropriate classifier hk for f (f is the function describing the relationship between input X and label y). Ensemble learning is different: all h in the hypothesis space are combined in a particular fashion (such as voting) to decide the label of input X. Specifically, the process of ensemble learning can be described in two steps: one is to generate several individual learners, and the other is to combine these learners in a particular fashion. By combining individual learners, ensemble learning can achieve better generalization performance than a single learner. Random forest is the most representative algorithm in ensemble learning.

The definition of random forest is: "A random forest is a classifier consisting of a collection of tree-structured classifiers {h(X, Θk), k = 1, ...}, where the Θk are independent identically distributed random vectors and each tree casts a unit vote for the most popular class at input X" [32]. In fact, a random forest is a collection of decision trees. Each decision tree is a classifier that gives a predicted result, so there are n results from n different classifiers. According to the principle of the random forest algorithm, the final result is decided by the majority of all results [33]. Random forest gives accurate results and can handle thousands of input variables without variable deletion [34], which makes it the best choice for the proposed machine learning model.
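The voting step in Breiman's definition can be illustrated in a few lines. This is a conceptual sketch only (each "tree" is a stand-in classifier function), not the actual random forest training procedure:

```python
# Minimal illustration of the ensemble voting step: each tree casts one
# unit vote, and the forest predicts the majority class at input X.
from collections import Counter

def majority_vote(classifiers, x):
    """Combine individual learners by voting, as described above."""
    votes = [clf(x) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]
```

With three stand-in trees voting ("honeypot", "honeypot", "normal"), the ensemble prediction is "honeypot".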
3.4. Model Evaluation. Model evaluation is a significant part of model training; its purpose is to assess the performance of the final model. In this paper, we introduced the recall, precision, F1-score (F1), ROC curve, and AUC metrics to evaluate the proposed model. Their definitions are as follows:
recall = TP / (TP + FN)

precision = TP / (TP + FP)

F1 = 2 × TP / (n + TP − TN)

AUC = (1/2) · Σ_{i=1}^{n−1} (x_{i+1} − x_i) · (y_{i+1} + y_i)    (2)
where n is the number of samples.

True positive (TP) is the number of honeypot servers classified as honeypot servers. True negative (TN) is the number of normal servers classified as normal servers. False positive (FP) is the number of normal servers classified as honeypot servers. False negative (FN) is the number of honeypot servers classified as normal servers. The recall rate is the ratio of all samples correctly classified as honeypot servers to all honeypot servers. The precision rate is the ratio of all samples correctly classified as honeypot servers to all samples predicted to be honeypot servers. The F1-score is a comprehensive measure of recall and precision.
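These definitions translate directly into code. Note that the F1 form in equation (2) agrees with the usual one: since n = TP + TN + FP + FN, we have n + TP − TN = 2·TP + FP + FN, so F1 = 2·TP/(n + TP − TN) equals 2·TP/(2·TP + FP + FN). The AUC sum is the trapezoidal rule over ROC points. A sketch (function names are illustrative):

```python
# The metrics of equation (2) computed from confusion-matrix counts, plus
# trapezoidal AUC over ROC points sorted by FPR.

def classification_metrics(tp, tn, fp, fn):
    n = tp + tn + fp + fn
    return {
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp),
        # Equivalent to 2*tp / (2*tp + fp + fn), since n + tp - tn collapses.
        "f1": 2 * tp / (n + tp - tn),
    }

def trapezoid_auc(points):
    """points: ROC points (FPR, TPR) sorted by FPR, as in equation (2)."""
    return 0.5 * sum((x2 - x1) * (y2 + y1)
                     for (x1, y1), (x2, y2) in zip(points, points[1:]))
```

For instance, a perfect classifier's ROC passes through (0, 1), giving an AUC of 1.0, while the diagonal from (0, 0) to (1, 1) gives 0.5.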
Table 3: Other system-layer feature.

No.  Feature name
1    Port1
2    Port2
3    Port3
4    System_Fingerprint
5    ICMP_ECHO_response_time

ROC means receiver operating characteristic. A ROC curve is constructed from the true positive rate (TPR) and the false positive rate (FPR). TPR is the ratio of all samples correctly classified as honeypot servers to all honeypot servers. FPR is the ratio of samples incorrectly predicted to be honeypot servers to all normal servers. In the coordinates of a ROC curve, the X-axis is FPR and the Y-axis is TPR. TPR and FPR are defined as follows:

TPR = TP / (TP + FN)

FPR = FP / (TN + FP)    (3)
Each (FPR, TPR) pair is a point on the ROC curve. If a ROC curve is above the diagonal line between (0, 0) and (1, 1), the classification performance of the model is acceptable. If a ROC curve is below this line, there may be mistakes in the labels of the dataset. If a ROC curve is near this line, the classification performance of the model is poor; in other words, the model predicts almost at random. In order to evaluate a model comprehensively, researchers usually use the AUC index to measure the overall efficiency of a classification model: the closer the value of AUC is to 1, the better the classification effect of the model.
4. Experiment

In order to make the predicted results more authentic and accurate, the experiment focuses on real honeypot servers and normal servers probed with remote scanning technologies. In the experiment, the scikit-learn library is used to train the machine learning models. The computation ran on a computer with an i7-7700HQ CPU, 16 GB of memory, and a GTX 1050 Ti GPU.

4.1. Dataset. Shodan and FOFA are well-known cyberspace search engines used to search for network devices. Many known honeypot servers have been marked by these two platforms, so we first obtained some IP addresses that had been marked as honeypots. We also randomly selected many IP addresses from the Internet. Second, we manually checked all IP addresses to determine whether they are honeypots or not. Then, we scanned all IP addresses via raw socket technology to collect the feature data. Finally, we collected 2413 IP records as the experiment dataset: the number of honeypot IP addresses is 807, and the number of real system IP addresses is 1606. The dataset is shown in Table 4.
4.2. Experiment Design. In order to improve the accuracy of the proposed model, this paper introduced three steps in the detailed experiment: dimension unification, parameter adjustment, and cross validation.
4.2.1. Dimension Unifying. In some cases, features with a large dimension will affect the final result when training a model. For example, there are three labeled records in Table 5. The dimension of Feature2 is larger than the dimension of Feature1, so when training a model, the large-dimension feature may dominate the predicted result. Therefore, the label of the 4th record may be predicted as 0 while the truth is 1. That is why dimension unifying is needed. This paper chose the popular normalization method to achieve dimension unifying. The equation of normalization is as follows:

y_i = (x_i − x̂) / δ

x̂ = (1/k) · Σ_{i=1}^{k} x_i

δ = sqrt( (1/(k − 1)) · Σ_{i=1}^{k} (x_i − x̂)² )    (4)
Here x1, x2, ..., xk is a raw data record. Each value x_i is transformed into another value y_i by the equation, and the new data record y1, y2, ..., yk is composed of all values y. Finally, the mean of the new data record is 0 and the variance is 1. In this way, the problem that large-dimension features affect the training result is solved.
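Equation (4) is standard z-score normalization with the sample standard deviation (k − 1 denominator). A minimal sketch:

```python
# Equation (4) as code: subtract the mean, divide by the sample standard
# deviation (k - 1 denominator), yielding mean 0 and unit variance.
import math

def standardize(xs):
    k = len(xs)
    mean = sum(xs) / k
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / (k - 1))
    return [(x - mean) / std for x in xs]
```

For example, `standardize([1, 2, 3])` yields `[-1.0, 0.0, 1.0]`, so a feature measured in thousands and a feature measured in hundredths end up on the same scale.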
4.2.2. Parameters Adjusting. Choosing the most suitable parameters for the model is a necessary part of model training. There are two important parameters in the random forest algorithm that determine the accuracy of the model: "n_estimators", the number of decision trees, and "max_depth", the maximum allowable depth of each tree. If "n_estimators" is too big, the result may be overfitting; if it is too small, the result may be underfitting. So a suitable value of "n_estimators" is important for the predicted accuracy of the final model. The candidate set {100, 120, 200, 300, 500, 800, 1200} was prepared for "n_estimators", and {5, 8, 15, 25, 30, 100} was prepared for "max_depth". We tested each of these two collections separately to choose the most appropriate value for the two parameters.
Table 4: Experimental dataset.

Data type            Size
Honeypot records     807
Real system records  1606

Table 5: Records for example.

Feature1  Feature2  Feature3  Label
0.01      3000      1.5       1
0.051     45000     5.8       0
0.001     12785     1.5       1
0.02      80000     2.2

4.2.3. Cross Validation. We introduce the 10-fold cross validation method to avoid overfitting issues. Cross validation is a common method of model evaluation [35]. The cross validation structure of this paper is shown in Figure 2 and is composed of three parts. First, the data D is divided into ten similar, mutually exclusive subsets in the data split (D = D1 ∪ D2 ∪ ... ∪ D10, with Di ∩ Dj = ∅ for i ≠ j). Then, the divided data are passed to training and testing. In the training and testing steps, nine subsets are used for training each time, and the remaining one is used for validation, so after training and testing there are ten classifiers. Each classifier gives a predicted result, and the final result is computed from these ten results.
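The data-split step of this procedure can be sketched directly: partition the index set into ten mutually exclusive folds, then rotate which fold is held out. This is a generic k-fold sketch (names are illustrative), not the paper's exact implementation:

```python
# Sketch of the Figure 2 data split: D is partitioned into k mutually
# exclusive subsets D1..Dk; each round trains on k-1 folds and validates
# on the remaining one.

def k_fold_splits(indices, k=10):
    folds = [indices[i::k] for i in range(k)]  # mutually exclusive subsets
    for i, test in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test
```

Each of the ten (train, test) pairs trains one classifier; averaging the ten validation results gives the final cross-validated score.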
4.2.4. Comparing Experiment. A result without comparison is not convincing, so we designed a comparing experiment to highlight the advantages of the method in this paper. Reviewing existing research on machine learning, we chose SVM, kNN, and Naive_Bayes as the counterparts of random forest. These four machine learning algorithms were used to train four models on the same dataset, respectively. Finally, for each model, we plotted the respective ROC curves in the same coordinate system. From these ROC curves, the performance of the models is clear at a glance.
4.3. Experiment Result. This section shows the results of the experiment. First, we tested the values in the two parameter collections, respectively. The effect of different values of "n_estimators" on the accuracy is shown in Figure 3, and the effect of different values of "max_depth" is shown in Figure 4. The blue line represents the change on the training data, and the green line represents the change on the testing data.

Analyzing these two figures, when "n_estimators" was 200, the accuracy of both the blue line and the green line was the highest, so the value of "n_estimators" was set to 200. When "max_depth" was 30, the accuracy of both lines was the highest, so we chose 30 as the value of "max_depth".
Table 6 reports the recall, precision, F1-score, and accuracy of the final model, and Figure 5 shows the ROC curves of the four algorithms; all of them were obtained after 10-fold cross validation. In Figure 5, the blue line is the ROC curve of the random forest model, the green line is the ROC curve of the SVM model, the yellow line is the ROC curve of the kNN model, and the purple line is the ROC curve of the Naive_Bayes model. Analyzing the figure, it can be concluded that the performance of Naive_Bayes is the worst. There is little difference in the AUC values between random forest, SVM, and kNN, but the performance of random forest is better. In Table 6, the recall rate is 0.82, the precision rate is 0.83, the F1-score is 0.82, and the accuracy is 0.90. The ROC curve of the random forest model and these index values prove that the final model has a high detection rate and a good ability of generalization.

Figure 2: 10-fold cross validation method for the model.
Figure 3: The effect of different values of "n_estimators" on the accuracy.
Figure 4: The effect of different values of "max_depth" on the accuracy.
5. Conclusion

As an active cybersecurity defense strategy, the emergence of honeypots further reduces the attacker's activity space. Although honeypot technology is constantly improving, attackers are constantly searching for weaknesses in honeypots in order to identify them. There is an old Chinese saying: the player is lost in the game, while the onlooker sees it clearly. For honeypot researchers, in order to promote the development of honeypot technology, it is necessary to learn and discover methods of identifying honeypots remotely.

Based on research into early honeypots and honeypot detection technology, this paper proposed a honeypot detection technique based on machine learning algorithms. The technique achieves accurate honeypot detection by extracting and collecting honeypot features at different layers. The experimental results proved that honeypots are still insufficient in the degree to which they simulate real systems, such as the completeness of the simulated services. At the same time, the experimental results provide a reference for the improvement of honeypots and promote the development of honeypot technology.
Data Availability
The experiment data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
The authors of this work were supported by the CCF-NSFOCUS KunPeng Research Fund (2018008), the Sichuan University Postdoc Research Foundation (19XJ0002), and the Frontier Science and Technology Innovation Projects of the National Key Research and Development Program (2019QY1405). The authors would like to thank the Baimaohui Security Research Institute for providing some experiment data.
References
[1] M B Salem S Hershkop and S J Stolfo ldquoA survey ofinsider attack detection researchrdquo in Insider Attack andCyber Security pp 69ndash90 Springer Berlin Germany 2008
[2] F Sabahi and A Movaghar ldquoIntrusion detection a surveyrdquo inProceedings of the 2008 gtird International Conference onSystems and Networks Communications pp 23ndash26 IEEESliema Malta October 2008
[3] I Mokube andM Adams ldquoHoneypots concepts approachesand challengesrdquo in Proceedings of the 45th Annual SoutheastRegional Conference pp 321ndash326 ACMWinston-Salem NCUSA March 2007
[4] L Spitzner ldquo)e honeynet project trapping the hackersrdquoIEEE Security amp Privacy vol 1 no 2 pp 15ndash23 2003
[5] O )onnard and M Dacier ldquoA framework for attack pat-ternsrsquo discovery in honeynet datardquoDigital Investigation vol 5pp 128ndash139 2008
[6] W Fan Z Du M Smith-Creasey and D FernandezldquoHoneyDOC an efficient honeypot architecture enabling all-round designrdquo IEEE Journal on Selected Areas in Commu-nications vol 37 no 3 pp 683ndash697 2019
[7] K Sadasivam B Samudrala and T A Yang ldquoDesign ofnetwork security projects using honeypotsrdquo Journal ofComputing Sciences in Colleges vol 20 pp 282ndash293 2005
[8] M Mansoori O Zakaria and A Gani ldquoImproving exposureof intrusion deception system through implementation ofhybrid honeypotrdquo gte International Arab Journal of In-formation Technology vol 9 no 5 pp 436ndash444 2012
[9] G Portokalidis and H Bos ldquoSweetBait zero-hour wormdetection and containment using low- and high-interactionhoneypotsrdquo Computer Networks vol 51 no 5 pp 1256ndash12742007
[10] W Fan Z Du D Fernandez and V A Villagra ldquoEnabling ananatomic view to investigate honeypot systems a surveyrdquoIEEE Systems Journal vol 12 no 4 pp 3906ndash3919 2017
[11] M L Bringer C A Chelmecki and H Fujinoki ldquoA surveyrecent advances and future trends in honeypot researchrdquoInternational Journal of Computer Network and InformationSecurity vol 4 no 10 pp 63ndash75 2012
[12] P Wang L Wu R Cunningham and C C Zou ldquoHoneypotdetection in advanced botnet attacksrdquo International Journal ofInformation and Computer Security vol 4 no 1 pp 30ndash512010
[13] K Papazis and N Chilamkurti ldquoDetecting indicators ofdeception in emulated monitoring systemsrdquo Service OrientedComputing and Applications vol 13 no 1 pp 17ndash29 2019
Table 6: Recall, precision, F1-score, and accuracy of the proposed model.

Recall  Precision  F1-score  Accuracy  Data size
0.82    0.83       0.82      0.90      2413
Figure 5: ROC curves of the four algorithms (random forest, AUC = 0.93; SVM, AUC = 0.90; kNN, AUC = 0.90; Naive_Bayes, AUC = 0.60).
[14] W. Fan and D. Fernandez, "A novel SDN based stealthy TCP connection handover mechanism for hybrid honeypot systems," in Proceedings of the 2017 IEEE Conference on Network Softwarization (NetSoft), pp. 1–9, IEEE, Bologna, Italy, July 2017.
[15] H. Artail, H. Safa, M. Sraj, I. Kuwatly, and Z. Al-Masri, "A hybrid honeypot framework for improving intrusion detection systems in protecting organizational networks," Computers & Security, vol. 25, no. 4, pp. 274–288, 2006.
[16] T. Holz and F. Raynal, "Detecting honeypots and other suspicious environments," in Proceedings from the Sixth Annual IEEE SMC Information Assurance Workshop, pp. 29–36, IEEE, West Point, NY, USA, June 2005.
[17] P. Defibaugh-Chavez, R. Veeraghattam, M. Kannappa, S. Mukkamala, and A. Sung, "Network based detection of virtual environments and low interaction honeypots," in Proceedings of the 2006 IEEE SMC Information Assurance Workshop, pp. 283–289, IEEE, West Point, NY, USA, June 2006.
[18] X. Fu, W. Yu, D. Cheng, X. Tan, K. Streff, and S. Graham, "On recognizing virtual honeypots and countermeasures," in Proceedings of the 2006 2nd IEEE International Symposium on Dependable, Autonomic and Secure Computing, pp. 211–218, IEEE, Indianapolis, IN, USA, September 2006.
[19] S. Mukkamala, K. Yendrapalli, R. Basnet, M. K. Shankarapani, and A. H. Sung, "Detection of virtual environments and low interaction honeypots," in Proceedings of the 2007 IEEE SMC Information Assurance and Security Workshop, pp. 92–98, IEEE, West Point, NY, USA, June 2007.
[20] J. Papalitsas, S. Rauti, and V. Leppanen, "A comparison of record and play honeypot designs," in Proceedings of the 18th International Conference on Computer Systems and Technologies, pp. 133–140, ACM, Ruse, Bulgaria, June 2017.
[21] R. M. Campbell, K. Padayachee, and T. Masombuka, "A survey of honeypot research: trends and opportunities," in Proceedings of the 2015 10th International Conference for Internet Technology and Secured Transactions (ICITST), pp. 208–212, IEEE, London, UK, December 2015.
[22] F. Y.-S. Lin, Y.-S. Wang, and M.-Y. Huang, "Effective proactive and reactive defense strategies against malicious attacks in a virtualized honeynet," Journal of Applied Mathematics, vol. 2013, Article ID 518213, 11 pages, 2013.
[23] O. Surnin, F. Hussain, R. Hussain et al., "Probabilistic estimation of honeypot detection in Internet of things environment," in Proceedings of the 2019 International Conference on Computing, Networking and Communications (ICNC), pp. 191–196, IEEE, Honolulu, HI, USA, February 2019.
[24] Z. Wang, X. Feng, Y. Niu, C. Zhang, and J. Su, "TSMWD: a high-speed malicious web page detection system based on two-step classifiers," in Proceedings of the 2017 International Conference on Networking and Network Applications (NaNA), pp. 170–175, IEEE, Kathmandu City, Nepal, October 2017.
[25] D. Wenda and D. Ning, "A honeypot detection method based on characteristic analysis and environment detection," in 2011 International Conference in Electrics, Communication and Automatic Control Proceedings, pp. 201–206, Springer, Berlin, Germany, 2012.
[26] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, Cambridge, UK, 2000.
[27] N. Provos, "A virtual honeypot framework," in Proceedings of the USENIX Security Symposium, vol. 173, pp. 1–14, San Diego, CA, USA, August 2004.
[28] N. Krawetz, "Anti-honeypot technology," IEEE Security & Privacy Magazine, vol. 2, no. 1, pp. 76–79, 2004.
[29] K. Veena and K. Meena, "Implementing file and real time based intrusion detections in secure direct method using advanced honeypot," Cluster Computing, pp. 1–8, 2018.
[30] R. Tyagi, T. Paul, B. S. Manoj, and B. Thanudas, "Packet inspection for unauthorized OS detection in enterprises," IEEE Security & Privacy, vol. 13, no. 4, pp. 60–65, 2015.
[31] Y. Liu and X. Yao, "Ensemble learning via negative correlation," Neural Networks, vol. 12, no. 10, pp. 1399–1404, 1999.
[32] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[33] A. Liaw and M. Wiener, "Classification and regression by randomForest," R News, vol. 2, no. 3, pp. 18–22, 2002.
[34] R. Genuer, J.-M. Poggi, and C. Tuleau-Malot, "Variable selection using random forests," Pattern Recognition Letters, vol. 31, no. 14, pp. 2225–2236, 2010.
[35] T. Fushiki, "Estimation of prediction error by using K-fold cross-validation," Statistics and Computing, vol. 21, no. 2, pp. 137–146, 2011.
323 Other System-Layer Feature Table 3 shows the featuresextracted in the other system-layer feature)ey are port1 port2port3 System_Fingerprint and ICMP_ECHO_response_time
(1) Port1 Port2 and Port3 Since low-interactive honeypotsonly simulate a special service the ports open on them aredifferent from real systems )erefore we chose the portnumber as one feature in the other system-layer feature )emeanings and values of port1 port2 and port3 are listedbelow
(i) Port1 indicates whether the port 21 of target islistening If port 21 is open the value of port1 is 21Otherwise the value is minus 1
(ii) Port2 indicates whether the port 25 of target islistening If port 25 is open the value of port1 is 25Otherwise the value is minus 2
(iii) Port3 indicates whether the port 80 of target islistening If port 80 is open the value of port1 is 80Otherwise the value is minus 3
(2) System_Fingerprint It means the version of operatingsystem running a honeypot such as ldquoMicrosoft WindowsServer 2016rdquo or ldquoUbuntu 1604rdquo According to the manualcheck result it indicates that many honeypots have the sameSystem_Fingerprint so it is considered to be a separatefeature that distinguishes honeypots from real systems
(3) ICMP_ECHO_response_time Mukkamala et al [19]claim that it is common to run several virtual honeypots on asingle machine So when handling ICMP ECHO requesthoneypots are always slower than real systems Utilizing thischaracteristic ICMP_ECHO_response_time was extractedfor the other system-layer feature
33 Classification Model After extracting three groupfeatures it needs to consider choosing an effective andaccurate algorithm and constructing an intelligent modelEnsemble learning is a method of accomplishing the taskby structuring and combining several learners [31] Firstlywe need to explain an important concept named hy-potheses in the ensemble learning area Hypotheses is thecollection of functions which can be described as thecollection of h1 h2 hn As for traditional machinelearning algorithms their purpose is to find the mostappropriate classifier hk for f (f is a function describing therelationship between input X and label y) But ensemblelearning is different all h in hypotheses will be combinedby a special fashion (such as voting) to decide the label ofinput X Specifically the process of ensemble learning canbe described as two steps One is to generate some indi-vidual learners Another is to combine these learners by aspecial fashion By combining individual learners en-semble learning can achieve more superior generalizationperformance than single learning Random forest is themost representative algorithm in ensemble learning
The definition of a random forest is "a classifier consisting of a collection of tree-structured classifiers {h(X, Θk), k = 1, ...}, where the Θk are independent identically distributed random vectors and each tree casts a unit vote for the most popular class at input X" [32]. In fact, a random forest is a collection of decision trees. Each decision tree is a classifier that gives a predicted result, so there are n results from n different classifiers. According to the principle of the random forest algorithm, the final result is decided by the majority of all results [33]. Random forest tends to give more accurate results and can handle thousands of input variables without variable deletion [34], which makes it the best choice for the proposed machine learning model.
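The voting idea can be sketched with scikit-learn (the library the experiment section says was used). The toy data below is hypothetical, purely to show that each tree in the forest is itself a classifier and the forest aggregates their votes; it is not the paper's dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy data: the label simply follows the first feature.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]] * 10)
y = np.array([0, 0, 1, 1] * 10)

forest = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)

# Each fitted tree is an individual classifier; the forest's prediction
# aggregates their individual votes for the most popular class.
tree_votes = [int(tree.predict([[1, 0]])[0]) for tree in forest.estimators_]
print(sum(tree_votes), "of", len(tree_votes), "trees vote for class 1")
print("forest prediction:", int(forest.predict([[1, 0]])[0]))
```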
3.4. Model Evaluation. Model evaluation is a significant part of model training; its purpose is to assess the performance of the final model. In this paper we introduced recall, precision, F1-score (F1), the ROC curve, and the AUC metric to evaluate the proposed model. Their definitions are as follows:

recall = TP / (TP + FN),

precision = TP / (TP + FP),

F1 = 2TP / (n + TP - TN),

AUC = (1/2) Σ_{i=1}^{n-1} (x_{i+1} - x_i) · (y_{i+1} + y_i),   (2)
where n is the number of samples. True positive (TP) is the number of honeypot servers classified as honeypot servers; true negative (TN) is the number of normal servers classified as normal servers; false positive (FP) is the number of normal servers classified as honeypot servers; false negative (FN) is the number of honeypot servers classified as normal servers. The recall rate is the ratio of all samples correctly classified as honeypot servers to all honeypot servers. The precision rate is the ratio of all samples correctly classified as honeypot servers to all samples predicted to be honeypot servers. The F1-score is a comprehensive measure of recall and precision.
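A minimal sketch of the metrics in equation (2), computed directly from the confusion-matrix counts defined above. The counts passed in are illustrative, not the paper's results; note that the F1 form 2TP/(n + TP - TN) is algebraically identical to the usual 2PR/(P + R).

```python
def metrics(tp, tn, fp, fn):
    """Recall, precision, and F1 from confusion-matrix counts."""
    n = tp + tn + fp + fn          # total number of samples
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * tp / (n + tp - tn)    # same as 2*precision*recall/(precision+recall)
    return recall, precision, f1

# Illustrative counts only.
r, p, f1 = metrics(tp=66, tn=145, fp=14, fn=16)
print(round(r, 3), round(p, 3), round(f1, 3))
```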
ROC means receiver operating characteristic. A ROC curve is constructed from the true positive rate (TPR) and the false positive rate (FPR). TPR is the ratio of all samples correctly classified as honeypot servers to all honeypot servers; FPR is the ratio of samples incorrectly predicted to be honeypot servers to all the normal servers. In the coordinates of a
Table 3: Other system-layer features.

No. | Feature name
1 | Port1
2 | Port2
3 | Port3
4 | System_Fingerprint
5 | ICMP_ECHO_response_time
Security and Communication Networks
ROC curve, the X-axis is FPR and the Y-axis is TPR. TPR and FPR are defined as follows:

TPR = TP / (TP + FN),

FPR = FP / (TN + FP).   (3)
Each (FPR, TPR) pair is a point on a ROC curve. If a ROC curve is above the diagonal line between (0, 0) and (1, 1), the classification performance of the model is acceptable. If a ROC curve is below this line, there may be mistakes in the labels of the dataset. If a ROC curve is near this line, the classification performance of the model is poor; in other words, the model predicts almost randomly. In order to evaluate the model comprehensively, researchers usually use the AUC index to measure the overall efficiency of a classification model. The closer the value of AUC is to 1, the better the classification effect of the model.
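The AUC summation in equation (2) is the trapezoidal rule applied to the (FPR, TPR) points of the ROC curve. A minimal sketch:

```python
def trapezoid_auc(points):
    """AUC via the trapezoidal rule of equation (2).

    `points` is a list of (FPR, TPR) pairs sorted by FPR.
    """
    auc = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        auc += (x1 - x0) * (y1 + y0) / 2.0
    return auc

# A perfect classifier's ROC passes through (0, 1): AUC = 1.
print(trapezoid_auc([(0, 0), (0, 1), (1, 1)]))  # 1.0
# The diagonal (random guessing) gives AUC = 0.5.
print(trapezoid_auc([(0, 0), (1, 1)]))          # 0.5
```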
4 Experiment
In order to make the predicted result more authentic and accurate, the experiment focuses on real honeypot servers and normal servers probed with remote scanning technologies. In the experiment, the scikit-learn library is used for machine learning model training. The computation ran on a computer with an i7-7700HQ CPU, 16 GB of memory, and a GTX 1050 Ti GPU.
4.1. Dataset. Shodan and Fofa are well-known cyberspace search engines used to search for network devices. Many known honeypot servers have been marked by these two platforms, so we first collected IP addresses that have been marked as honeypots; we also randomly selected many IP addresses from the Internet. Second, we manually checked all IP addresses to determine whether they are honeypots. Then we scanned all IP addresses via raw-socket technology to collect the feature data. In total we collected 2413 IP records as the experiment dataset: 807 honeypot IP addresses and 1606 real-system IP addresses. The dataset is shown in Table 4.
4.2. Experiment Design. In order to improve the accuracy of the proposed model, this paper introduced three steps in the detailed experiment: dimension unification, parameter adjustment, and cross validation.
4.2.1. Dimension Unifying. In some cases, features with large dimensions will affect the final result when training a model. For example, consider the records in Table 5. The dimension of feature2 is larger than the dimension of feature1, so the large-dimension feature may dominate the predicted result when training a model. The label of the 4th record may therefore be predicted as 0, although the truth is 1. That is why dimension unifying is needed. This paper chose the popular normalization method to achieve dimension unifying. The equation of normalization is as follows:
y_i = (x_i - x̄) / δ,

x̄ = (1/k) Σ_{i=1}^{k} x_i,

δ = √( (1/(k-1)) Σ_{i=1}^{k} (x_i - x̄)² ).   (4)
x1, x2, ..., xk is a raw data record. Each value xi is transformed into a value yi by the equation, and the new data record y1, y2, ..., yk is composed of all values y. The mean of the new data record is 0 and its standard deviation is 1. In this way, the problem that large-dimension features affect the training result is solved.
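Equation (4) is ordinary standardization with the sample (1/(k-1)) standard deviation. A sketch, applied here to the feature2-like column of large-dimension values (the values are illustrative):

```python
import math

def standardize(xs):
    """Transform xs per equation (4): zero mean, unit sample std."""
    k = len(xs)
    mean = sum(xs) / k
    # Sample standard deviation, i.e., the 1/(k-1) form of equation (4).
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / (k - 1))
    return [(x - mean) / std for x in xs]

# Illustrative large-dimension feature values.
ys = standardize([3000.0, 45000.0, 12785.0, 80000.0])
print([round(y, 3) for y in ys])
```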
4.2.2. Parameters Adjusting. Choosing the most suitable parameters for the model is a necessary part of model training. There are two important parameters in the random forest algorithm that determine the accuracy of the model: "n_estimators" and "max_depth". "n_estimators" is the number of decision trees, and "max_depth" is the maximum allowable depth of each tree. If "n_estimators" is too big, the result may overfit; if it is too small, the result may underfit. So a suitable value of "n_estimators" is important for the predicted accuracy of the final model. The values {100, 120, 200, 300, 500, 800, 1200} were prepared for "n_estimators" and {5, 8, 15, 25, 30, 100} for "max_depth". We tested each of these two collections separately to choose the most appropriate value for each parameter.
4.2.3. Cross Validation. We introduced the 10-fold cross validation method to avoid overfitting issues. Cross validation is a common method of model evaluation [35]. The cross validation structure of this paper is shown in Figure 2 and is composed of three parts. First, the data D was divided into ten similarly sized, mutually exclusive subsets in the data split (D = D1 ∪ D2 ∪ ... ∪ D10, with Di ∩ Dj = ∅ for i ≠ j). Then the divided data
Table 4: Experimental dataset.

Data type | Size
Honeypot records | 807
Real system records | 1606
Table 5: Example records.

Feature1 | Feature2 | Feature3 | Label
0.01 | 3000 | 1.5 | 1
0.051 | 45000 | 5.8 | 0
0.001 | 12785 | 1.5 | 1
0.02 | 80000 | 2.2 | ?
could be transferred to training and testing. In the training and testing steps, nine subsets were used for training each time and the remaining one was used for validation, so after training and testing there were ten classifiers. Each classifier gave a predicted result, and the final result was computed from these ten results.
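The three steps above map directly onto scikit-learn's cross_val_score: ten fits, each validated on the held-out tenth, with the fold scores combined into the final result. The synthetic data is a stand-in for the real dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the real dataset.
X, y = make_classification(n_samples=200, random_state=0)

# cv=10 performs the data split, trains ten classifiers, and returns
# one validation score per held-out fold.
scores = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=10)
print(len(scores), "folds, mean accuracy:", round(scores.mean(), 3))
```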
4.2.4. Comparing Experiment. A result without comparison is not convincing, so we designed a comparing experiment to highlight the advantages of the method in this paper. Reviewing research on machine learning, we chose SVM, kNN, and Naive_Bayes as the counterparts of random forest. These four machine learning algorithms were then used to train four models on the same dataset, and for each model we plotted its ROC curve in the same coordinate system. From these ROC curves, the performance of the models is clear at a glance.
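The comparing experiment can be sketched as follows: the same data, the four algorithms named above, and an AUC score per model. The data is synthetic and the specific estimator settings are assumptions; the paper does not report its hyperparameters for the baseline models.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the real dataset.
X, y = make_classification(n_samples=400, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "random forest": RandomForestClassifier(random_state=0),
    "SVM": SVC(probability=True, random_state=0),
    "kNN": KNeighborsClassifier(),
    "Naive_Bayes": GaussianNB(),
}

# Fit each model on the same split and compare AUC values.
aucs = {}
for name, model in models.items():
    prob = model.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
    aucs[name] = roc_auc_score(y_te, prob)
    print(name, round(aucs[name], 3))
```

In practice one would plot the four ROC curves in one coordinate system, as the paper does in Figure 5.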
4.3. Experiment Result. This section shows the result of the experiment. First, we tested the values in the two parameter collections. The effect of different values of "n_estimators" on the accuracy is shown in Figure 3, and the effect of different values of "max_depth" is shown in Figure 4. The blue line represents the change on the training data, and the green line represents the change on the testing data.
Analyzing these two figures, when "n_estimators" was 200, the accuracies of the blue line and the green line were the highest, so "n_estimators" was set to 200. When "max_depth" was 30, the accuracies of the blue line and green line were the highest, so we chose 30 as the value of "max_depth".
Table 6 shows the recall, precision, F1-score, and accuracy of the final model, and Figure 5 shows the ROC curves of the four algorithms; all were obtained after 10-fold cross validation. In Figure 5, the blue line is the ROC curve of the random forest model, the green line that of the SVM model, the yellow line that of the kNN model, and the purple line that of the Naive_Bayes model. Analyzing the figure, the performance of Naive_Bayes is the worst. There is little difference in the AUC values among random forest, SVM, and kNN, but the performance of random forest is better. In Table 6, the recall rate is 0.82, the precision rate is 0.83, the F1-score is 0.82, and the accuracy is 0.90. The ROC curve of random forest and these indexes
Figure 2: 10-fold cross validation method for the model (data split, training and testing, final result).
Figure 3: The effect of different values of "n_estimators" on the accuracy (X-axis: n_estimators; Y-axis: accuracy; training and testing curves).
Figure 4: The effect of different values of "max_depth" on the accuracy (X-axis: max_depth; Y-axis: accuracy; training and testing curves).
prove that the final model has a high detection rate and a good ability to generalize.
5 Conclusion
As an active cybersecurity defense strategy, the emergence of honeypots further reduces the attacker's activity space. Although honeypot technology is constantly improving, attackers are constantly searching for weaknesses in honeypots in order to identify them. There is an old Chinese saying: the one involved is lost in the game, while the bystander sees it clearly. For honeypot researchers, in order to promote the development of honeypot technology, it is necessary to learn and discover methods of identifying honeypots remotely.
Based on research into early honeypot and honeypot detection technology, this paper proposed a honeypot detection technique based on machine learning algorithms. The technique achieves accurate honeypot detection by extracting and collecting honeypot features from different layers. The experimental result showed that honeypots are still insufficient in the degree to which they simulate real systems, such as the integrity of the simulated service. At the same time, the experimental result provides a reference for the improvement of honeypots and promotes the development of honeypot technology.
Data Availability
The experiment data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
The authors of this work were supported by the CCF-NSFOCUS KunPeng Research Fund (2018008), the Sichuan University Postdoc Research Foundation (19XJ0002), and the Frontier Science and Technology Innovation Projects of the National Key Research and Development Program (2019QY1405). The authors would like to thank the Baimaohui Security Research Institute for providing some experiment data.
References
[1] M. B. Salem, S. Hershkop, and S. J. Stolfo, "A survey of insider attack detection research," in Insider Attack and Cyber Security, pp. 69–90, Springer, Berlin, Germany, 2008.
[2] F. Sabahi and A. Movaghar, "Intrusion detection: a survey," in Proceedings of the 2008 Third International Conference on Systems and Networks Communications, pp. 23–26, IEEE, Sliema, Malta, October 2008.
[3] I. Mokube and M. Adams, "Honeypots: concepts, approaches, and challenges," in Proceedings of the 45th Annual Southeast Regional Conference, pp. 321–326, ACM, Winston-Salem, NC, USA, March 2007.
[4] L. Spitzner, "The honeynet project: trapping the hackers," IEEE Security & Privacy, vol. 1, no. 2, pp. 15–23, 2003.
[5] O. Thonnard and M. Dacier, "A framework for attack patterns' discovery in honeynet data," Digital Investigation, vol. 5, pp. 128–139, 2008.
[6] W. Fan, Z. Du, M. Smith-Creasey, and D. Fernandez, "HoneyDOC: an efficient honeypot architecture enabling all-round design," IEEE Journal on Selected Areas in Communications, vol. 37, no. 3, pp. 683–697, 2019.
[7] K. Sadasivam, B. Samudrala, and T. A. Yang, "Design of network security projects using honeypots," Journal of Computing Sciences in Colleges, vol. 20, pp. 282–293, 2005.
[8] M. Mansoori, O. Zakaria, and A. Gani, "Improving exposure of intrusion deception system through implementation of hybrid honeypot," The International Arab Journal of Information Technology, vol. 9, no. 5, pp. 436–444, 2012.
[9] G. Portokalidis and H. Bos, "SweetBait: zero-hour worm detection and containment using low- and high-interaction honeypots," Computer Networks, vol. 51, no. 5, pp. 1256–1274, 2007.
[10] W. Fan, Z. Du, D. Fernandez, and V. A. Villagra, "Enabling an anatomic view to investigate honeypot systems: a survey," IEEE Systems Journal, vol. 12, no. 4, pp. 3906–3919, 2017.
[11] M. L. Bringer, C. A. Chelmecki, and H. Fujinoki, "A survey: recent advances and future trends in honeypot research," International Journal of Computer Network and Information Security, vol. 4, no. 10, pp. 63–75, 2012.
[12] P. Wang, L. Wu, R. Cunningham, and C. C. Zou, "Honeypot detection in advanced botnet attacks," International Journal of Information and Computer Security, vol. 4, no. 1, pp. 30–51, 2010.
[13] K. Papazis and N. Chilamkurti, "Detecting indicators of deception in emulated monitoring systems," Service Oriented Computing and Applications, vol. 13, no. 1, pp. 17–29, 2019.
Table 6: Recall, precision, F1-score, and accuracy of the proposed model.

Recall | Precision | F1-score | Accuracy | Data size
0.82 | 0.83 | 0.82 | 0.90 | 2413
Figure 5: ROC curves of the four algorithms (random forest AUC = 0.93, SVM AUC = 0.90, kNN AUC = 0.90, Naive_Bayes AUC = 0.60; X-axis: false positive rate, Y-axis: true positive rate).
[14] W. Fan and D. Fernandez, "A novel SDN based stealthy TCP connection handover mechanism for hybrid honeypot systems," in Proceedings of the 2017 IEEE Conference on Network Softwarization (NetSoft), pp. 1–9, IEEE, Bologna, Italy, July 2017.
[15] H. Artail, H. Safa, M. Sraj, I. Kuwatly, and Z. Al-Masri, "A hybrid honeypot framework for improving intrusion detection systems in protecting organizational networks," Computers & Security, vol. 25, no. 4, pp. 274–288, 2006.
[16] T. Holz and F. Raynal, "Detecting honeypots and other suspicious environments," in Proceedings from the Sixth Annual IEEE SMC Information Assurance Workshop, pp. 29–36, IEEE, West Point, NY, USA, June 2005.
[17] P. Defibaugh-Chavez, R. Veeraghattam, M. Kannappa, S. Mukkamala, and A. Sung, "Network based detection of virtual environments and low interaction honeypots," in Proceedings of the 2006 IEEE SMC Information Assurance Workshop, pp. 283–289, IEEE, West Point, NY, USA, June 2006.
[18] X. Fu, W. Yu, D. Cheng, X. Tan, K. Streff, and S. Graham, "On recognizing virtual honeypots and countermeasures," in Proceedings of the 2006 2nd IEEE International Symposium on Dependable, Autonomic and Secure Computing, pp. 211–218, IEEE, Indianapolis, IN, USA, September 2006.
[19] S. Mukkamala, K. Yendrapalli, R. Basnet, M. K. Shankarapani, and A. H. Sung, "Detection of virtual environments and low interaction honeypots," in Proceedings of the 2007 IEEE SMC Information Assurance and Security Workshop, pp. 92–98, IEEE, West Point, NY, USA, June 2007.
[20] J. Papalitsas, S. Rauti, and V. Leppanen, "A comparison of record and play honeypot designs," in Proceedings of the 18th International Conference on Computer Systems and Technologies, pp. 133–140, ACM, Ruse, Bulgaria, June 2017.
[21] R. M. Campbell, K. Padayachee, and T. Masombuka, "A survey of honeypot research: trends and opportunities," in Proceedings of the 2015 10th International Conference for Internet Technology and Secured Transactions (ICITST), pp. 208–212, IEEE, London, UK, December 2015.
[22] F. Y.-S. Lin, Y.-S. Wang, and M.-Y. Huang, "Effective proactive and reactive defense strategies against malicious attacks in a virtualized honeynet," Journal of Applied Mathematics, vol. 2013, Article ID 518213, 11 pages, 2013.
[23] O. Surnin, F. Hussain, R. Hussain et al., "Probabilistic estimation of honeypot detection in Internet of things environment," in Proceedings of the 2019 International Conference on Computing, Networking and Communications (ICNC), pp. 191–196, IEEE, Honolulu, HI, USA, February 2019.
[24] Z. Wang, X. Feng, Y. Niu, C. Zhang, and J. Su, "TSMWD: a high-speed malicious web page detection system based on two-step classifiers," in Proceedings of the 2017 International Conference on Networking and Network Applications (NaNA), pp. 170–175, IEEE, Kathmandu City, Nepal, October 2017.
[25] D. Wenda and D. Ning, "A honeypot detection method based on characteristic analysis and environment detection," in 2011 International Conference in Electrics, Communication and Automatic Control Proceedings, pp. 201–206, Springer, Berlin, Germany, 2012.
[26] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, Cambridge, UK, 2000.
[27] N. Provos, "A virtual honeypot framework," in Proceedings of the USENIX Security Symposium, vol. 173, pp. 1–14, San Diego, CA, USA, August 2004.
[28] N. Krawetz, "Anti-honeypot technology," IEEE Security & Privacy Magazine, vol. 2, no. 1, pp. 76–79, 2004.
[29] K. Veena and K. Meena, "Implementing file and real time based intrusion detections in secure direct method using advanced honeypot," Cluster Computing, pp. 1–8, 2018.
[30] R. Tyagi, T. Paul, B. S. Manoj, and B. Thanudas, "Packet inspection for unauthorized OS detection in enterprises," IEEE Security & Privacy, vol. 13, no. 4, pp. 60–65, 2015.
[31] Y. Liu and X. Yao, "Ensemble learning via negative correlation," Neural Networks, vol. 12, no. 10, pp. 1399–1404, 1999.
[32] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[33] A. Liaw and M. Wiener, "Classification and regression by randomForest," R News, vol. 2, no. 3, pp. 18–22, 2002.
[34] R. Genuer, J.-M. Poggi, and C. Tuleau-Malot, "Variable selection using random forests," Pattern Recognition Letters, vol. 31, no. 14, pp. 2225–2236, 2010.
[35] T. Fushiki, "Estimation of prediction error by using K-fold cross-validation," Statistics and Computing, vol. 21, no. 2, pp. 137–146, 2011.
8 Security and Communication Networks
International Journal of
AerospaceEngineeringHindawiwwwhindawicom Volume 2018
RoboticsJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Active and Passive Electronic Components
VLSI Design
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Shock and Vibration
Hindawiwwwhindawicom Volume 2018
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawiwwwhindawicom
Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Control Scienceand Engineering
Journal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom
Journal ofEngineeringVolume 2018
SensorsJournal of
Hindawiwwwhindawicom Volume 2018
International Journal of
RotatingMachinery
Hindawiwwwhindawicom Volume 2018
Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Navigation and Observation
International Journal of
Hindawi
wwwhindawicom Volume 2018
Advances in
Multimedia
Submit your manuscripts atwwwhindawicom
ROC curve X-axis is FPR and Y-axis is TPR TPR and FPRare defined as follows
TPR TP
TP + FN
FPR FP
TN + FP
(3)
Each (FPR TPR) is a point on a ROC curve If a ROCcurve is above the diagonal line between (0 0) and (1 1) itmeans the classification performance of the model is ac-ceptable If a ROC curve is below this line there may besome mistakes in the label of the dataset If a ROC curve isnear to this line the classification performance of themodel is poor In other words the model could predictwildly In order to evaluate the model comprehensivelyresearchers usually use the AUC index to measure theoverall efficiency of classification model )e closer thevalue of AUC is to 1 the better the classification effect of themodel
4 Experiment
In order to make the predicted result more authentic andaccurate the experiment focuses on the real honeypotservers and normal servers based on remote scanningtechnologies In the experiment scikit-learn library is usedto accomplish machine learning model training )e cal-culate procedure ran on the computer with i7-7700HQCPU16GB of memory and GTX 1050ti GPU
41 Dataset Shodan and Fofa are well-known cyberspacesearch engines which are used to search for network devicesMany known honeypot servers have been marked by thesetwo platforms so we can firstly get some IP addresses whichhave been marked as honeypots We also randomly selectmany IP addresses from the Internet Secondly we manuallycheck all IP addresses to determine whether they are hon-eypots or not )en we scan all IP addresses via raw sockettechnology to collect the feature data Finally we collect 2413IP records as the experiment dataset )e number of hon-eypot IP addresses is 807 )e number of real systems IPaddress is 1606 )e dataset is shown in Table 4
42 Experiment Design In order to improve the accuracy ofthe proposed model this paper introduced three steps in thedetail experiment dimension unification parameter ad-justment and cross validation
421 Dimension Unifying In some cases some large di-mension features will affect the final result when training amodel For example there are three records in Table 5 )edimension of feature2 is larger than the dimension offeature1 So when training a model the large dimensionfeature may dominate the predicted result )erefore thelabel of the 4th record may be predicted as 0 However thetruth is 1 )at is why dimension unifying is needed )epaper chose popular normalization method to achieve
dimension unifying )e equation of normalization is asfollows
yi xi minus 1113954x
δ
1113954x 1k
1113944
k
i1xi
δ
1k minus 1
1113944
k
i1xi minus 1113954x( 1113857
2
11139741113972
(4)
x1 x2 xk is a raw data record Each value xi will betransformed into another value yi by the equation )en thenew data record y1 y2 yk is composed of all values yFinally the standard deviation of the new data record is 0and the variance is 1 In this way the problem that largedimension features affect the training result is solved
422 Parameters Adjusting Choosing the most suitableparameters for the model is a necessary part of the modeltraining)ere are two important parameters in random forestalgorithm which will determine the accuracy of the modelOne is ldquon_estimatorsrdquo Another is ldquomax_depthrdquo ldquon_estima-torsrdquo is the number of decision trees and ldquomax_depthrdquo is themaximum allowable depth for each tree If ldquon_estimatorsrdquo istoo big the result may be overfitting However if ldquon_esti-matorsrdquo is too small the result may be underfitting So asuitable value of ldquon_estimatorsrdquo is important for the predictedaccuracy of the finalmodel 100 120 200 300 500 800 1200was prepared for ldquon_estimatorsrdquo and 5 8 15 25 30 100 wasprepared for ldquomax_depthrdquo We tested each of these two col-lections separately to choose the most appropriate value forthese two parameters
423 Cross Validation We introduce 10-fold cross vali-dation methods to avoid overfitting issues Cross validationis a common method of model evaluation [35] )e crossvalidation structure of this paper is shown in Figure 2 )ecross validation structure is composed of three partsFirstly the data D was divided into ten similar mutuallyexclusive subsets in the data split (D was the collectionD1 cupD2 cup cupD10 Di capDj ϕ) )en the divided data
Table 4 Experimental dataset
Data type SizeHoneypot records 807Real system records 1606
Table 5 Records for example
Feature1 Feature2 Feature3 Label001 3000 15 10051 45000 58 00001 12785 15 1002 80000 22
Security and Communication Networks 5
could be transferred to training and testing In the trainingand testing steps nine subsets were used for training everytime and the last one was used for validation So after thetraining and testing there were ten classifiers Each clas-sifier would give a predicted result )e last result wascomputed from these ten results
424 Comparing Experiment )e result without compar-ison is not convincing So we designed this comparingexperiment to highlight the advantages of the method in thispaper Reviewing some researches about machine learningwe chose SVM kNN and Naive_Bayes as the counterpartsof random forest )en these four machine learning algo-rithms would be used to train four models on the samedataset respectively Finally for eachmodel we plotted theirrespective ROC curves in the same coordinate systemAccording to these ROC curves the performance of modelsis clear at a glance
43 Experiment Result )is section shows the result of theexperiment Firstly we tested the values in the two pa-rameter collections respectively )e effect of differentvalues of ldquon_estimatorsrdquo on the accuracy is shown in Fig-ure 3 )e effect of different values of ldquomax_depthrdquo on theaccuracy is shown in Figure 4 )e blue line represents thechange in the training data and the green line represents thechange in the testing data
Analyzing these two figures when ldquon_estimatorsrdquo was200 the accuracy of the blue line and the green line were thehighest So the value of ldquon_estimatorsrdquo was decided to use200 When ldquomax_depthrdquo was 30 the accuracy of blue lineand green line were the highest So we chose 30 to be thevalue of ldquomax_depthrdquo
Table 6 is the recall precision F1-score and accuracy ofthe final model Figure 5 is the ROC curves of the fouralgorithms All of them were concluded after the 10-foldcross validation In Figure 5 the blue line is the ROC curve ofthe random forest model the green line is the ROC curve ofthe SVMmodel the yellow line is the ROC curve of the kNNmodel and the purple line is the ROC curve of theNaive_Bayes model Analyzing the figure results it can beconcluded that the performance of Naive_Bayes is the worstHowever there is little difference in the AUC values betweenrandom forest SVM and kNN But the performance of
random forest is better In Table 6 the recall rate is 082 theprecision rate is 082 the F1-score is 082 and the accuracy is090 )e ROC curve of random forest and these indexes
D
D1 D2
Result1
D10
D1 D2 D10
Result2D10 D2 D1
Result10D1 D2 D9
Training data
Testing data
Final result
Data split
Training and testing
Figure 2 10-fold cross validation method for the model
084
086
088
090
092
094
096
098
100
Accu
racy
200 400 1000800600 1200n_estimators
Training accuracyTesting accuracy
Figure 3 )e effect of different values of ldquon_estimatorsrdquo on theaccuracy
20 40 8060 100max_depth
084
086
088
090
092
094
096
098
100
Accu
racy
Training accuracyTesting accuracy
Figure 4 )e effect of different values of ldquomax_depthrdquo on theaccuracy
6 Security and Communication Networks
prove the final model has a high detection rate and a goodability of generalization
5 Conclusion
As an active cybersecurity defense strategy the emergenceof honeypots further reduces the attackerrsquos activity spaceAlthough honeypot technology is constantly improvingattackers are constantly searching for weaknesses inhoneypots to identify them)ere is an old saying in Chinathe authorities are fascinated bystanders clear For hon-eypot researchers in order to promote the development ofhoneypot technology it is necessary to learn and discoverthe method of identifying honeypot remotely
Based on the research of early honeypot and honeypotdetection technology this paper proposed a honeypot de-tection technology based onmachine learning algorithms)etechnology used machine learning technology to achieveaccurate honeypot detection through extraction and collectiondifferent layer honeypot features )e experimental resultproved that honeypots are still insufficient in the degree ofsimulating real systems such as the integrity of the simulatedservice At the same time the experimental result provided areference for the improvement of honeypots and promotedthe development of honeypot technology
Data Availability
)e experiment data used to support the findings of thisstudy are available from the corresponding author uponrequest
Conflicts of Interest
)e authors declare that they have no conflicts of interest
Acknowledgments
)e authors of this work were supported by the CCF-NSFOCUS KunPeng Research Fund (2018008) the SichuanUniversity Postdoc Research Foundation (19XJ0002) andthe Frontier Science and Technology Innovation Projects ofNational Key Research and Development Program(2019QY1405) )e authors would like to thank BaimaohuiSecurity Research Institute for providing some experimentdata
References
[1] M B Salem S Hershkop and S J Stolfo ldquoA survey ofinsider attack detection researchrdquo in Insider Attack andCyber Security pp 69ndash90 Springer Berlin Germany 2008
[2] F Sabahi and A Movaghar ldquoIntrusion detection a surveyrdquo inProceedings of the 2008 gtird International Conference onSystems and Networks Communications pp 23ndash26 IEEESliema Malta October 2008
[3] I Mokube andM Adams ldquoHoneypots concepts approachesand challengesrdquo in Proceedings of the 45th Annual SoutheastRegional Conference pp 321ndash326 ACMWinston-Salem NCUSA March 2007
[4] L Spitzner ldquo)e honeynet project trapping the hackersrdquoIEEE Security amp Privacy vol 1 no 2 pp 15ndash23 2003
[5] O )onnard and M Dacier ldquoA framework for attack pat-ternsrsquo discovery in honeynet datardquoDigital Investigation vol 5pp 128ndash139 2008
[6] W Fan Z Du M Smith-Creasey and D FernandezldquoHoneyDOC an efficient honeypot architecture enabling all-round designrdquo IEEE Journal on Selected Areas in Commu-nications vol 37 no 3 pp 683ndash697 2019
[7] K Sadasivam B Samudrala and T A Yang ldquoDesign ofnetwork security projects using honeypotsrdquo Journal ofComputing Sciences in Colleges vol 20 pp 282ndash293 2005
[8] M Mansoori O Zakaria and A Gani ldquoImproving exposureof intrusion deception system through implementation ofhybrid honeypotrdquo gte International Arab Journal of In-formation Technology vol 9 no 5 pp 436ndash444 2012
[9] G Portokalidis and H Bos ldquoSweetBait zero-hour wormdetection and containment using low- and high-interactionhoneypotsrdquo Computer Networks vol 51 no 5 pp 1256ndash12742007
[10] W Fan Z Du D Fernandez and V A Villagra ldquoEnabling ananatomic view to investigate honeypot systems a surveyrdquoIEEE Systems Journal vol 12 no 4 pp 3906ndash3919 2017
[11] M L Bringer C A Chelmecki and H Fujinoki ldquoA surveyrecent advances and future trends in honeypot researchrdquoInternational Journal of Computer Network and InformationSecurity vol 4 no 10 pp 63ndash75 2012
[12] P Wang L Wu R Cunningham and C C Zou ldquoHoneypotdetection in advanced botnet attacksrdquo International Journal ofInformation and Computer Security vol 4 no 1 pp 30ndash512010
[13] K Papazis and N Chilamkurti ldquoDetecting indicators ofdeception in emulated monitoring systemsrdquo Service OrientedComputing and Applications vol 13 no 1 pp 17ndash29 2019
Table 6: Recall, precision, F1-score, and accuracy of the proposed model.
Recall: 0.82; Precision: 0.83; F1-score: 0.82; Accuracy: 0.90; Data size: 2413.

Figure 5: ROC curves (false positive rate vs. true positive rate) of the four algorithms: random forest (AUC = 0.93), SVM (AUC = 0.90), kNN (AUC = 0.90), and Naive Bayes (AUC = 0.60).
Security and Communication Networks 7
could be transferred to training and testing. In the training and testing steps, nine subsets were used for training each time, and the remaining one was used for validation. After training and testing, there were therefore ten classifiers, and each classifier gave a predicted result. The final result was computed from these ten results.
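The 10-fold procedure described above can be sketched as follows. This is an illustrative sketch assuming a scikit-learn setup; the synthetic dataset stands in for the paper's honeypot feature data, and the parameter values follow Section 4.3.

```python
# Sketch of 10-fold cross validation with a random forest classifier.
# Assumes scikit-learn; the synthetic data is an illustrative stand-in.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

kf = KFold(n_splits=10, shuffle=True, random_state=0)
fold_scores = []
for train_idx, test_idx in kf.split(X):
    clf = RandomForestClassifier(n_estimators=200, max_depth=30, random_state=0)
    clf.fit(X[train_idx], y[train_idx])          # nine subsets for training
    fold_scores.append(clf.score(X[test_idx], y[test_idx]))  # one for validation

# The final result is computed from the ten per-fold results.
final_accuracy = float(np.mean(fold_scores))
print(final_accuracy)
```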
4.2.4. Comparing Experiment. A result without comparison is not convincing, so we designed this comparative experiment to highlight the advantages of the method in this paper. After reviewing related machine learning research, we chose SVM, kNN, and Naive Bayes as the counterparts of random forest. These four machine learning algorithms were then used to train four models on the same dataset. Finally, for each model we plotted its ROC curve in the same coordinate system; from these ROC curves, the performance of the models is clear at a glance.
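The comparison above can be sketched like this, again assuming scikit-learn; the synthetic dataset and default classifier parameters are illustrative stand-ins, not the paper's actual configuration.

```python
# Sketch of the comparison experiment: train the four classifiers on the
# same data and compare their AUC values (scikit-learn assumed).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=20, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

models = {
    "random_forest": RandomForestClassifier(random_state=1),
    "svm": SVC(probability=True, random_state=1),
    "knn": KNeighborsClassifier(),
    "naive_bayes": GaussianNB(),
}
aucs = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    scores = model.predict_proba(X_te)[:, 1]   # probability of the positive class
    aucs[name] = roc_auc_score(y_te, scores)   # area under the ROC curve
print(aucs)
```

Plotting each model's `roc_curve` in one figure, as the paper does, makes the relative performance visible at a glance.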
4.3. Experiment Result. This section presents the experimental results. First, we tested the values in the two parameter collections respectively. The effect of different values of "n_estimators" on the accuracy is shown in Figure 3, and the effect of different values of "max_depth" is shown in Figure 4. The blue line represents the change on the training data, and the green line represents the change on the testing data.
Analyzing these two figures, the accuracy of both the blue line and the green line was highest when "n_estimators" was 200, so the value of "n_estimators" was set to 200. Likewise, the accuracy of both lines was highest when "max_depth" was 30, so we chose 30 as the value of "max_depth".
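The parameter sweep behind Figures 3 and 4 can be sketched as below (scikit-learn assumed; the synthetic data and the shortened value grid are illustrative, where the paper sweeps "n_estimators" up to 1200 and "max_depth" up to 100).

```python
# Sketch of the parameter sweep: vary one random forest parameter and
# record training and testing accuracy (scikit-learn assumed).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=20, random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=2)

results = []
for n in [50, 100, 200]:  # illustrative grid; the paper goes up to 1200
    clf = RandomForestClassifier(n_estimators=n, max_depth=30, random_state=2)
    clf.fit(X_tr, y_tr)
    results.append((n, clf.score(X_tr, y_tr), clf.score(X_te, y_te)))
print(results)
```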
Table 6 reports the recall, precision, F1-score, and accuracy of the final model, and Figure 5 shows the ROC curves of the four algorithms; all results were obtained after 10-fold cross validation. In Figure 5, the blue line is the ROC curve of the random forest model, the green line that of the SVM model, the yellow line that of the kNN model, and the purple line that of the Naive Bayes model. Analyzing the figure, it can be concluded that the performance of Naive Bayes is the worst. There is little difference in the AUC values of random forest, SVM, and kNN, but the performance of random forest is the best. In Table 6, the recall is 0.82, the precision is 0.83, the F1-score is 0.82, and the accuracy is 0.90. The ROC curve of random forest and these indexes
Figure 2: 10-fold cross validation method for the model. The dataset D is split into subsets D1–D10; in each round, nine subsets serve as training data and the remaining one as testing data, yielding Result1–Result10, which are combined into the final result.
Figure 3: The effect of different values of "n_estimators" (200–1200) on the training accuracy and testing accuracy.
Figure 4: The effect of different values of "max_depth" (20–100) on the training accuracy and testing accuracy.
prove that the final model has a high detection rate and a good ability of generalization.
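The indexes in Table 6 can be computed from a model's predictions in the standard way. The sketch below assumes scikit-learn's metric functions, with toy label vectors in place of the real test set.

```python
# Sketch of computing the Table 6 indexes from predictions
# (scikit-learn assumed; y_true and y_pred are illustrative).
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # ground-truth honeypot labels
y_pred = [1, 0, 1, 0, 0, 0, 1, 1]  # model predictions

print(recall_score(y_true, y_pred))     # recall: 0.75
print(precision_score(y_true, y_pred))  # precision: 0.75
print(f1_score(y_true, y_pred))         # F1-score: 0.75
print(accuracy_score(y_true, y_pred))   # accuracy: 0.75
```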
5. Conclusion
As an active cybersecurity defense strategy, the emergence of honeypots further reduces the attacker's activity space. Although honeypot technology is constantly improving, attackers are constantly searching for weaknesses in honeypots in order to identify them. As an old Chinese saying goes, the players are lost while the bystanders see clearly. For honeypot researchers, in order to promote the development of honeypot technology, it is necessary to study and discover methods of identifying honeypots remotely.
Based on research into early honeypots and honeypot detection technology, this paper proposed a honeypot detection technique based on machine learning algorithms. The technique achieves accurate honeypot detection by extracting and collecting honeypot features from different layers. The experimental results showed that honeypots are still insufficient in how faithfully they simulate real systems, for example in the completeness of the simulated service. At the same time, the experimental results provide a reference for the improvement of honeypots and promote the development of honeypot technology.
Data Availability
The experiment data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
The authors of this work were supported by the CCF-NSFOCUS KunPeng Research Fund (2018008), the Sichuan University Postdoc Research Foundation (19XJ0002), and the Frontier Science and Technology Innovation Projects of the National Key Research and Development Program (2019QY1405). The authors would like to thank the Baimaohui Security Research Institute for providing some experiment data.
References
[1] M. B. Salem, S. Hershkop, and S. J. Stolfo, "A survey of insider attack detection research," in Insider Attack and Cyber Security, pp. 69–90, Springer, Berlin, Germany, 2008.
[2] F. Sabahi and A. Movaghar, "Intrusion detection: a survey," in Proceedings of the 2008 Third International Conference on Systems and Networks Communications, pp. 23–26, IEEE, Sliema, Malta, October 2008.
[3] I. Mokube and M. Adams, "Honeypots: concepts, approaches, and challenges," in Proceedings of the 45th Annual Southeast Regional Conference, pp. 321–326, ACM, Winston-Salem, NC, USA, March 2007.
[4] L. Spitzner, "The honeynet project: trapping the hackers," IEEE Security & Privacy, vol. 1, no. 2, pp. 15–23, 2003.
[5] O. Thonnard and M. Dacier, "A framework for attack patterns' discovery in honeynet data," Digital Investigation, vol. 5, pp. 128–139, 2008.
[6] W. Fan, Z. Du, M. Smith-Creasey, and D. Fernandez, "HoneyDOC: an efficient honeypot architecture enabling all-round design," IEEE Journal on Selected Areas in Communications, vol. 37, no. 3, pp. 683–697, 2019.
[7] K. Sadasivam, B. Samudrala, and T. A. Yang, "Design of network security projects using honeypots," Journal of Computing Sciences in Colleges, vol. 20, pp. 282–293, 2005.
[8] M. Mansoori, O. Zakaria, and A. Gani, "Improving exposure of intrusion deception system through implementation of hybrid honeypot," The International Arab Journal of Information Technology, vol. 9, no. 5, pp. 436–444, 2012.
[9] G. Portokalidis and H. Bos, "SweetBait: zero-hour worm detection and containment using low- and high-interaction honeypots," Computer Networks, vol. 51, no. 5, pp. 1256–1274, 2007.
[10] W. Fan, Z. Du, D. Fernandez, and V. A. Villagra, "Enabling an anatomic view to investigate honeypot systems: a survey," IEEE Systems Journal, vol. 12, no. 4, pp. 3906–3919, 2017.
[11] M. L. Bringer, C. A. Chelmecki, and H. Fujinoki, "A survey: recent advances and future trends in honeypot research," International Journal of Computer Network and Information Security, vol. 4, no. 10, pp. 63–75, 2012.
[12] P. Wang, L. Wu, R. Cunningham, and C. C. Zou, "Honeypot detection in advanced botnet attacks," International Journal of Information and Computer Security, vol. 4, no. 1, pp. 30–51, 2010.
[13] K. Papazis and N. Chilamkurti, "Detecting indicators of deception in emulated monitoring systems," Service Oriented Computing and Applications, vol. 13, no. 1, pp. 17–29, 2019.
[14] W. Fan and D. Fernandez, "A novel SDN based stealthy TCP connection handover mechanism for hybrid honeypot systems," in Proceedings of the 2017 IEEE Conference on Network Softwarization (NetSoft), pp. 1–9, IEEE, Bologna, Italy, July 2017.
[15] H. Artail, H. Safa, M. Sraj, I. Kuwatly, and Z. Al-Masri, "A hybrid honeypot framework for improving intrusion detection systems in protecting organizational networks," Computers & Security, vol. 25, no. 4, pp. 274–288, 2006.
[16] T. Holz and F. Raynal, "Detecting honeypots and other suspicious environments," in Proceedings from the Sixth Annual IEEE SMC Information Assurance Workshop, pp. 29–36, IEEE, West Point, NY, USA, June 2005.
[17] P. Defibaugh-Chavez, R. Veeraghattam, M. Kannappa, S. Mukkamala, and A. Sung, "Network based detection of virtual environments and low interaction honeypots," in Proceedings of the 2006 IEEE SMC Information Assurance Workshop, pp. 283–289, IEEE, West Point, NY, USA, June 2006.
[18] X. Fu, W. Yu, D. Cheng, X. Tan, K. Streff, and S. Graham, "On recognizing virtual honeypots and countermeasures," in Proceedings of the 2006 2nd IEEE International Symposium on Dependable, Autonomic and Secure Computing, pp. 211–218, IEEE, Indianapolis, IN, USA, September 2006.
[19] S. Mukkamala, K. Yendrapalli, R. Basnet, M. K. Shankarapani, and A. H. Sung, "Detection of virtual environments and low interaction honeypots," in Proceedings of the 2007 IEEE SMC Information Assurance and Security Workshop, pp. 92–98, IEEE, West Point, NY, USA, June 2007.
[20] J. Papalitsas, S. Rauti, and V. Leppanen, "A comparison of record and play honeypot designs," in Proceedings of the 18th International Conference on Computer Systems and Technologies, pp. 133–140, ACM, Ruse, Bulgaria, June 2017.
[21] R. M. Campbell, K. Padayachee, and T. Masombuka, "A survey of honeypot research: trends and opportunities," in Proceedings of the 2015 10th International Conference for Internet Technology and Secured Transactions (ICITST), pp. 208–212, IEEE, London, UK, December 2015.
[22] F. Y.-S. Lin, Y.-S. Wang, and M.-Y. Huang, "Effective proactive and reactive defense strategies against malicious attacks in a virtualized honeynet," Journal of Applied Mathematics, vol. 2013, Article ID 518213, 11 pages, 2013.
[23] O. Surnin, F. Hussain, R. Hussain et al., "Probabilistic estimation of honeypot detection in Internet of things environment," in Proceedings of the 2019 International Conference on Computing, Networking and Communications (ICNC), pp. 191–196, IEEE, Honolulu, HI, USA, February 2019.
[24] Z. Wang, X. Feng, Y. Niu, C. Zhang, and J. Su, "TSMWD: a high-speed malicious web page detection system based on two-step classifiers," in Proceedings of the 2017 International Conference on Networking and Network Applications (NaNA), pp. 170–175, IEEE, Kathmandu City, Nepal, October 2017.
[25] D. Wenda and D. Ning, "A honeypot detection method based on characteristic analysis and environment detection," in 2011 International Conference in Electrics, Communication and Automatic Control Proceedings, pp. 201–206, Springer, Berlin, Germany, 2012.
[26] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, Cambridge, UK, 2000.
[27] N. Provos, "A virtual honeypot framework," in Proceedings of the USENIX Security Symposium, vol. 173, pp. 1–14, San Diego, CA, USA, August 2004.
[28] N. Krawetz, "Anti-honeypot technology," IEEE Security & Privacy Magazine, vol. 2, no. 1, pp. 76–79, 2004.
[29] K. Veena and K. Meena, "Implementing file and real time based intrusion detections in secure direct method using advanced honeypot," Cluster Computing, pp. 1–8, 2018.
[30] R. Tyagi, T. Paul, B. S. Manoj, and B. Thanudas, "Packet inspection for unauthorized OS detection in enterprises," IEEE Security & Privacy, vol. 13, no. 4, pp. 60–65, 2015.
[31] Y. Liu and X. Yao, "Ensemble learning via negative correlation," Neural Networks, vol. 12, no. 10, pp. 1399–1404, 1999.
[32] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[33] A. Liaw and M. Wiener, "Classification and regression by randomForest," R News, vol. 2, no. 3, pp. 18–22, 2002.
[34] R. Genuer, J.-M. Poggi, and C. Tuleau-Malot, "Variable selection using random forests," Pattern Recognition Letters, vol. 31, no. 14, pp. 2225–2236, 2010.
[35] T. Fushiki, "Estimation of prediction error by using K-fold cross-validation," Statistics and Computing, vol. 21, no. 2, pp. 137–146, 2011.
8 Security and Communication Networks
International Journal of
AerospaceEngineeringHindawiwwwhindawicom Volume 2018
RoboticsJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Active and Passive Electronic Components
VLSI Design
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Shock and Vibration
Hindawiwwwhindawicom Volume 2018
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawiwwwhindawicom
Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Control Scienceand Engineering
Journal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom
Journal ofEngineeringVolume 2018
SensorsJournal of
Hindawiwwwhindawicom Volume 2018
International Journal of
RotatingMachinery
Hindawiwwwhindawicom Volume 2018
Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Navigation and Observation
International Journal of
Hindawi
wwwhindawicom Volume 2018
Advances in
Multimedia
Submit your manuscripts atwwwhindawicom
prove the final model has a high detection rate and a goodability of generalization
5 Conclusion
As an active cybersecurity defense strategy the emergenceof honeypots further reduces the attackerrsquos activity spaceAlthough honeypot technology is constantly improvingattackers are constantly searching for weaknesses inhoneypots to identify them)ere is an old saying in Chinathe authorities are fascinated bystanders clear For hon-eypot researchers in order to promote the development ofhoneypot technology it is necessary to learn and discoverthe method of identifying honeypot remotely
Based on the research of early honeypot and honeypotdetection technology this paper proposed a honeypot de-tection technology based onmachine learning algorithms)etechnology used machine learning technology to achieveaccurate honeypot detection through extraction and collectiondifferent layer honeypot features )e experimental resultproved that honeypots are still insufficient in the degree ofsimulating real systems such as the integrity of the simulatedservice At the same time the experimental result provided areference for the improvement of honeypots and promotedthe development of honeypot technology
Data Availability
)e experiment data used to support the findings of thisstudy are available from the corresponding author uponrequest
Conflicts of Interest
)e authors declare that they have no conflicts of interest
Acknowledgments
)e authors of this work were supported by the CCF-NSFOCUS KunPeng Research Fund (2018008) the SichuanUniversity Postdoc Research Foundation (19XJ0002) andthe Frontier Science and Technology Innovation Projects ofNational Key Research and Development Program(2019QY1405) )e authors would like to thank BaimaohuiSecurity Research Institute for providing some experimentdata
References
[1] M B Salem S Hershkop and S J Stolfo ldquoA survey ofinsider attack detection researchrdquo in Insider Attack andCyber Security pp 69ndash90 Springer Berlin Germany 2008
[2] F Sabahi and A Movaghar ldquoIntrusion detection a surveyrdquo inProceedings of the 2008 gtird International Conference onSystems and Networks Communications pp 23ndash26 IEEESliema Malta October 2008
[3] I Mokube andM Adams ldquoHoneypots concepts approachesand challengesrdquo in Proceedings of the 45th Annual SoutheastRegional Conference pp 321ndash326 ACMWinston-Salem NCUSA March 2007
[4] L Spitzner ldquo)e honeynet project trapping the hackersrdquoIEEE Security amp Privacy vol 1 no 2 pp 15ndash23 2003
[5] O )onnard and M Dacier ldquoA framework for attack pat-ternsrsquo discovery in honeynet datardquoDigital Investigation vol 5pp 128ndash139 2008
[6] W Fan Z Du M Smith-Creasey and D FernandezldquoHoneyDOC an efficient honeypot architecture enabling all-round designrdquo IEEE Journal on Selected Areas in Commu-nications vol 37 no 3 pp 683ndash697 2019
[7] K Sadasivam B Samudrala and T A Yang ldquoDesign ofnetwork security projects using honeypotsrdquo Journal ofComputing Sciences in Colleges vol 20 pp 282ndash293 2005
[8] M Mansoori O Zakaria and A Gani ldquoImproving exposureof intrusion deception system through implementation ofhybrid honeypotrdquo gte International Arab Journal of In-formation Technology vol 9 no 5 pp 436ndash444 2012
[9] G Portokalidis and H Bos ldquoSweetBait zero-hour wormdetection and containment using low- and high-interactionhoneypotsrdquo Computer Networks vol 51 no 5 pp 1256ndash12742007
[10] W Fan Z Du D Fernandez and V A Villagra ldquoEnabling ananatomic view to investigate honeypot systems a surveyrdquoIEEE Systems Journal vol 12 no 4 pp 3906ndash3919 2017
[11] M L Bringer C A Chelmecki and H Fujinoki ldquoA surveyrecent advances and future trends in honeypot researchrdquoInternational Journal of Computer Network and InformationSecurity vol 4 no 10 pp 63ndash75 2012
[12] P Wang L Wu R Cunningham and C C Zou ldquoHoneypotdetection in advanced botnet attacksrdquo International Journal ofInformation and Computer Security vol 4 no 1 pp 30ndash512010
[13] K Papazis and N Chilamkurti ldquoDetecting indicators ofdeception in emulated monitoring systemsrdquo Service OrientedComputing and Applications vol 13 no 1 pp 17ndash29 2019
Table 6 Recall precision F1-score and accuracy of the proposedmodel
Recall Precision F1-score Accuracy Data size082 083 082 090 2413
00 02 04 06 08 10
00
02
04
06
08
10
False positive rate
ROC curves
True
pos
itive
rate
Random forest (AUC = 093)SVM (AUC = 090)
kNN (AUC = 090)Naive_bayes(AUC = 060)
Figure 5 ROC curves about four algorithms
Security and Communication Networks 7
[14] W Fan and D Fernandez ldquoA novel SDN based stealthy TCPconnection handover mechanism for hybrid honeypot sys-temsrdquo in Proceedings of the 2017 IEEE Conference on NetworkSoftwarization (NetSoft) pp 1ndash9 IEEE Bologna Italy July2017
[15] H Artail H Safa M Sraj I Kuwatly and Z Al-Masri ldquoAhybrid honeypot framework for improving intrusion detectionsystems in protecting organizational networksrdquo Computers ampSecurity vol 25 no 4 pp 274ndash288 2006
[16] T Holz and F Raynal ldquoDetecting honeypots and othersuspicious environmentsrdquo in Proceedings from the SixthAnnual IEEE SMC Information Assurance Workshoppp 29ndash36 IEEE West Point NY USA June 2005
[17] P Defibaugh-Chavez R Veeraghattam M KannappaS Mukkamala and A Sung ldquoNetwork based detection ofvirtual environments and low interaction honeypotsrdquo inProceedings of the 2006 IEEE SMC Information AssuranceWorkshop pp 283ndash289 IEEE West Point NY USA June2006
[18] X Fu W Yu D Cheng X Tan K Streff and S Graham ldquoOnrecognizing virtual honeypots and countermeasuresrdquo inProceedings of the 2006 2nd IEEE International Symposium onDependable Autonomic and Secure Computing pp 211ndash218IEEE Indianapolis IN USA September 2006
[19] S Mukkamala K Yendrapalli R Basnet M K Shankarapaniand A H Sung ldquoDetection of virtual environments and lowinteraction honeypotsrdquo in Proceedings of the 2007 IEEE SMCInformation Assurance and Security Workshop pp 92ndash98IEEE West Point NY USA June 2007
[20] J Papalitsas S Rauti and V Leppanen ldquoA comparison ofrecord and play honeypot designsrdquo in Proceedings of the 18thInternational Conference on Computer Systems and Technol-ogies pp 133ndash140 ACM Ruse Bulgaria June 2017
[21] R M Campbell K Padayachee and T Masombuka ldquoAsurvey of honeypot research trends and opportunitiesrdquo inProceedings of the 2015 10th International Conference forInternet Technology and Secured Transactions (ICITST)pp 208ndash212 IEEE London UK December 2015
[22] F Y-S Lin Y-S Wang and M-Y Huang ldquoEffective pro-active and reactive defense strategies against malicious attacksin a virtualized honeynetrdquo Journal of Applied Mathematicsvol 2013 Article ID 518213 11 pages 2013
[23] O Surnin F Hussain R Hussain et al ldquoProbabilistic esti-mation of honeypot detection in Internet of things envi-ronmentrdquo in Proceedings of the 2019 International Conferenceon Computing Networking and Communications (ICNC)pp 191ndash196 IEEE Honolulu HI USA February 2019
[24] Z Wang X Feng Y Niu C Zhang and J Su ldquoTSMWD ahigh-speed malicious web page detection system based ontwo-step classifiersrdquo in Proceedings of the 2017 InternationalConference on Networking and Network Applications (NaNA)pp 170ndash175 IEEE Kathmandu City Nepal October 2017
[25] D Wenda and D Ning ldquoA honeypot detection method basedon characteristic analysis and environment detectionrdquo in 2011International Conference in Electrics Communication andAutomatic Control Proceedings pp 201ndash206 Springer BerlinGermany 2012
[26] N Cristianini and J Shawe-TaylorAn Introduction to SupportVector Machines and Other Kernel-Based Learning MethodsCambridge University Press Cambridge UK 2000
[27] N Provos ldquoA virtual honeypot frameworkrdquo in Proceedings ofthe USENIX Security Symposium vol 173 pp 1ndash14 SanDiego CA USA August 2004
[28] N Krawetz ldquoAnti-honeypot technologyrdquo IEEE Security ampPrivacy Magazine vol 2 no 1 pp 76ndash79 2004
[29] K Veena and K Meena ldquoImplementing file and real timebased intrusion detections in secure direct method usingadvanced honeypotrdquo Cluster Computing pp 1ndash8 2018
[30] R Tyagi T Paul B S Manoj and B )anudas ldquoPacketinspection for unauthorized OS detection in enterprisesrdquoIEEE Security amp Privacy vol 13 no 4 pp 60ndash65 2015
[31] Y Liu and X Yao ldquoEnsemble learning via negative corre-lationrdquo Neural Networks vol 12 no 10 pp 1399ndash1404 1999
[32] L Breiman ldquoRandom forestsrdquo Machine Learning vol 45no 1 pp 5ndash32 2001
[33] A Liaw and M Wiener ldquoClassification and regression byrandomForestrdquo R News vol 2 no 3 pp 18ndash22 2002
[34] R Genuer J-M Poggi and C Tuleau-Malot ldquoVariable se-lection using random forestsrdquo Pattern Recognition Lettersvol 31 no 14 pp 2225ndash2236 2010
[35] T Fushiki ldquoEstimation of prediction error by using K-foldcross-validationrdquo Statistics and Computing vol 21 no 2pp 137ndash146 2011
8 Security and Communication Networks
International Journal of
AerospaceEngineeringHindawiwwwhindawicom Volume 2018
RoboticsJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Active and Passive Electronic Components
VLSI Design
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Shock and Vibration
Hindawiwwwhindawicom Volume 2018
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawiwwwhindawicom
Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Control Scienceand Engineering
Journal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom
Journal ofEngineeringVolume 2018
SensorsJournal of
Hindawiwwwhindawicom Volume 2018
International Journal of
RotatingMachinery
Hindawiwwwhindawicom Volume 2018
Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Navigation and Observation
International Journal of
Hindawi
wwwhindawicom Volume 2018
Advances in
Multimedia
Submit your manuscripts atwwwhindawicom
[14] W. Fan and D. Fernandez, "A novel SDN based stealthy TCP connection handover mechanism for hybrid honeypot systems," in Proceedings of the 2017 IEEE Conference on Network Softwarization (NetSoft), pp. 1–9, IEEE, Bologna, Italy, July 2017.
[15] H. Artail, H. Safa, M. Sraj, I. Kuwatly, and Z. Al-Masri, "A hybrid honeypot framework for improving intrusion detection systems in protecting organizational networks," Computers & Security, vol. 25, no. 4, pp. 274–288, 2006.
[16] T. Holz and F. Raynal, "Detecting honeypots and other suspicious environments," in Proceedings from the Sixth Annual IEEE SMC Information Assurance Workshop, pp. 29–36, IEEE, West Point, NY, USA, June 2005.
[17] P. Defibaugh-Chavez, R. Veeraghattam, M. Kannappa, S. Mukkamala, and A. Sung, "Network based detection of virtual environments and low interaction honeypots," in Proceedings of the 2006 IEEE SMC Information Assurance Workshop, pp. 283–289, IEEE, West Point, NY, USA, June 2006.
[18] X. Fu, W. Yu, D. Cheng, X. Tan, K. Streff, and S. Graham, "On recognizing virtual honeypots and countermeasures," in Proceedings of the 2006 2nd IEEE International Symposium on Dependable, Autonomic and Secure Computing, pp. 211–218, IEEE, Indianapolis, IN, USA, September 2006.
[19] S. Mukkamala, K. Yendrapalli, R. Basnet, M. K. Shankarapani, and A. H. Sung, "Detection of virtual environments and low interaction honeypots," in Proceedings of the 2007 IEEE SMC Information Assurance and Security Workshop, pp. 92–98, IEEE, West Point, NY, USA, June 2007.
[20] J. Papalitsas, S. Rauti, and V. Leppanen, "A comparison of record and play honeypot designs," in Proceedings of the 18th International Conference on Computer Systems and Technologies, pp. 133–140, ACM, Ruse, Bulgaria, June 2017.
[21] R. M. Campbell, K. Padayachee, and T. Masombuka, "A survey of honeypot research: trends and opportunities," in Proceedings of the 2015 10th International Conference for Internet Technology and Secured Transactions (ICITST), pp. 208–212, IEEE, London, UK, December 2015.
[22] F. Y.-S. Lin, Y.-S. Wang, and M.-Y. Huang, "Effective proactive and reactive defense strategies against malicious attacks in a virtualized honeynet," Journal of Applied Mathematics, vol. 2013, Article ID 518213, 11 pages, 2013.
[23] O. Surnin, F. Hussain, R. Hussain et al., "Probabilistic estimation of honeypot detection in Internet of things environment," in Proceedings of the 2019 International Conference on Computing, Networking and Communications (ICNC), pp. 191–196, IEEE, Honolulu, HI, USA, February 2019.
[24] Z. Wang, X. Feng, Y. Niu, C. Zhang, and J. Su, "TSMWD: a high-speed malicious web page detection system based on two-step classifiers," in Proceedings of the 2017 International Conference on Networking and Network Applications (NaNA), pp. 170–175, IEEE, Kathmandu City, Nepal, October 2017.
[25] D. Wenda and D. Ning, "A honeypot detection method based on characteristic analysis and environment detection," in 2011 International Conference in Electrics, Communication and Automatic Control Proceedings, pp. 201–206, Springer, Berlin, Germany, 2012.
[26] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, Cambridge, UK, 2000.
[27] N. Provos, "A virtual honeypot framework," in Proceedings of the USENIX Security Symposium, vol. 173, pp. 1–14, San Diego, CA, USA, August 2004.
[28] N. Krawetz, "Anti-honeypot technology," IEEE Security & Privacy Magazine, vol. 2, no. 1, pp. 76–79, 2004.
[29] K. Veena and K. Meena, "Implementing file and real time based intrusion detections in secure direct method using advanced honeypot," Cluster Computing, pp. 1–8, 2018.
[30] R. Tyagi, T. Paul, B. S. Manoj, and B. Thanudas, "Packet inspection for unauthorized OS detection in enterprises," IEEE Security & Privacy, vol. 13, no. 4, pp. 60–65, 2015.
[31] Y. Liu and X. Yao, "Ensemble learning via negative correlation," Neural Networks, vol. 12, no. 10, pp. 1399–1404, 1999.
[32] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[33] A. Liaw and M. Wiener, "Classification and regression by randomForest," R News, vol. 2, no. 3, pp. 18–22, 2002.
[34] R. Genuer, J.-M. Poggi, and C. Tuleau-Malot, "Variable selection using random forests," Pattern Recognition Letters, vol. 31, no. 14, pp. 2225–2236, 2010.
[35] T. Fushiki, "Estimation of prediction error by using K-fold cross-validation," Statistics and Computing, vol. 21, no. 2, pp. 137–146, 2011.
8 Security and Communication Networks