CBO-IE:ADataMiningApproachforHealthcareIoTDataset ...

14
Research Article CBO-IE: A Data Mining Approach for Healthcare IoT Dataset Using Chaotic Biogeography-Based Optimization and Information Entropy Manish Kumar Ahirwar , 1 Piyush Kumar Shukla , 1 and Rakesh Singhai 2 1 Department of Computer Science and Engineering, UIT-RGPV, Bhopal 462033, India 2 University Institute of Technology RGPV, Shivpuri, M.P. 473551, India Correspondence should be addressed to Piyush Kumar Shukla; [email protected] Received 24 July 2021; Accepted 22 September 2021; Published 8 October 2021 Academic Editor: Punit Gupta Copyright © 2021 Manish Kumar Ahirwar et al. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Data mining is mostly utilized for a huge variety of applications in several fields like education, medical, surveillance, and industries. e clustering is an important method of data mining, in which data elements are divided into groups (clusters) to provide better quality data analysis. e Biogeography-Based Optimization (BO) is the latest metaheuristic approach, which is applied to resolve several complex optimization problems. Here, a Chaotic Biogeography-Based Optimization approach using Information Entropy (CBO-IE) is implemented to perform clustering over healthcare IoT datasets. e main objective of CBO-IE is to provide proficient and precise data point distribution in datasets by using Information Entropy concepts and to initialize the population by using chaos theory. Both Information Entropy and chaos theory are facilitated to improve the convergence speed of BO in global search area for selecting the cluster heads and cluster members more accurately. e CBO-IE is implemented to a MATLAB 2021a tool over eight healthcare IoTdatasets, and the results illustrate the superior performance of CBO-IE based on F- Measure, intracluster distance, running time complexity, purity index, statistical analysis, root mean square error, accuracy, and standard deviation as compared to previous techniques of clustering like K-Means, GA, PSO, ALO, and BO approaches. 1.Introduction e big data [1, 2] is used and analyzed in several wireless applications by utilizing various characteristics like storage, processing, and maintenance of data. e preprocessing of a huge amount of data is performed before analysis to reduce the data redundancy with enhancing the data accuracy and efficiency [3–5]. e big data is processed by using various nature inspired optimization approaches like Genetic Al- gorithm (GA), Ant Colony Optimization (ACO), Ant Lion Optimization (ALO), and Particle Swarm Optimization (PSO) to perform optimal analysis [6]. e data mining [7–9] is a systematic procedure utilized for extracting the secret knowledge and model from an immense, multifaceted, and multidimensional dataset [10]. e association rule mining is one of the famous data mining approaches introduced with MapReduce concept to evaluate the relationship among data elements in a huge dataset [11]. ese data relationships are recognized to generate maxi- mum profit from marketing through IoT devices in in- dustries. e time and security are crucial issues in business, which are frequently resolved by using data mining [12–14]. e data clustering [15] is a type of data mining, in which data components are divided into various sets or groups. e data are collected from various heterogeneous resources; after that time series clustering is applied on this huge amount of data to improve the data accessibility. e in- dustry data are distributed more precisely and accurately by clustering to predict the future aspects of market [16]. e cybercrime data are analyzed after preprocessing to perform training and testing of clustering techniques. e nature of crime is easily understood and detected by clustering the Hindawi Scientific Programming Volume 2021, Article ID 8715668, 14 pages https://doi.org/10.1155/2021/8715668

Transcript of CBO-IE:ADataMiningApproachforHealthcareIoTDataset ...

Page 1: CBO-IE:ADataMiningApproachforHealthcareIoTDataset ...

Research ArticleCBO-IE A Data Mining Approach for Healthcare IoT DatasetUsing Chaotic Biogeography-Based Optimization andInformation Entropy

Manish Kumar Ahirwar 1 Piyush Kumar Shukla 1 and Rakesh Singhai 2

1Department of Computer Science and Engineering UIT-RGPV Bhopal 462033 India2University Institute of Technology RGPV Shivpuri MP 473551 India

Correspondence should be addressed to Piyush Kumar Shukla pphdwssgmailcom

Received 24 July 2021 Accepted 22 September 2021 Published 8 October 2021

Academic Editor Punit Gupta

Copyright copy 2021 Manish Kumar Ahirwar et al )is is an open access article distributed under the Creative CommonsAttribution License which permits unrestricted use distribution and reproduction in anymedium provided the original work isproperly cited

Data mining is mostly utilized for a huge variety of applications in several fields like education medical surveillance andindustries )e clustering is an important method of data mining in which data elements are divided into groups (clusters) toprovide better quality data analysis )e Biogeography-Based Optimization (BO) is the latest metaheuristic approach which isapplied to resolve several complex optimization problems Here a Chaotic Biogeography-Based Optimization approach usingInformation Entropy (CBO-IE) is implemented to perform clustering over healthcare IoTdatasets )e main objective of CBO-IEis to provide proficient and precise data point distribution in datasets by using Information Entropy concepts and to initialize thepopulation by using chaos theory Both Information Entropy and chaos theory are facilitated to improve the convergence speed ofBO in global search area for selecting the cluster heads and cluster members more accurately )e CBO-IE is implemented to aMATLAB 2021a tool over eight healthcare IoTdatasets and the results illustrate the superior performance of CBO-IE based on F-Measure intracluster distance running time complexity purity index statistical analysis root mean square error accuracy andstandard deviation as compared to previous techniques of clustering like K-Means GA PSO ALO and BO approaches

1 Introduction

)e big data [1 2] is used and analyzed in several wirelessapplications by utilizing various characteristics like storageprocessing and maintenance of data )e preprocessing of ahuge amount of data is performed before analysis to reducethe data redundancy with enhancing the data accuracy andefficiency [3ndash5] )e big data is processed by using variousnature inspired optimization approaches like Genetic Al-gorithm (GA) Ant Colony Optimization (ACO) Ant LionOptimization (ALO) and Particle Swarm Optimization(PSO) to perform optimal analysis [6]

)e data mining [7ndash9] is a systematic procedure utilizedfor extracting the secret knowledge and model from animmense multifaceted and multidimensional dataset [10])e association rule mining is one of the famous datamining

approaches introduced with MapReduce concept to evaluatethe relationship among data elements in a huge dataset [11])ese data relationships are recognized to generate maxi-mum profit from marketing through IoT devices in in-dustries )e time and security are crucial issues in businesswhich are frequently resolved by using data mining [12ndash14]

)e data clustering [15] is a type of data mining in whichdata components are divided into various sets or groups)edata are collected from various heterogeneous resourcesafter that time series clustering is applied on this hugeamount of data to improve the data accessibility )e in-dustry data are distributed more precisely and accurately byclustering to predict the future aspects of market [16] )ecybercrime data are analyzed after preprocessing to performtraining and testing of clustering techniques )e nature ofcrime is easily understood and detected by clustering the

HindawiScientific ProgrammingVolume 2021 Article ID 8715668 14 pageshttpsdoiorg10115520218715668

similar type of crime patterns to reduce the crime ratio [17])e healthcare data are also received by IoT devices in theform of data stream and divided into cluster by using streamclustering combining the cluster building and merging steps[18]

In above analysis the clustering of data is an extremelychallenging issue while it creates baffling to select the op-timal clustering strategy in nature Additionally it is to be anincredibly exigent attempt as every dataset is not expected tobe consistent allowing for the fact that infection form andprevious medical circumstance area might differ immenselyin training Previously having a meticulous prophecyscheme does not constantly offer a valuable height of ex-actness beneath entire healthcare application areas due tomostly depending on the situation utilized

A lot of clustering algorithms are introduced on variousdatasets to enhance the data extraction Several optimizationapproaches like GA ALO and PSO are also described forclustering Although few clustering techniques realize anadvanced output by means of a specified dataset the effi-ciency of such clustering techniques might be comple-mentary on other datasets )e nature of the clusteringmethods is steady by means of no-free-lunch theorem but athand no solitary clustering method survives which is able tobe a cure for entire problems )is means all the problemsare not resolved by any algorithm and convergence speed isalso a major issue in optimization techniques which areused for data clustering

Here a Chaotic Biogeography-Based Optimization ap-proach using Information Entropy (CBO-IE) is imple-mented for clustering over healthcare IoT datasets which isimproved form of Biogeography-Based Optimization (BO)approach In its place of basically performing a qualitativeexamination by utilizing a methodical mapping analysisregarding prior implemented works this proposed workcontributes on a quantitative examination of clusteringstrategy for healthcare dataset

)e contribution of implementing CBO-IE is as follows

(1) )e major intent of CBO-IE is to obtain accuratedata point distribution in dataset by utilizing In-formation Entropy strategy and to give initial valuesof the population by utilizing chaos theory

(2) Both Information Entropy and chaos theory areintroduced to enhance the convergence speed of BOin exploration field globally for generating the clusterheads and cluster members more precisely

(3) )e CBO-IE is applied over eight healthcare IoTdatasets and the outputs are evaluated on the basis ofF-Measure intracluster distance running timecomplexity purity index statistical analysis stan-dard deviation root mean square error and accuracyas compared to previous techniques of clustering likeK-Means GA PSO ALO and BO approaches

)e remainder of the paper is organized as followsSection 2 explains the literature survey of data miningtechniques big data processing and IoT dataset processingby utilizing numerous parameters Section 3 describes the

BO approach and the proposed CBO-IE is explained inSection 4 briefly according to flowchart algorithms andpreliminaries )e datasets performance factors and ex-perimental results are illustrated in Section 5 and at lastconclusions are given in Section 6

2 Literature Survey

)e IoT [19] is situated to transfigure entire future conse-quently for lives )e information extracted from the IoTsystems will be explored to recognize and organize multi-faceted surroundings about users fitting superior decisioncreation larger computerization advanced effectivenessefficiency exactness and prosperity production )is workexplores the accessibility of multiple popular data miningapproaches for IoT information )e results are evaluatedover artificial intelligence and neural network with superioraccuracy efficiency and velocity in comparison to priortechniques [20]

)e K-Means approach is applied for processing the bigdata with the help of Hadoop platform over Internet of)ings (IoT) aspects )e data from IoT devices are com-bined or distributed in several groups known as clusterswhich are used in various smart applications such asmedical disaster and industries applications )e powerexpenses and traffic over communication network was re-duced to use only necessary information not whole rawdata In actuality it supplies enormous flexibility permittingthe formation of clusters having millions of Hadoop illus-trations [21]

)is big data has raised the concept of space and timecomplexity due to storing and processing huge amount ofdata over mobile and dynamic network )e ExtensibleMark-up Language (XML) and machine learning methodsapply to big data processing like classification clusteringand preprocessing of IoT information [22 23] Latest IoTapplications and performances remain further on an in-tellectual indulgence of the surroundings from informationextracted by means of assorted sensors and small machinesHere a framework combines an ontology dependent de-scription of information dispensations by means of unusuallogic for pretty beneficial incident recognition by seasoningthe specific classification strategy of machine learning Aroad and traffic examination is performed to verify theresults of framework [24]

)e personal data of users are digitally distributed overnetwork through smart IoT devices and multiple artificialtechniques are introduced to control the digitally transferredinformation following few rules and policies An intellectualpolice examination strategy is introduced discovering thecontradictory policy or agreements of users over advocateplatform An intellectual decision-making method is appliedwith the help of fuzzy cognitive maps [25]

)e business is directly or indirectly related to customerbehaviour evaluated by trust mining )e trust of sellers isevaluated by dynamic clustering where data is modified dayby day over huge network area )e quick development ofonline shopping details illustrates a growing issue for cus-tomers who are comprised to select faithful sellers and

2 Scientific Programming

successful seller selections from numerous existing recordbeginning e-commerce usage areas to enhance the businessA dynamic clustering method is adopted to calculate thetrust of customers and differentiate the buying equivalenceof customers for predicting the customer behaviour )ereal-time and artificial datasets are utilized to performclustering and evaluate the results based on accuracy [26]

)e medical field [27] also well utilizes the miningtechniques like classification to identify the mental healthand evaluate the risk to manage the system )is classifi-cation is done by decision trees to discover and classify thepatients according to their mental health Here a relativecalculation of a huge array classifier is associated with severalareas like tree group neural possibility categorization andpolicy dependent classifiers )e linear and random classi-fiers are introduced to perform disease classification [28]

)e cancer disease in unstructured format is well clas-sified to discover several types of cancer disease It generatesvaluable results in terms of risk aspects treatment andmanagement )e text mining is well exploited in cancerdisease prediction like lung breast and ovarian cancer )edata about cancer patients is collected from several het-erogeneous resources and text mining is performed toprovide various cancer related information like survivaltreatment and risk of disease [29]

)e geographical data is collected from several hetero-geneous resources and data mining is initialized over thespatial data to manage the disaster )e natural hazards arepreviously identified before happening this is achieved byspatial data mining over huge geographic information )eseveral geographical information such as soil ocean earthrsquoscrust and air quality is used to employ the mining processwith the help of PSO and fuzzy logics )e clusters aredecided on the basis of spatial information and naturalbehaviours in environments [30]

)e data mining based educational information ex-traction is a new era of research in which distributed ed-ucational data is compiled and used for several purposes)estudents teachers and other staffs have played an importantrole in educational data mining )e educational data iscollected from several resources and saved in tabular format)ese student data can be further analyzed by university andexamined by data mining model to evaluate the results ofstudents and maintain the records in appropriate manner[31]

)e classification is also performed over data streamscombining the text music and video information with thehelp of decision trees and naive structure)ese data streamsalso combined the human activities like movement of handsand legs [32 33])ere are various key issues of classificationlike infinite size perception development and perceptionflow and characteristic assessment Various researchersmainly solved the issues of infinite size and perceptiondevelopment but the perception flow and characteristicassessment are not taken into consideration by researchersHere these two issues are removed in implementing aclassification method over big data and parameters like errorrate and running time are obtained to provide superiorefficiency of the method [34]

3 The Biogeography-Based Optimization(BO) Approach

ABiogeography-Based Optimization (BO) is a metaheuristicapproach based on island biogeography premise whichrelates with the migration evolution and destruction of thecastes in a habitat A Habitat Suitability Index (HSI) isutilized to generate the best solution globally and share thecharacteristics with week habitat )e population is en-hanced by taking the optimal solutions from prior iterationfrom emigrating [35] to immigrating habitats by using themigrating Suitability Index Variables (SIVs) in BOmigratingprocess A fresh characteristic is obtained in complete so-lution area using fitness function [36] and replaced everyhabitatrsquos SIVs arbitrarily and stochastically to enhance thepopulation [37] miscellany and searching strength in BOmutation process

31 Migration Process In migration stochastic processseveral parents can be utilized for a particular offspring andevery habitat (Ha) is modified by receiving the SIVs from asuperior HSI habitat in population (PS) )e migration ratesare straightly dependent on the caste number in a habitat forenhancing [38] the habitat miscellany and linear migration isevaluated by using the following equation

Ha(SIV) Hb(SIV) (1)

where Hb(SIV) is a superior HSI bth habitat (Hb) to transferthe SIV value to the ath habitat (Ha)

)e emigration rate (α) and immigration rate (β) areevaluated for c castes in habitat by using the two followingequations

αc Gc

Nmaximum (2)

βc M times 1 minus αc( 1113857 (3)

Here G and M are highest emigration and immigrationrates respectively and Nmaximum is maximum realizablenumber of castes utilized by habitat

)ere are six models developed for migration phase outof which sinusoidal model performed superior migration ascompared to the other five models )is model is evaluatedfor ath habitat by using the two following equations

βa M

21 + cos

aπNmaximum

1113888 11138891113888 1113889 (4)

αa G

21 minus cos

aπNmaximum

1113888 11138891113888 1113889 (5)

32 Mutation Process A mutation is a stochastic operatorenhancing the population miscellany to obtain optimalsolution [39] )e mutation process is utilized for updatingat least one arbitrary chosen SIV of a solution with the help

Scientific Programming 3

of mutation rate (Ta) and the previous possibility of sub-sistence (Sa) by using the following equation

Ta Tmaximum 1 minusSa

Smaximum1113888 1113889 (6)

where Tmaximum is the highest mutation rate term illustratedby user and Smaximum is the highest possibility of castes count

After that the mutation operator is improved by utilizingCauchy model to provide optimal exploration strength ofBO in huge searching area by reducing potential limitationof mutation strategy )e Cauchy distribution is evaluatedwith the help of probability density function which isformulized by the following equation

F(X 0 1) 1

π 1 + X2

1113872 1113873 (7)

Hence the Cauchy mutation equation is described byusing the following equation

Ha(SIV) minimum Ha(SIV)( 1113857 + maximum Ha(SIV)( 1113857(

minus minimum Ha(SIV)( 11138571113857 times π times F Ha(SIV) 0 1( 1113857

(8)

where Ha(SIV) is ath habitat F(Ha(SIV) 0 1) is Cauchydistribution (Algorithm 1)

4 TheChaotic BOApproachUsing InformationEntropy (CBO-IE)

An updated BO approach is proposed using the chaoticbehaviour and Information Entropy )e BO approach iswell suitable for exploration and exploitation in hugesearching space and generates the efficient results for nu-meric optimization [40] Still the numeric optimization isreasonably different from data clustering mechanism Heresome intrinsic characteristics are investigated and a chaoticBO is implemented using Information Entropy to apply theBO for data clustering

41 Element Information Entropy )e data points are dis-tributed in multidimensional area in data clustering strategy)e confusion degree of a probabilistic variable is calculatedby utilizing Information Entropy after that this is utilizedfor elements measurements to calculate the distribution)eentropy (Ea) is evaluated to isolate the element values withestimation of every value to its closer integer by using thefollowing equation

Ea minus 1113944

maxa

bmina

Rblog Rb( 1113857 (9)

where Ea is the entropy of ath element (a 1 2 3 L) and Lis number of elements in dataset mina and maxa are lowestand highest integers after discretization of element values Rbis bth element integer percentage value

)emigration procedure of BO population is guarded byusing element entropy hence maximum entropy is evalu-ated for every element in dataset by the following equation

maximum Ea( 1113857 minuslog1

maxa minus mina + 11113888 1113889 (10)

where maximum (Ea) is maximum entropy of ath element indataset

At last the normalized entropy for every element iscalculated by the following equation

nomalized Ea( 1113857 Ea

maximum Ea( 1113857 (11)

where normalized (Ea) is normalized entropy of ath elementin dataset

Equation (11) is repeated for all data elements in datasetand normalized entropy set (normalized (E)) is generated asfollows

Normalized (E) (normalized (E1) normalized(E2) normalized (EL))

42 Information Entropy-Based Migration Process in the BOApproach Here two mechanisms are introduced in mi-gration to improve the performance of BO approach )efirst one is original migration process of BO explained in theprevious section )e second one is updated mechanism oforiginal migration in which P particulars are elected arbi-trarily from present population (1lePlePS) and the oneoptimal particular is assigned as reference particular (Href ))e direction of migration is guided by Href and InformationEntropy is utilized to provide equivalency between pop-ulation miscellany and convergence speed )e higher In-formation Entropy of element indicates the maximumuncertainty which slows down the convergence speedHence the speed is enhanced for element by transferring itto the location based on reference particular globally withmaximum possibility as compared to the least InformationEntropy element

43 Chaos-Based Population Selection of the BO Approach)e chaos method is extremely related to initial circum-stances and successfully utilized for arbitrary numbergeneration using logistic function)e chaotic system is usedin the following equation

cf+1 μ times cf times 1 minus cf1113872 1113873 (12)

where μ represents constants in the range of [1 4] c rep-resents variables (c isin [0 1] f 0 1 )

)e population of BO is initialized by utilizing chaosfunction (equation (12)) which has improved the efficiencyof BO with proper use of huge solution area

44 6e Complete CBO-IE Approach for Data Mining)e Information Entropy-based migration and chaos-basedpopulation selection are combined in Chaotic Biogeogra-phy-Based Optimization using Information Entropy (CBO-IE) to solve the data mining that is data clustering inoptimal way (Figure 1) Firstly the population (PS) of BOapproach is initialized with the help of chaos function

4 Scientific Programming

(equation (12)) and every particular is denoted as a vectorwith size Lx L times K (L is number of elements in dataset Kis number of cluster centroids and Lx is dimension ofparticulars) )e K centroids locations are fixed into vectorin which 1st centroid relates with 1st L attributes 2nd centroidrelates to 2nd L attributes and so on )e primary particularvector values are generated arbitrarily and consistentlybetween lower and higher elements values in existing datasetwithin maximum number of iterations (Mi) )en CBO-IEis applied to calculate the fitness values of entire particularsby using equations (1) to (8) (Algorithm 2)

5 Results and Analysis

In this section the healthcare IoT datasets and performancefactors for proposed CBE-IE approach are explained briefly)e entire approaches are implemented with the help ofMATLAB 2021a tool using Core i3-3110M processor havingWindows 8 operating system with 2GB RAM and analyzedover 8 healthcare IoTdatasets (Table 1) )e proposed CBO-IE approach has evaluated the results for 500 iterations interms of intracluster distance purity index standard devi-ation root mean square error accuracy and F-Measure ascompared to prior algorithms like K-Means ACO PSOALO and BO over 30 independent runs

51 Healthcare IoT Datasets )e proposed CBO-IE ap-proach is applied on 8 dissimilar healthcare IoTdatasets takenfrom UCI repository (Table 1) )e datasets are thoracicsurgery breast cancer cryotherapy liver patient heart pa-tient chronic kidney disease diabetic retinopathy and bloodtransfusion)e 470 instances of thoracic surgery dataset with

17 attributes are divided into 2 classes (survive and notsurvive) the 699 instances of breast cancer dataset with 10attributes are distributed into 2 classes (benign and malig-nant) the 90 instances of cryotherapy dataset with 7 attributesare divided into 2 classes (success and failure) the 583 in-stances of liver patient dataset with 10 attributes are dis-tributed into 2 classes (liver patient and not) the 299 instancesof heart patient dataset with 12 attributes are divided into 2classes (heart failure and not) the 400 instances of chronickidney disease with 24 attributes are divided into 2 classes(disease found and not) the 1151 instances of diabetic reti-nopathy dataset with 18 attributes are distributed into 2classes (diabetic retinopathy and not) and the 748 instancesof blood transfusion dataset with 4 attributes are divided into2 classes (donating blood and not)

52 Performance Factors )e performance of proposedCBO-IE approach is evaluated in terms of intraclusterdistance purity index standard deviation root mean squareerror accuracy and F-Measure

521 Intracluster Distance Firstly the distances betweendata points are evaluated within a cluster After that theaverage of these distances is generated representing anintracluster distance)e optimal clustering is achieved withleast intracluster distance For a cluster the mean distance isevaluated between a centroid and total data points )is stepis continuously performed for every cluster and at last meanvalue of entire clustersrsquo intracluster distances is obtained

Table 2 illustrates that the CBO-IE generates leastintracluster distance value for all IoT datasets )e CBO-IEobtains 90 superior results than BO 92 superior results

STARTInitialize the parameters GM 1 Tmaximum 1 PS and Max_iterationInitialize the populations (arbitrary group of habitats) H1 H2 Hs

P

Evaluate fitness value (HSI) for every habitatWHILE (ending condition is not found)Evaluate αa βa and Ta for every habitatObtain a random isin (0 1)MigrationFOR every habitat (from minimum to maximum HSI values)Choose habitat Ha (SIV) stochastically proportional to βaIF randomlt βa and Ha (SIV) choose then

Choose habitat Hb (SIV) stochastically proportional to αaIF randomlt αa and Hb (SIV) choose thenHa (SIV)Hb (SIV)

END IFEND IF

END FORMUTATIONChoose Ha (SIV) with the help of mutation possibility proportional SaIF randomltTa thenArbitrary change the SIVs in Ha (SIV)

END IFEvaluate HSI value

END WHILESTOP

ALGORITHM 1 Biogeography-Based Optimization

Scientific Programming 5

than ALO 94 superior results than PSO 96 superiorresults than GA and 98 superior results than K-Means interms of intracluster distance for entire healthcare IoTdatasets )e average ranking of all approaches is evaluatedon the basis of minimum to maximum intracluster distance(from 1 to 6)

Figure 2 represents the better rank of proposed CBO-IEwith least intracluster distance values against other approachesK-Means GA PSO ALO and BO for all IoT datasets

522 Standard Deviation )e stiff data clustering in theregion of average value is described by a numerical char-acteristic called standard deviation (DS) )e best clusteringis achieved with less standard deviation which is obtained bythe following equation

DS

1113936(V minus V)

|H|

1113971

(13)

where |H| is dataset length V is dataset values (points) andV is dataset average value

Table 3 describes that the CBO-IE obtains less standarddeviation value for all IoT datasets )e CBO-IE generates93 better quality outcomes than BO 95 better qualityoutcomes than PSO and ALO 97 better quality outcomesthan GA and 98 better quality outcomes than K-Means interms of standard deviation for entire healthcare IoTdatasets

Figure 3 shows the better standard deviation of proposedCBO-IE against other approaches K-Means GA PSO ALOand BO for all IoT datasets

Initialize Particulars and entropy of everyelements

Initialize the population with the help of chaosfunction

Calculate Fitness values of particulars

Iteration = 0

Perform the information entropy basedmigration process (Algorithm 2)

Calculate the fitness values of new particulars

iteration = iteration + 1

Is ending situationsatisfied

NoYes

Result the optimal solution

End

Start

Figure 1 Flowchart of the CBO-IE approach

6 Scientific Programming

523 Purity Index )e correctness of clustering strategy isknown as purity in which accurate classification is per-formed over data elements Hence entire points of anisolated class can be accurately allocated to an isolatedcluster Purity index (IP) is generated with the help of purityby equations (14) and (15) Maximum purity is achieved withmaximum IP value nearer to 1

purity Rz( 1113857 maximum Rwz

111386811138681113868111386811138681113868111386811138681113872 1113873

Rz

11138681113868111386811138681113868111386811138681113868

(14)

IP

1113944K

z1

Rz

11138681113868111386811138681113868111386811138681113868Purity Rz( 11138571113872 1113873

|H| (15)

ALGORITHM 2 Information Entropy-based migration process in the BO approach

Table 1 Healthcare IoT datasets

Sr no Healthcare IoT dataset No of instances No of attributes No of clusters1 )oracic surgery 470 17 22 Breast cancer 699 10 23 Cryotherapy 90 7 24 Liver patient 583 10 25 Heart patient 299 12 26 Chronic kidney disease 400 24 27 Diabetic retinopathy 1151 18 28 Blood transfusion 748 4 2

Table 2 Average ranking of all algorithms based on intracluster distance

Dataset K-Means GA PSO ALO BO CBO-IE)oracic surgery 3364 (5) 07569 (4) 04256 (4) 02547 (3) 00657 (2) 000652 (1)Breast cancer 26265 (5) 08634 (4) 03126 (4) 02614 (3) 00874 (2) 000728 (1)Cryotherapy 243640 (5) 04562 (4) 02864 (4) 02167 (3) 00725 (2) 000548 (1)Liver patient 353245 (5) 05863 (4) 01567 (4) 01321 (3) 00635 (2) 000457 (1)Heart patient 142366 (5) 09362 (4) 07265 (4) 06492 (3) 00623 (2) 000832 (1)Chronic kidney disease 463625 (5) 05682 (4) 03568 (4) 02864 (3) 00742 (2) 000786 (1)Diabetic retinopathy 03658 (5) 02526 (4) 01737 (4) 01421 (3) 00967 (2) 000758 (1)Blood transfusion 06745 (5) 05684 (4) 02375 (4) 02068 (3) 00854 (2) 000852 (1)Average ranking (AKθ) 5 4 4 3 2 1

Scientific Programming 7

where Purity(Rz) is purity of zth cluster |Rz| is zth clusterlength |Rwz| is the number of data elements of wth classallocated to zth cluster

Table 4 explains that the CBO-IE generates maximumpurity index value for all IoT datasets )e CBO-IE obtains5 superior outputs than BO 7 superior outputs than

ALO 8 superior outputs than PSO 12 superior outputsthan GA and 15 superior outputs than K-Means in termsof purity index for entire healthcare IoT datasets

Figure 4 represents the better quality purity index ofproposed CBO-IE against other approaches K-Means GAPSO ALO and BO for all IoT datasets

524 F-Measure Firstly Precision (prsn) and Recall (rel)are evaluated to repossess the information by equations (16)and (17) After that both are combined to formalize the F-Measure (FM) by utilizing equations (18) and (19)

Table 3 Standard deviation for entire IoT datasets

Dataset K-Means GA PSO ALO BO CBO-IE)oracic surgery 0116857 003984 0012036 001452 0007856 0000524Breast cancer 0002578 0168742 012542 015713 0003657 0000746Cryotherapy 00025143 0165875 0001254 0001674 0001036 0000845Liver patient 0256323 017589 0147856 012312 0096875 0000986Heart patient 0212678 0096852 0042574 0021563 0003687 0000251Chronic kidney disease 0020233 0103625 009854 0065214 0006574 0000865Diabetic retinopathy 0086574 0186523 0142657 0102647 0086478 00008657Blood transfusion 0213658 0196875 0185632 0152461 0010365 0000236

0123456

Datasets

Average Ranking

Thor

acic

Sur

gery

Brea

t Can

cer

Cryo

ther

apy

Live

r Pat

ient

Hea

rt P

atie

nts

Chro

nic K

idne

y

Dia

betic

Bloo

d Tr

ansfu

sionIn

tra-c

luste

r Dist

ance

K-MeansGAPSO

ALOBOCBO-IE

Figure 2 Average ranking based on intracluster distance

0005

01015

02025

03

Stan

dard

Dev

iatio

n Standard Deviation

Datasets

Thor

acic

Sur

gery

Brea

st Ca

ncer

Cryo

ther

apy

Live

r Pat

ient

Hea

rt P

atie

nts

Chro

nic K

idne

y

Dia

betic

Bloo

d Tr

ansfu

sion

K-MeansGAPSO

ALOBOCBO-IE

Figure 3 Standard deviation for all IoT datasets

Table 4 Purity index for entire IoT datasets

Dataset K-Means GA PSO ALO BO CBO-IE)oracic surgery 083 086 089 089 091 093Breast cancer 086 088 090 089 093 095Cryotherapy 084 087 090 091 093 096Liver patient 086 087 088 086 090 092Heart patient 080 083 086 086 088 092Chronic kidneydisease 086 087 090 089 091 093

Diabetic retinopathy 088 090 091 091 093 095Blood transfusion 076 078 081 083 086 088

8 Scientific Programming

prsn(w z) Rwz

11138681113868111386811138681113868111386811138681113868

Rz

11138681113868111386811138681113868111386811138681113868

(16)

rel(w z) Rwz

11138681113868111386811138681113868111386811138681113868

Rw

11138681113868111386811138681113868111386811138681113868 (17)

FM(w z) 2 times prsn(w z) times rel(w z)

prsn(w z) + rel(w z) (18)

FM 1113944K

w1

Rw

11138681113868111386811138681113868111386811138681113868

|H|maximum FM(w z) (19)

where |Rw| is wth class lengthTable 5 describes that the CBO-IE obtains higher F-

Measure value for all IoTdatasets )e CBO-IE generates 4better outputs than BO 7 better outputs than ALO 8better outputs than PSO 13 better outputs than GA and16 better outputs than K-Means in terms of F-Measure forentire healthcare IoT datasets

Figure 5 shows the better F-Measure of proposed CBO-IE against other approaches K-Means GA PSO ALO andBO for all IoT datasets

525 Root Mean Square Error (RMSE) )e RMSE is de-fined as a divergence between predicted values and exper-imental (calculated) values )e best clustering is achievedwith minimum RMSE values of datasets It is evaluated byutilizing the following equation

RMSE

1|H|

1113944

|H|

j1Vj minus 1113954Vj1113872 1113873

2

11139741113972

(20)

where Vj is jth element value in dataset and 1113954Vj is j

th elementpredicted value in dataset

Table 6 explains that the CBO-IE generates least RMSEvalue for all IoT datasets )e CBO-IE generates 8 betterresults than BO 78 better results than ALO 81 better

results than PSO 87 better results than GA and 95 betterresults than K-Means in terms of RMSE for entire healthcareIoT datasets

Figure 6 shows the better RMSE of proposed CBO-IEagainst other approaches K-Means GA PSO ALO and BOfor all IoT datasets

526 Accuracy It evaluates the division of clusters that areaccurate (ie it evaluates proportion of verdicts that areexact) and describes the portion of clusters in the prevailinggroup

002040608

1

Purit

y In

dex

Purity Index

Datasets

Thor

acic

Sur

gery

Brea

t Can

cer

Cryo

ther

apy

Live

r Pat

ient

Hea

rt P

atie

nts

Chro

nic K

idne

y

Dia

betic

Bloo

d Tr

ansfu

sion

K-MeansGAPSO

ALOBOCBO-IE

Figure 4 Purity index for all IoT datasets

Table 5 F-Measure for entire IoT datasets

Dataset K-Means GA PSO ALO BO CBO-IE)oracic surgery 081 084 087 087 089 091Breast cancer 084 086 087 088 090 093Cryotherapy 082 086 088 087 091 093Liver patient 084 086 087 087 089 091Heart patient 078 082 084 083 085 088Chronic kidneydisease 084 086 088 087 089 091

Diabetic retinopathy 086 089 090 091 092 094Blood transfusion 074 077 079 082 085 087

002040608

1

DatasetsTh

orac

ic S

urge

ry

Brea

t Can

cer

Cryo

ther

apy

Live

r Pat

ient

Hea

rt P

atie

nts

Chro

nic K

idne

y

Dia

betic

Bloo

d

F-M

easu

re

F-Measure

K-MeansGAPSO

ALOBOCBO-IE

Figure 5 F-Measure for all IoT datasets

Table 6 RMSE values for entire IoT datasets

Dataset K-Means GA PSO ALO BO CBO-

IE)oracic surgery 0537 0327 0157 0124 00124 00034Breast cancer 0567 0386 0168 0264 00317 00075Cryotherapy 0621 0374 0234 0364 00421 00066Liver patient 0437 0361 0261 0213 00652 00054Heart patient 0537 0412 0201 0186 00145 00063Chronic kidneydisease 0610 0422 0341 0267 00254 00047

Diabeticretinopathy 0517 0414 0367 0257 00374 00087

Blood transfusion 0438 0314 0120 0265 00451 00071

Scientific Programming 9

accuracy 1113936

Kz1 Rz

|H|lowast 100 (21)

Table 7 represents that the CBO-IE obtains higher ac-curacy value for all IoT datasets )e CBO-IE has 3 su-perior outputs than BO 5 superior outputs than ALO 6superior outputs than PSO 9 superior outputs than GAand 13 superior outputs than K-Means in terms of ac-curacy for entire healthcare IoT datasets

Figure 7 represents the better accuracy of proposedCBO-IE against other approaches K-Means GA PSO ALOand BO for all IoT datasets

)e Information Entropy is used with BO approach forclustering to provide accurate and efficient distribution ofdata points in dataset )e chaos theory is utilized to ini-tialize the population of BO to improve the searching ca-pability of habitat which helps in cluster member selectionprocess more precisely )ese two strategies InformationEntropy and chaos theory have improved the performanceof BO to provide the best selection of cluster heads and theirmembers optimally So CBO-IE generates superior resultsthan BO ALO PSO GA and K-Means clusteringapproaches

53 Running Time Complexity of the CBO-IE Approach)e number of executions is directly related to timecomplexity of clustering approaches to run )e

complexity is calculated by utilizing few circumstances asfollows PS is population size L is number of elements indataset Lx represents dimensions of particulars (ele-ments) K is number of cluster centroids and Mi repre-sents maximum iterations )e cost of every execution isassumed as 1 unit )e Cost of all Executions (CE) isobtained to utilize Algorithm 3 by the followingequations

001020304050607

RMSE

Val

ues

RMSE Values

Datasets

Thor

acic

Sur

gery

Brea

st Ca

ncer

Cryo

ther

apy

Live

r Pat

ient

Hea

rt P

atie

nts

Chro

nic K

idne

y

Dia

betic

Bloo

d Tr

ansfu

sion

K-MeansGAPSO

ALOBOCBO-IE

Figure 6 RMSE for all IoT datasets

Table 7 Accuracy values for entire IoT datasets

Dataset K-Means GA PSO ALO BO CBO-IE)oracic surgery 080 083 086 086 088 090Breast cancer 083 085 086 087 089 092Cryotherapy 081 085 087 086 090 092Liver patient 083 085 086 086 088 090Heart patient 077 081 083 082 084 087Chronic kidney disease 083 085 087 086 088 090Diabetic retinopathy 085 088 089 090 091 093Blood transfusion 073 076 078 081 084 086

002040608

1

Accu

racy

()

Accuracy

Datasets

Thor

acic

Sur

gery

Brea

st Ca

ncer

Cryo

ther

apy

Live

r Pat

ient

Hea

rt P

atie

nts

Chro

nic K

idne

y

Dia

betic

Bloo

d Tr

ansfu

sion

K-MeansGAPSO

ALOBOCBO-IE

Figure 7 Accuracy for all IoT datasets

10 Scientific Programming

ALGORITHM 3 )e complete CBO-IE approach

Scientific Programming 11

CE 1 + L + 1 + PS

+ Mi + 1 + Mi times PS

+ Mi times PS

+ Mi times 2 times PS

times Lx + 3 times PS

+ 11113872 1113873

+ Mi times K times PS

+ 11113872 1113873 + Mi times K times PS

+ Mi times K times PS

+ Mi times K times PS

+ Mi times K times L + Mi times K times L + Mi times K times L

(22)

CE 2 times Mi times PS

times Lx + 4 times Mi times K times PS

+ 3 times Mi times K times L + 5 times Mi times PS

+ Mi times K + L + PS

+ 2 times Mi + 3 (23)

Supposing that all circumstances are equal in equation(23) in worst case equation (24) is generated as follows

CE 9n3

+ 6n2

+ 4n + 3 (24)

)e running time complexities of clustering approachesare O(n3) for CBO-IE O(n3) for BO O(n3) for ACO O(n3)for PSO O(n3) for ALO and O(n2) for K-Menas in worstcase Hence all are executable in polynomial time

54 Statistical Examination Measurement A statistical ex-amination is functioned to obtain the extension of signifi-cance dissimilarities in the effectiveness of clusteringtechniques Here a nonparametric Friedman Examination(FE) is utilized to discover dissimilarities amid the group ofserial appropriate variables )e entire clustering techniquesprovide equally effective explaining in the null hypothesis(Y0)

)e Friedman Examination (FE) is formulized by

FE 12 times N

DS

AC

AC

+ 11113872 11138731113944

5

θ1AKθ( 1113857

2minus

AC

times AC

+ 11113872 11138732

4⎡⎢⎢⎢⎢⎣ ⎤⎥⎥⎥⎥⎦ (25)

where NDS is number of datasets AKθ is average rank of θth

approach and AC is number of clustering approaches)e FE critical value is 204925 taken from F-distribution

table [41] with (AC-1) and (ACndash1)lowast(NDSndash1) freedom degreewhich is obtained between (6ndash1) 5 and (6-1)lowast(8-1) 35 forapplying 6 clustering approaches (AC 6) over 8 datasets(NDS 8) with λ 010 (confidence stage) )e calculated FEvalue is higher as compared to the critical value for nullhypothesis (Y0) rejection if Y0 is not accepted)e evaluatedFE value is 1714 for initiating 6 clustering approaches(AC 6) over 8 datasets (NDS 8) with λ 010 Hence thecalculated FE is higher as compared to the critical FE andthen Y0 is rejected So it is to be summarized that entireclustering approaches are not equally effective

)erefore a post hoc examination is performed usingHolm strategy)e proposed CBE-IE is analyzed statisticallyagainst other clustering approaches in this examinationFirstly the z value is obtained from equation (24) and after

that probability (p) is generated with the help of z value andnormal distribution table [42] At last pj value is analyzedwith (λ(AC minus j)) (Table 8)

Z AKj minus AKθ

δ (26)

where

δ

AC

AC

+ 11113872 1113873

6 times NDS

1113971

(27)

Table 8 illustrates that the (λ(AC minus j)) value is higher ascompared to pj value this indicates the hypothesis rejectionfor entire cases Hence the proposed CBO-IE is superior inclustering as compared to the K-Means GA PSO and BOapproaches according to the above analysis

6 Conclusion

Various fields like medicine education and industriesutilize data mining for their useful applications )egrouping of data is performed on data clustering which is aspecific task in data mining to examine the database effi-ciently In this work a Chaotic Biogeography-Based Opti-mization approach using Information Entropy (CBO-IE) isproposed to obtain data clusters for healthcare IoT datasets)e Information Entropy is introduced with a BO approachto generate a better distribution of data points in the datasetand chaos theory is utilized to initialize the population of BOapproach )e Information Entropy and chaos theory arecombined with BO approach to generate optimal clusterheads and cluster members with enhanced convergencespeed of BO in huge search area )eMATLAB 2021a tool isused to implement the CBO-IE for eight healthcare IoTdatasets and the outcomes describe the better quality effi-ciency of CBO-IE on the basis of F-Measure intraclusterdistance running time complexity purity index statisticalanalysis standard deviation root mean square error andaccuracy as compared to previous techniques of clusteringlike K-Means GA PSO ALO and BO approaches )eFriedman Examination and Holm strategy are introduced to

Table 8 Holm strategy outputs

J Approaches z value p value (λ(AC minus j)) Hypothesis1 K-Means minus63245 000001 0020 Rejected2 GA minus50596 000001 0025 Rejected3 PSO minus37947 00001 0033 Rejected4 ALO minus25298 00057 005 Rejected5 BO minus13649 00861 010 Rejected

12 Scientific Programming

perform statistical analysis of proposed CBO-IE againstprevious techniques of clustering like K-Means GA PSOALO and BO approaches which represent that the CBO-IEgenerates 90 accurate results In future the proposedtechnique will be anticipated to estimate and be authorizedwith huge databases under big data Additionally a cross-layer communication will be anticipated to be offered andlegalized in IoT structural design in the future

Data Availability

)e data that support the findings of this study are availableupon request from the corresponding author

Conflicts of Interest

)e authors declare that there are no conflicts of interest

References

[1] A Haque L Khan and M Baron ldquoSAND semi-supervisedadaptive novel class detection and classification over datastreamrdquo in Proceedings of the 13th AAAI Conference on Ar-tificial Intelligence (AAAI-16) pp 1652ndash1658 Phoenix AZUSA February 2016

[2] N Mishra H K Soni S Sharma and A K UpadhyayldquoDevelopment and analysis of artificial neural networkmodels for rainfall prediction by using time-series datardquo I JIntelligent Systems and Applications vol 1 pp 16ndash23 2018

[3] H N Dai R C W Wong H Wang Z Zheng andA V Vasilakos ldquoBig data analytics for large scale wirelessnetworks challenges and opportunitiesrdquo ACM vol 52pp 1ndash46 2019

[4] A H Sabry W Z W Hasan M N MohtarR M K R Ahmad and H R Harun ldquoPlanter pressurerepetability data analysis for health adult based on EMEDsystemrdquo Malaysian Journal of Fundamental and AppliedSciences vol 14 no 1 pp 96ndash101 2018

[5] A Mensikova and C A Mattmann ldquoEnsemble sentimentanalysis to identify human trafficking in web datardquo in Pro-ceedings of the ACM Workshop on Graph Techniques forAdversarial Activity Analytics (GTA) pp 1ndash6 ACM NewYork Ny USA 2018

[6] S S Gill and R Buyya ldquoBio-inspired algorithms for big dataanalytics a survey taxonomy and open challengesrdquo Big DataAnalytics for Intelligent Healthcare Management ElsevierAmsterdam Netherlands pp 1ndash17 2019

[7] A Verma I Kaur and N Arora ldquoComparative analysis ofinformatio extraction techniques for data miningrdquo IndianJournal of Science and Technology vol 9 no 11 pp 1ndash182016

[8] B R Manju A Joshuva and V Sugumaran ldquoA data miningstudy for condition monitoring on wind turbine blades usinghoeffding tree algorithm through statistical and histogramfeaturesrdquo International Journal of Mechanical Engineering ampTechnology vol 9 no 1 pp 1061ndash1079 2018

[9] I Aytekin E Eyduran K Karadas R Akshan and I KeskinldquoPrediction of fattening final live weight from some bodymeasurements and fattening period in young bulls ofcrossbred and exotic breeds using MARS data mining algo-rithmrdquo Journal of Zoology vol 50 no 1 pp 189ndash195 2018

[10] S Ahmad and R Varma ldquoInformation extraction from textmessages using data mining techniquesrdquo Malaya Journal ofMatematik vol 5 no 1 pp 26ndash29 2018

[11] N E Oweis M M Fouad S R Oweis S S Owais andV Snasel ldquoA novel mapreduce lift assiciation riule miningalgorithm (MRLAR) for big datardquo International Journal ofAdvanced Computer Science and Applications vol 7 no 8pp 1ndash7 2016

[12] C W Tsai C F Lai M C Chiang and L T Yang ldquoDatamining for internet of things a surveyrdquo CommunicationsSurveys and Tutorials IEEE vol 16 no 1 pp 77ndash97 2014

[13] F Chen P Deng J Wan D Zhang A V Vasilakos andX Rong ldquoData mining for Internet of things literature reviewand challengesrdquo International Journal of Distributed SensorNetworks vol 2015 Article ID 365372 14 pages 2015

[14] S B Patel M N Patel and S M Shah ldquoFEA-HUIM fast andefficient algorithm for high utility item-set mining using noveldata structure and pruning strategyrdquo National Conference onAdvanced Research Trends in Information and Technologies(NCARTICT) IJSSET vol 4 no 2 pp 1ndash7 2018

[15] S Bhaskaran R Marappan and B Santhi ldquoDesign andanalysis of a cluster-based intelligent hybrid recommendationsystem for E-learning applicationsrdquo Mathematics MDPIvol 9 no 107 pp 1ndash21 2021

[16] D Giordano M Mellia and T Cerquitelli ldquoK-MDTSCK-Multi-Dimensional time-series clustering algorithmrdquoElectronics MDPI vol 10 no 1166 pp 1ndash21 2021

[17] R Mothukuri and B B Rao ldquoCluster Analysis of cyber crimedata using Rrdquo International Journal of Computer Science andMobile Applications vol 6 no 2 pp 62ndash70 2018

[18] A N Onaizah ldquoA novel data stream clustering algorithm inhealthcare IoTrdquo International Journal of Research in AppliedNatural and Social Sciences (IJRANSS) vol 5 no 6 pp 51ndash602017

[19] R Krishnamurthi A Kumar D Gopinathan A Nayyar andB Qureshi ldquoAn overview of IoT sensor data processingfusion and analysis techniquesrdquo Sensors MDPI vol 20pp 1ndash23 6076

[20] F Alam R Mehmood I Katib and A Albeshri ldquoAnalysis ofeight data mining algorithms for smarter Internet of things(IoT)rdquo in Proceedings of the International Workshop on DataMining in IoT Systems (DaMIS) pp 437ndash442 ElsevierLondon UK September 2016

[21] A M C Souza and J R A Amazonas ldquoAn outlier detectalgorithm using big data processing and Internet of thingsarchitecturerdquo in Proceedings of the International Workshop onBig Data and Data Mining Challenges on IoT and PervasiveSystems (BigD2M) pp 1010ndash1015 Elsevier London UK June2015

[22] D Gil M Johnsson H Mora and J Szymanski ldquoReview ofthe complexity of managing big data of the Internet of thingsrdquoComplexity vol 2019 Article ID 4592902 12 pages 2019

[23] M S Mahdavinejad M Rezvan M Barekatain P AdibiP Barnaghi and A P Sheth ldquoMachine Learning for Internetof things data analysis a surveyrdquoDigital Communications andNetworks vol 4 pp 161ndash175 2018

[24] M Ruta F Scioscia G Loseto A Pinto and E D SciascioldquoMachine learning in the Internet of things a semantic-en-hanced approachrdquo Semantic Web vol 10 pp 1ndash21 2017

[25] K Demertzis K Rantos and G Drosatos ldquoA dynamic in-telligent policies analysis mechanism for personal data pro-cessing in the IoT ecosystemrdquo Big Data And CognitiveComputing vol 4 no 9 pp 1ndash16 2020

Scientific Programming 13

[26] N Joseph and B S Kumar ldquoTop-K competitor trust miningand customer behavior investigation using data miningtechniquerdquo Journal of Network Communications andEmerging Technologies (JNCET) vol 8 no 2 pp 26ndash30 2018

[27] M P Dooshima E N Chidozie B J Ademola O O Sekoniand I P Adebayo ldquoA predictive model for the risk of mentalillness in Nigeria using data miningrdquo International Journal ofImmonology vol 6 no 1 pp 5ndash16 2018

[28] B A Tama and S Lim ldquoA comparative performance eval-uation of classification algorithms for clinical decision sup-port systemsrdquo Mathematics MDPI vol 8 no 1814 pp 1ndash252020

[29] C C N Wang I S Chang P C Y Sheu and J J P TsaildquoApplication of semantic computing in cancer on secondarydata analysisrdquo in Proceedings of the 2nd International Con-ference on Robotic Computing IEEE pp 407ndash413 LagunaHills CA USA February 2018

[30] K Ravikumar and A R Kannan ldquoSpatial data mining forprediction of natural events and disaster management basedon fuzzy logic using hybrid PSOrdquo Taga Journal vol 14pp 858ndash878 2018

[31] B M M Alom and M Courtney ldquoldquoEducational data mininga case study perspectives from primary to university educa-tion in Australiardquo International Journal of InformationTechnology and Computer Science vol 2 pp 1ndash9 2018

[32] S Celik and O Yilmaz ldquoPrediction of body weight of Turkishtazi dogs using data mining techniques classification andregression tree (CART) and multivariate adaptive regressionsplines (MARS)rdquo Journal of Zoology vol 50 no 2pp 575ndash583 2018

[33] W Ugulino D Cardador K Vega E Velloso R Miliditi andH Fuks ldquoWearable computing accelerometers data classi-fication of body postures and movementsrdquo in Proceedings ofthe 21st Brazilian Symposium on Artificial Intelligence Ad-vances in Artificial Intelligence Lecture Notes in ComputerScience pp 52ndash61 Springer Stuttgart Germany September2012

[34] M B Chandak ldquoRole of big-data in classification and novalclass detection in data streamsrdquo Journal of Big Data Springervol 23 no 5 pp 1ndash9 2016

[35] M Gupta K K Gupta and P K Shukla ldquoSession key basedfast secure and lightweight image encryption algorithmrdquoMultimedia Tools and Applications vol 80 pp 10391ndash104162021

[36] H S Pannu D Singh and A K Malhi ldquoMulti-objectiveparticle swarm optimization-based adaptive neuro-fuzzy in-ference system for benzene monitoringrdquoNeural Computing ampApplications vol 31 pp 2195ndash2205 2019

[37] D Singh J Singh and A Chhabra ldquoHigh availability ofclouds failover strategies for cloud computing using inte-grated check pointing algorithmsrdquo in Proceedings of the 2012International Conference on Communication Systems andNetwork Technologies pp 698ndash703 Rajkot India May 2012

[38] D Singh and V Kumar ldquoA comprehensive review of com-putational dehazing techniquesrdquo Archives of ComputationalMethods in Engineering vol 26 pp 1395ndash1413 2019

[39] R Gupta ldquoPerformance analysis of anti-phishing tools andstudy of classification data mining algorithms for a novel anti-phishing systemrdquo International Journal of Computer Networkand Information Security vol 12 pp 70ndash77 2015

[40] R Bhatt P Maheshwary P Shukla P Shukla M Shrivastavaand S Changlani ldquoImplementation of fruit fly optimizationalgorithm (FFOA) to escalate the attacking efficiency of nodecapture attack in wireless sensor networks (WSN)rdquo Computer

Communications vol 149 pp 134ndash145 2020 ISSN 0140-3664

[41] F Distribution Table httpwwwsocruclaeduappletsdirf_tablehtml 2018

[42] Normal Distribution Table httpmatharizonaedusimrsimsma464standardnormaltablepdf

14 Scientific Programming

Page 2: CBO-IE:ADataMiningApproachforHealthcareIoTDataset ...

similar type of crime patterns to reduce the crime ratio [17])e healthcare data are also received by IoT devices in theform of data stream and divided into cluster by using streamclustering combining the cluster building and merging steps[18]

In above analysis the clustering of data is an extremelychallenging issue while it creates baffling to select the op-timal clustering strategy in nature Additionally it is to be anincredibly exigent attempt as every dataset is not expected tobe consistent allowing for the fact that infection form andprevious medical circumstance area might differ immenselyin training Previously having a meticulous prophecyscheme does not constantly offer a valuable height of ex-actness beneath entire healthcare application areas due tomostly depending on the situation utilized

A lot of clustering algorithms are introduced on variousdatasets to enhance the data extraction Several optimizationapproaches like GA ALO and PSO are also described forclustering Although few clustering techniques realize anadvanced output by means of a specified dataset the effi-ciency of such clustering techniques might be comple-mentary on other datasets )e nature of the clusteringmethods is steady by means of no-free-lunch theorem but athand no solitary clustering method survives which is able tobe a cure for entire problems )is means all the problemsare not resolved by any algorithm and convergence speed isalso a major issue in optimization techniques which areused for data clustering

Here a Chaotic Biogeography-Based Optimization ap-proach using Information Entropy (CBO-IE) is imple-mented for clustering over healthcare IoT datasets which isimproved form of Biogeography-Based Optimization (BO)approach In its place of basically performing a qualitativeexamination by utilizing a methodical mapping analysisregarding prior implemented works this proposed workcontributes on a quantitative examination of clusteringstrategy for healthcare dataset

)e contribution of implementing CBO-IE is as follows

(1) )e major intent of CBO-IE is to obtain accuratedata point distribution in dataset by utilizing In-formation Entropy strategy and to give initial valuesof the population by utilizing chaos theory

(2) Both Information Entropy and chaos theory areintroduced to enhance the convergence speed of BOin exploration field globally for generating the clusterheads and cluster members more precisely

(3) )e CBO-IE is applied over eight healthcare IoTdatasets and the outputs are evaluated on the basis ofF-Measure intracluster distance running timecomplexity purity index statistical analysis stan-dard deviation root mean square error and accuracyas compared to previous techniques of clustering likeK-Means GA PSO ALO and BO approaches

)e remainder of the paper is organized as followsSection 2 explains the literature survey of data miningtechniques big data processing and IoT dataset processingby utilizing numerous parameters Section 3 describes the

BO approach and the proposed CBO-IE is explained inSection 4 briefly according to flowchart algorithms andpreliminaries )e datasets performance factors and ex-perimental results are illustrated in Section 5 and at lastconclusions are given in Section 6

2 Literature Survey

)e IoT [19] is situated to transfigure entire future conse-quently for lives )e information extracted from the IoTsystems will be explored to recognize and organize multi-faceted surroundings about users fitting superior decisioncreation larger computerization advanced effectivenessefficiency exactness and prosperity production )is workexplores the accessibility of multiple popular data miningapproaches for IoT information )e results are evaluatedover artificial intelligence and neural network with superioraccuracy efficiency and velocity in comparison to priortechniques [20]

)e K-Means approach is applied for processing the bigdata with the help of Hadoop platform over Internet of)ings (IoT) aspects )e data from IoT devices are com-bined or distributed in several groups known as clusterswhich are used in various smart applications such asmedical disaster and industries applications )e powerexpenses and traffic over communication network was re-duced to use only necessary information not whole rawdata In actuality it supplies enormous flexibility permittingthe formation of clusters having millions of Hadoop illus-trations [21]

)is big data has raised the concept of space and timecomplexity due to storing and processing huge amount ofdata over mobile and dynamic network )e ExtensibleMark-up Language (XML) and machine learning methodsapply to big data processing like classification clusteringand preprocessing of IoT information [22 23] Latest IoTapplications and performances remain further on an in-tellectual indulgence of the surroundings from informationextracted by means of assorted sensors and small machinesHere a framework combines an ontology dependent de-scription of information dispensations by means of unusuallogic for pretty beneficial incident recognition by seasoningthe specific classification strategy of machine learning Aroad and traffic examination is performed to verify theresults of framework [24]

)e personal data of users are digitally distributed overnetwork through smart IoT devices and multiple artificialtechniques are introduced to control the digitally transferredinformation following few rules and policies An intellectualpolice examination strategy is introduced discovering thecontradictory policy or agreements of users over advocateplatform An intellectual decision-making method is appliedwith the help of fuzzy cognitive maps [25]

)e business is directly or indirectly related to customerbehaviour evaluated by trust mining )e trust of sellers isevaluated by dynamic clustering where data is modified dayby day over huge network area )e quick development ofonline shopping details illustrates a growing issue for cus-tomers who are comprised to select faithful sellers and

2 Scientific Programming

successful seller selections from numerous existing recordbeginning e-commerce usage areas to enhance the businessA dynamic clustering method is adopted to calculate thetrust of customers and differentiate the buying equivalenceof customers for predicting the customer behaviour )ereal-time and artificial datasets are utilized to performclustering and evaluate the results based on accuracy [26]

)e medical field [27] also well utilizes the miningtechniques like classification to identify the mental healthand evaluate the risk to manage the system )is classifi-cation is done by decision trees to discover and classify thepatients according to their mental health Here a relativecalculation of a huge array classifier is associated with severalareas like tree group neural possibility categorization andpolicy dependent classifiers )e linear and random classi-fiers are introduced to perform disease classification [28]

)e cancer disease in unstructured format is well clas-sified to discover several types of cancer disease It generatesvaluable results in terms of risk aspects treatment andmanagement )e text mining is well exploited in cancerdisease prediction like lung breast and ovarian cancer )edata about cancer patients is collected from several het-erogeneous resources and text mining is performed toprovide various cancer related information like survivaltreatment and risk of disease [29]

)e geographical data is collected from several hetero-geneous resources and data mining is initialized over thespatial data to manage the disaster )e natural hazards arepreviously identified before happening this is achieved byspatial data mining over huge geographic information )eseveral geographical information such as soil ocean earthrsquoscrust and air quality is used to employ the mining processwith the help of PSO and fuzzy logics )e clusters aredecided on the basis of spatial information and naturalbehaviours in environments [30]

)e data mining based educational information ex-traction is a new era of research in which distributed ed-ucational data is compiled and used for several purposes)estudents teachers and other staffs have played an importantrole in educational data mining )e educational data iscollected from several resources and saved in tabular format)ese student data can be further analyzed by university andexamined by data mining model to evaluate the results ofstudents and maintain the records in appropriate manner[31]

)e classification is also performed over data streamscombining the text music and video information with thehelp of decision trees and naive structure)ese data streamsalso combined the human activities like movement of handsand legs [32 33])ere are various key issues of classificationlike infinite size perception development and perceptionflow and characteristic assessment Various researchersmainly solved the issues of infinite size and perceptiondevelopment but the perception flow and characteristicassessment are not taken into consideration by researchersHere these two issues are removed in implementing aclassification method over big data and parameters like errorrate and running time are obtained to provide superiorefficiency of the method [34]

3 The Biogeography-Based Optimization(BO) Approach

ABiogeography-Based Optimization (BO) is a metaheuristicapproach based on island biogeography premise whichrelates with the migration evolution and destruction of thecastes in a habitat A Habitat Suitability Index (HSI) isutilized to generate the best solution globally and share thecharacteristics with week habitat )e population is en-hanced by taking the optimal solutions from prior iterationfrom emigrating [35] to immigrating habitats by using themigrating Suitability Index Variables (SIVs) in BOmigratingprocess A fresh characteristic is obtained in complete so-lution area using fitness function [36] and replaced everyhabitatrsquos SIVs arbitrarily and stochastically to enhance thepopulation [37] miscellany and searching strength in BOmutation process

31 Migration Process In migration stochastic processseveral parents can be utilized for a particular offspring andevery habitat (Ha) is modified by receiving the SIVs from asuperior HSI habitat in population (PS) )e migration ratesare straightly dependent on the caste number in a habitat forenhancing [38] the habitat miscellany and linear migration isevaluated by using the following equation

Ha(SIV) Hb(SIV) (1)

where Hb(SIV) is a superior HSI bth habitat (Hb) to transferthe SIV value to the ath habitat (Ha)

)e emigration rate (α) and immigration rate (β) areevaluated for c castes in habitat by using the two followingequations

αc Gc

Nmaximum (2)

βc M times 1 minus αc( 1113857 (3)

Here G and M are highest emigration and immigrationrates respectively and Nmaximum is maximum realizablenumber of castes utilized by habitat

)ere are six models developed for migration phase outof which sinusoidal model performed superior migration ascompared to the other five models )is model is evaluatedfor ath habitat by using the two following equations

βa M

21 + cos

aπNmaximum

1113888 11138891113888 1113889 (4)

αa G

21 minus cos

aπNmaximum

1113888 11138891113888 1113889 (5)

32 Mutation Process A mutation is a stochastic operatorenhancing the population miscellany to obtain optimalsolution [39] )e mutation process is utilized for updatingat least one arbitrary chosen SIV of a solution with the help

Scientific Programming 3

of mutation rate (Ta) and the previous possibility of sub-sistence (Sa) by using the following equation

Ta Tmaximum 1 minusSa

Smaximum1113888 1113889 (6)

where Tmaximum is the highest mutation rate term illustratedby user and Smaximum is the highest possibility of castes count

After that the mutation operator is improved by utilizingCauchy model to provide optimal exploration strength ofBO in huge searching area by reducing potential limitationof mutation strategy )e Cauchy distribution is evaluatedwith the help of probability density function which isformulized by the following equation

F(X 0 1) 1

π 1 + X2

1113872 1113873 (7)

Hence the Cauchy mutation equation is described byusing the following equation

Ha(SIV) minimum Ha(SIV)( 1113857 + maximum Ha(SIV)( 1113857(

minus minimum Ha(SIV)( 11138571113857 times π times F Ha(SIV) 0 1( 1113857

(8)

where Ha(SIV) is ath habitat F(Ha(SIV) 0 1) is Cauchydistribution (Algorithm 1)

4 TheChaotic BOApproachUsing InformationEntropy (CBO-IE)

An updated BO approach is proposed using the chaoticbehaviour and Information Entropy )e BO approach iswell suitable for exploration and exploitation in hugesearching space and generates the efficient results for nu-meric optimization [40] Still the numeric optimization isreasonably different from data clustering mechanism Heresome intrinsic characteristics are investigated and a chaoticBO is implemented using Information Entropy to apply theBO for data clustering

41 Element Information Entropy )e data points are dis-tributed in multidimensional area in data clustering strategy)e confusion degree of a probabilistic variable is calculatedby utilizing Information Entropy after that this is utilizedfor elements measurements to calculate the distribution)eentropy (Ea) is evaluated to isolate the element values withestimation of every value to its closer integer by using thefollowing equation

Ea minus 1113944

maxa

bmina

Rblog Rb( 1113857 (9)

where Ea is the entropy of ath element (a 1 2 3 L) and Lis number of elements in dataset mina and maxa are lowestand highest integers after discretization of element values Rbis bth element integer percentage value

)emigration procedure of BO population is guarded byusing element entropy hence maximum entropy is evalu-ated for every element in dataset by the following equation

maximum Ea( 1113857 minuslog1

maxa minus mina + 11113888 1113889 (10)

where maximum (Ea) is maximum entropy of ath element indataset

At last the normalized entropy for every element iscalculated by the following equation

nomalized Ea( 1113857 Ea

maximum Ea( 1113857 (11)

where normalized (Ea) is normalized entropy of ath elementin dataset

Equation (11) is repeated for all data elements in datasetand normalized entropy set (normalized (E)) is generated asfollows

Normalized (E) (normalized (E1) normalized(E2) normalized (EL))

42 Information Entropy-Based Migration Process in the BOApproach Here two mechanisms are introduced in mi-gration to improve the performance of BO approach )efirst one is original migration process of BO explained in theprevious section )e second one is updated mechanism oforiginal migration in which P particulars are elected arbi-trarily from present population (1lePlePS) and the oneoptimal particular is assigned as reference particular (Href ))e direction of migration is guided by Href and InformationEntropy is utilized to provide equivalency between pop-ulation miscellany and convergence speed )e higher In-formation Entropy of element indicates the maximumuncertainty which slows down the convergence speedHence the speed is enhanced for element by transferring itto the location based on reference particular globally withmaximum possibility as compared to the least InformationEntropy element

43 Chaos-Based Population Selection of the BO Approach)e chaos method is extremely related to initial circum-stances and successfully utilized for arbitrary numbergeneration using logistic function)e chaotic system is usedin the following equation

cf+1 μ times cf times 1 minus cf1113872 1113873 (12)

where μ represents constants in the range of [1 4] c rep-resents variables (c isin [0 1] f 0 1 )

)e population of BO is initialized by utilizing chaosfunction (equation (12)) which has improved the efficiencyof BO with proper use of huge solution area

44 6e Complete CBO-IE Approach for Data Mining)e Information Entropy-based migration and chaos-basedpopulation selection are combined in Chaotic Biogeogra-phy-Based Optimization using Information Entropy (CBO-IE) to solve the data mining that is data clustering inoptimal way (Figure 1) Firstly the population (PS) of BOapproach is initialized with the help of chaos function

4 Scientific Programming

(equation (12)) and every particular is denoted as a vectorwith size Lx L times K (L is number of elements in dataset Kis number of cluster centroids and Lx is dimension ofparticulars) )e K centroids locations are fixed into vectorin which 1st centroid relates with 1st L attributes 2nd centroidrelates to 2nd L attributes and so on )e primary particularvector values are generated arbitrarily and consistentlybetween lower and higher elements values in existing datasetwithin maximum number of iterations (Mi) )en CBO-IEis applied to calculate the fitness values of entire particularsby using equations (1) to (8) (Algorithm 2)

5 Results and Analysis

In this section the healthcare IoT datasets and performancefactors for proposed CBE-IE approach are explained briefly)e entire approaches are implemented with the help ofMATLAB 2021a tool using Core i3-3110M processor havingWindows 8 operating system with 2GB RAM and analyzedover 8 healthcare IoTdatasets (Table 1) )e proposed CBO-IE approach has evaluated the results for 500 iterations interms of intracluster distance purity index standard devi-ation root mean square error accuracy and F-Measure ascompared to prior algorithms like K-Means ACO PSOALO and BO over 30 independent runs

51 Healthcare IoT Datasets )e proposed CBO-IE ap-proach is applied on 8 dissimilar healthcare IoTdatasets takenfrom UCI repository (Table 1) )e datasets are thoracicsurgery breast cancer cryotherapy liver patient heart pa-tient chronic kidney disease diabetic retinopathy and bloodtransfusion)e 470 instances of thoracic surgery dataset with

17 attributes are divided into 2 classes (survive and notsurvive) the 699 instances of breast cancer dataset with 10attributes are distributed into 2 classes (benign and malig-nant) the 90 instances of cryotherapy dataset with 7 attributesare divided into 2 classes (success and failure) the 583 in-stances of liver patient dataset with 10 attributes are dis-tributed into 2 classes (liver patient and not) the 299 instancesof heart patient dataset with 12 attributes are divided into 2classes (heart failure and not) the 400 instances of chronickidney disease with 24 attributes are divided into 2 classes(disease found and not) the 1151 instances of diabetic reti-nopathy dataset with 18 attributes are distributed into 2classes (diabetic retinopathy and not) and the 748 instancesof blood transfusion dataset with 4 attributes are divided into2 classes (donating blood and not)

52 Performance Factors )e performance of proposedCBO-IE approach is evaluated in terms of intraclusterdistance purity index standard deviation root mean squareerror accuracy and F-Measure

521 Intracluster Distance Firstly the distances betweendata points are evaluated within a cluster After that theaverage of these distances is generated representing anintracluster distance)e optimal clustering is achieved withleast intracluster distance For a cluster the mean distance isevaluated between a centroid and total data points )is stepis continuously performed for every cluster and at last meanvalue of entire clustersrsquo intracluster distances is obtained

Table 2 illustrates that the CBO-IE generates leastintracluster distance value for all IoT datasets )e CBO-IEobtains 90 superior results than BO 92 superior results

STARTInitialize the parameters GM 1 Tmaximum 1 PS and Max_iterationInitialize the populations (arbitrary group of habitats) H1 H2 Hs

P

Evaluate fitness value (HSI) for every habitatWHILE (ending condition is not found)Evaluate αa βa and Ta for every habitatObtain a random isin (0 1)MigrationFOR every habitat (from minimum to maximum HSI values)Choose habitat Ha (SIV) stochastically proportional to βaIF randomlt βa and Ha (SIV) choose then

Choose habitat Hb (SIV) stochastically proportional to αaIF randomlt αa and Hb (SIV) choose thenHa (SIV)Hb (SIV)

END IFEND IF

END FORMUTATIONChoose Ha (SIV) with the help of mutation possibility proportional SaIF randomltTa thenArbitrary change the SIVs in Ha (SIV)

END IFEvaluate HSI value

END WHILESTOP

ALGORITHM 1 Biogeography-Based Optimization

Scientific Programming 5

than ALO 94 superior results than PSO 96 superiorresults than GA and 98 superior results than K-Means interms of intracluster distance for entire healthcare IoTdatasets )e average ranking of all approaches is evaluatedon the basis of minimum to maximum intracluster distance(from 1 to 6)

Figure 2 represents the better rank of proposed CBO-IEwith least intracluster distance values against other approachesK-Means GA PSO ALO and BO for all IoT datasets

522 Standard Deviation )e stiff data clustering in theregion of average value is described by a numerical char-acteristic called standard deviation (DS) )e best clusteringis achieved with less standard deviation which is obtained bythe following equation

DS

1113936(V minus V)

|H|

1113971

(13)

where |H| is dataset length V is dataset values (points) andV is dataset average value

Table 3 describes that the CBO-IE obtains less standarddeviation value for all IoT datasets )e CBO-IE generates93 better quality outcomes than BO 95 better qualityoutcomes than PSO and ALO 97 better quality outcomesthan GA and 98 better quality outcomes than K-Means interms of standard deviation for entire healthcare IoTdatasets

Figure 3 shows the better standard deviation of proposedCBO-IE against other approaches K-Means GA PSO ALOand BO for all IoT datasets

Initialize Particulars and entropy of everyelements

Initialize the population with the help of chaosfunction

Calculate Fitness values of particulars

Iteration = 0

Perform the information entropy basedmigration process (Algorithm 2)

Calculate the fitness values of new particulars

iteration = iteration + 1

Is ending situationsatisfied

NoYes

Result the optimal solution

End

Start

Figure 1 Flowchart of the CBO-IE approach

6 Scientific Programming

523 Purity Index )e correctness of clustering strategy isknown as purity in which accurate classification is per-formed over data elements Hence entire points of anisolated class can be accurately allocated to an isolatedcluster Purity index (IP) is generated with the help of purityby equations (14) and (15) Maximum purity is achieved withmaximum IP value nearer to 1

purity Rz( 1113857 maximum Rwz

111386811138681113868111386811138681113868111386811138681113872 1113873

Rz

11138681113868111386811138681113868111386811138681113868

(14)

IP

1113944K

z1

Rz

11138681113868111386811138681113868111386811138681113868Purity Rz( 11138571113872 1113873

|H| (15)

ALGORITHM 2 Information Entropy-based migration process in the BO approach

Table 1 Healthcare IoT datasets

Sr no Healthcare IoT dataset No of instances No of attributes No of clusters1 )oracic surgery 470 17 22 Breast cancer 699 10 23 Cryotherapy 90 7 24 Liver patient 583 10 25 Heart patient 299 12 26 Chronic kidney disease 400 24 27 Diabetic retinopathy 1151 18 28 Blood transfusion 748 4 2

Table 2 Average ranking of all algorithms based on intracluster distance

Dataset K-Means GA PSO ALO BO CBO-IE)oracic surgery 3364 (5) 07569 (4) 04256 (4) 02547 (3) 00657 (2) 000652 (1)Breast cancer 26265 (5) 08634 (4) 03126 (4) 02614 (3) 00874 (2) 000728 (1)Cryotherapy 243640 (5) 04562 (4) 02864 (4) 02167 (3) 00725 (2) 000548 (1)Liver patient 353245 (5) 05863 (4) 01567 (4) 01321 (3) 00635 (2) 000457 (1)Heart patient 142366 (5) 09362 (4) 07265 (4) 06492 (3) 00623 (2) 000832 (1)Chronic kidney disease 463625 (5) 05682 (4) 03568 (4) 02864 (3) 00742 (2) 000786 (1)Diabetic retinopathy 03658 (5) 02526 (4) 01737 (4) 01421 (3) 00967 (2) 000758 (1)Blood transfusion 06745 (5) 05684 (4) 02375 (4) 02068 (3) 00854 (2) 000852 (1)Average ranking (AKθ) 5 4 4 3 2 1

Scientific Programming 7

where Purity(Rz) is purity of zth cluster |Rz| is zth clusterlength |Rwz| is the number of data elements of wth classallocated to zth cluster

Table 4 explains that the CBO-IE generates maximumpurity index value for all IoT datasets )e CBO-IE obtains5 superior outputs than BO 7 superior outputs than

ALO 8 superior outputs than PSO 12 superior outputsthan GA and 15 superior outputs than K-Means in termsof purity index for entire healthcare IoT datasets

Figure 4 represents the better quality purity index ofproposed CBO-IE against other approaches K-Means GAPSO ALO and BO for all IoT datasets

524 F-Measure Firstly Precision (prsn) and Recall (rel)are evaluated to repossess the information by equations (16)and (17) After that both are combined to formalize the F-Measure (FM) by utilizing equations (18) and (19)

Table 3 Standard deviation for entire IoT datasets

Dataset K-Means GA PSO ALO BO CBO-IE)oracic surgery 0116857 003984 0012036 001452 0007856 0000524Breast cancer 0002578 0168742 012542 015713 0003657 0000746Cryotherapy 00025143 0165875 0001254 0001674 0001036 0000845Liver patient 0256323 017589 0147856 012312 0096875 0000986Heart patient 0212678 0096852 0042574 0021563 0003687 0000251Chronic kidney disease 0020233 0103625 009854 0065214 0006574 0000865Diabetic retinopathy 0086574 0186523 0142657 0102647 0086478 00008657Blood transfusion 0213658 0196875 0185632 0152461 0010365 0000236

0123456

Datasets

Average Ranking

Thor

acic

Sur

gery

Brea

t Can

cer

Cryo

ther

apy

Live

r Pat

ient

Hea

rt P

atie

nts

Chro

nic K

idne

y

Dia

betic

Bloo

d Tr

ansfu

sionIn

tra-c

luste

r Dist

ance

K-MeansGAPSO

ALOBOCBO-IE

Figure 2 Average ranking based on intracluster distance

0005

01015

02025

03

Stan

dard

Dev

iatio

n Standard Deviation

Datasets

Thor

acic

Sur

gery

Brea

st Ca

ncer

Cryo

ther

apy

Live

r Pat

ient

Hea

rt P

atie

nts

Chro

nic K

idne

y

Dia

betic

Bloo

d Tr

ansfu

sion

K-MeansGAPSO

ALOBOCBO-IE

Figure 3 Standard deviation for all IoT datasets

Table 4 Purity index for entire IoT datasets

Dataset K-Means GA PSO ALO BO CBO-IE)oracic surgery 083 086 089 089 091 093Breast cancer 086 088 090 089 093 095Cryotherapy 084 087 090 091 093 096Liver patient 086 087 088 086 090 092Heart patient 080 083 086 086 088 092Chronic kidneydisease 086 087 090 089 091 093

Diabetic retinopathy 088 090 091 091 093 095Blood transfusion 076 078 081 083 086 088

8 Scientific Programming

prsn(w z) Rwz

11138681113868111386811138681113868111386811138681113868

Rz

11138681113868111386811138681113868111386811138681113868

(16)

rel(w z) Rwz

11138681113868111386811138681113868111386811138681113868

Rw

11138681113868111386811138681113868111386811138681113868 (17)

FM(w z) 2 times prsn(w z) times rel(w z)

prsn(w z) + rel(w z) (18)

FM 1113944K

w1

Rw

11138681113868111386811138681113868111386811138681113868

|H|maximum FM(w z) (19)

where |Rw| is wth class lengthTable 5 describes that the CBO-IE obtains higher F-

Measure value for all IoTdatasets )e CBO-IE generates 4better outputs than BO 7 better outputs than ALO 8better outputs than PSO 13 better outputs than GA and16 better outputs than K-Means in terms of F-Measure forentire healthcare IoT datasets

Figure 5 shows the better F-Measure of proposed CBO-IE against other approaches K-Means GA PSO ALO andBO for all IoT datasets

525 Root Mean Square Error (RMSE) )e RMSE is de-fined as a divergence between predicted values and exper-imental (calculated) values )e best clustering is achievedwith minimum RMSE values of datasets It is evaluated byutilizing the following equation

RMSE

1|H|

1113944

|H|

j1Vj minus 1113954Vj1113872 1113873

2

11139741113972

(20)

where Vj is jth element value in dataset and 1113954Vj is j

th elementpredicted value in dataset

Table 6 explains that the CBO-IE generates least RMSEvalue for all IoT datasets )e CBO-IE generates 8 betterresults than BO 78 better results than ALO 81 better

results than PSO 87 better results than GA and 95 betterresults than K-Means in terms of RMSE for entire healthcareIoT datasets

Figure 6 shows the better RMSE of proposed CBO-IEagainst other approaches K-Means GA PSO ALO and BOfor all IoT datasets

526 Accuracy It evaluates the division of clusters that areaccurate (ie it evaluates proportion of verdicts that areexact) and describes the portion of clusters in the prevailinggroup

002040608

1

Purit

y In

dex

Purity Index

Datasets

Thor

acic

Sur

gery

Brea

t Can

cer

Cryo

ther

apy

Live

r Pat

ient

Hea

rt P

atie

nts

Chro

nic K

idne

y

Dia

betic

Bloo

d Tr

ansfu

sion

K-MeansGAPSO

ALOBOCBO-IE

Figure 4 Purity index for all IoT datasets

Table 5 F-Measure for entire IoT datasets

Dataset K-Means GA PSO ALO BO CBO-IE)oracic surgery 081 084 087 087 089 091Breast cancer 084 086 087 088 090 093Cryotherapy 082 086 088 087 091 093Liver patient 084 086 087 087 089 091Heart patient 078 082 084 083 085 088Chronic kidneydisease 084 086 088 087 089 091

Diabetic retinopathy 086 089 090 091 092 094Blood transfusion 074 077 079 082 085 087

002040608

1

DatasetsTh

orac

ic S

urge

ry

Brea

t Can

cer

Cryo

ther

apy

Live

r Pat

ient

Hea

rt P

atie

nts

Chro

nic K

idne

y

Dia

betic

Bloo

d

F-M

easu

re

F-Measure

K-MeansGAPSO

ALOBOCBO-IE

Figure 5 F-Measure for all IoT datasets

Table 6 RMSE values for entire IoT datasets

Dataset K-Means GA PSO ALO BO CBO-

IE)oracic surgery 0537 0327 0157 0124 00124 00034Breast cancer 0567 0386 0168 0264 00317 00075Cryotherapy 0621 0374 0234 0364 00421 00066Liver patient 0437 0361 0261 0213 00652 00054Heart patient 0537 0412 0201 0186 00145 00063Chronic kidneydisease 0610 0422 0341 0267 00254 00047

Diabeticretinopathy 0517 0414 0367 0257 00374 00087

Blood transfusion 0438 0314 0120 0265 00451 00071

Scientific Programming 9

accuracy 1113936

Kz1 Rz

|H|lowast 100 (21)

Table 7 represents that the CBO-IE obtains higher ac-curacy value for all IoT datasets )e CBO-IE has 3 su-perior outputs than BO 5 superior outputs than ALO 6superior outputs than PSO 9 superior outputs than GAand 13 superior outputs than K-Means in terms of ac-curacy for entire healthcare IoT datasets

Figure 7 represents the better accuracy of proposedCBO-IE against other approaches K-Means GA PSO ALOand BO for all IoT datasets

)e Information Entropy is used with BO approach forclustering to provide accurate and efficient distribution ofdata points in dataset )e chaos theory is utilized to ini-tialize the population of BO to improve the searching ca-pability of habitat which helps in cluster member selectionprocess more precisely )ese two strategies InformationEntropy and chaos theory have improved the performanceof BO to provide the best selection of cluster heads and theirmembers optimally So CBO-IE generates superior resultsthan BO ALO PSO GA and K-Means clusteringapproaches

53 Running Time Complexity of the CBO-IE Approach)e number of executions is directly related to timecomplexity of clustering approaches to run )e

complexity is calculated by utilizing few circumstances asfollows PS is population size L is number of elements indataset Lx represents dimensions of particulars (ele-ments) K is number of cluster centroids and Mi repre-sents maximum iterations )e cost of every execution isassumed as 1 unit )e Cost of all Executions (CE) isobtained to utilize Algorithm 3 by the followingequations

001020304050607

RMSE

Val

ues

RMSE Values

Datasets

Thor

acic

Sur

gery

Brea

st Ca

ncer

Cryo

ther

apy

Live

r Pat

ient

Hea

rt P

atie

nts

Chro

nic K

idne

y

Dia

betic

Bloo

d Tr

ansfu

sion

K-MeansGAPSO

ALOBOCBO-IE

Figure 6 RMSE for all IoT datasets

Table 7 Accuracy values for entire IoT datasets

Dataset K-Means GA PSO ALO BO CBO-IE)oracic surgery 080 083 086 086 088 090Breast cancer 083 085 086 087 089 092Cryotherapy 081 085 087 086 090 092Liver patient 083 085 086 086 088 090Heart patient 077 081 083 082 084 087Chronic kidney disease 083 085 087 086 088 090Diabetic retinopathy 085 088 089 090 091 093Blood transfusion 073 076 078 081 084 086

002040608

1

Accu

racy

()

Accuracy

Datasets

Thor

acic

Sur

gery

Brea

st Ca

ncer

Cryo

ther

apy

Live

r Pat

ient

Hea

rt P

atie

nts

Chro

nic K

idne

y

Dia

betic

Bloo

d Tr

ansfu

sion

K-MeansGAPSO

ALOBOCBO-IE

Figure 7 Accuracy for all IoT datasets

10 Scientific Programming

ALGORITHM 3 )e complete CBO-IE approach

Scientific Programming 11

CE 1 + L + 1 + PS

+ Mi + 1 + Mi times PS

+ Mi times PS

+ Mi times 2 times PS

times Lx + 3 times PS

+ 11113872 1113873

+ Mi times K times PS

+ 11113872 1113873 + Mi times K times PS

+ Mi times K times PS

+ Mi times K times PS

+ Mi times K times L + Mi times K times L + Mi times K times L

(22)

CE 2 times Mi times PS

times Lx + 4 times Mi times K times PS

+ 3 times Mi times K times L + 5 times Mi times PS

+ Mi times K + L + PS

+ 2 times Mi + 3 (23)

Supposing that all circumstances are equal in equation(23) in worst case equation (24) is generated as follows

CE 9n3

+ 6n2

+ 4n + 3 (24)

)e running time complexities of clustering approachesare O(n3) for CBO-IE O(n3) for BO O(n3) for ACO O(n3)for PSO O(n3) for ALO and O(n2) for K-Menas in worstcase Hence all are executable in polynomial time

54 Statistical Examination Measurement A statistical ex-amination is functioned to obtain the extension of signifi-cance dissimilarities in the effectiveness of clusteringtechniques Here a nonparametric Friedman Examination(FE) is utilized to discover dissimilarities amid the group ofserial appropriate variables )e entire clustering techniquesprovide equally effective explaining in the null hypothesis(Y0)

)e Friedman Examination (FE) is formulized by

FE 12 times N

DS

AC

AC

+ 11113872 11138731113944

5

θ1AKθ( 1113857

2minus

AC

times AC

+ 11113872 11138732

4⎡⎢⎢⎢⎢⎣ ⎤⎥⎥⎥⎥⎦ (25)

where NDS is number of datasets AKθ is average rank of θth

approach and AC is number of clustering approaches)e FE critical value is 204925 taken from F-distribution

table [41] with (AC-1) and (ACndash1)lowast(NDSndash1) freedom degreewhich is obtained between (6ndash1) 5 and (6-1)lowast(8-1) 35 forapplying 6 clustering approaches (AC 6) over 8 datasets(NDS 8) with λ 010 (confidence stage) )e calculated FEvalue is higher as compared to the critical value for nullhypothesis (Y0) rejection if Y0 is not accepted)e evaluatedFE value is 1714 for initiating 6 clustering approaches(AC 6) over 8 datasets (NDS 8) with λ 010 Hence thecalculated FE is higher as compared to the critical FE andthen Y0 is rejected So it is to be summarized that entireclustering approaches are not equally effective

)erefore a post hoc examination is performed usingHolm strategy)e proposed CBE-IE is analyzed statisticallyagainst other clustering approaches in this examinationFirstly the z value is obtained from equation (24) and after

that probability (p) is generated with the help of z value andnormal distribution table [42] At last pj value is analyzedwith (λ(AC minus j)) (Table 8)

Z AKj minus AKθ

δ (26)

where

δ

AC

AC

+ 11113872 1113873

6 times NDS

1113971

(27)

Table 8 illustrates that the (λ(AC minus j)) value is higher ascompared to pj value this indicates the hypothesis rejectionfor entire cases Hence the proposed CBO-IE is superior inclustering as compared to the K-Means GA PSO and BOapproaches according to the above analysis

6 Conclusion

Various fields like medicine education and industriesutilize data mining for their useful applications )egrouping of data is performed on data clustering which is aspecific task in data mining to examine the database effi-ciently In this work a Chaotic Biogeography-Based Opti-mization approach using Information Entropy (CBO-IE) isproposed to obtain data clusters for healthcare IoT datasets)e Information Entropy is introduced with a BO approachto generate a better distribution of data points in the datasetand chaos theory is utilized to initialize the population of BOapproach )e Information Entropy and chaos theory arecombined with BO approach to generate optimal clusterheads and cluster members with enhanced convergencespeed of BO in huge search area )eMATLAB 2021a tool isused to implement the CBO-IE for eight healthcare IoTdatasets and the outcomes describe the better quality effi-ciency of CBO-IE on the basis of F-Measure intraclusterdistance running time complexity purity index statisticalanalysis standard deviation root mean square error andaccuracy as compared to previous techniques of clusteringlike K-Means GA PSO ALO and BO approaches )eFriedman Examination and Holm strategy are introduced to

Table 8 Holm strategy outputs

J Approaches z value p value (λ(AC minus j)) Hypothesis1 K-Means minus63245 000001 0020 Rejected2 GA minus50596 000001 0025 Rejected3 PSO minus37947 00001 0033 Rejected4 ALO minus25298 00057 005 Rejected5 BO minus13649 00861 010 Rejected

12 Scientific Programming

perform statistical analysis of proposed CBO-IE againstprevious techniques of clustering like K-Means GA PSOALO and BO approaches which represent that the CBO-IEgenerates 90 accurate results In future the proposedtechnique will be anticipated to estimate and be authorizedwith huge databases under big data Additionally a cross-layer communication will be anticipated to be offered andlegalized in IoT structural design in the future

Data Availability

)e data that support the findings of this study are availableupon request from the corresponding author

Conflicts of Interest

)e authors declare that there are no conflicts of interest

References

[1] A Haque L Khan and M Baron ldquoSAND semi-supervisedadaptive novel class detection and classification over datastreamrdquo in Proceedings of the 13th AAAI Conference on Ar-tificial Intelligence (AAAI-16) pp 1652ndash1658 Phoenix AZUSA February 2016

[2] N Mishra H K Soni S Sharma and A K UpadhyayldquoDevelopment and analysis of artificial neural networkmodels for rainfall prediction by using time-series datardquo I JIntelligent Systems and Applications vol 1 pp 16ndash23 2018

[3] H N Dai R C W Wong H Wang Z Zheng andA V Vasilakos ldquoBig data analytics for large scale wirelessnetworks challenges and opportunitiesrdquo ACM vol 52pp 1ndash46 2019

[4] A H Sabry W Z W Hasan M N MohtarR M K R Ahmad and H R Harun ldquoPlanter pressurerepetability data analysis for health adult based on EMEDsystemrdquo Malaysian Journal of Fundamental and AppliedSciences vol 14 no 1 pp 96ndash101 2018

[5] A Mensikova and C A Mattmann ldquoEnsemble sentimentanalysis to identify human trafficking in web datardquo in Pro-ceedings of the ACM Workshop on Graph Techniques forAdversarial Activity Analytics (GTA) pp 1ndash6 ACM NewYork Ny USA 2018

[6] S S Gill and R Buyya ldquoBio-inspired algorithms for big dataanalytics a survey taxonomy and open challengesrdquo Big DataAnalytics for Intelligent Healthcare Management ElsevierAmsterdam Netherlands pp 1ndash17 2019

[7] A Verma I Kaur and N Arora ldquoComparative analysis ofinformatio extraction techniques for data miningrdquo IndianJournal of Science and Technology vol 9 no 11 pp 1ndash182016

[8] B R Manju A Joshuva and V Sugumaran ldquoA data miningstudy for condition monitoring on wind turbine blades usinghoeffding tree algorithm through statistical and histogramfeaturesrdquo International Journal of Mechanical Engineering ampTechnology vol 9 no 1 pp 1061ndash1079 2018

[9] I Aytekin E Eyduran K Karadas R Akshan and I KeskinldquoPrediction of fattening final live weight from some bodymeasurements and fattening period in young bulls ofcrossbred and exotic breeds using MARS data mining algo-rithmrdquo Journal of Zoology vol 50 no 1 pp 189ndash195 2018

[10] S Ahmad and R Varma ldquoInformation extraction from textmessages using data mining techniquesrdquo Malaya Journal ofMatematik vol 5 no 1 pp 26ndash29 2018

[11] N E Oweis M M Fouad S R Oweis S S Owais andV Snasel ldquoA novel mapreduce lift assiciation riule miningalgorithm (MRLAR) for big datardquo International Journal ofAdvanced Computer Science and Applications vol 7 no 8pp 1ndash7 2016

[12] C W Tsai C F Lai M C Chiang and L T Yang ldquoDatamining for internet of things a surveyrdquo CommunicationsSurveys and Tutorials IEEE vol 16 no 1 pp 77ndash97 2014

[13] F Chen P Deng J Wan D Zhang A V Vasilakos andX Rong ldquoData mining for Internet of things literature reviewand challengesrdquo International Journal of Distributed SensorNetworks vol 2015 Article ID 365372 14 pages 2015

[14] S B Patel M N Patel and S M Shah ldquoFEA-HUIM fast andefficient algorithm for high utility item-set mining using noveldata structure and pruning strategyrdquo National Conference onAdvanced Research Trends in Information and Technologies(NCARTICT) IJSSET vol 4 no 2 pp 1ndash7 2018

[15] S Bhaskaran R Marappan and B Santhi ldquoDesign andanalysis of a cluster-based intelligent hybrid recommendationsystem for E-learning applicationsrdquo Mathematics MDPIvol 9 no 107 pp 1ndash21 2021

[16] D Giordano M Mellia and T Cerquitelli ldquoK-MDTSCK-Multi-Dimensional time-series clustering algorithmrdquoElectronics MDPI vol 10 no 1166 pp 1ndash21 2021

[17] R Mothukuri and B B Rao ldquoCluster Analysis of cyber crimedata using Rrdquo International Journal of Computer Science andMobile Applications vol 6 no 2 pp 62ndash70 2018

[18] A N Onaizah ldquoA novel data stream clustering algorithm inhealthcare IoTrdquo International Journal of Research in AppliedNatural and Social Sciences (IJRANSS) vol 5 no 6 pp 51ndash602017

[19] R Krishnamurthi A Kumar D Gopinathan A Nayyar andB Qureshi ldquoAn overview of IoT sensor data processingfusion and analysis techniquesrdquo Sensors MDPI vol 20pp 1ndash23 6076

[20] F Alam R Mehmood I Katib and A Albeshri ldquoAnalysis ofeight data mining algorithms for smarter Internet of things(IoT)rdquo in Proceedings of the International Workshop on DataMining in IoT Systems (DaMIS) pp 437ndash442 ElsevierLondon UK September 2016

[21] A M C Souza and J R A Amazonas ldquoAn outlier detectalgorithm using big data processing and Internet of thingsarchitecturerdquo in Proceedings of the International Workshop onBig Data and Data Mining Challenges on IoT and PervasiveSystems (BigD2M) pp 1010ndash1015 Elsevier London UK June2015

[22] D Gil M Johnsson H Mora and J Szymanski ldquoReview ofthe complexity of managing big data of the Internet of thingsrdquoComplexity vol 2019 Article ID 4592902 12 pages 2019

[23] M S Mahdavinejad M Rezvan M Barekatain P AdibiP Barnaghi and A P Sheth ldquoMachine Learning for Internetof things data analysis a surveyrdquoDigital Communications andNetworks vol 4 pp 161ndash175 2018

[24] M Ruta F Scioscia G Loseto A Pinto and E D SciascioldquoMachine learning in the Internet of things a semantic-en-hanced approachrdquo Semantic Web vol 10 pp 1ndash21 2017

[25] K Demertzis K Rantos and G Drosatos ldquoA dynamic in-telligent policies analysis mechanism for personal data pro-cessing in the IoT ecosystemrdquo Big Data And CognitiveComputing vol 4 no 9 pp 1ndash16 2020

Scientific Programming 13

[26] N Joseph and B S Kumar ldquoTop-K competitor trust miningand customer behavior investigation using data miningtechniquerdquo Journal of Network Communications andEmerging Technologies (JNCET) vol 8 no 2 pp 26ndash30 2018

[27] M P Dooshima E N Chidozie B J Ademola O O Sekoniand I P Adebayo ldquoA predictive model for the risk of mentalillness in Nigeria using data miningrdquo International Journal ofImmonology vol 6 no 1 pp 5ndash16 2018

[28] B A Tama and S Lim ldquoA comparative performance eval-uation of classification algorithms for clinical decision sup-port systemsrdquo Mathematics MDPI vol 8 no 1814 pp 1ndash252020

[29] C C N Wang I S Chang P C Y Sheu and J J P TsaildquoApplication of semantic computing in cancer on secondarydata analysisrdquo in Proceedings of the 2nd International Con-ference on Robotic Computing IEEE pp 407ndash413 LagunaHills CA USA February 2018

[30] K Ravikumar and A R Kannan ldquoSpatial data mining forprediction of natural events and disaster management basedon fuzzy logic using hybrid PSOrdquo Taga Journal vol 14pp 858ndash878 2018

[31] B M M Alom and M Courtney ldquoldquoEducational data mininga case study perspectives from primary to university educa-tion in Australiardquo International Journal of InformationTechnology and Computer Science vol 2 pp 1ndash9 2018

[32] S Celik and O Yilmaz ldquoPrediction of body weight of Turkishtazi dogs using data mining techniques classification andregression tree (CART) and multivariate adaptive regressionsplines (MARS)rdquo Journal of Zoology vol 50 no 2pp 575ndash583 2018

[33] W Ugulino D Cardador K Vega E Velloso R Miliditi andH Fuks ldquoWearable computing accelerometers data classi-fication of body postures and movementsrdquo in Proceedings ofthe 21st Brazilian Symposium on Artificial Intelligence Ad-vances in Artificial Intelligence Lecture Notes in ComputerScience pp 52ndash61 Springer Stuttgart Germany September2012

[34] M B Chandak ldquoRole of big-data in classification and novalclass detection in data streamsrdquo Journal of Big Data Springervol 23 no 5 pp 1ndash9 2016

[35] M Gupta K K Gupta and P K Shukla ldquoSession key basedfast secure and lightweight image encryption algorithmrdquoMultimedia Tools and Applications vol 80 pp 10391ndash104162021

[36] H S Pannu D Singh and A K Malhi ldquoMulti-objectiveparticle swarm optimization-based adaptive neuro-fuzzy in-ference system for benzene monitoringrdquoNeural Computing ampApplications vol 31 pp 2195ndash2205 2019

[37] D Singh J Singh and A Chhabra ldquoHigh availability ofclouds failover strategies for cloud computing using inte-grated check pointing algorithmsrdquo in Proceedings of the 2012International Conference on Communication Systems andNetwork Technologies pp 698ndash703 Rajkot India May 2012

[38] D Singh and V Kumar ldquoA comprehensive review of com-putational dehazing techniquesrdquo Archives of ComputationalMethods in Engineering vol 26 pp 1395ndash1413 2019

[39] R Gupta ldquoPerformance analysis of anti-phishing tools andstudy of classification data mining algorithms for a novel anti-phishing systemrdquo International Journal of Computer Networkand Information Security vol 12 pp 70ndash77 2015

[40] R Bhatt P Maheshwary P Shukla P Shukla M Shrivastavaand S Changlani ldquoImplementation of fruit fly optimizationalgorithm (FFOA) to escalate the attacking efficiency of nodecapture attack in wireless sensor networks (WSN)rdquo Computer

Communications vol 149 pp 134ndash145 2020 ISSN 0140-3664

[41] F Distribution Table httpwwwsocruclaeduappletsdirf_tablehtml 2018

[42] Normal Distribution Table httpmatharizonaedusimrsimsma464standardnormaltablepdf

14 Scientific Programming

Page 3: CBO-IE:ADataMiningApproachforHealthcareIoTDataset ...

successful seller selections from numerous existing recordbeginning e-commerce usage areas to enhance the businessA dynamic clustering method is adopted to calculate thetrust of customers and differentiate the buying equivalenceof customers for predicting the customer behaviour )ereal-time and artificial datasets are utilized to performclustering and evaluate the results based on accuracy [26]

)e medical field [27] also well utilizes the miningtechniques like classification to identify the mental healthand evaluate the risk to manage the system )is classifi-cation is done by decision trees to discover and classify thepatients according to their mental health Here a relativecalculation of a huge array classifier is associated with severalareas like tree group neural possibility categorization andpolicy dependent classifiers )e linear and random classi-fiers are introduced to perform disease classification [28]

)e cancer disease in unstructured format is well clas-sified to discover several types of cancer disease It generatesvaluable results in terms of risk aspects treatment andmanagement )e text mining is well exploited in cancerdisease prediction like lung breast and ovarian cancer )edata about cancer patients is collected from several het-erogeneous resources and text mining is performed toprovide various cancer related information like survivaltreatment and risk of disease [29]

)e geographical data is collected from several hetero-geneous resources and data mining is initialized over thespatial data to manage the disaster )e natural hazards arepreviously identified before happening this is achieved byspatial data mining over huge geographic information )eseveral geographical information such as soil ocean earthrsquoscrust and air quality is used to employ the mining processwith the help of PSO and fuzzy logics )e clusters aredecided on the basis of spatial information and naturalbehaviours in environments [30]

)e data mining based educational information ex-traction is a new era of research in which distributed ed-ucational data is compiled and used for several purposes)estudents teachers and other staffs have played an importantrole in educational data mining )e educational data iscollected from several resources and saved in tabular format)ese student data can be further analyzed by university andexamined by data mining model to evaluate the results ofstudents and maintain the records in appropriate manner[31]

)e classification is also performed over data streamscombining the text music and video information with thehelp of decision trees and naive structure)ese data streamsalso combined the human activities like movement of handsand legs [32 33])ere are various key issues of classificationlike infinite size perception development and perceptionflow and characteristic assessment Various researchersmainly solved the issues of infinite size and perceptiondevelopment but the perception flow and characteristicassessment are not taken into consideration by researchersHere these two issues are removed in implementing aclassification method over big data and parameters like errorrate and running time are obtained to provide superiorefficiency of the method [34]

3 The Biogeography-Based Optimization(BO) Approach

ABiogeography-Based Optimization (BO) is a metaheuristicapproach based on island biogeography premise whichrelates with the migration evolution and destruction of thecastes in a habitat A Habitat Suitability Index (HSI) isutilized to generate the best solution globally and share thecharacteristics with week habitat )e population is en-hanced by taking the optimal solutions from prior iterationfrom emigrating [35] to immigrating habitats by using themigrating Suitability Index Variables (SIVs) in BOmigratingprocess A fresh characteristic is obtained in complete so-lution area using fitness function [36] and replaced everyhabitatrsquos SIVs arbitrarily and stochastically to enhance thepopulation [37] miscellany and searching strength in BOmutation process

31 Migration Process In migration stochastic processseveral parents can be utilized for a particular offspring andevery habitat (Ha) is modified by receiving the SIVs from asuperior HSI habitat in population (PS) )e migration ratesare straightly dependent on the caste number in a habitat forenhancing [38] the habitat miscellany and linear migration isevaluated by using the following equation

Ha(SIV) Hb(SIV) (1)

where Hb(SIV) is a superior HSI bth habitat (Hb) to transferthe SIV value to the ath habitat (Ha)

)e emigration rate (α) and immigration rate (β) areevaluated for c castes in habitat by using the two followingequations

αc Gc

Nmaximum (2)

βc M times 1 minus αc( 1113857 (3)

Here G and M are highest emigration and immigrationrates respectively and Nmaximum is maximum realizablenumber of castes utilized by habitat

)ere are six models developed for migration phase outof which sinusoidal model performed superior migration ascompared to the other five models )is model is evaluatedfor ath habitat by using the two following equations

βa M

21 + cos

aπNmaximum

1113888 11138891113888 1113889 (4)

αa G

21 minus cos

aπNmaximum

1113888 11138891113888 1113889 (5)

32 Mutation Process A mutation is a stochastic operatorenhancing the population miscellany to obtain optimalsolution [39] )e mutation process is utilized for updatingat least one arbitrary chosen SIV of a solution with the help

Scientific Programming 3

of mutation rate (Ta) and the previous possibility of sub-sistence (Sa) by using the following equation

Ta Tmaximum 1 minusSa

Smaximum1113888 1113889 (6)

where Tmaximum is the highest mutation rate term illustratedby user and Smaximum is the highest possibility of castes count

After that the mutation operator is improved by utilizingCauchy model to provide optimal exploration strength ofBO in huge searching area by reducing potential limitationof mutation strategy )e Cauchy distribution is evaluatedwith the help of probability density function which isformulized by the following equation

F(X 0 1) 1

π 1 + X2

1113872 1113873 (7)

Hence the Cauchy mutation equation is described byusing the following equation

Ha(SIV) minimum Ha(SIV)( 1113857 + maximum Ha(SIV)( 1113857(

minus minimum Ha(SIV)( 11138571113857 times π times F Ha(SIV) 0 1( 1113857

(8)

where Ha(SIV) is ath habitat F(Ha(SIV) 0 1) is Cauchydistribution (Algorithm 1)

4 TheChaotic BOApproachUsing InformationEntropy (CBO-IE)

An updated BO approach is proposed using the chaoticbehaviour and Information Entropy )e BO approach iswell suitable for exploration and exploitation in hugesearching space and generates the efficient results for nu-meric optimization [40] Still the numeric optimization isreasonably different from data clustering mechanism Heresome intrinsic characteristics are investigated and a chaoticBO is implemented using Information Entropy to apply theBO for data clustering

41 Element Information Entropy )e data points are dis-tributed in multidimensional area in data clustering strategy)e confusion degree of a probabilistic variable is calculatedby utilizing Information Entropy after that this is utilizedfor elements measurements to calculate the distribution)eentropy (Ea) is evaluated to isolate the element values withestimation of every value to its closer integer by using thefollowing equation

Ea minus 1113944

maxa

bmina

Rblog Rb( 1113857 (9)

where Ea is the entropy of ath element (a 1 2 3 L) and Lis number of elements in dataset mina and maxa are lowestand highest integers after discretization of element values Rbis bth element integer percentage value

)emigration procedure of BO population is guarded byusing element entropy hence maximum entropy is evalu-ated for every element in dataset by the following equation

maximum Ea( 1113857 minuslog1

maxa minus mina + 11113888 1113889 (10)

where maximum (Ea) is maximum entropy of ath element indataset

At last the normalized entropy for every element iscalculated by the following equation

nomalized Ea( 1113857 Ea

maximum Ea( 1113857 (11)

where normalized (Ea) is normalized entropy of ath elementin dataset

Equation (11) is repeated for all data elements in datasetand normalized entropy set (normalized (E)) is generated asfollows

Normalized (E) (normalized (E1) normalized(E2) normalized (EL))

42 Information Entropy-Based Migration Process in the BOApproach Here two mechanisms are introduced in mi-gration to improve the performance of BO approach )efirst one is original migration process of BO explained in theprevious section )e second one is updated mechanism oforiginal migration in which P particulars are elected arbi-trarily from present population (1lePlePS) and the oneoptimal particular is assigned as reference particular (Href ))e direction of migration is guided by Href and InformationEntropy is utilized to provide equivalency between pop-ulation miscellany and convergence speed )e higher In-formation Entropy of element indicates the maximumuncertainty which slows down the convergence speedHence the speed is enhanced for element by transferring itto the location based on reference particular globally withmaximum possibility as compared to the least InformationEntropy element

43 Chaos-Based Population Selection of the BO Approach)e chaos method is extremely related to initial circum-stances and successfully utilized for arbitrary numbergeneration using logistic function)e chaotic system is usedin the following equation

cf+1 μ times cf times 1 minus cf1113872 1113873 (12)

where μ represents constants in the range of [1 4] c rep-resents variables (c isin [0 1] f 0 1 )

)e population of BO is initialized by utilizing chaosfunction (equation (12)) which has improved the efficiencyof BO with proper use of huge solution area

44 6e Complete CBO-IE Approach for Data Mining)e Information Entropy-based migration and chaos-basedpopulation selection are combined in Chaotic Biogeogra-phy-Based Optimization using Information Entropy (CBO-IE) to solve the data mining that is data clustering inoptimal way (Figure 1) Firstly the population (PS) of BOapproach is initialized with the help of chaos function

4 Scientific Programming

(equation (12)) and every particular is denoted as a vectorwith size Lx L times K (L is number of elements in dataset Kis number of cluster centroids and Lx is dimension ofparticulars) )e K centroids locations are fixed into vectorin which 1st centroid relates with 1st L attributes 2nd centroidrelates to 2nd L attributes and so on )e primary particularvector values are generated arbitrarily and consistentlybetween lower and higher elements values in existing datasetwithin maximum number of iterations (Mi) )en CBO-IEis applied to calculate the fitness values of entire particularsby using equations (1) to (8) (Algorithm 2)

5 Results and Analysis

In this section the healthcare IoT datasets and performancefactors for proposed CBE-IE approach are explained briefly)e entire approaches are implemented with the help ofMATLAB 2021a tool using Core i3-3110M processor havingWindows 8 operating system with 2GB RAM and analyzedover 8 healthcare IoTdatasets (Table 1) )e proposed CBO-IE approach has evaluated the results for 500 iterations interms of intracluster distance purity index standard devi-ation root mean square error accuracy and F-Measure ascompared to prior algorithms like K-Means ACO PSOALO and BO over 30 independent runs

51 Healthcare IoT Datasets )e proposed CBO-IE ap-proach is applied on 8 dissimilar healthcare IoTdatasets takenfrom UCI repository (Table 1) )e datasets are thoracicsurgery breast cancer cryotherapy liver patient heart pa-tient chronic kidney disease diabetic retinopathy and bloodtransfusion)e 470 instances of thoracic surgery dataset with

17 attributes are divided into 2 classes (survive and notsurvive) the 699 instances of breast cancer dataset with 10attributes are distributed into 2 classes (benign and malig-nant) the 90 instances of cryotherapy dataset with 7 attributesare divided into 2 classes (success and failure) the 583 in-stances of liver patient dataset with 10 attributes are dis-tributed into 2 classes (liver patient and not) the 299 instancesof heart patient dataset with 12 attributes are divided into 2classes (heart failure and not) the 400 instances of chronickidney disease with 24 attributes are divided into 2 classes(disease found and not) the 1151 instances of diabetic reti-nopathy dataset with 18 attributes are distributed into 2classes (diabetic retinopathy and not) and the 748 instancesof blood transfusion dataset with 4 attributes are divided into2 classes (donating blood and not)

52 Performance Factors )e performance of proposedCBO-IE approach is evaluated in terms of intraclusterdistance purity index standard deviation root mean squareerror accuracy and F-Measure

521 Intracluster Distance Firstly the distances betweendata points are evaluated within a cluster After that theaverage of these distances is generated representing anintracluster distance)e optimal clustering is achieved withleast intracluster distance For a cluster the mean distance isevaluated between a centroid and total data points )is stepis continuously performed for every cluster and at last meanvalue of entire clustersrsquo intracluster distances is obtained

Table 2 illustrates that the CBO-IE generates leastintracluster distance value for all IoT datasets )e CBO-IEobtains 90 superior results than BO 92 superior results

STARTInitialize the parameters GM 1 Tmaximum 1 PS and Max_iterationInitialize the populations (arbitrary group of habitats) H1 H2 Hs

P

Evaluate fitness value (HSI) for every habitatWHILE (ending condition is not found)Evaluate αa βa and Ta for every habitatObtain a random isin (0 1)MigrationFOR every habitat (from minimum to maximum HSI values)Choose habitat Ha (SIV) stochastically proportional to βaIF randomlt βa and Ha (SIV) choose then

Choose habitat Hb (SIV) stochastically proportional to αaIF randomlt αa and Hb (SIV) choose thenHa (SIV)Hb (SIV)

END IFEND IF

END FORMUTATIONChoose Ha (SIV) with the help of mutation possibility proportional SaIF randomltTa thenArbitrary change the SIVs in Ha (SIV)

END IFEvaluate HSI value

END WHILESTOP

ALGORITHM 1 Biogeography-Based Optimization

Scientific Programming 5

than ALO 94 superior results than PSO 96 superiorresults than GA and 98 superior results than K-Means interms of intracluster distance for entire healthcare IoTdatasets )e average ranking of all approaches is evaluatedon the basis of minimum to maximum intracluster distance(from 1 to 6)

Figure 2 represents the better rank of proposed CBO-IEwith least intracluster distance values against other approachesK-Means GA PSO ALO and BO for all IoT datasets

522 Standard Deviation )e stiff data clustering in theregion of average value is described by a numerical char-acteristic called standard deviation (DS) )e best clusteringis achieved with less standard deviation which is obtained bythe following equation

DS

1113936(V minus V)

|H|

1113971

(13)

where |H| is dataset length V is dataset values (points) andV is dataset average value

Table 3 describes that the CBO-IE obtains less standarddeviation value for all IoT datasets )e CBO-IE generates93 better quality outcomes than BO 95 better qualityoutcomes than PSO and ALO 97 better quality outcomesthan GA and 98 better quality outcomes than K-Means interms of standard deviation for entire healthcare IoTdatasets

Figure 3 shows the better standard deviation of proposedCBO-IE against other approaches K-Means GA PSO ALOand BO for all IoT datasets

Initialize Particulars and entropy of everyelements

Initialize the population with the help of chaosfunction

Calculate Fitness values of particulars

Iteration = 0

Perform the information entropy basedmigration process (Algorithm 2)

Calculate the fitness values of new particulars

iteration = iteration + 1

Is ending situationsatisfied

NoYes

Result the optimal solution

End

Start

Figure 1 Flowchart of the CBO-IE approach

6 Scientific Programming

523 Purity Index )e correctness of clustering strategy isknown as purity in which accurate classification is per-formed over data elements Hence entire points of anisolated class can be accurately allocated to an isolatedcluster Purity index (IP) is generated with the help of purityby equations (14) and (15) Maximum purity is achieved withmaximum IP value nearer to 1

purity Rz( 1113857 maximum Rwz

111386811138681113868111386811138681113868111386811138681113872 1113873

Rz

11138681113868111386811138681113868111386811138681113868

(14)

IP

1113944K

z1

Rz

11138681113868111386811138681113868111386811138681113868Purity Rz( 11138571113872 1113873

|H| (15)

ALGORITHM 2 Information Entropy-based migration process in the BO approach

Table 1 Healthcare IoT datasets

Sr no Healthcare IoT dataset No of instances No of attributes No of clusters1 )oracic surgery 470 17 22 Breast cancer 699 10 23 Cryotherapy 90 7 24 Liver patient 583 10 25 Heart patient 299 12 26 Chronic kidney disease 400 24 27 Diabetic retinopathy 1151 18 28 Blood transfusion 748 4 2

Table 2 Average ranking of all algorithms based on intracluster distance

Dataset K-Means GA PSO ALO BO CBO-IE)oracic surgery 3364 (5) 07569 (4) 04256 (4) 02547 (3) 00657 (2) 000652 (1)Breast cancer 26265 (5) 08634 (4) 03126 (4) 02614 (3) 00874 (2) 000728 (1)Cryotherapy 243640 (5) 04562 (4) 02864 (4) 02167 (3) 00725 (2) 000548 (1)Liver patient 353245 (5) 05863 (4) 01567 (4) 01321 (3) 00635 (2) 000457 (1)Heart patient 142366 (5) 09362 (4) 07265 (4) 06492 (3) 00623 (2) 000832 (1)Chronic kidney disease 463625 (5) 05682 (4) 03568 (4) 02864 (3) 00742 (2) 000786 (1)Diabetic retinopathy 03658 (5) 02526 (4) 01737 (4) 01421 (3) 00967 (2) 000758 (1)Blood transfusion 06745 (5) 05684 (4) 02375 (4) 02068 (3) 00854 (2) 000852 (1)Average ranking (AKθ) 5 4 4 3 2 1

Scientific Programming 7

where Purity(Rz) is purity of zth cluster |Rz| is zth clusterlength |Rwz| is the number of data elements of wth classallocated to zth cluster

Table 4 explains that the CBO-IE generates maximumpurity index value for all IoT datasets )e CBO-IE obtains5 superior outputs than BO 7 superior outputs than

ALO 8 superior outputs than PSO 12 superior outputsthan GA and 15 superior outputs than K-Means in termsof purity index for entire healthcare IoT datasets

Figure 4 represents the better quality purity index ofproposed CBO-IE against other approaches K-Means GAPSO ALO and BO for all IoT datasets

524 F-Measure Firstly Precision (prsn) and Recall (rel)are evaluated to repossess the information by equations (16)and (17) After that both are combined to formalize the F-Measure (FM) by utilizing equations (18) and (19)

Table 3 Standard deviation for entire IoT datasets

Dataset K-Means GA PSO ALO BO CBO-IE)oracic surgery 0116857 003984 0012036 001452 0007856 0000524Breast cancer 0002578 0168742 012542 015713 0003657 0000746Cryotherapy 00025143 0165875 0001254 0001674 0001036 0000845Liver patient 0256323 017589 0147856 012312 0096875 0000986Heart patient 0212678 0096852 0042574 0021563 0003687 0000251Chronic kidney disease 0020233 0103625 009854 0065214 0006574 0000865Diabetic retinopathy 0086574 0186523 0142657 0102647 0086478 00008657Blood transfusion 0213658 0196875 0185632 0152461 0010365 0000236

0123456

Datasets

Average Ranking

Thor

acic

Sur

gery

Brea

t Can

cer

Cryo

ther

apy

Live

r Pat

ient

Hea

rt P

atie

nts

Chro

nic K

idne

y

Dia

betic

Bloo

d Tr

ansfu

sionIn

tra-c

luste

r Dist

ance

K-MeansGAPSO

ALOBOCBO-IE

Figure 2 Average ranking based on intracluster distance

0005

01015

02025

03

Stan

dard

Dev

iatio

n Standard Deviation

Datasets

Thor

acic

Sur

gery

Brea

st Ca

ncer

Cryo

ther

apy

Live

r Pat

ient

Hea

rt P

atie

nts

Chro

nic K

idne

y

Dia

betic

Bloo

d Tr

ansfu

sion

K-MeansGAPSO

ALOBOCBO-IE

Figure 3 Standard deviation for all IoT datasets

Table 4 Purity index for entire IoT datasets

Dataset K-Means GA PSO ALO BO CBO-IE)oracic surgery 083 086 089 089 091 093Breast cancer 086 088 090 089 093 095Cryotherapy 084 087 090 091 093 096Liver patient 086 087 088 086 090 092Heart patient 080 083 086 086 088 092Chronic kidneydisease 086 087 090 089 091 093

Diabetic retinopathy 088 090 091 091 093 095Blood transfusion 076 078 081 083 086 088

8 Scientific Programming

prsn(w z) Rwz

11138681113868111386811138681113868111386811138681113868

Rz

11138681113868111386811138681113868111386811138681113868

(16)

rel(w z) Rwz

11138681113868111386811138681113868111386811138681113868

Rw

11138681113868111386811138681113868111386811138681113868 (17)

FM(w z) 2 times prsn(w z) times rel(w z)

prsn(w z) + rel(w z) (18)

FM 1113944K

w1

Rw

11138681113868111386811138681113868111386811138681113868

|H|maximum FM(w z) (19)

where |Rw| is wth class lengthTable 5 describes that the CBO-IE obtains higher F-

Measure value for all IoTdatasets )e CBO-IE generates 4better outputs than BO 7 better outputs than ALO 8better outputs than PSO 13 better outputs than GA and16 better outputs than K-Means in terms of F-Measure forentire healthcare IoT datasets

Figure 5 shows the better F-Measure of proposed CBO-IE against other approaches K-Means GA PSO ALO andBO for all IoT datasets

525 Root Mean Square Error (RMSE) )e RMSE is de-fined as a divergence between predicted values and exper-imental (calculated) values )e best clustering is achievedwith minimum RMSE values of datasets It is evaluated byutilizing the following equation

RMSE

1|H|

1113944

|H|

j1Vj minus 1113954Vj1113872 1113873

2

11139741113972

(20)

where Vj is jth element value in dataset and 1113954Vj is j

th elementpredicted value in dataset

Table 6 explains that the CBO-IE generates least RMSEvalue for all IoT datasets )e CBO-IE generates 8 betterresults than BO 78 better results than ALO 81 better

results than PSO 87 better results than GA and 95 betterresults than K-Means in terms of RMSE for entire healthcareIoT datasets

Figure 6 shows the better RMSE of proposed CBO-IEagainst other approaches K-Means GA PSO ALO and BOfor all IoT datasets

526 Accuracy It evaluates the division of clusters that areaccurate (ie it evaluates proportion of verdicts that areexact) and describes the portion of clusters in the prevailinggroup

002040608

1

Purit

y In

dex

Purity Index

Datasets

Thor

acic

Sur

gery

Brea

t Can

cer

Cryo

ther

apy

Live

r Pat

ient

Hea

rt P

atie

nts

Chro

nic K

idne

y

Dia

betic

Bloo

d Tr

ansfu

sion

K-MeansGAPSO

ALOBOCBO-IE

Figure 4 Purity index for all IoT datasets

Table 5 F-Measure for entire IoT datasets

Dataset K-Means GA PSO ALO BO CBO-IE)oracic surgery 081 084 087 087 089 091Breast cancer 084 086 087 088 090 093Cryotherapy 082 086 088 087 091 093Liver patient 084 086 087 087 089 091Heart patient 078 082 084 083 085 088Chronic kidneydisease 084 086 088 087 089 091

Diabetic retinopathy 086 089 090 091 092 094Blood transfusion 074 077 079 082 085 087

002040608

1

DatasetsTh

orac

ic S

urge

ry

Brea

t Can

cer

Cryo

ther

apy

Live

r Pat

ient

Hea

rt P

atie

nts

Chro

nic K

idne

y

Dia

betic

Bloo

d

F-M

easu

re

F-Measure

K-MeansGAPSO

ALOBOCBO-IE

Figure 5 F-Measure for all IoT datasets

Table 6 RMSE values for entire IoT datasets

Dataset K-Means GA PSO ALO BO CBO-

IE)oracic surgery 0537 0327 0157 0124 00124 00034Breast cancer 0567 0386 0168 0264 00317 00075Cryotherapy 0621 0374 0234 0364 00421 00066Liver patient 0437 0361 0261 0213 00652 00054Heart patient 0537 0412 0201 0186 00145 00063Chronic kidneydisease 0610 0422 0341 0267 00254 00047

Diabeticretinopathy 0517 0414 0367 0257 00374 00087

Blood transfusion 0438 0314 0120 0265 00451 00071

Scientific Programming 9

accuracy 1113936

Kz1 Rz

|H|lowast 100 (21)

Table 7 represents that the CBO-IE obtains higher ac-curacy value for all IoT datasets )e CBO-IE has 3 su-perior outputs than BO 5 superior outputs than ALO 6superior outputs than PSO 9 superior outputs than GAand 13 superior outputs than K-Means in terms of ac-curacy for entire healthcare IoT datasets

Figure 7 represents the better accuracy of proposedCBO-IE against other approaches K-Means GA PSO ALOand BO for all IoT datasets

)e Information Entropy is used with BO approach forclustering to provide accurate and efficient distribution ofdata points in dataset )e chaos theory is utilized to ini-tialize the population of BO to improve the searching ca-pability of habitat which helps in cluster member selectionprocess more precisely )ese two strategies InformationEntropy and chaos theory have improved the performanceof BO to provide the best selection of cluster heads and theirmembers optimally So CBO-IE generates superior resultsthan BO ALO PSO GA and K-Means clusteringapproaches

53 Running Time Complexity of the CBO-IE Approach)e number of executions is directly related to timecomplexity of clustering approaches to run )e

complexity is calculated by utilizing few circumstances asfollows PS is population size L is number of elements indataset Lx represents dimensions of particulars (ele-ments) K is number of cluster centroids and Mi repre-sents maximum iterations )e cost of every execution isassumed as 1 unit )e Cost of all Executions (CE) isobtained to utilize Algorithm 3 by the followingequations

001020304050607

RMSE

Val

ues

RMSE Values

Datasets

Thor

acic

Sur

gery

Brea

st Ca

ncer

Cryo

ther

apy

Live

r Pat

ient

Hea

rt P

atie

nts

Chro

nic K

idne

y

Dia

betic

Bloo

d Tr

ansfu

sion

K-MeansGAPSO

ALOBOCBO-IE

Figure 6 RMSE for all IoT datasets

Table 7 Accuracy values for entire IoT datasets

Dataset K-Means GA PSO ALO BO CBO-IE)oracic surgery 080 083 086 086 088 090Breast cancer 083 085 086 087 089 092Cryotherapy 081 085 087 086 090 092Liver patient 083 085 086 086 088 090Heart patient 077 081 083 082 084 087Chronic kidney disease 083 085 087 086 088 090Diabetic retinopathy 085 088 089 090 091 093Blood transfusion 073 076 078 081 084 086

002040608

1

Accu

racy

()

Accuracy

Datasets

Thor

acic

Sur

gery

Brea

st Ca

ncer

Cryo

ther

apy

Live

r Pat

ient

Hea

rt P

atie

nts

Chro

nic K

idne

y

Dia

betic

Bloo

d Tr

ansfu

sion

K-MeansGAPSO

ALOBOCBO-IE

Figure 7 Accuracy for all IoT datasets

10 Scientific Programming

ALGORITHM 3 )e complete CBO-IE approach

Scientific Programming 11

CE 1 + L + 1 + PS

+ Mi + 1 + Mi times PS

+ Mi times PS

+ Mi times 2 times PS

times Lx + 3 times PS

+ 11113872 1113873

+ Mi times K times PS

+ 11113872 1113873 + Mi times K times PS

+ Mi times K times PS

+ Mi times K times PS

+ Mi times K times L + Mi times K times L + Mi times K times L

(22)

CE 2 times Mi times PS

times Lx + 4 times Mi times K times PS

+ 3 times Mi times K times L + 5 times Mi times PS

+ Mi times K + L + PS

+ 2 times Mi + 3 (23)

Supposing that all circumstances are equal in equation(23) in worst case equation (24) is generated as follows

CE 9n3

+ 6n2

+ 4n + 3 (24)

)e running time complexities of clustering approachesare O(n3) for CBO-IE O(n3) for BO O(n3) for ACO O(n3)for PSO O(n3) for ALO and O(n2) for K-Menas in worstcase Hence all are executable in polynomial time

54 Statistical Examination Measurement A statistical ex-amination is functioned to obtain the extension of signifi-cance dissimilarities in the effectiveness of clusteringtechniques Here a nonparametric Friedman Examination(FE) is utilized to discover dissimilarities amid the group ofserial appropriate variables )e entire clustering techniquesprovide equally effective explaining in the null hypothesis(Y0)

)e Friedman Examination (FE) is formulized by

FE 12 times N

DS

AC

AC

+ 11113872 11138731113944

5

θ1AKθ( 1113857

2minus

AC

times AC

+ 11113872 11138732

4⎡⎢⎢⎢⎢⎣ ⎤⎥⎥⎥⎥⎦ (25)

where NDS is number of datasets AKθ is average rank of θth

approach and AC is number of clustering approaches)e FE critical value is 204925 taken from F-distribution

table [41] with (AC-1) and (ACndash1)lowast(NDSndash1) freedom degreewhich is obtained between (6ndash1) 5 and (6-1)lowast(8-1) 35 forapplying 6 clustering approaches (AC 6) over 8 datasets(NDS 8) with λ 010 (confidence stage) )e calculated FEvalue is higher as compared to the critical value for nullhypothesis (Y0) rejection if Y0 is not accepted)e evaluatedFE value is 1714 for initiating 6 clustering approaches(AC 6) over 8 datasets (NDS 8) with λ 010 Hence thecalculated FE is higher as compared to the critical FE andthen Y0 is rejected So it is to be summarized that entireclustering approaches are not equally effective

)erefore a post hoc examination is performed usingHolm strategy)e proposed CBE-IE is analyzed statisticallyagainst other clustering approaches in this examinationFirstly the z value is obtained from equation (24) and after

that probability (p) is generated with the help of z value andnormal distribution table [42] At last pj value is analyzedwith (λ(AC minus j)) (Table 8)

Z AKj minus AKθ

δ (26)

where

δ

AC

AC

+ 11113872 1113873

6 times NDS

1113971

(27)

Table 8 illustrates that the (λ(AC minus j)) value is higher ascompared to pj value this indicates the hypothesis rejectionfor entire cases Hence the proposed CBO-IE is superior inclustering as compared to the K-Means GA PSO and BOapproaches according to the above analysis

6 Conclusion

Various fields like medicine education and industriesutilize data mining for their useful applications )egrouping of data is performed on data clustering which is aspecific task in data mining to examine the database effi-ciently In this work a Chaotic Biogeography-Based Opti-mization approach using Information Entropy (CBO-IE) isproposed to obtain data clusters for healthcare IoT datasets)e Information Entropy is introduced with a BO approachto generate a better distribution of data points in the datasetand chaos theory is utilized to initialize the population of BOapproach )e Information Entropy and chaos theory arecombined with BO approach to generate optimal clusterheads and cluster members with enhanced convergencespeed of BO in huge search area )eMATLAB 2021a tool isused to implement the CBO-IE for eight healthcare IoTdatasets and the outcomes describe the better quality effi-ciency of CBO-IE on the basis of F-Measure intraclusterdistance running time complexity purity index statisticalanalysis standard deviation root mean square error andaccuracy as compared to previous techniques of clusteringlike K-Means GA PSO ALO and BO approaches )eFriedman Examination and Holm strategy are introduced to

Table 8 Holm strategy outputs

J Approaches z value p value (λ(AC minus j)) Hypothesis1 K-Means minus63245 000001 0020 Rejected2 GA minus50596 000001 0025 Rejected3 PSO minus37947 00001 0033 Rejected4 ALO minus25298 00057 005 Rejected5 BO minus13649 00861 010 Rejected

12 Scientific Programming

perform statistical analysis of proposed CBO-IE againstprevious techniques of clustering like K-Means GA PSOALO and BO approaches which represent that the CBO-IEgenerates 90 accurate results In future the proposedtechnique will be anticipated to estimate and be authorizedwith huge databases under big data Additionally a cross-layer communication will be anticipated to be offered andlegalized in IoT structural design in the future

Data Availability

)e data that support the findings of this study are availableupon request from the corresponding author

Conflicts of Interest

)e authors declare that there are no conflicts of interest

References

[1] A Haque L Khan and M Baron ldquoSAND semi-supervisedadaptive novel class detection and classification over datastreamrdquo in Proceedings of the 13th AAAI Conference on Ar-tificial Intelligence (AAAI-16) pp 1652ndash1658 Phoenix AZUSA February 2016

[2] N Mishra H K Soni S Sharma and A K UpadhyayldquoDevelopment and analysis of artificial neural networkmodels for rainfall prediction by using time-series datardquo I JIntelligent Systems and Applications vol 1 pp 16ndash23 2018

[3] H N Dai R C W Wong H Wang Z Zheng andA V Vasilakos ldquoBig data analytics for large scale wirelessnetworks challenges and opportunitiesrdquo ACM vol 52pp 1ndash46 2019

[4] A H Sabry W Z W Hasan M N MohtarR M K R Ahmad and H R Harun ldquoPlanter pressurerepetability data analysis for health adult based on EMEDsystemrdquo Malaysian Journal of Fundamental and AppliedSciences vol 14 no 1 pp 96ndash101 2018

[5] A Mensikova and C A Mattmann ldquoEnsemble sentimentanalysis to identify human trafficking in web datardquo in Pro-ceedings of the ACM Workshop on Graph Techniques forAdversarial Activity Analytics (GTA) pp 1ndash6 ACM NewYork Ny USA 2018

[6] S S Gill and R Buyya ldquoBio-inspired algorithms for big dataanalytics a survey taxonomy and open challengesrdquo Big DataAnalytics for Intelligent Healthcare Management ElsevierAmsterdam Netherlands pp 1ndash17 2019

[7] A Verma I Kaur and N Arora ldquoComparative analysis ofinformatio extraction techniques for data miningrdquo IndianJournal of Science and Technology vol 9 no 11 pp 1ndash182016

[8] B R Manju A Joshuva and V Sugumaran ldquoA data miningstudy for condition monitoring on wind turbine blades usinghoeffding tree algorithm through statistical and histogramfeaturesrdquo International Journal of Mechanical Engineering ampTechnology vol 9 no 1 pp 1061ndash1079 2018

[9] I Aytekin E Eyduran K Karadas R Akshan and I KeskinldquoPrediction of fattening final live weight from some bodymeasurements and fattening period in young bulls ofcrossbred and exotic breeds using MARS data mining algo-rithmrdquo Journal of Zoology vol 50 no 1 pp 189ndash195 2018

[10] S Ahmad and R Varma ldquoInformation extraction from textmessages using data mining techniquesrdquo Malaya Journal ofMatematik vol 5 no 1 pp 26ndash29 2018

[11] N E Oweis M M Fouad S R Oweis S S Owais andV Snasel ldquoA novel mapreduce lift assiciation riule miningalgorithm (MRLAR) for big datardquo International Journal ofAdvanced Computer Science and Applications vol 7 no 8pp 1ndash7 2016

[12] C W Tsai C F Lai M C Chiang and L T Yang ldquoDatamining for internet of things a surveyrdquo CommunicationsSurveys and Tutorials IEEE vol 16 no 1 pp 77ndash97 2014

[13] F Chen P Deng J Wan D Zhang A V Vasilakos andX Rong ldquoData mining for Internet of things literature reviewand challengesrdquo International Journal of Distributed SensorNetworks vol 2015 Article ID 365372 14 pages 2015

[14] S B Patel M N Patel and S M Shah ldquoFEA-HUIM fast andefficient algorithm for high utility item-set mining using noveldata structure and pruning strategyrdquo National Conference onAdvanced Research Trends in Information and Technologies(NCARTICT) IJSSET vol 4 no 2 pp 1ndash7 2018

[15] S Bhaskaran R Marappan and B Santhi ldquoDesign andanalysis of a cluster-based intelligent hybrid recommendationsystem for E-learning applicationsrdquo Mathematics MDPIvol 9 no 107 pp 1ndash21 2021

[16] D Giordano M Mellia and T Cerquitelli ldquoK-MDTSCK-Multi-Dimensional time-series clustering algorithmrdquoElectronics MDPI vol 10 no 1166 pp 1ndash21 2021

[17] R Mothukuri and B B Rao ldquoCluster Analysis of cyber crimedata using Rrdquo International Journal of Computer Science andMobile Applications vol 6 no 2 pp 62ndash70 2018

[18] A N Onaizah ldquoA novel data stream clustering algorithm inhealthcare IoTrdquo International Journal of Research in AppliedNatural and Social Sciences (IJRANSS) vol 5 no 6 pp 51ndash602017

[19] R Krishnamurthi A Kumar D Gopinathan A Nayyar andB Qureshi ldquoAn overview of IoT sensor data processingfusion and analysis techniquesrdquo Sensors MDPI vol 20pp 1ndash23 6076

[20] F Alam R Mehmood I Katib and A Albeshri ldquoAnalysis ofeight data mining algorithms for smarter Internet of things(IoT)rdquo in Proceedings of the International Workshop on DataMining in IoT Systems (DaMIS) pp 437ndash442 ElsevierLondon UK September 2016

[21] A M C Souza and J R A Amazonas ldquoAn outlier detectalgorithm using big data processing and Internet of thingsarchitecturerdquo in Proceedings of the International Workshop onBig Data and Data Mining Challenges on IoT and PervasiveSystems (BigD2M) pp 1010ndash1015 Elsevier London UK June2015

[22] D Gil M Johnsson H Mora and J Szymanski ldquoReview ofthe complexity of managing big data of the Internet of thingsrdquoComplexity vol 2019 Article ID 4592902 12 pages 2019

[23] M S Mahdavinejad M Rezvan M Barekatain P AdibiP Barnaghi and A P Sheth ldquoMachine Learning for Internetof things data analysis a surveyrdquoDigital Communications andNetworks vol 4 pp 161ndash175 2018

[24] M Ruta F Scioscia G Loseto A Pinto and E D SciascioldquoMachine learning in the Internet of things a semantic-en-hanced approachrdquo Semantic Web vol 10 pp 1ndash21 2017

[25] K Demertzis K Rantos and G Drosatos ldquoA dynamic in-telligent policies analysis mechanism for personal data pro-cessing in the IoT ecosystemrdquo Big Data And CognitiveComputing vol 4 no 9 pp 1ndash16 2020

Scientific Programming 13

[26] N Joseph and B S Kumar ldquoTop-K competitor trust miningand customer behavior investigation using data miningtechniquerdquo Journal of Network Communications andEmerging Technologies (JNCET) vol 8 no 2 pp 26ndash30 2018

[27] M P Dooshima E N Chidozie B J Ademola O O Sekoniand I P Adebayo ldquoA predictive model for the risk of mentalillness in Nigeria using data miningrdquo International Journal ofImmonology vol 6 no 1 pp 5ndash16 2018

[28] B A Tama and S Lim ldquoA comparative performance eval-uation of classification algorithms for clinical decision sup-port systemsrdquo Mathematics MDPI vol 8 no 1814 pp 1ndash252020

[29] C C N Wang I S Chang P C Y Sheu and J J P TsaildquoApplication of semantic computing in cancer on secondarydata analysisrdquo in Proceedings of the 2nd International Con-ference on Robotic Computing IEEE pp 407ndash413 LagunaHills CA USA February 2018

[30] K Ravikumar and A R Kannan ldquoSpatial data mining forprediction of natural events and disaster management basedon fuzzy logic using hybrid PSOrdquo Taga Journal vol 14pp 858ndash878 2018

[31] B M M Alom and M Courtney ldquoldquoEducational data mininga case study perspectives from primary to university educa-tion in Australiardquo International Journal of InformationTechnology and Computer Science vol 2 pp 1ndash9 2018

[32] S Celik and O Yilmaz ldquoPrediction of body weight of Turkishtazi dogs using data mining techniques classification andregression tree (CART) and multivariate adaptive regressionsplines (MARS)rdquo Journal of Zoology vol 50 no 2pp 575ndash583 2018

[33] W Ugulino D Cardador K Vega E Velloso R Miliditi andH Fuks ldquoWearable computing accelerometers data classi-fication of body postures and movementsrdquo in Proceedings ofthe 21st Brazilian Symposium on Artificial Intelligence Ad-vances in Artificial Intelligence Lecture Notes in ComputerScience pp 52ndash61 Springer Stuttgart Germany September2012

[34] M B Chandak ldquoRole of big-data in classification and novalclass detection in data streamsrdquo Journal of Big Data Springervol 23 no 5 pp 1ndash9 2016

[35] M Gupta K K Gupta and P K Shukla ldquoSession key basedfast secure and lightweight image encryption algorithmrdquoMultimedia Tools and Applications vol 80 pp 10391ndash104162021

[36] H S Pannu D Singh and A K Malhi ldquoMulti-objectiveparticle swarm optimization-based adaptive neuro-fuzzy in-ference system for benzene monitoringrdquoNeural Computing ampApplications vol 31 pp 2195ndash2205 2019

[37] D Singh J Singh and A Chhabra ldquoHigh availability ofclouds failover strategies for cloud computing using inte-grated check pointing algorithmsrdquo in Proceedings of the 2012International Conference on Communication Systems andNetwork Technologies pp 698ndash703 Rajkot India May 2012

[38] D Singh and V Kumar ldquoA comprehensive review of com-putational dehazing techniquesrdquo Archives of ComputationalMethods in Engineering vol 26 pp 1395ndash1413 2019

[39] R Gupta ldquoPerformance analysis of anti-phishing tools andstudy of classification data mining algorithms for a novel anti-phishing systemrdquo International Journal of Computer Networkand Information Security vol 12 pp 70ndash77 2015

[40] R Bhatt P Maheshwary P Shukla P Shukla M Shrivastavaand S Changlani ldquoImplementation of fruit fly optimizationalgorithm (FFOA) to escalate the attacking efficiency of nodecapture attack in wireless sensor networks (WSN)rdquo Computer

Communications vol 149 pp 134ndash145 2020 ISSN 0140-3664

[41] F Distribution Table httpwwwsocruclaeduappletsdirf_tablehtml 2018

[42] Normal Distribution Table httpmatharizonaedusimrsimsma464standardnormaltablepdf

14 Scientific Programming

Page 4: CBO-IE:ADataMiningApproachforHealthcareIoTDataset ...

of mutation rate (Ta) and the previous possibility of sub-sistence (Sa) by using the following equation

Ta Tmaximum 1 minusSa

Smaximum1113888 1113889 (6)

where Tmaximum is the highest mutation rate term illustratedby user and Smaximum is the highest possibility of castes count

After that the mutation operator is improved by utilizingCauchy model to provide optimal exploration strength ofBO in huge searching area by reducing potential limitationof mutation strategy )e Cauchy distribution is evaluatedwith the help of probability density function which isformulized by the following equation

F(X 0 1) 1

π 1 + X2

1113872 1113873 (7)

Hence the Cauchy mutation equation is described byusing the following equation

Ha(SIV) minimum Ha(SIV)( 1113857 + maximum Ha(SIV)( 1113857(

minus minimum Ha(SIV)( 11138571113857 times π times F Ha(SIV) 0 1( 1113857

(8)

where Ha(SIV) is ath habitat F(Ha(SIV) 0 1) is Cauchydistribution (Algorithm 1)

4 TheChaotic BOApproachUsing InformationEntropy (CBO-IE)

An updated BO approach is proposed using the chaoticbehaviour and Information Entropy )e BO approach iswell suitable for exploration and exploitation in hugesearching space and generates the efficient results for nu-meric optimization [40] Still the numeric optimization isreasonably different from data clustering mechanism Heresome intrinsic characteristics are investigated and a chaoticBO is implemented using Information Entropy to apply theBO for data clustering

41 Element Information Entropy )e data points are dis-tributed in multidimensional area in data clustering strategy)e confusion degree of a probabilistic variable is calculatedby utilizing Information Entropy after that this is utilizedfor elements measurements to calculate the distribution)eentropy (Ea) is evaluated to isolate the element values withestimation of every value to its closer integer by using thefollowing equation

Ea minus 1113944

maxa

bmina

Rblog Rb( 1113857 (9)

where Ea is the entropy of ath element (a 1 2 3 L) and Lis number of elements in dataset mina and maxa are lowestand highest integers after discretization of element values Rbis bth element integer percentage value

)emigration procedure of BO population is guarded byusing element entropy hence maximum entropy is evalu-ated for every element in dataset by the following equation

maximum Ea( 1113857 minuslog1

maxa minus mina + 11113888 1113889 (10)

where maximum (Ea) is maximum entropy of ath element indataset

At last the normalized entropy for every element iscalculated by the following equation

nomalized Ea( 1113857 Ea

maximum Ea( 1113857 (11)

where normalized (Ea) is normalized entropy of ath elementin dataset

Equation (11) is repeated for all data elements in datasetand normalized entropy set (normalized (E)) is generated asfollows

Normalized (E) (normalized (E1) normalized(E2) normalized (EL))

42 Information Entropy-Based Migration Process in the BOApproach Here two mechanisms are introduced in mi-gration to improve the performance of BO approach )efirst one is original migration process of BO explained in theprevious section )e second one is updated mechanism oforiginal migration in which P particulars are elected arbi-trarily from present population (1lePlePS) and the oneoptimal particular is assigned as reference particular (Href ))e direction of migration is guided by Href and InformationEntropy is utilized to provide equivalency between pop-ulation miscellany and convergence speed )e higher In-formation Entropy of element indicates the maximumuncertainty which slows down the convergence speedHence the speed is enhanced for element by transferring itto the location based on reference particular globally withmaximum possibility as compared to the least InformationEntropy element

43 Chaos-Based Population Selection of the BO Approach)e chaos method is extremely related to initial circum-stances and successfully utilized for arbitrary numbergeneration using logistic function)e chaotic system is usedin the following equation

cf+1 μ times cf times 1 minus cf1113872 1113873 (12)

where μ represents constants in the range of [1 4] c rep-resents variables (c isin [0 1] f 0 1 )

)e population of BO is initialized by utilizing chaosfunction (equation (12)) which has improved the efficiencyof BO with proper use of huge solution area

44 6e Complete CBO-IE Approach for Data Mining)e Information Entropy-based migration and chaos-basedpopulation selection are combined in Chaotic Biogeogra-phy-Based Optimization using Information Entropy (CBO-IE) to solve the data mining that is data clustering inoptimal way (Figure 1) Firstly the population (PS) of BOapproach is initialized with the help of chaos function

4 Scientific Programming

(equation (12)) and every particular is denoted as a vectorwith size Lx L times K (L is number of elements in dataset Kis number of cluster centroids and Lx is dimension ofparticulars) )e K centroids locations are fixed into vectorin which 1st centroid relates with 1st L attributes 2nd centroidrelates to 2nd L attributes and so on )e primary particularvector values are generated arbitrarily and consistentlybetween lower and higher elements values in existing datasetwithin maximum number of iterations (Mi) )en CBO-IEis applied to calculate the fitness values of entire particularsby using equations (1) to (8) (Algorithm 2)

5 Results and Analysis

In this section the healthcare IoT datasets and performancefactors for proposed CBE-IE approach are explained briefly)e entire approaches are implemented with the help ofMATLAB 2021a tool using Core i3-3110M processor havingWindows 8 operating system with 2GB RAM and analyzedover 8 healthcare IoTdatasets (Table 1) )e proposed CBO-IE approach has evaluated the results for 500 iterations interms of intracluster distance purity index standard devi-ation root mean square error accuracy and F-Measure ascompared to prior algorithms like K-Means ACO PSOALO and BO over 30 independent runs

51 Healthcare IoT Datasets )e proposed CBO-IE ap-proach is applied on 8 dissimilar healthcare IoTdatasets takenfrom UCI repository (Table 1) )e datasets are thoracicsurgery breast cancer cryotherapy liver patient heart pa-tient chronic kidney disease diabetic retinopathy and bloodtransfusion)e 470 instances of thoracic surgery dataset with

17 attributes are divided into 2 classes (survive and notsurvive) the 699 instances of breast cancer dataset with 10attributes are distributed into 2 classes (benign and malig-nant) the 90 instances of cryotherapy dataset with 7 attributesare divided into 2 classes (success and failure) the 583 in-stances of liver patient dataset with 10 attributes are dis-tributed into 2 classes (liver patient and not) the 299 instancesof heart patient dataset with 12 attributes are divided into 2classes (heart failure and not) the 400 instances of chronickidney disease with 24 attributes are divided into 2 classes(disease found and not) the 1151 instances of diabetic reti-nopathy dataset with 18 attributes are distributed into 2classes (diabetic retinopathy and not) and the 748 instancesof blood transfusion dataset with 4 attributes are divided into2 classes (donating blood and not)

52 Performance Factors )e performance of proposedCBO-IE approach is evaluated in terms of intraclusterdistance purity index standard deviation root mean squareerror accuracy and F-Measure

521 Intracluster Distance Firstly the distances betweendata points are evaluated within a cluster After that theaverage of these distances is generated representing anintracluster distance)e optimal clustering is achieved withleast intracluster distance For a cluster the mean distance isevaluated between a centroid and total data points )is stepis continuously performed for every cluster and at last meanvalue of entire clustersrsquo intracluster distances is obtained

Table 2 illustrates that the CBO-IE generates leastintracluster distance value for all IoT datasets )e CBO-IEobtains 90 superior results than BO 92 superior results

STARTInitialize the parameters GM 1 Tmaximum 1 PS and Max_iterationInitialize the populations (arbitrary group of habitats) H1 H2 Hs

P

Evaluate fitness value (HSI) for every habitatWHILE (ending condition is not found)Evaluate αa βa and Ta for every habitatObtain a random isin (0 1)MigrationFOR every habitat (from minimum to maximum HSI values)Choose habitat Ha (SIV) stochastically proportional to βaIF randomlt βa and Ha (SIV) choose then

Choose habitat Hb (SIV) stochastically proportional to αaIF randomlt αa and Hb (SIV) choose thenHa (SIV)Hb (SIV)

END IFEND IF

END FORMUTATIONChoose Ha (SIV) with the help of mutation possibility proportional SaIF randomltTa thenArbitrary change the SIVs in Ha (SIV)

END IFEvaluate HSI value

END WHILESTOP

ALGORITHM 1 Biogeography-Based Optimization

Scientific Programming 5

than ALO 94 superior results than PSO 96 superiorresults than GA and 98 superior results than K-Means interms of intracluster distance for entire healthcare IoTdatasets )e average ranking of all approaches is evaluatedon the basis of minimum to maximum intracluster distance(from 1 to 6)

Figure 2 represents the better rank of proposed CBO-IEwith least intracluster distance values against other approachesK-Means GA PSO ALO and BO for all IoT datasets

522 Standard Deviation )e stiff data clustering in theregion of average value is described by a numerical char-acteristic called standard deviation (DS) )e best clusteringis achieved with less standard deviation which is obtained bythe following equation

DS

1113936(V minus V)

|H|

1113971

(13)

where |H| is dataset length V is dataset values (points) andV is dataset average value

Table 3 describes that the CBO-IE obtains less standarddeviation value for all IoT datasets )e CBO-IE generates93 better quality outcomes than BO 95 better qualityoutcomes than PSO and ALO 97 better quality outcomesthan GA and 98 better quality outcomes than K-Means interms of standard deviation for entire healthcare IoTdatasets

Figure 3 shows the better standard deviation of proposedCBO-IE against other approaches K-Means GA PSO ALOand BO for all IoT datasets

Initialize Particulars and entropy of everyelements

Initialize the population with the help of chaosfunction

Calculate Fitness values of particulars

Iteration = 0

Perform the information entropy basedmigration process (Algorithm 2)

Calculate the fitness values of new particulars

iteration = iteration + 1

Is ending situationsatisfied

NoYes

Result the optimal solution

End

Start

Figure 1 Flowchart of the CBO-IE approach

6 Scientific Programming

523 Purity Index )e correctness of clustering strategy isknown as purity in which accurate classification is per-formed over data elements Hence entire points of anisolated class can be accurately allocated to an isolatedcluster Purity index (IP) is generated with the help of purityby equations (14) and (15) Maximum purity is achieved withmaximum IP value nearer to 1

purity Rz( 1113857 maximum Rwz

111386811138681113868111386811138681113868111386811138681113872 1113873

Rz

11138681113868111386811138681113868111386811138681113868

(14)

IP

1113944K

z1

Rz

11138681113868111386811138681113868111386811138681113868Purity Rz( 11138571113872 1113873

|H| (15)

ALGORITHM 2 Information Entropy-based migration process in the BO approach

Table 1 Healthcare IoT datasets

Sr no Healthcare IoT dataset No of instances No of attributes No of clusters1 )oracic surgery 470 17 22 Breast cancer 699 10 23 Cryotherapy 90 7 24 Liver patient 583 10 25 Heart patient 299 12 26 Chronic kidney disease 400 24 27 Diabetic retinopathy 1151 18 28 Blood transfusion 748 4 2

Table 2 Average ranking of all algorithms based on intracluster distance

Dataset K-Means GA PSO ALO BO CBO-IE)oracic surgery 3364 (5) 07569 (4) 04256 (4) 02547 (3) 00657 (2) 000652 (1)Breast cancer 26265 (5) 08634 (4) 03126 (4) 02614 (3) 00874 (2) 000728 (1)Cryotherapy 243640 (5) 04562 (4) 02864 (4) 02167 (3) 00725 (2) 000548 (1)Liver patient 353245 (5) 05863 (4) 01567 (4) 01321 (3) 00635 (2) 000457 (1)Heart patient 142366 (5) 09362 (4) 07265 (4) 06492 (3) 00623 (2) 000832 (1)Chronic kidney disease 463625 (5) 05682 (4) 03568 (4) 02864 (3) 00742 (2) 000786 (1)Diabetic retinopathy 03658 (5) 02526 (4) 01737 (4) 01421 (3) 00967 (2) 000758 (1)Blood transfusion 06745 (5) 05684 (4) 02375 (4) 02068 (3) 00854 (2) 000852 (1)Average ranking (AKθ) 5 4 4 3 2 1

Scientific Programming 7

where Purity(Rz) is purity of zth cluster |Rz| is zth clusterlength |Rwz| is the number of data elements of wth classallocated to zth cluster

Table 4 explains that the CBO-IE generates maximumpurity index value for all IoT datasets )e CBO-IE obtains5 superior outputs than BO 7 superior outputs than

ALO 8 superior outputs than PSO 12 superior outputsthan GA and 15 superior outputs than K-Means in termsof purity index for entire healthcare IoT datasets

Figure 4 represents the better quality purity index ofproposed CBO-IE against other approaches K-Means GAPSO ALO and BO for all IoT datasets

524 F-Measure Firstly Precision (prsn) and Recall (rel)are evaluated to repossess the information by equations (16)and (17) After that both are combined to formalize the F-Measure (FM) by utilizing equations (18) and (19)

Table 3 Standard deviation for entire IoT datasets

Dataset K-Means GA PSO ALO BO CBO-IE)oracic surgery 0116857 003984 0012036 001452 0007856 0000524Breast cancer 0002578 0168742 012542 015713 0003657 0000746Cryotherapy 00025143 0165875 0001254 0001674 0001036 0000845Liver patient 0256323 017589 0147856 012312 0096875 0000986Heart patient 0212678 0096852 0042574 0021563 0003687 0000251Chronic kidney disease 0020233 0103625 009854 0065214 0006574 0000865Diabetic retinopathy 0086574 0186523 0142657 0102647 0086478 00008657Blood transfusion 0213658 0196875 0185632 0152461 0010365 0000236

0123456

Datasets

Average Ranking

Thor

acic

Sur

gery

Brea

t Can

cer

Cryo

ther

apy

Live

r Pat

ient

Hea

rt P

atie

nts

Chro

nic K

idne

y

Dia

betic

Bloo

d Tr

ansfu

sionIn

tra-c

luste

r Dist

ance

K-MeansGAPSO

ALOBOCBO-IE

Figure 2 Average ranking based on intracluster distance

0005

01015

02025

03

Stan

dard

Dev

iatio

n Standard Deviation

Datasets

Thor

acic

Sur

gery

Brea

st Ca

ncer

Cryo

ther

apy

Live

r Pat

ient

Hea

rt P

atie

nts

Chro

nic K

idne

y

Dia

betic

Bloo

d Tr

ansfu

sion

K-MeansGAPSO

ALOBOCBO-IE

Figure 3 Standard deviation for all IoT datasets

Table 4 Purity index for entire IoT datasets

Dataset K-Means GA PSO ALO BO CBO-IE)oracic surgery 083 086 089 089 091 093Breast cancer 086 088 090 089 093 095Cryotherapy 084 087 090 091 093 096Liver patient 086 087 088 086 090 092Heart patient 080 083 086 086 088 092Chronic kidneydisease 086 087 090 089 091 093

Diabetic retinopathy 088 090 091 091 093 095Blood transfusion 076 078 081 083 086 088

8 Scientific Programming

prsn(w z) Rwz

11138681113868111386811138681113868111386811138681113868

Rz

11138681113868111386811138681113868111386811138681113868

(16)

rel(w z) Rwz

11138681113868111386811138681113868111386811138681113868

Rw

11138681113868111386811138681113868111386811138681113868 (17)

FM(w z) 2 times prsn(w z) times rel(w z)

prsn(w z) + rel(w z) (18)

FM 1113944K

w1

Rw

11138681113868111386811138681113868111386811138681113868

|H|maximum FM(w z) (19)

where |Rw| is wth class lengthTable 5 describes that the CBO-IE obtains higher F-

Measure value for all IoTdatasets )e CBO-IE generates 4better outputs than BO 7 better outputs than ALO 8better outputs than PSO 13 better outputs than GA and16 better outputs than K-Means in terms of F-Measure forentire healthcare IoT datasets

Figure 5 shows the better F-Measure of proposed CBO-IE against other approaches K-Means GA PSO ALO andBO for all IoT datasets

525 Root Mean Square Error (RMSE) )e RMSE is de-fined as a divergence between predicted values and exper-imental (calculated) values )e best clustering is achievedwith minimum RMSE values of datasets It is evaluated byutilizing the following equation

RMSE

1|H|

1113944

|H|

j1Vj minus 1113954Vj1113872 1113873

2

11139741113972

(20)

where Vj is jth element value in dataset and 1113954Vj is j

th elementpredicted value in dataset

Table 6 explains that the CBO-IE generates least RMSEvalue for all IoT datasets )e CBO-IE generates 8 betterresults than BO 78 better results than ALO 81 better

results than PSO 87 better results than GA and 95 betterresults than K-Means in terms of RMSE for entire healthcareIoT datasets

Figure 6 shows the better RMSE of proposed CBO-IEagainst other approaches K-Means GA PSO ALO and BOfor all IoT datasets

526 Accuracy It evaluates the division of clusters that areaccurate (ie it evaluates proportion of verdicts that areexact) and describes the portion of clusters in the prevailinggroup

002040608

1

Purit

y In

dex

Purity Index

Datasets

Thor

acic

Sur

gery

Brea

t Can

cer

Cryo

ther

apy

Live

r Pat

ient

Hea

rt P

atie

nts

Chro

nic K

idne

y

Dia

betic

Bloo

d Tr

ansfu

sion

K-MeansGAPSO

ALOBOCBO-IE

Figure 4 Purity index for all IoT datasets

Table 5 F-Measure for entire IoT datasets

Dataset K-Means GA PSO ALO BO CBO-IE)oracic surgery 081 084 087 087 089 091Breast cancer 084 086 087 088 090 093Cryotherapy 082 086 088 087 091 093Liver patient 084 086 087 087 089 091Heart patient 078 082 084 083 085 088Chronic kidneydisease 084 086 088 087 089 091

Diabetic retinopathy 086 089 090 091 092 094Blood transfusion 074 077 079 082 085 087

002040608

1

DatasetsTh

orac

ic S

urge

ry

Brea

t Can

cer

Cryo

ther

apy

Live

r Pat

ient

Hea

rt P

atie

nts

Chro

nic K

idne

y

Dia

betic

Bloo

d

F-M

easu

re

F-Measure

K-MeansGAPSO

ALOBOCBO-IE

Figure 5 F-Measure for all IoT datasets

Table 6 RMSE values for entire IoT datasets

Dataset K-Means GA PSO ALO BO CBO-

IE)oracic surgery 0537 0327 0157 0124 00124 00034Breast cancer 0567 0386 0168 0264 00317 00075Cryotherapy 0621 0374 0234 0364 00421 00066Liver patient 0437 0361 0261 0213 00652 00054Heart patient 0537 0412 0201 0186 00145 00063Chronic kidneydisease 0610 0422 0341 0267 00254 00047

Diabeticretinopathy 0517 0414 0367 0257 00374 00087

Blood transfusion 0438 0314 0120 0265 00451 00071

Scientific Programming 9

accuracy 1113936

Kz1 Rz

|H|lowast 100 (21)

Table 7 represents that the CBO-IE obtains higher ac-curacy value for all IoT datasets )e CBO-IE has 3 su-perior outputs than BO 5 superior outputs than ALO 6superior outputs than PSO 9 superior outputs than GAand 13 superior outputs than K-Means in terms of ac-curacy for entire healthcare IoT datasets

Figure 7 represents the better accuracy of proposedCBO-IE against other approaches K-Means GA PSO ALOand BO for all IoT datasets

)e Information Entropy is used with BO approach forclustering to provide accurate and efficient distribution ofdata points in dataset )e chaos theory is utilized to ini-tialize the population of BO to improve the searching ca-pability of habitat which helps in cluster member selectionprocess more precisely )ese two strategies InformationEntropy and chaos theory have improved the performanceof BO to provide the best selection of cluster heads and theirmembers optimally So CBO-IE generates superior resultsthan BO ALO PSO GA and K-Means clusteringapproaches

53 Running Time Complexity of the CBO-IE Approach)e number of executions is directly related to timecomplexity of clustering approaches to run )e

complexity is calculated by utilizing few circumstances asfollows PS is population size L is number of elements indataset Lx represents dimensions of particulars (ele-ments) K is number of cluster centroids and Mi repre-sents maximum iterations )e cost of every execution isassumed as 1 unit )e Cost of all Executions (CE) isobtained to utilize Algorithm 3 by the followingequations

001020304050607

RMSE

Val

ues

RMSE Values

Datasets

Thor

acic

Sur

gery

Brea

st Ca

ncer

Cryo

ther

apy

Live

r Pat

ient

Hea

rt P

atie

nts

Chro

nic K

idne

y

Dia

betic

Bloo

d Tr

ansfu

sion

K-MeansGAPSO

ALOBOCBO-IE

Figure 6 RMSE for all IoT datasets

Table 7 Accuracy values for entire IoT datasets

Dataset K-Means GA PSO ALO BO CBO-IE)oracic surgery 080 083 086 086 088 090Breast cancer 083 085 086 087 089 092Cryotherapy 081 085 087 086 090 092Liver patient 083 085 086 086 088 090Heart patient 077 081 083 082 084 087Chronic kidney disease 083 085 087 086 088 090Diabetic retinopathy 085 088 089 090 091 093Blood transfusion 073 076 078 081 084 086

002040608

1

Accu

racy

()

Accuracy

Datasets

Thor

acic

Sur

gery

Brea

st Ca

ncer

Cryo

ther

apy

Live

r Pat

ient

Hea

rt P

atie

nts

Chro

nic K

idne

y

Dia

betic

Bloo

d Tr

ansfu

sion

K-MeansGAPSO

ALOBOCBO-IE

Figure 7 Accuracy for all IoT datasets

10 Scientific Programming

ALGORITHM 3 )e complete CBO-IE approach

Scientific Programming 11

CE 1 + L + 1 + PS

+ Mi + 1 + Mi times PS

+ Mi times PS

+ Mi times 2 times PS

times Lx + 3 times PS

+ 11113872 1113873

+ Mi times K times PS

+ 11113872 1113873 + Mi times K times PS

+ Mi times K times PS

+ Mi times K times PS

+ Mi times K times L + Mi times K times L + Mi times K times L

(22)

CE 2 times Mi times PS

times Lx + 4 times Mi times K times PS

+ 3 times Mi times K times L + 5 times Mi times PS

+ Mi times K + L + PS

+ 2 times Mi + 3 (23)

Supposing that all circumstances are equal in equation(23) in worst case equation (24) is generated as follows

CE 9n3

+ 6n2

+ 4n + 3 (24)

)e running time complexities of clustering approachesare O(n3) for CBO-IE O(n3) for BO O(n3) for ACO O(n3)for PSO O(n3) for ALO and O(n2) for K-Menas in worstcase Hence all are executable in polynomial time

54 Statistical Examination Measurement A statistical ex-amination is functioned to obtain the extension of signifi-cance dissimilarities in the effectiveness of clusteringtechniques Here a nonparametric Friedman Examination(FE) is utilized to discover dissimilarities amid the group ofserial appropriate variables )e entire clustering techniquesprovide equally effective explaining in the null hypothesis(Y0)

)e Friedman Examination (FE) is formulized by

FE 12 times N

DS

AC

AC

+ 11113872 11138731113944

5

θ1AKθ( 1113857

2minus

AC

times AC

+ 11113872 11138732

4⎡⎢⎢⎢⎢⎣ ⎤⎥⎥⎥⎥⎦ (25)

where NDS is number of datasets AKθ is average rank of θth

approach and AC is number of clustering approaches)e FE critical value is 204925 taken from F-distribution

table [41] with (AC-1) and (ACndash1)lowast(NDSndash1) freedom degreewhich is obtained between (6ndash1) 5 and (6-1)lowast(8-1) 35 forapplying 6 clustering approaches (AC 6) over 8 datasets(NDS 8) with λ 010 (confidence stage) )e calculated FEvalue is higher as compared to the critical value for nullhypothesis (Y0) rejection if Y0 is not accepted)e evaluatedFE value is 1714 for initiating 6 clustering approaches(AC 6) over 8 datasets (NDS 8) with λ 010 Hence thecalculated FE is higher as compared to the critical FE andthen Y0 is rejected So it is to be summarized that entireclustering approaches are not equally effective

)erefore a post hoc examination is performed usingHolm strategy)e proposed CBE-IE is analyzed statisticallyagainst other clustering approaches in this examinationFirstly the z value is obtained from equation (24) and after

that probability (p) is generated with the help of z value andnormal distribution table [42] At last pj value is analyzedwith (λ(AC minus j)) (Table 8)

Z AKj minus AKθ

δ (26)

where

δ

AC

AC

+ 11113872 1113873

6 times NDS

1113971

(27)

Table 8 illustrates that the (λ(AC minus j)) value is higher ascompared to pj value this indicates the hypothesis rejectionfor entire cases Hence the proposed CBO-IE is superior inclustering as compared to the K-Means GA PSO and BOapproaches according to the above analysis

6 Conclusion

Various fields like medicine education and industriesutilize data mining for their useful applications )egrouping of data is performed on data clustering which is aspecific task in data mining to examine the database effi-ciently In this work a Chaotic Biogeography-Based Opti-mization approach using Information Entropy (CBO-IE) isproposed to obtain data clusters for healthcare IoT datasets)e Information Entropy is introduced with a BO approachto generate a better distribution of data points in the datasetand chaos theory is utilized to initialize the population of BOapproach )e Information Entropy and chaos theory arecombined with BO approach to generate optimal clusterheads and cluster members with enhanced convergencespeed of BO in huge search area )eMATLAB 2021a tool isused to implement the CBO-IE for eight healthcare IoTdatasets and the outcomes describe the better quality effi-ciency of CBO-IE on the basis of F-Measure intraclusterdistance running time complexity purity index statisticalanalysis standard deviation root mean square error andaccuracy as compared to previous techniques of clusteringlike K-Means GA PSO ALO and BO approaches )eFriedman Examination and Holm strategy are introduced to

Table 8 Holm strategy outputs

J Approaches z value p value (λ(AC minus j)) Hypothesis1 K-Means minus63245 000001 0020 Rejected2 GA minus50596 000001 0025 Rejected3 PSO minus37947 00001 0033 Rejected4 ALO minus25298 00057 005 Rejected5 BO minus13649 00861 010 Rejected

12 Scientific Programming

perform statistical analysis of proposed CBO-IE againstprevious techniques of clustering like K-Means GA PSOALO and BO approaches which represent that the CBO-IEgenerates 90 accurate results In future the proposedtechnique will be anticipated to estimate and be authorizedwith huge databases under big data Additionally a cross-layer communication will be anticipated to be offered andlegalized in IoT structural design in the future

Data Availability

)e data that support the findings of this study are availableupon request from the corresponding author

Conflicts of Interest

)e authors declare that there are no conflicts of interest

References

[1] A Haque L Khan and M Baron ldquoSAND semi-supervisedadaptive novel class detection and classification over datastreamrdquo in Proceedings of the 13th AAAI Conference on Ar-tificial Intelligence (AAAI-16) pp 1652ndash1658 Phoenix AZUSA February 2016

[2] N Mishra H K Soni S Sharma and A K UpadhyayldquoDevelopment and analysis of artificial neural networkmodels for rainfall prediction by using time-series datardquo I JIntelligent Systems and Applications vol 1 pp 16ndash23 2018

[3] H N Dai R C W Wong H Wang Z Zheng andA V Vasilakos ldquoBig data analytics for large scale wirelessnetworks challenges and opportunitiesrdquo ACM vol 52pp 1ndash46 2019

[4] A H Sabry W Z W Hasan M N MohtarR M K R Ahmad and H R Harun ldquoPlanter pressurerepetability data analysis for health adult based on EMEDsystemrdquo Malaysian Journal of Fundamental and AppliedSciences vol 14 no 1 pp 96ndash101 2018

[5] A Mensikova and C A Mattmann ldquoEnsemble sentimentanalysis to identify human trafficking in web datardquo in Pro-ceedings of the ACM Workshop on Graph Techniques forAdversarial Activity Analytics (GTA) pp 1ndash6 ACM NewYork Ny USA 2018

[6] S S Gill and R Buyya ldquoBio-inspired algorithms for big dataanalytics a survey taxonomy and open challengesrdquo Big DataAnalytics for Intelligent Healthcare Management ElsevierAmsterdam Netherlands pp 1ndash17 2019

[7] A Verma I Kaur and N Arora ldquoComparative analysis ofinformatio extraction techniques for data miningrdquo IndianJournal of Science and Technology vol 9 no 11 pp 1ndash182016

[8] B R Manju A Joshuva and V Sugumaran ldquoA data miningstudy for condition monitoring on wind turbine blades usinghoeffding tree algorithm through statistical and histogramfeaturesrdquo International Journal of Mechanical Engineering ampTechnology vol 9 no 1 pp 1061ndash1079 2018

[9] I Aytekin E Eyduran K Karadas R Akshan and I KeskinldquoPrediction of fattening final live weight from some bodymeasurements and fattening period in young bulls ofcrossbred and exotic breeds using MARS data mining algo-rithmrdquo Journal of Zoology vol 50 no 1 pp 189ndash195 2018

[10] S Ahmad and R Varma ldquoInformation extraction from textmessages using data mining techniquesrdquo Malaya Journal ofMatematik vol 5 no 1 pp 26ndash29 2018

[11] N E Oweis M M Fouad S R Oweis S S Owais andV Snasel ldquoA novel mapreduce lift assiciation riule miningalgorithm (MRLAR) for big datardquo International Journal ofAdvanced Computer Science and Applications vol 7 no 8pp 1ndash7 2016

[12] C W Tsai C F Lai M C Chiang and L T Yang ldquoDatamining for internet of things a surveyrdquo CommunicationsSurveys and Tutorials IEEE vol 16 no 1 pp 77ndash97 2014

[13] F Chen P Deng J Wan D Zhang A V Vasilakos andX Rong ldquoData mining for Internet of things literature reviewand challengesrdquo International Journal of Distributed SensorNetworks vol 2015 Article ID 365372 14 pages 2015

[14] S B Patel M N Patel and S M Shah ldquoFEA-HUIM fast andefficient algorithm for high utility item-set mining using noveldata structure and pruning strategyrdquo National Conference onAdvanced Research Trends in Information and Technologies(NCARTICT) IJSSET vol 4 no 2 pp 1ndash7 2018

[15] S Bhaskaran R Marappan and B Santhi ldquoDesign andanalysis of a cluster-based intelligent hybrid recommendationsystem for E-learning applicationsrdquo Mathematics MDPIvol 9 no 107 pp 1ndash21 2021

[16] D Giordano M Mellia and T Cerquitelli ldquoK-MDTSCK-Multi-Dimensional time-series clustering algorithmrdquoElectronics MDPI vol 10 no 1166 pp 1ndash21 2021

[17] R Mothukuri and B B Rao ldquoCluster Analysis of cyber crimedata using Rrdquo International Journal of Computer Science andMobile Applications vol 6 no 2 pp 62ndash70 2018

[18] A N Onaizah ldquoA novel data stream clustering algorithm inhealthcare IoTrdquo International Journal of Research in AppliedNatural and Social Sciences (IJRANSS) vol 5 no 6 pp 51ndash602017

[19] R Krishnamurthi A Kumar D Gopinathan A Nayyar andB Qureshi ldquoAn overview of IoT sensor data processingfusion and analysis techniquesrdquo Sensors MDPI vol 20pp 1ndash23 6076

[20] F Alam R Mehmood I Katib and A Albeshri ldquoAnalysis ofeight data mining algorithms for smarter Internet of things(IoT)rdquo in Proceedings of the International Workshop on DataMining in IoT Systems (DaMIS) pp 437ndash442 ElsevierLondon UK September 2016

[21] A M C Souza and J R A Amazonas ldquoAn outlier detectalgorithm using big data processing and Internet of thingsarchitecturerdquo in Proceedings of the International Workshop onBig Data and Data Mining Challenges on IoT and PervasiveSystems (BigD2M) pp 1010ndash1015 Elsevier London UK June2015

[22] D Gil M Johnsson H Mora and J Szymanski ldquoReview ofthe complexity of managing big data of the Internet of thingsrdquoComplexity vol 2019 Article ID 4592902 12 pages 2019

[23] M S Mahdavinejad M Rezvan M Barekatain P AdibiP Barnaghi and A P Sheth ldquoMachine Learning for Internetof things data analysis a surveyrdquoDigital Communications andNetworks vol 4 pp 161ndash175 2018

[24] M Ruta F Scioscia G Loseto A Pinto and E D SciascioldquoMachine learning in the Internet of things a semantic-en-hanced approachrdquo Semantic Web vol 10 pp 1ndash21 2017

[25] K Demertzis K Rantos and G Drosatos ldquoA dynamic in-telligent policies analysis mechanism for personal data pro-cessing in the IoT ecosystemrdquo Big Data And CognitiveComputing vol 4 no 9 pp 1ndash16 2020

Scientific Programming 13

[26] N Joseph and B S Kumar ldquoTop-K competitor trust miningand customer behavior investigation using data miningtechniquerdquo Journal of Network Communications andEmerging Technologies (JNCET) vol 8 no 2 pp 26ndash30 2018

[27] M P Dooshima E N Chidozie B J Ademola O O Sekoniand I P Adebayo ldquoA predictive model for the risk of mentalillness in Nigeria using data miningrdquo International Journal ofImmonology vol 6 no 1 pp 5ndash16 2018

[28] B A Tama and S Lim ldquoA comparative performance eval-uation of classification algorithms for clinical decision sup-port systemsrdquo Mathematics MDPI vol 8 no 1814 pp 1ndash252020

[29] C C N Wang I S Chang P C Y Sheu and J J P TsaildquoApplication of semantic computing in cancer on secondarydata analysisrdquo in Proceedings of the 2nd International Con-ference on Robotic Computing IEEE pp 407ndash413 LagunaHills CA USA February 2018

[30] K Ravikumar and A R Kannan ldquoSpatial data mining forprediction of natural events and disaster management basedon fuzzy logic using hybrid PSOrdquo Taga Journal vol 14pp 858ndash878 2018

[31] B M M Alom and M Courtney ldquoldquoEducational data mininga case study perspectives from primary to university educa-tion in Australiardquo International Journal of InformationTechnology and Computer Science vol 2 pp 1ndash9 2018

[32] S Celik and O Yilmaz ldquoPrediction of body weight of Turkishtazi dogs using data mining techniques classification andregression tree (CART) and multivariate adaptive regressionsplines (MARS)rdquo Journal of Zoology vol 50 no 2pp 575ndash583 2018

[33] W Ugulino D Cardador K Vega E Velloso R Miliditi andH Fuks ldquoWearable computing accelerometers data classi-fication of body postures and movementsrdquo in Proceedings ofthe 21st Brazilian Symposium on Artificial Intelligence Ad-vances in Artificial Intelligence Lecture Notes in ComputerScience pp 52ndash61 Springer Stuttgart Germany September2012

[34] M B Chandak ldquoRole of big-data in classification and novalclass detection in data streamsrdquo Journal of Big Data Springervol 23 no 5 pp 1ndash9 2016

[35] M Gupta K K Gupta and P K Shukla ldquoSession key basedfast secure and lightweight image encryption algorithmrdquoMultimedia Tools and Applications vol 80 pp 10391ndash104162021

[36] H S Pannu D Singh and A K Malhi ldquoMulti-objectiveparticle swarm optimization-based adaptive neuro-fuzzy in-ference system for benzene monitoringrdquoNeural Computing ampApplications vol 31 pp 2195ndash2205 2019

[37] D Singh J Singh and A Chhabra ldquoHigh availability ofclouds failover strategies for cloud computing using inte-grated check pointing algorithmsrdquo in Proceedings of the 2012International Conference on Communication Systems andNetwork Technologies pp 698ndash703 Rajkot India May 2012

[38] D Singh and V Kumar ldquoA comprehensive review of com-putational dehazing techniquesrdquo Archives of ComputationalMethods in Engineering vol 26 pp 1395ndash1413 2019

[39] R Gupta ldquoPerformance analysis of anti-phishing tools andstudy of classification data mining algorithms for a novel anti-phishing systemrdquo International Journal of Computer Networkand Information Security vol 12 pp 70ndash77 2015

[40] R Bhatt P Maheshwary P Shukla P Shukla M Shrivastavaand S Changlani ldquoImplementation of fruit fly optimizationalgorithm (FFOA) to escalate the attacking efficiency of nodecapture attack in wireless sensor networks (WSN)rdquo Computer

Communications vol 149 pp 134ndash145 2020 ISSN 0140-3664

[41] F Distribution Table httpwwwsocruclaeduappletsdirf_tablehtml 2018

[42] Normal Distribution Table httpmatharizonaedusimrsimsma464standardnormaltablepdf

14 Scientific Programming

Page 5: CBO-IE:ADataMiningApproachforHealthcareIoTDataset ...

(equation (12)) and every particular is denoted as a vectorwith size Lx L times K (L is number of elements in dataset Kis number of cluster centroids and Lx is dimension ofparticulars) )e K centroids locations are fixed into vectorin which 1st centroid relates with 1st L attributes 2nd centroidrelates to 2nd L attributes and so on )e primary particularvector values are generated arbitrarily and consistentlybetween lower and higher elements values in existing datasetwithin maximum number of iterations (Mi) )en CBO-IEis applied to calculate the fitness values of entire particularsby using equations (1) to (8) (Algorithm 2)

5 Results and Analysis

In this section the healthcare IoT datasets and performancefactors for proposed CBE-IE approach are explained briefly)e entire approaches are implemented with the help ofMATLAB 2021a tool using Core i3-3110M processor havingWindows 8 operating system with 2GB RAM and analyzedover 8 healthcare IoTdatasets (Table 1) )e proposed CBO-IE approach has evaluated the results for 500 iterations interms of intracluster distance purity index standard devi-ation root mean square error accuracy and F-Measure ascompared to prior algorithms like K-Means ACO PSOALO and BO over 30 independent runs

51 Healthcare IoT Datasets )e proposed CBO-IE ap-proach is applied on 8 dissimilar healthcare IoTdatasets takenfrom UCI repository (Table 1) )e datasets are thoracicsurgery breast cancer cryotherapy liver patient heart pa-tient chronic kidney disease diabetic retinopathy and bloodtransfusion)e 470 instances of thoracic surgery dataset with

17 attributes are divided into 2 classes (survive and notsurvive) the 699 instances of breast cancer dataset with 10attributes are distributed into 2 classes (benign and malig-nant) the 90 instances of cryotherapy dataset with 7 attributesare divided into 2 classes (success and failure) the 583 in-stances of liver patient dataset with 10 attributes are dis-tributed into 2 classes (liver patient and not) the 299 instancesof heart patient dataset with 12 attributes are divided into 2classes (heart failure and not) the 400 instances of chronickidney disease with 24 attributes are divided into 2 classes(disease found and not) the 1151 instances of diabetic reti-nopathy dataset with 18 attributes are distributed into 2classes (diabetic retinopathy and not) and the 748 instancesof blood transfusion dataset with 4 attributes are divided into2 classes (donating blood and not)

52 Performance Factors )e performance of proposedCBO-IE approach is evaluated in terms of intraclusterdistance purity index standard deviation root mean squareerror accuracy and F-Measure

521 Intracluster Distance Firstly the distances betweendata points are evaluated within a cluster After that theaverage of these distances is generated representing anintracluster distance)e optimal clustering is achieved withleast intracluster distance For a cluster the mean distance isevaluated between a centroid and total data points )is stepis continuously performed for every cluster and at last meanvalue of entire clustersrsquo intracluster distances is obtained

Table 2 illustrates that the CBO-IE generates leastintracluster distance value for all IoT datasets )e CBO-IEobtains 90 superior results than BO 92 superior results

STARTInitialize the parameters GM 1 Tmaximum 1 PS and Max_iterationInitialize the populations (arbitrary group of habitats) H1 H2 Hs

P

Evaluate fitness value (HSI) for every habitatWHILE (ending condition is not found)Evaluate αa βa and Ta for every habitatObtain a random isin (0 1)MigrationFOR every habitat (from minimum to maximum HSI values)Choose habitat Ha (SIV) stochastically proportional to βaIF randomlt βa and Ha (SIV) choose then

Choose habitat Hb (SIV) stochastically proportional to αaIF randomlt αa and Hb (SIV) choose thenHa (SIV)Hb (SIV)

END IFEND IF

END FORMUTATIONChoose Ha (SIV) with the help of mutation possibility proportional SaIF randomltTa thenArbitrary change the SIVs in Ha (SIV)

END IFEvaluate HSI value

END WHILESTOP

ALGORITHM 1 Biogeography-Based Optimization

Scientific Programming 5

than ALO 94 superior results than PSO 96 superiorresults than GA and 98 superior results than K-Means interms of intracluster distance for entire healthcare IoTdatasets )e average ranking of all approaches is evaluatedon the basis of minimum to maximum intracluster distance(from 1 to 6)

Figure 2 represents the better rank of proposed CBO-IEwith least intracluster distance values against other approachesK-Means GA PSO ALO and BO for all IoT datasets

522 Standard Deviation )e stiff data clustering in theregion of average value is described by a numerical char-acteristic called standard deviation (DS) )e best clusteringis achieved with less standard deviation which is obtained bythe following equation

DS

1113936(V minus V)

|H|

1113971

(13)

where |H| is dataset length V is dataset values (points) andV is dataset average value

Table 3 describes that the CBO-IE obtains less standarddeviation value for all IoT datasets )e CBO-IE generates93 better quality outcomes than BO 95 better qualityoutcomes than PSO and ALO 97 better quality outcomesthan GA and 98 better quality outcomes than K-Means interms of standard deviation for entire healthcare IoTdatasets

Figure 3 shows the better standard deviation of proposedCBO-IE against other approaches K-Means GA PSO ALOand BO for all IoT datasets

Initialize Particulars and entropy of everyelements

Initialize the population with the help of chaosfunction

Calculate Fitness values of particulars

Iteration = 0

Perform the information entropy basedmigration process (Algorithm 2)

Calculate the fitness values of new particulars

iteration = iteration + 1

Is ending situationsatisfied

NoYes

Result the optimal solution

End

Start

Figure 1 Flowchart of the CBO-IE approach

6 Scientific Programming

523 Purity Index )e correctness of clustering strategy isknown as purity in which accurate classification is per-formed over data elements Hence entire points of anisolated class can be accurately allocated to an isolatedcluster Purity index (IP) is generated with the help of purityby equations (14) and (15) Maximum purity is achieved withmaximum IP value nearer to 1

purity Rz( 1113857 maximum Rwz

111386811138681113868111386811138681113868111386811138681113872 1113873

Rz

11138681113868111386811138681113868111386811138681113868

(14)

IP

1113944K

z1

Rz

11138681113868111386811138681113868111386811138681113868Purity Rz( 11138571113872 1113873

|H| (15)

ALGORITHM 2 Information Entropy-based migration process in the BO approach

Table 1 Healthcare IoT datasets

Sr no Healthcare IoT dataset No of instances No of attributes No of clusters1 )oracic surgery 470 17 22 Breast cancer 699 10 23 Cryotherapy 90 7 24 Liver patient 583 10 25 Heart patient 299 12 26 Chronic kidney disease 400 24 27 Diabetic retinopathy 1151 18 28 Blood transfusion 748 4 2

Table 2 Average ranking of all algorithms based on intracluster distance

Dataset K-Means GA PSO ALO BO CBO-IE)oracic surgery 3364 (5) 07569 (4) 04256 (4) 02547 (3) 00657 (2) 000652 (1)Breast cancer 26265 (5) 08634 (4) 03126 (4) 02614 (3) 00874 (2) 000728 (1)Cryotherapy 243640 (5) 04562 (4) 02864 (4) 02167 (3) 00725 (2) 000548 (1)Liver patient 353245 (5) 05863 (4) 01567 (4) 01321 (3) 00635 (2) 000457 (1)Heart patient 142366 (5) 09362 (4) 07265 (4) 06492 (3) 00623 (2) 000832 (1)Chronic kidney disease 463625 (5) 05682 (4) 03568 (4) 02864 (3) 00742 (2) 000786 (1)Diabetic retinopathy 03658 (5) 02526 (4) 01737 (4) 01421 (3) 00967 (2) 000758 (1)Blood transfusion 06745 (5) 05684 (4) 02375 (4) 02068 (3) 00854 (2) 000852 (1)Average ranking (AKθ) 5 4 4 3 2 1

Scientific Programming 7

where Purity(Rz) is purity of zth cluster |Rz| is zth clusterlength |Rwz| is the number of data elements of wth classallocated to zth cluster

Table 4 explains that the CBO-IE generates maximumpurity index value for all IoT datasets )e CBO-IE obtains5 superior outputs than BO 7 superior outputs than

ALO 8 superior outputs than PSO 12 superior outputsthan GA and 15 superior outputs than K-Means in termsof purity index for entire healthcare IoT datasets

Figure 4 represents the better quality purity index ofproposed CBO-IE against other approaches K-Means GAPSO ALO and BO for all IoT datasets

524 F-Measure Firstly Precision (prsn) and Recall (rel)are evaluated to repossess the information by equations (16)and (17) After that both are combined to formalize the F-Measure (FM) by utilizing equations (18) and (19)

Table 3 Standard deviation for entire IoT datasets

Dataset K-Means GA PSO ALO BO CBO-IE)oracic surgery 0116857 003984 0012036 001452 0007856 0000524Breast cancer 0002578 0168742 012542 015713 0003657 0000746Cryotherapy 00025143 0165875 0001254 0001674 0001036 0000845Liver patient 0256323 017589 0147856 012312 0096875 0000986Heart patient 0212678 0096852 0042574 0021563 0003687 0000251Chronic kidney disease 0020233 0103625 009854 0065214 0006574 0000865Diabetic retinopathy 0086574 0186523 0142657 0102647 0086478 00008657Blood transfusion 0213658 0196875 0185632 0152461 0010365 0000236

0123456

Datasets

Average Ranking

Thor

acic

Sur

gery

Brea

t Can

cer

Cryo

ther

apy

Live

r Pat

ient

Hea

rt P

atie

nts

Chro

nic K

idne

y

Dia

betic

Bloo

d Tr

ansfu

sionIn

tra-c

luste

r Dist

ance

K-MeansGAPSO

ALOBOCBO-IE

Figure 2 Average ranking based on intracluster distance

0005

01015

02025

03

Stan

dard

Dev

iatio

n Standard Deviation

Datasets

Thor

acic

Sur

gery

Brea

st Ca

ncer

Cryo

ther

apy

Live

r Pat

ient

Hea

rt P

atie

nts

Chro

nic K

idne

y

Dia

betic

Bloo

d Tr

ansfu

sion

K-MeansGAPSO

ALOBOCBO-IE

Figure 3 Standard deviation for all IoT datasets

Table 4 Purity index for entire IoT datasets

Dataset K-Means GA PSO ALO BO CBO-IE)oracic surgery 083 086 089 089 091 093Breast cancer 086 088 090 089 093 095Cryotherapy 084 087 090 091 093 096Liver patient 086 087 088 086 090 092Heart patient 080 083 086 086 088 092Chronic kidneydisease 086 087 090 089 091 093

Diabetic retinopathy 088 090 091 091 093 095Blood transfusion 076 078 081 083 086 088

8 Scientific Programming

prsn(w z) Rwz

11138681113868111386811138681113868111386811138681113868

Rz

11138681113868111386811138681113868111386811138681113868

(16)

rel(w z) Rwz

11138681113868111386811138681113868111386811138681113868

Rw

11138681113868111386811138681113868111386811138681113868 (17)

FM(w z) 2 times prsn(w z) times rel(w z)

prsn(w z) + rel(w z) (18)

FM 1113944K

w1

Rw

11138681113868111386811138681113868111386811138681113868

|H|maximum FM(w z) (19)

where |Rw| is wth class lengthTable 5 describes that the CBO-IE obtains higher F-

Measure value for all IoTdatasets )e CBO-IE generates 4better outputs than BO 7 better outputs than ALO 8better outputs than PSO 13 better outputs than GA and16 better outputs than K-Means in terms of F-Measure forentire healthcare IoT datasets

Figure 5 shows the better F-Measure of proposed CBO-IE against other approaches K-Means GA PSO ALO andBO for all IoT datasets

525 Root Mean Square Error (RMSE) )e RMSE is de-fined as a divergence between predicted values and exper-imental (calculated) values )e best clustering is achievedwith minimum RMSE values of datasets It is evaluated byutilizing the following equation

RMSE

1|H|

1113944

|H|

j1Vj minus 1113954Vj1113872 1113873

2

11139741113972

(20)

where Vj is jth element value in dataset and 1113954Vj is j

th elementpredicted value in dataset

Table 6 explains that the CBO-IE generates least RMSEvalue for all IoT datasets )e CBO-IE generates 8 betterresults than BO 78 better results than ALO 81 better

results than PSO 87 better results than GA and 95 betterresults than K-Means in terms of RMSE for entire healthcareIoT datasets

Figure 6 shows the better RMSE of proposed CBO-IEagainst other approaches K-Means GA PSO ALO and BOfor all IoT datasets

526 Accuracy It evaluates the division of clusters that areaccurate (ie it evaluates proportion of verdicts that areexact) and describes the portion of clusters in the prevailinggroup

002040608

1

Purit

y In

dex

Purity Index

Datasets

Thor

acic

Sur

gery

Brea

t Can

cer

Cryo

ther

apy

Live

r Pat

ient

Hea

rt P

atie

nts

Chro

nic K

idne

y

Dia

betic

Bloo

d Tr

ansfu

sion

K-MeansGAPSO

ALOBOCBO-IE

Figure 4 Purity index for all IoT datasets

Table 5 F-Measure for entire IoT datasets

Dataset K-Means GA PSO ALO BO CBO-IE)oracic surgery 081 084 087 087 089 091Breast cancer 084 086 087 088 090 093Cryotherapy 082 086 088 087 091 093Liver patient 084 086 087 087 089 091Heart patient 078 082 084 083 085 088Chronic kidneydisease 084 086 088 087 089 091

Diabetic retinopathy 086 089 090 091 092 094Blood transfusion 074 077 079 082 085 087

002040608

1

DatasetsTh

orac

ic S

urge

ry

Brea

t Can

cer

Cryo

ther

apy

Live

r Pat

ient

Hea

rt P

atie

nts

Chro

nic K

idne

y

Dia

betic

Bloo

d

F-M

easu

re

F-Measure

K-MeansGAPSO

ALOBOCBO-IE

Figure 5 F-Measure for all IoT datasets

Table 6 RMSE values for entire IoT datasets

Dataset K-Means GA PSO ALO BO CBO-

IE)oracic surgery 0537 0327 0157 0124 00124 00034Breast cancer 0567 0386 0168 0264 00317 00075Cryotherapy 0621 0374 0234 0364 00421 00066Liver patient 0437 0361 0261 0213 00652 00054Heart patient 0537 0412 0201 0186 00145 00063Chronic kidneydisease 0610 0422 0341 0267 00254 00047

Diabeticretinopathy 0517 0414 0367 0257 00374 00087

Blood transfusion 0438 0314 0120 0265 00451 00071

Scientific Programming 9

accuracy 1113936

Kz1 Rz

|H|lowast 100 (21)

Table 7 represents that the CBO-IE obtains higher ac-curacy value for all IoT datasets )e CBO-IE has 3 su-perior outputs than BO 5 superior outputs than ALO 6superior outputs than PSO 9 superior outputs than GAand 13 superior outputs than K-Means in terms of ac-curacy for entire healthcare IoT datasets

Figure 7 represents the better accuracy of proposedCBO-IE against other approaches K-Means GA PSO ALOand BO for all IoT datasets

)e Information Entropy is used with BO approach forclustering to provide accurate and efficient distribution ofdata points in dataset )e chaos theory is utilized to ini-tialize the population of BO to improve the searching ca-pability of habitat which helps in cluster member selectionprocess more precisely )ese two strategies InformationEntropy and chaos theory have improved the performanceof BO to provide the best selection of cluster heads and theirmembers optimally So CBO-IE generates superior resultsthan BO ALO PSO GA and K-Means clusteringapproaches

53 Running Time Complexity of the CBO-IE Approach)e number of executions is directly related to timecomplexity of clustering approaches to run )e

complexity is calculated by utilizing few circumstances asfollows PS is population size L is number of elements indataset Lx represents dimensions of particulars (ele-ments) K is number of cluster centroids and Mi repre-sents maximum iterations )e cost of every execution isassumed as 1 unit )e Cost of all Executions (CE) isobtained to utilize Algorithm 3 by the followingequations

001020304050607

RMSE

Val

ues

RMSE Values

Datasets

Thor

acic

Sur

gery

Brea

st Ca

ncer

Cryo

ther

apy

Live

r Pat

ient

Hea

rt P

atie

nts

Chro

nic K

idne

y

Dia

betic

Bloo

d Tr

ansfu

sion

K-MeansGAPSO

ALOBOCBO-IE

Figure 6 RMSE for all IoT datasets

Table 7 Accuracy values for entire IoT datasets

Dataset K-Means GA PSO ALO BO CBO-IE)oracic surgery 080 083 086 086 088 090Breast cancer 083 085 086 087 089 092Cryotherapy 081 085 087 086 090 092Liver patient 083 085 086 086 088 090Heart patient 077 081 083 082 084 087Chronic kidney disease 083 085 087 086 088 090Diabetic retinopathy 085 088 089 090 091 093Blood transfusion 073 076 078 081 084 086

002040608

1

Accu

racy

()

Accuracy

Datasets

Thor

acic

Sur

gery

Brea

st Ca

ncer

Cryo

ther

apy

Live

r Pat

ient

Hea

rt P

atie

nts

Chro

nic K

idne

y

Dia

betic

Bloo

d Tr

ansfu

sion

K-MeansGAPSO

ALOBOCBO-IE

Figure 7 Accuracy for all IoT datasets

10 Scientific Programming

ALGORITHM 3 )e complete CBO-IE approach

Scientific Programming 11

CE 1 + L + 1 + PS

+ Mi + 1 + Mi times PS

+ Mi times PS

+ Mi times 2 times PS

times Lx + 3 times PS

+ 11113872 1113873

+ Mi times K times PS

+ 11113872 1113873 + Mi times K times PS

+ Mi times K times PS

+ Mi times K times PS

+ Mi times K times L + Mi times K times L + Mi times K times L

(22)

CE 2 times Mi times PS

times Lx + 4 times Mi times K times PS

+ 3 times Mi times K times L + 5 times Mi times PS

+ Mi times K + L + PS

+ 2 times Mi + 3 (23)

Supposing that all circumstances are equal in equation(23) in worst case equation (24) is generated as follows

CE 9n3

+ 6n2

+ 4n + 3 (24)

)e running time complexities of clustering approachesare O(n3) for CBO-IE O(n3) for BO O(n3) for ACO O(n3)for PSO O(n3) for ALO and O(n2) for K-Menas in worstcase Hence all are executable in polynomial time

54 Statistical Examination Measurement A statistical ex-amination is functioned to obtain the extension of signifi-cance dissimilarities in the effectiveness of clusteringtechniques Here a nonparametric Friedman Examination(FE) is utilized to discover dissimilarities amid the group ofserial appropriate variables )e entire clustering techniquesprovide equally effective explaining in the null hypothesis(Y0)

)e Friedman Examination (FE) is formulized by

FE 12 times N

DS

AC

AC

+ 11113872 11138731113944

5

θ1AKθ( 1113857

2minus

AC

times AC

+ 11113872 11138732

4⎡⎢⎢⎢⎢⎣ ⎤⎥⎥⎥⎥⎦ (25)

where NDS is number of datasets AKθ is average rank of θth

approach and AC is number of clustering approaches)e FE critical value is 204925 taken from F-distribution

table [41] with (AC-1) and (ACndash1)lowast(NDSndash1) freedom degreewhich is obtained between (6ndash1) 5 and (6-1)lowast(8-1) 35 forapplying 6 clustering approaches (AC 6) over 8 datasets(NDS 8) with λ 010 (confidence stage) )e calculated FEvalue is higher as compared to the critical value for nullhypothesis (Y0) rejection if Y0 is not accepted)e evaluatedFE value is 1714 for initiating 6 clustering approaches(AC 6) over 8 datasets (NDS 8) with λ 010 Hence thecalculated FE is higher as compared to the critical FE andthen Y0 is rejected So it is to be summarized that entireclustering approaches are not equally effective

)erefore a post hoc examination is performed usingHolm strategy)e proposed CBE-IE is analyzed statisticallyagainst other clustering approaches in this examinationFirstly the z value is obtained from equation (24) and after

that probability (p) is generated with the help of z value andnormal distribution table [42] At last pj value is analyzedwith (λ(AC minus j)) (Table 8)

Z AKj minus AKθ

δ (26)

where

δ

AC

AC

+ 11113872 1113873

6 times NDS

1113971

(27)

Table 8 illustrates that the (λ(AC minus j)) value is higher ascompared to pj value this indicates the hypothesis rejectionfor entire cases Hence the proposed CBO-IE is superior inclustering as compared to the K-Means GA PSO and BOapproaches according to the above analysis

6 Conclusion

Various fields like medicine education and industriesutilize data mining for their useful applications )egrouping of data is performed on data clustering which is aspecific task in data mining to examine the database effi-ciently In this work a Chaotic Biogeography-Based Opti-mization approach using Information Entropy (CBO-IE) isproposed to obtain data clusters for healthcare IoT datasets)e Information Entropy is introduced with a BO approachto generate a better distribution of data points in the datasetand chaos theory is utilized to initialize the population of BOapproach )e Information Entropy and chaos theory arecombined with BO approach to generate optimal clusterheads and cluster members with enhanced convergencespeed of BO in huge search area )eMATLAB 2021a tool isused to implement the CBO-IE for eight healthcare IoTdatasets and the outcomes describe the better quality effi-ciency of CBO-IE on the basis of F-Measure intraclusterdistance running time complexity purity index statisticalanalysis standard deviation root mean square error andaccuracy as compared to previous techniques of clusteringlike K-Means GA PSO ALO and BO approaches )eFriedman Examination and Holm strategy are introduced to

Table 8 Holm strategy outputs

J Approaches z value p value (λ(AC minus j)) Hypothesis1 K-Means minus63245 000001 0020 Rejected2 GA minus50596 000001 0025 Rejected3 PSO minus37947 00001 0033 Rejected4 ALO minus25298 00057 005 Rejected5 BO minus13649 00861 010 Rejected

12 Scientific Programming

perform statistical analysis of proposed CBO-IE againstprevious techniques of clustering like K-Means GA PSOALO and BO approaches which represent that the CBO-IEgenerates 90 accurate results In future the proposedtechnique will be anticipated to estimate and be authorizedwith huge databases under big data Additionally a cross-layer communication will be anticipated to be offered andlegalized in IoT structural design in the future

Data Availability

)e data that support the findings of this study are availableupon request from the corresponding author

Conflicts of Interest

)e authors declare that there are no conflicts of interest

References

[1] A Haque L Khan and M Baron ldquoSAND semi-supervisedadaptive novel class detection and classification over datastreamrdquo in Proceedings of the 13th AAAI Conference on Ar-tificial Intelligence (AAAI-16) pp 1652ndash1658 Phoenix AZUSA February 2016

[2] N Mishra H K Soni S Sharma and A K UpadhyayldquoDevelopment and analysis of artificial neural networkmodels for rainfall prediction by using time-series datardquo I JIntelligent Systems and Applications vol 1 pp 16ndash23 2018

[3] H N Dai R C W Wong H Wang Z Zheng andA V Vasilakos ldquoBig data analytics for large scale wirelessnetworks challenges and opportunitiesrdquo ACM vol 52pp 1ndash46 2019

[4] A H Sabry W Z W Hasan M N MohtarR M K R Ahmad and H R Harun ldquoPlanter pressurerepetability data analysis for health adult based on EMEDsystemrdquo Malaysian Journal of Fundamental and AppliedSciences vol 14 no 1 pp 96ndash101 2018

[5] A Mensikova and C A Mattmann ldquoEnsemble sentimentanalysis to identify human trafficking in web datardquo in Pro-ceedings of the ACM Workshop on Graph Techniques forAdversarial Activity Analytics (GTA) pp 1ndash6 ACM NewYork Ny USA 2018

[6] S S Gill and R Buyya ldquoBio-inspired algorithms for big dataanalytics a survey taxonomy and open challengesrdquo Big DataAnalytics for Intelligent Healthcare Management ElsevierAmsterdam Netherlands pp 1ndash17 2019

[7] A Verma I Kaur and N Arora ldquoComparative analysis ofinformatio extraction techniques for data miningrdquo IndianJournal of Science and Technology vol 9 no 11 pp 1ndash182016

[8] B R Manju A Joshuva and V Sugumaran ldquoA data miningstudy for condition monitoring on wind turbine blades usinghoeffding tree algorithm through statistical and histogramfeaturesrdquo International Journal of Mechanical Engineering ampTechnology vol 9 no 1 pp 1061ndash1079 2018

[9] I Aytekin E Eyduran K Karadas R Akshan and I KeskinldquoPrediction of fattening final live weight from some bodymeasurements and fattening period in young bulls ofcrossbred and exotic breeds using MARS data mining algo-rithmrdquo Journal of Zoology vol 50 no 1 pp 189ndash195 2018

[10] S Ahmad and R Varma ldquoInformation extraction from textmessages using data mining techniquesrdquo Malaya Journal ofMatematik vol 5 no 1 pp 26ndash29 2018

[11] N E Oweis M M Fouad S R Oweis S S Owais andV Snasel ldquoA novel mapreduce lift assiciation riule miningalgorithm (MRLAR) for big datardquo International Journal ofAdvanced Computer Science and Applications vol 7 no 8pp 1ndash7 2016

[12] C W Tsai C F Lai M C Chiang and L T Yang ldquoDatamining for internet of things a surveyrdquo CommunicationsSurveys and Tutorials IEEE vol 16 no 1 pp 77ndash97 2014

[13] F Chen P Deng J Wan D Zhang A V Vasilakos andX Rong ldquoData mining for Internet of things literature reviewand challengesrdquo International Journal of Distributed SensorNetworks vol 2015 Article ID 365372 14 pages 2015

[14] S B Patel M N Patel and S M Shah ldquoFEA-HUIM fast andefficient algorithm for high utility item-set mining using noveldata structure and pruning strategyrdquo National Conference onAdvanced Research Trends in Information and Technologies(NCARTICT) IJSSET vol 4 no 2 pp 1ndash7 2018

[15] S Bhaskaran R Marappan and B Santhi ldquoDesign andanalysis of a cluster-based intelligent hybrid recommendationsystem for E-learning applicationsrdquo Mathematics MDPIvol 9 no 107 pp 1ndash21 2021

[16] D Giordano M Mellia and T Cerquitelli ldquoK-MDTSCK-Multi-Dimensional time-series clustering algorithmrdquoElectronics MDPI vol 10 no 1166 pp 1ndash21 2021

[17] R Mothukuri and B B Rao ldquoCluster Analysis of cyber crimedata using Rrdquo International Journal of Computer Science andMobile Applications vol 6 no 2 pp 62ndash70 2018

[18] A N Onaizah ldquoA novel data stream clustering algorithm inhealthcare IoTrdquo International Journal of Research in AppliedNatural and Social Sciences (IJRANSS) vol 5 no 6 pp 51ndash602017

[19] R Krishnamurthi A Kumar D Gopinathan A Nayyar andB Qureshi ldquoAn overview of IoT sensor data processingfusion and analysis techniquesrdquo Sensors MDPI vol 20pp 1ndash23 6076

[20] F Alam R Mehmood I Katib and A Albeshri ldquoAnalysis ofeight data mining algorithms for smarter Internet of things(IoT)rdquo in Proceedings of the International Workshop on DataMining in IoT Systems (DaMIS) pp 437ndash442 ElsevierLondon UK September 2016

[21] A M C Souza and J R A Amazonas ldquoAn outlier detectalgorithm using big data processing and Internet of thingsarchitecturerdquo in Proceedings of the International Workshop onBig Data and Data Mining Challenges on IoT and PervasiveSystems (BigD2M) pp 1010ndash1015 Elsevier London UK June2015

[22] D Gil M Johnsson H Mora and J Szymanski ldquoReview ofthe complexity of managing big data of the Internet of thingsrdquoComplexity vol 2019 Article ID 4592902 12 pages 2019

[23] M S Mahdavinejad M Rezvan M Barekatain P AdibiP Barnaghi and A P Sheth ldquoMachine Learning for Internetof things data analysis a surveyrdquoDigital Communications andNetworks vol 4 pp 161ndash175 2018

[24] M Ruta F Scioscia G Loseto A Pinto and E D SciascioldquoMachine learning in the Internet of things a semantic-en-hanced approachrdquo Semantic Web vol 10 pp 1ndash21 2017

[25] K Demertzis K Rantos and G Drosatos ldquoA dynamic in-telligent policies analysis mechanism for personal data pro-cessing in the IoT ecosystemrdquo Big Data And CognitiveComputing vol 4 no 9 pp 1ndash16 2020

Scientific Programming 13

[26] N Joseph and B S Kumar ldquoTop-K competitor trust miningand customer behavior investigation using data miningtechniquerdquo Journal of Network Communications andEmerging Technologies (JNCET) vol 8 no 2 pp 26ndash30 2018

[27] M P Dooshima E N Chidozie B J Ademola O O Sekoniand I P Adebayo ldquoA predictive model for the risk of mentalillness in Nigeria using data miningrdquo International Journal ofImmonology vol 6 no 1 pp 5ndash16 2018

[28] B A Tama and S Lim ldquoA comparative performance eval-uation of classification algorithms for clinical decision sup-port systemsrdquo Mathematics MDPI vol 8 no 1814 pp 1ndash252020

[29] C C N Wang I S Chang P C Y Sheu and J J P TsaildquoApplication of semantic computing in cancer on secondarydata analysisrdquo in Proceedings of the 2nd International Con-ference on Robotic Computing IEEE pp 407ndash413 LagunaHills CA USA February 2018

[30] K Ravikumar and A R Kannan ldquoSpatial data mining forprediction of natural events and disaster management basedon fuzzy logic using hybrid PSOrdquo Taga Journal vol 14pp 858ndash878 2018

[31] B M M Alom and M Courtney ldquoldquoEducational data mininga case study perspectives from primary to university educa-tion in Australiardquo International Journal of InformationTechnology and Computer Science vol 2 pp 1ndash9 2018

[32] S Celik and O Yilmaz ldquoPrediction of body weight of Turkishtazi dogs using data mining techniques classification andregression tree (CART) and multivariate adaptive regressionsplines (MARS)rdquo Journal of Zoology vol 50 no 2pp 575ndash583 2018

[33] W Ugulino D Cardador K Vega E Velloso R Miliditi andH Fuks ldquoWearable computing accelerometers data classi-fication of body postures and movementsrdquo in Proceedings ofthe 21st Brazilian Symposium on Artificial Intelligence Ad-vances in Artificial Intelligence Lecture Notes in ComputerScience pp 52ndash61 Springer Stuttgart Germany September2012

[34] M B Chandak ldquoRole of big-data in classification and novalclass detection in data streamsrdquo Journal of Big Data Springervol 23 no 5 pp 1ndash9 2016

[35] M Gupta K K Gupta and P K Shukla ldquoSession key basedfast secure and lightweight image encryption algorithmrdquoMultimedia Tools and Applications vol 80 pp 10391ndash104162021

[36] H S Pannu D Singh and A K Malhi ldquoMulti-objectiveparticle swarm optimization-based adaptive neuro-fuzzy in-ference system for benzene monitoringrdquoNeural Computing ampApplications vol 31 pp 2195ndash2205 2019

[37] D Singh J Singh and A Chhabra ldquoHigh availability ofclouds failover strategies for cloud computing using inte-grated check pointing algorithmsrdquo in Proceedings of the 2012International Conference on Communication Systems andNetwork Technologies pp 698ndash703 Rajkot India May 2012

[38] D Singh and V Kumar ldquoA comprehensive review of com-putational dehazing techniquesrdquo Archives of ComputationalMethods in Engineering vol 26 pp 1395ndash1413 2019

[39] R Gupta ldquoPerformance analysis of anti-phishing tools andstudy of classification data mining algorithms for a novel anti-phishing systemrdquo International Journal of Computer Networkand Information Security vol 12 pp 70ndash77 2015

[40] R Bhatt P Maheshwary P Shukla P Shukla M Shrivastavaand S Changlani ldquoImplementation of fruit fly optimizationalgorithm (FFOA) to escalate the attacking efficiency of nodecapture attack in wireless sensor networks (WSN)rdquo Computer

Communications vol 149 pp 134ndash145 2020 ISSN 0140-3664

[41] F Distribution Table httpwwwsocruclaeduappletsdirf_tablehtml 2018

[42] Normal Distribution Table httpmatharizonaedusimrsimsma464standardnormaltablepdf

14 Scientific Programming

Page 6: CBO-IE:ADataMiningApproachforHealthcareIoTDataset ...

than ALO 94 superior results than PSO 96 superiorresults than GA and 98 superior results than K-Means interms of intracluster distance for entire healthcare IoTdatasets )e average ranking of all approaches is evaluatedon the basis of minimum to maximum intracluster distance(from 1 to 6)

Figure 2 represents the better rank of proposed CBO-IEwith least intracluster distance values against other approachesK-Means GA PSO ALO and BO for all IoT datasets

522 Standard Deviation )e stiff data clustering in theregion of average value is described by a numerical char-acteristic called standard deviation (DS) )e best clusteringis achieved with less standard deviation which is obtained bythe following equation

DS

1113936(V minus V)

|H|

1113971

(13)

where |H| is dataset length V is dataset values (points) andV is dataset average value

Table 3 describes that the CBO-IE obtains less standarddeviation value for all IoT datasets )e CBO-IE generates93 better quality outcomes than BO 95 better qualityoutcomes than PSO and ALO 97 better quality outcomesthan GA and 98 better quality outcomes than K-Means interms of standard deviation for entire healthcare IoTdatasets

Figure 3 shows the better standard deviation of proposedCBO-IE against other approaches K-Means GA PSO ALOand BO for all IoT datasets

Initialize Particulars and entropy of everyelements

Initialize the population with the help of chaosfunction

Calculate Fitness values of particulars

Iteration = 0

Perform the information entropy basedmigration process (Algorithm 2)

Calculate the fitness values of new particulars

iteration = iteration + 1

Is ending situationsatisfied

NoYes

Result the optimal solution

End

Start

Figure 1 Flowchart of the CBO-IE approach

6 Scientific Programming

523 Purity Index )e correctness of clustering strategy isknown as purity in which accurate classification is per-formed over data elements Hence entire points of anisolated class can be accurately allocated to an isolatedcluster Purity index (IP) is generated with the help of purityby equations (14) and (15) Maximum purity is achieved withmaximum IP value nearer to 1

purity Rz( 1113857 maximum Rwz

111386811138681113868111386811138681113868111386811138681113872 1113873

Rz

11138681113868111386811138681113868111386811138681113868

(14)

IP

1113944K

z1

Rz

11138681113868111386811138681113868111386811138681113868Purity Rz( 11138571113872 1113873

|H| (15)

ALGORITHM 2 Information Entropy-based migration process in the BO approach

Table 1 Healthcare IoT datasets

Sr no Healthcare IoT dataset No of instances No of attributes No of clusters1 )oracic surgery 470 17 22 Breast cancer 699 10 23 Cryotherapy 90 7 24 Liver patient 583 10 25 Heart patient 299 12 26 Chronic kidney disease 400 24 27 Diabetic retinopathy 1151 18 28 Blood transfusion 748 4 2

Table 2 Average ranking of all algorithms based on intracluster distance

Dataset K-Means GA PSO ALO BO CBO-IE)oracic surgery 3364 (5) 07569 (4) 04256 (4) 02547 (3) 00657 (2) 000652 (1)Breast cancer 26265 (5) 08634 (4) 03126 (4) 02614 (3) 00874 (2) 000728 (1)Cryotherapy 243640 (5) 04562 (4) 02864 (4) 02167 (3) 00725 (2) 000548 (1)Liver patient 353245 (5) 05863 (4) 01567 (4) 01321 (3) 00635 (2) 000457 (1)Heart patient 142366 (5) 09362 (4) 07265 (4) 06492 (3) 00623 (2) 000832 (1)Chronic kidney disease 463625 (5) 05682 (4) 03568 (4) 02864 (3) 00742 (2) 000786 (1)Diabetic retinopathy 03658 (5) 02526 (4) 01737 (4) 01421 (3) 00967 (2) 000758 (1)Blood transfusion 06745 (5) 05684 (4) 02375 (4) 02068 (3) 00854 (2) 000852 (1)Average ranking (AKθ) 5 4 4 3 2 1

Scientific Programming 7

where Purity(Rz) is purity of zth cluster |Rz| is zth clusterlength |Rwz| is the number of data elements of wth classallocated to zth cluster

Table 4 explains that the CBO-IE generates maximumpurity index value for all IoT datasets )e CBO-IE obtains5 superior outputs than BO 7 superior outputs than

ALO 8 superior outputs than PSO 12 superior outputsthan GA and 15 superior outputs than K-Means in termsof purity index for entire healthcare IoT datasets

Figure 4 represents the better quality purity index ofproposed CBO-IE against other approaches K-Means GAPSO ALO and BO for all IoT datasets

524 F-Measure Firstly Precision (prsn) and Recall (rel)are evaluated to repossess the information by equations (16)and (17) After that both are combined to formalize the F-Measure (FM) by utilizing equations (18) and (19)

Table 3 Standard deviation for entire IoT datasets

Dataset K-Means GA PSO ALO BO CBO-IE)oracic surgery 0116857 003984 0012036 001452 0007856 0000524Breast cancer 0002578 0168742 012542 015713 0003657 0000746Cryotherapy 00025143 0165875 0001254 0001674 0001036 0000845Liver patient 0256323 017589 0147856 012312 0096875 0000986Heart patient 0212678 0096852 0042574 0021563 0003687 0000251Chronic kidney disease 0020233 0103625 009854 0065214 0006574 0000865Diabetic retinopathy 0086574 0186523 0142657 0102647 0086478 00008657Blood transfusion 0213658 0196875 0185632 0152461 0010365 0000236

0123456

Datasets

Average Ranking

Thor

acic

Sur

gery

Brea

t Can

cer

Cryo

ther

apy

Live

r Pat

ient

Hea

rt P

atie

nts

Chro

nic K

idne

y

Dia

betic

Bloo

d Tr

ansfu

sionIn

tra-c

luste

r Dist

ance

K-MeansGAPSO

ALOBOCBO-IE

Figure 2 Average ranking based on intracluster distance

0005

01015

02025

03

Stan

dard

Dev

iatio

n Standard Deviation

Datasets

Thor

acic

Sur

gery

Brea

st Ca

ncer

Cryo

ther

apy

Live

r Pat

ient

Hea

rt P

atie

nts

Chro

nic K

idne

y

Dia

betic

Bloo

d Tr

ansfu

sion

K-MeansGAPSO

ALOBOCBO-IE

Figure 3 Standard deviation for all IoT datasets

Table 4 Purity index for entire IoT datasets

Dataset K-Means GA PSO ALO BO CBO-IE)oracic surgery 083 086 089 089 091 093Breast cancer 086 088 090 089 093 095Cryotherapy 084 087 090 091 093 096Liver patient 086 087 088 086 090 092Heart patient 080 083 086 086 088 092Chronic kidneydisease 086 087 090 089 091 093

Diabetic retinopathy 088 090 091 091 093 095Blood transfusion 076 078 081 083 086 088

8 Scientific Programming

prsn(w z) Rwz

11138681113868111386811138681113868111386811138681113868

Rz

11138681113868111386811138681113868111386811138681113868

(16)

rel(w z) Rwz

11138681113868111386811138681113868111386811138681113868

Rw

11138681113868111386811138681113868111386811138681113868 (17)

FM(w z) 2 times prsn(w z) times rel(w z)

prsn(w z) + rel(w z) (18)

FM 1113944K

w1

Rw

11138681113868111386811138681113868111386811138681113868

|H|maximum FM(w z) (19)

where |Rw| is wth class lengthTable 5 describes that the CBO-IE obtains higher F-

Measure value for all IoTdatasets )e CBO-IE generates 4better outputs than BO 7 better outputs than ALO 8better outputs than PSO 13 better outputs than GA and16 better outputs than K-Means in terms of F-Measure forentire healthcare IoT datasets

Figure 5 shows the better F-Measure of proposed CBO-IE against other approaches K-Means GA PSO ALO andBO for all IoT datasets

525 Root Mean Square Error (RMSE) )e RMSE is de-fined as a divergence between predicted values and exper-imental (calculated) values )e best clustering is achievedwith minimum RMSE values of datasets It is evaluated byutilizing the following equation

RMSE

1|H|

1113944

|H|

j1Vj minus 1113954Vj1113872 1113873

2

11139741113972

(20)

where Vj is jth element value in dataset and 1113954Vj is j

th elementpredicted value in dataset

Table 6 explains that the CBO-IE generates least RMSEvalue for all IoT datasets )e CBO-IE generates 8 betterresults than BO 78 better results than ALO 81 better

results than PSO 87 better results than GA and 95 betterresults than K-Means in terms of RMSE for entire healthcareIoT datasets

Figure 6 shows the better RMSE of proposed CBO-IEagainst other approaches K-Means GA PSO ALO and BOfor all IoT datasets

526 Accuracy It evaluates the division of clusters that areaccurate (ie it evaluates proportion of verdicts that areexact) and describes the portion of clusters in the prevailinggroup

002040608

1

Purit

y In

dex

Purity Index

Datasets

Thor

acic

Sur

gery

Brea

t Can

cer

Cryo

ther

apy

Live

r Pat

ient

Hea

rt P

atie

nts

Chro

nic K

idne

y

Dia

betic

Bloo

d Tr

ansfu

sion

K-MeansGAPSO

ALOBOCBO-IE

Figure 4 Purity index for all IoT datasets

Table 5 F-Measure for entire IoT datasets

Dataset K-Means GA PSO ALO BO CBO-IE)oracic surgery 081 084 087 087 089 091Breast cancer 084 086 087 088 090 093Cryotherapy 082 086 088 087 091 093Liver patient 084 086 087 087 089 091Heart patient 078 082 084 083 085 088Chronic kidneydisease 084 086 088 087 089 091

Diabetic retinopathy 086 089 090 091 092 094Blood transfusion 074 077 079 082 085 087

002040608

1

DatasetsTh

orac

ic S

urge

ry

Brea

t Can

cer

Cryo

ther

apy

Live

r Pat

ient

Hea

rt P

atie

nts

Chro

nic K

idne

y

Dia

betic

Bloo

d

F-M

easu

re

F-Measure

K-MeansGAPSO

ALOBOCBO-IE

Figure 5 F-Measure for all IoT datasets

Table 6 RMSE values for entire IoT datasets

Dataset K-Means GA PSO ALO BO CBO-

IE)oracic surgery 0537 0327 0157 0124 00124 00034Breast cancer 0567 0386 0168 0264 00317 00075Cryotherapy 0621 0374 0234 0364 00421 00066Liver patient 0437 0361 0261 0213 00652 00054Heart patient 0537 0412 0201 0186 00145 00063Chronic kidneydisease 0610 0422 0341 0267 00254 00047

Diabeticretinopathy 0517 0414 0367 0257 00374 00087

Blood transfusion 0438 0314 0120 0265 00451 00071

Scientific Programming 9

accuracy 1113936

Kz1 Rz

|H|lowast 100 (21)

Table 7 represents that the CBO-IE obtains higher ac-curacy value for all IoT datasets )e CBO-IE has 3 su-perior outputs than BO 5 superior outputs than ALO 6superior outputs than PSO 9 superior outputs than GAand 13 superior outputs than K-Means in terms of ac-curacy for entire healthcare IoT datasets

Figure 7 represents the better accuracy of proposedCBO-IE against other approaches K-Means GA PSO ALOand BO for all IoT datasets

)e Information Entropy is used with BO approach forclustering to provide accurate and efficient distribution ofdata points in dataset )e chaos theory is utilized to ini-tialize the population of BO to improve the searching ca-pability of habitat which helps in cluster member selectionprocess more precisely )ese two strategies InformationEntropy and chaos theory have improved the performanceof BO to provide the best selection of cluster heads and theirmembers optimally So CBO-IE generates superior resultsthan BO ALO PSO GA and K-Means clusteringapproaches

53 Running Time Complexity of the CBO-IE Approach)e number of executions is directly related to timecomplexity of clustering approaches to run )e

complexity is calculated by utilizing few circumstances asfollows PS is population size L is number of elements indataset Lx represents dimensions of particulars (ele-ments) K is number of cluster centroids and Mi repre-sents maximum iterations )e cost of every execution isassumed as 1 unit )e Cost of all Executions (CE) isobtained to utilize Algorithm 3 by the followingequations

001020304050607

RMSE

Val

ues

RMSE Values

Datasets

Thor

acic

Sur

gery

Brea

st Ca

ncer

Cryo

ther

apy

Live

r Pat

ient

Hea

rt P

atie

nts

Chro

nic K

idne

y

Dia

betic

Bloo

d Tr

ansfu

sion

K-MeansGAPSO

ALOBOCBO-IE

Figure 6 RMSE for all IoT datasets

Table 7 Accuracy values for entire IoT datasets

Dataset K-Means GA PSO ALO BO CBO-IE)oracic surgery 080 083 086 086 088 090Breast cancer 083 085 086 087 089 092Cryotherapy 081 085 087 086 090 092Liver patient 083 085 086 086 088 090Heart patient 077 081 083 082 084 087Chronic kidney disease 083 085 087 086 088 090Diabetic retinopathy 085 088 089 090 091 093Blood transfusion 073 076 078 081 084 086

002040608

1

Accu

racy

()

Accuracy

Datasets

Thor

acic

Sur

gery

Brea

st Ca

ncer

Cryo

ther

apy

Live

r Pat

ient

Hea

rt P

atie

nts

Chro

nic K

idne

y

Dia

betic

Bloo

d Tr

ansfu

sion

K-MeansGAPSO

ALOBOCBO-IE

Figure 7 Accuracy for all IoT datasets

10 Scientific Programming

ALGORITHM 3 )e complete CBO-IE approach

Scientific Programming 11

CE 1 + L + 1 + PS

+ Mi + 1 + Mi times PS

+ Mi times PS

+ Mi times 2 times PS

times Lx + 3 times PS

+ 11113872 1113873

+ Mi times K times PS

+ 11113872 1113873 + Mi times K times PS

+ Mi times K times PS

+ Mi times K times PS

+ Mi times K times L + Mi times K times L + Mi times K times L

(22)

CE 2 times Mi times PS

times Lx + 4 times Mi times K times PS

+ 3 times Mi times K times L + 5 times Mi times PS

+ Mi times K + L + PS

+ 2 times Mi + 3 (23)

Supposing that all circumstances are equal in equation(23) in worst case equation (24) is generated as follows

CE 9n3

+ 6n2

+ 4n + 3 (24)

)e running time complexities of clustering approachesare O(n3) for CBO-IE O(n3) for BO O(n3) for ACO O(n3)for PSO O(n3) for ALO and O(n2) for K-Menas in worstcase Hence all are executable in polynomial time

54 Statistical Examination Measurement A statistical ex-amination is functioned to obtain the extension of signifi-cance dissimilarities in the effectiveness of clusteringtechniques Here a nonparametric Friedman Examination(FE) is utilized to discover dissimilarities amid the group ofserial appropriate variables )e entire clustering techniquesprovide equally effective explaining in the null hypothesis(Y0)

)e Friedman Examination (FE) is formulized by

FE 12 times N

DS

AC

AC

+ 11113872 11138731113944

5

θ1AKθ( 1113857

2minus

AC

times AC

+ 11113872 11138732

4⎡⎢⎢⎢⎢⎣ ⎤⎥⎥⎥⎥⎦ (25)

where NDS is number of datasets AKθ is average rank of θth

approach and AC is number of clustering approaches)e FE critical value is 204925 taken from F-distribution

table [41] with (AC-1) and (ACndash1)lowast(NDSndash1) freedom degreewhich is obtained between (6ndash1) 5 and (6-1)lowast(8-1) 35 forapplying 6 clustering approaches (AC 6) over 8 datasets(NDS 8) with λ 010 (confidence stage) )e calculated FEvalue is higher as compared to the critical value for nullhypothesis (Y0) rejection if Y0 is not accepted)e evaluatedFE value is 1714 for initiating 6 clustering approaches(AC 6) over 8 datasets (NDS 8) with λ 010 Hence thecalculated FE is higher as compared to the critical FE andthen Y0 is rejected So it is to be summarized that entireclustering approaches are not equally effective

)erefore a post hoc examination is performed usingHolm strategy)e proposed CBE-IE is analyzed statisticallyagainst other clustering approaches in this examinationFirstly the z value is obtained from equation (24) and after

that probability (p) is generated with the help of z value andnormal distribution table [42] At last pj value is analyzedwith (λ(AC minus j)) (Table 8)

Z AKj minus AKθ

δ (26)

where

δ

AC

AC

+ 11113872 1113873

6 times NDS

1113971

(27)

Table 8 illustrates that the (λ(AC minus j)) value is higher ascompared to pj value this indicates the hypothesis rejectionfor entire cases Hence the proposed CBO-IE is superior inclustering as compared to the K-Means GA PSO and BOapproaches according to the above analysis

6 Conclusion

Various fields like medicine education and industriesutilize data mining for their useful applications )egrouping of data is performed on data clustering which is aspecific task in data mining to examine the database effi-ciently In this work a Chaotic Biogeography-Based Opti-mization approach using Information Entropy (CBO-IE) isproposed to obtain data clusters for healthcare IoT datasets)e Information Entropy is introduced with a BO approachto generate a better distribution of data points in the datasetand chaos theory is utilized to initialize the population of BOapproach )e Information Entropy and chaos theory arecombined with BO approach to generate optimal clusterheads and cluster members with enhanced convergencespeed of BO in huge search area )eMATLAB 2021a tool isused to implement the CBO-IE for eight healthcare IoTdatasets and the outcomes describe the better quality effi-ciency of CBO-IE on the basis of F-Measure intraclusterdistance running time complexity purity index statisticalanalysis standard deviation root mean square error andaccuracy as compared to previous techniques of clusteringlike K-Means GA PSO ALO and BO approaches )eFriedman Examination and Holm strategy are introduced to

Table 8 Holm strategy outputs

J Approaches z value p value (λ(AC minus j)) Hypothesis1 K-Means minus63245 000001 0020 Rejected2 GA minus50596 000001 0025 Rejected3 PSO minus37947 00001 0033 Rejected4 ALO minus25298 00057 005 Rejected5 BO minus13649 00861 010 Rejected

12 Scientific Programming

perform statistical analysis of proposed CBO-IE againstprevious techniques of clustering like K-Means GA PSOALO and BO approaches which represent that the CBO-IEgenerates 90 accurate results In future the proposedtechnique will be anticipated to estimate and be authorizedwith huge databases under big data Additionally a cross-layer communication will be anticipated to be offered andlegalized in IoT structural design in the future

Data Availability

)e data that support the findings of this study are availableupon request from the corresponding author

Conflicts of Interest

)e authors declare that there are no conflicts of interest

References

[1] A Haque L Khan and M Baron ldquoSAND semi-supervisedadaptive novel class detection and classification over datastreamrdquo in Proceedings of the 13th AAAI Conference on Ar-tificial Intelligence (AAAI-16) pp 1652ndash1658 Phoenix AZUSA February 2016

[2] N Mishra H K Soni S Sharma and A K UpadhyayldquoDevelopment and analysis of artificial neural networkmodels for rainfall prediction by using time-series datardquo I JIntelligent Systems and Applications vol 1 pp 16ndash23 2018

[3] H N Dai R C W Wong H Wang Z Zheng andA V Vasilakos ldquoBig data analytics for large scale wirelessnetworks challenges and opportunitiesrdquo ACM vol 52pp 1ndash46 2019

[4] A H Sabry W Z W Hasan M N MohtarR M K R Ahmad and H R Harun ldquoPlanter pressurerepetability data analysis for health adult based on EMEDsystemrdquo Malaysian Journal of Fundamental and AppliedSciences vol 14 no 1 pp 96ndash101 2018

[5] A Mensikova and C A Mattmann ldquoEnsemble sentimentanalysis to identify human trafficking in web datardquo in Pro-ceedings of the ACM Workshop on Graph Techniques forAdversarial Activity Analytics (GTA) pp 1ndash6 ACM NewYork Ny USA 2018

[6] S S Gill and R Buyya ldquoBio-inspired algorithms for big dataanalytics a survey taxonomy and open challengesrdquo Big DataAnalytics for Intelligent Healthcare Management ElsevierAmsterdam Netherlands pp 1ndash17 2019

[7] A Verma I Kaur and N Arora ldquoComparative analysis ofinformatio extraction techniques for data miningrdquo IndianJournal of Science and Technology vol 9 no 11 pp 1ndash182016

[8] B R Manju A Joshuva and V Sugumaran ldquoA data miningstudy for condition monitoring on wind turbine blades usinghoeffding tree algorithm through statistical and histogramfeaturesrdquo International Journal of Mechanical Engineering ampTechnology vol 9 no 1 pp 1061ndash1079 2018

[9] I Aytekin E Eyduran K Karadas R Akshan and I KeskinldquoPrediction of fattening final live weight from some bodymeasurements and fattening period in young bulls ofcrossbred and exotic breeds using MARS data mining algo-rithmrdquo Journal of Zoology vol 50 no 1 pp 189ndash195 2018

[10] S Ahmad and R Varma ldquoInformation extraction from textmessages using data mining techniquesrdquo Malaya Journal ofMatematik vol 5 no 1 pp 26ndash29 2018

[11] N E Oweis M M Fouad S R Oweis S S Owais andV Snasel ldquoA novel mapreduce lift assiciation riule miningalgorithm (MRLAR) for big datardquo International Journal ofAdvanced Computer Science and Applications vol 7 no 8pp 1ndash7 2016

[12] C W Tsai C F Lai M C Chiang and L T Yang ldquoDatamining for internet of things a surveyrdquo CommunicationsSurveys and Tutorials IEEE vol 16 no 1 pp 77ndash97 2014

[13] F Chen P Deng J Wan D Zhang A V Vasilakos andX Rong ldquoData mining for Internet of things literature reviewand challengesrdquo International Journal of Distributed SensorNetworks vol 2015 Article ID 365372 14 pages 2015

[14] S B Patel M N Patel and S M Shah ldquoFEA-HUIM fast andefficient algorithm for high utility item-set mining using noveldata structure and pruning strategyrdquo National Conference onAdvanced Research Trends in Information and Technologies(NCARTICT) IJSSET vol 4 no 2 pp 1ndash7 2018

[15] S Bhaskaran R Marappan and B Santhi ldquoDesign andanalysis of a cluster-based intelligent hybrid recommendationsystem for E-learning applicationsrdquo Mathematics MDPIvol 9 no 107 pp 1ndash21 2021

[16] D Giordano M Mellia and T Cerquitelli ldquoK-MDTSCK-Multi-Dimensional time-series clustering algorithmrdquoElectronics MDPI vol 10 no 1166 pp 1ndash21 2021

[17] R Mothukuri and B B Rao ldquoCluster Analysis of cyber crimedata using Rrdquo International Journal of Computer Science andMobile Applications vol 6 no 2 pp 62ndash70 2018

[18] A N Onaizah ldquoA novel data stream clustering algorithm inhealthcare IoTrdquo International Journal of Research in AppliedNatural and Social Sciences (IJRANSS) vol 5 no 6 pp 51ndash602017

[19] R Krishnamurthi A Kumar D Gopinathan A Nayyar andB Qureshi ldquoAn overview of IoT sensor data processingfusion and analysis techniquesrdquo Sensors MDPI vol 20pp 1ndash23 6076

[20] F Alam R Mehmood I Katib and A Albeshri ldquoAnalysis ofeight data mining algorithms for smarter Internet of things(IoT)rdquo in Proceedings of the International Workshop on DataMining in IoT Systems (DaMIS) pp 437ndash442 ElsevierLondon UK September 2016

[21] A M C Souza and J R A Amazonas ldquoAn outlier detectalgorithm using big data processing and Internet of thingsarchitecturerdquo in Proceedings of the International Workshop onBig Data and Data Mining Challenges on IoT and PervasiveSystems (BigD2M) pp 1010ndash1015 Elsevier London UK June2015

[22] D Gil M Johnsson H Mora and J Szymanski ldquoReview ofthe complexity of managing big data of the Internet of thingsrdquoComplexity vol 2019 Article ID 4592902 12 pages 2019

[23] M S Mahdavinejad M Rezvan M Barekatain P AdibiP Barnaghi and A P Sheth ldquoMachine Learning for Internetof things data analysis a surveyrdquoDigital Communications andNetworks vol 4 pp 161ndash175 2018

[24] M Ruta F Scioscia G Loseto A Pinto and E D SciascioldquoMachine learning in the Internet of things a semantic-en-hanced approachrdquo Semantic Web vol 10 pp 1ndash21 2017

[25] K Demertzis K Rantos and G Drosatos ldquoA dynamic in-telligent policies analysis mechanism for personal data pro-cessing in the IoT ecosystemrdquo Big Data And CognitiveComputing vol 4 no 9 pp 1ndash16 2020

Scientific Programming 13

[26] N Joseph and B S Kumar ldquoTop-K competitor trust miningand customer behavior investigation using data miningtechniquerdquo Journal of Network Communications andEmerging Technologies (JNCET) vol 8 no 2 pp 26ndash30 2018

[27] M P Dooshima E N Chidozie B J Ademola O O Sekoniand I P Adebayo ldquoA predictive model for the risk of mentalillness in Nigeria using data miningrdquo International Journal ofImmonology vol 6 no 1 pp 5ndash16 2018

[28] B A Tama and S Lim ldquoA comparative performance eval-uation of classification algorithms for clinical decision sup-port systemsrdquo Mathematics MDPI vol 8 no 1814 pp 1ndash252020

[29] C C N Wang I S Chang P C Y Sheu and J J P TsaildquoApplication of semantic computing in cancer on secondarydata analysisrdquo in Proceedings of the 2nd International Con-ference on Robotic Computing IEEE pp 407ndash413 LagunaHills CA USA February 2018

[30] K Ravikumar and A R Kannan ldquoSpatial data mining forprediction of natural events and disaster management basedon fuzzy logic using hybrid PSOrdquo Taga Journal vol 14pp 858ndash878 2018

[31] B M M Alom and M Courtney ldquoldquoEducational data mininga case study perspectives from primary to university educa-tion in Australiardquo International Journal of InformationTechnology and Computer Science vol 2 pp 1ndash9 2018

[32] S Celik and O Yilmaz ldquoPrediction of body weight of Turkishtazi dogs using data mining techniques classification andregression tree (CART) and multivariate adaptive regressionsplines (MARS)rdquo Journal of Zoology vol 50 no 2pp 575ndash583 2018

[33] W Ugulino D Cardador K Vega E Velloso R Miliditi andH Fuks ldquoWearable computing accelerometers data classi-fication of body postures and movementsrdquo in Proceedings ofthe 21st Brazilian Symposium on Artificial Intelligence Ad-vances in Artificial Intelligence Lecture Notes in ComputerScience pp 52ndash61 Springer Stuttgart Germany September2012

[34] M B Chandak ldquoRole of big-data in classification and novalclass detection in data streamsrdquo Journal of Big Data Springervol 23 no 5 pp 1ndash9 2016

[35] M Gupta K K Gupta and P K Shukla ldquoSession key basedfast secure and lightweight image encryption algorithmrdquoMultimedia Tools and Applications vol 80 pp 10391ndash104162021

[36] H S Pannu D Singh and A K Malhi ldquoMulti-objectiveparticle swarm optimization-based adaptive neuro-fuzzy in-ference system for benzene monitoringrdquoNeural Computing ampApplications vol 31 pp 2195ndash2205 2019

[37] D Singh J Singh and A Chhabra ldquoHigh availability ofclouds failover strategies for cloud computing using inte-grated check pointing algorithmsrdquo in Proceedings of the 2012International Conference on Communication Systems andNetwork Technologies pp 698ndash703 Rajkot India May 2012

[38] D Singh and V Kumar ldquoA comprehensive review of com-putational dehazing techniquesrdquo Archives of ComputationalMethods in Engineering vol 26 pp 1395ndash1413 2019

[39] R Gupta ldquoPerformance analysis of anti-phishing tools andstudy of classification data mining algorithms for a novel anti-phishing systemrdquo International Journal of Computer Networkand Information Security vol 12 pp 70ndash77 2015

[40] R Bhatt P Maheshwary P Shukla P Shukla M Shrivastavaand S Changlani ldquoImplementation of fruit fly optimizationalgorithm (FFOA) to escalate the attacking efficiency of nodecapture attack in wireless sensor networks (WSN)rdquo Computer

Communications vol 149 pp 134ndash145 2020 ISSN 0140-3664

[41] F Distribution Table httpwwwsocruclaeduappletsdirf_tablehtml 2018

[42] Normal Distribution Table httpmatharizonaedusimrsimsma464standardnormaltablepdf

14 Scientific Programming

Page 7: CBO-IE:ADataMiningApproachforHealthcareIoTDataset ...

523 Purity Index )e correctness of clustering strategy isknown as purity in which accurate classification is per-formed over data elements Hence entire points of anisolated class can be accurately allocated to an isolatedcluster Purity index (IP) is generated with the help of purityby equations (14) and (15) Maximum purity is achieved withmaximum IP value nearer to 1

purity Rz( 1113857 maximum Rwz

111386811138681113868111386811138681113868111386811138681113872 1113873

Rz

11138681113868111386811138681113868111386811138681113868

(14)

IP

1113944K

z1

Rz

11138681113868111386811138681113868111386811138681113868Purity Rz( 11138571113872 1113873

|H| (15)

ALGORITHM 2 Information Entropy-based migration process in the BO approach

Table 1 Healthcare IoT datasets

Sr no Healthcare IoT dataset No of instances No of attributes No of clusters1 )oracic surgery 470 17 22 Breast cancer 699 10 23 Cryotherapy 90 7 24 Liver patient 583 10 25 Heart patient 299 12 26 Chronic kidney disease 400 24 27 Diabetic retinopathy 1151 18 28 Blood transfusion 748 4 2

Table 2 Average ranking of all algorithms based on intracluster distance

Dataset K-Means GA PSO ALO BO CBO-IE)oracic surgery 3364 (5) 07569 (4) 04256 (4) 02547 (3) 00657 (2) 000652 (1)Breast cancer 26265 (5) 08634 (4) 03126 (4) 02614 (3) 00874 (2) 000728 (1)Cryotherapy 243640 (5) 04562 (4) 02864 (4) 02167 (3) 00725 (2) 000548 (1)Liver patient 353245 (5) 05863 (4) 01567 (4) 01321 (3) 00635 (2) 000457 (1)Heart patient 142366 (5) 09362 (4) 07265 (4) 06492 (3) 00623 (2) 000832 (1)Chronic kidney disease 463625 (5) 05682 (4) 03568 (4) 02864 (3) 00742 (2) 000786 (1)Diabetic retinopathy 03658 (5) 02526 (4) 01737 (4) 01421 (3) 00967 (2) 000758 (1)Blood transfusion 06745 (5) 05684 (4) 02375 (4) 02068 (3) 00854 (2) 000852 (1)Average ranking (AKθ) 5 4 4 3 2 1

Scientific Programming 7

where Purity(Rz) is purity of zth cluster |Rz| is zth clusterlength |Rwz| is the number of data elements of wth classallocated to zth cluster

Table 4 explains that the CBO-IE generates maximumpurity index value for all IoT datasets )e CBO-IE obtains5 superior outputs than BO 7 superior outputs than

ALO 8 superior outputs than PSO 12 superior outputsthan GA and 15 superior outputs than K-Means in termsof purity index for entire healthcare IoT datasets

Figure 4 represents the better quality purity index ofproposed CBO-IE against other approaches K-Means GAPSO ALO and BO for all IoT datasets

524 F-Measure Firstly Precision (prsn) and Recall (rel)are evaluated to repossess the information by equations (16)and (17) After that both are combined to formalize the F-Measure (FM) by utilizing equations (18) and (19)

Table 3 Standard deviation for entire IoT datasets

Dataset K-Means GA PSO ALO BO CBO-IE)oracic surgery 0116857 003984 0012036 001452 0007856 0000524Breast cancer 0002578 0168742 012542 015713 0003657 0000746Cryotherapy 00025143 0165875 0001254 0001674 0001036 0000845Liver patient 0256323 017589 0147856 012312 0096875 0000986Heart patient 0212678 0096852 0042574 0021563 0003687 0000251Chronic kidney disease 0020233 0103625 009854 0065214 0006574 0000865Diabetic retinopathy 0086574 0186523 0142657 0102647 0086478 00008657Blood transfusion 0213658 0196875 0185632 0152461 0010365 0000236

0123456

Datasets

Average Ranking

Thor

acic

Sur

gery

Brea

t Can

cer

Cryo

ther

apy

Live

r Pat

ient

Hea

rt P

atie

nts

Chro

nic K

idne

y

Dia

betic

Bloo

d Tr

ansfu

sionIn

tra-c

luste

r Dist

ance

K-MeansGAPSO

ALOBOCBO-IE

Figure 2 Average ranking based on intracluster distance

0005

01015

02025

03

Stan

dard

Dev

iatio

n Standard Deviation

Datasets

Thor

acic

Sur

gery

Brea

st Ca

ncer

Cryo

ther

apy

Live

r Pat

ient

Hea

rt P

atie

nts

Chro

nic K

idne

y

Dia

betic

Bloo

d Tr

ansfu

sion

K-MeansGAPSO

ALOBOCBO-IE

Figure 3 Standard deviation for all IoT datasets

Table 4 Purity index for entire IoT datasets

Dataset K-Means GA PSO ALO BO CBO-IE)oracic surgery 083 086 089 089 091 093Breast cancer 086 088 090 089 093 095Cryotherapy 084 087 090 091 093 096Liver patient 086 087 088 086 090 092Heart patient 080 083 086 086 088 092Chronic kidneydisease 086 087 090 089 091 093

Diabetic retinopathy 088 090 091 091 093 095Blood transfusion 076 078 081 083 086 088

8 Scientific Programming

prsn(w z) Rwz

11138681113868111386811138681113868111386811138681113868

Rz

11138681113868111386811138681113868111386811138681113868

(16)

rel(w z) Rwz

11138681113868111386811138681113868111386811138681113868

Rw

11138681113868111386811138681113868111386811138681113868 (17)

FM(w z) 2 times prsn(w z) times rel(w z)

prsn(w z) + rel(w z) (18)

FM 1113944K

w1

Rw

11138681113868111386811138681113868111386811138681113868

|H|maximum FM(w z) (19)

where |Rw| is wth class lengthTable 5 describes that the CBO-IE obtains higher F-

Measure value for all IoTdatasets )e CBO-IE generates 4better outputs than BO 7 better outputs than ALO 8better outputs than PSO 13 better outputs than GA and16 better outputs than K-Means in terms of F-Measure forentire healthcare IoT datasets

Figure 5 shows the better F-Measure of proposed CBO-IE against other approaches K-Means GA PSO ALO andBO for all IoT datasets

525 Root Mean Square Error (RMSE) )e RMSE is de-fined as a divergence between predicted values and exper-imental (calculated) values )e best clustering is achievedwith minimum RMSE values of datasets It is evaluated byutilizing the following equation

RMSE

1|H|

1113944

|H|

j1Vj minus 1113954Vj1113872 1113873

2

11139741113972

(20)

where Vj is jth element value in dataset and 1113954Vj is j

th elementpredicted value in dataset

Table 6 explains that the CBO-IE generates least RMSEvalue for all IoT datasets )e CBO-IE generates 8 betterresults than BO 78 better results than ALO 81 better

results than PSO 87 better results than GA and 95 betterresults than K-Means in terms of RMSE for entire healthcareIoT datasets

Figure 6 shows the better RMSE of proposed CBO-IEagainst other approaches K-Means GA PSO ALO and BOfor all IoT datasets

526 Accuracy It evaluates the division of clusters that areaccurate (ie it evaluates proportion of verdicts that areexact) and describes the portion of clusters in the prevailinggroup

002040608

1

Purit

y In

dex

Purity Index

Datasets

Thor

acic

Sur

gery

Brea

t Can

cer

Cryo

ther

apy

Live

r Pat

ient

Hea

rt P

atie

nts

Chro

nic K

idne

y

Dia

betic

Bloo

d Tr

ansfu

sion

K-MeansGAPSO

ALOBOCBO-IE

Figure 4 Purity index for all IoT datasets

Table 5 F-Measure for entire IoT datasets

Dataset K-Means GA PSO ALO BO CBO-IE)oracic surgery 081 084 087 087 089 091Breast cancer 084 086 087 088 090 093Cryotherapy 082 086 088 087 091 093Liver patient 084 086 087 087 089 091Heart patient 078 082 084 083 085 088Chronic kidneydisease 084 086 088 087 089 091

Diabetic retinopathy 086 089 090 091 092 094Blood transfusion 074 077 079 082 085 087

002040608

1

DatasetsTh

orac

ic S

urge

ry

Brea

t Can

cer

Cryo

ther

apy

Live

r Pat

ient

Hea

rt P

atie

nts

Chro

nic K

idne

y

Dia

betic

Bloo

d

F-M

easu

re

F-Measure

K-MeansGAPSO

ALOBOCBO-IE

Figure 5 F-Measure for all IoT datasets

Table 6 RMSE values for entire IoT datasets

Dataset K-Means GA PSO ALO BO CBO-

IE)oracic surgery 0537 0327 0157 0124 00124 00034Breast cancer 0567 0386 0168 0264 00317 00075Cryotherapy 0621 0374 0234 0364 00421 00066Liver patient 0437 0361 0261 0213 00652 00054Heart patient 0537 0412 0201 0186 00145 00063Chronic kidneydisease 0610 0422 0341 0267 00254 00047

Diabeticretinopathy 0517 0414 0367 0257 00374 00087

Blood transfusion 0438 0314 0120 0265 00451 00071

Scientific Programming 9

accuracy 1113936

Kz1 Rz

|H|lowast 100 (21)

Table 7 represents that the CBO-IE obtains higher ac-curacy value for all IoT datasets )e CBO-IE has 3 su-perior outputs than BO 5 superior outputs than ALO 6superior outputs than PSO 9 superior outputs than GAand 13 superior outputs than K-Means in terms of ac-curacy for entire healthcare IoT datasets

Figure 7 represents the better accuracy of proposedCBO-IE against other approaches K-Means GA PSO ALOand BO for all IoT datasets

)e Information Entropy is used with BO approach forclustering to provide accurate and efficient distribution ofdata points in dataset )e chaos theory is utilized to ini-tialize the population of BO to improve the searching ca-pability of habitat which helps in cluster member selectionprocess more precisely )ese two strategies InformationEntropy and chaos theory have improved the performanceof BO to provide the best selection of cluster heads and theirmembers optimally So CBO-IE generates superior resultsthan BO ALO PSO GA and K-Means clusteringapproaches

53 Running Time Complexity of the CBO-IE Approach)e number of executions is directly related to timecomplexity of clustering approaches to run )e

complexity is calculated by utilizing few circumstances asfollows PS is population size L is number of elements indataset Lx represents dimensions of particulars (ele-ments) K is number of cluster centroids and Mi repre-sents maximum iterations )e cost of every execution isassumed as 1 unit )e Cost of all Executions (CE) isobtained to utilize Algorithm 3 by the followingequations

001020304050607

RMSE

Val

ues

RMSE Values

Datasets

Thor

acic

Sur

gery

Brea

st Ca

ncer

Cryo

ther

apy

Live

r Pat

ient

Hea

rt P

atie

nts

Chro

nic K

idne

y

Dia

betic

Bloo

d Tr

ansfu

sion

K-MeansGAPSO

ALOBOCBO-IE

Figure 6 RMSE for all IoT datasets

Table 7 Accuracy values for entire IoT datasets

Dataset K-Means GA PSO ALO BO CBO-IE)oracic surgery 080 083 086 086 088 090Breast cancer 083 085 086 087 089 092Cryotherapy 081 085 087 086 090 092Liver patient 083 085 086 086 088 090Heart patient 077 081 083 082 084 087Chronic kidney disease 083 085 087 086 088 090Diabetic retinopathy 085 088 089 090 091 093Blood transfusion 073 076 078 081 084 086

002040608

1

Accu

racy

()

Accuracy

Datasets

Thor

acic

Sur

gery

Brea

st Ca

ncer

Cryo

ther

apy

Live

r Pat

ient

Hea

rt P

atie

nts

Chro

nic K

idne

y

Dia

betic

Bloo

d Tr

ansfu

sion

K-MeansGAPSO

ALOBOCBO-IE

Figure 7 Accuracy for all IoT datasets

10 Scientific Programming

ALGORITHM 3 )e complete CBO-IE approach

Scientific Programming 11

CE 1 + L + 1 + PS

+ Mi + 1 + Mi times PS

+ Mi times PS

+ Mi times 2 times PS

times Lx + 3 times PS

+ 11113872 1113873

+ Mi times K times PS

+ 11113872 1113873 + Mi times K times PS

+ Mi times K times PS

+ Mi times K times PS

+ Mi times K times L + Mi times K times L + Mi times K times L

(22)

CE 2 times Mi times PS

times Lx + 4 times Mi times K times PS

+ 3 times Mi times K times L + 5 times Mi times PS

+ Mi times K + L + PS

+ 2 times Mi + 3 (23)

Supposing that all circumstances are equal in equation(23) in worst case equation (24) is generated as follows

CE 9n3

+ 6n2

+ 4n + 3 (24)

)e running time complexities of clustering approachesare O(n3) for CBO-IE O(n3) for BO O(n3) for ACO O(n3)for PSO O(n3) for ALO and O(n2) for K-Menas in worstcase Hence all are executable in polynomial time

54 Statistical Examination Measurement A statistical ex-amination is functioned to obtain the extension of signifi-cance dissimilarities in the effectiveness of clusteringtechniques Here a nonparametric Friedman Examination(FE) is utilized to discover dissimilarities amid the group ofserial appropriate variables )e entire clustering techniquesprovide equally effective explaining in the null hypothesis(Y0)

)e Friedman Examination (FE) is formulized by

FE 12 times N

DS

AC

AC

+ 11113872 11138731113944

5

θ1AKθ( 1113857

2minus

AC

times AC

+ 11113872 11138732

4⎡⎢⎢⎢⎢⎣ ⎤⎥⎥⎥⎥⎦ (25)

where NDS is number of datasets AKθ is average rank of θth

approach and AC is number of clustering approaches)e FE critical value is 204925 taken from F-distribution

table [41] with (AC-1) and (ACndash1)lowast(NDSndash1) freedom degreewhich is obtained between (6ndash1) 5 and (6-1)lowast(8-1) 35 forapplying 6 clustering approaches (AC 6) over 8 datasets(NDS 8) with λ 010 (confidence stage) )e calculated FEvalue is higher as compared to the critical value for nullhypothesis (Y0) rejection if Y0 is not accepted)e evaluatedFE value is 1714 for initiating 6 clustering approaches(AC 6) over 8 datasets (NDS 8) with λ 010 Hence thecalculated FE is higher as compared to the critical FE andthen Y0 is rejected So it is to be summarized that entireclustering approaches are not equally effective

)erefore a post hoc examination is performed usingHolm strategy)e proposed CBE-IE is analyzed statisticallyagainst other clustering approaches in this examinationFirstly the z value is obtained from equation (24) and after

that probability (p) is generated with the help of z value andnormal distribution table [42] At last pj value is analyzedwith (λ(AC minus j)) (Table 8)

Z AKj minus AKθ

δ (26)

where

δ

AC

AC

+ 11113872 1113873

6 times NDS

1113971

(27)

Table 8 illustrates that the (λ(AC minus j)) value is higher ascompared to pj value this indicates the hypothesis rejectionfor entire cases Hence the proposed CBO-IE is superior inclustering as compared to the K-Means GA PSO and BOapproaches according to the above analysis

6 Conclusion

Various fields like medicine education and industriesutilize data mining for their useful applications )egrouping of data is performed on data clustering which is aspecific task in data mining to examine the database effi-ciently In this work a Chaotic Biogeography-Based Opti-mization approach using Information Entropy (CBO-IE) isproposed to obtain data clusters for healthcare IoT datasets)e Information Entropy is introduced with a BO approachto generate a better distribution of data points in the datasetand chaos theory is utilized to initialize the population of BOapproach )e Information Entropy and chaos theory arecombined with BO approach to generate optimal clusterheads and cluster members with enhanced convergencespeed of BO in huge search area )eMATLAB 2021a tool isused to implement the CBO-IE for eight healthcare IoTdatasets and the outcomes describe the better quality effi-ciency of CBO-IE on the basis of F-Measure intraclusterdistance running time complexity purity index statisticalanalysis standard deviation root mean square error andaccuracy as compared to previous techniques of clusteringlike K-Means GA PSO ALO and BO approaches )eFriedman Examination and Holm strategy are introduced to

Table 8 Holm strategy outputs

J Approaches z value p value (λ(AC minus j)) Hypothesis1 K-Means minus63245 000001 0020 Rejected2 GA minus50596 000001 0025 Rejected3 PSO minus37947 00001 0033 Rejected4 ALO minus25298 00057 005 Rejected5 BO minus13649 00861 010 Rejected

12 Scientific Programming

perform statistical analysis of proposed CBO-IE againstprevious techniques of clustering like K-Means GA PSOALO and BO approaches which represent that the CBO-IEgenerates 90 accurate results In future the proposedtechnique will be anticipated to estimate and be authorizedwith huge databases under big data Additionally a cross-layer communication will be anticipated to be offered andlegalized in IoT structural design in the future

Data Availability

)e data that support the findings of this study are availableupon request from the corresponding author

Conflicts of Interest

)e authors declare that there are no conflicts of interest

References

[1] A Haque L Khan and M Baron ldquoSAND semi-supervisedadaptive novel class detection and classification over datastreamrdquo in Proceedings of the 13th AAAI Conference on Ar-tificial Intelligence (AAAI-16) pp 1652ndash1658 Phoenix AZUSA February 2016

[2] N Mishra H K Soni S Sharma and A K UpadhyayldquoDevelopment and analysis of artificial neural networkmodels for rainfall prediction by using time-series datardquo I JIntelligent Systems and Applications vol 1 pp 16ndash23 2018

[3] H N Dai R C W Wong H Wang Z Zheng andA V Vasilakos ldquoBig data analytics for large scale wirelessnetworks challenges and opportunitiesrdquo ACM vol 52pp 1ndash46 2019

[4] A H Sabry W Z W Hasan M N MohtarR M K R Ahmad and H R Harun ldquoPlanter pressurerepetability data analysis for health adult based on EMEDsystemrdquo Malaysian Journal of Fundamental and AppliedSciences vol 14 no 1 pp 96ndash101 2018

[5] A Mensikova and C A Mattmann ldquoEnsemble sentimentanalysis to identify human trafficking in web datardquo in Pro-ceedings of the ACM Workshop on Graph Techniques forAdversarial Activity Analytics (GTA) pp 1ndash6 ACM NewYork Ny USA 2018

[6] S S Gill and R Buyya ldquoBio-inspired algorithms for big dataanalytics a survey taxonomy and open challengesrdquo Big DataAnalytics for Intelligent Healthcare Management ElsevierAmsterdam Netherlands pp 1ndash17 2019

[7] A Verma I Kaur and N Arora ldquoComparative analysis ofinformatio extraction techniques for data miningrdquo IndianJournal of Science and Technology vol 9 no 11 pp 1ndash182016

[8] B R Manju A Joshuva and V Sugumaran ldquoA data miningstudy for condition monitoring on wind turbine blades usinghoeffding tree algorithm through statistical and histogramfeaturesrdquo International Journal of Mechanical Engineering ampTechnology vol 9 no 1 pp 1061ndash1079 2018

[9] I Aytekin E Eyduran K Karadas R Akshan and I KeskinldquoPrediction of fattening final live weight from some bodymeasurements and fattening period in young bulls ofcrossbred and exotic breeds using MARS data mining algo-rithmrdquo Journal of Zoology vol 50 no 1 pp 189ndash195 2018

[10] S Ahmad and R Varma ldquoInformation extraction from textmessages using data mining techniquesrdquo Malaya Journal ofMatematik vol 5 no 1 pp 26ndash29 2018

[11] N E Oweis M M Fouad S R Oweis S S Owais andV Snasel ldquoA novel mapreduce lift assiciation riule miningalgorithm (MRLAR) for big datardquo International Journal ofAdvanced Computer Science and Applications vol 7 no 8pp 1ndash7 2016

[12] C W Tsai C F Lai M C Chiang and L T Yang ldquoDatamining for internet of things a surveyrdquo CommunicationsSurveys and Tutorials IEEE vol 16 no 1 pp 77ndash97 2014

[13] F Chen P Deng J Wan D Zhang A V Vasilakos andX Rong ldquoData mining for Internet of things literature reviewand challengesrdquo International Journal of Distributed SensorNetworks vol 2015 Article ID 365372 14 pages 2015

[14] S B Patel M N Patel and S M Shah ldquoFEA-HUIM fast andefficient algorithm for high utility item-set mining using noveldata structure and pruning strategyrdquo National Conference onAdvanced Research Trends in Information and Technologies(NCARTICT) IJSSET vol 4 no 2 pp 1ndash7 2018

[15] S Bhaskaran R Marappan and B Santhi ldquoDesign andanalysis of a cluster-based intelligent hybrid recommendationsystem for E-learning applicationsrdquo Mathematics MDPIvol 9 no 107 pp 1ndash21 2021

[16] D Giordano M Mellia and T Cerquitelli ldquoK-MDTSCK-Multi-Dimensional time-series clustering algorithmrdquoElectronics MDPI vol 10 no 1166 pp 1ndash21 2021

[17] R Mothukuri and B B Rao ldquoCluster Analysis of cyber crimedata using Rrdquo International Journal of Computer Science andMobile Applications vol 6 no 2 pp 62ndash70 2018

[18] A N Onaizah ldquoA novel data stream clustering algorithm inhealthcare IoTrdquo International Journal of Research in AppliedNatural and Social Sciences (IJRANSS) vol 5 no 6 pp 51ndash602017

[19] R Krishnamurthi A Kumar D Gopinathan A Nayyar andB Qureshi ldquoAn overview of IoT sensor data processingfusion and analysis techniquesrdquo Sensors MDPI vol 20pp 1ndash23 6076

[20] F Alam R Mehmood I Katib and A Albeshri ldquoAnalysis ofeight data mining algorithms for smarter Internet of things(IoT)rdquo in Proceedings of the International Workshop on DataMining in IoT Systems (DaMIS) pp 437ndash442 ElsevierLondon UK September 2016

[21] A M C Souza and J R A Amazonas ldquoAn outlier detectalgorithm using big data processing and Internet of thingsarchitecturerdquo in Proceedings of the International Workshop onBig Data and Data Mining Challenges on IoT and PervasiveSystems (BigD2M) pp 1010ndash1015 Elsevier London UK June2015

[22] D Gil M Johnsson H Mora and J Szymanski ldquoReview ofthe complexity of managing big data of the Internet of thingsrdquoComplexity vol 2019 Article ID 4592902 12 pages 2019

[23] M S Mahdavinejad M Rezvan M Barekatain P AdibiP Barnaghi and A P Sheth ldquoMachine Learning for Internetof things data analysis a surveyrdquoDigital Communications andNetworks vol 4 pp 161ndash175 2018

[24] M Ruta F Scioscia G Loseto A Pinto and E D SciascioldquoMachine learning in the Internet of things a semantic-en-hanced approachrdquo Semantic Web vol 10 pp 1ndash21 2017

[25] K Demertzis K Rantos and G Drosatos ldquoA dynamic in-telligent policies analysis mechanism for personal data pro-cessing in the IoT ecosystemrdquo Big Data And CognitiveComputing vol 4 no 9 pp 1ndash16 2020

Scientific Programming 13

[26] N Joseph and B S Kumar ldquoTop-K competitor trust miningand customer behavior investigation using data miningtechniquerdquo Journal of Network Communications andEmerging Technologies (JNCET) vol 8 no 2 pp 26ndash30 2018

[27] M P Dooshima E N Chidozie B J Ademola O O Sekoniand I P Adebayo ldquoA predictive model for the risk of mentalillness in Nigeria using data miningrdquo International Journal ofImmonology vol 6 no 1 pp 5ndash16 2018

[28] B A Tama and S Lim ldquoA comparative performance eval-uation of classification algorithms for clinical decision sup-port systemsrdquo Mathematics MDPI vol 8 no 1814 pp 1ndash252020

[29] C C N Wang I S Chang P C Y Sheu and J J P TsaildquoApplication of semantic computing in cancer on secondarydata analysisrdquo in Proceedings of the 2nd International Con-ference on Robotic Computing IEEE pp 407ndash413 LagunaHills CA USA February 2018

[30] K Ravikumar and A R Kannan ldquoSpatial data mining forprediction of natural events and disaster management basedon fuzzy logic using hybrid PSOrdquo Taga Journal vol 14pp 858ndash878 2018

[31] B M M Alom and M Courtney ldquoldquoEducational data mininga case study perspectives from primary to university educa-tion in Australiardquo International Journal of InformationTechnology and Computer Science vol 2 pp 1ndash9 2018

[32] S Celik and O Yilmaz ldquoPrediction of body weight of Turkishtazi dogs using data mining techniques classification andregression tree (CART) and multivariate adaptive regressionsplines (MARS)rdquo Journal of Zoology vol 50 no 2pp 575ndash583 2018

[33] W Ugulino D Cardador K Vega E Velloso R Miliditi andH Fuks ldquoWearable computing accelerometers data classi-fication of body postures and movementsrdquo in Proceedings ofthe 21st Brazilian Symposium on Artificial Intelligence Ad-vances in Artificial Intelligence Lecture Notes in ComputerScience pp 52ndash61 Springer Stuttgart Germany September2012

[34] M B Chandak ldquoRole of big-data in classification and novalclass detection in data streamsrdquo Journal of Big Data Springervol 23 no 5 pp 1ndash9 2016

[35] M Gupta K K Gupta and P K Shukla ldquoSession key basedfast secure and lightweight image encryption algorithmrdquoMultimedia Tools and Applications vol 80 pp 10391ndash104162021

[36] H S Pannu D Singh and A K Malhi ldquoMulti-objectiveparticle swarm optimization-based adaptive neuro-fuzzy in-ference system for benzene monitoringrdquoNeural Computing ampApplications vol 31 pp 2195ndash2205 2019

[37] D Singh J Singh and A Chhabra ldquoHigh availability ofclouds failover strategies for cloud computing using inte-grated check pointing algorithmsrdquo in Proceedings of the 2012International Conference on Communication Systems andNetwork Technologies pp 698ndash703 Rajkot India May 2012

[38] D Singh and V Kumar ldquoA comprehensive review of com-putational dehazing techniquesrdquo Archives of ComputationalMethods in Engineering vol 26 pp 1395ndash1413 2019

[39] R Gupta ldquoPerformance analysis of anti-phishing tools andstudy of classification data mining algorithms for a novel anti-phishing systemrdquo International Journal of Computer Networkand Information Security vol 12 pp 70ndash77 2015

[40] R Bhatt P Maheshwary P Shukla P Shukla M Shrivastavaand S Changlani ldquoImplementation of fruit fly optimizationalgorithm (FFOA) to escalate the attacking efficiency of nodecapture attack in wireless sensor networks (WSN)rdquo Computer

Communications vol 149 pp 134ndash145 2020 ISSN 0140-3664

[41] F Distribution Table httpwwwsocruclaeduappletsdirf_tablehtml 2018

[42] Normal Distribution Table httpmatharizonaedusimrsimsma464standardnormaltablepdf

14 Scientific Programming

Page 8: CBO-IE:ADataMiningApproachforHealthcareIoTDataset ...

where Purity(Rz) is purity of zth cluster |Rz| is zth clusterlength |Rwz| is the number of data elements of wth classallocated to zth cluster

Table 4 explains that the CBO-IE generates maximumpurity index value for all IoT datasets )e CBO-IE obtains5 superior outputs than BO 7 superior outputs than

ALO 8 superior outputs than PSO 12 superior outputsthan GA and 15 superior outputs than K-Means in termsof purity index for entire healthcare IoT datasets

Figure 4 represents the better quality purity index ofproposed CBO-IE against other approaches K-Means GAPSO ALO and BO for all IoT datasets

524 F-Measure Firstly Precision (prsn) and Recall (rel)are evaluated to repossess the information by equations (16)and (17) After that both are combined to formalize the F-Measure (FM) by utilizing equations (18) and (19)

Table 3 Standard deviation for entire IoT datasets

Dataset K-Means GA PSO ALO BO CBO-IE)oracic surgery 0116857 003984 0012036 001452 0007856 0000524Breast cancer 0002578 0168742 012542 015713 0003657 0000746Cryotherapy 00025143 0165875 0001254 0001674 0001036 0000845Liver patient 0256323 017589 0147856 012312 0096875 0000986Heart patient 0212678 0096852 0042574 0021563 0003687 0000251Chronic kidney disease 0020233 0103625 009854 0065214 0006574 0000865Diabetic retinopathy 0086574 0186523 0142657 0102647 0086478 00008657Blood transfusion 0213658 0196875 0185632 0152461 0010365 0000236

0123456

Datasets

Average Ranking

Thor

acic

Sur

gery

Brea

t Can

cer

Cryo

ther

apy

Live

r Pat

ient

Hea

rt P

atie

nts

Chro

nic K

idne

y

Dia

betic

Bloo

d Tr

ansfu

sionIn

tra-c

luste

r Dist

ance

K-MeansGAPSO

ALOBOCBO-IE

Figure 2 Average ranking based on intracluster distance

0005

01015

02025

03

Stan

dard

Dev

iatio

n Standard Deviation

Datasets

Thor

acic

Sur

gery

Brea

st Ca

ncer

Cryo

ther

apy

Live

r Pat

ient

Hea

rt P

atie

nts

Chro

nic K

idne

y

Dia

betic

Bloo

d Tr

ansfu

sion

K-MeansGAPSO

ALOBOCBO-IE

Figure 3 Standard deviation for all IoT datasets

Table 4 Purity index for entire IoT datasets

Dataset K-Means GA PSO ALO BO CBO-IE)oracic surgery 083 086 089 089 091 093Breast cancer 086 088 090 089 093 095Cryotherapy 084 087 090 091 093 096Liver patient 086 087 088 086 090 092Heart patient 080 083 086 086 088 092Chronic kidneydisease 086 087 090 089 091 093

Diabetic retinopathy 088 090 091 091 093 095Blood transfusion 076 078 081 083 086 088

8 Scientific Programming

prsn(w z) Rwz

11138681113868111386811138681113868111386811138681113868

Rz

11138681113868111386811138681113868111386811138681113868

(16)

rel(w z) Rwz

11138681113868111386811138681113868111386811138681113868

Rw

11138681113868111386811138681113868111386811138681113868 (17)

FM(w z) 2 times prsn(w z) times rel(w z)

prsn(w z) + rel(w z) (18)

FM 1113944K

w1

Rw

11138681113868111386811138681113868111386811138681113868

|H|maximum FM(w z) (19)

where |Rw| is wth class lengthTable 5 describes that the CBO-IE obtains higher F-

Measure value for all IoTdatasets )e CBO-IE generates 4better outputs than BO 7 better outputs than ALO 8better outputs than PSO 13 better outputs than GA and16 better outputs than K-Means in terms of F-Measure forentire healthcare IoT datasets

Figure 5 shows the better F-Measure of proposed CBO-IE against other approaches K-Means GA PSO ALO andBO for all IoT datasets

525 Root Mean Square Error (RMSE) )e RMSE is de-fined as a divergence between predicted values and exper-imental (calculated) values )e best clustering is achievedwith minimum RMSE values of datasets It is evaluated byutilizing the following equation

RMSE

1|H|

1113944

|H|

j1Vj minus 1113954Vj1113872 1113873

2

11139741113972

(20)

where Vj is jth element value in dataset and 1113954Vj is j

th elementpredicted value in dataset

Table 6 explains that the CBO-IE generates least RMSEvalue for all IoT datasets )e CBO-IE generates 8 betterresults than BO 78 better results than ALO 81 better

results than PSO 87 better results than GA and 95 betterresults than K-Means in terms of RMSE for entire healthcareIoT datasets

Figure 6 shows the better RMSE of proposed CBO-IEagainst other approaches K-Means GA PSO ALO and BOfor all IoT datasets

526 Accuracy It evaluates the division of clusters that areaccurate (ie it evaluates proportion of verdicts that areexact) and describes the portion of clusters in the prevailinggroup

002040608

1

Purit

y In

dex

Purity Index

Datasets

Thor

acic

Sur

gery

Brea

t Can

cer

Cryo

ther

apy

Live

r Pat

ient

Hea

rt P

atie

nts

Chro

nic K

idne

y

Dia

betic

Bloo

d Tr

ansfu

sion

K-MeansGAPSO

ALOBOCBO-IE

Figure 4 Purity index for all IoT datasets

Table 5 F-Measure for entire IoT datasets

Dataset K-Means GA PSO ALO BO CBO-IE)oracic surgery 081 084 087 087 089 091Breast cancer 084 086 087 088 090 093Cryotherapy 082 086 088 087 091 093Liver patient 084 086 087 087 089 091Heart patient 078 082 084 083 085 088Chronic kidneydisease 084 086 088 087 089 091

Diabetic retinopathy 086 089 090 091 092 094Blood transfusion 074 077 079 082 085 087

002040608

1

DatasetsTh

orac

ic S

urge

ry

Brea

t Can

cer

Cryo

ther

apy

Live

r Pat

ient

Hea

rt P

atie

nts

Chro

nic K

idne

y

Dia

betic

Bloo

d

F-M

easu

re

F-Measure

K-MeansGAPSO

ALOBOCBO-IE

Figure 5 F-Measure for all IoT datasets

Table 6 RMSE values for entire IoT datasets

Dataset K-Means GA PSO ALO BO CBO-

IE)oracic surgery 0537 0327 0157 0124 00124 00034Breast cancer 0567 0386 0168 0264 00317 00075Cryotherapy 0621 0374 0234 0364 00421 00066Liver patient 0437 0361 0261 0213 00652 00054Heart patient 0537 0412 0201 0186 00145 00063Chronic kidneydisease 0610 0422 0341 0267 00254 00047

Diabeticretinopathy 0517 0414 0367 0257 00374 00087

Blood transfusion 0438 0314 0120 0265 00451 00071

Scientific Programming 9

accuracy 1113936

Kz1 Rz

|H|lowast 100 (21)

Table 7 represents that the CBO-IE obtains higher ac-curacy value for all IoT datasets )e CBO-IE has 3 su-perior outputs than BO 5 superior outputs than ALO 6superior outputs than PSO 9 superior outputs than GAand 13 superior outputs than K-Means in terms of ac-curacy for entire healthcare IoT datasets

Figure 7 represents the better accuracy of proposedCBO-IE against other approaches K-Means GA PSO ALOand BO for all IoT datasets

)e Information Entropy is used with BO approach forclustering to provide accurate and efficient distribution ofdata points in dataset )e chaos theory is utilized to ini-tialize the population of BO to improve the searching ca-pability of habitat which helps in cluster member selectionprocess more precisely )ese two strategies InformationEntropy and chaos theory have improved the performanceof BO to provide the best selection of cluster heads and theirmembers optimally So CBO-IE generates superior resultsthan BO ALO PSO GA and K-Means clusteringapproaches

53 Running Time Complexity of the CBO-IE Approach)e number of executions is directly related to timecomplexity of clustering approaches to run )e

complexity is calculated by utilizing few circumstances asfollows PS is population size L is number of elements indataset Lx represents dimensions of particulars (ele-ments) K is number of cluster centroids and Mi repre-sents maximum iterations )e cost of every execution isassumed as 1 unit )e Cost of all Executions (CE) isobtained to utilize Algorithm 3 by the followingequations

001020304050607

RMSE

Val

ues

RMSE Values

Datasets

Thor

acic

Sur

gery

Brea

st Ca

ncer

Cryo

ther

apy

Live

r Pat

ient

Hea

rt P

atie

nts

Chro

nic K

idne

y

Dia

betic

Bloo

d Tr

ansfu

sion

K-MeansGAPSO

ALOBOCBO-IE

Figure 6 RMSE for all IoT datasets

Table 7 Accuracy values for entire IoT datasets

Dataset K-Means GA PSO ALO BO CBO-IE)oracic surgery 080 083 086 086 088 090Breast cancer 083 085 086 087 089 092Cryotherapy 081 085 087 086 090 092Liver patient 083 085 086 086 088 090Heart patient 077 081 083 082 084 087Chronic kidney disease 083 085 087 086 088 090Diabetic retinopathy 085 088 089 090 091 093Blood transfusion 073 076 078 081 084 086

002040608

1

Accu

racy

()

Accuracy

Datasets

Thor

acic

Sur

gery

Brea

st Ca

ncer

Cryo

ther

apy

Live

r Pat

ient

Hea

rt P

atie

nts

Chro

nic K

idne

y

Dia

betic

Bloo

d Tr

ansfu

sion

K-MeansGAPSO

ALOBOCBO-IE

Figure 7 Accuracy for all IoT datasets

10 Scientific Programming

ALGORITHM 3 )e complete CBO-IE approach

Scientific Programming 11

CE 1 + L + 1 + PS

+ Mi + 1 + Mi times PS

+ Mi times PS

+ Mi times 2 times PS

times Lx + 3 times PS

+ 11113872 1113873

+ Mi times K times PS

+ 11113872 1113873 + Mi times K times PS

+ Mi times K times PS

+ Mi times K times PS

+ Mi times K times L + Mi times K times L + Mi times K times L

(22)

CE 2 times Mi times PS

times Lx + 4 times Mi times K times PS

+ 3 times Mi times K times L + 5 times Mi times PS

+ Mi times K + L + PS

+ 2 times Mi + 3 (23)

Supposing that all circumstances are equal in equation(23) in worst case equation (24) is generated as follows

CE 9n3

+ 6n2

+ 4n + 3 (24)

)e running time complexities of clustering approachesare O(n3) for CBO-IE O(n3) for BO O(n3) for ACO O(n3)for PSO O(n3) for ALO and O(n2) for K-Menas in worstcase Hence all are executable in polynomial time

54 Statistical Examination Measurement A statistical ex-amination is functioned to obtain the extension of signifi-cance dissimilarities in the effectiveness of clusteringtechniques Here a nonparametric Friedman Examination(FE) is utilized to discover dissimilarities amid the group ofserial appropriate variables )e entire clustering techniquesprovide equally effective explaining in the null hypothesis(Y0)

)e Friedman Examination (FE) is formulized by

FE 12 times N

DS

AC

AC

+ 11113872 11138731113944

5

θ1AKθ( 1113857

2minus

AC

times AC

+ 11113872 11138732

4⎡⎢⎢⎢⎢⎣ ⎤⎥⎥⎥⎥⎦ (25)

where NDS is number of datasets AKθ is average rank of θth

approach and AC is number of clustering approaches)e FE critical value is 204925 taken from F-distribution

table [41] with (AC-1) and (ACndash1)lowast(NDSndash1) freedom degreewhich is obtained between (6ndash1) 5 and (6-1)lowast(8-1) 35 forapplying 6 clustering approaches (AC 6) over 8 datasets(NDS 8) with λ 010 (confidence stage) )e calculated FEvalue is higher as compared to the critical value for nullhypothesis (Y0) rejection if Y0 is not accepted)e evaluatedFE value is 1714 for initiating 6 clustering approaches(AC 6) over 8 datasets (NDS 8) with λ 010 Hence thecalculated FE is higher as compared to the critical FE andthen Y0 is rejected So it is to be summarized that entireclustering approaches are not equally effective

)erefore a post hoc examination is performed usingHolm strategy)e proposed CBE-IE is analyzed statisticallyagainst other clustering approaches in this examinationFirstly the z value is obtained from equation (24) and after

that probability (p) is generated with the help of z value andnormal distribution table [42] At last pj value is analyzedwith (λ(AC minus j)) (Table 8)

Z AKj minus AKθ

δ (26)

where

δ

AC

AC

+ 11113872 1113873

6 times NDS

1113971

(27)

Table 8 illustrates that the (λ(AC minus j)) value is higher ascompared to pj value this indicates the hypothesis rejectionfor entire cases Hence the proposed CBO-IE is superior inclustering as compared to the K-Means GA PSO and BOapproaches according to the above analysis

6 Conclusion

Various fields like medicine education and industriesutilize data mining for their useful applications )egrouping of data is performed on data clustering which is aspecific task in data mining to examine the database effi-ciently In this work a Chaotic Biogeography-Based Opti-mization approach using Information Entropy (CBO-IE) isproposed to obtain data clusters for healthcare IoT datasets)e Information Entropy is introduced with a BO approachto generate a better distribution of data points in the datasetand chaos theory is utilized to initialize the population of BOapproach )e Information Entropy and chaos theory arecombined with BO approach to generate optimal clusterheads and cluster members with enhanced convergencespeed of BO in huge search area )eMATLAB 2021a tool isused to implement the CBO-IE for eight healthcare IoTdatasets and the outcomes describe the better quality effi-ciency of CBO-IE on the basis of F-Measure intraclusterdistance running time complexity purity index statisticalanalysis standard deviation root mean square error andaccuracy as compared to previous techniques of clusteringlike K-Means GA PSO ALO and BO approaches )eFriedman Examination and Holm strategy are introduced to

Table 8 Holm strategy outputs

J Approaches z value p value (λ(AC minus j)) Hypothesis1 K-Means minus63245 000001 0020 Rejected2 GA minus50596 000001 0025 Rejected3 PSO minus37947 00001 0033 Rejected4 ALO minus25298 00057 005 Rejected5 BO minus13649 00861 010 Rejected

12 Scientific Programming

perform statistical analysis of proposed CBO-IE againstprevious techniques of clustering like K-Means GA PSOALO and BO approaches which represent that the CBO-IEgenerates 90 accurate results In future the proposedtechnique will be anticipated to estimate and be authorizedwith huge databases under big data Additionally a cross-layer communication will be anticipated to be offered andlegalized in IoT structural design in the future

Data Availability

)e data that support the findings of this study are availableupon request from the corresponding author

Conflicts of Interest

)e authors declare that there are no conflicts of interest

References

[1] A Haque L Khan and M Baron ldquoSAND semi-supervisedadaptive novel class detection and classification over datastreamrdquo in Proceedings of the 13th AAAI Conference on Ar-tificial Intelligence (AAAI-16) pp 1652ndash1658 Phoenix AZUSA February 2016

[2] N Mishra H K Soni S Sharma and A K UpadhyayldquoDevelopment and analysis of artificial neural networkmodels for rainfall prediction by using time-series datardquo I JIntelligent Systems and Applications vol 1 pp 16ndash23 2018

[3] H N Dai R C W Wong H Wang Z Zheng andA V Vasilakos ldquoBig data analytics for large scale wirelessnetworks challenges and opportunitiesrdquo ACM vol 52pp 1ndash46 2019

[4] A H Sabry W Z W Hasan M N MohtarR M K R Ahmad and H R Harun ldquoPlanter pressurerepetability data analysis for health adult based on EMEDsystemrdquo Malaysian Journal of Fundamental and AppliedSciences vol 14 no 1 pp 96ndash101 2018

[5] A Mensikova and C A Mattmann ldquoEnsemble sentimentanalysis to identify human trafficking in web datardquo in Pro-ceedings of the ACM Workshop on Graph Techniques forAdversarial Activity Analytics (GTA) pp 1ndash6 ACM NewYork Ny USA 2018

[6] S S Gill and R Buyya ldquoBio-inspired algorithms for big dataanalytics a survey taxonomy and open challengesrdquo Big DataAnalytics for Intelligent Healthcare Management ElsevierAmsterdam Netherlands pp 1ndash17 2019

[7] A Verma I Kaur and N Arora ldquoComparative analysis ofinformatio extraction techniques for data miningrdquo IndianJournal of Science and Technology vol 9 no 11 pp 1ndash182016

[8] B R Manju A Joshuva and V Sugumaran ldquoA data miningstudy for condition monitoring on wind turbine blades usinghoeffding tree algorithm through statistical and histogramfeaturesrdquo International Journal of Mechanical Engineering ampTechnology vol 9 no 1 pp 1061ndash1079 2018

[9] I Aytekin E Eyduran K Karadas R Akshan and I KeskinldquoPrediction of fattening final live weight from some bodymeasurements and fattening period in young bulls ofcrossbred and exotic breeds using MARS data mining algo-rithmrdquo Journal of Zoology vol 50 no 1 pp 189ndash195 2018

[10] S Ahmad and R Varma ldquoInformation extraction from textmessages using data mining techniquesrdquo Malaya Journal ofMatematik vol 5 no 1 pp 26ndash29 2018

[11] N E Oweis M M Fouad S R Oweis S S Owais andV Snasel ldquoA novel mapreduce lift assiciation riule miningalgorithm (MRLAR) for big datardquo International Journal ofAdvanced Computer Science and Applications vol 7 no 8pp 1ndash7 2016

[12] C W Tsai C F Lai M C Chiang and L T Yang ldquoDatamining for internet of things a surveyrdquo CommunicationsSurveys and Tutorials IEEE vol 16 no 1 pp 77ndash97 2014

[13] F Chen P Deng J Wan D Zhang A V Vasilakos andX Rong ldquoData mining for Internet of things literature reviewand challengesrdquo International Journal of Distributed SensorNetworks vol 2015 Article ID 365372 14 pages 2015

[14] S B Patel M N Patel and S M Shah ldquoFEA-HUIM fast andefficient algorithm for high utility item-set mining using noveldata structure and pruning strategyrdquo National Conference onAdvanced Research Trends in Information and Technologies(NCARTICT) IJSSET vol 4 no 2 pp 1ndash7 2018

[15] S Bhaskaran R Marappan and B Santhi ldquoDesign andanalysis of a cluster-based intelligent hybrid recommendationsystem for E-learning applicationsrdquo Mathematics MDPIvol 9 no 107 pp 1ndash21 2021

[16] D Giordano M Mellia and T Cerquitelli ldquoK-MDTSCK-Multi-Dimensional time-series clustering algorithmrdquoElectronics MDPI vol 10 no 1166 pp 1ndash21 2021

[17] R Mothukuri and B B Rao ldquoCluster Analysis of cyber crimedata using Rrdquo International Journal of Computer Science andMobile Applications vol 6 no 2 pp 62ndash70 2018

[18] A N Onaizah ldquoA novel data stream clustering algorithm inhealthcare IoTrdquo International Journal of Research in AppliedNatural and Social Sciences (IJRANSS) vol 5 no 6 pp 51ndash602017

[19] R Krishnamurthi A Kumar D Gopinathan A Nayyar andB Qureshi ldquoAn overview of IoT sensor data processingfusion and analysis techniquesrdquo Sensors MDPI vol 20pp 1ndash23 6076

[20] F Alam R Mehmood I Katib and A Albeshri ldquoAnalysis ofeight data mining algorithms for smarter Internet of things(IoT)rdquo in Proceedings of the International Workshop on DataMining in IoT Systems (DaMIS) pp 437ndash442 ElsevierLondon UK September 2016

[21] A M C Souza and J R A Amazonas ldquoAn outlier detectalgorithm using big data processing and Internet of thingsarchitecturerdquo in Proceedings of the International Workshop onBig Data and Data Mining Challenges on IoT and PervasiveSystems (BigD2M) pp 1010ndash1015 Elsevier London UK June2015

[22] D Gil M Johnsson H Mora and J Szymanski ldquoReview ofthe complexity of managing big data of the Internet of thingsrdquoComplexity vol 2019 Article ID 4592902 12 pages 2019

[23] M S Mahdavinejad M Rezvan M Barekatain P AdibiP Barnaghi and A P Sheth ldquoMachine Learning for Internetof things data analysis a surveyrdquoDigital Communications andNetworks vol 4 pp 161ndash175 2018

[24] M Ruta F Scioscia G Loseto A Pinto and E D SciascioldquoMachine learning in the Internet of things a semantic-en-hanced approachrdquo Semantic Web vol 10 pp 1ndash21 2017

[25] K Demertzis K Rantos and G Drosatos ldquoA dynamic in-telligent policies analysis mechanism for personal data pro-cessing in the IoT ecosystemrdquo Big Data And CognitiveComputing vol 4 no 9 pp 1ndash16 2020

Scientific Programming 13

[26] N Joseph and B S Kumar ldquoTop-K competitor trust miningand customer behavior investigation using data miningtechniquerdquo Journal of Network Communications andEmerging Technologies (JNCET) vol 8 no 2 pp 26ndash30 2018

[27] M P Dooshima E N Chidozie B J Ademola O O Sekoniand I P Adebayo ldquoA predictive model for the risk of mentalillness in Nigeria using data miningrdquo International Journal ofImmonology vol 6 no 1 pp 5ndash16 2018

[28] B A Tama and S Lim ldquoA comparative performance eval-uation of classification algorithms for clinical decision sup-port systemsrdquo Mathematics MDPI vol 8 no 1814 pp 1ndash252020

[29] C C N Wang I S Chang P C Y Sheu and J J P TsaildquoApplication of semantic computing in cancer on secondarydata analysisrdquo in Proceedings of the 2nd International Con-ference on Robotic Computing IEEE pp 407ndash413 LagunaHills CA USA February 2018

[30] K Ravikumar and A R Kannan ldquoSpatial data mining forprediction of natural events and disaster management basedon fuzzy logic using hybrid PSOrdquo Taga Journal vol 14pp 858ndash878 2018

[31] B M M Alom and M Courtney ldquoldquoEducational data mininga case study perspectives from primary to university educa-tion in Australiardquo International Journal of InformationTechnology and Computer Science vol 2 pp 1ndash9 2018

[32] S Celik and O Yilmaz ldquoPrediction of body weight of Turkishtazi dogs using data mining techniques classification andregression tree (CART) and multivariate adaptive regressionsplines (MARS)rdquo Journal of Zoology vol 50 no 2pp 575ndash583 2018

[33] W Ugulino D Cardador K Vega E Velloso R Miliditi andH Fuks ldquoWearable computing accelerometers data classi-fication of body postures and movementsrdquo in Proceedings ofthe 21st Brazilian Symposium on Artificial Intelligence Ad-vances in Artificial Intelligence Lecture Notes in ComputerScience pp 52ndash61 Springer Stuttgart Germany September2012

[34] M B Chandak ldquoRole of big-data in classification and novalclass detection in data streamsrdquo Journal of Big Data Springervol 23 no 5 pp 1ndash9 2016

[35] M Gupta K K Gupta and P K Shukla ldquoSession key basedfast secure and lightweight image encryption algorithmrdquoMultimedia Tools and Applications vol 80 pp 10391ndash104162021

[36] H S Pannu D Singh and A K Malhi ldquoMulti-objectiveparticle swarm optimization-based adaptive neuro-fuzzy in-ference system for benzene monitoringrdquoNeural Computing ampApplications vol 31 pp 2195ndash2205 2019

[37] D Singh J Singh and A Chhabra ldquoHigh availability ofclouds failover strategies for cloud computing using inte-grated check pointing algorithmsrdquo in Proceedings of the 2012International Conference on Communication Systems andNetwork Technologies pp 698ndash703 Rajkot India May 2012

[38] D Singh and V Kumar ldquoA comprehensive review of com-putational dehazing techniquesrdquo Archives of ComputationalMethods in Engineering vol 26 pp 1395ndash1413 2019

[39] R Gupta ldquoPerformance analysis of anti-phishing tools andstudy of classification data mining algorithms for a novel anti-phishing systemrdquo International Journal of Computer Networkand Information Security vol 12 pp 70ndash77 2015

[40] R Bhatt P Maheshwary P Shukla P Shukla M Shrivastavaand S Changlani ldquoImplementation of fruit fly optimizationalgorithm (FFOA) to escalate the attacking efficiency of nodecapture attack in wireless sensor networks (WSN)rdquo Computer

Communications vol 149 pp 134ndash145 2020 ISSN 0140-3664

[41] F Distribution Table httpwwwsocruclaeduappletsdirf_tablehtml 2018

[42] Normal Distribution Table httpmatharizonaedusimrsimsma464standardnormaltablepdf

14 Scientific Programming

Page 9: CBO-IE:ADataMiningApproachforHealthcareIoTDataset ...

prsn(w z) Rwz

11138681113868111386811138681113868111386811138681113868

Rz

11138681113868111386811138681113868111386811138681113868

(16)

rel(w z) Rwz

11138681113868111386811138681113868111386811138681113868

Rw

11138681113868111386811138681113868111386811138681113868 (17)

FM(w z) 2 times prsn(w z) times rel(w z)

prsn(w z) + rel(w z) (18)

FM 1113944K

w1

Rw

11138681113868111386811138681113868111386811138681113868

|H|maximum FM(w z) (19)

where |Rw| is wth class lengthTable 5 describes that the CBO-IE obtains higher F-

Measure value for all IoTdatasets )e CBO-IE generates 4better outputs than BO 7 better outputs than ALO 8better outputs than PSO 13 better outputs than GA and16 better outputs than K-Means in terms of F-Measure forentire healthcare IoT datasets

Figure 5 shows the better F-Measure of proposed CBO-IE against other approaches K-Means GA PSO ALO andBO for all IoT datasets

525 Root Mean Square Error (RMSE) )e RMSE is de-fined as a divergence between predicted values and exper-imental (calculated) values )e best clustering is achievedwith minimum RMSE values of datasets It is evaluated byutilizing the following equation

RMSE

1|H|

1113944

|H|

j1Vj minus 1113954Vj1113872 1113873

2

11139741113972

(20)

where Vj is jth element value in dataset and 1113954Vj is j

th elementpredicted value in dataset

Table 6 explains that the CBO-IE generates least RMSEvalue for all IoT datasets )e CBO-IE generates 8 betterresults than BO 78 better results than ALO 81 better

results than PSO 87 better results than GA and 95 betterresults than K-Means in terms of RMSE for entire healthcareIoT datasets

Figure 6 shows the better RMSE of proposed CBO-IEagainst other approaches K-Means GA PSO ALO and BOfor all IoT datasets

526 Accuracy It evaluates the division of clusters that areaccurate (ie it evaluates proportion of verdicts that areexact) and describes the portion of clusters in the prevailinggroup

002040608

1

Purit

y In

dex

Purity Index

Datasets

Thor

acic

Sur

gery

Brea

t Can

cer

Cryo

ther

apy

Live

r Pat

ient

Hea

rt P

atie

nts

Chro

nic K

idne

y

Dia

betic

Bloo

d Tr

ansfu

sion

K-MeansGAPSO

ALOBOCBO-IE

Figure 4 Purity index for all IoT datasets

Table 5 F-Measure for entire IoT datasets

Dataset K-Means GA PSO ALO BO CBO-IE)oracic surgery 081 084 087 087 089 091Breast cancer 084 086 087 088 090 093Cryotherapy 082 086 088 087 091 093Liver patient 084 086 087 087 089 091Heart patient 078 082 084 083 085 088Chronic kidneydisease 084 086 088 087 089 091

Diabetic retinopathy 086 089 090 091 092 094Blood transfusion 074 077 079 082 085 087

002040608

1

DatasetsTh

orac

ic S

urge

ry

Brea

t Can

cer

Cryo

ther

apy

Live

r Pat

ient

Hea

rt P

atie

nts

Chro

nic K

idne

y

Dia

betic

Bloo

d

F-M

easu

re

F-Measure

K-MeansGAPSO

ALOBOCBO-IE

Figure 5 F-Measure for all IoT datasets

Table 6 RMSE values for entire IoT datasets

Dataset K-Means GA PSO ALO BO CBO-

IE)oracic surgery 0537 0327 0157 0124 00124 00034Breast cancer 0567 0386 0168 0264 00317 00075Cryotherapy 0621 0374 0234 0364 00421 00066Liver patient 0437 0361 0261 0213 00652 00054Heart patient 0537 0412 0201 0186 00145 00063Chronic kidneydisease 0610 0422 0341 0267 00254 00047

Diabeticretinopathy 0517 0414 0367 0257 00374 00087

Blood transfusion 0438 0314 0120 0265 00451 00071

Scientific Programming 9

accuracy 1113936

Kz1 Rz

|H|lowast 100 (21)

Table 7 represents that the CBO-IE obtains higher ac-curacy value for all IoT datasets )e CBO-IE has 3 su-perior outputs than BO 5 superior outputs than ALO 6superior outputs than PSO 9 superior outputs than GAand 13 superior outputs than K-Means in terms of ac-curacy for entire healthcare IoT datasets

Figure 7 represents the better accuracy of proposedCBO-IE against other approaches K-Means GA PSO ALOand BO for all IoT datasets

)e Information Entropy is used with BO approach forclustering to provide accurate and efficient distribution ofdata points in dataset )e chaos theory is utilized to ini-tialize the population of BO to improve the searching ca-pability of habitat which helps in cluster member selectionprocess more precisely )ese two strategies InformationEntropy and chaos theory have improved the performanceof BO to provide the best selection of cluster heads and theirmembers optimally So CBO-IE generates superior resultsthan BO ALO PSO GA and K-Means clusteringapproaches

53 Running Time Complexity of the CBO-IE Approach)e number of executions is directly related to timecomplexity of clustering approaches to run )e

complexity is calculated by utilizing few circumstances asfollows PS is population size L is number of elements indataset Lx represents dimensions of particulars (ele-ments) K is number of cluster centroids and Mi repre-sents maximum iterations )e cost of every execution isassumed as 1 unit )e Cost of all Executions (CE) isobtained to utilize Algorithm 3 by the followingequations

001020304050607

RMSE

Val

ues

RMSE Values

Datasets

Thor

acic

Sur

gery

Brea

st Ca

ncer

Cryo

ther

apy

Live

r Pat

ient

Hea

rt P

atie

nts

Chro

nic K

idne

y

Dia

betic

Bloo

d Tr

ansfu

sion

K-MeansGAPSO

ALOBOCBO-IE

Figure 6 RMSE for all IoT datasets

Table 7 Accuracy values for entire IoT datasets

Dataset K-Means GA PSO ALO BO CBO-IE)oracic surgery 080 083 086 086 088 090Breast cancer 083 085 086 087 089 092Cryotherapy 081 085 087 086 090 092Liver patient 083 085 086 086 088 090Heart patient 077 081 083 082 084 087Chronic kidney disease 083 085 087 086 088 090Diabetic retinopathy 085 088 089 090 091 093Blood transfusion 073 076 078 081 084 086

002040608

1

Accu

racy

()

Accuracy

Datasets

Thor

acic

Sur

gery

Brea

st Ca

ncer

Cryo

ther

apy

Live

r Pat

ient

Hea

rt P

atie

nts

Chro

nic K

idne

y

Dia

betic

Bloo

d Tr

ansfu

sion

K-MeansGAPSO

ALOBOCBO-IE

Figure 7 Accuracy for all IoT datasets

10 Scientific Programming

ALGORITHM 3 )e complete CBO-IE approach

Scientific Programming 11

CE 1 + L + 1 + PS

+ Mi + 1 + Mi times PS

+ Mi times PS

+ Mi times 2 times PS

times Lx + 3 times PS

+ 11113872 1113873

+ Mi times K times PS

+ 11113872 1113873 + Mi times K times PS

+ Mi times K times PS

+ Mi times K times PS

+ Mi times K times L + Mi times K times L + Mi times K times L

(22)

CE 2 times Mi times PS

times Lx + 4 times Mi times K times PS

+ 3 times Mi times K times L + 5 times Mi times PS

+ Mi times K + L + PS

+ 2 times Mi + 3 (23)

Supposing that all circumstances are equal in equation(23) in worst case equation (24) is generated as follows

CE 9n3

+ 6n2

+ 4n + 3 (24)

)e running time complexities of clustering approachesare O(n3) for CBO-IE O(n3) for BO O(n3) for ACO O(n3)for PSO O(n3) for ALO and O(n2) for K-Menas in worstcase Hence all are executable in polynomial time

54 Statistical Examination Measurement A statistical ex-amination is functioned to obtain the extension of signifi-cance dissimilarities in the effectiveness of clusteringtechniques Here a nonparametric Friedman Examination(FE) is utilized to discover dissimilarities amid the group ofserial appropriate variables )e entire clustering techniquesprovide equally effective explaining in the null hypothesis(Y0)

)e Friedman Examination (FE) is formulized by

FE 12 times N

DS

AC

AC

+ 11113872 11138731113944

5

θ1AKθ( 1113857

2minus

AC

times AC

+ 11113872 11138732

4⎡⎢⎢⎢⎢⎣ ⎤⎥⎥⎥⎥⎦ (25)

where NDS is number of datasets AKθ is average rank of θth

approach and AC is number of clustering approaches)e FE critical value is 204925 taken from F-distribution

table [41] with (AC-1) and (ACndash1)lowast(NDSndash1) freedom degreewhich is obtained between (6ndash1) 5 and (6-1)lowast(8-1) 35 forapplying 6 clustering approaches (AC 6) over 8 datasets(NDS 8) with λ 010 (confidence stage) )e calculated FEvalue is higher as compared to the critical value for nullhypothesis (Y0) rejection if Y0 is not accepted)e evaluatedFE value is 1714 for initiating 6 clustering approaches(AC 6) over 8 datasets (NDS 8) with λ 010 Hence thecalculated FE is higher as compared to the critical FE andthen Y0 is rejected So it is to be summarized that entireclustering approaches are not equally effective

)erefore a post hoc examination is performed usingHolm strategy)e proposed CBE-IE is analyzed statisticallyagainst other clustering approaches in this examinationFirstly the z value is obtained from equation (24) and after

that probability (p) is generated with the help of z value andnormal distribution table [42] At last pj value is analyzedwith (λ(AC minus j)) (Table 8)

Z AKj minus AKθ

δ (26)

where

δ

AC

AC

+ 11113872 1113873

6 times NDS

1113971

(27)

Table 8 illustrates that the (λ(AC minus j)) value is higher ascompared to pj value this indicates the hypothesis rejectionfor entire cases Hence the proposed CBO-IE is superior inclustering as compared to the K-Means GA PSO and BOapproaches according to the above analysis

6 Conclusion

Various fields like medicine education and industriesutilize data mining for their useful applications )egrouping of data is performed on data clustering which is aspecific task in data mining to examine the database effi-ciently In this work a Chaotic Biogeography-Based Opti-mization approach using Information Entropy (CBO-IE) isproposed to obtain data clusters for healthcare IoT datasets)e Information Entropy is introduced with a BO approachto generate a better distribution of data points in the datasetand chaos theory is utilized to initialize the population of BOapproach )e Information Entropy and chaos theory arecombined with BO approach to generate optimal clusterheads and cluster members with enhanced convergencespeed of BO in huge search area )eMATLAB 2021a tool isused to implement the CBO-IE for eight healthcare IoTdatasets and the outcomes describe the better quality effi-ciency of CBO-IE on the basis of F-Measure intraclusterdistance running time complexity purity index statisticalanalysis standard deviation root mean square error andaccuracy as compared to previous techniques of clusteringlike K-Means GA PSO ALO and BO approaches )eFriedman Examination and Holm strategy are introduced to

Table 8 Holm strategy outputs

J Approaches z value p value (λ(AC minus j)) Hypothesis1 K-Means minus63245 000001 0020 Rejected2 GA minus50596 000001 0025 Rejected3 PSO minus37947 00001 0033 Rejected4 ALO minus25298 00057 005 Rejected5 BO minus13649 00861 010 Rejected

12 Scientific Programming

perform statistical analysis of proposed CBO-IE againstprevious techniques of clustering like K-Means GA PSOALO and BO approaches which represent that the CBO-IEgenerates 90 accurate results In future the proposedtechnique will be anticipated to estimate and be authorizedwith huge databases under big data Additionally a cross-layer communication will be anticipated to be offered andlegalized in IoT structural design in the future

Data Availability

)e data that support the findings of this study are availableupon request from the corresponding author

Conflicts of Interest

)e authors declare that there are no conflicts of interest

References

[1] A Haque L Khan and M Baron ldquoSAND semi-supervisedadaptive novel class detection and classification over datastreamrdquo in Proceedings of the 13th AAAI Conference on Ar-tificial Intelligence (AAAI-16) pp 1652ndash1658 Phoenix AZUSA February 2016

[2] N Mishra H K Soni S Sharma and A K UpadhyayldquoDevelopment and analysis of artificial neural networkmodels for rainfall prediction by using time-series datardquo I JIntelligent Systems and Applications vol 1 pp 16ndash23 2018

[3] H N Dai R C W Wong H Wang Z Zheng andA V Vasilakos ldquoBig data analytics for large scale wirelessnetworks challenges and opportunitiesrdquo ACM vol 52pp 1ndash46 2019

[4] A H Sabry W Z W Hasan M N MohtarR M K R Ahmad and H R Harun ldquoPlanter pressurerepetability data analysis for health adult based on EMEDsystemrdquo Malaysian Journal of Fundamental and AppliedSciences vol 14 no 1 pp 96ndash101 2018

[5] A Mensikova and C A Mattmann ldquoEnsemble sentimentanalysis to identify human trafficking in web datardquo in Pro-ceedings of the ACM Workshop on Graph Techniques forAdversarial Activity Analytics (GTA) pp 1ndash6 ACM NewYork Ny USA 2018

[6] S S Gill and R Buyya ldquoBio-inspired algorithms for big dataanalytics a survey taxonomy and open challengesrdquo Big DataAnalytics for Intelligent Healthcare Management ElsevierAmsterdam Netherlands pp 1ndash17 2019

[7] A Verma I Kaur and N Arora ldquoComparative analysis ofinformatio extraction techniques for data miningrdquo IndianJournal of Science and Technology vol 9 no 11 pp 1ndash182016

[8] B R Manju A Joshuva and V Sugumaran ldquoA data miningstudy for condition monitoring on wind turbine blades usinghoeffding tree algorithm through statistical and histogramfeaturesrdquo International Journal of Mechanical Engineering ampTechnology vol 9 no 1 pp 1061ndash1079 2018

[9] I Aytekin E Eyduran K Karadas R Akshan and I KeskinldquoPrediction of fattening final live weight from some bodymeasurements and fattening period in young bulls ofcrossbred and exotic breeds using MARS data mining algo-rithmrdquo Journal of Zoology vol 50 no 1 pp 189ndash195 2018

[10] S Ahmad and R Varma ldquoInformation extraction from textmessages using data mining techniquesrdquo Malaya Journal ofMatematik vol 5 no 1 pp 26ndash29 2018

[11] N E Oweis M M Fouad S R Oweis S S Owais andV Snasel ldquoA novel mapreduce lift assiciation riule miningalgorithm (MRLAR) for big datardquo International Journal ofAdvanced Computer Science and Applications vol 7 no 8pp 1ndash7 2016

[12] C W Tsai C F Lai M C Chiang and L T Yang ldquoDatamining for internet of things a surveyrdquo CommunicationsSurveys and Tutorials IEEE vol 16 no 1 pp 77ndash97 2014

[13] F Chen P Deng J Wan D Zhang A V Vasilakos andX Rong ldquoData mining for Internet of things literature reviewand challengesrdquo International Journal of Distributed SensorNetworks vol 2015 Article ID 365372 14 pages 2015

[14] S B Patel M N Patel and S M Shah ldquoFEA-HUIM fast andefficient algorithm for high utility item-set mining using noveldata structure and pruning strategyrdquo National Conference onAdvanced Research Trends in Information and Technologies(NCARTICT) IJSSET vol 4 no 2 pp 1ndash7 2018

[15] S Bhaskaran R Marappan and B Santhi ldquoDesign andanalysis of a cluster-based intelligent hybrid recommendationsystem for E-learning applicationsrdquo Mathematics MDPIvol 9 no 107 pp 1ndash21 2021

[16] D Giordano M Mellia and T Cerquitelli ldquoK-MDTSCK-Multi-Dimensional time-series clustering algorithmrdquoElectronics MDPI vol 10 no 1166 pp 1ndash21 2021

[17] R Mothukuri and B B Rao ldquoCluster Analysis of cyber crimedata using Rrdquo International Journal of Computer Science andMobile Applications vol 6 no 2 pp 62ndash70 2018

[18] A N Onaizah ldquoA novel data stream clustering algorithm inhealthcare IoTrdquo International Journal of Research in AppliedNatural and Social Sciences (IJRANSS) vol 5 no 6 pp 51ndash602017

[19] R Krishnamurthi A Kumar D Gopinathan A Nayyar andB Qureshi ldquoAn overview of IoT sensor data processingfusion and analysis techniquesrdquo Sensors MDPI vol 20pp 1ndash23 6076

[20] F Alam R Mehmood I Katib and A Albeshri ldquoAnalysis ofeight data mining algorithms for smarter Internet of things(IoT)rdquo in Proceedings of the International Workshop on DataMining in IoT Systems (DaMIS) pp 437ndash442 ElsevierLondon UK September 2016

[21] A M C Souza and J R A Amazonas ldquoAn outlier detectalgorithm using big data processing and Internet of thingsarchitecturerdquo in Proceedings of the International Workshop onBig Data and Data Mining Challenges on IoT and PervasiveSystems (BigD2M) pp 1010ndash1015 Elsevier London UK June2015

[22] D Gil M Johnsson H Mora and J Szymanski ldquoReview ofthe complexity of managing big data of the Internet of thingsrdquoComplexity vol 2019 Article ID 4592902 12 pages 2019

[23] M S Mahdavinejad M Rezvan M Barekatain P AdibiP Barnaghi and A P Sheth ldquoMachine Learning for Internetof things data analysis a surveyrdquoDigital Communications andNetworks vol 4 pp 161ndash175 2018

[24] M Ruta F Scioscia G Loseto A Pinto and E D SciascioldquoMachine learning in the Internet of things a semantic-en-hanced approachrdquo Semantic Web vol 10 pp 1ndash21 2017

[25] K Demertzis K Rantos and G Drosatos ldquoA dynamic in-telligent policies analysis mechanism for personal data pro-cessing in the IoT ecosystemrdquo Big Data And CognitiveComputing vol 4 no 9 pp 1ndash16 2020

Scientific Programming 13

[26] N Joseph and B S Kumar ldquoTop-K competitor trust miningand customer behavior investigation using data miningtechniquerdquo Journal of Network Communications andEmerging Technologies (JNCET) vol 8 no 2 pp 26ndash30 2018

[27] M P Dooshima E N Chidozie B J Ademola O O Sekoniand I P Adebayo ldquoA predictive model for the risk of mentalillness in Nigeria using data miningrdquo International Journal ofImmonology vol 6 no 1 pp 5ndash16 2018

[28] B A Tama and S Lim ldquoA comparative performance eval-uation of classification algorithms for clinical decision sup-port systemsrdquo Mathematics MDPI vol 8 no 1814 pp 1ndash252020

[29] C C N Wang I S Chang P C Y Sheu and J J P TsaildquoApplication of semantic computing in cancer on secondarydata analysisrdquo in Proceedings of the 2nd International Con-ference on Robotic Computing IEEE pp 407ndash413 LagunaHills CA USA February 2018

[30] K Ravikumar and A R Kannan ldquoSpatial data mining forprediction of natural events and disaster management basedon fuzzy logic using hybrid PSOrdquo Taga Journal vol 14pp 858ndash878 2018

[31] B M M Alom and M Courtney ldquoldquoEducational data mininga case study perspectives from primary to university educa-tion in Australiardquo International Journal of InformationTechnology and Computer Science vol 2 pp 1ndash9 2018

[32] S Celik and O Yilmaz ldquoPrediction of body weight of Turkishtazi dogs using data mining techniques classification andregression tree (CART) and multivariate adaptive regressionsplines (MARS)rdquo Journal of Zoology vol 50 no 2pp 575ndash583 2018

[33] W Ugulino D Cardador K Vega E Velloso R Miliditi andH Fuks ldquoWearable computing accelerometers data classi-fication of body postures and movementsrdquo in Proceedings ofthe 21st Brazilian Symposium on Artificial Intelligence Ad-vances in Artificial Intelligence Lecture Notes in ComputerScience pp 52ndash61 Springer Stuttgart Germany September2012

[34] M B Chandak ldquoRole of big-data in classification and novalclass detection in data streamsrdquo Journal of Big Data Springervol 23 no 5 pp 1ndash9 2016

[35] M Gupta K K Gupta and P K Shukla ldquoSession key basedfast secure and lightweight image encryption algorithmrdquoMultimedia Tools and Applications vol 80 pp 10391ndash104162021

[36] H S Pannu D Singh and A K Malhi ldquoMulti-objectiveparticle swarm optimization-based adaptive neuro-fuzzy in-ference system for benzene monitoringrdquoNeural Computing ampApplications vol 31 pp 2195ndash2205 2019

[37] D Singh J Singh and A Chhabra ldquoHigh availability ofclouds failover strategies for cloud computing using inte-grated check pointing algorithmsrdquo in Proceedings of the 2012International Conference on Communication Systems andNetwork Technologies pp 698ndash703 Rajkot India May 2012

[38] D Singh and V Kumar ldquoA comprehensive review of com-putational dehazing techniquesrdquo Archives of ComputationalMethods in Engineering vol 26 pp 1395ndash1413 2019

[39] R Gupta ldquoPerformance analysis of anti-phishing tools andstudy of classification data mining algorithms for a novel anti-phishing systemrdquo International Journal of Computer Networkand Information Security vol 12 pp 70ndash77 2015

[40] R Bhatt P Maheshwary P Shukla P Shukla M Shrivastavaand S Changlani ldquoImplementation of fruit fly optimizationalgorithm (FFOA) to escalate the attacking efficiency of nodecapture attack in wireless sensor networks (WSN)rdquo Computer

Communications vol 149 pp 134ndash145 2020 ISSN 0140-3664

[41] F Distribution Table httpwwwsocruclaeduappletsdirf_tablehtml 2018

[42] Normal Distribution Table httpmatharizonaedusimrsimsma464standardnormaltablepdf

14 Scientific Programming

Page 10: CBO-IE:ADataMiningApproachforHealthcareIoTDataset ...

accuracy 1113936

Kz1 Rz

|H|lowast 100 (21)

Table 7 represents that the CBO-IE obtains higher ac-curacy value for all IoT datasets )e CBO-IE has 3 su-perior outputs than BO 5 superior outputs than ALO 6superior outputs than PSO 9 superior outputs than GAand 13 superior outputs than K-Means in terms of ac-curacy for entire healthcare IoT datasets

Figure 7 represents the better accuracy of proposedCBO-IE against other approaches K-Means GA PSO ALOand BO for all IoT datasets

)e Information Entropy is used with BO approach forclustering to provide accurate and efficient distribution ofdata points in dataset )e chaos theory is utilized to ini-tialize the population of BO to improve the searching ca-pability of habitat which helps in cluster member selectionprocess more precisely )ese two strategies InformationEntropy and chaos theory have improved the performanceof BO to provide the best selection of cluster heads and theirmembers optimally So CBO-IE generates superior resultsthan BO ALO PSO GA and K-Means clusteringapproaches

53 Running Time Complexity of the CBO-IE Approach)e number of executions is directly related to timecomplexity of clustering approaches to run )e

complexity is calculated by utilizing few circumstances asfollows PS is population size L is number of elements indataset Lx represents dimensions of particulars (ele-ments) K is number of cluster centroids and Mi repre-sents maximum iterations )e cost of every execution isassumed as 1 unit )e Cost of all Executions (CE) isobtained to utilize Algorithm 3 by the followingequations

001020304050607

RMSE

Val

ues

RMSE Values

Datasets

Thor

acic

Sur

gery

Brea

st Ca

ncer

Cryo

ther

apy

Live

r Pat

ient

Hea

rt P

atie

nts

Chro

nic K

idne

y

Dia

betic

Bloo

d Tr

ansfu

sion

K-MeansGAPSO

ALOBOCBO-IE

Figure 6 RMSE for all IoT datasets

Table 7 Accuracy values for entire IoT datasets

Dataset K-Means GA PSO ALO BO CBO-IE)oracic surgery 080 083 086 086 088 090Breast cancer 083 085 086 087 089 092Cryotherapy 081 085 087 086 090 092Liver patient 083 085 086 086 088 090Heart patient 077 081 083 082 084 087Chronic kidney disease 083 085 087 086 088 090Diabetic retinopathy 085 088 089 090 091 093Blood transfusion 073 076 078 081 084 086

002040608

1

Accu

racy

()

Accuracy

Datasets

Thor

acic

Sur

gery

Brea

st Ca

ncer

Cryo

ther

apy

Live

r Pat

ient

Hea

rt P

atie

nts

Chro

nic K

idne

y

Dia

betic

Bloo

d Tr

ansfu

sion

K-MeansGAPSO

ALOBOCBO-IE

Figure 7 Accuracy for all IoT datasets

10 Scientific Programming

ALGORITHM 3 )e complete CBO-IE approach

Scientific Programming 11

CE 1 + L + 1 + PS

+ Mi + 1 + Mi times PS

+ Mi times PS

+ Mi times 2 times PS

times Lx + 3 times PS

+ 11113872 1113873

+ Mi times K times PS

+ 11113872 1113873 + Mi times K times PS

+ Mi times K times PS

+ Mi times K times PS

+ Mi times K times L + Mi times K times L + Mi times K times L

(22)

CE 2 times Mi times PS

times Lx + 4 times Mi times K times PS

+ 3 times Mi times K times L + 5 times Mi times PS

+ Mi times K + L + PS

+ 2 times Mi + 3 (23)

Supposing that all circumstances are equal in equation(23) in worst case equation (24) is generated as follows

CE 9n3

+ 6n2

+ 4n + 3 (24)

)e running time complexities of clustering approachesare O(n3) for CBO-IE O(n3) for BO O(n3) for ACO O(n3)for PSO O(n3) for ALO and O(n2) for K-Menas in worstcase Hence all are executable in polynomial time

54 Statistical Examination Measurement A statistical ex-amination is functioned to obtain the extension of signifi-cance dissimilarities in the effectiveness of clusteringtechniques Here a nonparametric Friedman Examination(FE) is utilized to discover dissimilarities amid the group ofserial appropriate variables )e entire clustering techniquesprovide equally effective explaining in the null hypothesis(Y0)

)e Friedman Examination (FE) is formulized by

FE 12 times N

DS

AC

AC

+ 11113872 11138731113944

5

θ1AKθ( 1113857

2minus

AC

times AC

+ 11113872 11138732

4⎡⎢⎢⎢⎢⎣ ⎤⎥⎥⎥⎥⎦ (25)

where NDS is number of datasets AKθ is average rank of θth

approach and AC is number of clustering approaches)e FE critical value is 204925 taken from F-distribution

table [41] with (AC-1) and (ACndash1)lowast(NDSndash1) freedom degreewhich is obtained between (6ndash1) 5 and (6-1)lowast(8-1) 35 forapplying 6 clustering approaches (AC 6) over 8 datasets(NDS 8) with λ 010 (confidence stage) )e calculated FEvalue is higher as compared to the critical value for nullhypothesis (Y0) rejection if Y0 is not accepted)e evaluatedFE value is 1714 for initiating 6 clustering approaches(AC 6) over 8 datasets (NDS 8) with λ 010 Hence thecalculated FE is higher as compared to the critical FE andthen Y0 is rejected So it is to be summarized that entireclustering approaches are not equally effective

)erefore a post hoc examination is performed usingHolm strategy)e proposed CBE-IE is analyzed statisticallyagainst other clustering approaches in this examinationFirstly the z value is obtained from equation (24) and after

that probability (p) is generated with the help of z value andnormal distribution table [42] At last pj value is analyzedwith (λ(AC minus j)) (Table 8)

Z AKj minus AKθ

δ (26)

where

δ

AC

AC

+ 11113872 1113873

6 times NDS

1113971

(27)

Table 8 illustrates that the (λ(AC minus j)) value is higher ascompared to pj value this indicates the hypothesis rejectionfor entire cases Hence the proposed CBO-IE is superior inclustering as compared to the K-Means GA PSO and BOapproaches according to the above analysis

6 Conclusion

Various fields like medicine education and industriesutilize data mining for their useful applications )egrouping of data is performed on data clustering which is aspecific task in data mining to examine the database effi-ciently In this work a Chaotic Biogeography-Based Opti-mization approach using Information Entropy (CBO-IE) isproposed to obtain data clusters for healthcare IoT datasets)e Information Entropy is introduced with a BO approachto generate a better distribution of data points in the datasetand chaos theory is utilized to initialize the population of BOapproach )e Information Entropy and chaos theory arecombined with BO approach to generate optimal clusterheads and cluster members with enhanced convergencespeed of BO in huge search area )eMATLAB 2021a tool isused to implement the CBO-IE for eight healthcare IoTdatasets and the outcomes describe the better quality effi-ciency of CBO-IE on the basis of F-Measure intraclusterdistance running time complexity purity index statisticalanalysis standard deviation root mean square error andaccuracy as compared to previous techniques of clusteringlike K-Means GA PSO ALO and BO approaches )eFriedman Examination and Holm strategy are introduced to

Table 8 Holm strategy outputs

J Approaches z value p value (λ(AC minus j)) Hypothesis1 K-Means minus63245 000001 0020 Rejected2 GA minus50596 000001 0025 Rejected3 PSO minus37947 00001 0033 Rejected4 ALO minus25298 00057 005 Rejected5 BO minus13649 00861 010 Rejected

12 Scientific Programming

perform statistical analysis of proposed CBO-IE againstprevious techniques of clustering like K-Means GA PSOALO and BO approaches which represent that the CBO-IEgenerates 90 accurate results In future the proposedtechnique will be anticipated to estimate and be authorizedwith huge databases under big data Additionally a cross-layer communication will be anticipated to be offered andlegalized in IoT structural design in the future

Data Availability

)e data that support the findings of this study are availableupon request from the corresponding author

Conflicts of Interest

)e authors declare that there are no conflicts of interest

References

[1] A Haque L Khan and M Baron ldquoSAND semi-supervisedadaptive novel class detection and classification over datastreamrdquo in Proceedings of the 13th AAAI Conference on Ar-tificial Intelligence (AAAI-16) pp 1652ndash1658 Phoenix AZUSA February 2016

[2] N Mishra H K Soni S Sharma and A K UpadhyayldquoDevelopment and analysis of artificial neural networkmodels for rainfall prediction by using time-series datardquo I JIntelligent Systems and Applications vol 1 pp 16ndash23 2018

[3] H N Dai R C W Wong H Wang Z Zheng andA V Vasilakos ldquoBig data analytics for large scale wirelessnetworks challenges and opportunitiesrdquo ACM vol 52pp 1ndash46 2019

[4] A H Sabry W Z W Hasan M N MohtarR M K R Ahmad and H R Harun ldquoPlanter pressurerepetability data analysis for health adult based on EMEDsystemrdquo Malaysian Journal of Fundamental and AppliedSciences vol 14 no 1 pp 96ndash101 2018

[5] A Mensikova and C A Mattmann ldquoEnsemble sentimentanalysis to identify human trafficking in web datardquo in Pro-ceedings of the ACM Workshop on Graph Techniques forAdversarial Activity Analytics (GTA) pp 1ndash6 ACM NewYork Ny USA 2018

[6] S S Gill and R Buyya ldquoBio-inspired algorithms for big dataanalytics a survey taxonomy and open challengesrdquo Big DataAnalytics for Intelligent Healthcare Management ElsevierAmsterdam Netherlands pp 1ndash17 2019

[7] A Verma I Kaur and N Arora ldquoComparative analysis ofinformatio extraction techniques for data miningrdquo IndianJournal of Science and Technology vol 9 no 11 pp 1ndash182016

[8] B R Manju A Joshuva and V Sugumaran ldquoA data miningstudy for condition monitoring on wind turbine blades usinghoeffding tree algorithm through statistical and histogramfeaturesrdquo International Journal of Mechanical Engineering ampTechnology vol 9 no 1 pp 1061ndash1079 2018

[9] I Aytekin E Eyduran K Karadas R Akshan and I KeskinldquoPrediction of fattening final live weight from some bodymeasurements and fattening period in young bulls ofcrossbred and exotic breeds using MARS data mining algo-rithmrdquo Journal of Zoology vol 50 no 1 pp 189ndash195 2018

[10] S Ahmad and R Varma ldquoInformation extraction from textmessages using data mining techniquesrdquo Malaya Journal ofMatematik vol 5 no 1 pp 26ndash29 2018

[11] N E Oweis M M Fouad S R Oweis S S Owais andV Snasel ldquoA novel mapreduce lift assiciation riule miningalgorithm (MRLAR) for big datardquo International Journal ofAdvanced Computer Science and Applications vol 7 no 8pp 1ndash7 2016

[12] C W Tsai C F Lai M C Chiang and L T Yang ldquoDatamining for internet of things a surveyrdquo CommunicationsSurveys and Tutorials IEEE vol 16 no 1 pp 77ndash97 2014

[13] F Chen P Deng J Wan D Zhang A V Vasilakos andX Rong ldquoData mining for Internet of things literature reviewand challengesrdquo International Journal of Distributed SensorNetworks vol 2015 Article ID 365372 14 pages 2015

[14] S B Patel M N Patel and S M Shah ldquoFEA-HUIM fast andefficient algorithm for high utility item-set mining using noveldata structure and pruning strategyrdquo National Conference onAdvanced Research Trends in Information and Technologies(NCARTICT) IJSSET vol 4 no 2 pp 1ndash7 2018

[15] S Bhaskaran R Marappan and B Santhi ldquoDesign andanalysis of a cluster-based intelligent hybrid recommendationsystem for E-learning applicationsrdquo Mathematics MDPIvol 9 no 107 pp 1ndash21 2021

[16] D Giordano M Mellia and T Cerquitelli ldquoK-MDTSCK-Multi-Dimensional time-series clustering algorithmrdquoElectronics MDPI vol 10 no 1166 pp 1ndash21 2021

[17] R Mothukuri and B B Rao ldquoCluster Analysis of cyber crimedata using Rrdquo International Journal of Computer Science andMobile Applications vol 6 no 2 pp 62ndash70 2018

[18] A N Onaizah ldquoA novel data stream clustering algorithm inhealthcare IoTrdquo International Journal of Research in AppliedNatural and Social Sciences (IJRANSS) vol 5 no 6 pp 51ndash602017

[19] R Krishnamurthi A Kumar D Gopinathan A Nayyar andB Qureshi ldquoAn overview of IoT sensor data processingfusion and analysis techniquesrdquo Sensors MDPI vol 20pp 1ndash23 6076

[20] F Alam R Mehmood I Katib and A Albeshri ldquoAnalysis ofeight data mining algorithms for smarter Internet of things(IoT)rdquo in Proceedings of the International Workshop on DataMining in IoT Systems (DaMIS) pp 437ndash442 ElsevierLondon UK September 2016

[21] A M C Souza and J R A Amazonas ldquoAn outlier detectalgorithm using big data processing and Internet of thingsarchitecturerdquo in Proceedings of the International Workshop onBig Data and Data Mining Challenges on IoT and PervasiveSystems (BigD2M) pp 1010ndash1015 Elsevier London UK June2015

[22] D Gil M Johnsson H Mora and J Szymanski ldquoReview ofthe complexity of managing big data of the Internet of thingsrdquoComplexity vol 2019 Article ID 4592902 12 pages 2019

[23] M S Mahdavinejad M Rezvan M Barekatain P AdibiP Barnaghi and A P Sheth ldquoMachine Learning for Internetof things data analysis a surveyrdquoDigital Communications andNetworks vol 4 pp 161ndash175 2018

[24] M Ruta F Scioscia G Loseto A Pinto and E D SciascioldquoMachine learning in the Internet of things a semantic-en-hanced approachrdquo Semantic Web vol 10 pp 1ndash21 2017

[25] K Demertzis K Rantos and G Drosatos ldquoA dynamic in-telligent policies analysis mechanism for personal data pro-cessing in the IoT ecosystemrdquo Big Data And CognitiveComputing vol 4 no 9 pp 1ndash16 2020

Scientific Programming 13

[26] N Joseph and B S Kumar ldquoTop-K competitor trust miningand customer behavior investigation using data miningtechniquerdquo Journal of Network Communications andEmerging Technologies (JNCET) vol 8 no 2 pp 26ndash30 2018

[27] M P Dooshima E N Chidozie B J Ademola O O Sekoniand I P Adebayo ldquoA predictive model for the risk of mentalillness in Nigeria using data miningrdquo International Journal ofImmonology vol 6 no 1 pp 5ndash16 2018

[28] B A Tama and S Lim ldquoA comparative performance eval-uation of classification algorithms for clinical decision sup-port systemsrdquo Mathematics MDPI vol 8 no 1814 pp 1ndash252020

[29] C C N Wang I S Chang P C Y Sheu and J J P TsaildquoApplication of semantic computing in cancer on secondarydata analysisrdquo in Proceedings of the 2nd International Con-ference on Robotic Computing IEEE pp 407ndash413 LagunaHills CA USA February 2018

[30] K Ravikumar and A R Kannan ldquoSpatial data mining forprediction of natural events and disaster management basedon fuzzy logic using hybrid PSOrdquo Taga Journal vol 14pp 858ndash878 2018

[31] B M M Alom and M Courtney ldquoldquoEducational data mininga case study perspectives from primary to university educa-tion in Australiardquo International Journal of InformationTechnology and Computer Science vol 2 pp 1ndash9 2018

[32] S Celik and O Yilmaz ldquoPrediction of body weight of Turkishtazi dogs using data mining techniques classification andregression tree (CART) and multivariate adaptive regressionsplines (MARS)rdquo Journal of Zoology vol 50 no 2pp 575ndash583 2018

[33] W Ugulino D Cardador K Vega E Velloso R Miliditi andH Fuks ldquoWearable computing accelerometers data classi-fication of body postures and movementsrdquo in Proceedings ofthe 21st Brazilian Symposium on Artificial Intelligence Ad-vances in Artificial Intelligence Lecture Notes in ComputerScience pp 52ndash61 Springer Stuttgart Germany September2012

[34] M B Chandak ldquoRole of big-data in classification and novalclass detection in data streamsrdquo Journal of Big Data Springervol 23 no 5 pp 1ndash9 2016

[35] M Gupta K K Gupta and P K Shukla ldquoSession key basedfast secure and lightweight image encryption algorithmrdquoMultimedia Tools and Applications vol 80 pp 10391ndash104162021

[36] H S Pannu D Singh and A K Malhi ldquoMulti-objectiveparticle swarm optimization-based adaptive neuro-fuzzy in-ference system for benzene monitoringrdquoNeural Computing ampApplications vol 31 pp 2195ndash2205 2019

[37] D Singh J Singh and A Chhabra ldquoHigh availability ofclouds failover strategies for cloud computing using inte-grated check pointing algorithmsrdquo in Proceedings of the 2012International Conference on Communication Systems andNetwork Technologies pp 698ndash703 Rajkot India May 2012

[38] D Singh and V Kumar ldquoA comprehensive review of com-putational dehazing techniquesrdquo Archives of ComputationalMethods in Engineering vol 26 pp 1395ndash1413 2019

[39] R Gupta ldquoPerformance analysis of anti-phishing tools andstudy of classification data mining algorithms for a novel anti-phishing systemrdquo International Journal of Computer Networkand Information Security vol 12 pp 70ndash77 2015

[40] R Bhatt P Maheshwary P Shukla P Shukla M Shrivastavaand S Changlani ldquoImplementation of fruit fly optimizationalgorithm (FFOA) to escalate the attacking efficiency of nodecapture attack in wireless sensor networks (WSN)rdquo Computer

Communications vol 149 pp 134ndash145 2020 ISSN 0140-3664

[41] F Distribution Table httpwwwsocruclaeduappletsdirf_tablehtml 2018

[42] Normal Distribution Table httpmatharizonaedusimrsimsma464standardnormaltablepdf

14 Scientific Programming

Page 11: CBO-IE:ADataMiningApproachforHealthcareIoTDataset ...

ALGORITHM 3 )e complete CBO-IE approach

Scientific Programming 11

CE 1 + L + 1 + PS

+ Mi + 1 + Mi times PS

+ Mi times PS

+ Mi times 2 times PS

times Lx + 3 times PS

+ 11113872 1113873

+ Mi times K times PS

+ 11113872 1113873 + Mi times K times PS

+ Mi times K times PS

+ Mi times K times PS

+ Mi times K times L + Mi times K times L + Mi times K times L

(22)

CE 2 times Mi times PS

times Lx + 4 times Mi times K times PS

+ 3 times Mi times K times L + 5 times Mi times PS

+ Mi times K + L + PS

+ 2 times Mi + 3 (23)

Supposing that all circumstances are equal in equation(23) in worst case equation (24) is generated as follows

CE 9n3

+ 6n2

+ 4n + 3 (24)

)e running time complexities of clustering approachesare O(n3) for CBO-IE O(n3) for BO O(n3) for ACO O(n3)for PSO O(n3) for ALO and O(n2) for K-Menas in worstcase Hence all are executable in polynomial time

54 Statistical Examination Measurement A statistical ex-amination is functioned to obtain the extension of signifi-cance dissimilarities in the effectiveness of clusteringtechniques Here a nonparametric Friedman Examination(FE) is utilized to discover dissimilarities amid the group ofserial appropriate variables )e entire clustering techniquesprovide equally effective explaining in the null hypothesis(Y0)

)e Friedman Examination (FE) is formulized by

FE 12 times N

DS

AC

AC

+ 11113872 11138731113944

5

θ1AKθ( 1113857

2minus

AC

times AC

+ 11113872 11138732

4⎡⎢⎢⎢⎢⎣ ⎤⎥⎥⎥⎥⎦ (25)

where NDS is number of datasets AKθ is average rank of θth

approach and AC is number of clustering approaches)e FE critical value is 204925 taken from F-distribution

table [41] with (AC-1) and (ACndash1)lowast(NDSndash1) freedom degreewhich is obtained between (6ndash1) 5 and (6-1)lowast(8-1) 35 forapplying 6 clustering approaches (AC 6) over 8 datasets(NDS 8) with λ 010 (confidence stage) )e calculated FEvalue is higher as compared to the critical value for nullhypothesis (Y0) rejection if Y0 is not accepted)e evaluatedFE value is 1714 for initiating 6 clustering approaches(AC 6) over 8 datasets (NDS 8) with λ 010 Hence thecalculated FE is higher as compared to the critical FE andthen Y0 is rejected So it is to be summarized that entireclustering approaches are not equally effective

)erefore a post hoc examination is performed usingHolm strategy)e proposed CBE-IE is analyzed statisticallyagainst other clustering approaches in this examinationFirstly the z value is obtained from equation (24) and after

that probability (p) is generated with the help of z value andnormal distribution table [42] At last pj value is analyzedwith (λ(AC minus j)) (Table 8)

Z AKj minus AKθ

δ (26)

where

δ

AC

AC

+ 11113872 1113873

6 times NDS

1113971

(27)

Table 8 illustrates that the (λ(AC minus j)) value is higher ascompared to pj value this indicates the hypothesis rejectionfor entire cases Hence the proposed CBO-IE is superior inclustering as compared to the K-Means GA PSO and BOapproaches according to the above analysis

6 Conclusion

Various fields like medicine education and industriesutilize data mining for their useful applications )egrouping of data is performed on data clustering which is aspecific task in data mining to examine the database effi-ciently In this work a Chaotic Biogeography-Based Opti-mization approach using Information Entropy (CBO-IE) isproposed to obtain data clusters for healthcare IoT datasets)e Information Entropy is introduced with a BO approachto generate a better distribution of data points in the datasetand chaos theory is utilized to initialize the population of BOapproach )e Information Entropy and chaos theory arecombined with BO approach to generate optimal clusterheads and cluster members with enhanced convergencespeed of BO in huge search area )eMATLAB 2021a tool isused to implement the CBO-IE for eight healthcare IoTdatasets and the outcomes describe the better quality effi-ciency of CBO-IE on the basis of F-Measure intraclusterdistance running time complexity purity index statisticalanalysis standard deviation root mean square error andaccuracy as compared to previous techniques of clusteringlike K-Means GA PSO ALO and BO approaches )eFriedman Examination and Holm strategy are introduced to

Table 8 Holm strategy outputs

J Approaches z value p value (λ(AC minus j)) Hypothesis1 K-Means minus63245 000001 0020 Rejected2 GA minus50596 000001 0025 Rejected3 PSO minus37947 00001 0033 Rejected4 ALO minus25298 00057 005 Rejected5 BO minus13649 00861 010 Rejected

12 Scientific Programming

perform statistical analysis of proposed CBO-IE againstprevious techniques of clustering like K-Means GA PSOALO and BO approaches which represent that the CBO-IEgenerates 90 accurate results In future the proposedtechnique will be anticipated to estimate and be authorizedwith huge databases under big data Additionally a cross-layer communication will be anticipated to be offered andlegalized in IoT structural design in the future

Data Availability

)e data that support the findings of this study are availableupon request from the corresponding author

Conflicts of Interest

)e authors declare that there are no conflicts of interest

References

[1] A Haque L Khan and M Baron ldquoSAND semi-supervisedadaptive novel class detection and classification over datastreamrdquo in Proceedings of the 13th AAAI Conference on Ar-tificial Intelligence (AAAI-16) pp 1652ndash1658 Phoenix AZUSA February 2016

[2] N Mishra H K Soni S Sharma and A K UpadhyayldquoDevelopment and analysis of artificial neural networkmodels for rainfall prediction by using time-series datardquo I JIntelligent Systems and Applications vol 1 pp 16ndash23 2018

[3] H N Dai R C W Wong H Wang Z Zheng andA V Vasilakos ldquoBig data analytics for large scale wirelessnetworks challenges and opportunitiesrdquo ACM vol 52pp 1ndash46 2019

[4] A H Sabry W Z W Hasan M N MohtarR M K R Ahmad and H R Harun ldquoPlanter pressurerepetability data analysis for health adult based on EMEDsystemrdquo Malaysian Journal of Fundamental and AppliedSciences vol 14 no 1 pp 96ndash101 2018

[5] A Mensikova and C A Mattmann ldquoEnsemble sentimentanalysis to identify human trafficking in web datardquo in Pro-ceedings of the ACM Workshop on Graph Techniques forAdversarial Activity Analytics (GTA) pp 1ndash6 ACM NewYork Ny USA 2018

[6] S S Gill and R Buyya ldquoBio-inspired algorithms for big dataanalytics a survey taxonomy and open challengesrdquo Big DataAnalytics for Intelligent Healthcare Management ElsevierAmsterdam Netherlands pp 1ndash17 2019

[7] A Verma I Kaur and N Arora ldquoComparative analysis ofinformatio extraction techniques for data miningrdquo IndianJournal of Science and Technology vol 9 no 11 pp 1ndash182016

[8] B R Manju A Joshuva and V Sugumaran ldquoA data miningstudy for condition monitoring on wind turbine blades usinghoeffding tree algorithm through statistical and histogramfeaturesrdquo International Journal of Mechanical Engineering ampTechnology vol 9 no 1 pp 1061ndash1079 2018

[9] I Aytekin E Eyduran K Karadas R Akshan and I KeskinldquoPrediction of fattening final live weight from some bodymeasurements and fattening period in young bulls ofcrossbred and exotic breeds using MARS data mining algo-rithmrdquo Journal of Zoology vol 50 no 1 pp 189ndash195 2018

[10] S Ahmad and R Varma ldquoInformation extraction from textmessages using data mining techniquesrdquo Malaya Journal ofMatematik vol 5 no 1 pp 26ndash29 2018

[11] N E Oweis M M Fouad S R Oweis S S Owais andV Snasel ldquoA novel mapreduce lift assiciation riule miningalgorithm (MRLAR) for big datardquo International Journal ofAdvanced Computer Science and Applications vol 7 no 8pp 1ndash7 2016

[12] C W Tsai C F Lai M C Chiang and L T Yang ldquoDatamining for internet of things a surveyrdquo CommunicationsSurveys and Tutorials IEEE vol 16 no 1 pp 77ndash97 2014

[13] F Chen P Deng J Wan D Zhang A V Vasilakos andX Rong ldquoData mining for Internet of things literature reviewand challengesrdquo International Journal of Distributed SensorNetworks vol 2015 Article ID 365372 14 pages 2015

[14] S B Patel M N Patel and S M Shah ldquoFEA-HUIM fast andefficient algorithm for high utility item-set mining using noveldata structure and pruning strategyrdquo National Conference onAdvanced Research Trends in Information and Technologies(NCARTICT) IJSSET vol 4 no 2 pp 1ndash7 2018

[15] S Bhaskaran R Marappan and B Santhi ldquoDesign andanalysis of a cluster-based intelligent hybrid recommendationsystem for E-learning applicationsrdquo Mathematics MDPIvol 9 no 107 pp 1ndash21 2021

[16] D Giordano M Mellia and T Cerquitelli ldquoK-MDTSCK-Multi-Dimensional time-series clustering algorithmrdquoElectronics MDPI vol 10 no 1166 pp 1ndash21 2021

[17] R Mothukuri and B B Rao ldquoCluster Analysis of cyber crimedata using Rrdquo International Journal of Computer Science andMobile Applications vol 6 no 2 pp 62ndash70 2018

[18] A N Onaizah ldquoA novel data stream clustering algorithm inhealthcare IoTrdquo International Journal of Research in AppliedNatural and Social Sciences (IJRANSS) vol 5 no 6 pp 51ndash602017

[19] R Krishnamurthi A Kumar D Gopinathan A Nayyar andB Qureshi ldquoAn overview of IoT sensor data processingfusion and analysis techniquesrdquo Sensors MDPI vol 20pp 1ndash23 6076

[20] F Alam R Mehmood I Katib and A Albeshri ldquoAnalysis ofeight data mining algorithms for smarter Internet of things(IoT)rdquo in Proceedings of the International Workshop on DataMining in IoT Systems (DaMIS) pp 437ndash442 ElsevierLondon UK September 2016

[21] A M C Souza and J R A Amazonas ldquoAn outlier detectalgorithm using big data processing and Internet of thingsarchitecturerdquo in Proceedings of the International Workshop onBig Data and Data Mining Challenges on IoT and PervasiveSystems (BigD2M) pp 1010ndash1015 Elsevier London UK June2015

[22] D Gil M Johnsson H Mora and J Szymanski ldquoReview ofthe complexity of managing big data of the Internet of thingsrdquoComplexity vol 2019 Article ID 4592902 12 pages 2019

[23] M S Mahdavinejad M Rezvan M Barekatain P AdibiP Barnaghi and A P Sheth ldquoMachine Learning for Internetof things data analysis a surveyrdquoDigital Communications andNetworks vol 4 pp 161ndash175 2018

[24] M Ruta F Scioscia G Loseto A Pinto and E D SciascioldquoMachine learning in the Internet of things a semantic-en-hanced approachrdquo Semantic Web vol 10 pp 1ndash21 2017

[25] K Demertzis K Rantos and G Drosatos ldquoA dynamic in-telligent policies analysis mechanism for personal data pro-cessing in the IoT ecosystemrdquo Big Data And CognitiveComputing vol 4 no 9 pp 1ndash16 2020

Scientific Programming 13

[26] N Joseph and B S Kumar ldquoTop-K competitor trust miningand customer behavior investigation using data miningtechniquerdquo Journal of Network Communications andEmerging Technologies (JNCET) vol 8 no 2 pp 26ndash30 2018

[27] M P Dooshima E N Chidozie B J Ademola O O Sekoniand I P Adebayo ldquoA predictive model for the risk of mentalillness in Nigeria using data miningrdquo International Journal ofImmonology vol 6 no 1 pp 5ndash16 2018

[28] B A Tama and S Lim ldquoA comparative performance eval-uation of classification algorithms for clinical decision sup-port systemsrdquo Mathematics MDPI vol 8 no 1814 pp 1ndash252020

[29] C C N Wang I S Chang P C Y Sheu and J J P TsaildquoApplication of semantic computing in cancer on secondarydata analysisrdquo in Proceedings of the 2nd International Con-ference on Robotic Computing IEEE pp 407ndash413 LagunaHills CA USA February 2018

[30] K Ravikumar and A R Kannan ldquoSpatial data mining forprediction of natural events and disaster management basedon fuzzy logic using hybrid PSOrdquo Taga Journal vol 14pp 858ndash878 2018

[31] B M M Alom and M Courtney ldquoldquoEducational data mininga case study perspectives from primary to university educa-tion in Australiardquo International Journal of InformationTechnology and Computer Science vol 2 pp 1ndash9 2018

[32] S Celik and O Yilmaz ldquoPrediction of body weight of Turkishtazi dogs using data mining techniques classification andregression tree (CART) and multivariate adaptive regressionsplines (MARS)rdquo Journal of Zoology vol 50 no 2pp 575ndash583 2018

[33] W Ugulino D Cardador K Vega E Velloso R Miliditi andH Fuks ldquoWearable computing accelerometers data classi-fication of body postures and movementsrdquo in Proceedings ofthe 21st Brazilian Symposium on Artificial Intelligence Ad-vances in Artificial Intelligence Lecture Notes in ComputerScience pp 52ndash61 Springer Stuttgart Germany September2012

[34] M B Chandak ldquoRole of big-data in classification and novalclass detection in data streamsrdquo Journal of Big Data Springervol 23 no 5 pp 1ndash9 2016

[35] M Gupta K K Gupta and P K Shukla ldquoSession key basedfast secure and lightweight image encryption algorithmrdquoMultimedia Tools and Applications vol 80 pp 10391ndash104162021

[36] H S Pannu D Singh and A K Malhi ldquoMulti-objectiveparticle swarm optimization-based adaptive neuro-fuzzy in-ference system for benzene monitoringrdquoNeural Computing ampApplications vol 31 pp 2195ndash2205 2019

[37] D Singh J Singh and A Chhabra ldquoHigh availability ofclouds failover strategies for cloud computing using inte-grated check pointing algorithmsrdquo in Proceedings of the 2012International Conference on Communication Systems andNetwork Technologies pp 698ndash703 Rajkot India May 2012

[38] D Singh and V Kumar ldquoA comprehensive review of com-putational dehazing techniquesrdquo Archives of ComputationalMethods in Engineering vol 26 pp 1395ndash1413 2019

[39] R Gupta ldquoPerformance analysis of anti-phishing tools andstudy of classification data mining algorithms for a novel anti-phishing systemrdquo International Journal of Computer Networkand Information Security vol 12 pp 70ndash77 2015

[40] R Bhatt P Maheshwary P Shukla P Shukla M Shrivastavaand S Changlani ldquoImplementation of fruit fly optimizationalgorithm (FFOA) to escalate the attacking efficiency of nodecapture attack in wireless sensor networks (WSN)rdquo Computer

Communications vol 149 pp 134ndash145 2020 ISSN 0140-3664

[41] F Distribution Table httpwwwsocruclaeduappletsdirf_tablehtml 2018

[42] Normal Distribution Table httpmatharizonaedusimrsimsma464standardnormaltablepdf

14 Scientific Programming

Page 12: CBO-IE:ADataMiningApproachforHealthcareIoTDataset ...

CE 1 + L + 1 + PS

+ Mi + 1 + Mi times PS

+ Mi times PS

+ Mi times 2 times PS

times Lx + 3 times PS

+ 11113872 1113873

+ Mi times K times PS

+ 11113872 1113873 + Mi times K times PS

+ Mi times K times PS

+ Mi times K times PS

+ Mi times K times L + Mi times K times L + Mi times K times L

(22)

CE 2 times Mi times PS

times Lx + 4 times Mi times K times PS

+ 3 times Mi times K times L + 5 times Mi times PS

+ Mi times K + L + PS

+ 2 times Mi + 3 (23)

Supposing that all circumstances are equal in equation(23) in worst case equation (24) is generated as follows

CE 9n3

+ 6n2

+ 4n + 3 (24)

)e running time complexities of clustering approachesare O(n3) for CBO-IE O(n3) for BO O(n3) for ACO O(n3)for PSO O(n3) for ALO and O(n2) for K-Menas in worstcase Hence all are executable in polynomial time

54 Statistical Examination Measurement A statistical ex-amination is functioned to obtain the extension of signifi-cance dissimilarities in the effectiveness of clusteringtechniques Here a nonparametric Friedman Examination(FE) is utilized to discover dissimilarities amid the group ofserial appropriate variables )e entire clustering techniquesprovide equally effective explaining in the null hypothesis(Y0)

)e Friedman Examination (FE) is formulized by

FE 12 times N

DS

AC

AC

+ 11113872 11138731113944

5

θ1AKθ( 1113857

2minus

AC

times AC

+ 11113872 11138732

4⎡⎢⎢⎢⎢⎣ ⎤⎥⎥⎥⎥⎦ (25)

where NDS is number of datasets AKθ is average rank of θth

approach and AC is number of clustering approaches)e FE critical value is 204925 taken from F-distribution

table [41] with (AC-1) and (ACndash1)lowast(NDSndash1) freedom degreewhich is obtained between (6ndash1) 5 and (6-1)lowast(8-1) 35 forapplying 6 clustering approaches (AC 6) over 8 datasets(NDS 8) with λ 010 (confidence stage) )e calculated FEvalue is higher as compared to the critical value for nullhypothesis (Y0) rejection if Y0 is not accepted)e evaluatedFE value is 1714 for initiating 6 clustering approaches(AC 6) over 8 datasets (NDS 8) with λ 010 Hence thecalculated FE is higher as compared to the critical FE andthen Y0 is rejected So it is to be summarized that entireclustering approaches are not equally effective

)erefore a post hoc examination is performed usingHolm strategy)e proposed CBE-IE is analyzed statisticallyagainst other clustering approaches in this examinationFirstly the z value is obtained from equation (24) and after

that probability (p) is generated with the help of z value andnormal distribution table [42] At last pj value is analyzedwith (λ(AC minus j)) (Table 8)

Z AKj minus AKθ

δ (26)

where

δ

AC

AC

+ 11113872 1113873

6 times NDS

1113971

(27)

Table 8 illustrates that the (λ(AC minus j)) value is higher ascompared to pj value this indicates the hypothesis rejectionfor entire cases Hence the proposed CBO-IE is superior inclustering as compared to the K-Means GA PSO and BOapproaches according to the above analysis

6 Conclusion

Various fields like medicine education and industriesutilize data mining for their useful applications )egrouping of data is performed on data clustering which is aspecific task in data mining to examine the database effi-ciently In this work a Chaotic Biogeography-Based Opti-mization approach using Information Entropy (CBO-IE) isproposed to obtain data clusters for healthcare IoT datasets)e Information Entropy is introduced with a BO approachto generate a better distribution of data points in the datasetand chaos theory is utilized to initialize the population of BOapproach )e Information Entropy and chaos theory arecombined with BO approach to generate optimal clusterheads and cluster members with enhanced convergencespeed of BO in huge search area )eMATLAB 2021a tool isused to implement the CBO-IE for eight healthcare IoTdatasets and the outcomes describe the better quality effi-ciency of CBO-IE on the basis of F-Measure intraclusterdistance running time complexity purity index statisticalanalysis standard deviation root mean square error andaccuracy as compared to previous techniques of clusteringlike K-Means GA PSO ALO and BO approaches )eFriedman Examination and Holm strategy are introduced to

Table 8 Holm strategy outputs

J Approaches z value p value (λ(AC minus j)) Hypothesis1 K-Means minus63245 000001 0020 Rejected2 GA minus50596 000001 0025 Rejected3 PSO minus37947 00001 0033 Rejected4 ALO minus25298 00057 005 Rejected5 BO minus13649 00861 010 Rejected

12 Scientific Programming

perform statistical analysis of proposed CBO-IE againstprevious techniques of clustering like K-Means GA PSOALO and BO approaches which represent that the CBO-IEgenerates 90 accurate results In future the proposedtechnique will be anticipated to estimate and be authorizedwith huge databases under big data Additionally a cross-layer communication will be anticipated to be offered andlegalized in IoT structural design in the future

Data Availability

)e data that support the findings of this study are availableupon request from the corresponding author

Conflicts of Interest

)e authors declare that there are no conflicts of interest

References

[1] A Haque L Khan and M Baron ldquoSAND semi-supervisedadaptive novel class detection and classification over datastreamrdquo in Proceedings of the 13th AAAI Conference on Ar-tificial Intelligence (AAAI-16) pp 1652ndash1658 Phoenix AZUSA February 2016

[2] N Mishra H K Soni S Sharma and A K UpadhyayldquoDevelopment and analysis of artificial neural networkmodels for rainfall prediction by using time-series datardquo I JIntelligent Systems and Applications vol 1 pp 16ndash23 2018

[3] H N Dai R C W Wong H Wang Z Zheng andA V Vasilakos ldquoBig data analytics for large scale wirelessnetworks challenges and opportunitiesrdquo ACM vol 52pp 1ndash46 2019

[4] A H Sabry W Z W Hasan M N MohtarR M K R Ahmad and H R Harun ldquoPlanter pressurerepetability data analysis for health adult based on EMEDsystemrdquo Malaysian Journal of Fundamental and AppliedSciences vol 14 no 1 pp 96ndash101 2018

[5] A Mensikova and C A Mattmann ldquoEnsemble sentimentanalysis to identify human trafficking in web datardquo in Pro-ceedings of the ACM Workshop on Graph Techniques forAdversarial Activity Analytics (GTA) pp 1ndash6 ACM NewYork Ny USA 2018

[6] S S Gill and R Buyya ldquoBio-inspired algorithms for big dataanalytics a survey taxonomy and open challengesrdquo Big DataAnalytics for Intelligent Healthcare Management ElsevierAmsterdam Netherlands pp 1ndash17 2019

[7] A Verma I Kaur and N Arora ldquoComparative analysis ofinformatio extraction techniques for data miningrdquo IndianJournal of Science and Technology vol 9 no 11 pp 1ndash182016

[8] B R Manju A Joshuva and V Sugumaran ldquoA data miningstudy for condition monitoring on wind turbine blades usinghoeffding tree algorithm through statistical and histogramfeaturesrdquo International Journal of Mechanical Engineering ampTechnology vol 9 no 1 pp 1061ndash1079 2018

[9] I Aytekin E Eyduran K Karadas R Akshan and I KeskinldquoPrediction of fattening final live weight from some bodymeasurements and fattening period in young bulls ofcrossbred and exotic breeds using MARS data mining algo-rithmrdquo Journal of Zoology vol 50 no 1 pp 189ndash195 2018

[10] S Ahmad and R Varma ldquoInformation extraction from textmessages using data mining techniquesrdquo Malaya Journal ofMatematik vol 5 no 1 pp 26ndash29 2018

[11] N E Oweis M M Fouad S R Oweis S S Owais andV Snasel ldquoA novel mapreduce lift assiciation riule miningalgorithm (MRLAR) for big datardquo International Journal ofAdvanced Computer Science and Applications vol 7 no 8pp 1ndash7 2016

[12] C W Tsai C F Lai M C Chiang and L T Yang ldquoDatamining for internet of things a surveyrdquo CommunicationsSurveys and Tutorials IEEE vol 16 no 1 pp 77ndash97 2014

[13] F Chen P Deng J Wan D Zhang A V Vasilakos andX Rong ldquoData mining for Internet of things literature reviewand challengesrdquo International Journal of Distributed SensorNetworks vol 2015 Article ID 365372 14 pages 2015

[14] S B Patel M N Patel and S M Shah ldquoFEA-HUIM fast andefficient algorithm for high utility item-set mining using noveldata structure and pruning strategyrdquo National Conference onAdvanced Research Trends in Information and Technologies(NCARTICT) IJSSET vol 4 no 2 pp 1ndash7 2018

[15] S Bhaskaran R Marappan and B Santhi ldquoDesign andanalysis of a cluster-based intelligent hybrid recommendationsystem for E-learning applicationsrdquo Mathematics MDPIvol 9 no 107 pp 1ndash21 2021

[16] D Giordano M Mellia and T Cerquitelli ldquoK-MDTSCK-Multi-Dimensional time-series clustering algorithmrdquoElectronics MDPI vol 10 no 1166 pp 1ndash21 2021

[17] R Mothukuri and B B Rao ldquoCluster Analysis of cyber crimedata using Rrdquo International Journal of Computer Science andMobile Applications vol 6 no 2 pp 62ndash70 2018

[18] A N Onaizah ldquoA novel data stream clustering algorithm inhealthcare IoTrdquo International Journal of Research in AppliedNatural and Social Sciences (IJRANSS) vol 5 no 6 pp 51ndash602017

[19] R Krishnamurthi A Kumar D Gopinathan A Nayyar andB Qureshi ldquoAn overview of IoT sensor data processingfusion and analysis techniquesrdquo Sensors MDPI vol 20pp 1ndash23 6076

[20] F Alam R Mehmood I Katib and A Albeshri ldquoAnalysis ofeight data mining algorithms for smarter Internet of things(IoT)rdquo in Proceedings of the International Workshop on DataMining in IoT Systems (DaMIS) pp 437ndash442 ElsevierLondon UK September 2016

[21] A M C Souza and J R A Amazonas ldquoAn outlier detectalgorithm using big data processing and Internet of thingsarchitecturerdquo in Proceedings of the International Workshop onBig Data and Data Mining Challenges on IoT and PervasiveSystems (BigD2M) pp 1010ndash1015 Elsevier London UK June2015

[22] D Gil M Johnsson H Mora and J Szymanski ldquoReview ofthe complexity of managing big data of the Internet of thingsrdquoComplexity vol 2019 Article ID 4592902 12 pages 2019

[23] M S Mahdavinejad M Rezvan M Barekatain P AdibiP Barnaghi and A P Sheth ldquoMachine Learning for Internetof things data analysis a surveyrdquoDigital Communications andNetworks vol 4 pp 161ndash175 2018

[24] M Ruta F Scioscia G Loseto A Pinto and E D SciascioldquoMachine learning in the Internet of things a semantic-en-hanced approachrdquo Semantic Web vol 10 pp 1ndash21 2017

[25] K Demertzis K Rantos and G Drosatos ldquoA dynamic in-telligent policies analysis mechanism for personal data pro-cessing in the IoT ecosystemrdquo Big Data And CognitiveComputing vol 4 no 9 pp 1ndash16 2020

Scientific Programming 13

[26] N Joseph and B S Kumar ldquoTop-K competitor trust miningand customer behavior investigation using data miningtechniquerdquo Journal of Network Communications andEmerging Technologies (JNCET) vol 8 no 2 pp 26ndash30 2018

[27] M P Dooshima E N Chidozie B J Ademola O O Sekoniand I P Adebayo ldquoA predictive model for the risk of mentalillness in Nigeria using data miningrdquo International Journal ofImmonology vol 6 no 1 pp 5ndash16 2018

[28] B A Tama and S Lim ldquoA comparative performance eval-uation of classification algorithms for clinical decision sup-port systemsrdquo Mathematics MDPI vol 8 no 1814 pp 1ndash252020

[29] C C N Wang I S Chang P C Y Sheu and J J P TsaildquoApplication of semantic computing in cancer on secondarydata analysisrdquo in Proceedings of the 2nd International Con-ference on Robotic Computing IEEE pp 407ndash413 LagunaHills CA USA February 2018

[30] K Ravikumar and A R Kannan ldquoSpatial data mining forprediction of natural events and disaster management basedon fuzzy logic using hybrid PSOrdquo Taga Journal vol 14pp 858ndash878 2018

[31] B M M Alom and M Courtney ldquoldquoEducational data mininga case study perspectives from primary to university educa-tion in Australiardquo International Journal of InformationTechnology and Computer Science vol 2 pp 1ndash9 2018

[32] S Celik and O Yilmaz ldquoPrediction of body weight of Turkishtazi dogs using data mining techniques classification andregression tree (CART) and multivariate adaptive regressionsplines (MARS)rdquo Journal of Zoology vol 50 no 2pp 575ndash583 2018

[33] W Ugulino D Cardador K Vega E Velloso R Miliditi andH Fuks ldquoWearable computing accelerometers data classi-fication of body postures and movementsrdquo in Proceedings ofthe 21st Brazilian Symposium on Artificial Intelligence Ad-vances in Artificial Intelligence Lecture Notes in ComputerScience pp 52ndash61 Springer Stuttgart Germany September2012

[34] M B Chandak ldquoRole of big-data in classification and novalclass detection in data streamsrdquo Journal of Big Data Springervol 23 no 5 pp 1ndash9 2016

[35] M Gupta K K Gupta and P K Shukla ldquoSession key basedfast secure and lightweight image encryption algorithmrdquoMultimedia Tools and Applications vol 80 pp 10391ndash104162021

[36] H S Pannu D Singh and A K Malhi ldquoMulti-objectiveparticle swarm optimization-based adaptive neuro-fuzzy in-ference system for benzene monitoringrdquoNeural Computing ampApplications vol 31 pp 2195ndash2205 2019

[37] D Singh J Singh and A Chhabra ldquoHigh availability ofclouds failover strategies for cloud computing using inte-grated check pointing algorithmsrdquo in Proceedings of the 2012International Conference on Communication Systems andNetwork Technologies pp 698ndash703 Rajkot India May 2012

[38] D Singh and V Kumar ldquoA comprehensive review of com-putational dehazing techniquesrdquo Archives of ComputationalMethods in Engineering vol 26 pp 1395ndash1413 2019

[39] R Gupta ldquoPerformance analysis of anti-phishing tools andstudy of classification data mining algorithms for a novel anti-phishing systemrdquo International Journal of Computer Networkand Information Security vol 12 pp 70ndash77 2015

[40] R Bhatt P Maheshwary P Shukla P Shukla M Shrivastavaand S Changlani ldquoImplementation of fruit fly optimizationalgorithm (FFOA) to escalate the attacking efficiency of nodecapture attack in wireless sensor networks (WSN)rdquo Computer

Communications vol 149 pp 134ndash145 2020 ISSN 0140-3664

[41] F Distribution Table httpwwwsocruclaeduappletsdirf_tablehtml 2018

[42] Normal Distribution Table httpmatharizonaedusimrsimsma464standardnormaltablepdf

14 Scientific Programming

Page 13: CBO-IE:ADataMiningApproachforHealthcareIoTDataset ...

perform statistical analysis of proposed CBO-IE againstprevious techniques of clustering like K-Means GA PSOALO and BO approaches which represent that the CBO-IEgenerates 90 accurate results In future the proposedtechnique will be anticipated to estimate and be authorizedwith huge databases under big data Additionally a cross-layer communication will be anticipated to be offered andlegalized in IoT structural design in the future

Data Availability

)e data that support the findings of this study are availableupon request from the corresponding author

Conflicts of Interest

)e authors declare that there are no conflicts of interest

References

[1] A Haque L Khan and M Baron ldquoSAND semi-supervisedadaptive novel class detection and classification over datastreamrdquo in Proceedings of the 13th AAAI Conference on Ar-tificial Intelligence (AAAI-16) pp 1652ndash1658 Phoenix AZUSA February 2016

[2] N Mishra H K Soni S Sharma and A K UpadhyayldquoDevelopment and analysis of artificial neural networkmodels for rainfall prediction by using time-series datardquo I JIntelligent Systems and Applications vol 1 pp 16ndash23 2018

[3] H N Dai R C W Wong H Wang Z Zheng andA V Vasilakos ldquoBig data analytics for large scale wirelessnetworks challenges and opportunitiesrdquo ACM vol 52pp 1ndash46 2019

[4] A H Sabry W Z W Hasan M N MohtarR M K R Ahmad and H R Harun ldquoPlanter pressurerepetability data analysis for health adult based on EMEDsystemrdquo Malaysian Journal of Fundamental and AppliedSciences vol 14 no 1 pp 96ndash101 2018

[5] A Mensikova and C A Mattmann ldquoEnsemble sentimentanalysis to identify human trafficking in web datardquo in Pro-ceedings of the ACM Workshop on Graph Techniques forAdversarial Activity Analytics (GTA) pp 1ndash6 ACM NewYork Ny USA 2018

[6] S S Gill and R Buyya ldquoBio-inspired algorithms for big dataanalytics a survey taxonomy and open challengesrdquo Big DataAnalytics for Intelligent Healthcare Management ElsevierAmsterdam Netherlands pp 1ndash17 2019

[7] A Verma I Kaur and N Arora ldquoComparative analysis ofinformatio extraction techniques for data miningrdquo IndianJournal of Science and Technology vol 9 no 11 pp 1ndash182016

[8] B R Manju A Joshuva and V Sugumaran ldquoA data miningstudy for condition monitoring on wind turbine blades usinghoeffding tree algorithm through statistical and histogramfeaturesrdquo International Journal of Mechanical Engineering ampTechnology vol 9 no 1 pp 1061ndash1079 2018

[9] I Aytekin E Eyduran K Karadas R Akshan and I KeskinldquoPrediction of fattening final live weight from some bodymeasurements and fattening period in young bulls ofcrossbred and exotic breeds using MARS data mining algo-rithmrdquo Journal of Zoology vol 50 no 1 pp 189ndash195 2018

[10] S Ahmad and R Varma ldquoInformation extraction from textmessages using data mining techniquesrdquo Malaya Journal ofMatematik vol 5 no 1 pp 26ndash29 2018

[11] N E Oweis M M Fouad S R Oweis S S Owais andV Snasel ldquoA novel mapreduce lift assiciation riule miningalgorithm (MRLAR) for big datardquo International Journal ofAdvanced Computer Science and Applications vol 7 no 8pp 1ndash7 2016

[12] C W Tsai C F Lai M C Chiang and L T Yang ldquoDatamining for internet of things a surveyrdquo CommunicationsSurveys and Tutorials IEEE vol 16 no 1 pp 77ndash97 2014

[13] F Chen P Deng J Wan D Zhang A V Vasilakos andX Rong ldquoData mining for Internet of things literature reviewand challengesrdquo International Journal of Distributed SensorNetworks vol 2015 Article ID 365372 14 pages 2015

[14] S B Patel M N Patel and S M Shah ldquoFEA-HUIM fast andefficient algorithm for high utility item-set mining using noveldata structure and pruning strategyrdquo National Conference onAdvanced Research Trends in Information and Technologies(NCARTICT) IJSSET vol 4 no 2 pp 1ndash7 2018

[15] S Bhaskaran R Marappan and B Santhi ldquoDesign andanalysis of a cluster-based intelligent hybrid recommendationsystem for E-learning applicationsrdquo Mathematics MDPIvol 9 no 107 pp 1ndash21 2021

[16] D Giordano M Mellia and T Cerquitelli ldquoK-MDTSCK-Multi-Dimensional time-series clustering algorithmrdquoElectronics MDPI vol 10 no 1166 pp 1ndash21 2021

[17] R Mothukuri and B B Rao ldquoCluster Analysis of cyber crimedata using Rrdquo International Journal of Computer Science andMobile Applications vol 6 no 2 pp 62ndash70 2018

[18] A N Onaizah ldquoA novel data stream clustering algorithm inhealthcare IoTrdquo International Journal of Research in AppliedNatural and Social Sciences (IJRANSS) vol 5 no 6 pp 51ndash602017

[19] R Krishnamurthi A Kumar D Gopinathan A Nayyar andB Qureshi ldquoAn overview of IoT sensor data processingfusion and analysis techniquesrdquo Sensors MDPI vol 20pp 1ndash23 6076

[20] F Alam R Mehmood I Katib and A Albeshri ldquoAnalysis ofeight data mining algorithms for smarter Internet of things(IoT)rdquo in Proceedings of the International Workshop on DataMining in IoT Systems (DaMIS) pp 437ndash442 ElsevierLondon UK September 2016

[21] A M C Souza and J R A Amazonas ldquoAn outlier detectalgorithm using big data processing and Internet of thingsarchitecturerdquo in Proceedings of the International Workshop onBig Data and Data Mining Challenges on IoT and PervasiveSystems (BigD2M) pp 1010ndash1015 Elsevier London UK June2015

[22] D Gil M Johnsson H Mora and J Szymanski ldquoReview ofthe complexity of managing big data of the Internet of thingsrdquoComplexity vol 2019 Article ID 4592902 12 pages 2019

[23] M S Mahdavinejad M Rezvan M Barekatain P AdibiP Barnaghi and A P Sheth ldquoMachine Learning for Internetof things data analysis a surveyrdquoDigital Communications andNetworks vol 4 pp 161ndash175 2018

[24] M Ruta F Scioscia G Loseto A Pinto and E D SciascioldquoMachine learning in the Internet of things a semantic-en-hanced approachrdquo Semantic Web vol 10 pp 1ndash21 2017

[25] K Demertzis K Rantos and G Drosatos ldquoA dynamic in-telligent policies analysis mechanism for personal data pro-cessing in the IoT ecosystemrdquo Big Data And CognitiveComputing vol 4 no 9 pp 1ndash16 2020

Scientific Programming 13

[26] N Joseph and B S Kumar ldquoTop-K competitor trust miningand customer behavior investigation using data miningtechniquerdquo Journal of Network Communications andEmerging Technologies (JNCET) vol 8 no 2 pp 26ndash30 2018

[27] M P Dooshima E N Chidozie B J Ademola O O Sekoniand I P Adebayo ldquoA predictive model for the risk of mentalillness in Nigeria using data miningrdquo International Journal ofImmonology vol 6 no 1 pp 5ndash16 2018

[28] B A Tama and S Lim ldquoA comparative performance eval-uation of classification algorithms for clinical decision sup-port systemsrdquo Mathematics MDPI vol 8 no 1814 pp 1ndash252020

[29] C C N Wang I S Chang P C Y Sheu and J J P TsaildquoApplication of semantic computing in cancer on secondarydata analysisrdquo in Proceedings of the 2nd International Con-ference on Robotic Computing IEEE pp 407ndash413 LagunaHills CA USA February 2018

[30] K Ravikumar and A R Kannan ldquoSpatial data mining forprediction of natural events and disaster management basedon fuzzy logic using hybrid PSOrdquo Taga Journal vol 14pp 858ndash878 2018

[31] B M M Alom and M Courtney ldquoldquoEducational data mininga case study perspectives from primary to university educa-tion in Australiardquo International Journal of InformationTechnology and Computer Science vol 2 pp 1ndash9 2018

[32] S Celik and O Yilmaz ldquoPrediction of body weight of Turkishtazi dogs using data mining techniques classification andregression tree (CART) and multivariate adaptive regressionsplines (MARS)rdquo Journal of Zoology vol 50 no 2pp 575ndash583 2018

[33] W Ugulino D Cardador K Vega E Velloso R Miliditi andH Fuks ldquoWearable computing accelerometers data classi-fication of body postures and movementsrdquo in Proceedings ofthe 21st Brazilian Symposium on Artificial Intelligence Ad-vances in Artificial Intelligence Lecture Notes in ComputerScience pp 52ndash61 Springer Stuttgart Germany September2012

[34] M B Chandak ldquoRole of big-data in classification and novalclass detection in data streamsrdquo Journal of Big Data Springervol 23 no 5 pp 1ndash9 2016

[35] M Gupta K K Gupta and P K Shukla ldquoSession key basedfast secure and lightweight image encryption algorithmrdquoMultimedia Tools and Applications vol 80 pp 10391ndash104162021

[36] H S Pannu D Singh and A K Malhi ldquoMulti-objectiveparticle swarm optimization-based adaptive neuro-fuzzy in-ference system for benzene monitoringrdquoNeural Computing ampApplications vol 31 pp 2195ndash2205 2019

[37] D Singh J Singh and A Chhabra ldquoHigh availability ofclouds failover strategies for cloud computing using inte-grated check pointing algorithmsrdquo in Proceedings of the 2012International Conference on Communication Systems andNetwork Technologies pp 698ndash703 Rajkot India May 2012

[38] D Singh and V Kumar ldquoA comprehensive review of com-putational dehazing techniquesrdquo Archives of ComputationalMethods in Engineering vol 26 pp 1395ndash1413 2019

[39] R Gupta ldquoPerformance analysis of anti-phishing tools andstudy of classification data mining algorithms for a novel anti-phishing systemrdquo International Journal of Computer Networkand Information Security vol 12 pp 70ndash77 2015

[40] R Bhatt P Maheshwary P Shukla P Shukla M Shrivastavaand S Changlani ldquoImplementation of fruit fly optimizationalgorithm (FFOA) to escalate the attacking efficiency of nodecapture attack in wireless sensor networks (WSN)rdquo Computer

Communications vol 149 pp 134ndash145 2020 ISSN 0140-3664

[41] F Distribution Table httpwwwsocruclaeduappletsdirf_tablehtml 2018

[42] Normal Distribution Table httpmatharizonaedusimrsimsma464standardnormaltablepdf

14 Scientific Programming

Page 14: CBO-IE:ADataMiningApproachforHealthcareIoTDataset ...

[26] N Joseph and B S Kumar ldquoTop-K competitor trust miningand customer behavior investigation using data miningtechniquerdquo Journal of Network Communications andEmerging Technologies (JNCET) vol 8 no 2 pp 26ndash30 2018

[27] M P Dooshima E N Chidozie B J Ademola O O Sekoniand I P Adebayo ldquoA predictive model for the risk of mentalillness in Nigeria using data miningrdquo International Journal ofImmonology vol 6 no 1 pp 5ndash16 2018

[28] B A Tama and S Lim ldquoA comparative performance eval-uation of classification algorithms for clinical decision sup-port systemsrdquo Mathematics MDPI vol 8 no 1814 pp 1ndash252020

[29] C C N Wang I S Chang P C Y Sheu and J J P TsaildquoApplication of semantic computing in cancer on secondarydata analysisrdquo in Proceedings of the 2nd International Con-ference on Robotic Computing IEEE pp 407ndash413 LagunaHills CA USA February 2018

[30] K Ravikumar and A R Kannan ldquoSpatial data mining forprediction of natural events and disaster management basedon fuzzy logic using hybrid PSOrdquo Taga Journal vol 14pp 858ndash878 2018

[31] B M M Alom and M Courtney ldquoldquoEducational data mininga case study perspectives from primary to university educa-tion in Australiardquo International Journal of InformationTechnology and Computer Science vol 2 pp 1ndash9 2018

[32] S Celik and O Yilmaz ldquoPrediction of body weight of Turkishtazi dogs using data mining techniques classification andregression tree (CART) and multivariate adaptive regressionsplines (MARS)rdquo Journal of Zoology vol 50 no 2pp 575ndash583 2018

[33] W Ugulino D Cardador K Vega E Velloso R Miliditi andH Fuks ldquoWearable computing accelerometers data classi-fication of body postures and movementsrdquo in Proceedings ofthe 21st Brazilian Symposium on Artificial Intelligence Ad-vances in Artificial Intelligence Lecture Notes in ComputerScience pp 52ndash61 Springer Stuttgart Germany September2012

[34] M B Chandak ldquoRole of big-data in classification and novalclass detection in data streamsrdquo Journal of Big Data Springervol 23 no 5 pp 1ndash9 2016

[35] M Gupta K K Gupta and P K Shukla ldquoSession key basedfast secure and lightweight image encryption algorithmrdquoMultimedia Tools and Applications vol 80 pp 10391ndash104162021

[36] H S Pannu D Singh and A K Malhi ldquoMulti-objectiveparticle swarm optimization-based adaptive neuro-fuzzy in-ference system for benzene monitoringrdquoNeural Computing ampApplications vol 31 pp 2195ndash2205 2019

[37] D Singh J Singh and A Chhabra ldquoHigh availability ofclouds failover strategies for cloud computing using inte-grated check pointing algorithmsrdquo in Proceedings of the 2012International Conference on Communication Systems andNetwork Technologies pp 698ndash703 Rajkot India May 2012

[38] D Singh and V Kumar ldquoA comprehensive review of com-putational dehazing techniquesrdquo Archives of ComputationalMethods in Engineering vol 26 pp 1395ndash1413 2019

[39] R Gupta ldquoPerformance analysis of anti-phishing tools andstudy of classification data mining algorithms for a novel anti-phishing systemrdquo International Journal of Computer Networkand Information Security vol 12 pp 70ndash77 2015

[40] R Bhatt P Maheshwary P Shukla P Shukla M Shrivastavaand S Changlani ldquoImplementation of fruit fly optimizationalgorithm (FFOA) to escalate the attacking efficiency of nodecapture attack in wireless sensor networks (WSN)rdquo Computer

Communications vol 149 pp 134ndash145 2020 ISSN 0140-3664

[41] F Distribution Table httpwwwsocruclaeduappletsdirf_tablehtml 2018

[42] Normal Distribution Table httpmatharizonaedusimrsimsma464standardnormaltablepdf

14 Scientific Programming